* [PATCH v7 0/4] arm64, numa: Add numa support for arm64 platforms
@ 2015-11-17 17:20 ` Ganapatrao Kulkarni
From: Ganapatrao Kulkarni @ 2015-11-17 17:20 UTC
  To: linux-arm-kernel, devicetree, Will.Deacon, catalin.marinas,
	grant.likely, leif.lindholm, rfranz, ard.biesheuvel, msalter,
	robh+dt, steve.capper, hanjun.guo, al.stone, arnd, pawel.moll,
	mark.rutland, ijc+devicetree, galak, rjw, lenb, marc.zyngier,
	rrichter, Prasun.Kapoor
  Cc: gpkulkarni

v7:
	- Manage the NUMA memory mapping using memblock.
	- Incorporated review comments from Mark Rutland.

v6:
	- Defined and implemented the NUMA DT binding using the
	node property proximity and the device node distance-map.
	- Renamed dt_numa to of_numa.

v5:
        - Created a base version of numa.c which creates a dummy NUMA node without
          using DT on single-socket platforms, then added patches for DT support.
        - Incorporated review comments from Hanjun Guo.

v4:
Made changes as per Arnd's review comments.

v3:
Added changes to support NUMA on arm64-based platforms.
Tested these patches on Cavium's multi-node (2-node topology) platform.
This version defines and implements DT bindings that map cores and memory
to NUMA nodes using the device node property arm,associativity.

v2:
Defined and implemented the NUMA map of memory and cores to nodes, and the
proximity distance matrix of nodes.

v1:
Initial patchset to support numa on arm64 platforms.
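
The node distance matrix from v2 ends up in patch 1/4 as a flat, row-major u8 array (numa_distance[], sized by numa_distance_cnt). A minimal user-space sketch of its semantics, assuming the kernel's LOCAL_DISTANCE (10) and REMOTE_DISTANCE (20) defaults; struct dist_table, dist_alloc(), dist_set(), dist_get() and dist_free() are hypothetical stand-ins for numa_alloc_distance(), numa_set_distance(), __node_distance() and numa_reset_distance(), not kernel APIs:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same values as LOCAL_DISTANCE and REMOTE_DISTANCE in include/linux/topology.h. */
#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

/* Hypothetical stand-in for numa_distance[] plus numa_distance_cnt. */
struct dist_table {
	int cnt;       /* number of nodes; the table is cnt x cnt */
	uint8_t *d;    /* row-major: d[from * cnt + to] */
};

/* Like numa_alloc_distance(): fill every entry with the local/remote default. */
static int dist_alloc(struct dist_table *t, int cnt)
{
	assert(cnt > 0);
	t->d = malloc((size_t)cnt * (size_t)cnt * sizeof(t->d[0]));
	if (!t->d)
		return -1;
	t->cnt = cnt;
	for (int i = 0; i < cnt; i++)
		for (int j = 0; j < cnt; j++)
			t->d[i * cnt + j] = (i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return 0;
}

/* Like numa_reset_distance(): drop the table. */
static void dist_free(struct dist_table *t)
{
	free(t->d);
	t->d = NULL;
	t->cnt = 0;
}

/*
 * Like numa_set_distance(), but returning -1 instead of warning: reject
 * out-of-range ids, distances that do not fit in a u8, and a self-distance
 * other than LOCAL_DISTANCE. Only the from->to entry is written.
 */
static int dist_set(struct dist_table *t, int from, int to, int distance)
{
	if (from < 0 || to < 0 || from >= t->cnt || to >= t->cnt)
		return -1;
	if ((uint8_t)distance != distance ||
	    (from == to && distance != LOCAL_DISTANCE))
		return -1;
	t->d[from * t->cnt + to] = (uint8_t)distance;
	return 0;
}

/*
 * Like __node_distance(): fall back to the defaults for ids the table
 * does not cover. Unlike the patch, negative ids are also caught here.
 */
static int dist_get(const struct dist_table *t, int from, int to)
{
	if (from < 0 || to < 0 || from >= t->cnt || to >= t->cnt)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return t->d[from * t->cnt + to];
}
```

The table is directional: setting the 0->1 distance leaves 1->0 at its default, so a caller describing a symmetric topology has to set both entries.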

Note:
        1. This patchset has been tested for NUMA with DT on
           ThunderX single-socket and dual-socket boards.
        2. NUMA DT booting needs the DT memory nodes, which the current EFI stub deletes;
        hence, to try NUMA with DT, you need to rebase onto Ard's patchset:
        http://git.linaro.org/people/ard.biesheuvel/linux-arm.git/shortlog/refs/heads/arm64-uefi-early-fdt-handling

Ganapatrao Kulkarni (4):
  arm64, numa: adding numa support for arm64 platforms.
  Documentation, dt, arm64/arm: dt bindings for numa.
  arm64/arm, numa, dt: adding numa dt binding implementation for arm64
    platforms.
  arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node
    topology.

 Documentation/devicetree/bindings/arm/numa.txt  | 272 ++++++++
 arch/arm64/Kconfig                              |  35 +
 arch/arm64/boot/dts/cavium/Makefile             |   2 +-
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts  |  83 +++
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi | 806 ++++++++++++++++++++++++
 arch/arm64/include/asm/mmzone.h                 |  17 +
 arch/arm64/include/asm/numa.h                   |  56 ++
 arch/arm64/kernel/Makefile                      |   1 +
 arch/arm64/kernel/of_numa.c                     | 265 ++++++++
 arch/arm64/kernel/setup.c                       |   4 +
 arch/arm64/kernel/smp.c                         |   4 +
 arch/arm64/mm/Makefile                          |   1 +
 arch/arm64/mm/init.c                            |  30 +-
 arch/arm64/mm/numa.c                            | 392 ++++++++++++
 14 files changed, 1963 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/numa.txt
 create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
 create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi
 create mode 100644 arch/arm64/include/asm/mmzone.h
 create mode 100644 arch/arm64/include/asm/numa.h
 create mode 100644 arch/arm64/kernel/of_numa.c
 create mode 100644 arch/arm64/mm/numa.c

-- 
1.8.1.4

* [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
@ 2015-11-17 17:20 ` Ganapatrao Kulkarni
From: Ganapatrao Kulkarni @ 2015-11-17 17:20 UTC
  To: linux-arm-kernel, devicetree, Will.Deacon, catalin.marinas,
	grant.likely, leif.lindholm, rfranz, ard.biesheuvel, msalter,
	robh+dt, steve.capper, hanjun.guo, al.stone, arnd, pawel.moll,
	mark.rutland, ijc+devicetree, galak, rjw, lenb, marc.zyngier,
	rrichter, Prasun.Kapoor
  Cc: gpkulkarni

Add NUMA support for arm64-based platforms.
By default, this patch creates a dummy NUMA node and
maps all memory and CPUs to node 0.
With this patch, NUMA can be simulated on single-node arm64 platforms.

Reviewed-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
---
 arch/arm64/Kconfig              |  25 +++
 arch/arm64/include/asm/mmzone.h |  17 ++
 arch/arm64/include/asm/numa.h   |  47 +++++
 arch/arm64/kernel/setup.c       |   4 +
 arch/arm64/kernel/smp.c         |   2 +
 arch/arm64/mm/Makefile          |   1 +
 arch/arm64/mm/init.c            |  30 +++-
 arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
 8 files changed, 506 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/include/asm/mmzone.h
 create mode 100644 arch/arm64/include/asm/numa.h
 create mode 100644 arch/arm64/mm/numa.c
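
The default (dummy) setup described above reduces to three steps: tag every memory region with node 0, check that each region carries a valid node id, and derive the node's PFN span. A minimal user-space sketch of those steps, assuming 4 KiB pages and the default NODES_SHIFT of 2; struct region, fake_single_node(), all_regions_valid() and node_pfn_range() are hypothetical stand-ins for memblock regions, dummy_numa_init(), the nid check in numa_register_nodes() and get_pfn_range_for_nid(), not kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE  (-1)   /* same value as the kernel's NUMA_NO_NODE */
#define MAX_NUMNODES  4      /* assumption: 1 << NODES_SHIFT, default shift 2 */
#define PAGE_SHIFT    12     /* assumption: 4 KiB pages */
#define PAGE_SIZE     (UINT64_C(1) << PAGE_SHIFT)

/* Hypothetical stand-in for struct memblock_region. */
struct region {
	uint64_t base;   /* physical start address */
	uint64_t size;   /* length in bytes */
	int nid;         /* owning node, NUMA_NO_NODE until assigned */
};

/* Like dummy_numa_init(): place every memory region on node 0. */
static void fake_single_node(struct region *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		r[i].nid = 0;
}

/* Like the first loop in numa_register_nodes(): every region needs a valid nid. */
static int all_regions_valid(const struct region *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (r[i].nid == NUMA_NO_NODE || r[i].nid >= MAX_NUMNODES)
			return 0;
	return 1;
}

/*
 * Like get_pfn_range_for_nid(): lowest start pfn and highest end pfn of
 * all regions on @nid. Start is rounded up and end rounded down to whole
 * pages; a node without memory reports [0, 0).
 */
static void node_pfn_range(const struct region *r, size_t n, int nid,
			   uint64_t *start_pfn, uint64_t *end_pfn)
{
	assert(start_pfn != NULL && end_pfn != NULL);
	*start_pfn = UINT64_MAX;
	*end_pfn = 0;
	for (size_t i = 0; i < n; i++) {
		if (r[i].nid != nid)
			continue;
		uint64_t s = (r[i].base + PAGE_SIZE - 1) >> PAGE_SHIFT;
		uint64_t e = (r[i].base + r[i].size) >> PAGE_SHIFT;
		if (s < *start_pfn)
			*start_pfn = s;
		if (e > *end_pfn)
			*end_pfn = e;
	}
	if (*start_pfn == UINT64_MAX)
		*start_pfn = 0;
}
```

The span includes any holes between regions, which is why setup_node_data() stores node_spanned_pages as end_pfn - start_pfn rather than the sum of the region sizes.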

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9ac16a4..7d8fb42 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -71,6 +71,7 @@ config ARM64
 	select HAVE_GENERIC_DMA_COHERENT
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS
 	select HAVE_MEMBLOCK
+	select HAVE_MEMBLOCK_NODE_MAP if NUMA
 	select HAVE_PATA_PLATFORM
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_REGS
@@ -482,6 +483,30 @@ config HOTPLUG_CPU
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
 
+# Common NUMA Features
+config NUMA
+	bool "NUMA Memory Allocation and Scheduler Support"
+	depends on SMP
+	help
+	  Enable NUMA (Non Uniform Memory Access) support.
+
+	  The kernel will try to allocate memory used by a CPU on the
+	  local memory controller of the CPU and add some more
+	  NUMA awareness to the kernel.
+
+config NODES_SHIFT
+	int "Maximum NUMA Nodes (as a power of 2)"
+	range 1 10
+	default "2"
+	depends on NEED_MULTIPLE_NODES
+	help
+	  Specify the maximum number of NUMA Nodes available on the target
+	  system.  Increases memory reserved to accommodate various tables.
+
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
 source kernel/Kconfig.preempt
 source kernel/Kconfig.hz
 
diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
new file mode 100644
index 0000000..6ddd468
--- /dev/null
+++ b/arch/arm64/include/asm/mmzone.h
@@ -0,0 +1,17 @@
+#ifndef __ASM_ARM64_MMZONE_H_
+#define __ASM_ARM64_MMZONE_H_
+
+#ifdef CONFIG_NUMA
+
+#include <linux/mmdebug.h>
+#include <linux/types.h>
+
+#include <asm/smp.h>
+#include <asm/numa.h>
+
+extern struct pglist_data *node_data[];
+
+#define NODE_DATA(nid)		(node_data[(nid)])
+
+#endif /* CONFIG_NUMA */
+#endif /* __ASM_ARM64_MMZONE_H_ */
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
new file mode 100644
index 0000000..c00f3a4
--- /dev/null
+++ b/arch/arm64/include/asm/numa.h
@@ -0,0 +1,47 @@
+#ifndef _ASM_NUMA_H
+#define _ASM_NUMA_H
+
+#include <linux/nodemask.h>
+#include <asm/topology.h>
+
+#ifdef CONFIG_NUMA
+
+#define NR_NODE_MEMBLKS		(MAX_NUMNODES * 2)
+#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
+
+/* currently, arm64 implements flat NUMA topology */
+#define parent_node(node)	(node)
+
+extern int __node_distance(int from, int to);
+#define node_distance(a, b) __node_distance(a, b)
+
+/* dummy definitions for pci functions */
+#define pcibus_to_node(node)	0
+#define cpumask_of_pcibus(bus)	0
+
+extern int cpu_to_node_map[NR_CPUS];
+extern nodemask_t numa_nodes_parsed __initdata;
+
+/* Mappings between node number and cpus on that node. */
+extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+extern void numa_clear_node(unsigned int cpu);
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+extern const struct cpumask *cpumask_of_node(int node);
+#else
+/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
+static inline const struct cpumask *cpumask_of_node(int node)
+{
+	return node_to_cpumask_map[node];
+}
+#endif
+
+void __init arm64_numa_init(void);
+int __init numa_add_memblk(int nodeid, u64 start, u64 size);
+void __init numa_set_distance(int from, int to, int distance);
+void __init numa_reset_distance(void);
+void numa_store_cpu_info(unsigned int cpu);
+#else	/* CONFIG_NUMA */
+static inline void numa_store_cpu_info(unsigned int cpu)		{ }
+static inline void arm64_numa_init(void)		{ }
+#endif	/* CONFIG_NUMA */
+#endif	/* _ASM_NUMA_H */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 8119479..d9b9761 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -53,6 +53,7 @@
 #include <asm/cpufeature.h>
 #include <asm/cpu_ops.h>
 #include <asm/kasan.h>
+#include <asm/numa.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <asm/smp_plat.h>
@@ -372,6 +373,9 @@ static int __init topology_init(void)
 {
 	int i;
 
+	for_each_online_node(i)
+		register_one_node(i);
+
 	for_each_possible_cpu(i) {
 		struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
 		cpu->hotpluggable = 1;
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index b1adc51..d6e7d6a 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -45,6 +45,7 @@
 #include <asm/cputype.h>
 #include <asm/cpu_ops.h>
 #include <asm/mmu_context.h>
+#include <asm/numa.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/processor.h>
@@ -125,6 +126,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 static void smp_store_cpu_info(unsigned int cpuid)
 {
 	store_cpu_topology(cpuid);
+	numa_store_cpu_info(cpuid);
 }
 
 /*
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index 57f57fd..2e57922 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -6,4 +6,5 @@ obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_ARM64_PTDUMP)	+= dump.o
 
 obj-$(CONFIG_KASAN)		+= kasan_init.o
+obj-$(CONFIG_NUMA)		+= numa.o
 KASAN_SANITIZE_kasan_init.o	:= n
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 17bf39a..8dc9c5d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -37,6 +37,7 @@
 
 #include <asm/fixmap.h>
 #include <asm/memory.h>
+#include <asm/numa.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <asm/sizes.h>
@@ -77,6 +78,19 @@ static phys_addr_t max_zone_dma_phys(void)
 	return min(offset + (1ULL << 32), memblock_end_of_DRAM());
 }
 
+#ifdef CONFIG_NUMA
+static void __init zone_sizes_init(unsigned long min, unsigned long max)
+{
+	unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
+
+	if (IS_ENABLED(CONFIG_ZONE_DMA))
+		max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
+	max_zone_pfns[ZONE_NORMAL] = max;
+
+	free_area_init_nodes(max_zone_pfns);
+}
+
+#else
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
 	struct memblock_region *reg;
@@ -116,6 +130,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
 
 	free_area_init_node(0, zone_size, min, zhole_size);
 }
+#endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
 int pfn_valid(unsigned long pfn)
@@ -133,10 +148,15 @@ static void arm64_memory_present(void)
 static void arm64_memory_present(void)
 {
 	struct memblock_region *reg;
+	int nid = 0;
 
-	for_each_memblock(memory, reg)
-		memory_present(0, memblock_region_memory_base_pfn(reg),
-			       memblock_region_memory_end_pfn(reg));
+	for_each_memblock(memory, reg) {
+#ifdef CONFIG_NUMA
+		nid = reg->nid;
+#endif
+		memory_present(nid, memblock_region_memory_base_pfn(reg),
+				memblock_region_memory_end_pfn(reg));
+	}
 }
 #endif
 
@@ -193,6 +213,9 @@ void __init bootmem_init(void)
 
 	early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);
 
+	max_pfn = max_low_pfn = max;
+
+	arm64_numa_init();
 	/*
 	 * Sparsemem tries to allocate bootmem in memory_present(), so must be
 	 * done after the fixed reservations.
@@ -203,7 +226,6 @@ void __init bootmem_init(void)
 	zone_sizes_init(min, max);
 
 	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
-	max_pfn = max_low_pfn = max;
 }
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
new file mode 100644
index 0000000..e3afbf8
--- /dev/null
+++ b/arch/arm64/mm/numa.c
@@ -0,0 +1,384 @@
+/*
+ * NUMA support, based on the x86 implementation.
+ *
+ * Copyright (C) 2015 Cavium Inc.
+ * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/bootmem.h>
+#include <linux/ctype.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/memblock.h>
+#include <linux/module.h>
+#include <linux/mmzone.h>
+#include <linux/nodemask.h>
+#include <linux/sched.h>
+#include <linux/string.h>
+#include <linux/topology.h>
+
+#include <asm/smp_plat.h>
+
+struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
+EXPORT_SYMBOL(node_data);
+nodemask_t numa_nodes_parsed __initdata;
+int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
+
+static int numa_off;
+static int numa_distance_cnt;
+static u8 *numa_distance;
+
+static __init int numa_parse_early_param(char *opt)
+{
+	if (!opt)
+		return -EINVAL;
+	if (!strncmp(opt, "off", 3)) {
+		pr_info("%s\n", "NUMA turned off");
+		numa_off = 1;
+	}
+	return 0;
+}
+early_param("numa", numa_parse_early_param);
+
+cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+EXPORT_SYMBOL(node_to_cpumask_map);
+
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+/*
+ * Returns a pointer to the bitmask of CPUs on Node 'node'.
+ */
+const struct cpumask *cpumask_of_node(int node)
+{
+
+	if (WARN_ON(node >= nr_node_ids))
+		return cpu_none_mask;
+
+	if (WARN_ON(node_to_cpumask_map[node] == NULL))
+		return cpu_online_mask;
+
+	return node_to_cpumask_map[node];
+}
+EXPORT_SYMBOL(cpumask_of_node);
+#endif
+
+static void map_cpu_to_node(unsigned int cpu, int nid)
+{
+	set_cpu_numa_node(cpu, nid);
+	if (nid >= 0)
+		cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
+}
+
+static void unmap_cpu_to_node(unsigned int cpu)
+{
+	int nid = cpu_to_node(cpu);
+
+	if (nid >= 0)
+		cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
+	set_cpu_numa_node(cpu, NUMA_NO_NODE);
+}
+
+void numa_clear_node(unsigned int cpu)
+{
+	unmap_cpu_to_node(cpu);
+}
+
+/*
+ * Allocate node_to_cpumask_map based on number of available nodes
+ * Requires node_possible_map to be valid.
+ *
+ * Note: cpumask_of_node() is not valid until after this is done.
+ * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
+ */
+static void __init setup_node_to_cpumask_map(void)
+{
+	unsigned int cpu;
+	int node;
+
+	/* setup nr_node_ids if not done yet */
+	if (nr_node_ids == MAX_NUMNODES)
+		setup_nr_node_ids();
+
+	/* allocate and clear the mapping */
+	for (node = 0; node < nr_node_ids; node++) {
+		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
+		cpumask_clear(node_to_cpumask_map[node]);
+	}
+
+	for_each_possible_cpu(cpu)
+		set_cpu_numa_node(cpu, NUMA_NO_NODE);
+
+	/* cpumask_of_node() will now work */
+	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
+}
+
+/*
+ *  Set the CPU-to-node mapping
+ */
+void numa_store_cpu_info(unsigned int cpu)
+{
+	map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
+}
+
+/**
+ * numa_add_memblk - Set node id to memblk
+ * @nid: NUMA node ID of the new memblk
+ * @start: Start address of the new memblk
+ * @size:  Size of the new memblk
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+int __init numa_add_memblk(int nid, u64 start, u64 size)
+{
+	int ret;
+
+	ret = memblock_set_node(start, size, &memblock.memory, nid);
+	if (ret < 0) {
+		pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
+			start, (start + size - 1), nid);
+		return ret;
+	}
+
+	node_set(nid, numa_nodes_parsed);
+	pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
+			start, (start + size - 1), nid);
+	return ret;
+}
+EXPORT_SYMBOL(numa_add_memblk);
+
+/* Initialize NODE_DATA for a node on the local memory */
+static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
+{
+	const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
+	u64 nd_pa;
+	void *nd;
+	int tnid;
+
+	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
+			nid, start_pfn << PAGE_SHIFT,
+			(end_pfn << PAGE_SHIFT) - 1);
+
+	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
+	nd = __va(nd_pa);
+
+	/* report and initialize */
+	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
+		nd_pa, nd_pa + nd_size - 1);
+	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
+	if (tnid != nid)
+		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
+
+	node_data[nid] = nd;
+	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
+	NODE_DATA(nid)->node_id = nid;
+	NODE_DATA(nid)->node_start_pfn = start_pfn;
+	NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
+}
+
+/**
+ * numa_reset_distance - Reset NUMA distance table
+ *
+ * The current table is freed.
+ * The next numa_set_distance() call will create a new one.
+ */
+void __init numa_reset_distance(void)
+{
+	size_t size;
+
+	if (!numa_distance)
+		return;
+
+	size = numa_distance_cnt * numa_distance_cnt *
+		sizeof(numa_distance[0]);
+
+	memblock_free(__pa(numa_distance), size);
+	numa_distance_cnt = 0;
+	numa_distance = NULL;
+}
+
+static int __init numa_alloc_distance(void)
+{
+	size_t size;
+	u64 phys;
+	int i, j;
+
+	size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
+	phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
+				      size, PAGE_SIZE);
+	if (WARN_ON(!phys))
+		return -ENOMEM;
+
+	memblock_reserve(phys, size);
+
+	numa_distance = __va(phys);
+	numa_distance_cnt = nr_node_ids;
+
+	/* fill with the default distances */
+	for (i = 0; i < numa_distance_cnt; i++)
+		for (j = 0; j < numa_distance_cnt; j++)
+			numa_distance[i * numa_distance_cnt + j] = i == j ?
+				LOCAL_DISTANCE : REMOTE_DISTANCE;
+
+	pr_debug("NUMA: Initialized distance table, cnt=%d\n",
+			numa_distance_cnt);
+
+	return 0;
+}
+
+/**
+ * numa_set_distance - Set NUMA distance from one NUMA node to another
+ * @from: the 'from' node to set distance
+ * @to: the 'to'  node to set distance
+ * @distance: NUMA distance
+ *
+ * Set the distance from node @from to @to to @distance.  If distance table
+ * doesn't exist, one which is large enough to accommodate all the currently
+ * known nodes will be created.
+ *
+ * If such table cannot be allocated, a warning is printed and further
+ * calls are ignored until the distance table is reset with
+ * numa_reset_distance().
+ *
+ * If @from or @to is higher than the highest known node or lower than zero
+ * at the time of table creation or @distance doesn't make sense, the call
+ * is ignored.
+ * This is to allow simplification of specific NUMA config implementations.
+ */
+void __init numa_set_distance(int from, int to, int distance)
+{
+	if (!numa_distance)
+		return;
+
+	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
+			from < 0 || to < 0) {
+		pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
+			    from, to, distance);
+		return;
+	}
+
+	if ((u8)distance != distance ||
+	    (from == to && distance != LOCAL_DISTANCE)) {
+		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
+			     from, to, distance);
+		return;
+	}
+
+	numa_distance[from * numa_distance_cnt + to] = distance;
+}
+EXPORT_SYMBOL(numa_set_distance);
+
+int __node_distance(int from, int to)
+{
+	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
+		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
+	return numa_distance[from * numa_distance_cnt + to];
+}
+EXPORT_SYMBOL(__node_distance);
+
+static int __init numa_register_nodes(void)
+{
+	int nid;
+	struct memblock_region *mblk;
+
+	/* Check that every memblock has a valid nid set */
+	for_each_memblock(memory, mblk)
+		if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES)
+			return -EINVAL;
+
+	/* Finally register nodes. */
+	for_each_node_mask(nid, numa_nodes_parsed) {
+		unsigned long start_pfn, end_pfn;
+
+		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
+		setup_node_data(nid, start_pfn, end_pfn);
+		node_set_online(nid);
+	}
+
+	/* Set the possible node map to the parsed nodes */
+	node_possible_map = numa_nodes_parsed;
+
+	/* Dump memblock with node info and return. */
+	memblock_dump_all();
+	return 0;
+}
+
+static int __init numa_init(int (*init_func)(void))
+{
+	int ret;
+
+	nodes_clear(numa_nodes_parsed);
+	nodes_clear(node_possible_map);
+	nodes_clear(node_online_map);
+	numa_reset_distance();
+
+	ret = init_func();
+	if (ret < 0)
+		return ret;
+
+	if (nodes_empty(numa_nodes_parsed))
+		return -EINVAL;
+
+	ret = numa_register_nodes();
+	if (ret < 0)
+		return ret;
+
+	ret = numa_alloc_distance();
+	if (ret < 0)
+		return ret;
+
+	setup_node_to_cpumask_map();
+
+	/* init boot processor */
+	cpu_to_node_map[0] = 0;
+	map_cpu_to_node(0, 0);
+
+	return 0;
+}
+
+/**
+ * dummy_numa_init - Fallback dummy NUMA init
+ *
+ * Used if there's no underlying NUMA architecture, NUMA initialization
+ * fails, or NUMA is disabled on the command line.
+ *
+ * Must online at least one node and add memory blocks that cover all
+ * allowed memory.  This function must not fail.
+ */
+static int __init dummy_numa_init(void)
+{
+	struct memblock_region *mblk;
+
+	pr_info("%s\n", "No NUMA configuration found");
+	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
+	       0LLU, PFN_PHYS(max_pfn) - 1);
+	for_each_memblock(memory, mblk)
+		numa_add_memblk(0, mblk->base, mblk->size);
+	numa_off = 1;
+
+	return 0;
+}
+
+/**
+ * arm64_numa_init - Initialize NUMA
+ *
+ * Try each configured NUMA initialization method until one succeeds.  The
+ * last fallback is a dummy single-node config encompassing all memory and
+ * never fails.
+ */
+void __init arm64_numa_init(void)
+{
+	numa_init(dummy_numa_init);
+}
-- 
1.8.1.4

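The CPU side of patch 1/4 keeps two views in sync: each CPU's node id (set_cpu_numa_node()) and each node's CPU mask (node_to_cpumask_map[]). A minimal user-space sketch of that bookkeeping, assuming at most 64 CPUs so that a node's mask fits in a uint64_t; reset_maps(), map_cpu(), unmap_cpu() and store_cpu() are hypothetical stand-ins for setup_node_to_cpumask_map(), map_cpu_to_node(), unmap_cpu_to_node() and numa_store_cpu_info(), not kernel APIs:

```c
#include <assert.h>
#include <stdint.h>

#define NUMA_NO_NODE  (-1)  /* same value as the kernel's NUMA_NO_NODE */
#define NR_NODES      4     /* assumption: MAX_NUMNODES with NODES_SHIFT = 2 */
#define NR_CPUS_DEMO  64    /* assumption: at most 64 CPUs, one bit per CPU */

static uint64_t node_cpus[NR_NODES];   /* like node_to_cpumask_map[] */
static int cpu_node[NR_CPUS_DEMO];     /* like the per-cpu numa_node id */

/* Like setup_node_to_cpumask_map(): empty masks, every CPU unassigned. */
static void reset_maps(void)
{
	for (int nid = 0; nid < NR_NODES; nid++)
		node_cpus[nid] = 0;
	for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		cpu_node[cpu] = NUMA_NO_NODE;
}

/* Like map_cpu_to_node(): record the node, then set the CPU's bit. */
static void map_cpu(unsigned int cpu, int nid)
{
	assert(cpu < NR_CPUS_DEMO && nid < NR_NODES);
	cpu_node[cpu] = nid;
	if (nid >= 0)
		node_cpus[nid] |= UINT64_C(1) << cpu;
}

/* Like unmap_cpu_to_node(): clear the CPU's bit, then forget its node. */
static void unmap_cpu(unsigned int cpu)
{
	assert(cpu < NR_CPUS_DEMO);
	int nid = cpu_node[cpu];
	if (nid >= 0)
		node_cpus[nid] &= ~(UINT64_C(1) << cpu);
	cpu_node[cpu] = NUMA_NO_NODE;
}

/*
 * Like numa_store_cpu_info(): take the node from the firmware-derived
 * map (cpu_to_node_map[] in the patch), or node 0 when NUMA is off.
 */
static void store_cpu(unsigned int cpu, int numa_off, const int *fw_map)
{
	map_cpu(cpu, numa_off ? 0 : fw_map[cpu]);
}
```

With numa=off on the command line, or after dummy_numa_init() has set numa_off, the firmware-derived map is ignored and every CPU lands on node 0.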
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
new file mode 100644
index 0000000..e3afbf8
--- /dev/null
+++ b/arch/arm64/mm/numa.c
@@ -0,0 +1,384 @@
+/*
+ * NUMA support, based on the x86 implementation.
+ *
+ * Copyright (C) 2015 Cavium Inc.
+ * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/bootmem.h>
+#include <linux/ctype.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/memblock.h>
+#include <linux/module.h>
+#include <linux/mmzone.h>
+#include <linux/nodemask.h>
+#include <linux/sched.h>
+#include <linux/string.h>
+#include <linux/topology.h>
+
+#include <asm/smp_plat.h>
+
+struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
+EXPORT_SYMBOL(node_data);
+nodemask_t numa_nodes_parsed __initdata;
+int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
+
+static int numa_off;
+static int numa_distance_cnt;
+static u8 *numa_distance;
+
+static __init int numa_parse_early_param(char *opt)
+{
+	if (!opt)
+		return -EINVAL;
+	if (!strncmp(opt, "off", 3)) {
+		pr_info("%s\n", "NUMA turned off");
+		numa_off = 1;
+	}
+	return 0;
+}
+early_param("numa", numa_parse_early_param);
+
+cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+EXPORT_SYMBOL(node_to_cpumask_map);
+
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+/*
+ * Returns a pointer to the bitmask of CPUs on Node 'node'.
+ */
+const struct cpumask *cpumask_of_node(int node)
+{
+
+	if (WARN_ON(node >= nr_node_ids))
+		return cpu_none_mask;
+
+	if (WARN_ON(node_to_cpumask_map[node] == NULL))
+		return cpu_online_mask;
+
+	return node_to_cpumask_map[node];
+}
+EXPORT_SYMBOL(cpumask_of_node);
+#endif
+
+static void map_cpu_to_node(unsigned int cpu, int nid)
+{
+	set_cpu_numa_node(cpu, nid);
+	if (nid >= 0)
+		cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
+}
+
+static void unmap_cpu_to_node(unsigned int cpu)
+{
+	int nid = cpu_to_node(cpu);
+
+	if (nid >= 0)
+		cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
+	set_cpu_numa_node(cpu, NUMA_NO_NODE);
+}
+
+void numa_clear_node(unsigned int cpu)
+{
+	unmap_cpu_to_node(cpu);
+}
+
+/*
+ * Allocate node_to_cpumask_map based on number of available nodes
+ * Requires node_possible_map to be valid.
+ *
+ * Note: cpumask_of_node() is not valid until after this is done.
+ * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
+ */
+static void __init setup_node_to_cpumask_map(void)
+{
+	unsigned int cpu;
+	int node;
+
+	/* setup nr_node_ids if not done yet */
+	if (nr_node_ids == MAX_NUMNODES)
+		setup_nr_node_ids();
+
+	/* allocate and clear the mapping */
+	for (node = 0; node < nr_node_ids; node++) {
+		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
+		cpumask_clear(node_to_cpumask_map[node]);
+	}
+
+	for_each_possible_cpu(cpu)
+		set_cpu_numa_node(cpu, NUMA_NO_NODE);
+
+	/* cpumask_of_node() will now work */
+	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
+}
+
+/*
+ * Set the CPU-to-node mapping
+ */
+void numa_store_cpu_info(unsigned int cpu)
+{
+	map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
+}
+
+/**
+ * numa_add_memblk - Set node id to memblk
+ * @nid: NUMA node ID of the new memblk
+ * @start: Start address of the new memblk
+ * @size:  Size of the new memblk
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+int __init numa_add_memblk(int nid, u64 start, u64 size)
+{
+	int ret;
+
+	ret = memblock_set_node(start, size, &memblock.memory, nid);
+	if (ret < 0) {
+		pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
+			start, (start + size - 1), nid);
+		return ret;
+	}
+
+	node_set(nid, numa_nodes_parsed);
+	pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
+			start, (start + size - 1), nid);
+	return ret;
+}
+EXPORT_SYMBOL(numa_add_memblk);
+
+/* Initialize NODE_DATA for a node on the local memory */
+static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
+{
+	const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
+	u64 nd_pa;
+	void *nd;
+	int tnid;
+
+	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
+			nid, start_pfn << PAGE_SHIFT,
+			(end_pfn << PAGE_SHIFT) - 1);
+
+	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
+	nd = __va(nd_pa);
+
+	/* report and initialize */
+	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
+		nd_pa, nd_pa + nd_size - 1);
+	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
+	if (tnid != nid)
+		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
+
+	node_data[nid] = nd;
+	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
+	NODE_DATA(nid)->node_id = nid;
+	NODE_DATA(nid)->node_start_pfn = start_pfn;
+	NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
+}
+
+/**
+ * numa_reset_distance - Reset NUMA distance table
+ *
+ * The current table is freed.
+ * The next numa_set_distance() call will create a new one.
+ */
+void __init numa_reset_distance(void)
+{
+	size_t size;
+
+	if (!numa_distance)
+		return;
+
+	size = numa_distance_cnt * numa_distance_cnt *
+		sizeof(numa_distance[0]);
+
+	memblock_free(__pa(numa_distance), size);
+	numa_distance_cnt = 0;
+	numa_distance = NULL;
+}
+
+static int __init numa_alloc_distance(void)
+{
+	size_t size;
+	u64 phys;
+	int i, j;
+
+	size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
+	phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
+				      size, PAGE_SIZE);
+	if (WARN_ON(!phys))
+		return -ENOMEM;
+
+	memblock_reserve(phys, size);
+
+	numa_distance = __va(phys);
+	numa_distance_cnt = nr_node_ids;
+
+	/* fill with the default distances */
+	for (i = 0; i < numa_distance_cnt; i++)
+		for (j = 0; j < numa_distance_cnt; j++)
+			numa_distance[i * numa_distance_cnt + j] = i == j ?
+				LOCAL_DISTANCE : REMOTE_DISTANCE;
+
+	pr_debug("NUMA: Initialized distance table, cnt=%d\n",
+			numa_distance_cnt);
+
+	return 0;
+}
+
+/**
+ * numa_set_distance - Set NUMA distance from one NUMA node to another
+ * @from: the 'from' node to set distance
+ * @to: the 'to' node to set distance
+ * @distance: NUMA distance
+ *
+ * Set the distance from node @from to @to to @distance.  The distance
+ * table is allocated by numa_alloc_distance(), sized for nr_node_ids
+ * nodes and filled with default distances.
+ *
+ * If the table does not exist (it has not been allocated yet, its
+ * allocation failed, or it was freed by numa_reset_distance()), the
+ * call is ignored.
+ *
+ * If @from or @to is higher than the highest known node or lower than zero
+ * at the time of table creation or @distance doesn't make sense, the call
+ * is ignored.
+ * This is to allow simplification of specific NUMA config implementations.
+ */
+void __init numa_set_distance(int from, int to, int distance)
+{
+	if (!numa_distance)
+		return;
+
+	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
+			from < 0 || to < 0) {
+		pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
+			    from, to, distance);
+		return;
+	}
+
+	if ((u8)distance != distance ||
+	    (from == to && distance != LOCAL_DISTANCE)) {
+		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
+			     from, to, distance);
+		return;
+	}
+
+	numa_distance[from * numa_distance_cnt + to] = distance;
+}
+EXPORT_SYMBOL(numa_set_distance);
+
+int __node_distance(int from, int to)
+{
+	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
+		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
+	return numa_distance[from * numa_distance_cnt + to];
+}
+EXPORT_SYMBOL(__node_distance);
+
+static int __init numa_register_nodes(void)
+{
+	int nid;
+	struct memblock_region *mblk;
+
+	/* Check that every memblk has a valid nid */
+	for_each_memblock(memory, mblk)
+		if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES)
+			return -EINVAL;
+
+	/* Finally register nodes. */
+	for_each_node_mask(nid, numa_nodes_parsed) {
+		unsigned long start_pfn, end_pfn;
+
+		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
+		setup_node_data(nid, start_pfn, end_pfn);
+		node_set_online(nid);
+	}
+
+	/* Set the possible nodes to the parsed nodes */
+	node_possible_map = numa_nodes_parsed;
+
+	/* Dump memblock with node info and return. */
+	memblock_dump_all();
+	return 0;
+}
+
+static int __init numa_init(int (*init_func)(void))
+{
+	int ret;
+
+	nodes_clear(numa_nodes_parsed);
+	nodes_clear(node_possible_map);
+	nodes_clear(node_online_map);
+	numa_reset_distance();
+
+	ret = init_func();
+	if (ret < 0)
+		return ret;
+
+	if (nodes_empty(numa_nodes_parsed))
+		return -EINVAL;
+
+	ret = numa_register_nodes();
+	if (ret < 0)
+		return ret;
+
+	ret = numa_alloc_distance();
+	if (ret < 0)
+		return ret;
+
+	setup_node_to_cpumask_map();
+
+	/* init boot processor */
+	cpu_to_node_map[0] = 0;
+	map_cpu_to_node(0, 0);
+
+	return 0;
+}
+
+/**
+ * dummy_numa_init - Fallback dummy NUMA init
+ *
+ * Used if there's no underlying NUMA architecture, NUMA initialization
+ * fails, or NUMA is disabled on the command line.
+ *
+ * Must online at least one node and add memory blocks that cover all
+ * allowed memory.  This function must not fail.
+ */
+static int __init dummy_numa_init(void)
+{
+	struct memblock_region *mblk;
+
+	pr_info("%s\n", "No NUMA configuration found");
+	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
+	       0LLU, PFN_PHYS(max_pfn) - 1);
+	for_each_memblock(memory, mblk)
+		numa_add_memblk(0, mblk->base, mblk->size);
+	numa_off = 1;
+
+	return 0;
+}
+
+/**
+ * arm64_numa_init - Initialize NUMA
+ *
+ * Try each configured NUMA initialization method until one succeeds.  The
+ * last fallback is a dummy single-node config encompassing all of memory,
+ * which never fails.
+ */
+void __init arm64_numa_init(void)
+{
+	numa_init(dummy_numa_init);
+}
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread
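The distance table in numa.c above is a flat nr_node_ids x nr_node_ids array of u8, indexed as numa_distance[from * numa_distance_cnt + to] and pre-filled with LOCAL_DISTANCE on the diagonal and REMOTE_DISTANCE elsewhere. The userspace C sketch below mirrors that layout, the validation in numa_set_distance() and the fallback in __node_distance(). struct dist_table, dist_alloc(), dist_set() and dist_get() are hypothetical names, malloc() stands in for the memblock allocation, and dist_set() returns -1 where the kernel code prints a one-time warning.

```c
#include <stdint.h>
#include <stdlib.h>

#define LOCAL_DISTANCE  10   /* same values as include/linux/topology.h */
#define REMOTE_DISTANCE 20

/* Hypothetical userspace stand-in for numa_distance/numa_distance_cnt. */
struct dist_table {
	int cnt;      /* number of nodes covered by the table */
	uint8_t *d;   /* cnt * cnt entries, row-major: d[from * cnt + to] */
};

/* Allocate and fill with default distances, like numa_alloc_distance(). */
static int dist_alloc(struct dist_table *t, int nr_nodes)
{
	t->d = malloc((size_t)nr_nodes * nr_nodes * sizeof(t->d[0]));
	if (!t->d)
		return -1;
	t->cnt = nr_nodes;
	for (int i = 0; i < nr_nodes; i++)
		for (int j = 0; j < nr_nodes; j++)
			t->d[i * nr_nodes + j] =
				(i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return 0;
}

/*
 * Same checks as numa_set_distance(): table present, ids in bounds,
 * distance fits in a u8, and a node's distance to itself is LOCAL_DISTANCE.
 * Returns 0 if the value was stored, -1 if the call was ignored.
 */
static int dist_set(struct dist_table *t, int from, int to, int distance)
{
	if (!t->d)
		return -1;
	if (from < 0 || to < 0 || from >= t->cnt || to >= t->cnt)
		return -1;
	if ((uint8_t)distance != distance ||
	    (from == to && distance != LOCAL_DISTANCE))
		return -1;
	t->d[from * t->cnt + to] = (uint8_t)distance;
	return 0;
}

/* Like __node_distance(): ids outside the table get the defaults. */
static int dist_get(const struct dist_table *t, int from, int to)
{
	if (from >= t->cnt || to >= t->cnt)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return t->d[from * t->cnt + to];
}
```

Note that dist_set() stores only the requested direction, as numa_set_distance() does; a caller that wants a symmetric table has to set both (from, to) and (to, from).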

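numa_register_nodes() relies on get_pfn_range_for_nid() to find each node's PFN span before setup_node_data() records it in NODE_DATA(nid). A minimal userspace approximation, assuming 4 KiB pages and a hypothetical struct region in place of struct memblock_region, takes the lowest start PFN (base rounded up to a page) and the highest end PFN (end rounded down) among the regions tagged with the node. node_pfn_span() is a hypothetical name, not a kernel function.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Hypothetical stand-in for struct memblock_region. */
struct region {
	uint64_t base;
	uint64_t size;
	int nid;
};

/*
 * Report the [start_pfn, end_pfn) span of node @nid: the lowest start PFN
 * and the highest end PFN over all regions tagged with @nid. Holes between
 * a node's regions lie inside the span. Returns 0 if the node owns at least
 * one region, -1 otherwise (outputs are left untouched in that case).
 */
static int node_pfn_span(const struct region *r, int n, int nid,
			 uint64_t *start_pfn, uint64_t *end_pfn)
{
	int found = 0;

	for (int i = 0; i < n; i++) {
		uint64_t s, e;

		if (r[i].nid != nid)
			continue;
		s = (r[i].base + PAGE_SIZE - 1) >> PAGE_SHIFT;  /* round up */
		e = (r[i].base + r[i].size) >> PAGE_SHIFT;      /* round down */
		if (!found || s < *start_pfn)
			*start_pfn = s;
		if (!found || e > *end_pfn)
			*end_pfn = e;
		found = 1;
	}
	return found ? 0 : -1;
}
```

With the two memory nodes from the example dts in the binding document (2 GiB at 0x00c00000 on node 0 and 2 GiB at 0x10000000000 on node 1), node 0 spans PFNs [0xc00, 0x80c00) and node 1 spans [0x10000000, 0x10080000).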
* [PATCH v7 2/4] Documentation, dt, arm64/arm: dt bindings for numa.
  2015-11-17 17:20 ` Ganapatrao Kulkarni
@ 2015-11-17 17:20   ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-11-17 17:20 UTC (permalink / raw)
  To: linux-arm-kernel, devicetree, Will.Deacon, catalin.marinas,
	grant.likely, leif.lindholm, rfranz, ard.biesheuvel, msalter,
	robh+dt, steve.capper, hanjun.guo, al.stone, arnd, pawel.moll,
	mark.rutland, ijc+devicetree, galak, rjw, lenb, marc.zyngier,
	rrichter, Prasun.Kapoor
  Cc: gpkulkarni

DT bindings for numa mapping of memory, cores and IOs.

Reviewed-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
---
 Documentation/devicetree/bindings/arm/numa.txt | 272 +++++++++++++++++++++++++
 1 file changed, 272 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/numa.txt

diff --git a/Documentation/devicetree/bindings/arm/numa.txt b/Documentation/devicetree/bindings/arm/numa.txt
new file mode 100644
index 0000000..b87bf4f
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/numa.txt
@@ -0,0 +1,272 @@
+==============================================================================
+NUMA binding description.
+==============================================================================
+
+==============================================================================
+1 - Introduction
+==============================================================================
+
+Systems employing a Non Uniform Memory Access (NUMA) architecture contain
+collections of hardware resources including processors, memory, and I/O buses,
+that comprise what is commonly known as a NUMA node.
+Processor accesses to memory within the local NUMA node are generally faster
+than processor accesses to memory outside of the local NUMA node.
+DT defines interfaces that allow the platform to convey NUMA node
+topology information to the OS.
+
+==============================================================================
+2 - numa-node-id
+==============================================================================
+The device node property numa-node-id describes NUMA domains within a
+machine. This property can be used in device nodes such as cpu, memory, bus
+and other devices to map them to their respective NUMA nodes.
+
+The numa-node-id property is a 32-bit integer that specifies the NUMA node
+with which this device node is associated.
+
+Example:
+	/* numa node 0 */
+	numa-node-id = <0>;
+
+	/* numa node 1 */
+	numa-node-id = <1>;
+
+==============================================================================
+3 - distance-map
+==============================================================================
+
+The device tree node distance-map describes the relative
+distance (memory latency) between all numa nodes.
+
+- compatible : Should at least contain "numa,distance-map-v1".
+
+- distance-matrix
+  This property defines a matrix to describe the relative distances
+  between all numa nodes.
+  It is represented as a list of node pairs and their relative distance.
+
+  Note:
+	1. Each entry represents the distance from the first node to the second node.
+	2. If both directions between 2 nodes have the same distance, only
+	   one entry is required.
+	3. distance-matrix should have entries in lexicographical ascending order of nodes.
+	4. There must be only one distance-map device node, and it must reside in the root node.
+
+Example:
+	4 nodes connected in a mesh/ring topology, as below:
+
+		0_______20______1
+		|               |
+		|               |
+	      20|               |20
+		|               |
+		|               |
+		|_______________|
+		3       20      2
+
+	If the relative distance for each hop is 20,
+	then the inter-node distances for this topology are:
+	      0 -> 1 = 20
+	      1 -> 2 = 20
+	      2 -> 3 = 20
+	      3 -> 0 = 20
+	      0 -> 2 = 40
+	      1 -> 3 = 40
+
+     and the DT representation of this distance matrix is:
+
+		distance-map {
+			 compatible = "numa,distance-map-v1";
+			 distance-matrix = <0 0  10>,
+					   <0 1  20>,
+					   <0 2  40>,
+					   <0 3  20>,
+					   <1 0  20>,
+					   <1 1  10>,
+					   <1 2  20>,
+					   <1 3  40>,
+					   <2 0  40>,
+					   <2 1  20>,
+					   <2 2  10>,
+					   <2 3  20>,
+					   <3 0  20>,
+					   <3 1  40>,
+					   <3 2  20>,
+					   <3 3  10>;
+		};
+
+Note:
+	 1. Entries such as <1 0> are optional if <0 1> and <1 0>
+	    have the same distance.
+
+==============================================================================
+4 - Example dts
+==============================================================================
+
+A 2-socket system consisting of 2 boards connected through a CCN bus,
+each board having one socket/SoC with 8 CPUs, memory and a PCI bus.
+
+	memory@00c00000 {
+		device_type = "memory";
+		reg = <0x0 0x00c00000 0x0 0x80000000>;
+		/* node 0 */
+		numa-node-id = <0>;
+	};
+
+	memory@10000000000 {
+		device_type = "memory";
+		reg = <0x100 0x00000000 0x0 0x80000000>;
+		/* node 1 */
+		numa-node-id = <1>;
+	};
+
+	cpus {
+		#address-cells = <2>;
+		#size-cells = <0>;
+
+		cpu@000 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x000>;
+			enable-method = "psci";
+			/* node 0 */
+			numa-node-id = <0>;
+		};
+		cpu@001 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x001>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@002 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x002>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@003 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x003>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@004 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x004>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@005 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x005>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@006 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x006>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@007 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x007>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@008 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x008>;
+			enable-method = "psci";
+			/* node 1 */
+			numa-node-id = <1>;
+		};
+		cpu@009 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x009>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@00a {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x00a>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@00b {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x00b>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@00c {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x00c>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@00d {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x00d>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@00e {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x00e>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@00f {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x00f>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+	};
+
+	pcie0: pcie0@0x8480,00000000 {
+		compatible = "arm,armv8";
+		device_type = "pci";
+		bus-range = <0 255>;
+		#size-cells = <2>;
+		#address-cells = <3>;
+		reg = <0x8480 0x00000000 0 0x10000000>;  /* Configuration space */
+		ranges = <0x03000000 0x8010 0x00000000 0x8010 0x00000000 0x70 0x00000000>;
+		/* node 0 */
+		numa-node-id = <0>;
+	};
+
+	pcie1: pcie1@0x9480,00000000 {
+		compatible = "arm,armv8";
+		device_type = "pci";
+		bus-range = <0 255>;
+		#size-cells = <2>;
+		#address-cells = <3>;
+		reg = <0x9480 0x00000000 0 0x10000000>;  /* Configuration space */
+		ranges = <0x03000000 0x9010 0x00000000 0x9010 0x00000000 0x70 0x00000000>;
+		/* node 1 */
+		numa-node-id = <1>;
+	};
+
+	distance-map {
+		compatible = "numa,distance-map-v1";
+		distance-matrix = <0 0 10>,
+				  <0 1 20>,
+				  <1 1 10>;
+	};
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread
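The binding describes numa-node-id as a single 32-bit cell; in a flattened device tree that cell is stored big-endian. The userspace C sketch below mirrors the checks in of_numa_prop_to_nid() from patch 3/4: an absent property yields DEFAULT_NODE, while a wrong length or an out-of-range id yields NUMA_NO_NODE. MAX_NUMNODES is fixed at 4 here purely as an assumption (in the kernel it follows CONFIG_NODES_SHIFT), and prop_to_nid() and be32_cell() are hypothetical names; be32_cell() does what of_read_number(p, 1) does for one cell.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NUMNODES  4      /* assumption: CONFIG_NODES_SHIFT=2 */
#define NUMA_NO_NODE  (-1)
#define DEFAULT_NODE  0

/* Decode one big-endian 32-bit DT cell. */
static uint32_t be32_cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Map a raw numa-node-id property (value pointer + byte length) to a nid. */
static int prop_to_nid(const uint8_t *prop, int length)
{
	uint32_t val;

	if (!prop)
		return DEFAULT_NODE;          /* property absent */
	if (length != (int)sizeof(uint32_t))
		return NUMA_NO_NODE;          /* must be exactly one cell */
	val = be32_cell(prop);
	if (val >= MAX_NUMNODES)             /* unsigned compare */
		return NUMA_NO_NODE;
	return (int)val;
}
```

Comparing the decoded cell as an unsigned value means a cell such as 0xffffffff is rejected instead of wrapping to a negative int.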

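Section 3 of the binding allows a single entry when both directions between two nodes share a distance. One way a parser could honour that is sketched below: start from LOCAL_DISTANCE/REMOTE_DISTANCE defaults, apply every <from to distance> triplet, and mirror an entry into the reverse direction unless that direction is given explicitly. This is a hypothetical userspace illustration of the binding's semantics, not the of_numa.c parser from patch 3/4; NR_NODES, struct dm_entry and expand_distance_matrix() are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define NR_NODES        4    /* assumption: node count known in advance */
#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

/* One <from to distance> triplet from the distance-matrix property. */
struct dm_entry { uint32_t from, to, distance; };

/*
 * Expand triplets into a full NR_NODES x NR_NODES matrix. An explicit entry
 * always wins; a missing reverse entry is mirrored from the forward one.
 * Returns 0 on success, -1 on an out-of-range node id.
 */
static int expand_distance_matrix(const struct dm_entry *e, int n,
				  int m[NR_NODES][NR_NODES])
{
	int explicit_set[NR_NODES][NR_NODES];

	memset(explicit_set, 0, sizeof(explicit_set));
	for (int i = 0; i < NR_NODES; i++)
		for (int j = 0; j < NR_NODES; j++)
			m[i][j] = (i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;

	for (int k = 0; k < n; k++) {
		uint32_t a = e[k].from, b = e[k].to;

		if (a >= NR_NODES || b >= NR_NODES)
			return -1;
		m[a][b] = (int)e[k].distance;
		explicit_set[a][b] = 1;
		if (!explicit_set[b][a])
			m[b][a] = (int)e[k].distance; /* mirror until overridden */
	}
	return 0;
}
```

Fed only the upper triangle of the 4-node ring example (<0 0 10>, <0 1 20>, <0 2 40>, <0 3 20>, <1 1 10>, <1 2 20>, <1 3 40>, <2 2 10>, <2 3 20>, <3 3 10>), it reproduces the full 16-entry matrix shown in section 3.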
* [PATCH v7 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms.
  2015-11-17 17:20 ` Ganapatrao Kulkarni
@ 2015-11-17 17:20   ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-11-17 17:20 UTC (permalink / raw)
  To: linux-arm-kernel, devicetree, Will.Deacon, catalin.marinas,
	grant.likely, leif.lindholm, rfranz, ard.biesheuvel, msalter,
	robh+dt, steve.capper, hanjun.guo, al.stone, arnd, pawel.moll,
	mark.rutland, ijc+devicetree, galak, rjw, lenb, marc.zyngier,
	rrichter, Prasun.Kapoor
  Cc: gpkulkarni

Add NUMA DT binding support for arm64 based platforms.
The NUMA topology is parsed from DT using the numa-node-id device
property and the distance-map device node.

Reviewed-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
---
 arch/arm64/Kconfig            |  10 ++
 arch/arm64/include/asm/numa.h |   9 ++
 arch/arm64/kernel/Makefile    |   1 +
 arch/arm64/kernel/of_numa.c   | 265 ++++++++++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/smp.c       |   2 +
 arch/arm64/mm/numa.c          |  10 +-
 6 files changed, 296 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kernel/of_numa.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7d8fb42..a18a154 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -494,6 +494,16 @@ config NUMA
 	  local memory controller of the CPU and add some more
 	  NUMA awareness to the kernel.
 
+config OF_NUMA
+	bool "Device Tree NUMA support"
+	depends on NUMA
+	depends on OF
+	default y
+	help
+	  Enable Device Tree NUMA support.
+	  This enables the NUMA mapping of CPUs, memory, I/O and
+	  inter-node distances using DT bindings.
+
 config NODES_SHIFT
 	int "Maximum NUMA Nodes (as a power of 2)"
 	range 1 10
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
index c00f3a4..b8c2a3f 100644
--- a/arch/arm64/include/asm/numa.h
+++ b/arch/arm64/include/asm/numa.h
@@ -44,4 +44,13 @@ void numa_store_cpu_info(unsigned int cpu);
 static inline void numa_store_cpu_info(unsigned int cpu)		{ }
 static inline void arm64_numa_init(void)		{ }
 #endif	/* CONFIG_NUMA */
+
+struct device_node;
+#ifdef CONFIG_OF_NUMA
+int __init arm64_of_numa_init(void);
+void __init of_numa_set_node_info(unsigned int cpu, struct device_node *dn);
+#else
+static inline void of_numa_set_node_info(unsigned int cpu,
+		struct device_node *dn) { }
+#endif
 #endif	/* _ASM_NUMA_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 474691f..7987763 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -41,6 +41,7 @@ arm64-obj-$(CONFIG_EFI)			+= efi.o efi-entry.stub.o
 arm64-obj-$(CONFIG_PCI)			+= pci.o
 arm64-obj-$(CONFIG_ARMV8_DEPRECATED)	+= armv8_deprecated.o
 arm64-obj-$(CONFIG_ACPI)		+= acpi.o
+arm64-obj-$(CONFIG_OF_NUMA)		+= of_numa.o
 
 obj-y					+= $(arm64-obj-y) vdso/
 obj-m					+= $(arm64-obj-m)
diff --git a/arch/arm64/kernel/of_numa.c b/arch/arm64/kernel/of_numa.c
new file mode 100644
index 0000000..6ecb9a4
--- /dev/null
+++ b/arch/arm64/kernel/of_numa.c
@@ -0,0 +1,265 @@
+/*
+ * OF NUMA Parsing support.
+ *
+ * Copyright (C) 2015 Cavium Inc.
+ * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/ctype.h>
+#include <linux/memblock.h>
+#include <linux/module.h>
+#include <linux/nodemask.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+
+#include <asm/smp_plat.h>
+
+/* define default numa node to 0 */
+#define DEFAULT_NODE 0
+#define OF_NUMA_PROP "numa-node-id"
+
+/* Returns nid in the range [0..MAX_NUMNODES-1],
+ * or NUMA_NO_NODE if no valid numa-node-id entry found
+ * or DEFAULT_NODE if no numa-node-id entry exists
+ */
+static int of_numa_prop_to_nid(const __be32 *of_numa_prop, int length)
+{
+	int nid;
+
+	if (!of_numa_prop)
+		return DEFAULT_NODE;
+
+	if (length != sizeof(*of_numa_prop)) {
+		pr_warn("NUMA: Invalid of_numa_prop length %d found.\n",
+				length);
+		return NUMA_NO_NODE;
+	}
+
+	nid = of_read_number(of_numa_prop, 1);
+	if (nid >= MAX_NUMNODES) {
+		pr_warn("NUMA: Invalid numa node %d found.\n", nid);
+		return NUMA_NO_NODE;
+	}
+
+	return nid;
+}
+
+/* Must hold reference to node during call */
+static int of_get_numa_nid(struct device_node *device)
+{
+	int length;
+	const __be32 *of_numa_prop;
+
+	of_numa_prop = of_get_property(device, OF_NUMA_PROP, &length);
+
+	return of_numa_prop_to_nid(of_numa_prop, length);
+}
+
+static int __init early_init_of_get_numa_nid(unsigned long node)
+{
+	int length;
+	const __be32 *of_numa_prop;
+
+	of_numa_prop = of_get_flat_dt_prop(node, OF_NUMA_PROP, &length);
+
+	return of_numa_prop_to_nid(of_numa_prop, length);
+}
+
+/* Walk the device tree upwards, looking for a numa-node-id property */
+int of_node_to_nid(struct device_node *device)
+{
+	struct device_node *parent;
+	int nid = NUMA_NO_NODE;
+
+	of_node_get(device);
+	while (device) {
+		const __be32 *of_numa_prop;
+		int length;
+
+		of_numa_prop = of_get_property(device, OF_NUMA_PROP, &length);
+		if (of_numa_prop) {
+			nid = of_numa_prop_to_nid(of_numa_prop, length);
+			break;
+		}
+
+		parent = device;
+		device = of_get_parent(parent);
+		of_node_put(parent);
+	}
+	of_node_put(device);
+
+	return nid;
+}
+
+void __init of_numa_set_node_info(unsigned int cpu, struct device_node *device)
+{
+	int nid = DEFAULT_NODE;
+
+	if (device)
+		nid = of_get_numa_nid(device);
+
+	cpu_to_node_map[cpu] = nid;
+}
+
+/*
+ * Even though we connect cpus to numa domains later in SMP
+ * init, we need to know the node ids now for all cpus.
+ */
+static int __init early_init_parse_cpu_node(unsigned long node)
+{
+	int nid;
+
+	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+
+	/* We are scanning "cpu" nodes only */
+	if (type == NULL)
+		return 0;
+	else if (strcmp(type, "cpu") != 0)
+		return 0;
+
+	nid = early_init_of_get_numa_nid(node);
+
+	if (nid == NUMA_NO_NODE)
+		return -EINVAL;
+
+	node_set(nid, numa_nodes_parsed);
+	return 0;
+}
+
+static int __init early_init_parse_memory_node(unsigned long node)
+{
+	const __be32 *reg, *endp;
+	int length;
+	int nid;
+
+	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+
+	/* We are scanning "memory" nodes only */
+	if (type == NULL)
+		return 0;
+	else if (strcmp(type, "memory") != 0)
+		return 0;
+
+	nid = early_init_of_get_numa_nid(node);
+
+	if (nid == NUMA_NO_NODE)
+		return -EINVAL;
+
+	reg = of_get_flat_dt_prop(node, "reg", &length);
+	endp = reg + (length / sizeof(__be32));
+
+	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
+		u64 base, size;
+		struct memblock_region *mblk;
+
+		base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+		size = dt_mem_next_cell(dt_root_size_cells, &reg);
+		pr_debug("NUMA-DT:  base = %llx , node = %u\n",
+				base, nid);
+
+		for_each_memblock(memory, mblk) {
+			if (mblk->base == base) {
+				if (numa_add_memblk(nid,
+							mblk->base,
+							mblk->size) < 0)
+					return -EINVAL;
+				break;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int __init early_init_parse_distance_map_v1(unsigned long node,
+		const char *uname)
+{
+
+	const __be32 *prop_dist_matrix;
+	int length = 0, i, matrix_count;
+	int nr_size_cells = OF_ROOT_NODE_SIZE_CELLS_DEFAULT;
+
+	pr_info("NUMA: parsing numa,distance-map-v1\n");
+
+	prop_dist_matrix =
+		of_get_flat_dt_prop(node, "distance-matrix", &length);
+
+	if (!prop_dist_matrix || length <= 0) {
+		pr_err("NUMA: failed to parse distance-matrix\n");
+		return  -ENODEV;
+	}
+
+	matrix_count = ((length / sizeof(__be32)) / (3 * nr_size_cells));
+
+	if ((matrix_count * sizeof(__be32) * 3 * nr_size_cells) !=  length) {
+		pr_warn("NUMA: invalid distance-matrix length %d\n", length);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < matrix_count; i++) {
+		u32 nodea, nodeb, distance;
+
+		nodea = dt_mem_next_cell(nr_size_cells, &prop_dist_matrix);
+		nodeb = dt_mem_next_cell(nr_size_cells, &prop_dist_matrix);
+		distance = dt_mem_next_cell(nr_size_cells, &prop_dist_matrix);
+		numa_set_distance(nodea, nodeb, distance);
+		pr_debug("NUMA-DT:  distance[node%d -> node%d] = %d\n",
+				nodea, nodeb, distance);
+
+		/* Set default distance of node B->A same as A->B */
+		if (nodeb > nodea)
+			numa_set_distance(nodeb, nodea, distance);
+	}
+
+	return 0;
+}
+
+static int __init early_init_parse_distance_map(unsigned long node,
+		const char *uname)
+{
+
+	if (strcmp(uname, "distance-map") != 0)
+		return 0;
+
+	if (of_flat_dt_is_compatible(node, "numa,distance-map-v1"))
+		return early_init_parse_distance_map_v1(node, uname);
+
+	return -EINVAL;
+}
+
+/**
+ * early_init_of_scan_numa_map - parse cpu, memory and distance-map nodes.
+ */
+int __init early_init_of_scan_numa_map(unsigned long node, const char *uname,
+				     int depth, void *data)
+{
+	int ret;
+
+	ret = early_init_parse_cpu_node(node);
+
+	if (!ret)
+		ret = early_init_parse_memory_node(node);
+
+	if (!ret)
+		ret = early_init_parse_distance_map(node, uname);
+
+	return ret;
+}
+
+/* DT node mapping is done already early_init_of_scan_memory */
+int __init arm64_of_numa_init(void)
+{
+	return of_scan_flat_dt(early_init_of_scan_numa_map, NULL);
+}
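For readers following the nid lookup in of_numa.c above, here is a userspace sketch of the same decisions. It is not kernel code: struct toy_node, the toy_* helpers and the SKETCH_* constants are hypothetical stand-ins for struct device_node, of_numa_prop_to_nid() and of_node_to_nid(). SKETCH_MAX_NUMNODES = 4 is an arbitrary assumption, and the property bytes are kept in DT (big-endian) order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for this sketch only; in the kernel MAX_NUMNODES
 * comes from CONFIG_NODES_SHIFT. */
#define SKETCH_MAX_NUMNODES 4
#define SKETCH_DEFAULT_NODE 0
#define SKETCH_NUMA_NO_NODE (-1)

/* Toy stand-in for struct device_node: a parent link plus the raw
 * big-endian bytes of an optional "numa-node-id" property. */
struct toy_node {
	const struct toy_node *parent;
	const uint8_t *numa_prop;	/* NULL when the property is absent */
	int numa_prop_len;		/* property length in bytes */
};

/* Read one big-endian 32-bit cell, like of_read_number(p, 1). */
static uint32_t toy_be32_cell(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Same decisions as of_numa_prop_to_nid(): no property -> default node,
 * wrong length or out-of-range id -> NUMA_NO_NODE. */
static int toy_prop_to_nid(const uint8_t *prop, int len)
{
	uint32_t nid;

	if (!prop)
		return SKETCH_DEFAULT_NODE;
	if (len != 4)
		return SKETCH_NUMA_NO_NODE;
	nid = toy_be32_cell(prop);
	if (nid >= SKETCH_MAX_NUMNODES)
		return SKETCH_NUMA_NO_NODE;
	return (int)nid;
}

/* Same walk as of_node_to_nid(): the first node on the path to the root
 * that carries "numa-node-id" decides; NUMA_NO_NODE if none does. */
static int toy_node_to_nid(const struct toy_node *n)
{
	for (; n; n = n->parent)
		if (n->numa_prop)
			return toy_prop_to_nid(n->numa_prop,
					       n->numa_prop_len);
	return SKETCH_NUMA_NO_NODE;
}
```

One possible tightening shown here: the cell is decoded into an unsigned type, so the range check also rejects ids with the top bit set. In the patch the value lands in an int, so with the usual two's-complement conversion such an id would become negative and pass the nid >= MAX_NUMNODES check.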
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d6e7d6a..a2a8c2d 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -520,6 +520,8 @@ static void __init of_parse_and_init_cpus(void)
 
 		pr_debug("cpu logical map 0x%llx\n", hwid);
 		cpu_logical_map(cpu_count) = hwid;
+		/* map logical cpu to node */
+		of_numa_set_node_info(cpu_count, dn);
 next:
 		cpu_count++;
 	}
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index e3afbf8..209c7a9 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -380,5 +380,13 @@ static int __init dummy_numa_init(void)
  */
 void __init arm64_numa_init(void)
 {
-	numa_init(dummy_numa_init);
+	int ret = -ENODEV;
+
+#ifdef CONFIG_OF_NUMA
+	if (!numa_off)
+		ret = numa_init(arm64_of_numa_init);
+#endif
+
+	if (ret)
+		numa_init(dummy_numa_init);
 }
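The new arm64_numa_init() only keeps the DT description if the whole flat-DT scan succeeds. The sketch below models that control flow in userspace under two assumptions: that of_scan_flat_dt() stops at the first non-zero callback return and returns it, and that numa_init() passes the callback's error back to its caller (its body is not part of this patch). toy_scan(), toy_parse_node() and toy_numa_init() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Per-node callback in the style of an of_scan_flat_dt() iterator:
 * 0 means "keep scanning", anything else stops the scan. */
typedef int (*toy_scan_fn)(int node);

/* Visit nodes in order and stop at the first non-zero return, which
 * becomes the result of the whole scan. */
static int toy_scan(int nr_nodes, toy_scan_fn fn)
{
	int node, rc = 0;

	for (node = 0; node < nr_nodes && !rc; node++)
		rc = fn(node);
	return rc;
}

/* Hypothetical DT for the sketch: node 3 carries a malformed property. */
static int toy_parse_node(int node)
{
	return node == 3 ? -EINVAL : 0;
}

/* Shape of the new arm64_numa_init(): use the DT description unless
 * NUMA is switched off; otherwise, or on any parse error, fall back to
 * the single-node dummy setup. Returns true when the DT path was used. */
static bool toy_numa_init(bool numa_off, int nr_nodes)
{
	int ret = -ENODEV;

	if (!numa_off)
		ret = toy_scan(nr_nodes, toy_parse_node);

	return ret == 0;	/* false: dummy_numa_init() would run */
}
```

If those assumptions hold, one malformed cpu, memory or distance-map node (including a "distance-map" node that is not "numa,distance-map-v1" compatible) disables DT NUMA entirely and the system falls back to the dummy single-node setup, rather than the bad node being skipped.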
-- 
1.8.1.4
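
early_init_parse_memory_node() in the patch above decodes each memory node's "reg" property as (base, size) pairs whose widths come from the root #address-cells/#size-cells, then assigns the node's nid to the memblock region with exactly the same base. A userspace sketch of that matching follows. struct toy_region and the toy_* helpers are hypothetical stand-ins for struct memblock_region, dt_mem_next_cell() and the for_each_memblock() loop, and the cells are assumed to be already converted to host byte order. With two address and two size cells, the socket 0 entry <0x0 0x01400000 0x3 0xFEC00000> from the ThunderX 2-node DTS in patch 4/4 decodes to base 0x01400000 and size 0x3FEC00000.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct memblock_region. */
struct toy_region {
	uint64_t base;
	uint64_t size;
	int nid;	/* -1 until a memory node claims the region */
};

/* Fold `cells` 32-bit cells into one value and advance *cellp, in the
 * spirit of dt_mem_next_cell(). Cells are assumed to be in host order
 * already; the kernel helper converts from big-endian. */
static uint64_t toy_next_cell(int cells, const uint32_t **cellp)
{
	const uint32_t *p = *cellp;
	uint64_t v = 0;

	*cellp = p + cells;
	while (cells--)
		v = (v << 32) | *p++;
	return v;
}

/* Walk the (base, size) pairs of a "reg" property and tag the first
 * region whose base matches exactly, as early_init_parse_memory_node()
 * does with for_each_memblock(). Returns how many regions were tagged. */
static int toy_tag_regions(const uint32_t *reg, int nr_cells,
			   int addr_cells, int size_cells, int nid,
			   struct toy_region *regions, int nr_regions)
{
	const uint32_t *endp = reg + nr_cells;
	int tagged = 0;

	while (endp - reg >= addr_cells + size_cells) {
		uint64_t base = toy_next_cell(addr_cells, &reg);
		int i;

		/* The DT size is read but not used, as in the patch. */
		toy_next_cell(size_cells, &reg);
		for (i = 0; i < nr_regions; i++) {
			if (regions[i].base == base) {
				regions[i].nid = nid;
				tagged++;
				break;
			}
		}
	}
	return tagged;
}
```

Because only the base is compared, a DT memory range that memblock has split or trimmed (so that no region starts at the DT base, or only its first piece does) would, under this logic, leave some or all of that memory without a node.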


* [PATCH v7 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms.
@ 2015-11-17 17:20   ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-11-17 17:20 UTC (permalink / raw)
  To: linux-arm-kernel

Add NUMA DT binding support for arm64 platforms. The NUMA topology
is parsed from the device tree using the "numa-node-id" property of
cpu and memory nodes and the "distance-map" node.

Reviewed-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>

* [PATCH v7 4/4] arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node topology.
  2015-11-17 17:20 ` Ganapatrao Kulkarni
@ 2015-11-17 17:20     ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-11-17 17:20 UTC (permalink / raw)
  To: linux-arm-kernel, devicetree, Will.Deacon, catalin.marinas,
	grant.likely, leif.lindholm, rfranz, ard.biesheuvel, msalter,
	robh+dt, steve.capper, hanjun.guo, al.stone, arnd, pawel.moll,
	mark.rutland, ijc+devicetree, galak, rjw, lenb, marc.zyngier,
	rrichter, Prasun.Kapoor
  Cc: gpkulkarni

Add a DT file for Cavium's ThunderX dual-socket platform.

Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
---
 arch/arm64/boot/dts/cavium/Makefile             |   2 +-
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts  |  83 +++
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi | 806 ++++++++++++++++++++++++
 3 files changed, 890 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
 create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi

diff --git a/arch/arm64/boot/dts/cavium/Makefile b/arch/arm64/boot/dts/cavium/Makefile
index e34f89d..7fe7067 100644
--- a/arch/arm64/boot/dts/cavium/Makefile
+++ b/arch/arm64/boot/dts/cavium/Makefile
@@ -1,4 +1,4 @@
-dtb-$(CONFIG_ARCH_THUNDER) += thunder-88xx.dtb
+dtb-$(CONFIG_ARCH_THUNDER) += thunder-88xx.dtb thunder-88xx-2n.dtb
 
 always		:= $(dtb-y)
 subdir-y	:= $(dts-dirs)
diff --git a/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
new file mode 100644
index 0000000..eded594
--- /dev/null
+++ b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
@@ -0,0 +1,83 @@
+/*
+ * Cavium Thunder DTS file - Thunder board description
+ *
+ * Copyright (C) 2014, Cavium Inc.
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This library is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *     You should have received a copy of the GNU General Public
+ *     License along with this library; if not, write to the Free
+ *     Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ *     MA 02110-1301 USA
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+
+/include/ "thunder-88xx-2n.dtsi"
+
+/ {
+	model = "Cavium ThunderX CN88XX board";
+	compatible = "cavium,thunder-88xx";
+
+	aliases {
+		serial0 = &uaa0;
+		serial1 = &uaa1;
+	};
+
+	memory@00000000 {
+		device_type = "memory";
+		reg = <0x0 0x01400000 0x3 0xFEC00000>;
+		/* socket 0 */
+		numa-node-id = <0>;
+	};
+
+	memory@10000000000 {
+		device_type = "memory";
+		reg = <0x100 0x00400000 0x3 0xFFC00000>;
+		 /* socket 1 */
+		numa-node-id = <1>;
+	};
+
+	distance-map {
+		compatible = "numa,distance-map-v1";
+		distance-matrix = <0 0  10>,
+				  <0 1  20>,
+				  <1 1  10>;
+	};
+};
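The three distance-matrix entries above are enough for a full 2x2 table because early_init_parse_distance_map_v1() also writes the B->A distance whenever nodeb > nodea, so <0 1 20> sets both directions. Each value is one cell (nr_size_cells is OF_ROOT_NODE_SIZE_CELLS_DEFAULT, i.e. 1). The userspace sketch below expands the same triplets; toy_parse_distance_map() and SKETCH_NR_NODES are hypothetical, and the node-range check is an addition of the sketch, not something the patch's parser does itself.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_NODES 2	/* hypothetical: two sockets, as in this DTS */

/* Expand <nodea nodeb distance> triplets into dist[][]: set a->b and,
 * when b > a, mirror it to b->a. Returns -1 if the cell count is not a
 * non-zero multiple of three or a node id is out of range. */
static int toy_parse_distance_map(const uint32_t *cells, int nr_cells,
				  int dist[SKETCH_NR_NODES][SKETCH_NR_NODES])
{
	int i;

	if (nr_cells <= 0 || nr_cells % 3 != 0)
		return -1;

	for (i = 0; i < nr_cells; i += 3) {
		uint32_t a = cells[i];
		uint32_t b = cells[i + 1];
		uint32_t d = cells[i + 2];

		if (a >= SKETCH_NR_NODES || b >= SKETCH_NR_NODES)
			return -1;
		dist[a][b] = (int)d;
		if (b > a)
			dist[b][a] = (int)d;
	}
	return 0;
}
```

Entries with nodeb < nodea are not mirrored, so a matrix listed only in that order would leave the reverse direction at whatever default the table already holds.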
diff --git a/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi
new file mode 100644
index 0000000..b58e5c7
--- /dev/null
+++ b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi
@@ -0,0 +1,806 @@
+/*
+ * Cavium Thunder DTS file - Thunder SoC description
+ *
+ * Copyright (C) 2014, Cavium Inc.
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This library is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *     You should have received a copy of the GNU General Public
+ *     License along with this library; if not, write to the Free
+ *     Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ *     MA 02110-1301 USA
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/ {
+	compatible = "cavium,thunder-88xx";
+	interrupt-parent = <&gic0>;
+	#address-cells = <2>;
+	#size-cells = <2>;
+
+	psci {
+		compatible = "arm,psci-0.2";
+		method = "smc";
+	};
+
+	cpus {
+		#address-cells = <2>;
+		#size-cells = <0>;
+
+		cpu@000 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x000>;
+			enable-method = "psci";
+			/* socket 0 */
+			numa-node-id = <0>;
+		};
+		cpu@001 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x001>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@002 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x002>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@003 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x003>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@004 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x004>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@005 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x005>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@006 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x006>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@007 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x007>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@008 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x008>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@009 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x009>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@00a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00a>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@00b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00b>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@00c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00c>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@00d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00d>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@00e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00e>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@00f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00f>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@100 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x100>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@101 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x101>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@102 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x102>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@103 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x103>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@104 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x104>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@105 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x105>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@106 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x106>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@107 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x107>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@108 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x108>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@109 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x109>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10a>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10b>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10c>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10d>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10e>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10f>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@200 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x200>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@201 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x201>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@202 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x202>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@203 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x203>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@204 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x204>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@205 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x205>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@206 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x206>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@207 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x207>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@208 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x208>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@209 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x209>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@20a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20a>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@20b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20b>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@20c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20c>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@20d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20d>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@20e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20e>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@20f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20f>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10000 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10000>;
+			enable-method = "psci";
+			/* socket 1 */
+			numa-node-id = <1>;
+		};
+		cpu@10001 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10001>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10002 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10002>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10003 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10003>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10004 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10004>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10005 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10005>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10006 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10006>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10007 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10007>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10008 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10008>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10009 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10009>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1000a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000a>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1000b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000b>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1000c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000c>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1000d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000d>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1000e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000e>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1000f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000f>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10100 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10100>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10101 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10101>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10102 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10102>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10103 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10103>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10104 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10104>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10105 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10105>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10106 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10106>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10107 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10107>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10108 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10108>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10109 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10109>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1010a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010a>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1010b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010b>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1010c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010c>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1010d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010d>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1010e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010e>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1010f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010f>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10200 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10200>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10201 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10201>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10202 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10202>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10203 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10203>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10204 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10204>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10205 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10205>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10206 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10206>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10207 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10207>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10208 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10208>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10209 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10209>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1020a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020a>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1020b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020b>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1020c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020c>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1020d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020d>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1020e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020e>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1020f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020f>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+	};
+
+	timer {
+		compatible = "arm,armv8-timer";
+		interrupts = <1 13 0xff01>,
+		             <1 14 0xff01>,
+		             <1 11 0xff01>,
+		             <1 10 0xff01>;
+	};
+
+	soc {
+		compatible = "simple-bus";
+		#address-cells = <2>;
+		#size-cells = <2>;
+		ranges;
+
+		refclk50mhz: refclk50mhz {
+			compatible = "fixed-clock";
+			#clock-cells = <0>;
+			clock-frequency = <50000000>;
+			clock-output-names = "refclk50mhz";
+		};
+
+		gic0: interrupt-controller@8010,00000000 {
+			compatible = "arm,gic-v3";
+			#interrupt-cells = <3>;
+			#address-cells = <2>;
+			#size-cells = <2>;
+			#redistributor-regions = <2>;
+			ranges;
+			interrupt-controller;
+			reg = <0x8010 0x00000000 0x0 0x010000>, /* GICD */
+			      <0x8010 0x80000000 0x0 0x600000>, /* GICR Node 0 */
+			      <0x9010 0x80000000 0x0 0x600000>; /* GICR Node 1 */
+			interrupts = <1 9 0xf04>;
+
+			its: gic-its@8010,00020000 {
+				compatible = "arm,gic-v3-its";
+				msi-controller;
+				reg = <0x8010 0x20000 0x0 0x200000>;
+				numa-node-id = <0>;
+			};
+
+			its1: gic-its@9010,00020000 {
+				compatible = "arm,gic-v3-its";
+				msi-controller;
+				reg = <0x9010 0x20000 0x0 0x200000>;
+				numa-node-id = <1>;
+			};
+		};
+
+		uaa0: serial@87e0,24000000 {
+			compatible = "arm,pl011", "arm,primecell";
+			reg = <0x87e0 0x24000000 0x0 0x1000>;
+			interrupts = <1 21 4>;
+			clocks = <&refclk50mhz>;
+			clock-names = "apb_pclk";
+		};
+
+		uaa1: serial@87e0,25000000 {
+			compatible = "arm,pl011", "arm,primecell";
+			reg = <0x87e0 0x25000000 0x0 0x1000>;
+			interrupts = <1 22 4>;
+			clocks = <&refclk50mhz>;
+			clock-names = "apb_pclk";
+		};
+	};
+};
-- 
1.8.1.4

* [PATCH v7 4/4] arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node topology.
@ 2015-11-17 17:20     ` Ganapatrao Kulkarni
From: Ganapatrao Kulkarni @ 2015-11-17 17:20 UTC (permalink / raw)
  To: linux-arm-kernel

Add a DT file for Cavium's ThunderX dual-socket platform.

Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
---
 arch/arm64/boot/dts/cavium/Makefile             |   2 +-
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts  |  83 +++
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi | 806 ++++++++++++++++++++++++
 3 files changed, 890 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
 create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi

diff --git a/arch/arm64/boot/dts/cavium/Makefile b/arch/arm64/boot/dts/cavium/Makefile
index e34f89d..7fe7067 100644
--- a/arch/arm64/boot/dts/cavium/Makefile
+++ b/arch/arm64/boot/dts/cavium/Makefile
@@ -1,4 +1,4 @@
-dtb-$(CONFIG_ARCH_THUNDER) += thunder-88xx.dtb
+dtb-$(CONFIG_ARCH_THUNDER) += thunder-88xx.dtb thunder-88xx-2n.dtb
 
 always		:= $(dtb-y)
 subdir-y	:= $(dts-dirs)
diff --git a/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
new file mode 100644
index 0000000..eded594
--- /dev/null
+++ b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
@@ -0,0 +1,83 @@
+/*
+ * Cavium Thunder DTS file - Thunder board description
+ *
+ * Copyright (C) 2014, Cavium Inc.
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This library is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *     You should have received a copy of the GNU General Public
+ *     License along with this library; if not, write to the Free
+ *     Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ *     MA 02110-1301 USA
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+
+/include/ "thunder-88xx-2n.dtsi"
+
+/ {
+	model = "Cavium ThunderX CN88XX board";
+	compatible = "cavium,thunder-88xx";
+
+	aliases {
+		serial0 = &uaa0;
+		serial1 = &uaa1;
+	};
+
+	memory@00000000 {
+		device_type = "memory";
+		reg = <0x0 0x01400000 0x3 0xFEC00000>;
+		/* socket 0 */
+		numa-node-id = <0>;
+	};
+
+	memory@10000000000 {
+		device_type = "memory";
+		reg = <0x100 0x00400000 0x3 0xFFC00000>;
+		/* socket 1 */
+		numa-node-id = <1>;
+	};
+
+	distance-map {
+		compatible = "numa,distance-map-v1";
+		distance-matrix = <0 0  10>,
+				  <0 1  20>,
+				  <1 1  10>;
+	};
+};
diff --git a/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi
new file mode 100644
index 0000000..b58e5c7
--- /dev/null
+++ b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi
@@ -0,0 +1,806 @@
+/*
+ * Cavium Thunder DTS file - Thunder SoC description
+ *
+ * Copyright (C) 2014, Cavium Inc.
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This library is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *     You should have received a copy of the GNU General Public
+ *     License along with this library; if not, write to the Free
+ *     Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ *     MA 02110-1301 USA
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/ {
+	compatible = "cavium,thunder-88xx";
+	interrupt-parent = <&gic0>;
+	#address-cells = <2>;
+	#size-cells = <2>;
+
+	psci {
+		compatible = "arm,psci-0.2";
+		method = "smc";
+	};
+
+	cpus {
+		#address-cells = <2>;
+		#size-cells = <0>;
+
+		cpu@000 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x000>;
+			enable-method = "psci";
+			/* socket 0 */
+			numa-node-id = <0>;
+		};
+		cpu@001 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x001>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@002 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x002>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@003 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x003>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@004 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x004>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@005 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x005>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@006 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x006>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@007 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x007>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@008 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x008>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@009 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x009>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@00a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00a>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@00b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00b>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@00c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00c>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@00d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00d>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@00e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00e>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@00f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00f>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@100 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x100>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@101 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x101>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@102 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x102>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@103 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x103>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@104 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x104>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@105 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x105>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@106 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x106>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@107 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x107>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@108 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x108>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@109 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x109>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10a>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10b>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10c>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10d>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10e>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10f>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@200 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x200>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@201 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x201>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@202 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x202>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@203 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x203>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@204 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x204>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@205 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x205>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@206 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x206>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@207 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x207>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@208 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x208>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@209 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x209>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@20a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20a>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@20b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20b>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@20c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20c>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@20d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20d>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@20e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20e>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@20f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20f>;
+			enable-method = "psci";
+			numa-node-id = <0>;
+		};
+		cpu@10000 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10000>;
+			enable-method = "psci";
+			/* socket 1 */
+			numa-node-id = <1>;
+		};
+		cpu@10001 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10001>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10002 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10002>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10003 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10003>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10004 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10004>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10005 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10005>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10006 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10006>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10007 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10007>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10008 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10008>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10009 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10009>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1000a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000a>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1000b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000b>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1000c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000c>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1000d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000d>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1000e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000e>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1000f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000f>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10100 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10100>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10101 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10101>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10102 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10102>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10103 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10103>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10104 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10104>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10105 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10105>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10106 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10106>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10107 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10107>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10108 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10108>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10109 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10109>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1010a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010a>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1010b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010b>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1010c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010c>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1010d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010d>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1010e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010e>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1010f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010f>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10200 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10200>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10201 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10201>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10202 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10202>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10203 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10203>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10204 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10204>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10205 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10205>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10206 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10206>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10207 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10207>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10208 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10208>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@10209 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10209>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1020a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020a>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1020b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020b>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1020c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020c>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1020d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020d>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1020e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020e>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+		cpu@1020f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020f>;
+			enable-method = "psci";
+			numa-node-id = <1>;
+		};
+	};
+
+	timer {
+		compatible = "arm,armv8-timer";
+		interrupts = <1 13 0xff01>,
+		             <1 14 0xff01>,
+		             <1 11 0xff01>,
+		             <1 10 0xff01>;
+	};
+
+	soc {
+		compatible = "simple-bus";
+		#address-cells = <2>;
+		#size-cells = <2>;
+		ranges;
+
+		refclk50mhz: refclk50mhz {
+			compatible = "fixed-clock";
+			#clock-cells = <0>;
+			clock-frequency = <50000000>;
+			clock-output-names = "refclk50mhz";
+		};
+
+		gic0: interrupt-controller@8010,00000000 {
+			compatible = "arm,gic-v3";
+			#interrupt-cells = <3>;
+			#address-cells = <2>;
+			#size-cells = <2>;
+			#redistributor-regions = <2>;
+			ranges;
+			interrupt-controller;
+			reg = <0x8010 0x00000000 0x0 0x010000>, /* GICD */
+			      <0x8010 0x80000000 0x0 0x600000>, /* GICR Node 0 */
+			      <0x9010 0x80000000 0x0 0x600000>; /* GICR Node 1 */
+			interrupts = <1 9 0xf04>;
+
+			its: gic-its@8010,00020000 {
+				compatible = "arm,gic-v3-its";
+				msi-controller;
+				reg = <0x8010 0x20000 0x0 0x200000>;
+				numa-node-id = <0>;
+			};
+
+			its1: gic-its@9010,00020000 {
+				compatible = "arm,gic-v3-its";
+				msi-controller;
+				reg = <0x9010 0x20000 0x0 0x200000>;
+				numa-node-id = <1>;
+			};
+		};
+
+		uaa0: serial@87e0,24000000 {
+			compatible = "arm,pl011", "arm,primecell";
+			reg = <0x87e0 0x24000000 0x0 0x1000>;
+			interrupts = <1 21 4>;
+			clocks = <&refclk50mhz>;
+			clock-names = "apb_pclk";
+		};
+
+		uaa1: serial@87e0,25000000 {
+			compatible = "arm,pl011", "arm,primecell";
+			reg = <0x87e0 0x25000000 0x0 0x1000>;
+			interrupts = <1 22 4>;
+			clocks = <&refclk50mhz>;
+			clock-names = "apb_pclk";
+		};
+	};
+};
-- 
1.8.1.4


* Re: [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
  2015-11-17 17:20   ` Ganapatrao Kulkarni
@ 2015-11-27  8:00       ` Shannon Zhao
  -1 siblings, 0 replies; 38+ messages in thread
From: Shannon Zhao @ 2015-11-27  8:00 UTC
  To: Ganapatrao Kulkarni, linux-arm-kernel, devicetree, Will.Deacon,
	catalin.marinas, grant.likely, leif.lindholm, rfranz, ard.biesheuvel,
	msalter, robh+dt, steve.capper, hanjun.guo, al.stone, arnd, pawel.moll,
	mark.rutland, ijc+devicetree, galak, rjw, lenb, marc.zyngier,
	rrichter, Prasun.Kapoor
  Cc: gpkulkarni



On 2015/11/18 1:20, Ganapatrao Kulkarni wrote:
> Add NUMA support for arm64-based platforms.
> By default, this patch adds a dummy NUMA node and
> maps all memory and CPUs to node 0.
> Using this patch, NUMA can be simulated on single-node arm64 platforms.
> 
> Reviewed-by: Robert Richter <rrichter@cavium.com>
> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>

I've tested this patch on QEMU VM.

Tested-by: Shannon Zhao <shannon.zhao@linaro.org>
> ---
>  arch/arm64/Kconfig              |  25 +++
>  arch/arm64/include/asm/mmzone.h |  17 ++
>  arch/arm64/include/asm/numa.h   |  47 +++++
>  arch/arm64/kernel/setup.c       |   4 +
>  arch/arm64/kernel/smp.c         |   2 +
>  arch/arm64/mm/Makefile          |   1 +
>  arch/arm64/mm/init.c            |  30 +++-
>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
>  8 files changed, 506 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm64/include/asm/mmzone.h
>  create mode 100644 arch/arm64/include/asm/numa.h
>  create mode 100644 arch/arm64/mm/numa.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 9ac16a4..7d8fb42 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -71,6 +71,7 @@ config ARM64
>  	select HAVE_GENERIC_DMA_COHERENT
>  	select HAVE_HW_BREAKPOINT if PERF_EVENTS
>  	select HAVE_MEMBLOCK
> +	select HAVE_MEMBLOCK_NODE_MAP if NUMA
>  	select HAVE_PATA_PLATFORM
>  	select HAVE_PERF_EVENTS
>  	select HAVE_PERF_REGS
> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
>  	  Say Y here to experiment with turning CPUs off and on.  CPUs
>  	  can be controlled through /sys/devices/system/cpu.
>  
> +# Common NUMA Features
> +config NUMA
> +	bool "Numa Memory Allocation and Scheduler Support"
> +	depends on SMP
> +	help
> +	  Enable NUMA (Non Uniform Memory Access) support.
> +
> +	  The kernel will try to allocate memory used by a CPU on the
> +	  local memory controller of the CPU and add some more
> +	  NUMA awareness to the kernel.
> +
> +config NODES_SHIFT
> +	int "Maximum NUMA Nodes (as a power of 2)"
> +	range 1 10
> +	default "2"
> +	depends on NEED_MULTIPLE_NODES
> +	help
> +	  Specify the maximum number of NUMA Nodes available on the target
> +	  system.  Increases memory reserved to accommodate various tables.
> +
> +config USE_PERCPU_NUMA_NODE_ID
> +	def_bool y
> +	depends on NUMA
> +
>  source kernel/Kconfig.preempt
>  source kernel/Kconfig.hz
>  
> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
> new file mode 100644
> index 0000000..6ddd468
> --- /dev/null
> +++ b/arch/arm64/include/asm/mmzone.h
> @@ -0,0 +1,17 @@
> +#ifndef __ASM_ARM64_MMZONE_H_
> +#define __ASM_ARM64_MMZONE_H_
> +
> +#ifdef CONFIG_NUMA
> +
> +#include <linux/mmdebug.h>
> +#include <linux/types.h>
> +
> +#include <asm/smp.h>
> +#include <asm/numa.h>
> +
> +extern struct pglist_data *node_data[];
> +
> +#define NODE_DATA(nid)		(node_data[(nid)])
> +
> +#endif /* CONFIG_NUMA */
> +#endif /* __ASM_ARM64_MMZONE_H_ */
> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
> new file mode 100644
> index 0000000..c00f3a4
> --- /dev/null
> +++ b/arch/arm64/include/asm/numa.h
> @@ -0,0 +1,47 @@
> +#ifndef _ASM_NUMA_H
> +#define _ASM_NUMA_H
> +
> +#include <linux/nodemask.h>
> +#include <asm/topology.h>
> +
> +#ifdef CONFIG_NUMA
> +
> +#define NR_NODE_MEMBLKS		(MAX_NUMNODES * 2)
> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
> +
> +/* currently, arm64 implements flat NUMA topology */
> +#define parent_node(node)	(node)
> +
> +extern int __node_distance(int from, int to);
> +#define node_distance(a, b) __node_distance(a, b)
> +
> +/* dummy definitions for pci functions */
> +#define pcibus_to_node(node)	0
> +#define cpumask_of_pcibus(bus)	0
> +
> +extern int cpu_to_node_map[NR_CPUS];
> +extern nodemask_t numa_nodes_parsed __initdata;
> +
> +/* Mappings between node number and cpus on that node. */
> +extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
> +extern void numa_clear_node(unsigned int cpu);
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +extern const struct cpumask *cpumask_of_node(int node);
> +#else
> +/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
> +static inline const struct cpumask *cpumask_of_node(int node)
> +{
> +	return node_to_cpumask_map[node];
> +}
> +#endif
> +
> +void __init arm64_numa_init(void);
> +int __init numa_add_memblk(int nodeid, u64 start, u64 size);
> +void __init numa_set_distance(int from, int to, int distance);
> +void __init numa_reset_distance(void);
> +void numa_store_cpu_info(unsigned int cpu);
> +#else	/* CONFIG_NUMA */
> +static inline void numa_store_cpu_info(unsigned int cpu)		{ }
> +static inline void arm64_numa_init(void)		{ }
> +#endif	/* CONFIG_NUMA */
> +#endif	/* _ASM_NUMA_H */
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 8119479..d9b9761 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -53,6 +53,7 @@
>  #include <asm/cpufeature.h>
>  #include <asm/cpu_ops.h>
>  #include <asm/kasan.h>
> +#include <asm/numa.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
>  #include <asm/smp_plat.h>
> @@ -372,6 +373,9 @@ static int __init topology_init(void)
>  {
>  	int i;
>  
> +	for_each_online_node(i)
> +		register_one_node(i);
> +
>  	for_each_possible_cpu(i) {
>  		struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
>  		cpu->hotpluggable = 1;
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index b1adc51..d6e7d6a 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -45,6 +45,7 @@
>  #include <asm/cputype.h>
>  #include <asm/cpu_ops.h>
>  #include <asm/mmu_context.h>
> +#include <asm/numa.h>
>  #include <asm/pgtable.h>
>  #include <asm/pgalloc.h>
>  #include <asm/processor.h>
> @@ -125,6 +126,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  static void smp_store_cpu_info(unsigned int cpuid)
>  {
>  	store_cpu_topology(cpuid);
> +	numa_store_cpu_info(cpuid);
>  }
>  
>  /*
> diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
> index 57f57fd..2e57922 100644
> --- a/arch/arm64/mm/Makefile
> +++ b/arch/arm64/mm/Makefile
> @@ -6,4 +6,5 @@ obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
>  obj-$(CONFIG_ARM64_PTDUMP)	+= dump.o
>  
>  obj-$(CONFIG_KASAN)		+= kasan_init.o
> +obj-$(CONFIG_NUMA)		+= numa.o
>  KASAN_SANITIZE_kasan_init.o	:= n
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 17bf39a..8dc9c5d 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -37,6 +37,7 @@
>  
>  #include <asm/fixmap.h>
>  #include <asm/memory.h>
> +#include <asm/numa.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
>  #include <asm/sizes.h>
> @@ -77,6 +78,19 @@ static phys_addr_t max_zone_dma_phys(void)
>  	return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>  }
>  
> +#ifdef CONFIG_NUMA
> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
> +{
> +	unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> +
> +	if (IS_ENABLED(CONFIG_ZONE_DMA))
> +		max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
> +	max_zone_pfns[ZONE_NORMAL] = max;
> +
> +	free_area_init_nodes(max_zone_pfns);
> +}
> +
> +#else
>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
>  {
>  	struct memblock_region *reg;
> @@ -116,6 +130,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
>  
>  	free_area_init_node(0, zone_size, min, zhole_size);
>  }
> +#endif /* CONFIG_NUMA */
>  
>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>  int pfn_valid(unsigned long pfn)
> @@ -133,10 +148,15 @@ static void arm64_memory_present(void)
>  static void arm64_memory_present(void)
>  {
>  	struct memblock_region *reg;
> +	int nid = 0;
>  
> -	for_each_memblock(memory, reg)
> -		memory_present(0, memblock_region_memory_base_pfn(reg),
> -			       memblock_region_memory_end_pfn(reg));
> +	for_each_memblock(memory, reg) {
> +#ifdef CONFIG_NUMA
> +		nid = reg->nid;
> +#endif
> +		memory_present(nid, memblock_region_memory_base_pfn(reg),
> +				memblock_region_memory_end_pfn(reg));
> +	}
>  }
>  #endif
>  
> @@ -193,6 +213,9 @@ void __init bootmem_init(void)
>  
>  	early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);
>  
> +	max_pfn = max_low_pfn = max;
> +
> +	arm64_numa_init();
>  	/*
>  	 * Sparsemem tries to allocate bootmem in memory_present(), so must be
>  	 * done after the fixed reservations.
> @@ -203,7 +226,6 @@ void __init bootmem_init(void)
>  	zone_sizes_init(min, max);
>  
>  	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
> -	max_pfn = max_low_pfn = max;
>  }
>  
>  #ifndef CONFIG_SPARSEMEM_VMEMMAP
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> new file mode 100644
> index 0000000..e3afbf8
> --- /dev/null
> +++ b/arch/arm64/mm/numa.c
> @@ -0,0 +1,384 @@
> +/*
> + * NUMA support, based on the x86 implementation.
> + *
> + * Copyright (C) 2015 Cavium Inc.
> + * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/bootmem.h>
> +#include <linux/ctype.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/memblock.h>
> +#include <linux/module.h>
> +#include <linux/mmzone.h>
> +#include <linux/nodemask.h>
> +#include <linux/sched.h>
> +#include <linux/string.h>
> +#include <linux/topology.h>
> +
> +#include <asm/smp_plat.h>
> +
> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
> +EXPORT_SYMBOL(node_data);
> +nodemask_t numa_nodes_parsed __initdata;
> +int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
> +
> +static int numa_off;
> +static int numa_distance_cnt;
> +static u8 *numa_distance;
> +
> +static __init int numa_parse_early_param(char *opt)
> +{
> +	if (!opt)
> +		return -EINVAL;
> +	if (!strncmp(opt, "off", 3)) {
> +		pr_info("%s\n", "NUMA turned off");
> +		numa_off = 1;
> +	}
> +	return 0;
> +}
> +early_param("numa", numa_parse_early_param);
> +
> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
> +EXPORT_SYMBOL(node_to_cpumask_map);
> +
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +/*
> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
> + */
> +const struct cpumask *cpumask_of_node(int node)
> +{
> +
> +	if (WARN_ON(node >= nr_node_ids))
> +		return cpu_none_mask;
> +
> +	if (WARN_ON(node_to_cpumask_map[node] == NULL))
> +		return cpu_online_mask;
> +
> +	return node_to_cpumask_map[node];
> +}
> +EXPORT_SYMBOL(cpumask_of_node);
> +#endif
> +
> +static void map_cpu_to_node(unsigned int cpu, int nid)
> +{
> +	set_cpu_numa_node(cpu, nid);
> +	if (nid >= 0)
> +		cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
> +}
> +
> +static void unmap_cpu_to_node(unsigned int cpu)
> +{
> +	int nid = cpu_to_node(cpu);
> +
> +	if (nid >= 0)
> +		cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
> +	set_cpu_numa_node(cpu, NUMA_NO_NODE);
> +}
> +
> +void numa_clear_node(unsigned int cpu)
> +{
> +	unmap_cpu_to_node(cpu);
> +}
> +
> +/*
> + * Allocate node_to_cpumask_map based on number of available nodes
> + * Requires node_possible_map to be valid.
> + *
> + * Note: cpumask_of_node() is not valid until after this is done.
> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
> + */
> +static void __init setup_node_to_cpumask_map(void)
> +{
> +	unsigned int cpu;
> +	int node;
> +
> +	/* setup nr_node_ids if not done yet */
> +	if (nr_node_ids == MAX_NUMNODES)
> +		setup_nr_node_ids();
> +
> +	/* allocate and clear the mapping */
> +	for (node = 0; node < nr_node_ids; node++) {
> +		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
> +		cpumask_clear(node_to_cpumask_map[node]);
> +	}
> +
> +	for_each_possible_cpu(cpu)
> +		set_cpu_numa_node(cpu, NUMA_NO_NODE);
> +
> +	/* cpumask_of_node() will now work */
> +	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
> +}
> +
> +/*
> + * Set the cpu to node mapping
> + */
> +void numa_store_cpu_info(unsigned int cpu)
> +{
> +	map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
> +}
> +
> +/**
> + * numa_add_memblk - Set node id to memblk
> + * @nid: NUMA node ID of the new memblk
> + * @start: Start address of the new memblk
> + * @size:  Size of the new memblk
> + *
> + * RETURNS:
> + * 0 on success, -errno on failure.
> + */
> +int __init numa_add_memblk(int nid, u64 start, u64 size)
> +{
> +	int ret;
> +
> +	ret = memblock_set_node(start, size, &memblock.memory, nid);
> +	if (ret < 0) {
> +		pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
> +			start, (start + size - 1), nid);
> +		return ret;
> +	}
> +
> +	node_set(nid, numa_nodes_parsed);
> +	pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
> +			start, (start + size - 1), nid);
> +	return ret;
> +}
> +EXPORT_SYMBOL(numa_add_memblk);
> +
> +/* Initialize NODE_DATA for a node on the local memory */
> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
> +{
> +	const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
> +	u64 nd_pa;
> +	void *nd;
> +	int tnid;
> +
> +	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
> +			nid, start_pfn << PAGE_SHIFT,
> +			(end_pfn << PAGE_SHIFT) - 1);
> +
> +	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
> +	nd = __va(nd_pa);
> +
> +	/* report and initialize */
> +	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
> +		nd_pa, nd_pa + nd_size - 1);
> +	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
> +	if (tnid != nid)
> +		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
> +
> +	node_data[nid] = nd;
> +	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
> +	NODE_DATA(nid)->node_id = nid;
> +	NODE_DATA(nid)->node_start_pfn = start_pfn;
> +	NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
> +}
> +
> +/**
> + * numa_reset_distance - Reset NUMA distance table
> + *
> + * The current table is freed.
> + * A new one is created by the next numa_alloc_distance() call.
> + */
> +void __init numa_reset_distance(void)
> +{
> +	size_t size;
> +
> +	if (!numa_distance)
> +		return;
> +
> +	size = numa_distance_cnt * numa_distance_cnt *
> +		sizeof(numa_distance[0]);
> +
> +	memblock_free(__pa(numa_distance), size);
> +	numa_distance_cnt = 0;
> +	numa_distance = NULL;
> +}
> +
> +static int __init numa_alloc_distance(void)
> +{
> +	size_t size;
> +	u64 phys;
> +	int i, j;
> +
> +	size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
> +	phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
> +				      size, PAGE_SIZE);
> +	if (WARN_ON(!phys))
> +		return -ENOMEM;
> +
> +	memblock_reserve(phys, size);
> +
> +	numa_distance = __va(phys);
> +	numa_distance_cnt = nr_node_ids;
> +
> +	/* fill with the default distances */
> +	for (i = 0; i < numa_distance_cnt; i++)
> +		for (j = 0; j < numa_distance_cnt; j++)
> +			numa_distance[i * numa_distance_cnt + j] = i == j ?
> +				LOCAL_DISTANCE : REMOTE_DISTANCE;
> +
> +	pr_debug("NUMA: Initialized distance table, cnt=%d\n",
> +			numa_distance_cnt);
> +
> +	return 0;
> +}
> +
> +/**
> + * numa_set_distance - Set NUMA distance from one NUMA node to another
> + * @from: the 'from' node to set distance
> + * @to: the 'to' node to set distance
> + * @distance: NUMA distance
> + *
> + * Set the distance from node @from to @to to @distance.  This function
> + * does not allocate the distance table; that is done by
> + * numa_alloc_distance().
> + *
> + * If the distance table does not exist (it has not been allocated yet,
> + * or it was freed by numa_reset_distance()), the call is silently
> + * ignored.
> + *
> + * If @from or @to is higher than the highest known node or lower than zero
> + * at the time of table creation or @distance doesn't make sense, a warning
> + * is printed once and the call is ignored.
> + * This is to allow simplification of specific NUMA config implementations.
> + */
> +void __init numa_set_distance(int from, int to, int distance)
> +{
> +	if (!numa_distance)
> +		return;
> +
> +	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
> +			from < 0 || to < 0) {
> +		pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
> +			    from, to, distance);
> +		return;
> +	}
> +
> +	if ((u8)distance != distance ||
> +	    (from == to && distance != LOCAL_DISTANCE)) {
> +		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
> +			     from, to, distance);
> +		return;
> +	}
> +
> +	numa_distance[from * numa_distance_cnt + to] = distance;
> +}
> +EXPORT_SYMBOL(numa_set_distance);
> +
> +int __node_distance(int from, int to)
> +{
> +	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
> +		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
> +	return numa_distance[from * numa_distance_cnt + to];
> +}
> +EXPORT_SYMBOL(__node_distance);
> +
> +static int __init numa_register_nodes(void)
> +{
> +	int nid;
> +	struct memblock_region *mblk;
> +
> +	/* Check that valid nid is set to memblks */
> +	for_each_memblock(memory, mblk)
> +		if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES)
> +			return -EINVAL;
> +
> +	/* Finally register nodes. */
> +	for_each_node_mask(nid, numa_nodes_parsed) {
> +		unsigned long start_pfn, end_pfn;
> +
> +		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> +		setup_node_data(nid, start_pfn, end_pfn);
> +		node_set_online(nid);
> +	}
> +
> +	/* Set the possible nodes to the nodes actually parsed. */
> +	node_possible_map = numa_nodes_parsed;
> +
> +	/* Dump memblock with node info and return. */
> +	memblock_dump_all();
> +	return 0;
> +}
> +
> +static int __init numa_init(int (*init_func)(void))
> +{
> +	int ret;
> +
> +	nodes_clear(numa_nodes_parsed);
> +	nodes_clear(node_possible_map);
> +	nodes_clear(node_online_map);
> +	numa_reset_distance();
> +
> +	ret = init_func();
> +	if (ret < 0)
> +		return ret;
> +
> +	if (nodes_empty(numa_nodes_parsed))
> +		return -EINVAL;
> +
> +	ret = numa_register_nodes();
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = numa_alloc_distance();
> +	if (ret < 0)
> +		return ret;
> +
> +	setup_node_to_cpumask_map();
> +
> +	/* init boot processor */
> +	cpu_to_node_map[0] = 0;
> +	map_cpu_to_node(0, 0);
> +
> +	return 0;
> +}
> +
> +/**
> + * dummy_numa_init - Fallback dummy NUMA init
> + *
> + * Used if there's no underlying NUMA architecture, NUMA initialization
> + * fails, or NUMA is disabled on the command line.
> + *
> + * Must online at least one node and add memory blocks that cover all
> + * allowed memory.  This function must not fail.
> + */
> +static int __init dummy_numa_init(void)
> +{
> +	struct memblock_region *mblk;
> +
> +	pr_info("%s\n", "No NUMA configuration found");
> +	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> +	       0LLU, PFN_PHYS(max_pfn) - 1);
> +	for_each_memblock(memory, mblk)
> +		numa_add_memblk(0, mblk->base, mblk->size);
> +	numa_off = 1;
> +
> +	return 0;
> +}
> +
> +/**
> + * arm64_numa_init - Initialize NUMA
> + *
> + * Try each configured NUMA initialization method until one succeeds.  The
> + * last fallback is a dummy single-node config encompassing all memory, which
> + * never fails.
> + */
> +void __init arm64_numa_init(void)
> +{
> +	numa_init(dummy_numa_init);
> +}
> 

-- 
Shannon



* [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
@ 2015-11-27  8:00       ` Shannon Zhao
  0 siblings, 0 replies; 38+ messages in thread
From: Shannon Zhao @ 2015-11-27  8:00 UTC
  To: linux-arm-kernel



On 2015/11/18 1:20, Ganapatrao Kulkarni wrote:
> Add NUMA support for arm64-based platforms.
> By default, this patch adds a dummy NUMA node and
> maps all memory and CPUs to node 0.
> Using this patch, NUMA can be simulated on single-node arm64 platforms.
> 
> Reviewed-by: Robert Richter <rrichter@cavium.com>
> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>

I've tested this patch on QEMU VM.

Tested-by: Shannon Zhao <shannon.zhao@linaro.org>
> ---
>  arch/arm64/Kconfig              |  25 +++
>  arch/arm64/include/asm/mmzone.h |  17 ++
>  arch/arm64/include/asm/numa.h   |  47 +++++
>  arch/arm64/kernel/setup.c       |   4 +
>  arch/arm64/kernel/smp.c         |   2 +
>  arch/arm64/mm/Makefile          |   1 +
>  arch/arm64/mm/init.c            |  30 +++-
>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
>  8 files changed, 506 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm64/include/asm/mmzone.h
>  create mode 100644 arch/arm64/include/asm/numa.h
>  create mode 100644 arch/arm64/mm/numa.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 9ac16a4..7d8fb42 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -71,6 +71,7 @@ config ARM64
>  	select HAVE_GENERIC_DMA_COHERENT
>  	select HAVE_HW_BREAKPOINT if PERF_EVENTS
>  	select HAVE_MEMBLOCK
> +	select HAVE_MEMBLOCK_NODE_MAP if NUMA
>  	select HAVE_PATA_PLATFORM
>  	select HAVE_PERF_EVENTS
>  	select HAVE_PERF_REGS
> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
>  	  Say Y here to experiment with turning CPUs off and on.  CPUs
>  	  can be controlled through /sys/devices/system/cpu.
>  
> +# Common NUMA Features
> +config NUMA
> +	bool "Numa Memory Allocation and Scheduler Support"
> +	depends on SMP
> +	help
> +	  Enable NUMA (Non Uniform Memory Access) support.
> +
> +	  The kernel will try to allocate memory used by a CPU on the
> +	  local memory controller of the CPU and add some more
> +	  NUMA awareness to the kernel.
> +
> +config NODES_SHIFT
> +	int "Maximum NUMA Nodes (as a power of 2)"
> +	range 1 10
> +	default "2"
> +	depends on NEED_MULTIPLE_NODES
> +	help
> +	  Specify the maximum number of NUMA Nodes available on the target
> +	  system.  Increases memory reserved to accommodate various tables.
> +
> +config USE_PERCPU_NUMA_NODE_ID
> +	def_bool y
> +	depends on NUMA
> +
>  source kernel/Kconfig.preempt
>  source kernel/Kconfig.hz
>  
> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
> new file mode 100644
> index 0000000..6ddd468
> --- /dev/null
> +++ b/arch/arm64/include/asm/mmzone.h
> @@ -0,0 +1,17 @@
> +#ifndef __ASM_ARM64_MMZONE_H_
> +#define __ASM_ARM64_MMZONE_H_
> +
> +#ifdef CONFIG_NUMA
> +
> +#include <linux/mmdebug.h>
> +#include <linux/types.h>
> +
> +#include <asm/smp.h>
> +#include <asm/numa.h>
> +
> +extern struct pglist_data *node_data[];
> +
> +#define NODE_DATA(nid)		(node_data[(nid)])
> +
> +#endif /* CONFIG_NUMA */
> +#endif /* __ASM_ARM64_MMZONE_H_ */
> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
> new file mode 100644
> index 0000000..c00f3a4
> --- /dev/null
> +++ b/arch/arm64/include/asm/numa.h
> @@ -0,0 +1,47 @@
> +#ifndef _ASM_NUMA_H
> +#define _ASM_NUMA_H
> +
> +#include <linux/nodemask.h>
> +#include <asm/topology.h>
> +
> +#ifdef CONFIG_NUMA
> +
> +#define NR_NODE_MEMBLKS		(MAX_NUMNODES * 2)
> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
> +
> +/* currently, arm64 implements flat NUMA topology */
> +#define parent_node(node)	(node)
> +
> +extern int __node_distance(int from, int to);
> +#define node_distance(a, b) __node_distance(a, b)
> +
> +/* dummy definitions for pci functions */
> +#define pcibus_to_node(node)	0
> +#define cpumask_of_pcibus(bus)	0
> +
> +extern int cpu_to_node_map[NR_CPUS];
> +extern nodemask_t numa_nodes_parsed __initdata;
> +
> +/* Mappings between node number and cpus on that node. */
> +extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
> +extern void numa_clear_node(unsigned int cpu);
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +extern const struct cpumask *cpumask_of_node(int node);
> +#else
> +/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
> +static inline const struct cpumask *cpumask_of_node(int node)
> +{
> +	return node_to_cpumask_map[node];
> +}
> +#endif
> +
> +void __init arm64_numa_init(void);
> +int __init numa_add_memblk(int nodeid, u64 start, u64 size);
> +void __init numa_set_distance(int from, int to, int distance);
> +void __init numa_reset_distance(void);
> +void numa_store_cpu_info(unsigned int cpu);
> +#else	/* CONFIG_NUMA */
> +static inline void numa_store_cpu_info(unsigned int cpu)		{ }
> +static inline void arm64_numa_init(void)		{ }
> +#endif	/* CONFIG_NUMA */
> +#endif	/* _ASM_NUMA_H */
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 8119479..d9b9761 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -53,6 +53,7 @@
>  #include <asm/cpufeature.h>
>  #include <asm/cpu_ops.h>
>  #include <asm/kasan.h>
> +#include <asm/numa.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
>  #include <asm/smp_plat.h>
> @@ -372,6 +373,9 @@ static int __init topology_init(void)
>  {
>  	int i;
>  
> +	for_each_online_node(i)
> +		register_one_node(i);
> +
>  	for_each_possible_cpu(i) {
>  		struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
>  		cpu->hotpluggable = 1;
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index b1adc51..d6e7d6a 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -45,6 +45,7 @@
>  #include <asm/cputype.h>
>  #include <asm/cpu_ops.h>
>  #include <asm/mmu_context.h>
> +#include <asm/numa.h>
>  #include <asm/pgtable.h>
>  #include <asm/pgalloc.h>
>  #include <asm/processor.h>
> @@ -125,6 +126,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  static void smp_store_cpu_info(unsigned int cpuid)
>  {
>  	store_cpu_topology(cpuid);
> +	numa_store_cpu_info(cpuid);
>  }
>  
>  /*
> diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
> index 57f57fd..2e57922 100644
> --- a/arch/arm64/mm/Makefile
> +++ b/arch/arm64/mm/Makefile
> @@ -6,4 +6,5 @@ obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
>  obj-$(CONFIG_ARM64_PTDUMP)	+= dump.o
>  
>  obj-$(CONFIG_KASAN)		+= kasan_init.o
> +obj-$(CONFIG_NUMA)		+= numa.o
>  KASAN_SANITIZE_kasan_init.o	:= n
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 17bf39a..8dc9c5d 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -37,6 +37,7 @@
>  
>  #include <asm/fixmap.h>
>  #include <asm/memory.h>
> +#include <asm/numa.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
>  #include <asm/sizes.h>
> @@ -77,6 +78,19 @@ static phys_addr_t max_zone_dma_phys(void)
>  	return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>  }
>  
> +#ifdef CONFIG_NUMA
> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
> +{
> +	unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> +
> +	if (IS_ENABLED(CONFIG_ZONE_DMA))
> +		max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
> +	max_zone_pfns[ZONE_NORMAL] = max;
> +
> +	free_area_init_nodes(max_zone_pfns);
> +}
> +
> +#else
>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
>  {
>  	struct memblock_region *reg;
> @@ -116,6 +130,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
>  
>  	free_area_init_node(0, zone_size, min, zhole_size);
>  }
> +#endif /* CONFIG_NUMA */
>  
>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>  int pfn_valid(unsigned long pfn)
> @@ -133,10 +148,15 @@ static void arm64_memory_present(void)
>  static void arm64_memory_present(void)
>  {
>  	struct memblock_region *reg;
> +	int nid = 0;
>  
> -	for_each_memblock(memory, reg)
> -		memory_present(0, memblock_region_memory_base_pfn(reg),
> -			       memblock_region_memory_end_pfn(reg));
> +	for_each_memblock(memory, reg) {
> +#ifdef CONFIG_NUMA
> +		nid = reg->nid;
> +#endif
> +		memory_present(nid, memblock_region_memory_base_pfn(reg),
> +				memblock_region_memory_end_pfn(reg));
> +	}
>  }
>  #endif
>  
> @@ -193,6 +213,9 @@ void __init bootmem_init(void)
>  
>  	early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);
>  
> +	max_pfn = max_low_pfn = max;
> +
> +	arm64_numa_init();
>  	/*
>  	 * Sparsemem tries to allocate bootmem in memory_present(), so must be
>  	 * done after the fixed reservations.
> @@ -203,7 +226,6 @@ void __init bootmem_init(void)
>  	zone_sizes_init(min, max);
>  
>  	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
> -	max_pfn = max_low_pfn = max;
>  }
>  
>  #ifndef CONFIG_SPARSEMEM_VMEMMAP
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> new file mode 100644
> index 0000000..e3afbf8
> --- /dev/null
> +++ b/arch/arm64/mm/numa.c
> @@ -0,0 +1,384 @@
> +/*
> + * NUMA support, based on the x86 implementation.
> + *
> + * Copyright (C) 2015 Cavium Inc.
> + * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/bootmem.h>
> +#include <linux/ctype.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/memblock.h>
> +#include <linux/module.h>
> +#include <linux/mmzone.h>
> +#include <linux/nodemask.h>
> +#include <linux/sched.h>
> +#include <linux/string.h>
> +#include <linux/topology.h>
> +
> +#include <asm/smp_plat.h>
> +
> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
> +EXPORT_SYMBOL(node_data);
> +nodemask_t numa_nodes_parsed __initdata;
> +int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
> +
> +static int numa_off;
> +static int numa_distance_cnt;
> +static u8 *numa_distance;
> +
> +static __init int numa_parse_early_param(char *opt)
> +{
> +	if (!opt)
> +		return -EINVAL;
> +	if (!strncmp(opt, "off", 3)) {
> +		pr_info("%s\n", "NUMA turned off");
> +		numa_off = 1;
> +	}
> +	return 0;
> +}
> +early_param("numa", numa_parse_early_param);
> +
> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
> +EXPORT_SYMBOL(node_to_cpumask_map);
> +
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +/*
> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
> + */
> +const struct cpumask *cpumask_of_node(int node)
> +{
> +
> +	if (WARN_ON(node >= nr_node_ids))
> +		return cpu_none_mask;
> +
> +	if (WARN_ON(node_to_cpumask_map[node] == NULL))
> +		return cpu_online_mask;
> +
> +	return node_to_cpumask_map[node];
> +}
> +EXPORT_SYMBOL(cpumask_of_node);
> +#endif
> +
> +static void map_cpu_to_node(unsigned int cpu, int nid)
> +{
> +	set_cpu_numa_node(cpu, nid);
> +	if (nid >= 0)
> +		cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
> +}
> +
> +static void unmap_cpu_to_node(unsigned int cpu)
> +{
> +	int nid = cpu_to_node(cpu);
> +
> +	if (nid >= 0)
> +		cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
> +	set_cpu_numa_node(cpu, NUMA_NO_NODE);
> +}
> +
> +void numa_clear_node(unsigned int cpu)
> +{
> +	unmap_cpu_to_node(cpu);
> +}
> +
> +/*
> + * Allocate node_to_cpumask_map based on number of available nodes
> + * Requires node_possible_map to be valid.
> + *
> + * Note: cpumask_of_node() is not valid until after this is done.
> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
> + */
> +static void __init setup_node_to_cpumask_map(void)
> +{
> +	unsigned int cpu;
> +	int node;
> +
> +	/* setup nr_node_ids if not done yet */
> +	if (nr_node_ids == MAX_NUMNODES)
> +		setup_nr_node_ids();
> +
> +	/* allocate and clear the mapping */
> +	for (node = 0; node < nr_node_ids; node++) {
> +		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
> +		cpumask_clear(node_to_cpumask_map[node]);
> +	}
> +
> +	for_each_possible_cpu(cpu)
> +		set_cpu_numa_node(cpu, NUMA_NO_NODE);
> +
> +	/* cpumask_of_node() will now work */
> +	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
> +}
> +
> +/*
> + *  Set the cpu to node and mem mapping
> + */
> +void numa_store_cpu_info(unsigned int cpu)
> +{
> +	map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
> +}
> +
> +/**
> + * numa_add_memblk - Set node id to memblk
> + * @nid: NUMA node ID of the new memblk
> + * @start: Start address of the new memblk
> + * @size:  Size of the new memblk
> + *
> + * RETURNS:
> + * 0 on success, -errno on failure.
> + */
> +int __init numa_add_memblk(int nid, u64 start, u64 size)
> +{
> +	int ret;
> +
> +	ret = memblock_set_node(start, size, &memblock.memory, nid);
> +	if (ret < 0) {
> +		pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
> +			start, (start + size - 1), nid);
> +		return ret;
> +	}
> +
> +	node_set(nid, numa_nodes_parsed);
> +	pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
> +			start, (start + size - 1), nid);
> +	return ret;
> +}
> +EXPORT_SYMBOL(numa_add_memblk);
> +
> +/* Initialize NODE_DATA for a node on the local memory */
> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
> +{
> +	const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
> +	u64 nd_pa;
> +	void *nd;
> +	int tnid;
> +
> +	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
> +			nid, start_pfn << PAGE_SHIFT,
> +			(end_pfn << PAGE_SHIFT) - 1);
> +
> +	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
> +	nd = __va(nd_pa);
> +
> +	/* report and initialize */
> +	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
> +		nd_pa, nd_pa + nd_size - 1);
> +	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
> +	if (tnid != nid)
> +		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
> +
> +	node_data[nid] = nd;
> +	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
> +	NODE_DATA(nid)->node_id = nid;
> +	NODE_DATA(nid)->node_start_pfn = start_pfn;
> +	NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
> +}
> +
> +/**
> + * numa_reset_distance - Reset NUMA distance table
> + *
> + * The current table is freed.
> + * The next numa_set_distance() call will create a new one.
> + */
> +void __init numa_reset_distance(void)
> +{
> +	size_t size;
> +
> +	if (!numa_distance)
> +		return;
> +
> +	size = numa_distance_cnt * numa_distance_cnt *
> +		sizeof(numa_distance[0]);
> +
> +	memblock_free(__pa(numa_distance), size);
> +	numa_distance_cnt = 0;
> +	numa_distance = NULL;
> +}
> +
> +static int __init numa_alloc_distance(void)
> +{
> +	size_t size;
> +	u64 phys;
> +	int i, j;
> +
> +	size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
> +	phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
> +				      size, PAGE_SIZE);
> +	if (WARN_ON(!phys))
> +		return -ENOMEM;
> +
> +	memblock_reserve(phys, size);
> +
> +	numa_distance = __va(phys);
> +	numa_distance_cnt = nr_node_ids;
> +
> +	/* fill with the default distances */
> +	for (i = 0; i < numa_distance_cnt; i++)
> +		for (j = 0; j < numa_distance_cnt; j++)
> +			numa_distance[i * numa_distance_cnt + j] = i == j ?
> +				LOCAL_DISTANCE : REMOTE_DISTANCE;
> +
> +	pr_debug("NUMA: Initialized distance table, cnt=%d\n",
> +			numa_distance_cnt);
> +
> +	return 0;
> +}
> +
> +/**
> + * numa_set_distance - Set NUMA distance from one NUMA to another
> + * @from: the 'from' node to set distance
> + * @to: the 'to'  node to set distance
> + * @distance: NUMA distance
> + *
> + * Set the distance from node @from to @to to @distance.  If distance table
> + * doesn't exist, one which is large enough to accommodate all the currently
> + * known nodes will be created.
> + *
> + * If such table cannot be allocated, a warning is printed and further
> + * calls are ignored until the distance table is reset with
> + * numa_reset_distance().
> + *
> + * If @from or @to is higher than the highest known node or lower than zero
> + * at the time of table creation or @distance doesn't make sense, the call
> + * is ignored.
> + * This is to allow simplification of specific NUMA config implementations.
> + */
> +void __init numa_set_distance(int from, int to, int distance)
> +{
> +	if (!numa_distance)
> +		return;
> +
> +	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
> +			from < 0 || to < 0) {
> +		pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
> +			    from, to, distance);
> +		return;
> +	}
> +
> +	if ((u8)distance != distance ||
> +	    (from == to && distance != LOCAL_DISTANCE)) {
> +		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
> +			     from, to, distance);
> +		return;
> +	}
> +
> +	numa_distance[from * numa_distance_cnt + to] = distance;
> +}
> +EXPORT_SYMBOL(numa_set_distance);
> +
> +int __node_distance(int from, int to)
> +{
> +	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
> +		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
> +	return numa_distance[from * numa_distance_cnt + to];
> +}
> +EXPORT_SYMBOL(__node_distance);
> +
> +static int __init numa_register_nodes(void)
> +{
> +	int nid;
> +	struct memblock_region *mblk;
> +
> +	/* Check that valid nid is set to memblks */
> +	for_each_memblock(memory, mblk)
> +		if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES)
> +			return -EINVAL;
> +
> +	/* Finally register nodes. */
> +	for_each_node_mask(nid, numa_nodes_parsed) {
> +		unsigned long start_pfn, end_pfn;
> +
> +		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> +		setup_node_data(nid, start_pfn, end_pfn);
> +		node_set_online(nid);
> +	}
> +
> +	/* Setup online nodes to actual nodes*/
> +	node_possible_map = numa_nodes_parsed;
> +
> +	/* Dump memblock with node info and return. */
> +	memblock_dump_all();
> +	return 0;
> +}
> +
> +static int __init numa_init(int (*init_func)(void))
> +{
> +	int ret;
> +
> +	nodes_clear(numa_nodes_parsed);
> +	nodes_clear(node_possible_map);
> +	nodes_clear(node_online_map);
> +	numa_reset_distance();
> +
> +	ret = init_func();
> +	if (ret < 0)
> +		return ret;
> +
> +	if (nodes_empty(numa_nodes_parsed))
> +		return -EINVAL;
> +
> +	ret = numa_register_nodes();
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = numa_alloc_distance();
> +	if (ret < 0)
> +		return ret;
> +
> +	setup_node_to_cpumask_map();
> +
> +	/* init boot processor */
> +	cpu_to_node_map[0] = 0;
> +	map_cpu_to_node(0, 0);
> +
> +	return 0;
> +}
> +
> +/**
> + * dummy_numa_init - Fallback dummy NUMA init
> + *
> + * Used if there's no underlying NUMA architecture, NUMA initialization
> + * fails, or NUMA is disabled on the command line.
> + *
> + * Must online at least one node and add memory blocks that cover all
> + * allowed memory.  This function must not fail.
> + */
> +static int __init dummy_numa_init(void)
> +{
> +	struct memblock_region *mblk;
> +
> +	pr_info("%s\n", "No NUMA configuration found");
> +	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> +	       0LLU, PFN_PHYS(max_pfn) - 1);
> +	for_each_memblock(memory, mblk)
> +		numa_add_memblk(0, mblk->base, mblk->size);
> +	numa_off = 1;
> +
> +	return 0;
> +}
> +
> +/**
> + * arm64_numa_init - Initialize NUMA
> + *
> + * Try each configured NUMA initialization method until one succeeds.  The
> + * last fallback is dummy single node config encompassing whole memory and
> + * never fails.
> + */
> +void __init arm64_numa_init(void)
> +{
> +	numa_init(dummy_numa_init);
> +}
> 
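For reference, here is a minimal userspace sketch of the distance handling in the quoted numa.c, mirroring numa_alloc_distance(), numa_set_distance() and __node_distance(). It is a simplification under stated assumptions: malloc() stands in for the memblock allocation, the helper names are hypothetical (not kernel APIs), and LOCAL_DISTANCE/REMOTE_DISTANCE use the 10/20 values of the generic <linux/topology.h> defaults.

```c
#include <stdint.h>
#include <stdlib.h>

/* Same values as the generic defaults in <linux/topology.h>. */
#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

/* Flat cnt x cnt table of u8 distances: row = from, column = to. */
static int numa_distance_cnt;
static uint8_t *numa_distance;

/* Allocate the table and fill it with LOCAL/REMOTE defaults. */
static int alloc_distance(int nr_nodes)
{
	numa_distance = malloc((size_t)nr_nodes * nr_nodes);
	if (!numa_distance)
		return -1;
	numa_distance_cnt = nr_nodes;
	for (int i = 0; i < nr_nodes; i++)
		for (int j = 0; j < nr_nodes; j++)
			numa_distance[i * nr_nodes + j] =
				i == j ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return 0;
}

/* Validate like numa_set_distance(): bounds, u8 range, self == LOCAL. */
static int set_distance(int from, int to, int distance)
{
	if (!numa_distance)
		return -1;	/* no table yet: the call is ignored */
	if (from < 0 || to < 0 ||
	    from >= numa_distance_cnt || to >= numa_distance_cnt)
		return -1;
	if ((uint8_t)distance != distance ||
	    (from == to && distance != LOCAL_DISTANCE))
		return -1;
	numa_distance[from * numa_distance_cnt + to] = (uint8_t)distance;
	return 0;
}

/* Nodes outside the table fall back to defaults, like __node_distance(). */
static int node_distance(int from, int to)
{
	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return numa_distance[from * numa_distance_cnt + to];
}
```

As in the quoted code, the table is directional (setting 0->1 leaves 1->0 untouched), and set_distance() is a no-op until the table exists. Since the quoted numa_init() calls init_func() before numa_alloc_distance(), any distances an init_func set this way would be ignored.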

-- 
Shannon

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms.
  2015-11-17 17:20   ` Ganapatrao Kulkarni
@ 2015-11-28  9:30       ` Shannon Zhao
  -1 siblings, 0 replies; 38+ messages in thread
From: Shannon Zhao @ 2015-11-28  9:30 UTC (permalink / raw)
  To: Ganapatrao Kulkarni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will.Deacon-5wv7dgnIgG8,
	catalin.marinas-5wv7dgnIgG8, grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	leif.lindholm-QSEj5FYQhm4dnm+yROfE0A,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A,
	msalter-H+wXaHxf7aLQT0dZR+AlfA, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	steve.capper-QSEj5FYQhm4dnm+yROfE0A,
	hanjun.guo-QSEj5FYQhm4dnm+yROfE0A,
	al.stone-QSEj5FYQhm4dnm+yROfE0A, arnd-r2nGTMty4D4,
	pawel.moll-5wv7dgnIgG8, mark.rutland-5wv7dgnIgG8,
	ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
	galak-sgV2jX0FEOL9JmXXK+q4OQ, rjw-LthD3rsA81gm4RdzfppkhA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, marc.zyngier-5wv7dgnIgG8,
	rrichter-YGCgFSpz5w/QT0dZR+AlfA,
	Prasun.Kapoor-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8
  Cc: gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w



On 2015/11/18 1:20, Ganapatrao Kulkarni wrote:
> +static int __init early_init_parse_memory_node(unsigned long node)
> +{
> +	const __be32 *reg, *endp;
> +	int length;
> +	int nid;
> +
> +	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> +
> +	/* We are scanning "memory" nodes only */
> +	if (type == NULL)
> +		return 0;
> +	else if (strcmp(type, "memory") != 0)
> +		return 0;
> +
> +	nid = early_init_of_get_numa_nid(node);
> +
> +	if (nid == NUMA_NO_NODE)
> +		return -EINVAL;
> +
> +	reg = of_get_flat_dt_prop(node, "reg", &length);
> +	endp = reg + (length / sizeof(__be32));
> +
> +	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
> +		u64 base, size;
> +		struct memblock_region *mblk;
> +
> +		base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> +		size = dt_mem_next_cell(dt_root_size_cells, &reg);
> +		pr_debug("NUMA-DT:  base = %llx , node = %u\n",
> +				base, nid);
> +
> +		for_each_memblock(memory, mblk) {
> +			if (mblk->base == base) {
> +				if (numa_add_memblk(nid,
> +							mblk->base,
> +							mblk->size) < 0)
> +					return -EINVAL;
> +				break;
> +			}
> +		}

Maybe this is not right. If the memory ranges of the NUMA nodes are
contiguous, like below:

        memory@60000000 {
                numa-node-id = <0x1>;
                reg = <0x0 0x60000000 0x0 0x20000000>;
                device_type = "memory";
        };

        memory@40000000 {
                numa-node-id = <0x0>;
                reg = <0x0 0x40000000 0x0 0x20000000>;
                device_type = "memory";
        };

There is only one memory region [0x00000040000000-0x0000007fffffff]
(memblock merges the two adjacent ranges), and its mblk->base is
0x40000000, so the memory of node 1 will never be added.

I think this should do the same thing as ACPI_NUMA, but add some code
to check whether [base, base + size] is located within a memory region.
Or don't check, because numa_add_memblk() will fail if [base, base +
size] is not located within a memory region.
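
A minimal userspace sketch of the case above, assuming the two adjacent DT ranges have already been merged by memblock into one region. struct region, struct region_list and the tag_*/split_at helpers are hypothetical stand-ins, not kernel APIs. tag_by_exact_base() mirrors the quoted loop, which matches on mblk->base; tag_by_range() instead tags by the DT [base, base + size) range and splits the merged region at node boundaries, roughly what memblock_set_node() does when it isolates a range.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_NODE		(-1)
#define MAX_REGIONS	8

/* Hypothetical stand-in for struct memblock_region. */
struct region {
	uint64_t base, size;
	int nid;
};

struct region_list {
	struct region r[MAX_REGIONS];
	size_t cnt;
};

/* Exact-base match, as in the quoted loop: 1 if a region was tagged. */
static int tag_by_exact_base(struct region_list *l, uint64_t base, int nid)
{
	for (size_t i = 0; i < l->cnt; i++) {
		if (l->r[i].base == base) {
			l->r[i].nid = nid;
			return 1;
		}
	}
	return 0;
}

/* Split region i into [rbase, addr) and [addr, rend); rbase < addr < rend. */
static int split_at(struct region_list *l, size_t i, uint64_t addr)
{
	struct region *a = &l->r[i];

	if (l->cnt == MAX_REGIONS)
		return -1;
	for (size_t j = l->cnt; j > i + 1; j--)
		l->r[j] = l->r[j - 1];
	l->r[i + 1] = (struct region){ addr, a->base + a->size - addr, a->nid };
	a->size = addr - a->base;
	l->cnt++;
	return 0;
}

/*
 * Range-based: isolate [base, base + size), then tag the covered pieces.
 * Returns 1 if anything was tagged, 0 if nothing overlaps, -1 if full.
 */
static int tag_by_range(struct region_list *l, uint64_t base, uint64_t size,
			int nid)
{
	uint64_t end = base + size;
	int tagged = 0;

	for (size_t i = 0; i < l->cnt; i++) {
		uint64_t rbase = l->r[i].base;
		uint64_t rend = rbase + l->r[i].size;

		if (rend <= base || rbase >= end)
			continue;	/* no overlap */
		if (rbase < base) {
			/* keep the head untouched; the tail is the next entry */
			if (split_at(l, i, base))
				return -1;
			continue;
		}
		/* move anything past end into its own region */
		if (rend > end && split_at(l, i, end))
			return -1;
		l->r[i].nid = nid;
		tagged = 1;
	}
	return tagged;
}
```

With exact-base matching, the lookup for 0x60000000 fails and the whole merged region stays on node 0 (the quoted loop passes mblk->base and mblk->size); with range-based tagging, each half ends up on its own node.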

Thanks,
-- 
Shannon

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms.
  2015-11-28  9:30       ` Shannon Zhao
@ 2015-12-01  8:43           ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-12-01  8:43 UTC (permalink / raw)
  To: Shannon Zhao
  Cc: Ganapatrao Kulkarni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Catalin Marinas,
	Grant Likely, Leif Lindholm, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Ard Biesheuvel, msalter-H+wXaHxf7aLQT0dZR+AlfA, Rob Herring,
	Steve Capper, Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll,
	Mark Rutland, Ian Campbell, Kumar Gala, Rafael J. Wysocki,
	Len Brown, Marc Zyngier

On Sat, Nov 28, 2015 at 3:00 PM, Shannon Zhao <zhaoshenglong@huawei.com> wrote:
>
>
> On 2015/11/18 1:20, Ganapatrao Kulkarni wrote:
>> +static int __init early_init_parse_memory_node(unsigned long node)
>> +{
>> +     const __be32 *reg, *endp;
>> +     int length;
>> +     int nid;
>> +
>> +     const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
>> +
>> +     /* We are scanning "memory" nodes only */
>> +     if (type == NULL)
>> +             return 0;
>> +     else if (strcmp(type, "memory") != 0)
>> +             return 0;
>> +
>> +     nid = early_init_of_get_numa_nid(node);
>> +
>> +     if (nid == NUMA_NO_NODE)
>> +             return -EINVAL;
>> +
>> +     reg = of_get_flat_dt_prop(node, "reg", &length);
>> +     endp = reg + (length / sizeof(__be32));
>> +
>> +     while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
>> +             u64 base, size;
>> +             struct memblock_region *mblk;
>> +
>> +             base = dt_mem_next_cell(dt_root_addr_cells, &reg);
>> +             size = dt_mem_next_cell(dt_root_size_cells, &reg);
>> +             pr_debug("NUMA-DT:  base = %llx , node = %u\n",
>> +                             base, nid);
>> +
>> +             for_each_memblock(memory, mblk) {
>> +                     if (mblk->base == base) {
>> +                             if (numa_add_memblk(nid,
>> +                                                     mblk->base,
>> +                                                     mblk->size) < 0)
>> +                                     return -EINVAL;
>> +                             break;
>> +                     }
>> +             }
>
> Maybe this is not right. If the memory spaces of NUMA nodes are
> continuous like below:
>
>         memory@60000000 {
>                 numa-node-id = <0x1>;
>                 reg = <0x0 0x60000000 0x0 0x20000000>;
>                 device_type = "memory";
>         };
>
>         memory@40000000 {
>                 numa-node-id = <0x0>;
>                 reg = <0x0 0x40000000 0x0 0x20000000>;
>                 device_type = "memory";
>         };
>
> There is only one memory region [0x00000040000000-0x0000007fffffff] and
> the mblk->base is 40000000, so it will not add the memory node 1.
>
> I think this should do the same thing like ACPI_NUMA but add some codes
> to check if the [base, base + size] is located in some memory region.
> Or don't check because numa_add_memblk will fail if the [base, base +
> size] is not located in some memory region.
I am not sure we can have this kind of mapping, since memblock can only map
a contiguous region to one NUMA node. However, I can add a check as
you suggested.
Thanks for the review!
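
A minimal sketch of the check discussed here, assuming it means "the DT range [base, base + size) must lie inside a single existing memory region". The struct and function name are hypothetical, not kernel APIs. One way the parser could use it is to call numa_add_memblk(nid, base, size) with the DT range itself, rather than mblk->base/mblk->size, once the check passes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct memblock_region. */
struct region {
	uint64_t base, size;
};

/*
 * True if [base, base + size) lies entirely inside one region.
 * Empty or wrapping ranges are rejected.
 */
static bool range_in_some_region(const struct region *regs, size_t n,
				 uint64_t base, uint64_t size)
{
	uint64_t end = base + size;

	if (size == 0 || end < base)
		return false;
	for (size_t i = 0; i < n; i++) {
		if (base >= regs[i].base && end <= regs[i].base + regs[i].size)
			return true;
	}
	return false;
}
```

Because the check only requires containment rather than an exact base match, the second range in Shannon's example (base 0x60000000, size 0x20000000) passes against the merged region [0x40000000-0x7fffffff].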
>
> Thanks,
> --
> Shannon
>
thanks
Ganapat

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
  2015-11-27  8:00       ` Shannon Zhao
@ 2015-12-01  8:45           ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-12-01  8:45 UTC (permalink / raw)
  To: Shannon Zhao
  Cc: Ganapatrao Kulkarni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Catalin Marinas,
	Grant Likely, Leif Lindholm, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Ard Biesheuvel, msalter-H+wXaHxf7aLQT0dZR+AlfA, Rob Herring,
	Steve Capper, Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll,
	Mark Rutland, Ian Campbell, Kumar Gala, Rafael J. Wysocki,
	Len Brown, Marc Zyngier

On Fri, Nov 27, 2015 at 1:30 PM, Shannon Zhao <zhaoshenglong@huawei.com> wrote:
>
>
> On 2015/11/18 1:20, Ganapatrao Kulkarni wrote:
>> Adding numa support for arm64 based platforms.
>> This patch adds by default the dummy numa node and
>> maps all memory and cpus to node 0.
>> using this patch, numa can be simulated on single node arm64 platforms.
>>
>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>
> I've tested this patch on QEMU VM.
>
> Tested-by: Shannon Zhao <shannon.zhao-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
thanks a lot!
>> ---
>>  arch/arm64/Kconfig              |  25 +++
>>  arch/arm64/include/asm/mmzone.h |  17 ++
>>  arch/arm64/include/asm/numa.h   |  47 +++++
>>  arch/arm64/kernel/setup.c       |   4 +
>>  arch/arm64/kernel/smp.c         |   2 +
>>  arch/arm64/mm/Makefile          |   1 +
>>  arch/arm64/mm/init.c            |  30 +++-
>>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
>>  8 files changed, 506 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/mmzone.h
>>  create mode 100644 arch/arm64/include/asm/numa.h
>>  create mode 100644 arch/arm64/mm/numa.c
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 9ac16a4..7d8fb42 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -71,6 +71,7 @@ config ARM64
>>       select HAVE_GENERIC_DMA_COHERENT
>>       select HAVE_HW_BREAKPOINT if PERF_EVENTS
>>       select HAVE_MEMBLOCK
>> +     select HAVE_MEMBLOCK_NODE_MAP if NUMA
>>       select HAVE_PATA_PLATFORM
>>       select HAVE_PERF_EVENTS
>>       select HAVE_PERF_REGS
>> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
>>         Say Y here to experiment with turning CPUs off and on.  CPUs
>>         can be controlled through /sys/devices/system/cpu.
>>
>> +# Common NUMA Features
>> +config NUMA
>> +     bool "Numa Memory Allocation and Scheduler Support"
>> +     depends on SMP
>> +     help
>> +       Enable NUMA (Non Uniform Memory Access) support.
>> +
>> +       The kernel will try to allocate memory used by a CPU on the
>> +       local memory controller of the CPU and add some more
>> +       NUMA awareness to the kernel.
>> +
>> +config NODES_SHIFT
>> +     int "Maximum NUMA Nodes (as a power of 2)"
>> +     range 1 10
>> +     default "2"
>> +     depends on NEED_MULTIPLE_NODES
>> +     help
>> +       Specify the maximum number of NUMA Nodes available on the target
>> +       system.  Increases memory reserved to accommodate various tables.
>> +
>> +config USE_PERCPU_NUMA_NODE_ID
>> +     def_bool y
>> +     depends on NUMA
>> +
>>  source kernel/Kconfig.preempt
>>  source kernel/Kconfig.hz
>>
>> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
>> new file mode 100644
>> index 0000000..6ddd468
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/mmzone.h
>> @@ -0,0 +1,17 @@
>> +#ifndef __ASM_ARM64_MMZONE_H_
>> +#define __ASM_ARM64_MMZONE_H_
>> +
>> +#ifdef CONFIG_NUMA
>> +
>> +#include <linux/mmdebug.h>
>> +#include <linux/types.h>
>> +
>> +#include <asm/smp.h>
>> +#include <asm/numa.h>
>> +
>> +extern struct pglist_data *node_data[];
>> +
>> +#define NODE_DATA(nid)               (node_data[(nid)])
>> +
>> +#endif /* CONFIG_NUMA */
>> +#endif /* __ASM_ARM64_MMZONE_H_ */
>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>> new file mode 100644
>> index 0000000..c00f3a4
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/numa.h
>> @@ -0,0 +1,47 @@
>> +#ifndef _ASM_NUMA_H
>> +#define _ASM_NUMA_H
>> +
>> +#include <linux/nodemask.h>
>> +#include <asm/topology.h>
>> +
>> +#ifdef CONFIG_NUMA
>> +
>> +#define NR_NODE_MEMBLKS              (MAX_NUMNODES * 2)
>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
>> +
>> +/* currently, arm64 implements flat NUMA topology */
>> +#define parent_node(node)    (node)
>> +
>> +extern int __node_distance(int from, int to);
>> +#define node_distance(a, b) __node_distance(a, b)
>> +
>> +/* dummy definitions for pci functions */
>> +#define pcibus_to_node(node) 0
>> +#define cpumask_of_pcibus(bus)       0
>> +
>> +extern int cpu_to_node_map[NR_CPUS];
>> +extern nodemask_t numa_nodes_parsed __initdata;
>> +
>> +/* Mappings between node number and cpus on that node. */
>> +extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>> +extern void numa_clear_node(unsigned int cpu);
>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>> +extern const struct cpumask *cpumask_of_node(int node);
>> +#else
>> +/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>> +static inline const struct cpumask *cpumask_of_node(int node)
>> +{
>> +     return node_to_cpumask_map[node];
>> +}
>> +#endif
>> +
>> +void __init arm64_numa_init(void);
>> +int __init numa_add_memblk(int nodeid, u64 start, u64 size);
>> +void __init numa_set_distance(int from, int to, int distance);
>> +void __init numa_reset_distance(void);
>> +void numa_store_cpu_info(unsigned int cpu);
>> +#else        /* CONFIG_NUMA */
>> +static inline void numa_store_cpu_info(unsigned int cpu)             { }
>> +static inline void arm64_numa_init(void)             { }
>> +#endif       /* CONFIG_NUMA */
>> +#endif       /* _ASM_NUMA_H */
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index 8119479..d9b9761 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -53,6 +53,7 @@
>>  #include <asm/cpufeature.h>
>>  #include <asm/cpu_ops.h>
>>  #include <asm/kasan.h>
>> +#include <asm/numa.h>
>>  #include <asm/sections.h>
>>  #include <asm/setup.h>
>>  #include <asm/smp_plat.h>
>> @@ -372,6 +373,9 @@ static int __init topology_init(void)
>>  {
>>       int i;
>>
>> +     for_each_online_node(i)
>> +             register_one_node(i);
>> +
>>       for_each_possible_cpu(i) {
>>               struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
>>               cpu->hotpluggable = 1;
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index b1adc51..d6e7d6a 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -45,6 +45,7 @@
>>  #include <asm/cputype.h>
>>  #include <asm/cpu_ops.h>
>>  #include <asm/mmu_context.h>
>> +#include <asm/numa.h>
>>  #include <asm/pgtable.h>
>>  #include <asm/pgalloc.h>
>>  #include <asm/processor.h>
>> @@ -125,6 +126,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>>  static void smp_store_cpu_info(unsigned int cpuid)
>>  {
>>       store_cpu_topology(cpuid);
>> +     numa_store_cpu_info(cpuid);
>>  }
>>
>>  /*
>> diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
>> index 57f57fd..2e57922 100644
>> --- a/arch/arm64/mm/Makefile
>> +++ b/arch/arm64/mm/Makefile
>> @@ -6,4 +6,5 @@ obj-$(CONFIG_HUGETLB_PAGE)    += hugetlbpage.o
>>  obj-$(CONFIG_ARM64_PTDUMP)   += dump.o
>>
>>  obj-$(CONFIG_KASAN)          += kasan_init.o
>> +obj-$(CONFIG_NUMA)           += numa.o
>>  KASAN_SANITIZE_kasan_init.o  := n
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 17bf39a..8dc9c5d 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -37,6 +37,7 @@
>>
>>  #include <asm/fixmap.h>
>>  #include <asm/memory.h>
>> +#include <asm/numa.h>
>>  #include <asm/sections.h>
>>  #include <asm/setup.h>
>>  #include <asm/sizes.h>
>> @@ -77,6 +78,19 @@ static phys_addr_t max_zone_dma_phys(void)
>>       return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>>  }
>>
>> +#ifdef CONFIG_NUMA
>> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
>> +{
>> +     unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
>> +
>> +     if (IS_ENABLED(CONFIG_ZONE_DMA))
>> +             max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
>> +     max_zone_pfns[ZONE_NORMAL] = max;
>> +
>> +     free_area_init_nodes(max_zone_pfns);
>> +}
>> +
>> +#else
>>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
>>  {
>>       struct memblock_region *reg;
>> @@ -116,6 +130,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
>>
>>       free_area_init_node(0, zone_size, min, zhole_size);
>>  }
>> +#endif /* CONFIG_NUMA */
>>
>>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>  int pfn_valid(unsigned long pfn)
>> @@ -133,10 +148,15 @@ static void arm64_memory_present(void)
>>  static void arm64_memory_present(void)
>>  {
>>       struct memblock_region *reg;
>> +     int nid = 0;
>>
>> -     for_each_memblock(memory, reg)
>> -             memory_present(0, memblock_region_memory_base_pfn(reg),
>> -                            memblock_region_memory_end_pfn(reg));
>> +     for_each_memblock(memory, reg) {
>> +#ifdef CONFIG_NUMA
>> +             nid = reg->nid;
>> +#endif
>> +             memory_present(nid, memblock_region_memory_base_pfn(reg),
>> +                             memblock_region_memory_end_pfn(reg));
>> +     }
>>  }
>>  #endif
>>
>> @@ -193,6 +213,9 @@ void __init bootmem_init(void)
>>
>>       early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);
>>
>> +     max_pfn = max_low_pfn = max;
>> +
>> +     arm64_numa_init();
>>       /*
>>        * Sparsemem tries to allocate bootmem in memory_present(), so must be
>>        * done after the fixed reservations.
>> @@ -203,7 +226,6 @@ void __init bootmem_init(void)
>>       zone_sizes_init(min, max);
>>
>>       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
>> -     max_pfn = max_low_pfn = max;
>>  }
>>
>>  #ifndef CONFIG_SPARSEMEM_VMEMMAP
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> new file mode 100644
>> index 0000000..e3afbf8
>> --- /dev/null
>> +++ b/arch/arm64/mm/numa.c
>> @@ -0,0 +1,384 @@
>> +/*
>> + * NUMA support, based on the x86 implementation.
>> + *
>> + * Copyright (C) 2015 Cavium Inc.
>> + * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/bootmem.h>
>> +#include <linux/ctype.h>
>> +#include <linux/init.h>
>> +#include <linux/kernel.h>
>> +#include <linux/mm.h>
>> +#include <linux/memblock.h>
>> +#include <linux/module.h>
>> +#include <linux/mmzone.h>
>> +#include <linux/nodemask.h>
>> +#include <linux/sched.h>
>> +#include <linux/string.h>
>> +#include <linux/topology.h>
>> +
>> +#include <asm/smp_plat.h>
>> +
>> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>> +EXPORT_SYMBOL(node_data);
>> +nodemask_t numa_nodes_parsed __initdata;
>> +int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
>> +
>> +static int numa_off;
>> +static int numa_distance_cnt;
>> +static u8 *numa_distance;
>> +
>> +static __init int numa_parse_early_param(char *opt)
>> +{
>> +     if (!opt)
>> +             return -EINVAL;
>> +     if (!strncmp(opt, "off", 3)) {
>> +             pr_info("%s\n", "NUMA turned off");
>> +             numa_off = 1;
>> +     }
>> +     return 0;
>> +}
>> +early_param("numa", numa_parse_early_param);
>> +
>> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>> +EXPORT_SYMBOL(node_to_cpumask_map);
>> +
>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>> +/*
>> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
>> + */
>> +const struct cpumask *cpumask_of_node(int node)
>> +{
>> +
>> +     if (WARN_ON(node >= nr_node_ids))
>> +             return cpu_none_mask;
>> +
>> +     if (WARN_ON(node_to_cpumask_map[node] == NULL))
>> +             return cpu_online_mask;
>> +
>> +     return node_to_cpumask_map[node];
>> +}
>> +EXPORT_SYMBOL(cpumask_of_node);
>> +#endif
>> +
>> +static void map_cpu_to_node(unsigned int cpu, int nid)
>> +{
>> +     set_cpu_numa_node(cpu, nid);
>> +     if (nid >= 0)
>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
>> +}
>> +
>> +static void unmap_cpu_to_node(unsigned int cpu)
>> +{
>> +     int nid = cpu_to_node(cpu);
>> +
>> +     if (nid >= 0)
>> +             cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
>> +     set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> +}
>> +
>> +void numa_clear_node(unsigned int cpu)
>> +{
>> +     unmap_cpu_to_node(cpu);
>> +}
>> +
>> +/*
>> + * Allocate node_to_cpumask_map based on number of available nodes
>> + * Requires node_possible_map to be valid.
>> + *
>> + * Note: cpumask_of_node() is not valid until after this is done.
>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
>> + */
>> +static void __init setup_node_to_cpumask_map(void)
>> +{
>> +     unsigned int cpu;
>> +     int node;
>> +
>> +     /* setup nr_node_ids if not done yet */
>> +     if (nr_node_ids == MAX_NUMNODES)
>> +             setup_nr_node_ids();
>> +
>> +     /* allocate and clear the mapping */
>> +     for (node = 0; node < nr_node_ids; node++) {
>> +             alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
>> +             cpumask_clear(node_to_cpumask_map[node]);
>> +     }
>> +
>> +     for_each_possible_cpu(cpu)
>> +             set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> +
>> +     /* cpumask_of_node() will now work */
>> +     pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>> +}
>> +
>> +/*
>> + * Set the CPU-to-node mapping
>> + */
>> +void numa_store_cpu_info(unsigned int cpu)
>> +{
>> +     map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
>> +}
>> +
>> +/**
>> + * numa_add_memblk - Set node id to memblk
>> + * @nid: NUMA node ID of the new memblk
>> + * @start: Start address of the new memblk
>> + * @size:  Size of the new memblk
>> + *
>> + * RETURNS:
>> + * 0 on success, -errno on failure.
>> + */
>> +int __init numa_add_memblk(int nid, u64 start, u64 size)
>> +{
>> +     int ret;
>> +
>> +     ret = memblock_set_node(start, size, &memblock.memory, nid);
>> +     if (ret < 0) {
>> +             pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>> +                     start, (start + size - 1), nid);
>> +             return ret;
>> +     }
>> +
>> +     node_set(nid, numa_nodes_parsed);
>> +     pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
>> +                     start, (start + size - 1), nid);
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL(numa_add_memblk);
>> +
>> +/* Initialize NODE_DATA for a node on the local memory */
>> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>> +{
>> +     const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
>> +     u64 nd_pa;
>> +     void *nd;
>> +     int tnid;
>> +
>> +     pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> +                     nid, start_pfn << PAGE_SHIFT,
>> +                     (end_pfn << PAGE_SHIFT) - 1);
>> +
>> +     nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>> +     nd = __va(nd_pa);
>> +
>> +     /* report and initialize */
>> +     pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
>> +             nd_pa, nd_pa + nd_size - 1);
>> +     tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>> +     if (tnid != nid)
>> +             pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
>> +
>> +     node_data[nid] = nd;
>> +     memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>> +     NODE_DATA(nid)->node_id = nid;
>> +     NODE_DATA(nid)->node_start_pfn = start_pfn;
>> +     NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
>> +}
>> +
>> +/**
>> + * numa_reset_distance - Reset NUMA distance table
>> + *
>> + * The current table is freed.
>> + * The next numa_set_distance() call will create a new one.
>> + */
>> +void __init numa_reset_distance(void)
>> +{
>> +     size_t size;
>> +
>> +     if (!numa_distance)
>> +             return;
>> +
>> +     size = numa_distance_cnt * numa_distance_cnt *
>> +             sizeof(numa_distance[0]);
>> +
>> +     memblock_free(__pa(numa_distance), size);
>> +     numa_distance_cnt = 0;
>> +     numa_distance = NULL;
>> +}
>> +
>> +static int __init numa_alloc_distance(void)
>> +{
>> +     size_t size;
>> +     u64 phys;
>> +     int i, j;
>> +
>> +     size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
>> +     phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
>> +                                   size, PAGE_SIZE);
>> +     if (WARN_ON(!phys))
>> +             return -ENOMEM;
>> +
>> +     memblock_reserve(phys, size);
>> +
>> +     numa_distance = __va(phys);
>> +     numa_distance_cnt = nr_node_ids;
>> +
>> +     /* fill with the default distances */
>> +     for (i = 0; i < numa_distance_cnt; i++)
>> +             for (j = 0; j < numa_distance_cnt; j++)
>> +                     numa_distance[i * numa_distance_cnt + j] = i == j ?
>> +                             LOCAL_DISTANCE : REMOTE_DISTANCE;
>> +
>> +     pr_debug("NUMA: Initialized distance table, cnt=%d\n",
>> +                     numa_distance_cnt);
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * numa_set_distance - Set NUMA distance from one NUMA node to another
>> + * @from: the 'from' node to set distance
>> + * @to: the 'to' node to set distance
>> + * @distance: NUMA distance
>> + *
>> + * Set the distance from node @from to @to to @distance.  If distance table
>> + * doesn't exist, one which is large enough to accommodate all the currently
>> + * known nodes will be created.
>> + *
>> + * If such table cannot be allocated, a warning is printed and further
>> + * calls are ignored until the distance table is reset with
>> + * numa_reset_distance().
>> + *
>> + * If @from or @to is higher than the highest known node or lower than zero
>> + * at the time of table creation or @distance doesn't make sense, the call
>> + * is ignored.
>> + * This is to allow simplification of specific NUMA config implementations.
>> + */
>> +void __init numa_set_distance(int from, int to, int distance)
>> +{
>> +     if (!numa_distance)
>> +             return;
>> +
>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>> +                     from < 0 || to < 0) {
>> +             pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>> +                         from, to, distance);
>> +             return;
>> +     }
>> +
>> +     if ((u8)distance != distance ||
>> +         (from == to && distance != LOCAL_DISTANCE)) {
>> +             pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>> +                          from, to, distance);
>> +             return;
>> +     }
>> +
>> +     numa_distance[from * numa_distance_cnt + to] = distance;
>> +}
>> +EXPORT_SYMBOL(numa_set_distance);
>> +
>> +int __node_distance(int from, int to)
>> +{
>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt)
>> +             return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
>> +     return numa_distance[from * numa_distance_cnt + to];
>> +}
>> +EXPORT_SYMBOL(__node_distance);
>> +
>> +static int __init numa_register_nodes(void)
>> +{
>> +     int nid;
>> +     struct memblock_region *mblk;
>> +
>> +     /* Check that valid nid is set to memblks */
>> +     for_each_memblock(memory, mblk)
>> +             if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES)
>> +                     return -EINVAL;
>> +
>> +     /* Finally register nodes. */
>> +     for_each_node_mask(nid, numa_nodes_parsed) {
>> +             unsigned long start_pfn, end_pfn;
>> +
>> +             get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>> +             setup_node_data(nid, start_pfn, end_pfn);
>> +             node_set_online(nid);
>> +     }
>> +
>> +     /* Set the possible nodes to the nodes actually parsed. */
>> +     node_possible_map = numa_nodes_parsed;
>> +
>> +     /* Dump memblock with node info and return. */
>> +     memblock_dump_all();
>> +     return 0;
>> +}
>> +
>> +static int __init numa_init(int (*init_func)(void))
>> +{
>> +     int ret;
>> +
>> +     nodes_clear(numa_nodes_parsed);
>> +     nodes_clear(node_possible_map);
>> +     nodes_clear(node_online_map);
>> +     numa_reset_distance();
>> +
>> +     ret = init_func();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     if (nodes_empty(numa_nodes_parsed))
>> +             return -EINVAL;
>> +
>> +     ret = numa_register_nodes();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     ret = numa_alloc_distance();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     setup_node_to_cpumask_map();
>> +
>> +     /* init boot processor */
>> +     cpu_to_node_map[0] = 0;
>> +     map_cpu_to_node(0, 0);
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * dummy_numa_init - Fallback dummy NUMA init
>> + *
>> + * Used if there's no underlying NUMA architecture, NUMA initialization
>> + * fails, or NUMA is disabled on the command line.
>> + *
>> + * Must online at least one node and add memory blocks that cover all
>> + * allowed memory.  This function must not fail.
>> + */
>> +static int __init dummy_numa_init(void)
>> +{
>> +     struct memblock_region *mblk;
>> +
>> +     pr_info("%s\n", "No NUMA configuration found");
>> +     pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
>> +            0LLU, PFN_PHYS(max_pfn) - 1);
>> +     for_each_memblock(memory, mblk)
>> +             numa_add_memblk(0, mblk->base, mblk->size);
>> +     numa_off = 1;
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * arm64_numa_init - Initialize NUMA
>> + *
>> + * Try each configured NUMA initialization method until one succeeds.  The
>> + * last fallback is a dummy single-node config encompassing the whole memory,
>> + * and it never fails.
>> + */
>> +void __init arm64_numa_init(void)
>> +{
>> +     numa_init(dummy_numa_init);
>> +}
>>
>
> --
> Shannon
>


* [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
@ 2015-12-01  8:45           ` Ganapatrao Kulkarni
From: Ganapatrao Kulkarni @ 2015-12-01  8:45 UTC
  To: linux-arm-kernel

On Fri, Nov 27, 2015 at 1:30 PM, Shannon Zhao <zhaoshenglong@huawei.com> wrote:
>
>
> On 2015/11/18 1:20, Ganapatrao Kulkarni wrote:
>> Add NUMA support for arm64-based platforms.
>> By default, this patch adds a dummy NUMA node and
>> maps all memory and CPUs to node 0.
>> Using this patch, NUMA can be simulated on single-node arm64 platforms.
>>
>> Reviewed-by: Robert Richter <rrichter@cavium.com>
>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>
> I've tested this patch on QEMU VM.
>
> Tested-by: Shannon Zhao <shannon.zhao@linaro.org>
Thanks a lot!
>> ---
>>  arch/arm64/Kconfig              |  25 +++
>>  arch/arm64/include/asm/mmzone.h |  17 ++
>>  arch/arm64/include/asm/numa.h   |  47 +++++
>>  arch/arm64/kernel/setup.c       |   4 +
>>  arch/arm64/kernel/smp.c         |   2 +
>>  arch/arm64/mm/Makefile          |   1 +
>>  arch/arm64/mm/init.c            |  30 +++-
>>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
>>  8 files changed, 506 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/mmzone.h
>>  create mode 100644 arch/arm64/include/asm/numa.h
>>  create mode 100644 arch/arm64/mm/numa.c
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 9ac16a4..7d8fb42 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -71,6 +71,7 @@ config ARM64
>>       select HAVE_GENERIC_DMA_COHERENT
>>       select HAVE_HW_BREAKPOINT if PERF_EVENTS
>>       select HAVE_MEMBLOCK
>> +     select HAVE_MEMBLOCK_NODE_MAP if NUMA
>>       select HAVE_PATA_PLATFORM
>>       select HAVE_PERF_EVENTS
>>       select HAVE_PERF_REGS
>> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
>>         Say Y here to experiment with turning CPUs off and on.  CPUs
>>         can be controlled through /sys/devices/system/cpu.
>>
>> +# Common NUMA Features
>> +config NUMA
>> +     bool "Numa Memory Allocation and Scheduler Support"
>> +     depends on SMP
>> +     help
>> +       Enable NUMA (Non Uniform Memory Access) support.
>> +
>> +       The kernel will try to allocate memory used by a CPU on the
>> +       local memory controller of the CPU and add some more
>> +       NUMA awareness to the kernel.
>> +
>> +config NODES_SHIFT
>> +     int "Maximum NUMA Nodes (as a power of 2)"
>> +     range 1 10
>> +     default "2"
>> +     depends on NEED_MULTIPLE_NODES
>> +     help
>> +       Specify the maximum number of NUMA Nodes available on the target
>> +       system.  Increases memory reserved to accommodate various tables.
>> +
>> +config USE_PERCPU_NUMA_NODE_ID
>> +     def_bool y
>> +     depends on NUMA
>> +
>>  source kernel/Kconfig.preempt
>>  source kernel/Kconfig.hz
>>
>> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
>> new file mode 100644
>> index 0000000..6ddd468
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/mmzone.h
>> @@ -0,0 +1,17 @@
>> +#ifndef __ASM_ARM64_MMZONE_H_
>> +#define __ASM_ARM64_MMZONE_H_
>> +
>> +#ifdef CONFIG_NUMA
>> +
>> +#include <linux/mmdebug.h>
>> +#include <linux/types.h>
>> +
>> +#include <asm/smp.h>
>> +#include <asm/numa.h>
>> +
>> +extern struct pglist_data *node_data[];
>> +
>> +#define NODE_DATA(nid)               (node_data[(nid)])
>> +
>> +#endif /* CONFIG_NUMA */
>> +#endif /* __ASM_ARM64_MMZONE_H_ */
>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>> new file mode 100644
>> index 0000000..c00f3a4
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/numa.h
>> @@ -0,0 +1,47 @@
>> +#ifndef _ASM_NUMA_H
>> +#define _ASM_NUMA_H
>> +
>> +#include <linux/nodemask.h>
>> +#include <asm/topology.h>
>> +
>> +#ifdef CONFIG_NUMA
>> +
>> +#define NR_NODE_MEMBLKS              (MAX_NUMNODES * 2)
>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
>> +
>> +/* currently, arm64 implements flat NUMA topology */
>> +#define parent_node(node)    (node)
>> +
>> +extern int __node_distance(int from, int to);
>> +#define node_distance(a, b) __node_distance(a, b)
>> +
>> +/* dummy definitions for pci functions */
>> +#define pcibus_to_node(node) 0
>> +#define cpumask_of_pcibus(bus)       0
>> +
>> +extern int cpu_to_node_map[NR_CPUS];
>> +extern nodemask_t numa_nodes_parsed __initdata;
>> +
>> +/* Mappings between node number and cpus on that node. */
>> +extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>> +extern void numa_clear_node(unsigned int cpu);
>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>> +extern const struct cpumask *cpumask_of_node(int node);
>> +#else
>> +/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>> +static inline const struct cpumask *cpumask_of_node(int node)
>> +{
>> +     return node_to_cpumask_map[node];
>> +}
>> +#endif
>> +
>> +void __init arm64_numa_init(void);
>> +int __init numa_add_memblk(int nodeid, u64 start, u64 size);
>> +void __init numa_set_distance(int from, int to, int distance);
>> +void __init numa_reset_distance(void);
>> +void numa_store_cpu_info(unsigned int cpu);
>> +#else        /* CONFIG_NUMA */
>> +static inline void numa_store_cpu_info(unsigned int cpu)             { }
>> +static inline void arm64_numa_init(void)             { }
>> +#endif       /* CONFIG_NUMA */
>> +#endif       /* _ASM_NUMA_H */
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index 8119479..d9b9761 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -53,6 +53,7 @@
>>  #include <asm/cpufeature.h>
>>  #include <asm/cpu_ops.h>
>>  #include <asm/kasan.h>
>> +#include <asm/numa.h>
>>  #include <asm/sections.h>
>>  #include <asm/setup.h>
>>  #include <asm/smp_plat.h>
>> @@ -372,6 +373,9 @@ static int __init topology_init(void)
>>  {
>>       int i;
>>
>> +     for_each_online_node(i)
>> +             register_one_node(i);
>> +
>>       for_each_possible_cpu(i) {
>>               struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
>>               cpu->hotpluggable = 1;
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index b1adc51..d6e7d6a 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -45,6 +45,7 @@
>>  #include <asm/cputype.h>
>>  #include <asm/cpu_ops.h>
>>  #include <asm/mmu_context.h>
>> +#include <asm/numa.h>
>>  #include <asm/pgtable.h>
>>  #include <asm/pgalloc.h>
>>  #include <asm/processor.h>
>> @@ -125,6 +126,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>>  static void smp_store_cpu_info(unsigned int cpuid)
>>  {
>>       store_cpu_topology(cpuid);
>> +     numa_store_cpu_info(cpuid);
>>  }
>>
>>  /*
>> diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
>> index 57f57fd..2e57922 100644
>> --- a/arch/arm64/mm/Makefile
>> +++ b/arch/arm64/mm/Makefile
>> @@ -6,4 +6,5 @@ obj-$(CONFIG_HUGETLB_PAGE)    += hugetlbpage.o
>>  obj-$(CONFIG_ARM64_PTDUMP)   += dump.o
>>
>>  obj-$(CONFIG_KASAN)          += kasan_init.o
>> +obj-$(CONFIG_NUMA)           += numa.o
>>  KASAN_SANITIZE_kasan_init.o  := n
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 17bf39a..8dc9c5d 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -37,6 +37,7 @@
>>
>>  #include <asm/fixmap.h>
>>  #include <asm/memory.h>
>> +#include <asm/numa.h>
>>  #include <asm/sections.h>
>>  #include <asm/setup.h>
>>  #include <asm/sizes.h>
>> @@ -77,6 +78,19 @@ static phys_addr_t max_zone_dma_phys(void)
>>       return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>>  }
>>
>> +#ifdef CONFIG_NUMA
>> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
>> +{
>> +     unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
>> +
>> +     if (IS_ENABLED(CONFIG_ZONE_DMA))
>> +             max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
>> +     max_zone_pfns[ZONE_NORMAL] = max;
>> +
>> +     free_area_init_nodes(max_zone_pfns);
>> +}
>> +
>> +#else
>>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
>>  {
>>       struct memblock_region *reg;
>> @@ -116,6 +130,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
>>
>>       free_area_init_node(0, zone_size, min, zhole_size);
>>  }
>> +#endif /* CONFIG_NUMA */
>>
>>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>  int pfn_valid(unsigned long pfn)
>> @@ -133,10 +148,15 @@ static void arm64_memory_present(void)
>>  static void arm64_memory_present(void)
>>  {
>>       struct memblock_region *reg;
>> +     int nid = 0;
>>
>> -     for_each_memblock(memory, reg)
>> -             memory_present(0, memblock_region_memory_base_pfn(reg),
>> -                            memblock_region_memory_end_pfn(reg));
>> +     for_each_memblock(memory, reg) {
>> +#ifdef CONFIG_NUMA
>> +             nid = reg->nid;
>> +#endif
>> +             memory_present(nid, memblock_region_memory_base_pfn(reg),
>> +                             memblock_region_memory_end_pfn(reg));
>> +     }
>>  }
>>  #endif
>>
>> @@ -193,6 +213,9 @@ void __init bootmem_init(void)
>>
>>       early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);
>>
>> +     max_pfn = max_low_pfn = max;
>> +
>> +     arm64_numa_init();
>>       /*
>>        * Sparsemem tries to allocate bootmem in memory_present(), so must be
>>        * done after the fixed reservations.
>> @@ -203,7 +226,6 @@ void __init bootmem_init(void)
>>       zone_sizes_init(min, max);
>>
>>       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
>> -     max_pfn = max_low_pfn = max;
>>  }
>>
>>  #ifndef CONFIG_SPARSEMEM_VMEMMAP
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> new file mode 100644
>> index 0000000..e3afbf8
>> --- /dev/null
>> +++ b/arch/arm64/mm/numa.c
>> @@ -0,0 +1,384 @@
>> +/*
>> + * NUMA support, based on the x86 implementation.
>> + *
>> + * Copyright (C) 2015 Cavium Inc.
>> + * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/bootmem.h>
>> +#include <linux/ctype.h>
>> +#include <linux/init.h>
>> +#include <linux/kernel.h>
>> +#include <linux/mm.h>
>> +#include <linux/memblock.h>
>> +#include <linux/module.h>
>> +#include <linux/mmzone.h>
>> +#include <linux/nodemask.h>
>> +#include <linux/sched.h>
>> +#include <linux/string.h>
>> +#include <linux/topology.h>
>> +
>> +#include <asm/smp_plat.h>
>> +
>> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>> +EXPORT_SYMBOL(node_data);
>> +nodemask_t numa_nodes_parsed __initdata;
>> +int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
>> +
>> +static int numa_off;
>> +static int numa_distance_cnt;
>> +static u8 *numa_distance;
>> +
>> +static __init int numa_parse_early_param(char *opt)
>> +{
>> +     if (!opt)
>> +             return -EINVAL;
>> +     if (!strncmp(opt, "off", 3)) {
>> +             pr_info("%s\n", "NUMA turned off");
>> +             numa_off = 1;
>> +     }
>> +     return 0;
>> +}
>> +early_param("numa", numa_parse_early_param);
>> +
>> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>> +EXPORT_SYMBOL(node_to_cpumask_map);
>> +
>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>> +/*
>> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
>> + */
>> +const struct cpumask *cpumask_of_node(int node)
>> +{
>> +
>> +     if (WARN_ON(node >= nr_node_ids))
>> +             return cpu_none_mask;
>> +
>> +     if (WARN_ON(node_to_cpumask_map[node] == NULL))
>> +             return cpu_online_mask;
>> +
>> +     return node_to_cpumask_map[node];
>> +}
>> +EXPORT_SYMBOL(cpumask_of_node);
>> +#endif
>> +
>> +static void map_cpu_to_node(unsigned int cpu, int nid)
>> +{
>> +     set_cpu_numa_node(cpu, nid);
>> +     if (nid >= 0)
>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
>> +}
>> +
>> +static void unmap_cpu_to_node(unsigned int cpu)
>> +{
>> +     int nid = cpu_to_node(cpu);
>> +
>> +     if (nid >= 0)
>> +             cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
>> +     set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> +}
>> +
>> +void numa_clear_node(unsigned int cpu)
>> +{
>> +     unmap_cpu_to_node(cpu);
>> +}
>> +
>> +/*
>> + * Allocate node_to_cpumask_map based on number of available nodes
>> + * Requires node_possible_map to be valid.
>> + *
>> + * Note: cpumask_of_node() is not valid until after this is done.
>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
>> + */
>> +static void __init setup_node_to_cpumask_map(void)
>> +{
>> +     unsigned int cpu;
>> +     int node;
>> +
>> +     /* setup nr_node_ids if not done yet */
>> +     if (nr_node_ids == MAX_NUMNODES)
>> +             setup_nr_node_ids();
>> +
>> +     /* allocate and clear the mapping */
>> +     for (node = 0; node < nr_node_ids; node++) {
>> +             alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
>> +             cpumask_clear(node_to_cpumask_map[node]);
>> +     }
>> +
>> +     for_each_possible_cpu(cpu)
>> +             set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> +
>> +     /* cpumask_of_node() will now work */
>> +     pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>> +}
>> +
>> +/*
>> + *  Set the cpu to node and mem mapping
>> + */
>> +void numa_store_cpu_info(unsigned int cpu)
>> +{
>> +     map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
>> +}
>> +
>> +/**
>> + * numa_add_memblk - Set node id to memblk
>> + * @nid: NUMA node ID of the new memblk
>> + * @start: Start address of the new memblk
>> + * @size:  Size of the new memblk
>> + *
>> + * RETURNS:
>> + * 0 on success, -errno on failure.
>> + */
>> +int __init numa_add_memblk(int nid, u64 start, u64 size)
>> +{
>> +     int ret;
>> +
>> +     ret = memblock_set_node(start, size, &memblock.memory, nid);
>> +     if (ret < 0) {
>> +             pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>> +                     start, (start + size - 1), nid);
>> +             return ret;
>> +     }
>> +
>> +     node_set(nid, numa_nodes_parsed);
>> +     pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
>> +                     start, (start + size - 1), nid);
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL(numa_add_memblk);
>> +
>> +/* Initialize NODE_DATA for a node on the local memory */
>> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>> +{
>> +     const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
>> +     u64 nd_pa;
>> +     void *nd;
>> +     int tnid;
>> +
>> +     pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> +                     nid, start_pfn << PAGE_SHIFT,
>> +                     (end_pfn << PAGE_SHIFT) - 1);
>> +
>> +     nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>> +     nd = __va(nd_pa);
>> +
>> +     /* report and initialize */
>> +     pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
>> +             nd_pa, nd_pa + nd_size - 1);
>> +     tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>> +     if (tnid != nid)
>> +             pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
>> +
>> +     node_data[nid] = nd;
>> +     memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>> +     NODE_DATA(nid)->node_id = nid;
>> +     NODE_DATA(nid)->node_start_pfn = start_pfn;
>> +     NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
>> +}
>> +
>> +/**
>> + * numa_reset_distance - Reset NUMA distance table
>> + *
>> + * The current table is freed.
>> + * The next numa_set_distance() call will create a new one.
>> + */
>> +void __init numa_reset_distance(void)
>> +{
>> +     size_t size;
>> +
>> +     if (!numa_distance)
>> +             return;
>> +
>> +     size = numa_distance_cnt * numa_distance_cnt *
>> +             sizeof(numa_distance[0]);
>> +
>> +     memblock_free(__pa(numa_distance), size);
>> +     numa_distance_cnt = 0;
>> +     numa_distance = NULL;
>> +}
>> +
>> +static int __init numa_alloc_distance(void)
>> +{
>> +     size_t size;
>> +     u64 phys;
>> +     int i, j;
>> +
>> +     size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
>> +     phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
>> +                                   size, PAGE_SIZE);
>> +     if (WARN_ON(!phys))
>> +             return -ENOMEM;
>> +
>> +     memblock_reserve(phys, size);
>> +
>> +     numa_distance = __va(phys);
>> +     numa_distance_cnt = nr_node_ids;
>> +
>> +     /* fill with the default distances */
>> +     for (i = 0; i < numa_distance_cnt; i++)
>> +             for (j = 0; j < numa_distance_cnt; j++)
>> +                     numa_distance[i * numa_distance_cnt + j] = i == j ?
>> +                             LOCAL_DISTANCE : REMOTE_DISTANCE;
>> +
>> +     pr_debug("NUMA: Initialized distance table, cnt=%d\n",
>> +                     numa_distance_cnt);
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * numa_set_distance - Set NUMA distance from one NUMA node to another
>> + * @from: the 'from' node to set distance
>> + * @to: the 'to' node to set distance
>> + * @distance: NUMA distance
>> + *
>> + * Set the distance from node @from to @to to @distance.  If distance table
>> + * doesn't exist, one which is large enough to accommodate all the currently
>> + * known nodes will be created.
>> + *
>> + * If such table cannot be allocated, a warning is printed and further
>> + * calls are ignored until the distance table is reset with
>> + * numa_reset_distance().
>> + *
>> + * If @from or @to is higher than the highest known node or lower than zero
>> + * at the time of table creation or @distance doesn't make sense, the call
>> + * is ignored.
>> + * This is to allow simplification of specific NUMA config implementations.
>> + */
>> +void __init numa_set_distance(int from, int to, int distance)
>> +{
>> +     if (!numa_distance)
>> +             return;
>> +
>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>> +                     from < 0 || to < 0) {
>> +             pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>> +                         from, to, distance);
>> +             return;
>> +     }
>> +
>> +     if ((u8)distance != distance ||
>> +         (from == to && distance != LOCAL_DISTANCE)) {
>> +             pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>> +                          from, to, distance);
>> +             return;
>> +     }
>> +
>> +     numa_distance[from * numa_distance_cnt + to] = distance;
>> +}
>> +EXPORT_SYMBOL(numa_set_distance);
>> +
>> +int __node_distance(int from, int to)
>> +{
>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt)
>> +             return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
>> +     return numa_distance[from * numa_distance_cnt + to];
>> +}
>> +EXPORT_SYMBOL(__node_distance);
>> +
>> +static int __init numa_register_nodes(void)
>> +{
>> +     int nid;
>> +     struct memblock_region *mblk;
>> +
>> +     /* Check that valid nid is set to memblks */
>> +     for_each_memblock(memory, mblk)
>> +             if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES)
>> +                     return -EINVAL;
>> +
>> +     /* Finally register nodes. */
>> +     for_each_node_mask(nid, numa_nodes_parsed) {
>> +             unsigned long start_pfn, end_pfn;
>> +
>> +             get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>> +             setup_node_data(nid, start_pfn, end_pfn);
>> +             node_set_online(nid);
>> +     }
>> +
>> +     /* Mark the parsed nodes as the possible nodes. */
>> +     node_possible_map = numa_nodes_parsed;
>> +
>> +     /* Dump memblock with node info and return. */
>> +     memblock_dump_all();
>> +     return 0;
>> +}
>> +
>> +static int __init numa_init(int (*init_func)(void))
>> +{
>> +     int ret;
>> +
>> +     nodes_clear(numa_nodes_parsed);
>> +     nodes_clear(node_possible_map);
>> +     nodes_clear(node_online_map);
>> +     numa_reset_distance();
>> +
>> +     ret = init_func();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     if (nodes_empty(numa_nodes_parsed))
>> +             return -EINVAL;
>> +
>> +     ret = numa_register_nodes();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     ret = numa_alloc_distance();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     setup_node_to_cpumask_map();
>> +
>> +     /* init boot processor */
>> +     cpu_to_node_map[0] = 0;
>> +     map_cpu_to_node(0, 0);
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * dummy_numa_init - Fallback dummy NUMA init
>> + *
>> + * Used if there's no underlying NUMA architecture, NUMA initialization
>> + * fails, or NUMA is disabled on the command line.
>> + *
>> + * Must online at least one node and add memory blocks that cover all
>> + * allowed memory.  This function must not fail.
>> + */
>> +static int __init dummy_numa_init(void)
>> +{
>> +     struct memblock_region *mblk;
>> +
>> +     pr_info("%s\n", "No NUMA configuration found");
>> +     pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
>> +            0LLU, PFN_PHYS(max_pfn) - 1);
>> +     for_each_memblock(memory, mblk)
>> +             numa_add_memblk(0, mblk->base, mblk->size);
>> +     numa_off = 1;
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * arm64_numa_init - Initialize NUMA
>> + *
>> + * Try each configured NUMA initialization method until one succeeds.  The
>> + * last fallback is a dummy single-node config encompassing all memory and
>> + * never fails.
>> + */
>> +void __init arm64_numa_init(void)
>> +{
>> +     numa_init(dummy_numa_init);
>> +}
>>
>
> --
> Shannon
>
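The distance table in the quoted numa.c is a flat nr_node_ids x nr_node_ids array of u8, indexed as from * numa_distance_cnt + to and pre-filled with LOCAL_DISTANCE on the diagonal and REMOTE_DISTANCE elsewhere. A minimal user-space sketch of that scheme follows, assuming malloc() as a stand-in for memblock and the values 10/20 from include/linux/topology.h; the helper names are hypothetical. Unlike the quoted numa_set_distance(), dist_set() here returns an error code, and unlike the quoted __node_distance(), dist_get() also rejects negative node ids.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed to match LOCAL_DISTANCE / REMOTE_DISTANCE in include/linux/topology.h. */
#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

static int dist_cnt;            /* number of nodes the table covers */
static uint8_t *dist;           /* dist_cnt x dist_cnt entries, row-major */

/* Allocate the table and fill in the default distances. */
static int dist_alloc(int nr_nodes)
{
	assert(nr_nodes > 0);
	dist = malloc((size_t)nr_nodes * nr_nodes * sizeof(*dist));
	if (!dist)
		return -1;
	dist_cnt = nr_nodes;
	for (int i = 0; i < nr_nodes; i++)
		for (int j = 0; j < nr_nodes; j++)
			dist[i * nr_nodes + j] = (i == j) ? LOCAL_DISTANCE
							  : REMOTE_DISTANCE;
	return 0;
}

/* Set one directed entry; reject bad ids or values that don't fit in a u8. */
static int dist_set(int from, int to, int d)
{
	if (!dist || from < 0 || to < 0 || from >= dist_cnt || to >= dist_cnt)
		return -1;
	if ((uint8_t)d != d || (from == to && d != LOCAL_DISTANCE))
		return -1;
	dist[from * dist_cnt + to] = (uint8_t)d;
	return 0;
}

/* Look up a distance, falling back to the defaults outside the table. */
static int dist_get(int from, int to)
{
	if (!dist || from < 0 || to < 0 || from >= dist_cnt || to >= dist_cnt)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return dist[from * dist_cnt + to];
}
```

Note that dist_set(0, 2, 40) updates only the 0 -> 2 entry; the 2 -> 0 entry keeps its default unless it is set separately, which is the same directed behaviour as the quoted code.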

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 0/4] arm64, numa: Add numa support for arm64 platforms
  2015-11-17 17:20 ` Ganapatrao Kulkarni
@ 2015-12-02 11:19     ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-12-02 11:19 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Catalin Marinas,
	Grant Likely, Leif Lindholm, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Ard Biesheuvel, msalter-H+wXaHxf7aLQT0dZR+AlfA, Rob Herring,
	Steve Capper, Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll,
	Mark Rutland, Ian Campbell, Kumar Gala, Rafael J. Wysocki,
	Len Brown, Marc Zyngier, Robert Richter

Hi All, Mark R,

please review!!

thanks,
Ganapat

On Tue, Nov 17, 2015 at 10:50 PM, Ganapatrao Kulkarni
<gkulkarni@caviumnetworks.com> wrote:
> v7:
>         - managing numa memory mapping using memblock.
>         - Incorporated review comments of Mark Rutland.
>
> v6:
>         - defined and implemented the numa dt binding using
>         node property proximity and device node distance-map.
>         - renamed dt_numa to of_numa
>
> v5:
>         - created base version of numa.c, which creates a dummy NUMA configuration without using DT
>           on single-socket platforms, then added patches for DT support.
>         - Incorporated review comments from Hanjun Guo.
>
> v4:
> Made changes as per Arnd's review comments.
>
> v3:
> Added changes to support numa on arm64 based platforms.
> Tested these patches on Cavium's multi-node (2-node topology) platform.
> This patchset defines and implements DT bindings for NUMA mapping of
> cores and memory using the device node property arm,associativity.
>
> v2:
> Defined and implemented the NUMA mapping of memory and cores to nodes, and the
> proximity distance matrix of nodes.
>
> v1:
> Initial patchset to support numa on arm64 platforms.
>
> Note:
>         1. This patchset has been tested for NUMA with DT on
>            ThunderX single-socket and dual-socket boards.
>         2. NUMA DT booting needs the DT memory nodes, which the current efi-stub deletes;
>         hence, to try NUMA with DT, you need to rebase onto Ard's patchset:
>         http://git.linaro.org/people/ard.biesheuvel/linux-arm.git/shortlog/refs/heads/arm64-uefi-early-fdt-handling
>
> Ganapatrao Kulkarni (4):
>   arm64, numa: adding numa support for arm64 platforms.
>   Documentation, dt, arm64/arm: dt bindings for numa.
>   arm64/arm, numa, dt: adding numa dt binding implementation for arm64
>     platforms.
>   arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node
>     topology.
>
>  Documentation/devicetree/bindings/arm/numa.txt  | 272 ++++++++
>  arch/arm64/Kconfig                              |  35 +
>  arch/arm64/boot/dts/cavium/Makefile             |   2 +-
>  arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts  |  83 +++
>  arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi | 806 ++++++++++++++++++++++++
>  arch/arm64/include/asm/mmzone.h                 |  17 +
>  arch/arm64/include/asm/numa.h                   |  56 ++
>  arch/arm64/kernel/Makefile                      |   1 +
>  arch/arm64/kernel/of_numa.c                     | 265 ++++++++
>  arch/arm64/kernel/setup.c                       |   4 +
>  arch/arm64/kernel/smp.c                         |   4 +
>  arch/arm64/mm/Makefile                          |   1 +
>  arch/arm64/mm/init.c                            |  30 +-
>  arch/arm64/mm/numa.c                            | 392 ++++++++++++
>  14 files changed, 1963 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/arm/numa.txt
>  create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
>  create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi
>  create mode 100644 arch/arm64/include/asm/mmzone.h
>  create mode 100644 arch/arm64/include/asm/numa.h
>  create mode 100644 arch/arm64/kernel/of_numa.c
>  create mode 100644 arch/arm64/mm/numa.c
>
> --
> 1.8.1.4
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 2/4] Documentation, dt, arm64/arm: dt bindings for numa.
  2015-11-17 17:20   ` Ganapatrao Kulkarni
@ 2015-12-11 13:53       ` Mark Rutland
  -1 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2015-12-11 13:53 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will.Deacon-5wv7dgnIgG8,
	catalin.marinas-5wv7dgnIgG8, grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	leif.lindholm-QSEj5FYQhm4dnm+yROfE0A,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A,
	msalter-H+wXaHxf7aLQT0dZR+AlfA, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	steve.capper-QSEj5FYQhm4dnm+yROfE0A,
	hanjun.guo-QSEj5FYQhm4dnm+yROfE0A,
	al.stone-QSEj5FYQhm4dnm+yROfE0A, arnd-r2nGTMty4D4,
	pawel.moll-5wv7dgnIgG8, ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
	galak-sgV2jX0FEOL9JmXXK+q4OQ, rjw-LthD3rsA81gm4RdzfppkhA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, marc.zyngier-5wv7dgnIgG8,
	rrichter-YGCgFSpz5w/QT0dZR+AlfA,
	Prasun.Kapoor-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

Hi,

On Tue, Nov 17, 2015 at 10:50:41PM +0530, Ganapatrao Kulkarni wrote:
> DT bindings for numa mapping of memory, cores and IOs.
> 
> Reviewed-by: Robert Richter <rrichter@cavium.com>
> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>

Overall this looks good to me. However, I have a couple of concerns.

> ---
>  Documentation/devicetree/bindings/arm/numa.txt | 272 +++++++++++++++++++++++++
>  1 file changed, 272 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/numa.txt
> 
> diff --git a/Documentation/devicetree/bindings/arm/numa.txt b/Documentation/devicetree/bindings/arm/numa.txt
> new file mode 100644
> index 0000000..b87bf4f
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/numa.txt
> @@ -0,0 +1,272 @@
> +==============================================================================
> +NUMA binding description.
> +==============================================================================
> +
> +==============================================================================
> +1 - Introduction
> +==============================================================================
> +
> +Systems employing a Non Uniform Memory Access (NUMA) architecture contain
> +collections of hardware resources including processors, memory, and I/O buses,
> +that comprise what is commonly known as a NUMA node.
> +Processor accesses to memory within the local NUMA node are generally faster
> +than processor accesses to memory outside of the local NUMA node.
> +DT defines interfaces that allow the platform to convey NUMA node
> +topology information to the OS.
> +
> +==============================================================================
> +2 - numa-node-id
> +==============================================================================
> +The device node property numa-node-id describes numa domains within a
> +machine. This property can be used in device nodes like cpu, memory, bus and
> +devices to map to respective numa nodes.
> +
> +numa-node-id property is a 32-bit integer which defines numa node id to which
> +this device node has numa domain association.

I'd prefer if the above two paragraphs were replaced with:

	For the purpose of identification, each NUMA node is associated
	with a unique token known as a node id. For the purpose of this
	binding a node id is a 32-bit integer.

	A device node is associated with a NUMA node by the presence of
	a numa-node-id property which contains the node id of the
	device.

> +
> +Example:
> +	/* numa node 0 */
> +	numa-node-id = <0>;
> +
> +	/* numa node 1 */
> +	numa-node-id = <1>;
> +
> +==============================================================================
> +3 - distance-map
> +==============================================================================
> +
> +The device tree node distance-map describes the relative
> +distance (memory latency) between all numa nodes.

Is this not a combined approximation for latency and bandwidth?

> +- compatible : Should at least contain "numa,distance-map-v1".

Please use "numa-distance-map-v1", as "numa" is not a vendor.

> +- distance-matrix
> +  This property defines a matrix to describe the relative distances
> +  between all numa nodes.
> +  It is represented as a list of node pairs and their relative distance.
> +
> +  Note:
> +	1. Each entry represents the distance from the first node to the second node.
> +	2. If both directions between 2 nodes have the same distance, only
> +	       one entry is required.

I still don't understand what direction means in this context. Are there
systems (of any architecture) which don't have symmetric distances?
Which accesses does this apply differently to?

Given that, I think that it might be best to explicitly call out
distances as being equal, and leave any directionality for a later
revision of the binding when we have some semantics for directionality.

> +	3. distance-matrix should have entries in lexicographical ascending order of nodes.
> +	4. There must be only one device node distance-map, and it must reside in the root node.
> +
> +Example:
> +	4 nodes connected in a mesh/ring topology as below:
> +
> +		0_______20______1
> +		|               |
> +		|               |
> +	      20|               |20
> +		|               |
> +		|               |
> +		|_______________|
> +		3       20      2
> +
> +	If the relative distance for each hop is 20,
> +	then the inter-node distances for this topology will be:
> +	      0 -> 1 = 20
> +	      1 -> 2 = 20
> +	      2 -> 3 = 20
> +	      3 -> 0 = 20
> +	      0 -> 2 = 40
> +	      1 -> 3 = 40
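Under the example's assumption that every hop costs the same, each distance is the shortest hop count around the ring multiplied by the per-hop value. A small sketch with hypothetical helper names reproduces the table above; the self-distance comes out as 0 hops here, since the example does not say what value a local access should have.

```c
#include <assert.h>

/* Shortest number of hops between nodes a and b on a ring of n nodes. */
static int ring_hops(int a, int b, int n)
{
	int d;

	assert(n > 0 && a >= 0 && a < n && b >= 0 && b < n);
	d = a > b ? a - b : b - a;      /* hops going one way round */
	return d < n - d ? d : n - d;   /* or the other way, if shorter */
}

/* Distance when every hop costs per_hop, as in the 4-node example. */
static int ring_distance(int a, int b, int n, int per_hop)
{
	return ring_hops(a, b, n) * per_hop;
}
```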

How is this scaled relative to a local access?

Do we assume that a local access has value 1, i.e. each hop takes 20x a
local access in this example?

Do we need a finer-grained scale (e.g. to allow us to represent a
distance of 2.5)? The ACPI SLIT spec seems to give local accesses a
value 10 implicitly to this end.
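To illustrate the scaling point: if a local access is normalized to 1, a remote access costing 2.5 times as much has no integer encoding, whereas with the SLIT-style local value of 10 it becomes 25. The sketch below converts a cost ratio num/den into an integer distance for a given local value and reports when the result is not exactly representable or does not fit in the u8 entries used by the quoted numa.c. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: map a remote/local cost ratio num/den onto an
 * integer distance scale where a local access has value local_value.
 * Returns -1 if the ratio is not exactly representable on that scale
 * or the result does not fit in a u8 table entry.
 */
static int ratio_to_distance(int num, int den, int local_value)
{
	long scaled;

	assert(num > 0 && den > 0 && local_value > 0);
	scaled = (long)num * local_value;
	if (scaled % den)
		return -1;              /* would need a finer-grained scale */
	scaled /= den;
	if (scaled > 255)
		return -1;              /* does not fit in a u8 */
	return (int)scaled;
}
```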

Other than those points, I'm happy with this binding.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 2/4] Documentation, dt, arm64/arm: dt bindings for numa.
  2015-12-11 13:53       ` Mark Rutland
@ 2015-12-11 14:41         ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-12-11 14:41 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ganapatrao Kulkarni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Catalin Marinas,
	Grant Likely, Leif Lindholm, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Ard Biesheuvel, msalter-H+wXaHxf7aLQT0dZR+AlfA, Rob Herring,
	Steve Capper, Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll,
	Ian Campbell, Kumar Gala, Rafael J. Wysocki, Len Brown,
	Marc Zyngier, Robert Richter

On Fri, Dec 11, 2015 at 7:23 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
> Hi,
>
> On Tue, Nov 17, 2015 at 10:50:41PM +0530, Ganapatrao Kulkarni wrote:
>> DT bindings for numa mapping of memory, cores and IOs.
>>
>> Reviewed-by: Robert Richter <rrichter@cavium.com>
>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>
> Overall this looks good to me. However, I have a couple of concerns.
thanks.
>
>> ---
>>  Documentation/devicetree/bindings/arm/numa.txt | 272 +++++++++++++++++++++++++
>>  1 file changed, 272 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/arm/numa.txt
>>
>> diff --git a/Documentation/devicetree/bindings/arm/numa.txt b/Documentation/devicetree/bindings/arm/numa.txt
>> new file mode 100644
>> index 0000000..b87bf4f
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/arm/numa.txt
>> @@ -0,0 +1,272 @@
>> +==============================================================================
>> +NUMA binding description.
>> +==============================================================================
>> +
>> +==============================================================================
>> +1 - Introduction
>> +==============================================================================
>> +
>> +Systems employing a Non Uniform Memory Access (NUMA) architecture contain
>> +collections of hardware resources including processors, memory, and I/O buses,
>> +that comprise what is commonly known as a NUMA node.
>> +Processor accesses to memory within the local NUMA node are generally faster
>> +than processor accesses to memory outside of the local NUMA node.
>> +DT defines interfaces that allow the platform to convey NUMA node
>> +topology information to the OS.
>> +
>> +==============================================================================
>> +2 - numa-node-id
>> +==============================================================================
>> +The device node property numa-node-id describes numa domains within a
>> +machine. This property can be used in device nodes like cpu, memory, bus and
>> +devices to map to respective numa nodes.
>> +
>> +numa-node-id property is a 32-bit integer which defines numa node id to which
>> +this device node has numa domain association.
>
> I'd prefer if the above two paragraphs were replaced with:
>
>         For the purpose of identification, each NUMA node is associated
>         with a unique token known as a node id. For the purpose of this
>         binding a node id is a 32-bit integer.
>
>         A device node is associated with a NUMA node by the presence of
>         a numa-node-id property which contains the node id of the
>         device.
ok, will do.
>
>> +
>> +Example:
>> +     /* numa node 0 */
>> +     numa-node-id = <0>;
>> +
>> +     /* numa node 1 */
>> +     numa-node-id = <1>;
>> +
>> +==============================================================================
>> +3 - distance-map
>> +==============================================================================
>> +
>> +The device tree node distance-map describes the relative
>> +distance (memory latency) between all numa nodes.
>
> Is this not a combined approximation for latency and bandwidth?
AFAIK, it represents inter-node memory access latency.
>
>> +- compatible : Should at least contain "numa,distance-map-v1".
>
> Please use "numa-distance-map-v1", as "numa" is not a vendor.
ok
>
>> +- distance-matrix
>> +  This property defines a matrix to describe the relative distances
>> +  between all numa nodes.
>> +  It is represented as a list of node pairs and their relative distance.
>> +
>> +  Note:
>> +     1. Each entry represents distance from first node to second node.
>> +     2. If both directions between 2 nodes have the same distance, only
>> +            one entry is required.
>
> I still don't understand what direction means in this context. Are there
> systems (of any architecture) which don't have symmetric distances?
> Which accesses does this apply differently to?
>
> Given that, I think that it might be best to explicitly call out
> distances as being equal, and leave any directionality for a later
> revision of the binding when we have some semantics for directionality.
Agreed. Given that there is no known system to substantiate dual direction,
let us not be explicit about direction.
>
>> +     3. distance-matrix should have entries in lexicographical ascending order of nodes.
>> +     4. There must be only one device node distance-map, and it must reside in the root node.
>> +
>> +Example:
>> +     4 nodes connected in a mesh/ring topology as below:
>> +
>> +             0_______20______1
>> +             |               |
>> +             |               |
>> +           20|               |20
>> +             |               |
>> +             |               |
>> +             |_______________|
>> +             3       20      2
>> +
>> +     If the relative distance for each hop is 20,
>> +     then the inter-node distances for this topology will be:
>> +           0 -> 1 = 20
>> +           1 -> 2 = 20
>> +           2 -> 3 = 20
>> +           3 -> 0 = 20
>> +           0 -> 2 = 40
>> +           1 -> 3 = 40
>
> How is this scaled relative to a local access?
This is based on representing the local distance as 10,
and all inter-node latencies as multiples of 10.

>
> Do we assume that a local access has value 1, i.e. each hop takes 20x a
> local access in this example?
The local distance is represented as 10; this is fixed and the same as in ACPI.
Inter-node distance can be any number greater than 10.
This information can be added here to make it clear.
>
> Do we need a finer-grained scale (e.g. to allow us to represent a
> distance of 2.5)? The ACPI SLIT spec seems to give local accesses a
> value 10 implicitly to this end.
Yes, same as ACPI: the local node distance is 10.
>
> Other than those points, I'm happy with this binding.
>
> Thanks,
> Mark.
>
thanks
Ganapat

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v7 2/4] Documentation, dt, arm64/arm: dt bindings for numa.
@ 2015-12-11 14:41         ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-12-11 14:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Dec 11, 2015 at 7:23 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi,
>
> On Tue, Nov 17, 2015 at 10:50:41PM +0530, Ganapatrao Kulkarni wrote:
>> DT bindings for numa mapping of memory, cores and IOs.
>>
>> Reviewed-by: Robert Richter <rrichter@cavium.com>
>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>
> Overall this looks good to me. However, I have a couple of concerns.
thanks.
>
>> ---
>>  Documentation/devicetree/bindings/arm/numa.txt | 272 +++++++++++++++++++++++++
>>  1 file changed, 272 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/arm/numa.txt
>>
>> diff --git a/Documentation/devicetree/bindings/arm/numa.txt b/Documentation/devicetree/bindings/arm/numa.txt
>> new file mode 100644
>> index 0000000..b87bf4f
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/arm/numa.txt
>> @@ -0,0 +1,272 @@
>> +==============================================================================
>> +NUMA binding description.
>> +==============================================================================
>> +
>> +==============================================================================
>> +1 - Introduction
>> +==============================================================================
>> +
>> +Systems employing a Non Uniform Memory Access (NUMA) architecture contain
>> +collections of hardware resources including processors, memory, and I/O buses,
>> +that comprise what is commonly known as a NUMA node.
>> +Processor accesses to memory within the local NUMA node are generally faster
>> +than processor accesses to memory outside of the local NUMA node.
>> +DT defines interfaces that allow the platform to convey NUMA node
>> +topology information to the OS.
>> +
>> +==============================================================================
>> +2 - numa-node-id
>> +==============================================================================
>> +The device node property numa-node-id describes numa domains within a
>> +machine. This property can be used in device nodes like cpu, memory, bus and
>> +devices to map to respective numa nodes.
>> +
>> +numa-node-id property is a 32-bit integer which defines numa node id to which
>> +this device node has numa domain association.
>
> I'd prefer if the above two paragraphs were replaced with:
>
>         For the purpose of identification, each NUMA node is associated
>         with a unique token known as a node id. For the purpose of this
>         binding a node id is a 32-bit integer.
>
>         A device node is associated with a NUMA node by the presence of
>         a numa-node-id property which contains the node id of the
>         device.
ok, will do.
>
>> +
>> +Example:
>> +     /* numa node 0 */
>> +     numa-node-id = <0>;
>> +
>> +     /* numa node 1 */
>> +     numa-node-id = <1>;
>> +
>> +==============================================================================
>> +3 - distance-map
>> +==============================================================================
>> +
>> +The device tree node distance-map describes the relative
>> +distance (memory latency) between all numa nodes.
>
> Is this not a combined approximation for latency and bandwidth?
AFAIK, it represents inter-node memory access latency.
>
>> +- compatible : Should at least contain "numa,distance-map-v1".
>
> Please use "numa-distance-map-v1", as "numa" is not a vendor.
ok
>
>> +- distance-matrix
>> +  This property defines a matrix to describe the relative distances
>> +  between all numa nodes.
>> +  It is represented as a list of node pairs and their relative distance.
>> +
>> +  Note:
>> +     1. Each entry represents distance from first node to second node.
>> +     2. If both directions between 2 nodes have the same distance, only
>> +            one entry is required.
>
> I still don't understand what direction means in this context. Are there
> systems (of any architecture) which don't have symmetric distances?
> Which accesses does this apply differently to?
>
> Given that, I think that it might be best to explicitly call out
> distances as being equal, and leave any directionality for a later
> revision of the binding when we have some semantics for directionality.
Agreed. Given that there is no known system to substantiate dual direction,
let us not be explicit about direction.
>
>> +     3. distance-matrix should have entries in lexicographical ascending order of nodes.
>> +     4. There must be only one device node distance-map, and it must reside in the root node.
>> +
>> +Example:
>> +     4 nodes connected in a mesh/ring topology as below:
>> +
>> +             0_______20______1
>> +             |               |
>> +             |               |
>> +           20|               |20
>> +             |               |
>> +             |               |
>> +             |_______________|
>> +             3       20      2
>> +
>> +     If the relative distance for each hop is 20,
>> +     then the inter-node distances for this topology will be:
>> +           0 -> 1 = 20
>> +           1 -> 2 = 20
>> +           2 -> 3 = 20
>> +           3 -> 0 = 20
>> +           0 -> 2 = 40
>> +           1 -> 3 = 40
>
> How is this scaled relative to a local access?
This is based on representing the local distance as 10,
and all inter-node latencies as multiples of 10.

>
> Do we assume that a local access has value 1, i.e. each hop takes 20x a
> local access in this example?
The local distance is represented as 10; this is fixed and the same as in ACPI.
Inter-node distance can be any number greater than 10.
This information can be added here to make it clear.
>
> Do we need a finer-grained scale (e.g. to allow us to represent a
> distance of 2.5)? The ACPI SLIT spec seems to give local accesses a
> value 10 implicitly to this end.
Yes, same as ACPI: the local node distance is 10.
>
> Other than those points, I'm happy with this binding.
>
> Thanks,
> Mark.
>
thanks
Ganapat

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
  2015-11-17 17:20   ` Ganapatrao Kulkarni
@ 2015-12-17 17:11       ` Will Deacon
  -1 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2015-12-17 17:11 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, catalin.marinas-5wv7dgnIgG8,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	leif.lindholm-QSEj5FYQhm4dnm+yROfE0A,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A,
	msalter-H+wXaHxf7aLQT0dZR+AlfA, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	steve.capper-QSEj5FYQhm4dnm+yROfE0A,
	hanjun.guo-QSEj5FYQhm4dnm+yROfE0A,
	al.stone-QSEj5FYQhm4dnm+yROfE0A, arnd-r2nGTMty4D4,
	pawel.moll-5wv7dgnIgG8, mark.rutland-5wv7dgnIgG8,
	ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
	galak-sgV2jX0FEOL9JmXXK+q4OQ, rjw-LthD3rsA81gm4RdzfppkhA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, marc.zyngier-5wv7dgnIgG8,
	rrichter-YGCgFSpz5w/QT0dZR+AlfA,
	Prasun.Kapoor-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

Hello,

This all looks pretty reasonable, but I'd like to see an Ack from a
devicetree maintainer on the binding before I merge anything (and I see
that there are outstanding comments from Rutland on that).

On Tue, Nov 17, 2015 at 10:50:40PM +0530, Ganapatrao Kulkarni wrote:
> Add NUMA support for arm64-based platforms.
> By default, this patch adds a dummy NUMA node and
> maps all memory and CPUs to node 0.
> Using this patch, NUMA can be simulated on single-node arm64 platforms.
> 
> Reviewed-by: Robert Richter <rrichter@cavium.com>
> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
> ---
>  arch/arm64/Kconfig              |  25 +++
>  arch/arm64/include/asm/mmzone.h |  17 ++
>  arch/arm64/include/asm/numa.h   |  47 +++++
>  arch/arm64/kernel/setup.c       |   4 +
>  arch/arm64/kernel/smp.c         |   2 +
>  arch/arm64/mm/Makefile          |   1 +
>  arch/arm64/mm/init.c            |  30 +++-
>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
>  8 files changed, 506 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm64/include/asm/mmzone.h
>  create mode 100644 arch/arm64/include/asm/numa.h
>  create mode 100644 arch/arm64/mm/numa.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 9ac16a4..7d8fb42 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -71,6 +71,7 @@ config ARM64
>  	select HAVE_GENERIC_DMA_COHERENT
>  	select HAVE_HW_BREAKPOINT if PERF_EVENTS
>  	select HAVE_MEMBLOCK
> +	select HAVE_MEMBLOCK_NODE_MAP if NUMA
>  	select HAVE_PATA_PLATFORM
>  	select HAVE_PERF_EVENTS
>  	select HAVE_PERF_REGS
> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
>  	  Say Y here to experiment with turning CPUs off and on.  CPUs
>  	  can be controlled through /sys/devices/system/cpu.
>  
> +# Common NUMA Features
> +config NUMA
> +	bool "Numa Memory Allocation and Scheduler Support"
> +	depends on SMP
> +	help
> +	  Enable NUMA (Non Uniform Memory Access) support.
> +
> +	  The kernel will try to allocate memory used by a CPU on the
> +	  local memory controller of the CPU and add some more
> +	  NUMA awareness to the kernel.

I appreciate that this is copied from x86, but what exactly do you mean
by "memory controller" here?

> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
> new file mode 100644
> index 0000000..6ddd468
> --- /dev/null
> +++ b/arch/arm64/include/asm/mmzone.h
> @@ -0,0 +1,17 @@
> +#ifndef __ASM_ARM64_MMZONE_H_
> +#define __ASM_ARM64_MMZONE_H_

Please try to follow the standard naming for header guards under arm64
(yes, it's not perfect, but we've made some effort for consistency).

> +
> +#ifdef CONFIG_NUMA
> +
> +#include <linux/mmdebug.h>
> +#include <linux/types.h>
> +
> +#include <asm/smp.h>
> +#include <asm/numa.h>
> +
> +extern struct pglist_data *node_data[];
> +
> +#define NODE_DATA(nid)		(node_data[(nid)])

This is the same as m32r, metag, parisc, powerpc, s390, sh, sparc, tile
and x86. Can we make this the default in the core code instead and then
replace this header file with asm-generic or something?

> +
> +#endif /* CONFIG_NUMA */
> +#endif /* __ASM_ARM64_MMZONE_H_ */
> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
> new file mode 100644
> index 0000000..c00f3a4
> --- /dev/null
> +++ b/arch/arm64/include/asm/numa.h
> @@ -0,0 +1,47 @@
> +#ifndef _ASM_NUMA_H
> +#define _ASM_NUMA_H

Same comment on the guards.

> +#include <linux/nodemask.h>
> +#include <asm/topology.h>
> +
> +#ifdef CONFIG_NUMA
> +
> +#define NR_NODE_MEMBLKS		(MAX_NUMNODES * 2)

This is only used by the ACPI code afaict, so maybe include it when you
add that?

> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))

Where is this used?

> +
> +/* currently, arm64 implements flat NUMA topology */
> +#define parent_node(node)	(node)
> +
> +extern int __node_distance(int from, int to);
> +#define node_distance(a, b) __node_distance(a, b)
> +
> +/* dummy definitions for pci functions */
> +#define pcibus_to_node(node)	0
> +#define cpumask_of_pcibus(bus)	0

There's a bunch of these dummy definitions already available in
asm-generic/topology.h. Can we use those instead of rolling our own
please?

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 17bf39a..8dc9c5d 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -37,6 +37,7 @@
>  
>  #include <asm/fixmap.h>
>  #include <asm/memory.h>
> +#include <asm/numa.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
>  #include <asm/sizes.h>
> @@ -77,6 +78,19 @@ static phys_addr_t max_zone_dma_phys(void)
>  	return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>  }
>  
> +#ifdef CONFIG_NUMA
> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
> +{
> +	unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> +
> +	if (IS_ENABLED(CONFIG_ZONE_DMA))
> +		max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
> +	max_zone_pfns[ZONE_NORMAL] = max;
> +
> +	free_area_init_nodes(max_zone_pfns);
> +}

This is certainly more readable than the non-numa zone_sizes_init. Is
there a reason we can't always select HAVE_MEMBLOCK_NODE_MAP and avoid
having to handle the zone holes explicitly?

Also, I couldn't find any calls to memblock_add_node, which seem to be
expected. What am I missing?

> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> new file mode 100644
> index 0000000..e3afbf8
> --- /dev/null
> +++ b/arch/arm64/mm/numa.c
> @@ -0,0 +1,384 @@
> +/*
> + * NUMA support, based on the x86 implementation.
> + *
> + * Copyright (C) 2015 Cavium Inc.
> + * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/bootmem.h>
> +#include <linux/ctype.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/memblock.h>
> +#include <linux/module.h>
> +#include <linux/mmzone.h>
> +#include <linux/nodemask.h>
> +#include <linux/sched.h>
> +#include <linux/string.h>
> +#include <linux/topology.h>
> +
> +#include <asm/smp_plat.h>
> +
> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
> +EXPORT_SYMBOL(node_data);
> +nodemask_t numa_nodes_parsed __initdata;
> +int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
> +
> +static int numa_off;
> +static int numa_distance_cnt;
> +static u8 *numa_distance;
> +
> +static __init int numa_parse_early_param(char *opt)
> +{
> +	if (!opt)
> +		return -EINVAL;
> +	if (!strncmp(opt, "off", 3)) {
> +		pr_info("%s\n", "NUMA turned off");
> +		numa_off = 1;

There's a patch kicking around to add this to strtobool:

  https://lkml.org/lkml/2015/12/9/802

but I can't see it in next :(

> +	}
> +	return 0;
> +}
> +early_param("numa", numa_parse_early_param);
> +
> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
> +EXPORT_SYMBOL(node_to_cpumask_map);
> +
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +/*
> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
> + */
> +const struct cpumask *cpumask_of_node(int node)
> +{
> +
> +	if (WARN_ON(node >= nr_node_ids))
> +		return cpu_none_mask;
> +
> +	if (WARN_ON(node_to_cpumask_map[node] == NULL))
> +		return cpu_online_mask;
> +
> +	return node_to_cpumask_map[node];
> +}
> +EXPORT_SYMBOL(cpumask_of_node);
> +#endif
> +
> +static void map_cpu_to_node(unsigned int cpu, int nid)
> +{
> +	set_cpu_numa_node(cpu, nid);
> +	if (nid >= 0)
> +		cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
> +}
> +
> +static void unmap_cpu_to_node(unsigned int cpu)
> +{
> +	int nid = cpu_to_node(cpu);
> +
> +	if (nid >= 0)
> +		cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
> +	set_cpu_numa_node(cpu, NUMA_NO_NODE);
> +}
> +
> +void numa_clear_node(unsigned int cpu)
> +{
> +	unmap_cpu_to_node(cpu);
> +}
> +
> +/*
> + * Allocate node_to_cpumask_map based on number of available nodes
> + * Requires node_possible_map to be valid.
> + *
> + * Note: cpumask_of_node() is not valid until after this is done.
> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
> + */
> +static void __init setup_node_to_cpumask_map(void)
> +{
> +	unsigned int cpu;
> +	int node;
> +
> +	/* setup nr_node_ids if not done yet */
> +	if (nr_node_ids == MAX_NUMNODES)
> +		setup_nr_node_ids();
> +
> +	/* allocate and clear the mapping */
> +	for (node = 0; node < nr_node_ids; node++) {
> +		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
> +		cpumask_clear(node_to_cpumask_map[node]);
> +	}
> +
> +	for_each_possible_cpu(cpu)
> +		set_cpu_numa_node(cpu, NUMA_NO_NODE);
> +
> +	/* cpumask_of_node() will now work */
> +	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
> +}
> +
> +/*
> + *  Set the cpu to node and mem mapping
> + */
> +void numa_store_cpu_info(unsigned int cpu)
> +{
> +	map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
> +}
> +
> +/**
> + * numa_add_memblk - Set node id to memblk
> + * @nid: NUMA node ID of the new memblk
> + * @start: Start address of the new memblk
> + * @size:  Size of the new memblk
> + *
> + * RETURNS:
> + * 0 on success, -errno on failure.
> + */
> +int __init numa_add_memblk(int nid, u64 start, u64 size)
> +{
> +	int ret;
> +
> +	ret = memblock_set_node(start, size, &memblock.memory, nid);
> +	if (ret < 0) {
> +		pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
> +			start, (start + size - 1), nid);
> +		return ret;
> +	}
> +
> +	node_set(nid, numa_nodes_parsed);
> +	pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
> +			start, (start + size - 1), nid);
> +	return ret;
> +}
> +EXPORT_SYMBOL(numa_add_memblk);
> +
> +/* Initialize NODE_DATA for a node on the local memory */
> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
> +{
> +	const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
> +	u64 nd_pa;
> +	void *nd;
> +	int tnid;
> +
> +	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
> +			nid, start_pfn << PAGE_SHIFT,
> +			(end_pfn << PAGE_SHIFT) - 1);
> +
> +	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
> +	nd = __va(nd_pa);
> +
> +	/* report and initialize */
> +	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
> +		nd_pa, nd_pa + nd_size - 1);
> +	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
> +	if (tnid != nid)
> +		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
> +
> +	node_data[nid] = nd;
> +	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
> +	NODE_DATA(nid)->node_id = nid;
> +	NODE_DATA(nid)->node_start_pfn = start_pfn;
> +	NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
> +}
> +
> +/**
> + * numa_reset_distance - Reset NUMA distance table
> + *
> + * The current table is freed.
> + * The next numa_set_distance() call will create a new one.
> + */
> +void __init numa_reset_distance(void)
> +{
> +	size_t size;
> +
> +	if (!numa_distance)
> +		return;
> +
> +	size = numa_distance_cnt * numa_distance_cnt *
> +		sizeof(numa_distance[0]);
> +
> +	memblock_free(__pa(numa_distance), size);
> +	numa_distance_cnt = 0;
> +	numa_distance = NULL;
> +}
> +
> +static int __init numa_alloc_distance(void)
> +{
> +	size_t size;
> +	u64 phys;
> +	int i, j;
> +
> +	size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
> +	phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
> +				      size, PAGE_SIZE);
> +	if (WARN_ON(!phys))
> +		return -ENOMEM;
> +
> +	memblock_reserve(phys, size);
> +
> +	numa_distance = __va(phys);
> +	numa_distance_cnt = nr_node_ids;
> +
> +	/* fill with the default distances */
> +	for (i = 0; i < numa_distance_cnt; i++)
> +		for (j = 0; j < numa_distance_cnt; j++)
> +			numa_distance[i * numa_distance_cnt + j] = i == j ?
> +				LOCAL_DISTANCE : REMOTE_DISTANCE;
> +
> +	pr_debug("NUMA: Initialized distance table, cnt=%d\n",
> +			numa_distance_cnt);
> +
> +	return 0;
> +}
> +
> +/**
> + * numa_set_distance - Set NUMA distance from one NUMA to another
> + * @from: the 'from' node to set distance
> + * @to: the 'to'  node to set distance
> + * @distance: NUMA distance
> + *
> + * Set the distance from node @from to @to to @distance.  If distance table
> + * doesn't exist, one which is large enough to accommodate all the currently
> + * known nodes will be created.
> + *
> + * If such table cannot be allocated, a warning is printed and further
> + * calls are ignored until the distance table is reset with
> + * numa_reset_distance().
> + *
> + * If @from or @to is higher than the highest known node or lower than zero
> + * at the time of table creation or @distance doesn't make sense, the call
> + * is ignored.
> + * This is to allow simplification of specific NUMA config implementations.
> + */
> +void __init numa_set_distance(int from, int to, int distance)
> +{
> +	if (!numa_distance)
> +		return;
> +
> +	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
> +			from < 0 || to < 0) {
> +		pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
> +			    from, to, distance);
> +		return;
> +	}
> +
> +	if ((u8)distance != distance ||
> +	    (from == to && distance != LOCAL_DISTANCE)) {
> +		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
> +			     from, to, distance);
> +		return;
> +	}
> +
> +	numa_distance[from * numa_distance_cnt + to] = distance;
> +}
> +EXPORT_SYMBOL(numa_set_distance);
> +
> +int __node_distance(int from, int to)
> +{
> +	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
> +		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
> +	return numa_distance[from * numa_distance_cnt + to];
> +}
> +EXPORT_SYMBOL(__node_distance);

Much of this is simply a direct copy/paste from x86. Why can't it be
moved to common code? I don't see anything arch-specific here.

> +static int __init numa_register_nodes(void)
> +{
> +	int nid;
> +	struct memblock_region *mblk;
> +
> +	/* Check that valid nid is set to memblks */
> +	for_each_memblock(memory, mblk)
> +		if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES)
> +			return -EINVAL;
> +
> +	/* Finally register nodes. */
> +	for_each_node_mask(nid, numa_nodes_parsed) {
> +		unsigned long start_pfn, end_pfn;
> +
> +		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> +		setup_node_data(nid, start_pfn, end_pfn);
> +		node_set_online(nid);
> +	}
> +
> +	/* Setup online nodes to actual nodes*/
> +	node_possible_map = numa_nodes_parsed;
> +
> +	/* Dump memblock with node info and return. */
> +	memblock_dump_all();

We already call this from arm64_memblock_init. If that's now too early
to be of any use, we should move it to after bootmem_init, but we should
probably avoid calling it twice.

> +	return 0;
> +}
> +
> +static int __init numa_init(int (*init_func)(void))
> +{
> +	int ret;
> +
> +	nodes_clear(numa_nodes_parsed);
> +	nodes_clear(node_possible_map);
> +	nodes_clear(node_online_map);
> +	numa_reset_distance();
> +
> +	ret = init_func();
> +	if (ret < 0)
> +		return ret;
> +
> +	if (nodes_empty(numa_nodes_parsed))
> +		return -EINVAL;
> +
> +	ret = numa_register_nodes();
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = numa_alloc_distance();
> +	if (ret < 0)
> +		return ret;
> +
> +	setup_node_to_cpumask_map();
> +
> +	/* init boot processor */
> +	cpu_to_node_map[0] = 0;
> +	map_cpu_to_node(0, 0);
> +
> +	return 0;
> +}
> +
> +/**
> + * dummy_numa_init - Fallback dummy NUMA init
> + *
> + * Used if there's no underlying NUMA architecture, NUMA initialization
> + * fails, or NUMA is disabled on the command line.
> + *
> + * Must online at least one node and add memory blocks that cover all
> + * allowed memory.  This function must not fail.

Why can't it fail? It looks like the return value is ignored by numa_init.

Will

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
@ 2015-12-17 17:11       ` Will Deacon
  0 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2015-12-17 17:11 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

This all looks pretty reasonable, but I'd like to see an Ack from a
devicetree maintainer on the binding before I merge anything (and I see
that there are outstanding comments from Rutland on that).

On Tue, Nov 17, 2015 at 10:50:40PM +0530, Ganapatrao Kulkarni wrote:
> Add NUMA support for arm64-based platforms.
> By default, this patch adds a dummy NUMA node and
> maps all memory and CPUs to node 0.
> Using this patch, NUMA can be simulated on single-node arm64 platforms.
> 
> Reviewed-by: Robert Richter <rrichter@cavium.com>
> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
> ---
>  arch/arm64/Kconfig              |  25 +++
>  arch/arm64/include/asm/mmzone.h |  17 ++
>  arch/arm64/include/asm/numa.h   |  47 +++++
>  arch/arm64/kernel/setup.c       |   4 +
>  arch/arm64/kernel/smp.c         |   2 +
>  arch/arm64/mm/Makefile          |   1 +
>  arch/arm64/mm/init.c            |  30 +++-
>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
>  8 files changed, 506 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm64/include/asm/mmzone.h
>  create mode 100644 arch/arm64/include/asm/numa.h
>  create mode 100644 arch/arm64/mm/numa.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 9ac16a4..7d8fb42 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -71,6 +71,7 @@ config ARM64
>  	select HAVE_GENERIC_DMA_COHERENT
>  	select HAVE_HW_BREAKPOINT if PERF_EVENTS
>  	select HAVE_MEMBLOCK
> +	select HAVE_MEMBLOCK_NODE_MAP if NUMA
>  	select HAVE_PATA_PLATFORM
>  	select HAVE_PERF_EVENTS
>  	select HAVE_PERF_REGS
> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
>  	  Say Y here to experiment with turning CPUs off and on.  CPUs
>  	  can be controlled through /sys/devices/system/cpu.
>  
> +# Common NUMA Features
> +config NUMA
> +	bool "Numa Memory Allocation and Scheduler Support"
> +	depends on SMP
> +	help
> +	  Enable NUMA (Non Uniform Memory Access) support.
> +
> +	  The kernel will try to allocate memory used by a CPU on the
> +	  local memory controller of the CPU and add some more
> +	  NUMA awareness to the kernel.

I appreciate that this is copied from x86, but what exactly do you mean
by "memory controller" here?

> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
> new file mode 100644
> index 0000000..6ddd468
> --- /dev/null
> +++ b/arch/arm64/include/asm/mmzone.h
> @@ -0,0 +1,17 @@
> +#ifndef __ASM_ARM64_MMZONE_H_
> +#define __ASM_ARM64_MMZONE_H_

Please try to follow the standard naming for header guards under arm64
(yes, it's not perfect, but we've made some effort for consistency).

> +
> +#ifdef CONFIG_NUMA
> +
> +#include <linux/mmdebug.h>
> +#include <linux/types.h>
> +
> +#include <asm/smp.h>
> +#include <asm/numa.h>
> +
> +extern struct pglist_data *node_data[];
> +
> +#define NODE_DATA(nid)		(node_data[(nid)])

This is the same as m32r, metag, parisc, powerpc, s390, sh, sparc, tile
and x86. Can we make this the default in the core code instead and then
replace this header file with asm-generic or something?

> +
> +#endif /* CONFIG_NUMA */
> +#endif /* __ASM_ARM64_MMZONE_H_ */
> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
> new file mode 100644
> index 0000000..c00f3a4
> --- /dev/null
> +++ b/arch/arm64/include/asm/numa.h
> @@ -0,0 +1,47 @@
> +#ifndef _ASM_NUMA_H
> +#define _ASM_NUMA_H

Same comment on the guards.

> +#include <linux/nodemask.h>
> +#include <asm/topology.h>
> +
> +#ifdef CONFIG_NUMA
> +
> +#define NR_NODE_MEMBLKS		(MAX_NUMNODES * 2)

This is only used by the ACPI code afaict, so maybe include it when you
add that?

> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))

Where is this used?

> +
> +/* currently, arm64 implements flat NUMA topology */
> +#define parent_node(node)	(node)
> +
> +extern int __node_distance(int from, int to);
> +#define node_distance(a, b) __node_distance(a, b)
> +
> +/* dummy definitions for pci functions */
> +#define pcibus_to_node(node)	0
> +#define cpumask_of_pcibus(bus)	0

There's a bunch of these dummy definitions already available in
asm-generic/topology.h. Can we use those instead of rolling our own
please?
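The generic topology header provides its defaults behind `#ifndef` guards. An arch header therefore only needs to define the hooks it actually implements, typically before it pulls in the generic header. The sketch below mimics that override pattern in userspace. The two-node CPU layout and the `arch_cpu_node` table are hypothetical, and the `-1` "no node" value mirrors `NUMA_NO_NODE`.

```c
#include <assert.h>

#define NUMA_NO_NODE	(-1)

/* ---- "arch" header: defines only what the architecture implements ---- */
static const int arch_cpu_node[4] = { 0, 0, 1, 1 };	/* hypothetical 2-node layout */
#define cpu_to_node(cpu)	(arch_cpu_node[(cpu)])

/* ---- "generic" header: defaults, used only where the arch defined nothing ---- */
#ifndef cpu_to_node
#define cpu_to_node(cpu)	((void)(cpu), 0)
#endif

#ifndef pcibus_to_node
#define pcibus_to_node(bus)	((void)(bus), NUMA_NO_NODE)
#endif
```

Here `cpu_to_node()` comes from the "arch" part, and `pcibus_to_node()` falls through to the generic default. No hand-rolled dummy is needed.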

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 17bf39a..8dc9c5d 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -37,6 +37,7 @@
>  
>  #include <asm/fixmap.h>
>  #include <asm/memory.h>
> +#include <asm/numa.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
>  #include <asm/sizes.h>
> @@ -77,6 +78,19 @@ static phys_addr_t max_zone_dma_phys(void)
>  	return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>  }
>  
> +#ifdef CONFIG_NUMA
> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
> +{
> +	unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> +
> +	if (IS_ENABLED(CONFIG_ZONE_DMA))
> +		max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
> +	max_zone_pfns[ZONE_NORMAL] = max;
> +
> +	free_area_init_nodes(max_zone_pfns);
> +}

This is certainly more readable than the non-numa zone_sizes_init. Is
there a reason we can't always select HAVE_MEMBLOCK_NODE_MAP and avoid
having to handle the zone holes explicitly?

Also, I couldn't find any calls to memblock_add_node, which seem to be
expected. What am I missing?

> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> new file mode 100644
> index 0000000..e3afbf8
> --- /dev/null
> +++ b/arch/arm64/mm/numa.c
> @@ -0,0 +1,384 @@
> +/*
> + * NUMA support, based on the x86 implementation.
> + *
> + * Copyright (C) 2015 Cavium Inc.
> + * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/bootmem.h>
> +#include <linux/ctype.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/memblock.h>
> +#include <linux/module.h>
> +#include <linux/mmzone.h>
> +#include <linux/nodemask.h>
> +#include <linux/sched.h>
> +#include <linux/string.h>
> +#include <linux/topology.h>
> +
> +#include <asm/smp_plat.h>
> +
> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
> +EXPORT_SYMBOL(node_data);
> +nodemask_t numa_nodes_parsed __initdata;
> +int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
> +
> +static int numa_off;
> +static int numa_distance_cnt;
> +static u8 *numa_distance;
> +
> +static __init int numa_parse_early_param(char *opt)
> +{
> +	if (!opt)
> +		return -EINVAL;
> +	if (!strncmp(opt, "off", 3)) {
> +		pr_info("%s\n", "NUMA turned off");
> +		numa_off = 1;

There's a patch kicking around to add this to strtobool:

  https://lkml.org/lkml/2015/12/9/802

but I can't see it in next :(
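To illustrate what a boolean parser would buy here, the following userspace sketch uses a hypothetical `parse_bool_opt()` in the spirit of `strtobool()` that also understands "on"/"off". It is not the API proposed in the linked patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical strtobool()-style helper that also accepts "on"/"off".
 * Only the first one or two characters are inspected.
 */
static int parse_bool_opt(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	}
	return -EINVAL;
}

static int numa_off;

/* "numa=off" disables NUMA; "numa=on" (or y/1) leaves it enabled. */
static int numa_parse_early_param(const char *opt)
{
	bool enable;
	int ret = parse_bool_opt(opt, &enable);

	if (ret)
		return ret;
	numa_off = !enable;
	return 0;
}
```

Unlike the patch, this version rejects unrecognised values instead of silently ignoring them.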

> +	}
> +	return 0;
> +}
> +early_param("numa", numa_parse_early_param);
> +
> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
> +EXPORT_SYMBOL(node_to_cpumask_map);
> +
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +/*
> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
> + */
> +const struct cpumask *cpumask_of_node(int node)
> +{
> +
> +	if (WARN_ON(node >= nr_node_ids))
> +		return cpu_none_mask;
> +
> +	if (WARN_ON(node_to_cpumask_map[node] == NULL))
> +		return cpu_online_mask;
> +
> +	return node_to_cpumask_map[node];
> +}
> +EXPORT_SYMBOL(cpumask_of_node);
> +#endif
> +
> +static void map_cpu_to_node(unsigned int cpu, int nid)
> +{
> +	set_cpu_numa_node(cpu, nid);
> +	if (nid >= 0)
> +		cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
> +}
> +
> +static void unmap_cpu_to_node(unsigned int cpu)
> +{
> +	int nid = cpu_to_node(cpu);
> +
> +	if (nid >= 0)
> +		cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
> +	set_cpu_numa_node(cpu, NUMA_NO_NODE);
> +}
> +
> +void numa_clear_node(unsigned int cpu)
> +{
> +	unmap_cpu_to_node(cpu);
> +}
> +
> +/*
> + * Allocate node_to_cpumask_map based on number of available nodes
> + * Requires node_possible_map to be valid.
> + *
> + * Note: cpumask_of_node() is not valid until after this is done.
> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
> + */
> +static void __init setup_node_to_cpumask_map(void)
> +{
> +	unsigned int cpu;
> +	int node;
> +
> +	/* setup nr_node_ids if not done yet */
> +	if (nr_node_ids == MAX_NUMNODES)
> +		setup_nr_node_ids();
> +
> +	/* allocate and clear the mapping */
> +	for (node = 0; node < nr_node_ids; node++) {
> +		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
> +		cpumask_clear(node_to_cpumask_map[node]);
> +	}
> +
> +	for_each_possible_cpu(cpu)
> +		set_cpu_numa_node(cpu, NUMA_NO_NODE);
> +
> +	/* cpumask_of_node() will now work */
> +	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
> +}
> +
> +/*
> + *  Set the cpu to node and mem mapping
> + */
> +void numa_store_cpu_info(unsigned int cpu)
> +{
> +	map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
> +}
> +
> +/**
> + * numa_add_memblk - Set node id to memblk
> + * @nid: NUMA node ID of the new memblk
> + * @start: Start address of the new memblk
> + * @size:  Size of the new memblk
> + *
> + * RETURNS:
> + * 0 on success, -errno on failure.
> + */
> +int __init numa_add_memblk(int nid, u64 start, u64 size)
> +{
> +	int ret;
> +
> +	ret = memblock_set_node(start, size, &memblock.memory, nid);
> +	if (ret < 0) {
> +		pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
> +			start, (start + size - 1), nid);
> +		return ret;
> +	}
> +
> +	node_set(nid, numa_nodes_parsed);
> +	pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
> +			start, (start + size - 1), nid);
> +	return ret;
> +}
> +EXPORT_SYMBOL(numa_add_memblk);
> +
> +/* Initialize NODE_DATA for a node on the local memory */
> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
> +{
> +	const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
> +	u64 nd_pa;
> +	void *nd;
> +	int tnid;
> +
> +	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
> +			nid, start_pfn << PAGE_SHIFT,
> +			(end_pfn << PAGE_SHIFT) - 1);
> +
> +	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
> +	nd = __va(nd_pa);
> +
> +	/* report and initialize */
> +	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
> +		nd_pa, nd_pa + nd_size - 1);
> +	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
> +	if (tnid != nid)
> +		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
> +
> +	node_data[nid] = nd;
> +	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
> +	NODE_DATA(nid)->node_id = nid;
> +	NODE_DATA(nid)->node_start_pfn = start_pfn;
> +	NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
> +}
> +
> +/**
> + * numa_reset_distance - Reset NUMA distance table
> + *
> + * The current table is freed.
> + * The next numa_set_distance() call will create a new one.
> + */
> +void __init numa_reset_distance(void)
> +{
> +	size_t size;
> +
> +	if (!numa_distance)
> +		return;
> +
> +	size = numa_distance_cnt * numa_distance_cnt *
> +		sizeof(numa_distance[0]);
> +
> +	memblock_free(__pa(numa_distance), size);
> +	numa_distance_cnt = 0;
> +	numa_distance = NULL;
> +}
> +
> +static int __init numa_alloc_distance(void)
> +{
> +	size_t size;
> +	u64 phys;
> +	int i, j;
> +
> +	size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
> +	phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
> +				      size, PAGE_SIZE);
> +	if (WARN_ON(!phys))
> +		return -ENOMEM;
> +
> +	memblock_reserve(phys, size);
> +
> +	numa_distance = __va(phys);
> +	numa_distance_cnt = nr_node_ids;
> +
> +	/* fill with the default distances */
> +	for (i = 0; i < numa_distance_cnt; i++)
> +		for (j = 0; j < numa_distance_cnt; j++)
> +			numa_distance[i * numa_distance_cnt + j] = i == j ?
> +				LOCAL_DISTANCE : REMOTE_DISTANCE;
> +
> +	pr_debug("NUMA: Initialized distance table, cnt=%d\n",
> +			numa_distance_cnt);
> +
> +	return 0;
> +}
> +
> +/**
> + * numa_set_distance - Set NUMA distance from one NUMA to another
> + * @from: the 'from' node to set distance
> + * @to: the 'to'  node to set distance
> + * @distance: NUMA distance
> + *
> + * Set the distance from node @from to @to to @distance.  If distance table
> + * doesn't exist, one which is large enough to accommodate all the currently
> + * known nodes will be created.
> + *
> + * If such table cannot be allocated, a warning is printed and further
> + * calls are ignored until the distance table is reset with
> + * numa_reset_distance().
> + *
> + * If @from or @to is higher than the highest known node or lower than zero
> + * at the time of table creation or @distance doesn't make sense, the call
> + * is ignored.
> + * This is to allow simplification of specific NUMA config implementations.
> + */
> +void __init numa_set_distance(int from, int to, int distance)
> +{
> +	if (!numa_distance)
> +		return;
> +
> +	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
> +			from < 0 || to < 0) {
> +		pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
> +			    from, to, distance);
> +		return;
> +	}
> +
> +	if ((u8)distance != distance ||
> +	    (from == to && distance != LOCAL_DISTANCE)) {
> +		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
> +			     from, to, distance);
> +		return;
> +	}
> +
> +	numa_distance[from * numa_distance_cnt + to] = distance;
> +}
> +EXPORT_SYMBOL(numa_set_distance);
> +
> +int __node_distance(int from, int to)
> +{
> +	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
> +		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
> +	return numa_distance[from * numa_distance_cnt + to];
> +}
> +EXPORT_SYMBOL(__node_distance);

Much of this is simply a direct copy/paste from x86. Why can't it be
moved to common code? I don't see anything arch-specific here.
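The distance code above is a flat, row-major n x n table of `u8` values, filled with `LOCAL_DISTANCE` (10) on the diagonal and `REMOTE_DISTANCE` (20) elsewhere. Nothing in that logic is arch-specific. The following is a userspace sketch of the same logic. It assumes fixed-size static storage in place of the memblock allocation and a hypothetical `MAX_NODES` bound. It also adds a negative-id check to the getter, which the patch's `__node_distance()` does not have.

```c
#include <assert.h>
#include <stdint.h>

#define LOCAL_DISTANCE	10
#define REMOTE_DISTANCE	20
#define MAX_NODES	4	/* hypothetical bound for the sketch */

static uint8_t distance[MAX_NODES * MAX_NODES];
static int distance_cnt;

/* Initialise an n x n table: local on the diagonal, remote elsewhere. */
static void distance_init(int n)
{
	int i, j;

	assert(n > 0 && n <= MAX_NODES);
	distance_cnt = n;
	for (i = 0; i < n; i++)
		for (j = 0; j < n; j++)
			distance[i * n + j] = (i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
}

/* Same validation as numa_set_distance(): reject bad ids and values. */
static int distance_set(int from, int to, int d)
{
	if (from < 0 || to < 0 || from >= distance_cnt || to >= distance_cnt)
		return -1;
	if ((uint8_t)d != d || (from == to && d != LOCAL_DISTANCE))
		return -1;
	distance[from * distance_cnt + to] = (uint8_t)d;
	return 0;
}

/* Out-of-range ids fall back to the default local/remote distance. */
static int distance_get(int from, int to)
{
	if (from < 0 || to < 0 || from >= distance_cnt || to >= distance_cnt)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return distance[from * distance_cnt + to];
}
```

The table is not forced to be symmetric: setting 0->1 leaves 1->0 at its previous value.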

> +static int __init numa_register_nodes(void)
> +{
> +	int nid;
> +	struct memblock_region *mblk;
> +
> +	/* Check that valid nid is set to memblks */
> +	for_each_memblock(memory, mblk)
> +		if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES)
> +			return -EINVAL;
> +
> +	/* Finally register nodes. */
> +	for_each_node_mask(nid, numa_nodes_parsed) {
> +		unsigned long start_pfn, end_pfn;
> +
> +		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> +		setup_node_data(nid, start_pfn, end_pfn);
> +		node_set_online(nid);
> +	}
> +
> +	/* Setup online nodes to actual nodes*/
> +	node_possible_map = numa_nodes_parsed;
> +
> +	/* Dump memblock with node info and return. */
> +	memblock_dump_all();

We already call this from arm64_memblock_init. If that's now too early
to be of any use, we should move it to after bootmem_init, but we should
probably avoid calling it twice.

> +	return 0;
> +}
> +
> +static int __init numa_init(int (*init_func)(void))
> +{
> +	int ret;
> +
> +	nodes_clear(numa_nodes_parsed);
> +	nodes_clear(node_possible_map);
> +	nodes_clear(node_online_map);
> +	numa_reset_distance();
> +
> +	ret = init_func();
> +	if (ret < 0)
> +		return ret;
> +
> +	if (nodes_empty(numa_nodes_parsed))
> +		return -EINVAL;
> +
> +	ret = numa_register_nodes();
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = numa_alloc_distance();
> +	if (ret < 0)
> +		return ret;
> +
> +	setup_node_to_cpumask_map();
> +
> +	/* init boot processor */
> +	cpu_to_node_map[0] = 0;
> +	map_cpu_to_node(0, 0);
> +
> +	return 0;
> +}
> +
> +/**
> + * dummy_numa_init - Fallback dummy NUMA init
> + *
> + * Used if there's no underlying NUMA architecture, NUMA initialization
> + * fails, or NUMA is disabled on the command line.
> + *
> + * Must online at least one node and add memory blocks that cover all
> + * allowed memory.  This function must not fail.

Why can't it fail? It looks like the return value is ignored by numa_init.

Will


* Re: [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
  2015-12-17 17:11       ` Will Deacon
@ 2015-12-17 18:30           ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-12-17 18:30 UTC
  To: Will Deacon, Mark Rutland
  Cc: Ganapatrao Kulkarni,
	linux-arm-kernel, devicetree, Catalin Marinas, Grant Likely,
	Leif Lindholm, rfranz, Ard Biesheuvel,
	msalter, Rob Herring, Steve Capper,
	Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll, Ian Campbell,
	Kumar Gala, Rafael J. Wysocki, Len Brown, Marc Zyngier,
	Robert Richter, Prasun Kapoor

Thanks, Will, for the review.

On Thu, Dec 17, 2015 at 10:41 PM, Will Deacon <will.deacon@arm.com> wrote:
> Hello,
>
> This all looks pretty reasonable, but I'd like to see an Ack from a
> devicetree maintainer on the binding before I merge anything (and I see
> that there are outstanding comments from Rutland on that).
IIUC, there are no open comments on the binding.
Mark Rutland: please let me know if there are any open comments;
otherwise, could you please Ack the binding?
>
> On Tue, Nov 17, 2015 at 10:50:40PM +0530, Ganapatrao Kulkarni wrote:
>> Adding numa support for arm64 based platforms.
>> This patch adds by default the dummy numa node and
>> maps all memory and cpus to node 0.
>> using this patch, numa can be simulated on single node arm64 platforms.
>>
>> Reviewed-by: Robert Richter <rrichter@cavium.com>
>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>> ---
>>  arch/arm64/Kconfig              |  25 +++
>>  arch/arm64/include/asm/mmzone.h |  17 ++
>>  arch/arm64/include/asm/numa.h   |  47 +++++
>>  arch/arm64/kernel/setup.c       |   4 +
>>  arch/arm64/kernel/smp.c         |   2 +
>>  arch/arm64/mm/Makefile          |   1 +
>>  arch/arm64/mm/init.c            |  30 +++-
>>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
>>  8 files changed, 506 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/mmzone.h
>>  create mode 100644 arch/arm64/include/asm/numa.h
>>  create mode 100644 arch/arm64/mm/numa.c
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 9ac16a4..7d8fb42 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -71,6 +71,7 @@ config ARM64
>>       select HAVE_GENERIC_DMA_COHERENT
>>       select HAVE_HW_BREAKPOINT if PERF_EVENTS
>>       select HAVE_MEMBLOCK
>> +     select HAVE_MEMBLOCK_NODE_MAP if NUMA
>>       select HAVE_PATA_PLATFORM
>>       select HAVE_PERF_EVENTS
>>       select HAVE_PERF_REGS
>> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
>>         Say Y here to experiment with turning CPUs off and on.  CPUs
>>         can be controlled through /sys/devices/system/cpu.
>>
>> +# Common NUMA Features
>> +config NUMA
>> +     bool "Numa Memory Allocation and Scheduler Support"
>> +     depends on SMP
>> +     help
>> +       Enable NUMA (Non Uniform Memory Access) support.
>> +
>> +       The kernel will try to allocate memory used by a CPU on the
>> +       local memory controller of the CPU and add some more
>> +       NUMA awareness to the kernel.
>
> I appreciate that this is copied from x86, but what exactly do you mean
> by "memory controller" here?
OK, it is better to just say "local memory" here.
>
>> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
>> new file mode 100644
>> index 0000000..6ddd468
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/mmzone.h
>> @@ -0,0 +1,17 @@
>> +#ifndef __ASM_ARM64_MMZONE_H_
>> +#define __ASM_ARM64_MMZONE_H_
>
> Please try to follow the standard naming for header guards under arm64
> (yes, it's not perfect, but we've made some effort for consistency).
Sure, I will follow the naming used elsewhere under arm64.
>
>> +
>> +#ifdef CONFIG_NUMA
>> +
>> +#include <linux/mmdebug.h>
>> +#include <linux/types.h>
>> +
>> +#include <asm/smp.h>
>> +#include <asm/numa.h>
>> +
>> +extern struct pglist_data *node_data[];
>> +
>> +#define NODE_DATA(nid)               (node_data[(nid)])
>
> This is the same as m32r, metag, parisc, powerpc, s390, sh, sparc, tile
> and x86. Can we make this the default in the core code instead and then
> replace this header file with asm-generic or something?
IIUC, it is the same in most, but not all, architectures.
>
>> +
>> +#endif /* CONFIG_NUMA */
>> +#endif /* __ASM_ARM64_MMZONE_H_ */
>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>> new file mode 100644
>> index 0000000..c00f3a4
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/numa.h
>> @@ -0,0 +1,47 @@
>> +#ifndef _ASM_NUMA_H
>> +#define _ASM_NUMA_H
>
> Same comment on the guards.
ok
>
>> +#include <linux/nodemask.h>
>> +#include <asm/topology.h>
>> +
>> +#ifdef CONFIG_NUMA
>> +
>> +#define NR_NODE_MEMBLKS              (MAX_NUMNODES * 2)
>
> This is only used by the ACPI code afaict, so maybe include it when you
> add that?
ok
>
>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
>
> Where is this used?
Sorry, it was used in v6 and I missed deleting it.
>
>> +
>> +/* currently, arm64 implements flat NUMA topology */
>> +#define parent_node(node)    (node)
>> +
>> +extern int __node_distance(int from, int to);
>> +#define node_distance(a, b) __node_distance(a, b)
>> +
>> +/* dummy definitions for pci functions */
>> +#define pcibus_to_node(node) 0
>> +#define cpumask_of_pcibus(bus)       0
>
> There's a bunch of these dummy definitions already available in
> asm-generic/topology.h. Can we use those instead of rolling our own
> please?
>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 17bf39a..8dc9c5d 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -37,6 +37,7 @@
>>
>>  #include <asm/fixmap.h>
>>  #include <asm/memory.h>
>> +#include <asm/numa.h>
>>  #include <asm/sections.h>
>>  #include <asm/setup.h>
>>  #include <asm/sizes.h>
>> @@ -77,6 +78,19 @@ static phys_addr_t max_zone_dma_phys(void)
>>       return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>>  }
>>
>> +#ifdef CONFIG_NUMA
>> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
>> +{
>> +     unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
>> +
>> +     if (IS_ENABLED(CONFIG_ZONE_DMA))
>> +             max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
>> +     max_zone_pfns[ZONE_NORMAL] = max;
>> +
>> +     free_area_init_nodes(max_zone_pfns);
>> +}
>
> This is certainly more readable than the non-numa zone_sizes_init. Is
> there a reason we can't always select HAVE_MEMBLOCK_NODE_MAP and avoid
> having to handle the zone holes explicitly?
Yes, I can look at selecting HAVE_MEMBLOCK_NODE_MAP unconditionally
instead of only for NUMA.
I can also experiment with using the NUMA zone_sizes_init for the non-NUMA
case and deleting the current zone_sizes_init.
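With a memblock node map, the simpler `zone_sizes_init()` only has to supply the upper PFN of each zone. `free_area_init_nodes()` then derives the per-node ranges and holes from memblock, which is why the explicit hole handling could go away.

The following is a userspace sketch of just the limit computation. It assumes 4K pages and a simplified two-zone enum. `dma_offset` stands in for the `offset` used by `max_zone_dma_phys()`; the quoted hunk does not show how that value is computed, so here it is simply an input.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12			/* assume 4K pages */
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)

enum zone_type { ZONE_DMA, ZONE_NORMAL, MAX_NR_ZONES };

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Fill the per-zone upper PFN limits as the NUMA zone_sizes_init() in the
 * patch does: ZONE_DMA ends 4GB above @dma_offset (capped at the end of
 * DRAM) and ZONE_NORMAL ends at @max_pfn. Holes are not handled here.
 */
static void fill_max_zone_pfns(uint64_t max_zone_pfns[MAX_NR_ZONES],
			       uint64_t dma_offset, uint64_t dram_end,
			       uint64_t max_pfn)
{
	uint64_t dma_phys = min_u64(dma_offset + (1ULL << 32), dram_end);

	max_zone_pfns[ZONE_DMA] = PFN_DOWN(dma_phys);
	max_zone_pfns[ZONE_NORMAL] = max_pfn;
}
```

For example, with DRAM ending at 0x880000000 and an offset of 0, ZONE_DMA stops at PFN 0x100000 (the 4GB boundary) and ZONE_NORMAL at PFN 0x880000.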

>
> Also, I couldn't find any calls to memblock_add_node, which seem to be
> expected. What am I missing?
The memblocks are added well before the NUMA information is parsed from DT
or ACPI, so memblock_set_node() is used to set the node on the existing regions.
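Because the memory regions already exist by the time NUMA information is parsed, the node id only needs to be attached to them. No memory is added at that point.

The following is a simplified userspace model of that idea. It assumes a plain region array in place of memblock and a hypothetical `set_node()` helper. Unlike the real `memblock_set_node()`, it does not split a region that straddles the range boundary, so only regions lying entirely inside the range are tagged.

```c
#include <assert.h>
#include <stdint.h>

#define NUMA_NO_NODE	(-1)

/* Simplified model of a memblock memory region. */
struct region {
	uint64_t base;
	uint64_t size;
	int nid;
};

/*
 * Tag every region lying entirely inside [base, base + size) with @nid.
 * The regions were registered earlier without a node; this only changes
 * their node id. Returns the number of regions tagged.
 */
static int set_node(struct region *r, int n, uint64_t base, uint64_t size,
		    int nid)
{
	uint64_t end = base + size;
	int i, tagged = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		if (r[i].base >= base && r[i].base + r[i].size <= end) {
			r[i].nid = nid;
			tagged++;
		}
	}
	return tagged;
}
```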
>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> new file mode 100644
>> index 0000000..e3afbf8
>> --- /dev/null
>> +++ b/arch/arm64/mm/numa.c
>> @@ -0,0 +1,384 @@
>> +/*
>> + * NUMA support, based on the x86 implementation.
>> + *
>> + * Copyright (C) 2015 Cavium Inc.
>> + * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/bootmem.h>
>> +#include <linux/ctype.h>
>> +#include <linux/init.h>
>> +#include <linux/kernel.h>
>> +#include <linux/mm.h>
>> +#include <linux/memblock.h>
>> +#include <linux/module.h>
>> +#include <linux/mmzone.h>
>> +#include <linux/nodemask.h>
>> +#include <linux/sched.h>
>> +#include <linux/string.h>
>> +#include <linux/topology.h>
>> +
>> +#include <asm/smp_plat.h>
>> +
>> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>> +EXPORT_SYMBOL(node_data);
>> +nodemask_t numa_nodes_parsed __initdata;
>> +int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
>> +
>> +static int numa_off;
>> +static int numa_distance_cnt;
>> +static u8 *numa_distance;
>> +
>> +static __init int numa_parse_early_param(char *opt)
>> +{
>> +     if (!opt)
>> +             return -EINVAL;
>> +     if (!strncmp(opt, "off", 3)) {
>> +             pr_info("%s\n", "NUMA turned off");
>> +             numa_off = 1;
>
> There's a patch kicking around to add this to strtobool:
I will change it once that patch is upstream; I don't want to add a dependency for this.
>
>   https://lkml.org/lkml/2015/12/9/802
>
> but I can't see it in next :(
>
>> +     }
>> +     return 0;
>> +}
>> +early_param("numa", numa_parse_early_param);
>> +
>> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>> +EXPORT_SYMBOL(node_to_cpumask_map);
>> +
>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>> +/*
>> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
>> + */
>> +const struct cpumask *cpumask_of_node(int node)
>> +{
>> +
>> +     if (WARN_ON(node >= nr_node_ids))
>> +             return cpu_none_mask;
>> +
>> +     if (WARN_ON(node_to_cpumask_map[node] == NULL))
>> +             return cpu_online_mask;
>> +
>> +     return node_to_cpumask_map[node];
>> +}
>> +EXPORT_SYMBOL(cpumask_of_node);
>> +#endif
>> +
>> +static void map_cpu_to_node(unsigned int cpu, int nid)
>> +{
>> +     set_cpu_numa_node(cpu, nid);
>> +     if (nid >= 0)
>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
>> +}
>> +
>> +static void unmap_cpu_to_node(unsigned int cpu)
>> +{
>> +     int nid = cpu_to_node(cpu);
>> +
>> +     if (nid >= 0)
>> +             cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
>> +     set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> +}
>> +
>> +void numa_clear_node(unsigned int cpu)
>> +{
>> +     unmap_cpu_to_node(cpu);
>> +}
>> +
>> +/*
>> + * Allocate node_to_cpumask_map based on number of available nodes
>> + * Requires node_possible_map to be valid.
>> + *
>> + * Note: cpumask_of_node() is not valid until after this is done.
>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
>> + */
>> +static void __init setup_node_to_cpumask_map(void)
>> +{
>> +     unsigned int cpu;
>> +     int node;
>> +
>> +     /* setup nr_node_ids if not done yet */
>> +     if (nr_node_ids == MAX_NUMNODES)
>> +             setup_nr_node_ids();
>> +
>> +     /* allocate and clear the mapping */
>> +     for (node = 0; node < nr_node_ids; node++) {
>> +             alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
>> +             cpumask_clear(node_to_cpumask_map[node]);
>> +     }
>> +
>> +     for_each_possible_cpu(cpu)
>> +             set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> +
>> +     /* cpumask_of_node() will now work */
>> +     pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>> +}
>> +
>> +/*
>> + *  Set the cpu to node and mem mapping
>> + */
>> +void numa_store_cpu_info(unsigned int cpu)
>> +{
>> +     map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
>> +}
>> +
>> +/**
>> + * numa_add_memblk - Set node id to memblk
>> + * @nid: NUMA node ID of the new memblk
>> + * @start: Start address of the new memblk
>> + * @size:  Size of the new memblk
>> + *
>> + * RETURNS:
>> + * 0 on success, -errno on failure.
>> + */
>> +int __init numa_add_memblk(int nid, u64 start, u64 size)
>> +{
>> +     int ret;
>> +
>> +     ret = memblock_set_node(start, size, &memblock.memory, nid);
>> +     if (ret < 0) {
>> +             pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>> +                     start, (start + size - 1), nid);
>> +             return ret;
>> +     }
>> +
>> +     node_set(nid, numa_nodes_parsed);
>> +     pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
>> +                     start, (start + size - 1), nid);
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL(numa_add_memblk);
>> +
>> +/* Initialize NODE_DATA for a node on the local memory */
>> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>> +{
>> +     const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
>> +     u64 nd_pa;
>> +     void *nd;
>> +     int tnid;
>> +
>> +     pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> +                     nid, start_pfn << PAGE_SHIFT,
>> +                     (end_pfn << PAGE_SHIFT) - 1);
>> +
>> +     nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>> +     nd = __va(nd_pa);
>> +
>> +     /* report and initialize */
>> +     pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
>> +             nd_pa, nd_pa + nd_size - 1);
>> +     tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>> +     if (tnid != nid)
>> +             pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
>> +
>> +     node_data[nid] = nd;
>> +     memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>> +     NODE_DATA(nid)->node_id = nid;
>> +     NODE_DATA(nid)->node_start_pfn = start_pfn;
>> +     NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
>> +}
>> +
>> +/**
>> + * numa_reset_distance - Reset NUMA distance table
>> + *
>> + * The current table is freed.
>> + * The next numa_set_distance() call will create a new one.
>> + */
>> +void __init numa_reset_distance(void)
>> +{
>> +     size_t size;
>> +
>> +     if (!numa_distance)
>> +             return;
>> +
>> +     size = numa_distance_cnt * numa_distance_cnt *
>> +             sizeof(numa_distance[0]);
>> +
>> +     memblock_free(__pa(numa_distance), size);
>> +     numa_distance_cnt = 0;
>> +     numa_distance = NULL;
>> +}
>> +
>> +static int __init numa_alloc_distance(void)
>> +{
>> +     size_t size;
>> +     u64 phys;
>> +     int i, j;
>> +
>> +     size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
>> +     phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
>> +                                   size, PAGE_SIZE);
>> +     if (WARN_ON(!phys))
>> +             return -ENOMEM;
>> +
>> +     memblock_reserve(phys, size);
>> +
>> +     numa_distance = __va(phys);
>> +     numa_distance_cnt = nr_node_ids;
>> +
>> +     /* fill with the default distances */
>> +     for (i = 0; i < numa_distance_cnt; i++)
>> +             for (j = 0; j < numa_distance_cnt; j++)
>> +                     numa_distance[i * numa_distance_cnt + j] = i == j ?
>> +                             LOCAL_DISTANCE : REMOTE_DISTANCE;
>> +
>> +     pr_debug("NUMA: Initialized distance table, cnt=%d\n",
>> +                     numa_distance_cnt);
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * numa_set_distance - Set NUMA distance from one NUMA to another
>> + * @from: the 'from' node to set distance
>> + * @to: the 'to'  node to set distance
>> + * @distance: NUMA distance
>> + *
>> + * Set the distance from node @from to @to to @distance.  If distance table
>> + * doesn't exist, one which is large enough to accommodate all the currently
>> + * known nodes will be created.
>> + *
>> + * If such table cannot be allocated, a warning is printed and further
>> + * calls are ignored until the distance table is reset with
>> + * numa_reset_distance().
>> + *
>> + * If @from or @to is higher than the highest known node or lower than zero
>> + * at the time of table creation or @distance doesn't make sense, the call
>> + * is ignored.
>> + * This is to allow simplification of specific NUMA config implementations.
>> + */
>> +void __init numa_set_distance(int from, int to, int distance)
>> +{
>> +     if (!numa_distance)
>> +             return;
>> +
>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>> +                     from < 0 || to < 0) {
>> +             pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>> +                         from, to, distance);
>> +             return;
>> +     }
>> +
>> +     if ((u8)distance != distance ||
>> +         (from == to && distance != LOCAL_DISTANCE)) {
>> +             pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>> +                          from, to, distance);
>> +             return;
>> +     }
>> +
>> +     numa_distance[from * numa_distance_cnt + to] = distance;
>> +}
>> +EXPORT_SYMBOL(numa_set_distance);
>> +
>> +int __node_distance(int from, int to)
>> +{
>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt)
>> +             return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
>> +     return numa_distance[from * numa_distance_cnt + to];
>> +}
>> +EXPORT_SYMBOL(__node_distance);
>
> Much of this is simply a direct copy/paste from x86. Why can't it be
> moved to common code? I don't see anything arch-specific here.
It is not the same for all architectures.
>
>> +static int __init numa_register_nodes(void)
>> +{
>> +     int nid;
>> +     struct memblock_region *mblk;
>> +
>> +     /* Check that valid nid is set to memblks */
>> +     for_each_memblock(memory, mblk)
>> +             if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES)
>> +                     return -EINVAL;
>> +
>> +     /* Finally register nodes. */
>> +     for_each_node_mask(nid, numa_nodes_parsed) {
>> +             unsigned long start_pfn, end_pfn;
>> +
>> +             get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>> +             setup_node_data(nid, start_pfn, end_pfn);
>> +             node_set_online(nid);
>> +     }
>> +
>> +     /* Setup online nodes to actual nodes*/
>> +     node_possible_map = numa_nodes_parsed;
>> +
>> +     /* Dump memblock with node info and return. */
>> +     memblock_dump_all();
>
> We already call this from arm64_memblock_init. If that's now too early
> to be of any use, we should move it to after bootmem_init, but we should
> probably avoid calling it twice.
Sure, I will change it so that it is called only once.
>
>> +     return 0;
>> +}
>> +
>> +static int __init numa_init(int (*init_func)(void))
>> +{
>> +     int ret;
>> +
>> +     nodes_clear(numa_nodes_parsed);
>> +     nodes_clear(node_possible_map);
>> +     nodes_clear(node_online_map);
>> +     numa_reset_distance();
>> +
>> +     ret = init_func();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     if (nodes_empty(numa_nodes_parsed))
>> +             return -EINVAL;
>> +
>> +     ret = numa_register_nodes();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     ret = numa_alloc_distance();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     setup_node_to_cpumask_map();
>> +
>> +     /* init boot processor */
>> +     cpu_to_node_map[0] = 0;
>> +     map_cpu_to_node(0, 0);
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * dummy_numa_init - Fallback dummy NUMA init
>> + *
>> + * Used if there's no underlying NUMA architecture, NUMA initialization
>> + * fails, or NUMA is disabled on the command line.
>> + *
>> + * Must online at least one node and add memory blocks that cover all
>> + * allowed memory.  This function must not fail.
>
> Why can't it fail? It looks like the return value is ignored by numa_init.
This function adds all memblocks to node 0, which is unlikely to fail.
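The review question is what happens if the fallback does fail. The following is a userspace model of the fallback order described in the comment: the firmware-driven init is tried first, and on failure the dummy init places all memory on node 0. It assumes hypothetical names (`fake_of_numa_init`, `arm64_numa_init_sketch`) and a simplified region array in place of memblock. In this model the dummy can only fail when no memory is registered, and the caller propagates that result instead of ignoring it.

```c
#include <assert.h>
#include <stdint.h>

#define NUMA_NO_NODE	(-1)
#define MAX_REGIONS	8	/* hypothetical bound */

struct region {
	uint64_t base, size;
	int nid;
};

static struct region memory[MAX_REGIONS];
static int nr_regions;
static unsigned long nodes_parsed;	/* bit n set => node n has memory */

/* Stand-in for a DT/ACPI init that found no NUMA description. */
static int fake_of_numa_init(void)
{
	return -1;
}

/* Dummy init: put every memory region on node 0. */
static int dummy_numa_init(void)
{
	int i;

	if (nr_regions == 0)
		return -1;	/* nothing to cover: the only failure case here */
	for (i = 0; i < nr_regions; i++)
		memory[i].nid = 0;
	nodes_parsed = 1UL << 0;
	return 0;
}

static int numa_init(int (*init_func)(void))
{
	nodes_parsed = 0;
	if (init_func() < 0)
		return -1;
	return nodes_parsed ? 0 : -1;
}

/* Try the firmware description first, then fall back to one dummy node. */
static int arm64_numa_init_sketch(void)
{
	if (numa_init(fake_of_numa_init) == 0)
		return 0;
	return numa_init(dummy_numa_init);
}
```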


>
> Will
thanks
Ganapat

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
@ 2015-12-17 18:30           ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-12-17 18:30 UTC (permalink / raw)
  To: linux-arm-kernel

Thanks, Will, for the review.

On Thu, Dec 17, 2015 at 10:41 PM, Will Deacon <will.deacon@arm.com> wrote:
> Hello,
>
> This all looks pretty reasonable, but I'd like to see an Ack from a
> devicetree maintainer on the binding before I merge anything (and I see
> that there are outstanding comments from Rutland on that).
IIUC, there are no open comments on the binding.
Mark Rutland: please let me know if there are any open comments;
otherwise, could you please Ack the binding?
>
> On Tue, Nov 17, 2015 at 10:50:40PM +0530, Ganapatrao Kulkarni wrote:
>> Adding numa support for arm64 based platforms.
>> This patch adds by default the dummy numa node and
>> maps all memory and cpus to node 0.
>> using this patch, numa can be simulated on single node arm64 platforms.
>>
>> Reviewed-by: Robert Richter <rrichter@cavium.com>
>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>> ---
>>  arch/arm64/Kconfig              |  25 +++
>>  arch/arm64/include/asm/mmzone.h |  17 ++
>>  arch/arm64/include/asm/numa.h   |  47 +++++
>>  arch/arm64/kernel/setup.c       |   4 +
>>  arch/arm64/kernel/smp.c         |   2 +
>>  arch/arm64/mm/Makefile          |   1 +
>>  arch/arm64/mm/init.c            |  30 +++-
>>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
>>  8 files changed, 506 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/mmzone.h
>>  create mode 100644 arch/arm64/include/asm/numa.h
>>  create mode 100644 arch/arm64/mm/numa.c
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 9ac16a4..7d8fb42 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -71,6 +71,7 @@ config ARM64
>>       select HAVE_GENERIC_DMA_COHERENT
>>       select HAVE_HW_BREAKPOINT if PERF_EVENTS
>>       select HAVE_MEMBLOCK
>> +     select HAVE_MEMBLOCK_NODE_MAP if NUMA
>>       select HAVE_PATA_PLATFORM
>>       select HAVE_PERF_EVENTS
>>       select HAVE_PERF_REGS
>> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
>>         Say Y here to experiment with turning CPUs off and on.  CPUs
>>         can be controlled through /sys/devices/system/cpu.
>>
>> +# Common NUMA Features
>> +config NUMA
>> +     bool "Numa Memory Allocation and Scheduler Support"
>> +     depends on SMP
>> +     help
>> +       Enable NUMA (Non Uniform Memory Access) support.
>> +
>> +       The kernel will try to allocate memory used by a CPU on the
>> +       local memory controller of the CPU and add some more
>> +       NUMA awareness to the kernel.
>
> I appreciate that this is copied from x86, but what exactly do you mean
> by "memory controller" here?
OK, it is fair enough to just say "local memory".
>
>> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
>> new file mode 100644
>> index 0000000..6ddd468
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/mmzone.h
>> @@ -0,0 +1,17 @@
>> +#ifndef __ASM_ARM64_MMZONE_H_
>> +#define __ASM_ARM64_MMZONE_H_
>
> Please try to follow the standard naming for header guards under arm64
> (yes, it's not perfect, but we've made some effort for consistency).
Sure, I will follow the naming used in the other arm64 headers.
>
>> +
>> +#ifdef CONFIG_NUMA
>> +
>> +#include <linux/mmdebug.h>
>> +#include <linux/types.h>
>> +
>> +#include <asm/smp.h>
>> +#include <asm/numa.h>
>> +
>> +extern struct pglist_data *node_data[];
>> +
>> +#define NODE_DATA(nid)               (node_data[(nid)])
>
> This is the same as m32r, metag, parisc, powerpc, s390, sh, sparc, tile
> and x86. Can we make this the default in the core code instead and then
> replace this header file with asm-generic or something?
IIUC, it is the same in most, but not all, architectures.
>
>> +
>> +#endif /* CONFIG_NUMA */
>> +#endif /* __ASM_ARM64_MMZONE_H_ */
>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>> new file mode 100644
>> index 0000000..c00f3a4
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/numa.h
>> @@ -0,0 +1,47 @@
>> +#ifndef _ASM_NUMA_H
>> +#define _ASM_NUMA_H
>
> Same comment on the guards.
ok
>
>> +#include <linux/nodemask.h>
>> +#include <asm/topology.h>
>> +
>> +#ifdef CONFIG_NUMA
>> +
>> +#define NR_NODE_MEMBLKS              (MAX_NUMNODES * 2)
>
> This is only used by the ACPI code afaict, so maybe include it when you
> add that?
ok
>
>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
>
> Where is this used?
Sorry, it was used in v6; I missed deleting it.
>
>> +
>> +/* currently, arm64 implements flat NUMA topology */
>> +#define parent_node(node)    (node)
>> +
>> +extern int __node_distance(int from, int to);
>> +#define node_distance(a, b) __node_distance(a, b)
>> +
>> +/* dummy definitions for pci functions */
>> +#define pcibus_to_node(node) 0
>> +#define cpumask_of_pcibus(bus)       0
>
> There's a bunch of these dummy definitions already available in
> asm-generic/topology.h. Can we use those instead of rolling our own
> please?
>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 17bf39a..8dc9c5d 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -37,6 +37,7 @@
>>
>>  #include <asm/fixmap.h>
>>  #include <asm/memory.h>
>> +#include <asm/numa.h>
>>  #include <asm/sections.h>
>>  #include <asm/setup.h>
>>  #include <asm/sizes.h>
>> @@ -77,6 +78,19 @@ static phys_addr_t max_zone_dma_phys(void)
>>       return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>>  }
>>
>> +#ifdef CONFIG_NUMA
>> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
>> +{
>> +     unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
>> +
>> +     if (IS_ENABLED(CONFIG_ZONE_DMA))
>> +             max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
>> +     max_zone_pfns[ZONE_NORMAL] = max;
>> +
>> +     free_area_init_nodes(max_zone_pfns);
>> +}
>
> This is certainly more readable than the non-numa zone_sizes_init. Is
> there a reason we can't always select HAVE_MEMBLOCK_NODE_MAP and avoid
> having to handle the zone holes explicitly?
Yes, I can look at always selecting HAVE_MEMBLOCK_NODE_MAP
instead of only for NUMA.
I can also experiment with using the NUMA zone_sizes_init for the
non-NUMA case as well, and deleting the current zone_sizes_init.
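To make the zone-limit arithmetic in the NUMA zone_sizes_init() above concrete, here is a minimal user-space sketch. It assumes 4 KiB pages, assumes CONFIG_ZONE_DMA is enabled, and assumes the unquoted part of max_zone_dma_phys() rounds the start of DRAM down to a 4 GiB boundary; the helper fill_max_zone_pfns() is hypothetical and only models what the quoted function stores.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SHIFT  12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

enum { ZONE_DMA, ZONE_NORMAL, MAX_NR_ZONES };

/*
 * Model of max_zone_dma_phys(): the DMA zone ends 4 GiB above the
 * 4 GiB-aligned base of DRAM, or at the end of DRAM if that comes first.
 * The rounding of dram_start is an assumption about the unquoted code.
 */
static uint64_t max_zone_dma_phys(uint64_t dram_start, uint64_t dram_end)
{
	uint64_t offset = dram_start & ~0xffffffffULL; /* round down to 4 GiB */
	uint64_t limit = offset + (1ULL << 32);

	assert(dram_start < dram_end);
	return limit < dram_end ? limit : dram_end;
}

/* Hypothetical helper: fills max_zone_pfns[] as the NUMA zone_sizes_init() does. */
static void fill_max_zone_pfns(uint64_t dram_start, uint64_t dram_end,
			       unsigned long max_pfn,
			       unsigned long max_zone_pfns[MAX_NR_ZONES])
{
	for (int i = 0; i < MAX_NR_ZONES; i++)
		max_zone_pfns[i] = 0;   /* same as the "= {0}" initializer */

	max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys(dram_start, dram_end));
	max_zone_pfns[ZONE_NORMAL] = max_pfn;
}
```

With DRAM spanning 0x80000000 to 0x880000000, ZONE_DMA would end at PFN 0x100000 (the 4 GiB mark) and ZONE_NORMAL at PFN 0x880000. free_area_init_nodes() then derives per-node zone spans and holes from the memblock node map, which is why this version needs no explicit hole handling.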

>
> Also, I couldn't find any calls to memblock_add_node, which seem to be
> expected. What am I missing?
The memblocks are added well before NUMA init, during DT or ACPI parsing;
memblock_set_node is then used to set the node on them.
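The difference between the two approaches can be modelled in a few lines: memblock_add_node() would create a region that already carries a node id, while this patch tags regions that were added earlier. The sketch below is a simplified user-space model, not memblock itself; struct region, set_node() and MAX_NUMNODES = 4 are hypothetical, and unlike the real memblock_set_node() it does not split regions that straddle the range boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)
#define MAX_NUMNODES 4          /* hypothetical value for the model */

/* Hypothetical stand-in for struct memblock_region. */
struct region {
	uint64_t base;
	uint64_t size;
	int nid;
};

/*
 * Tag every region lying entirely inside [base, base + size) with nid and
 * return how many were tagged. Assumes region-aligned ranges.
 */
static int set_node(struct region *r, size_t n, uint64_t base, uint64_t size,
		    int nid)
{
	uint64_t end = base + size;
	int tagged = 0;

	assert(size > 0);
	for (size_t i = 0; i < n; i++) {
		if (r[i].base >= base && r[i].base + r[i].size <= end) {
			r[i].nid = nid;
			tagged++;
		}
	}
	return tagged;
}

/* Same check as the start of numa_register_nodes(): every region needs a valid nid. */
static int all_regions_have_node(const struct region *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (r[i].nid == NUMA_NO_NODE || r[i].nid >= MAX_NUMNODES)
			return 0;
	return 1;
}
```

Until every region has been tagged, the validity check fails, which corresponds to numa_register_nodes() returning -EINVAL.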
>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> new file mode 100644
>> index 0000000..e3afbf8
>> --- /dev/null
>> +++ b/arch/arm64/mm/numa.c
>> @@ -0,0 +1,384 @@
>> +/*
>> + * NUMA support, based on the x86 implementation.
>> + *
>> + * Copyright (C) 2015 Cavium Inc.
>> + * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/bootmem.h>
>> +#include <linux/ctype.h>
>> +#include <linux/init.h>
>> +#include <linux/kernel.h>
>> +#include <linux/mm.h>
>> +#include <linux/memblock.h>
>> +#include <linux/module.h>
>> +#include <linux/mmzone.h>
>> +#include <linux/nodemask.h>
>> +#include <linux/sched.h>
>> +#include <linux/string.h>
>> +#include <linux/topology.h>
>> +
>> +#include <asm/smp_plat.h>
>> +
>> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>> +EXPORT_SYMBOL(node_data);
>> +nodemask_t numa_nodes_parsed __initdata;
>> +int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
>> +
>> +static int numa_off;
>> +static int numa_distance_cnt;
>> +static u8 *numa_distance;
>> +
>> +static __init int numa_parse_early_param(char *opt)
>> +{
>> +     if (!opt)
>> +             return -EINVAL;
>> +     if (!strncmp(opt, "off", 3)) {
>> +             pr_info("%s\n", "NUMA turned off");
>> +             numa_off = 1;
>
> There's a patch kicking around to add this to strtobool:
I will change this once it is upstream; I don't want to add a dependency for this.
>
>   https://lkml.org/lkml/2015/12/9/802
>
> but I can't see it in next :(
>
>> +     }
>> +     return 0;
>> +}
>> +early_param("numa", numa_parse_early_param);
>> +
>> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>> +EXPORT_SYMBOL(node_to_cpumask_map);
>> +
>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>> +/*
>> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
>> + */
>> +const struct cpumask *cpumask_of_node(int node)
>> +{
>> +
>> +     if (WARN_ON(node >= nr_node_ids))
>> +             return cpu_none_mask;
>> +
>> +     if (WARN_ON(node_to_cpumask_map[node] == NULL))
>> +             return cpu_online_mask;
>> +
>> +     return node_to_cpumask_map[node];
>> +}
>> +EXPORT_SYMBOL(cpumask_of_node);
>> +#endif
>> +
>> +static void map_cpu_to_node(unsigned int cpu, int nid)
>> +{
>> +     set_cpu_numa_node(cpu, nid);
>> +     if (nid >= 0)
>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
>> +}
>> +
>> +static void unmap_cpu_to_node(unsigned int cpu)
>> +{
>> +     int nid = cpu_to_node(cpu);
>> +
>> +     if (nid >= 0)
>> +             cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
>> +     set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> +}
>> +
>> +void numa_clear_node(unsigned int cpu)
>> +{
>> +     unmap_cpu_to_node(cpu);
>> +}
>> +
>> +/*
>> + * Allocate node_to_cpumask_map based on number of available nodes
>> + * Requires node_possible_map to be valid.
>> + *
>> + * Note: cpumask_of_node() is not valid until after this is done.
>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
>> + */
>> +static void __init setup_node_to_cpumask_map(void)
>> +{
>> +     unsigned int cpu;
>> +     int node;
>> +
>> +     /* setup nr_node_ids if not done yet */
>> +     if (nr_node_ids == MAX_NUMNODES)
>> +             setup_nr_node_ids();
>> +
>> +     /* allocate and clear the mapping */
>> +     for (node = 0; node < nr_node_ids; node++) {
>> +             alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
>> +             cpumask_clear(node_to_cpumask_map[node]);
>> +     }
>> +
>> +     for_each_possible_cpu(cpu)
>> +             set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> +
>> +     /* cpumask_of_node() will now work */
>> +     pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>> +}
>> +
>> +/*
>> + *  Set the cpu to node and mem mapping
>> + */
>> +void numa_store_cpu_info(unsigned int cpu)
>> +{
>> +     map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
>> +}
>> +
>> +/**
>> + * numa_add_memblk - Set node id to memblk
>> + * @nid: NUMA node ID of the new memblk
>> + * @start: Start address of the new memblk
>> + * @size:  Size of the new memblk
>> + *
>> + * RETURNS:
>> + * 0 on success, -errno on failure.
>> + */
>> +int __init numa_add_memblk(int nid, u64 start, u64 size)
>> +{
>> +     int ret;
>> +
>> +     ret = memblock_set_node(start, size, &memblock.memory, nid);
>> +     if (ret < 0) {
>> +             pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>> +                     start, (start + size - 1), nid);
>> +             return ret;
>> +     }
>> +
>> +     node_set(nid, numa_nodes_parsed);
>> +     pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
>> +                     start, (start + size - 1), nid);
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL(numa_add_memblk);
>> +
>> +/* Initialize NODE_DATA for a node on the local memory */
>> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>> +{
>> +     const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
>> +     u64 nd_pa;
>> +     void *nd;
>> +     int tnid;
>> +
>> +     pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> +                     nid, start_pfn << PAGE_SHIFT,
>> +                     (end_pfn << PAGE_SHIFT) - 1);
>> +
>> +     nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>> +     nd = __va(nd_pa);
>> +
>> +     /* report and initialize */
>> +     pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
>> +             nd_pa, nd_pa + nd_size - 1);
>> +     tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>> +     if (tnid != nid)
>> +             pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
>> +
>> +     node_data[nid] = nd;
>> +     memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>> +     NODE_DATA(nid)->node_id = nid;
>> +     NODE_DATA(nid)->node_start_pfn = start_pfn;
>> +     NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
>> +}
>> +
>> +/**
>> + * numa_reset_distance - Reset NUMA distance table
>> + *
>> + * The current table is freed.
>> + * The next numa_set_distance() call will create a new one.
>> + */
>> +void __init numa_reset_distance(void)
>> +{
>> +     size_t size;
>> +
>> +     if (!numa_distance)
>> +             return;
>> +
>> +     size = numa_distance_cnt * numa_distance_cnt *
>> +             sizeof(numa_distance[0]);
>> +
>> +     memblock_free(__pa(numa_distance), size);
>> +     numa_distance_cnt = 0;
>> +     numa_distance = NULL;
>> +}
>> +
>> +static int __init numa_alloc_distance(void)
>> +{
>> +     size_t size;
>> +     u64 phys;
>> +     int i, j;
>> +
>> +     size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
>> +     phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
>> +                                   size, PAGE_SIZE);
>> +     if (WARN_ON(!phys))
>> +             return -ENOMEM;
>> +
>> +     memblock_reserve(phys, size);
>> +
>> +     numa_distance = __va(phys);
>> +     numa_distance_cnt = nr_node_ids;
>> +
>> +     /* fill with the default distances */
>> +     for (i = 0; i < numa_distance_cnt; i++)
>> +             for (j = 0; j < numa_distance_cnt; j++)
>> +                     numa_distance[i * numa_distance_cnt + j] = i == j ?
>> +                             LOCAL_DISTANCE : REMOTE_DISTANCE;
>> +
>> +     pr_debug("NUMA: Initialized distance table, cnt=%d\n",
>> +                     numa_distance_cnt);
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * numa_set_distance - Set NUMA distance from one NUMA to another
>> + * @from: the 'from' node to set distance
>> + * @to: the 'to'  node to set distance
>> + * @distance: NUMA distance
>> + *
>> + * Set the distance from node @from to @to to @distance.  If distance table
>> + * doesn't exist, one which is large enough to accommodate all the currently
>> + * known nodes will be created.
>> + *
>> + * If such table cannot be allocated, a warning is printed and further
>> + * calls are ignored until the distance table is reset with
>> + * numa_reset_distance().
>> + *
>> + * If @from or @to is higher than the highest known node or lower than zero
>> + * at the time of table creation or @distance doesn't make sense, the call
>> + * is ignored.
>> + * This is to allow simplification of specific NUMA config implementations.
>> + */
>> +void __init numa_set_distance(int from, int to, int distance)
>> +{
>> +     if (!numa_distance)
>> +             return;
>> +
>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>> +                     from < 0 || to < 0) {
>> +             pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>> +                         from, to, distance);
>> +             return;
>> +     }
>> +
>> +     if ((u8)distance != distance ||
>> +         (from == to && distance != LOCAL_DISTANCE)) {
>> +             pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>> +                          from, to, distance);
>> +             return;
>> +     }
>> +
>> +     numa_distance[from * numa_distance_cnt + to] = distance;
>> +}
>> +EXPORT_SYMBOL(numa_set_distance);
>> +
>> +int __node_distance(int from, int to)
>> +{
>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt)
>> +             return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
>> +     return numa_distance[from * numa_distance_cnt + to];
>> +}
>> +EXPORT_SYMBOL(__node_distance);
>
> Much of this is simply a direct copy/paste from x86. Why can't it be
> moved to common code? I don't see anything arch-specific here.
It is not the same for all architectures.
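For reference, the behaviour of the quoted distance-table helpers can be modelled in user space as below. This is a sketch, not the patch code: it uses malloc() in place of memblock, the names alloc_distance(), reset_distance(), set_distance() and node_distance() are hypothetical, and node_distance() adds a guard against negative ids that __node_distance() does not have.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

static int numa_distance_cnt;
static uint8_t *numa_distance;  /* cnt x cnt, row-major: [from][to] */

/* Like numa_alloc_distance(): local on the diagonal, remote elsewhere. */
static int alloc_distance(int nr_nodes)
{
	assert(nr_nodes > 0);
	numa_distance = malloc((size_t)nr_nodes * (size_t)nr_nodes);
	if (!numa_distance)
		return -1;
	numa_distance_cnt = nr_nodes;
	for (int i = 0; i < nr_nodes; i++)
		for (int j = 0; j < nr_nodes; j++)
			numa_distance[i * nr_nodes + j] =
				i == j ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return 0;
}

/* Like numa_reset_distance(): drop the table so lookups fall back. */
static void reset_distance(void)
{
	free(numa_distance);
	numa_distance = NULL;
	numa_distance_cnt = 0;
}

/* Same checks as numa_set_distance(); returns 0 if the value was stored. */
static int set_distance(int from, int to, int distance)
{
	if (!numa_distance)
		return -1;
	if (from < 0 || to < 0 ||
	    from >= numa_distance_cnt || to >= numa_distance_cnt)
		return -1;
	/* must fit in a u8, and a node's distance to itself must be local */
	if ((uint8_t)distance != distance ||
	    (from == to && distance != LOCAL_DISTANCE))
		return -1;
	/* only the from -> to slot is written; the reverse entry is untouched */
	numa_distance[from * numa_distance_cnt + to] = (uint8_t)distance;
	return 0;
}

/* Like __node_distance(): ids outside the table get the default values. */
static int node_distance(int from, int to)
{
	if (from < 0 || to < 0 ||
	    from >= numa_distance_cnt || to >= numa_distance_cnt)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return numa_distance[from * numa_distance_cnt + to];
}
```

One consequence is visible in the model: a single set_distance(0, 1, 30) leaves node_distance(1, 0) at 20, so a firmware description that lists only one entry per node pair has to be mirrored by the caller.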
>
>> +static int __init numa_register_nodes(void)
>> +{
>> +     int nid;
>> +     struct memblock_region *mblk;
>> +
>> +     /* Check that valid nid is set to memblks */
>> +     for_each_memblock(memory, mblk)
>> +             if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES)
>> +                     return -EINVAL;
>> +
>> +     /* Finally register nodes. */
>> +     for_each_node_mask(nid, numa_nodes_parsed) {
>> +             unsigned long start_pfn, end_pfn;
>> +
>> +             get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>> +             setup_node_data(nid, start_pfn, end_pfn);
>> +             node_set_online(nid);
>> +     }
>> +
>> +     /* Setup online nodes to actual nodes*/
>> +     node_possible_map = numa_nodes_parsed;
>> +
>> +     /* Dump memblock with node info and return. */
>> +     memblock_dump_all();
>
> We already call this from arm64_memblock_init. If that's now too early
> to be of any use, we should move it to after bootmem_init, but we should
> probably avoid calling it twice.
Sure, I will change it so that it is called only once.
>
>> +     return 0;
>> +}
>> +
>> +static int __init numa_init(int (*init_func)(void))
>> +{
>> +     int ret;
>> +
>> +     nodes_clear(numa_nodes_parsed);
>> +     nodes_clear(node_possible_map);
>> +     nodes_clear(node_online_map);
>> +     numa_reset_distance();
>> +
>> +     ret = init_func();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     if (nodes_empty(numa_nodes_parsed))
>> +             return -EINVAL;
>> +
>> +     ret = numa_register_nodes();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     ret = numa_alloc_distance();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     setup_node_to_cpumask_map();
>> +
>> +     /* init boot processor */
>> +     cpu_to_node_map[0] = 0;
>> +     map_cpu_to_node(0, 0);
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * dummy_numa_init - Fallback dummy NUMA init
>> + *
>> + * Used if there's no underlying NUMA architecture, NUMA initialization
>> + * fails, or NUMA is disabled on the command line.
>> + *
>> + * Must online at least one node and add memory blocks that cover all
>> + * allowed memory.  This function must not fail.
>
> Why can't it fail? It looks like the return value is ignored by numa_init.
This function adds all memblocks to node 0, so it is unlikely to fail.


>
> Will
thanks
Ganapat

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 2/4] Documentation, dt, arm64/arm: dt bindings for numa.
  2015-12-11 14:41         ` Ganapatrao Kulkarni
@ 2015-12-17 19:07             ` Mark Rutland
  -1 siblings, 0 replies; 38+ messages in thread
From: Mark Rutland @ 2015-12-17 19:07 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Ganapatrao Kulkarni,
	linux-arm-kernel, devicetree, Will Deacon, Catalin Marinas,
	Grant Likely, Leif Lindholm, rfranz,
	Ard Biesheuvel, msalter, Rob Herring,
	Steve Capper, Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll,
	Ian Campbell, Kumar Gala, Rafael J. Wysocki, Len Brown,
	Marc Zyngier, Robert Richter

Hi,

On Fri, Dec 11, 2015 at 08:11:07PM +0530, Ganapatrao Kulkarni wrote:
> On Fri, Dec 11, 2015 at 7:23 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> > Hi,
> >
> > On Tue, Nov 17, 2015 at 10:50:41PM +0530, Ganapatrao Kulkarni wrote:
> >> DT bindings for numa mapping of memory, cores and IOs.
> >>
> >> Reviewed-by: Robert Richter <rrichter@cavium.com>
> >> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
> >
> > Overall this looks good to me. However, I have a couple of concerns.
> thanks.

[...]

> >> +==============================================================================
> >> +2 - numa-node-id
> >> +==============================================================================
> >> +The device node property numa-node-id describes numa domains within a
> >> +machine. This property can be used in device nodes like cpu, memory, bus and
> >> +devices to map to respective numa nodes.
> >> +
> >> +numa-node-id property is a 32-bit integer which defines numa node id to which
> >> +this device node has numa domain association.
> >
> > I'd prefer if the above two paragraphs were replaced with:
> >
> >         For the purpose of identification, each NUMA node is associated
> >         with a unique token known as a node id. For the purpose of this
> >         binding a node id is a 32-bit integer.
> >
> >         A device node is associated with a NUMA node by the presence of
> >         a numa-node-id property which contains the node id of the
> >         device.
> ok, will do.

[...]

> >> +==============================================================================
> >> +3 - distance-map
> >> +==============================================================================
> >> +
> >> +The device tree node distance-map describes the relative
> >> +distance (memory latency) between all numa nodes.
> >
> > Is this not a combined approximation for latency and bandwidth?
> AFAIK, it is to represent inter-node memory access latency.
> >
> >> +- compatible : Should at least contain "numa,distance-map-v1".
> >
> > Please use "numa-distance-map-v1", as "numa" is not a vendor.
> ok
> >
> >> +- distance-matrix
> >> +  This property defines a matrix to describe the relative distances
> >> +  between all numa nodes.
> >> +  It is represented as a list of node pairs and their relative distance.
> >> +
> >> +  Note:
> >> +     1. Each entry represents distance from first node to second node.
> >> +     2. If both directions between 2 nodes have the same distance, only
> >> +            one entry is required.
> >
> > I still don't understand what direction means in this context. Are there
> > systems (of any architecture) which don't have symmetric distances?
> > Which accesses does this apply differently to?
> >
> > Given that, I think that it might be best to explicitly call out
> > distances as being equal, and leave any directionality for a later
> > revision of the binding when we have some semantics for directionality.
> Agreed. Given that there is no known system to substantiate dual direction,
> let us not be explicit about direction.

Regarding your comment in [1], I was expecting a respin of this series
with the above comments addressed. I will not provide an ack until I've
seen that.

Additional concerns below also apply.

> >> +     2. distance-matrix should have entries in lexicographical ascending order of nodes.
> >> +     3. There must be only one Device node distance-map and must reside in the root node.
> >> +
> >> +Example:
> >> +     4 nodes connected in mesh/ring topology as below,
> >> +
> >> +             0_______20______1
> >> +             |               |
> >> +             |               |
> >> +           20|               |20
> >> +             |               |
> >> +             |               |
> >> +             |_______________|
> >> +             3       20      2
> >> +
> >> +     if the relative distance for each hop is 20,
> >> +     then the inter-node distances for this topology will be,
> >> +           0 -> 1 = 20
> >> +           1 -> 2 = 20
> >> +           2 -> 3 = 20
> >> +           3 -> 0 = 20
> >> +           0 -> 2 = 40
> >> +           1 -> 3 = 40
> >
> > How is this scaled relative to a local access?
> this is based on representing local distance with 10 and
> all inter-node latency being represented as multiple of 10.
> 
> >
> > Do we assume that a local access has value 1, e.g. each hop takes 20x a
> > local access in this example?
> The local distance is represented as 10, this is fixed and same as in ACPI.
> Inter-node distance can be any number greater than 10.
> this information can be added here to make it clear.

This seems rather arbitrary.

Why can we not define the local distance in the DT? I appreciate that
the value is hard-coded for ACPI, but we don't have to copy that
limitation.

I'm not sure if asymmetric local distances matter.

Thanks,
Mark.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/394634.html
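A small sketch can show how the 4-node ring example expands into a full distance matrix if, as suggested above, each listed distance is treated as applying in both directions. It assumes each distance-matrix entry is a <from to distance> triple (the binding text only says "node pairs and their relative distance"), and it uses the fixed local distance of 10 described in the reply; struct dist_entry, build_matrix() and ring_distance() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES        4
#define LOCAL_DISTANCE  10  /* fixed local value, as described for ACPI */
#define HOP_DISTANCE    20  /* per-hop cost used in the ring example */

/* Assumption: one <from to distance> triple per distance-matrix entry. */
struct dist_entry {
	int from;
	int to;
	int distance;
};

/* The six pairs listed for the 4-node ring example. */
static const struct dist_entry ring[] = {
	{0, 1, 20}, {1, 2, 20}, {2, 3, 20}, {3, 0, 20},
	{0, 2, 40}, {1, 3, 40},
};

/*
 * Expand the pair list into a full matrix, mirroring every entry so the
 * result is symmetric, with LOCAL_DISTANCE on the diagonal.
 * Returns 0 on success, -1 on a bad entry or a node pair left undescribed.
 */
static int build_matrix(const struct dist_entry *e, size_t n,
			int m[NR_NODES][NR_NODES])
{
	assert(e != NULL || n == 0);  /* caller passes a valid list */

	for (int i = 0; i < NR_NODES; i++)
		for (int j = 0; j < NR_NODES; j++)
			m[i][j] = i == j ? LOCAL_DISTANCE : 0;

	for (size_t k = 0; k < n; k++) {
		int f = e[k].from, t = e[k].to;

		if (f < 0 || t < 0 || f >= NR_NODES || t >= NR_NODES || f == t)
			return -1;
		m[f][t] = e[k].distance;
		m[t][f] = e[k].distance;  /* one entry covers both directions */
	}

	for (int i = 0; i < NR_NODES; i++)
		for (int j = 0; j < NR_NODES; j++)
			if (m[i][j] == 0)
				return -1;      /* pair not described */
	return 0;
}

/* Shortest hop count around the ring times the per-hop cost. */
static int ring_distance(int a, int b)
{
	int d = a > b ? a - b : b - a;
	int hops = d < NR_NODES - d ? d : NR_NODES - d;

	return hops == 0 ? LOCAL_DISTANCE : hops * HOP_DISTANCE;
}
```

Every off-diagonal value in the expanded matrix equals the hop count times 20 (so 0 -> 2 and 1 -> 3 are 40), and dropping the two diagonal pairs makes build_matrix() report an incomplete description.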

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 2/4] Documentation, dt, arm64/arm: dt bindings for numa.
  2015-12-17 19:07             ` Mark Rutland
@ 2015-12-18  3:10               ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-12-18  3:10 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ganapatrao Kulkarni,
	linux-arm-kernel, devicetree, Will Deacon, Catalin Marinas,
	Grant Likely, Leif Lindholm, rfranz,
	Ard Biesheuvel, msalter, Rob Herring,
	Steve Capper, Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll,
	Ian Campbell, Kumar Gala, Rafael J. Wysocki, Len Brown,
	Marc Zyngier, Robert Richter

On Fri, Dec 18, 2015 at 12:37 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi,
>
> On Fri, Dec 11, 2015 at 08:11:07PM +0530, Ganapatrao Kulkarni wrote:
>> On Fri, Dec 11, 2015 at 7:23 PM, Mark Rutland <mark.rutland@arm.com> wrote:
>> > Hi,
>> >
>> > On Tue, Nov 17, 2015 at 10:50:41PM +0530, Ganapatrao Kulkarni wrote:
>> >> DT bindings for numa mapping of memory, cores and IOs.
>> >>
>> >> Reviewed-by: Robert Richter <rrichter@cavium.com>
>> >> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>> >
>> > Overall this looks good to me. However, I have a couple of concerns.
>> thanks.
>
> [...]
>
>> >> +==============================================================================
>> >> +2 - numa-node-id
>> >> +==============================================================================
>> >> +The device node property numa-node-id describes numa domains within a
>> >> +machine. This property can be used in device nodes like cpu, memory, bus and
>> >> +devices to map to respective numa nodes.
>> >> +
>> >> +numa-node-id property is a 32-bit integer which defines numa node id to which
>> >> +this device node has numa domain association.
>> >
>> > I'd prefer if the above two paragraphs were replaced with:
>> >
>> >         For the purpose of identification, each NUMA node is associated
>> >         with a unique token known as a node id. For the purpose of this
>> >         binding a node id is a 32-bit integer.
>> >
>> >         A device node is associated with a NUMA node by the presence of
>> >         a numa-node-id property which contains the node id of the
>> >         device.
>> ok, will do.
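As a rough illustration of the proposed wording: reading the node id back out of a numa-node-id property only requires decoding one 32-bit cell. The helper below is a hypothetical userspace sketch, not kernel code. It assumes the standard devicetree encoding, in which each cell is a big-endian 32-bit value. Inside the kernel, of_property_read_u32() already performs this decoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode the value of a numa-node-id property.
 * Devicetree stores each 32-bit cell big-endian, so the bytes are
 * assembled most-significant first.
 * Returns 0 on success, -1 if the property is not exactly one cell.
 */
static int decode_numa_node_id(const uint8_t *prop, size_t len, uint32_t *nid)
{
	assert(nid != NULL);

	if (prop == NULL || len != sizeof(uint32_t))
		return -1;

	*nid = ((uint32_t)prop[0] << 24) |
	       ((uint32_t)prop[1] << 16) |
	       ((uint32_t)prop[2] << 8) |
	       (uint32_t)prop[3];
	return 0;
}
```

For example, the raw bytes 00 00 00 01 decode to node id 1.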
>
> [...]
>
>> >> +==============================================================================
>> >> +3 - distance-map
>> >> +==============================================================================
>> >> +
>> >> +The device tree node distance-map describes the relative
>> >> +distance (memory latency) between all numa nodes.
>> >
>> > Is this not a combined approximation for latency and bandwidth?
>> AFAIK, it is to represent inter-node memory access latency.
>> >
>> >> +- compatible : Should at least contain "numa,distance-map-v1".
>> >
>> > Please use "numa-distance-map-v1", as "numa" is not a vendor.
>> ok
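For illustration only: "Should at least contain" means that the compatible property, which devicetree encodes as a list of NUL-terminated strings, only has to include the string somewhere in that list. The hypothetical compat_contains() helper below sketches that check in plain C. In the kernel, of_device_is_compatible() or of_find_compatible_node() would be used instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if a devicetree "compatible" value
 * (NUL-terminated strings packed back to back, @len bytes in total)
 * contains @want anywhere in the list.
 */
static bool compat_contains(const char *prop, size_t len, const char *want)
{
	size_t want_len;
	size_t off = 0;

	assert(want != NULL);
	want_len = strlen(want);

	while (prop != NULL && off < len) {
		const char *s = prop + off;
		const char *nul = memchr(s, '\0', len - off);
		size_t n = nul ? (size_t)(nul - s) : len - off;

		if (n == want_len && memcmp(s, want, n) == 0)
			return true;
		off += n + 1;	/* step over the string and its terminator */
	}
	return false;
}
```

With the rename requested above, a match on "numa-distance-map-v1" succeeds, while the old "numa,distance-map-v1" spelling no longer matches.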
>> >
>> >> +- distance-matrix
>> >> +  This property defines a matrix to describe the relative distances
>> >> +  between all numa nodes.
>> >> +  It is represented as a list of node pairs and their relative distance.
>> >> +
>> >> +  Note:
>> >> +     1. Each entry represents distance from first node to second node.
>> >> +     2. If both directions between 2 nodes have the same distance, only
>> >> +            one entry is required.
>> >
>> > I still don't understand what direction means in this context. Are there
>> > systems (of any architecture) which don't have symmetric distances?
>> > Which accesses does this apply differently to?
>> >
>> > Given that, I think that it might be best to explicitly call out
>> > distances as being equal, and leave any directionality for a later
>> > revision of the binding when we have some semantics for directionality.
>> agreed, given that there is no known system to substantiate dual direction,
>> let us not be explicit about direction.
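The symmetric interpretation agreed above can be sketched as follows. The sketch assumes that each node pair appears at most once in distance-matrix, and it applies a single <from to distance> entry to both directions. struct dist_entry, MAX_NODES and fill_symmetric() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8	/* hypothetical bound for this sketch */

/* One distance-matrix entry: <from to distance>. */
struct dist_entry {
	unsigned int from, to, distance;
};

/*
 * Build a full matrix from a list that gives each node pair once.
 * Distances are treated as symmetric, so entry (a, b) is also written
 * to (b, a). Pairs that are not listed keep the caller's default.
 */
static void fill_symmetric(unsigned int m[MAX_NODES][MAX_NODES], unsigned int n,
			   const struct dist_entry *e, size_t count)
{
	assert(n <= MAX_NODES);

	for (size_t i = 0; i < count; i++) {
		if (e[i].from >= n || e[i].to >= n)
			continue;	/* ignore out-of-range node ids */
		m[e[i].from][e[i].to] = e[i].distance;
		m[e[i].to][e[i].from] = e[i].distance;
	}
}
```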
>
> Regarding your comment in [1], I was expecting a respin of this series
> with the above comments addressed. I will not provide an ack until I've
> seen that.
sure, i will respin with the comments addressed.
>
> Additional concerns below also apply.
>
>> >> +     2. distance-matrix should have entries in lexicographical ascending order of nodes.
>> >> +     3. There must be only one distance-map device node, and it must reside in the root node.
>> >> +
>> >> +Example:
>> >> +     4 nodes connected in mesh/ring topology as below,
>> >> +
>> >> +             0_______20______1
>> >> +             |               |
>> >> +             |               |
>> >> +           20|               |20
>> >> +             |               |
>> >> +             |               |
>> >> +             |_______________|
>> >> +             3       20      2
>> >> +
>> >> +     if the relative distance for each hop is 20,
>> >> +     then the inter-node distances for this topology will be:
>> >> +           0 -> 1 = 20
>> >> +           1 -> 2 = 20
>> >> +           2 -> 3 = 20
>> >> +           3 -> 0 = 20
>> >> +           0 -> 2 = 40
>> >> +           1 -> 3 = 40
>> >
>> > How is this scaled relative to a local access?
>> this is based on representing local distance with 10 and
>> all inter-node latency being represented as multiples of 10.
>>
>> >
>> > Do we assume that a local access has value 1, e.g. each hop takes 20x a
>> > local access in this example?
>> The local distance is represented as 10, this is fixed and same as in ACPI.
>> Inter-node distance can be any number greater than 10.
>> this information can be added here to make it clear.
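The ring example can be worked through under the convention described here: the local distance is fixed at 10 and each hop costs 20. Under those assumptions, each distance follows from the hop count along the shorter way round the ring. ring_distance(), distances_valid() and LOCAL_DIST are hypothetical names. The validity rule encodes only the two statements above: the diagonal equals 10, and every other entry is greater than 10.

```c
#include <assert.h>
#include <stdbool.h>

#define LOCAL_DIST 10	/* local distance, fixed at 10 as described above */

/*
 * Distance between nodes a and b on an n-node ring where each hop
 * costs @hop. The shorter way round the ring is used; a node's
 * distance to itself is the local distance.
 */
static unsigned int ring_distance(unsigned int a, unsigned int b,
				  unsigned int n, unsigned int hop)
{
	unsigned int d;

	assert(a < n && b < n);
	if (a == b)
		return LOCAL_DIST;
	d = a > b ? a - b : b - a;
	if (n - d < d)
		d = n - d;		/* go the other way round */
	return d * hop;
}

/*
 * Check the convention: every diagonal entry equals the local distance
 * and every off-diagonal entry is greater than it.
 * @m is a row-major n x n matrix.
 */
static bool distances_valid(const unsigned int *m, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		for (unsigned int j = 0; j < n; j++) {
			unsigned int v = m[i * n + j];

			if (i == j ? v != LOCAL_DIST : v <= LOCAL_DIST)
				return false;
		}
	}
	return true;
}
```

For the 4-node ring with a hop cost of 20, this reproduces the table in the binding: 0 -> 1, 1 -> 2, 2 -> 3 and 3 -> 0 are 20, and 0 -> 2 and 1 -> 3 are 40.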
>
> This seems rather arbitrary.
>
> Why can we not define the local distance in the DT? I appreciate that
> the value is hard-coded for ACPI, but we don't have to copy that
> limitation.
yes, we can mention local distance.
>
> I'm not sure if asymmetric local distances matter.
>
> Thanks,
> Mark.
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/394634.html

thanks
Ganapat
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
  2015-12-17 18:30           ` Ganapatrao Kulkarni
@ 2015-12-22  9:34               ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-12-22  9:34 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland
  Cc: Ganapatrao Kulkarni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Catalin Marinas, Grant Likely,
	Leif Lindholm, rfranz-YGCgFSpz5w/QT0dZR+AlfA, Ard Biesheuvel,
	msalter-H+wXaHxf7aLQT0dZR+AlfA, Rob Herring, Steve Capper,
	Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll, Ian Campbell,
	Kumar Gala, Rafael J. Wysocki, Len Brown, Marc Zyngier,
	Robert Richter, Prasun Kapoor

Hi Will,


On Fri, Dec 18, 2015 at 12:00 AM, Ganapatrao Kulkarni
<gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Thanks Will for the review.
>
> On Thu, Dec 17, 2015 at 10:41 PM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
>> Hello,
>>
>> This all looks pretty reasonable, but I'd like to see an Ack from a
>> devicetree maintainer on the binding before I merge anything (and I see
>> that there are outstanding comments from Rutland on that).
> IIUC, there are no open comments for the binding.
> Mark Rutland: please let me know if there are any open comments;
> otherwise, can you please Ack the binding?
>>
>> On Tue, Nov 17, 2015 at 10:50:40PM +0530, Ganapatrao Kulkarni wrote:
>>> Adding numa support for arm64 based platforms.
>>> This patch adds by default the dummy numa node and
>>> maps all memory and cpus to node 0.
>>> Using this patch, NUMA can be simulated on single-node arm64 platforms.
>>>
>>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>>> ---
>>>  arch/arm64/Kconfig              |  25 +++
>>>  arch/arm64/include/asm/mmzone.h |  17 ++
>>>  arch/arm64/include/asm/numa.h   |  47 +++++
>>>  arch/arm64/kernel/setup.c       |   4 +
>>>  arch/arm64/kernel/smp.c         |   2 +
>>>  arch/arm64/mm/Makefile          |   1 +
>>>  arch/arm64/mm/init.c            |  30 +++-
>>>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
>>>  8 files changed, 506 insertions(+), 4 deletions(-)
>>>  create mode 100644 arch/arm64/include/asm/mmzone.h
>>>  create mode 100644 arch/arm64/include/asm/numa.h
>>>  create mode 100644 arch/arm64/mm/numa.c
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index 9ac16a4..7d8fb42 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -71,6 +71,7 @@ config ARM64
>>>       select HAVE_GENERIC_DMA_COHERENT
>>>       select HAVE_HW_BREAKPOINT if PERF_EVENTS
>>>       select HAVE_MEMBLOCK
>>> +     select HAVE_MEMBLOCK_NODE_MAP if NUMA
>>>       select HAVE_PATA_PLATFORM
>>>       select HAVE_PERF_EVENTS
>>>       select HAVE_PERF_REGS
>>> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
>>>         Say Y here to experiment with turning CPUs off and on.  CPUs
>>>         can be controlled through /sys/devices/system/cpu.
>>>
>>> +# Common NUMA Features
>>> +config NUMA
>>> +     bool "Numa Memory Allocation and Scheduler Support"
>>> +     depends on SMP
>>> +     help
>>> +       Enable NUMA (Non Uniform Memory Access) support.
>>> +
>>> +       The kernel will try to allocate memory used by a CPU on the
>>> +       local memory controller of the CPU and add some more
>>> +       NUMA awareness to the kernel.
>>
>> I appreciate that this is copied from x86, but what exactly do you mean
>> by "memory controller" here?
> ok, it is fair enough to say local memory.
>>
>>> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
>>> new file mode 100644
>>> index 0000000..6ddd468
>>> --- /dev/null
>>> +++ b/arch/arm64/include/asm/mmzone.h
>>> @@ -0,0 +1,17 @@
>>> +#ifndef __ASM_ARM64_MMZONE_H_
>>> +#define __ASM_ARM64_MMZONE_H_
>>
>> Please try to follow the standard naming for header guards under arm64
>> (yes, it's not perfect, but we've made some effort for consistency).
> sure, will follow as in other code.
>>
>>> +
>>> +#ifdef CONFIG_NUMA
>>> +
>>> +#include <linux/mmdebug.h>
>>> +#include <linux/types.h>
>>> +
>>> +#include <asm/smp.h>
>>> +#include <asm/numa.h>
>>> +
>>> +extern struct pglist_data *node_data[];
>>> +
>>> +#define NODE_DATA(nid)               (node_data[(nid)])
>>
>> This is the same as m32r, metag, parisc, powerpc, s390, sh, sparc, tile
>> and x86. Can we make this the default in the core code instead and then
>> replace this header file with asm-generic or something?
> IIUC, it is the same in most, but not all, architectures.
>>
>>> +
>>> +#endif /* CONFIG_NUMA */
>>> +#endif /* __ASM_ARM64_MMZONE_H_ */
>>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>>> new file mode 100644
>>> index 0000000..c00f3a4
>>> --- /dev/null
>>> +++ b/arch/arm64/include/asm/numa.h
>>> @@ -0,0 +1,47 @@
>>> +#ifndef _ASM_NUMA_H
>>> +#define _ASM_NUMA_H
>>
>> Same comment on the guards.
> ok
>>
>>> +#include <linux/nodemask.h>
>>> +#include <asm/topology.h>
>>> +
>>> +#ifdef CONFIG_NUMA
>>> +
>>> +#define NR_NODE_MEMBLKS              (MAX_NUMNODES * 2)
>>
>> This is only used by the ACPI code afaict, so maybe include it when you
>> add that?
> ok
>>
>>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
>>
>> Where is this used?
> sorry, it was used in v6; I missed deleting it.
>>
>>> +
>>> +/* currently, arm64 implements flat NUMA topology */
>>> +#define parent_node(node)    (node)
>>> +
>>> +extern int __node_distance(int from, int to);
>>> +#define node_distance(a, b) __node_distance(a, b)
>>> +
>>> +/* dummy definitions for pci functions */
>>> +#define pcibus_to_node(node) 0
>>> +#define cpumask_of_pcibus(bus)       0
>>
>> There's a bunch of these dummy definitions already available in
>> asm-generic/topology.h. Can we use those instead of rolling our own
>> please?
These are dummy definitions in this patch; I will post a separate numa-pci patch.
>>
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 17bf39a..8dc9c5d 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -37,6 +37,7 @@
>>>
>>>  #include <asm/fixmap.h>
>>>  #include <asm/memory.h>
>>> +#include <asm/numa.h>
>>>  #include <asm/sections.h>
>>>  #include <asm/setup.h>
>>>  #include <asm/sizes.h>
>>> @@ -77,6 +78,19 @@ static phys_addr_t max_zone_dma_phys(void)
>>>       return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>>>  }
>>>
>>> +#ifdef CONFIG_NUMA
>>> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
>>> +{
>>> +     unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
>>> +
>>> +     if (IS_ENABLED(CONFIG_ZONE_DMA))
>>> +             max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
>>> +     max_zone_pfns[ZONE_NORMAL] = max;
>>> +
>>> +     free_area_init_nodes(max_zone_pfns);
>>> +}
>>
>> This is certainly more readable than the non-numa zone_sizes_init. Is
>> there a reason we can't always select HAVE_MEMBLOCK_NODE_MAP and avoid
>> having to handle the zone holes explicitly?
> yes, I can think of always selecting HAVE_MEMBLOCK_NODE_MAP
> instead of only for numa.
> I can experiment with using the numa zone_sizes_init for the non-numa case
> too, and delete the current zone_sizes_init.
If I enable HAVE_MEMBLOCK_NODE_MAP for the non-numa case, I see issues in
sparse_init, possibly due to a dependency on the explicit call of
memory_present() in the non-numa case (not sure).
IMO, this needs to be worked out as a separate patch.
>
>>
>> Also, I couldn't find any calls to memblock_add_node, which seem to be
>> expected. What am I missing?
> memblks are added well before the numa dt or acpi parsing, so
> memblock_set_node is used to set the node.
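The approach described here can be shown with a simplified model. The memory regions already exist by the time the NUMA information is parsed, so the node id is attached to them afterwards rather than added as new regions. This is a hypothetical sketch: struct region, set_node() and NO_NODE are names used only for illustration. Unlike the real memblock_set_node(), it does not split regions that straddle the range boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_NODE (-1)

/* Simplified stand-in for struct memblock_region. */
struct region {
	uint64_t base, size;
	int nid;
};

/*
 * Give every region lying entirely inside [base, base + size) the node
 * id @nid. Regions that only partially overlap are left untouched.
 * Returns the number of regions updated.
 */
static size_t set_node(struct region *r, size_t n, uint64_t base,
		       uint64_t size, int nid)
{
	size_t updated = 0;

	assert(nid >= 0);
	for (size_t i = 0; i < n; i++) {
		if (r[i].base >= base &&
		    r[i].base + r[i].size <= base + size) {
			r[i].nid = nid;
			updated++;
		}
	}
	return updated;
}
```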
>>
>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>> new file mode 100644
>>> index 0000000..e3afbf8
>>> --- /dev/null
>>> +++ b/arch/arm64/mm/numa.c
>>> @@ -0,0 +1,384 @@
>>> +/*
>>> + * NUMA support, based on the x86 implementation.
>>> + *
>>> + * Copyright (C) 2015 Cavium Inc.
>>> + * Author: Ganapatrao Kulkarni <gkulkarni-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2 as
>>> + * published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#include <linux/bootmem.h>
>>> +#include <linux/ctype.h>
>>> +#include <linux/init.h>
>>> +#include <linux/kernel.h>
>>> +#include <linux/mm.h>
>>> +#include <linux/memblock.h>
>>> +#include <linux/module.h>
>>> +#include <linux/mmzone.h>
>>> +#include <linux/nodemask.h>
>>> +#include <linux/sched.h>
>>> +#include <linux/string.h>
>>> +#include <linux/topology.h>
>>> +
>>> +#include <asm/smp_plat.h>
>>> +
>>> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>>> +EXPORT_SYMBOL(node_data);
>>> +nodemask_t numa_nodes_parsed __initdata;
>>> +int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
>>> +
>>> +static int numa_off;
>>> +static int numa_distance_cnt;
>>> +static u8 *numa_distance;
>>> +
>>> +static __init int numa_parse_early_param(char *opt)
>>> +{
>>> +     if (!opt)
>>> +             return -EINVAL;
>>> +     if (!strncmp(opt, "off", 3)) {
>>> +             pr_info("%s\n", "NUMA turned off");
>>> +             numa_off = 1;
>>
>> There's a patch kicking around to add this to strtobool:
> will change once it is upstream; I don't want to have a dependency for this.
>>
>>   https://lkml.org/lkml/2015/12/9/802
>>
>> but I can't see it in next :(
>>
>>> +     }
>>> +     return 0;
>>> +}
>>> +early_param("numa", numa_parse_early_param);
>>> +
>>> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>>> +EXPORT_SYMBOL(node_to_cpumask_map);
>>> +
>>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>>> +/*
>>> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
>>> + */
>>> +const struct cpumask *cpumask_of_node(int node)
>>> +{
>>> +
>>> +     if (WARN_ON(node >= nr_node_ids))
>>> +             return cpu_none_mask;
>>> +
>>> +     if (WARN_ON(node_to_cpumask_map[node] == NULL))
>>> +             return cpu_online_mask;
>>> +
>>> +     return node_to_cpumask_map[node];
>>> +}
>>> +EXPORT_SYMBOL(cpumask_of_node);
>>> +#endif
>>> +
>>> +static void map_cpu_to_node(unsigned int cpu, int nid)
>>> +{
>>> +     set_cpu_numa_node(cpu, nid);
>>> +     if (nid >= 0)
>>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
>>> +}
>>> +
>>> +static void unmap_cpu_to_node(unsigned int cpu)
>>> +{
>>> +     int nid = cpu_to_node(cpu);
>>> +
>>> +     if (nid >= 0)
>>> +             cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
>>> +     set_cpu_numa_node(cpu, NUMA_NO_NODE);
>>> +}
>>> +
>>> +void numa_clear_node(unsigned int cpu)
>>> +{
>>> +     unmap_cpu_to_node(cpu);
>>> +}
>>> +
>>> +/*
>>> + * Allocate node_to_cpumask_map based on number of available nodes
>>> + * Requires node_possible_map to be valid.
>>> + *
>>> + * Note: cpumask_of_node() is not valid until after this is done.
>>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
>>> + */
>>> +static void __init setup_node_to_cpumask_map(void)
>>> +{
>>> +     unsigned int cpu;
>>> +     int node;
>>> +
>>> +     /* setup nr_node_ids if not done yet */
>>> +     if (nr_node_ids == MAX_NUMNODES)
>>> +             setup_nr_node_ids();
>>> +
>>> +     /* allocate and clear the mapping */
>>> +     for (node = 0; node < nr_node_ids; node++) {
>>> +             alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
>>> +             cpumask_clear(node_to_cpumask_map[node]);
>>> +     }
>>> +
>>> +     for_each_possible_cpu(cpu)
>>> +             set_cpu_numa_node(cpu, NUMA_NO_NODE);
>>> +
>>> +     /* cpumask_of_node() will now work */
>>> +     pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>>> +}
>>> +
>>> +/*
>>> + *  Set the cpu to node and mem mapping
>>> + */
>>> +void numa_store_cpu_info(unsigned int cpu)
>>> +{
>>> +     map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
>>> +}
>>> +
>>> +/**
>>> + * numa_add_memblk - Set node id to memblk
>>> + * @nid: NUMA node ID of the new memblk
>>> + * @start: Start address of the new memblk
>>> + * @size:  Size of the new memblk
>>> + *
>>> + * RETURNS:
>>> + * 0 on success, -errno on failure.
>>> + */
>>> +int __init numa_add_memblk(int nid, u64 start, u64 size)
>>> +{
>>> +     int ret;
>>> +
>>> +     ret = memblock_set_node(start, size, &memblock.memory, nid);
>>> +     if (ret < 0) {
>>> +             pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>>> +                     start, (start + size - 1), nid);
>>> +             return ret;
>>> +     }
>>> +
>>> +     node_set(nid, numa_nodes_parsed);
>>> +     pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
>>> +                     start, (start + size - 1), nid);
>>> +     return ret;
>>> +}
>>> +EXPORT_SYMBOL(numa_add_memblk);
>>> +
>>> +/* Initialize NODE_DATA for a node on the local memory */
>>> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>> +{
>>> +     const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
>>> +     u64 nd_pa;
>>> +     void *nd;
>>> +     int tnid;
>>> +
>>> +     pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>>> +                     nid, start_pfn << PAGE_SHIFT,
>>> +                     (end_pfn << PAGE_SHIFT) - 1);
>>> +
>>> +     nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>>> +     nd = __va(nd_pa);
>>> +
>>> +     /* report and initialize */
>>> +     pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
>>> +             nd_pa, nd_pa + nd_size - 1);
>>> +     tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>>> +     if (tnid != nid)
>>> +             pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
>>> +
>>> +     node_data[nid] = nd;
>>> +     memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>>> +     NODE_DATA(nid)->node_id = nid;
>>> +     NODE_DATA(nid)->node_start_pfn = start_pfn;
>>> +     NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
>>> +}
>>> +
>>> +/**
>>> + * numa_reset_distance - Reset NUMA distance table
>>> + *
>>> + * The current table is freed.
>>> + * The next numa_set_distance() call will create a new one.
>>> + */
>>> +void __init numa_reset_distance(void)
>>> +{
>>> +     size_t size;
>>> +
>>> +     if (!numa_distance)
>>> +             return;
>>> +
>>> +     size = numa_distance_cnt * numa_distance_cnt *
>>> +             sizeof(numa_distance[0]);
>>> +
>>> +     memblock_free(__pa(numa_distance), size);
>>> +     numa_distance_cnt = 0;
>>> +     numa_distance = NULL;
>>> +}
>>> +
>>> +static int __init numa_alloc_distance(void)
>>> +{
>>> +     size_t size;
>>> +     u64 phys;
>>> +     int i, j;
>>> +
>>> +     size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
>>> +     phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
>>> +                                   size, PAGE_SIZE);
>>> +     if (WARN_ON(!phys))
>>> +             return -ENOMEM;
>>> +
>>> +     memblock_reserve(phys, size);
>>> +
>>> +     numa_distance = __va(phys);
>>> +     numa_distance_cnt = nr_node_ids;
>>> +
>>> +     /* fill with the default distances */
>>> +     for (i = 0; i < numa_distance_cnt; i++)
>>> +             for (j = 0; j < numa_distance_cnt; j++)
>>> +                     numa_distance[i * numa_distance_cnt + j] = i == j ?
>>> +                             LOCAL_DISTANCE : REMOTE_DISTANCE;
>>> +
>>> +     pr_debug("NUMA: Initialized distance table, cnt=%d\n",
>>> +                     numa_distance_cnt);
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +/**
>>> + * numa_set_distance - Set NUMA distance from one NUMA to another
>>> + * @from: the 'from' node to set distance
>>> + * @to: the 'to'  node to set distance
>>> + * @distance: NUMA distance
>>> + *
>>> + * Set the distance from node @from to @to to @distance.  If distance table
>>> + * doesn't exist, one which is large enough to accommodate all the currently
>>> + * known nodes will be created.
>>> + *
>>> + * If such table cannot be allocated, a warning is printed and further
>>> + * calls are ignored until the distance table is reset with
>>> + * numa_reset_distance().
>>> + *
>>> + * If @from or @to is higher than the highest known node or lower than zero
>>> + * at the time of table creation or @distance doesn't make sense, the call
>>> + * is ignored.
>>> + * This is to allow simplification of specific NUMA config implementations.
>>> + */
>>> +void __init numa_set_distance(int from, int to, int distance)
>>> +{
>>> +     if (!numa_distance)
>>> +             return;
>>> +
>>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>>> +                     from < 0 || to < 0) {
>>> +             pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>>> +                         from, to, distance);
>>> +             return;
>>> +     }
>>> +
>>> +     if ((u8)distance != distance ||
>>> +         (from == to && distance != LOCAL_DISTANCE)) {
>>> +             pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>>> +                          from, to, distance);
>>> +             return;
>>> +     }
>>> +
>>> +     numa_distance[from * numa_distance_cnt + to] = distance;
>>> +}
>>> +EXPORT_SYMBOL(numa_set_distance);
>>> +
>>> +int __node_distance(int from, int to)
>>> +{
>>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt)
>>> +             return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
>>> +     return numa_distance[from * numa_distance_cnt + to];
>>> +}
>>> +EXPORT_SYMBOL(__node_distance);
>>
>> Much of this is simply a direct copy/paste from x86. Why can't it be
>> moved to common code? I don't see anything arch-specific here.
> it is not the same for all architectures.
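The table logic quoted above is small enough to model outside the kernel. The sketch below mirrors the checks in numa_set_distance() and the fallback in __node_distance(). It uses the LOCAL_DISTANCE (10) and REMOTE_DISTANCE (20) values from include/linux/topology.h. It is an illustration, not a proposed common implementation. MAX_NODES and the dist_* names are hypothetical. dist_set() returns an error code where the original returns void, and dist_get() adds a negative-index check that __node_distance() does not have.

```c
#include <assert.h>
#include <stdint.h>

#define LOCAL_DISTANCE	10	/* same values as include/linux/topology.h */
#define REMOTE_DISTANCE	20
#define MAX_NODES	4	/* hypothetical fixed size for this sketch */

static uint8_t dist[MAX_NODES * MAX_NODES];
static int dist_cnt;

/* Fill the table with the defaults, as numa_alloc_distance() does. */
static void dist_init(int cnt)
{
	assert(cnt > 0 && cnt <= MAX_NODES);
	dist_cnt = cnt;
	for (int i = 0; i < cnt; i++)
		for (int j = 0; j < cnt; j++)
			dist[i * cnt + j] = i == j ? LOCAL_DISTANCE : REMOTE_DISTANCE;
}

/* Same checks as numa_set_distance(): bad ids or values are rejected. */
static int dist_set(int from, int to, int d)
{
	if (from < 0 || to < 0 || from >= dist_cnt || to >= dist_cnt)
		return -1;
	if ((uint8_t)d != d || (from == to && d != LOCAL_DISTANCE))
		return -1;
	dist[from * dist_cnt + to] = (uint8_t)d;
	return 0;
}

/* Lookup with the same fallback as __node_distance() for unknown ids. */
static int dist_get(int from, int to)
{
	if (from < 0 || to < 0 || from >= dist_cnt || to >= dist_cnt)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return dist[from * dist_cnt + to];
}
```

Setting (0, 2) leaves (2, 0) at its default. A DT parser that treats distance-matrix entries as symmetric would therefore need to set both directions.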
>>
>>> +static int __init numa_register_nodes(void)
>>> +{
>>> +     int nid;
>>> +     struct memblock_region *mblk;
>>> +
>>> +     /* Check that valid nid is set to memblks */
>>> +     for_each_memblock(memory, mblk)
>>> +             if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES)
>>> +                     return -EINVAL;
>>> +
>>> +     /* Finally register nodes. */
>>> +     for_each_node_mask(nid, numa_nodes_parsed) {
>>> +             unsigned long start_pfn, end_pfn;
>>> +
>>> +             get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>>> +             setup_node_data(nid, start_pfn, end_pfn);
>>> +             node_set_online(nid);
>>> +     }
>>> +
>>> +     /* Setup online nodes to actual nodes*/
>>> +     node_possible_map = numa_nodes_parsed;
>>> +
>>> +     /* Dump memblock with node info and return. */
>>> +     memblock_dump_all();
>>
>> We already call this from arm64_memblock_init. If that's now too early
>> to be of any use, we should move it to after bootmem_init, but we should
>> probably avoid calling it twice.
> sure, will change it so that it is called only once.
>>
>>> +     return 0;
>>> +}
>>> +
>>> +static int __init numa_init(int (*init_func)(void))
>>> +{
>>> +     int ret;
>>> +
>>> +     nodes_clear(numa_nodes_parsed);
>>> +     nodes_clear(node_possible_map);
>>> +     nodes_clear(node_online_map);
>>> +     numa_reset_distance();
>>> +
>>> +     ret = init_func();
>>> +     if (ret < 0)
>>> +             return ret;
>>> +
>>> +     if (nodes_empty(numa_nodes_parsed))
>>> +             return -EINVAL;
>>> +
>>> +     ret = numa_register_nodes();
>>> +     if (ret < 0)
>>> +             return ret;
>>> +
>>> +     ret = numa_alloc_distance();
>>> +     if (ret < 0)
>>> +             return ret;
>>> +
>>> +     setup_node_to_cpumask_map();
>>> +
>>> +     /* init boot processor */
>>> +     cpu_to_node_map[0] = 0;
>>> +     map_cpu_to_node(0, 0);
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +/**
>>> + * dummy_numa_init - Fallback dummy NUMA init
>>> + *
>>> + * Used if there's no underlying NUMA architecture, NUMA initialization
>>> + * fails, or NUMA is disabled on the command line.
>>> + *
>>> + * Must online at least one node and add memory blocks that cover all
>>> + * allowed memory.  This function must not fail.
>>
>> Why can't it fail? It looks like the return value is ignored by numa_init.
> this function adds all memblocks to node 0, which is unlikely to fail.
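The fallback flow under discussion can be sketched as follows: try the firmware (DT) description first, and if that fails, use the dummy single-node setup. The names here (of_init_fails(), dummy_init(), numa_init_sketch(), arch_numa_init_sketch()) are hypothetical. The sketch shows only that checking the dummy's return value as well costs little, and it turns an "unlikely" failure into a reported one.

```c
#include <assert.h>
#include <stdio.h>

static int nodes_online;	/* stand-in for the online node map */

/* Hypothetical firmware parser that fails, e.g. no numa-node-id found. */
static int of_init_fails(void)
{
	return -1;
}

/* Hypothetical dummy init: everything goes on node 0. */
static int dummy_init(void)
{
	nodes_online = 1;
	return 0;
}

static int numa_init_sketch(int (*init_func)(void))
{
	nodes_online = 0;	/* start from a clean state, as numa_init() does */
	return init_func();
}

/*
 * Try the firmware description first, then fall back to the dummy
 * single-node setup. The dummy's result is checked too, so a failure
 * there is reported instead of silently ignored.
 */
static int arch_numa_init_sketch(int (*fw_init)(void))
{
	if (fw_init != NULL && numa_init_sketch(fw_init) == 0)
		return 0;
	if (numa_init_sketch(dummy_init) != 0) {
		fprintf(stderr, "NUMA: dummy init failed\n");
		return -1;
	}
	return 0;
}
```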
>
>
>>
>> Will
> thanks
> Ganapat
thanks
Ganapat

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
@ 2015-12-22  9:34               ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-12-22  9:34 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,


On Fri, Dec 18, 2015 at 12:00 AM, Ganapatrao Kulkarni
<gpkulkarni@gmail.com> wrote:
> Thanks Will for the review.
>
> On Thu, Dec 17, 2015 at 10:41 PM, Will Deacon <will.deacon@arm.com> wrote:
>> Hello,
>>
>> This all looks pretty reasonable, but I'd like to see an Ack from a
>> devicetree maintainer on the binding before I merge anything (and I see
>> that there are outstanding comments from Rutland on that).
> IIUC, there are no open comments for the binding.
> Mark Rutland: please let me know if there are any open comments;
> otherwise, could you please Ack the binding?
>>
>> On Tue, Nov 17, 2015 at 10:50:40PM +0530, Ganapatrao Kulkarni wrote:
>>> Adding NUMA support for arm64 based platforms.
>>> By default, this patch adds a dummy NUMA node and
>>> maps all memory and CPUs to node 0.
>>> Using this patch, NUMA can be simulated on single-node arm64 platforms.
>>>
>>> Reviewed-by: Robert Richter <rrichter@cavium.com>
>>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>>> ---
>>>  arch/arm64/Kconfig              |  25 +++
>>>  arch/arm64/include/asm/mmzone.h |  17 ++
>>>  arch/arm64/include/asm/numa.h   |  47 +++++
>>>  arch/arm64/kernel/setup.c       |   4 +
>>>  arch/arm64/kernel/smp.c         |   2 +
>>>  arch/arm64/mm/Makefile          |   1 +
>>>  arch/arm64/mm/init.c            |  30 +++-
>>>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
>>>  8 files changed, 506 insertions(+), 4 deletions(-)
>>>  create mode 100644 arch/arm64/include/asm/mmzone.h
>>>  create mode 100644 arch/arm64/include/asm/numa.h
>>>  create mode 100644 arch/arm64/mm/numa.c
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index 9ac16a4..7d8fb42 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -71,6 +71,7 @@ config ARM64
>>>       select HAVE_GENERIC_DMA_COHERENT
>>>       select HAVE_HW_BREAKPOINT if PERF_EVENTS
>>>       select HAVE_MEMBLOCK
>>> +     select HAVE_MEMBLOCK_NODE_MAP if NUMA
>>>       select HAVE_PATA_PLATFORM
>>>       select HAVE_PERF_EVENTS
>>>       select HAVE_PERF_REGS
>>> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
>>>         Say Y here to experiment with turning CPUs off and on.  CPUs
>>>         can be controlled through /sys/devices/system/cpu.
>>>
>>> +# Common NUMA Features
>>> +config NUMA
>>> +     bool "Numa Memory Allocation and Scheduler Support"
>>> +     depends on SMP
>>> +     help
>>> +       Enable NUMA (Non Uniform Memory Access) support.
>>> +
>>> +       The kernel will try to allocate memory used by a CPU on the
>>> +       local memory controller of the CPU and add some more
>>> +       NUMA awareness to the kernel.
>>
>> I appreciate that this is copied from x86, but what exactly do you mean
>> by "memory controller" here?
> OK, it is fair to just say "local memory".
>>
>>> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
>>> new file mode 100644
>>> index 0000000..6ddd468
>>> --- /dev/null
>>> +++ b/arch/arm64/include/asm/mmzone.h
>>> @@ -0,0 +1,17 @@
>>> +#ifndef __ASM_ARM64_MMZONE_H_
>>> +#define __ASM_ARM64_MMZONE_H_
>>
>> Please try to follow the standard naming for header guards under arm64
>> (yes, it's not perfect, but we've made some effort for consistency).
> Sure, will follow the naming used in the other arm64 headers.
>>
>>> +
>>> +#ifdef CONFIG_NUMA
>>> +
>>> +#include <linux/mmdebug.h>
>>> +#include <linux/types.h>
>>> +
>>> +#include <asm/smp.h>
>>> +#include <asm/numa.h>
>>> +
>>> +extern struct pglist_data *node_data[];
>>> +
>>> +#define NODE_DATA(nid)               (node_data[(nid)])
>>
>> This is the same as m32r, metag, parisc, powerpc, s390, sh, sparc, tile
>> and x86. Can we make this the default in the core code instead and then
>> replace this header file with asm-generic or something?
> IIUC, it is the same in most, but not all, architectures.
>>
>>> +
>>> +#endif /* CONFIG_NUMA */
>>> +#endif /* __ASM_ARM64_MMZONE_H_ */
>>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>>> new file mode 100644
>>> index 0000000..c00f3a4
>>> --- /dev/null
>>> +++ b/arch/arm64/include/asm/numa.h
>>> @@ -0,0 +1,47 @@
>>> +#ifndef _ASM_NUMA_H
>>> +#define _ASM_NUMA_H
>>
>> Same comment on the guards.
> ok
>>
>>> +#include <linux/nodemask.h>
>>> +#include <asm/topology.h>
>>> +
>>> +#ifdef CONFIG_NUMA
>>> +
>>> +#define NR_NODE_MEMBLKS              (MAX_NUMNODES * 2)
>>
>> This is only used by the ACPI code afaict, so maybe include it when you
>> add that?
> ok
>>
>>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
>>
>> Where is this used?
> Sorry, it was used in v6 and I missed deleting it.
>>
>>> +
>>> +/* currently, arm64 implements flat NUMA topology */
>>> +#define parent_node(node)    (node)
>>> +
>>> +extern int __node_distance(int from, int to);
>>> +#define node_distance(a, b) __node_distance(a, b)
>>> +
>>> +/* dummy definitions for pci functions */
>>> +#define pcibus_to_node(node) 0
>>> +#define cpumask_of_pcibus(bus)       0
>>
>> There's a bunch of these dummy definitions already available in
>> asm-generic/topology.h. Can we use those instead of rolling our own
>> please?
These are dummies in this patch; I will post a separate patch for NUMA PCI support.
>>
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 17bf39a..8dc9c5d 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -37,6 +37,7 @@
>>>
>>>  #include <asm/fixmap.h>
>>>  #include <asm/memory.h>
>>> +#include <asm/numa.h>
>>>  #include <asm/sections.h>
>>>  #include <asm/setup.h>
>>>  #include <asm/sizes.h>
>>> @@ -77,6 +78,19 @@ static phys_addr_t max_zone_dma_phys(void)
>>>       return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>>>  }
>>>
>>> +#ifdef CONFIG_NUMA
>>> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
>>> +{
>>> +     unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
>>> +
>>> +     if (IS_ENABLED(CONFIG_ZONE_DMA))
>>> +             max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
>>> +     max_zone_pfns[ZONE_NORMAL] = max;
>>> +
>>> +     free_area_init_nodes(max_zone_pfns);
>>> +}
>>
>> This is certainly more readable than the non-numa zone_sizes_init. Is
>> there a reason we can't always select HAVE_MEMBLOCK_NODE_MAP and avoid
>> having to handle the zone holes explicitly?
> Yes, I can look at always selecting HAVE_MEMBLOCK_NODE_MAP
> instead of only for NUMA.
> I can experiment with using the NUMA zone_sizes_init for the non-NUMA
> case as well and deleting the current zone_sizes_init.
If I enable HAVE_MEMBLOCK_NODE_MAP for the non-NUMA case, I see issues in
sparse_init, and there could be some dependency on the explicit call to
memory_present() in the non-NUMA case (not sure).
IMO, this needs to be worked out as a separate patch.
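For context on the zone-holes question: with HAVE_MEMBLOCK_NODE_MAP, free_area_init_nodes() works out the holes in each zone from the registered memblock ranges, so the architecture does not have to compute them by hand. The following is a user-space sketch of that arithmetic only, assuming disjoint, hypothetical PFN ranges; zone_absent_pages() is an illustrative name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* A range of present memory, in page frame numbers: [start, end). */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Pages of [zone_start, zone_end) not covered by any present range.
 * Assumes the ranges do not overlap (memblock keeps them disjoint).
 */
unsigned long zone_absent_pages(const struct pfn_range *mem, size_t n,
				unsigned long zone_start,
				unsigned long zone_end)
{
	unsigned long present = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Clamp each present range to the zone boundaries. */
		unsigned long s = mem[i].start > zone_start ? mem[i].start : zone_start;
		unsigned long e = mem[i].end < zone_end ? mem[i].end : zone_end;

		if (s < e)
			present += e - s;
	}
	/* Whatever the zone spans but no range covers is a hole. */
	return (zone_end - zone_start) - present;
}
```

For example, with present ranges [0x80000, 0x90000) and [0xa0000, 0x100000), a zone spanning [0x80000, 0x100000) has a 0x10000-page hole.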
>
>>
>> Also, I couldn't find any calls to memblock_add_node, which seem to be
>> expected. What am I missing?
> memblks are added well before NUMA is set up from DT or ACPI parsing;
> memblock_set_node() is then used to set the node.
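The two-step flow described here (memory registered first without node information, then tagged per node) can be modelled roughly as below. This is a simplified user-space sketch, not the memblock implementation: region boundaries are split at the edges of the requested range so that only the covered part is re-tagged, which is roughly what memblock_set_node() achieves by isolating the range before setting the nid. The fixed array size and the helper names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE	(-1)
#define MAX_REGIONS	16	/* illustrative fixed capacity */

struct region {
	unsigned long long base;
	unsigned long long size;
	int nid;
};

struct region_list {
	struct region r[MAX_REGIONS];
	size_t cnt;
};

/* Split the region strictly containing @addr so a boundary exists there. */
static int split_at(struct region_list *l, unsigned long long addr)
{
	size_t i, j;

	for (i = 0; i < l->cnt; i++) {
		struct region *rg = &l->r[i];

		if (addr > rg->base && addr < rg->base + rg->size) {
			if (l->cnt == MAX_REGIONS)
				return -1;
			/* Shift later regions up by one to make room. */
			for (j = l->cnt; j > i + 1; j--)
				l->r[j] = l->r[j - 1];
			l->r[i + 1].base = addr;
			l->r[i + 1].size = rg->base + rg->size - addr;
			l->r[i + 1].nid = rg->nid;
			rg->size = addr - rg->base;
			l->cnt++;
			return 0;
		}
	}
	return 0;	/* already a boundary, or outside all regions */
}

/* Hypothetical model of memblock_set_node(): tag [base, base + size). */
int set_node_model(struct region_list *l, unsigned long long base,
		   unsigned long long size, int nid)
{
	size_t i;

	if (split_at(l, base) || split_at(l, base + size))
		return -1;

	for (i = 0; i < l->cnt; i++)
		if (l->r[i].base >= base &&
		    l->r[i].base + l->r[i].size <= base + size)
			l->r[i].nid = nid;
	return 0;
}
```

Unlike memblock, the model does not merge adjacent regions that end up with the same nid.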
>>
>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>> new file mode 100644
>>> index 0000000..e3afbf8
>>> --- /dev/null
>>> +++ b/arch/arm64/mm/numa.c
>>> @@ -0,0 +1,384 @@
>>> +/*
>>> + * NUMA support, based on the x86 implementation.
>>> + *
>>> + * Copyright (C) 2015 Cavium Inc.
>>> + * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2 as
>>> + * published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#include <linux/bootmem.h>
>>> +#include <linux/ctype.h>
>>> +#include <linux/init.h>
>>> +#include <linux/kernel.h>
>>> +#include <linux/mm.h>
>>> +#include <linux/memblock.h>
>>> +#include <linux/module.h>
>>> +#include <linux/mmzone.h>
>>> +#include <linux/nodemask.h>
>>> +#include <linux/sched.h>
>>> +#include <linux/string.h>
>>> +#include <linux/topology.h>
>>> +
>>> +#include <asm/smp_plat.h>
>>> +
>>> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>>> +EXPORT_SYMBOL(node_data);
>>> +nodemask_t numa_nodes_parsed __initdata;
>>> +int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
>>> +
>>> +static int numa_off;
>>> +static int numa_distance_cnt;
>>> +static u8 *numa_distance;
>>> +
>>> +static __init int numa_parse_early_param(char *opt)
>>> +{
>>> +     if (!opt)
>>> +             return -EINVAL;
>>> +     if (!strncmp(opt, "off", 3)) {
>>> +             pr_info("%s\n", "NUMA turned off");
>>> +             numa_off = 1;
>>
>> There's a patch kicking around to add this to strtobool:
> Will change once it is upstream; I don't want a dependency on it.
>>
>>   https://lkml.org/lkml/2015/12/9/802
>>
>> but I can't see it in next :(
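If the strtobool extension lands, the early_param handler could become a boolean parse instead of a prefix match on "off". The sketch below models that in user space, assuming the extended parser accepts y/Y/1 and "on" as true and n/N/0 and "off" as false, inspecting only the leading characters; parse_bool_model() and numa_param_model() are hypothetical names, not the kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model of a strtobool that also understands "on"/"off". */
int parse_bool_model(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		/* Only the second character distinguishes "on" from "off". */
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	default:
		break;
	}
	return -EINVAL;
}

static int numa_off_model;

/* Hypothetical "numa=" handler built on the boolean parse. */
int numa_param_model(const char *opt)
{
	bool enable;
	int ret = parse_bool_model(opt, &enable);

	if (ret)
		return ret;
	numa_off_model = !enable;
	return 0;
}
```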
>>
>>> +     }
>>> +     return 0;
>>> +}
>>> +early_param("numa", numa_parse_early_param);
>>> +
>>> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>>> +EXPORT_SYMBOL(node_to_cpumask_map);
>>> +
>>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>>> +/*
>>> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
>>> + */
>>> +const struct cpumask *cpumask_of_node(int node)
>>> +{
>>> +
>>> +     if (WARN_ON(node >= nr_node_ids))
>>> +             return cpu_none_mask;
>>> +
>>> +     if (WARN_ON(node_to_cpumask_map[node] == NULL))
>>> +             return cpu_online_mask;
>>> +
>>> +     return node_to_cpumask_map[node];
>>> +}
>>> +EXPORT_SYMBOL(cpumask_of_node);
>>> +#endif
>>> +
>>> +static void map_cpu_to_node(unsigned int cpu, int nid)
>>> +{
>>> +     set_cpu_numa_node(cpu, nid);
>>> +     if (nid >= 0)
>>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
>>> +}
>>> +
>>> +static void unmap_cpu_to_node(unsigned int cpu)
>>> +{
>>> +     int nid = cpu_to_node(cpu);
>>> +
>>> +     if (nid >= 0)
>>> +             cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
>>> +     set_cpu_numa_node(cpu, NUMA_NO_NODE);
>>> +}
>>> +
>>> +void numa_clear_node(unsigned int cpu)
>>> +{
>>> +     unmap_cpu_to_node(cpu);
>>> +}
>>> +
>>> +/*
>>> + * Allocate node_to_cpumask_map based on number of available nodes
>>> + * Requires node_possible_map to be valid.
>>> + *
>>> + * Note: cpumask_of_node() is not valid until after this is done.
>>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
>>> + */
>>> +static void __init setup_node_to_cpumask_map(void)
>>> +{
>>> +     unsigned int cpu;
>>> +     int node;
>>> +
>>> +     /* setup nr_node_ids if not done yet */
>>> +     if (nr_node_ids == MAX_NUMNODES)
>>> +             setup_nr_node_ids();
>>> +
>>> +     /* allocate and clear the mapping */
>>> +     for (node = 0; node < nr_node_ids; node++) {
>>> +             alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
>>> +             cpumask_clear(node_to_cpumask_map[node]);
>>> +     }
>>> +
>>> +     for_each_possible_cpu(cpu)
>>> +             set_cpu_numa_node(cpu, NUMA_NO_NODE);
>>> +
>>> +     /* cpumask_of_node() will now work */
>>> +     pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>>> +}
>>> +
>>> +/*
>>> + *  Set the cpu to node and mem mapping
>>> + */
>>> +void numa_store_cpu_info(unsigned int cpu)
>>> +{
>>> +     map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
>>> +}
>>> +
>>> +/**
>>> + * numa_add_memblk - Set node id to memblk
>>> + * @nid: NUMA node ID of the new memblk
>>> + * @start: Start address of the new memblk
>>> + * @size:  Size of the new memblk
>>> + *
>>> + * RETURNS:
>>> + * 0 on success, -errno on failure.
>>> + */
>>> +int __init numa_add_memblk(int nid, u64 start, u64 size)
>>> +{
>>> +     int ret;
>>> +
>>> +     ret = memblock_set_node(start, size, &memblock.memory, nid);
>>> +     if (ret < 0) {
>>> +             pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>>> +                     start, (start + size - 1), nid);
>>> +             return ret;
>>> +     }
>>> +
>>> +     node_set(nid, numa_nodes_parsed);
>>> +     pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
>>> +                     start, (start + size - 1), nid);
>>> +     return ret;
>>> +}
>>> +EXPORT_SYMBOL(numa_add_memblk);
>>> +
>>> +/* Initialize NODE_DATA for a node on the local memory */
>>> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>> +{
>>> +     const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
>>> +     u64 nd_pa;
>>> +     void *nd;
>>> +     int tnid;
>>> +
>>> +     pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>>> +                     nid, start_pfn << PAGE_SHIFT,
>>> +                     (end_pfn << PAGE_SHIFT) - 1);
>>> +
>>> +     nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>>> +     nd = __va(nd_pa);
>>> +
>>> +     /* report and initialize */
>>> +     pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
>>> +             nd_pa, nd_pa + nd_size - 1);
>>> +     tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>>> +     if (tnid != nid)
>>> +             pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
>>> +
>>> +     node_data[nid] = nd;
>>> +     memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>>> +     NODE_DATA(nid)->node_id = nid;
>>> +     NODE_DATA(nid)->node_start_pfn = start_pfn;
>>> +     NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
>>> +}
>>> +
>>> +/**
>>> + * numa_reset_distance - Reset NUMA distance table
>>> + *
>>> + * The current table is freed.
>>> + * The next numa_set_distance() call will create a new one.
>>> + */
>>> +void __init numa_reset_distance(void)
>>> +{
>>> +     size_t size;
>>> +
>>> +     if (!numa_distance)
>>> +             return;
>>> +
>>> +     size = numa_distance_cnt * numa_distance_cnt *
>>> +             sizeof(numa_distance[0]);
>>> +
>>> +     memblock_free(__pa(numa_distance), size);
>>> +     numa_distance_cnt = 0;
>>> +     numa_distance = NULL;
>>> +}
>>> +
>>> +static int __init numa_alloc_distance(void)
>>> +{
>>> +     size_t size;
>>> +     u64 phys;
>>> +     int i, j;
>>> +
>>> +     size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
>>> +     phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
>>> +                                   size, PAGE_SIZE);
>>> +     if (WARN_ON(!phys))
>>> +             return -ENOMEM;
>>> +
>>> +     memblock_reserve(phys, size);
>>> +
>>> +     numa_distance = __va(phys);
>>> +     numa_distance_cnt = nr_node_ids;
>>> +
>>> +     /* fill with the default distances */
>>> +     for (i = 0; i < numa_distance_cnt; i++)
>>> +             for (j = 0; j < numa_distance_cnt; j++)
>>> +                     numa_distance[i * numa_distance_cnt + j] = i == j ?
>>> +                             LOCAL_DISTANCE : REMOTE_DISTANCE;
>>> +
>>> +     pr_debug("NUMA: Initialized distance table, cnt=%d\n",
>>> +                     numa_distance_cnt);
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +/**
>>> + * numa_set_distance - Set NUMA distance from one NUMA to another
>>> + * @from: the 'from' node to set distance
>>> + * @to: the 'to'  node to set distance
>>> + * @distance: NUMA distance
>>> + *
>>> + * Set the distance from node @from to @to to @distance.  If distance table
>>> + * doesn't exist, one which is large enough to accommodate all the currently
>>> + * known nodes will be created.
>>> + *
>>> + * If such table cannot be allocated, a warning is printed and further
>>> + * calls are ignored until the distance table is reset with
>>> + * numa_reset_distance().
>>> + *
>>> + * If @from or @to is higher than the highest known node or lower than zero
>>> + * at the time of table creation or @distance doesn't make sense, the call
>>> + * is ignored.
>>> + * This is to allow simplification of specific NUMA config implementations.
>>> + */
>>> +void __init numa_set_distance(int from, int to, int distance)
>>> +{
>>> +     if (!numa_distance)
>>> +             return;
>>> +
>>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>>> +                     from < 0 || to < 0) {
>>> +             pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>>> +                         from, to, distance);
>>> +             return;
>>> +     }
>>> +
>>> +     if ((u8)distance != distance ||
>>> +         (from == to && distance != LOCAL_DISTANCE)) {
>>> +             pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>>> +                          from, to, distance);
>>> +             return;
>>> +     }
>>> +
>>> +     numa_distance[from * numa_distance_cnt + to] = distance;
>>> +}
>>> +EXPORT_SYMBOL(numa_set_distance);
>>> +
>>> +int __node_distance(int from, int to)
>>> +{
>>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt)
>>> +             return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
>>> +     return numa_distance[from * numa_distance_cnt + to];
>>> +}
>>> +EXPORT_SYMBOL(__node_distance);
>>
>> Much of this is simply a direct copy/paste from x86. Why can't it be
>> moved to common code? I don't see anything arch-specific here.
> It is not the same for all architectures.
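The distance table under discussion is a flat nr_node_ids x nr_node_ids byte array indexed as from * cnt + to. The user-space sketch below follows the logic of numa_alloc_distance(), numa_set_distance() and __node_distance() from the patch, with malloc() standing in for memblock and an extra check for negative node ids in the lookup; the *_model names are hypothetical. LOCAL_DISTANCE (10) and REMOTE_DISTANCE (20) match the kernel's default values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LOCAL_DISTANCE	10
#define REMOTE_DISTANCE	20

static uint8_t *dist;		/* flat dist_cnt x dist_cnt matrix */
static int dist_cnt;

/* Allocate the table and fill it with default local/remote distances. */
int alloc_distance_model(int nr_nodes)
{
	int i, j;

	dist = malloc((size_t)nr_nodes * nr_nodes);
	if (!dist)
		return -1;
	dist_cnt = nr_nodes;

	for (i = 0; i < dist_cnt; i++)
		for (j = 0; j < dist_cnt; j++)
			dist[i * dist_cnt + j] =
				i == j ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return 0;
}

/* Ignore out-of-range ids and distances that do not fit in a u8. */
void set_distance_model(int from, int to, int distance)
{
	if (!dist || from < 0 || to < 0 ||
	    from >= dist_cnt || to >= dist_cnt)
		return;
	if ((uint8_t)distance != distance ||
	    (from == to && distance != LOCAL_DISTANCE))
		return;
	dist[from * dist_cnt + to] = (uint8_t)distance;
}

/* Nodes outside the table fall back to the default distances. */
int node_distance_model(int from, int to)
{
	if (from < 0 || to < 0 || from >= dist_cnt || to >= dist_cnt)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return dist[from * dist_cnt + to];
}
```

The setter only writes the from-to direction; a caller that wants a symmetric table has to set both directions.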
>>
>>> +static int __init numa_register_nodes(void)
>>> +{
>>> +     int nid;
>>> +     struct memblock_region *mblk;
>>> +
>>> +     /* Check that valid nid is set to memblks */
>>> +     for_each_memblock(memory, mblk)
>>> +             if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES)
>>> +                     return -EINVAL;
>>> +
>>> +     /* Finally register nodes. */
>>> +     for_each_node_mask(nid, numa_nodes_parsed) {
>>> +             unsigned long start_pfn, end_pfn;
>>> +
>>> +             get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>>> +             setup_node_data(nid, start_pfn, end_pfn);
>>> +             node_set_online(nid);
>>> +     }
>>> +
>>> +     /* Setup online nodes to actual nodes*/
>>> +     node_possible_map = numa_nodes_parsed;
>>> +
>>> +     /* Dump memblock with node info and return. */
>>> +     memblock_dump_all();
>>
>> We already call this from arm64_memblock_init. If that's now too early
>> to be of any use, we should move it to after bootmem_init, but we should
>> probably avoid calling it twice.
> Sure, will change it so that it is called only once.
>>
>>> +     return 0;
>>> +}
>>> +
>>> +static int __init numa_init(int (*init_func)(void))
>>> +{
>>> +     int ret;
>>> +
>>> +     nodes_clear(numa_nodes_parsed);
>>> +     nodes_clear(node_possible_map);
>>> +     nodes_clear(node_online_map);
>>> +     numa_reset_distance();
>>> +
>>> +     ret = init_func();
>>> +     if (ret < 0)
>>> +             return ret;
>>> +
>>> +     if (nodes_empty(numa_nodes_parsed))
>>> +             return -EINVAL;
>>> +
>>> +     ret = numa_register_nodes();
>>> +     if (ret < 0)
>>> +             return ret;
>>> +
>>> +     ret = numa_alloc_distance();
>>> +     if (ret < 0)
>>> +             return ret;
>>> +
>>> +     setup_node_to_cpumask_map();
>>> +
>>> +     /* init boot processor */
>>> +     cpu_to_node_map[0] = 0;
>>> +     map_cpu_to_node(0, 0);
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +/**
>>> + * dummy_numa_init - Fallback dummy NUMA init
>>> + *
>>> + * Used if there's no underlying NUMA architecture, NUMA initialization
>>> + * fails, or NUMA is disabled on the command line.
>>> + *
>>> + * Must online at least one node and add memory blocks that cover all
>>> + * allowed memory.  This function must not fail.
>>
>> Why can't it fail? It looks like the return value is ignored by numa_init.
> This function adds all memblocks to node 0, so it is unlikely to fail.
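To make the question about the return value concrete, here is a hypothetical user-space model of the fallback chain: try a firmware-described topology first and, if that fails, fall back to the single-node dummy, which places every memory region on node 0. All *_model names and the fixed region count are illustrative assumptions. In this model the dummy itself cannot fail, but the caller still sees numa_init()'s validation result, so an error is reported rather than silently dropped; the real numa_init() has further failure points, such as numa_alloc_distance() returning -ENOMEM, that the model omits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUMA_NO_NODE	(-1)
#define NR_REGIONS	3	/* illustrative number of memory regions */

static int region_nid[NR_REGIONS];
static int nodes_parsed;	/* bitmask of nodes seen by an init_func */

/* Hypothetical firmware parser: here it finds no NUMA description. */
static int of_numa_init_model(void)
{
	return -ENODEV;
}

/* Fallback: put every memory region on node 0. */
static int dummy_numa_init_model(void)
{
	size_t i;

	for (i = 0; i < NR_REGIONS; i++)
		region_nid[i] = 0;
	nodes_parsed |= 1;
	return 0;
}

/* Model of numa_init(): reset state, run @init_func, validate result. */
static int numa_init_model(int (*init_func)(void))
{
	size_t i;
	int ret;

	nodes_parsed = 0;
	for (i = 0; i < NR_REGIONS; i++)
		region_nid[i] = NUMA_NO_NODE;

	ret = init_func();
	if (ret < 0)
		return ret;
	if (!nodes_parsed)
		return -EINVAL;

	/* Like numa_register_nodes(): every region needs a valid nid. */
	for (i = 0; i < NR_REGIONS; i++)
		if (region_nid[i] == NUMA_NO_NODE)
			return -EINVAL;
	return 0;
}

/* Hypothetical caller: fall back to the dummy and propagate its result. */
int arm64_numa_init_model(int numa_off)
{
	if (!numa_off && numa_init_model(of_numa_init_model) == 0)
		return 0;
	return numa_init_model(dummy_numa_init_model);
}
```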
>
>
>>
>> Will
> thanks
> Ganapat
thanks
Ganapat

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
  2015-12-22  9:34               ` Ganapatrao Kulkarni
@ 2015-12-22  9:55                   ` Will Deacon
  -1 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2015-12-22  9:55 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Mark Rutland, Ganapatrao Kulkarni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Catalin Marinas, Grant Likely,
	Leif Lindholm, rfranz-YGCgFSpz5w/QT0dZR+AlfA, Ard Biesheuvel,
	msalter-H+wXaHxf7aLQT0dZR+AlfA, Rob Herring, Steve Capper,
	Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll, Ian Campbell,
	Kumar Gala, Rafael J. Wysocki, Len Brown, Marc Zyngier,
	Robert Richter

On Tue, Dec 22, 2015 at 03:04:48PM +0530, Ganapatrao Kulkarni wrote:
> On Fri, Dec 18, 2015 at 12:00 AM, Ganapatrao Kulkarni
> <gpkulkarni@gmail.com> wrote:
> > On Thu, Dec 17, 2015 at 10:41 PM, Will Deacon <will.deacon@arm.com> wrote:
> >> This all looks pretty reasonable, but I'd like to see an Ack from a
> >> devicetree maintainer on the binding before I merge anything (and I see
> >> that there are outstanding comments from Rutland on that).
> > IIUC, there are no open comments for the binding.
> > Mark Rutland: please let me know if there are any open comments;
> > otherwise, could you please Ack the binding?
> >>
> >> On Tue, Nov 17, 2015 at 10:50:40PM +0530, Ganapatrao Kulkarni wrote:
> >>> Adding NUMA support for arm64 based platforms.
> >>> By default, this patch adds a dummy NUMA node and
> >>> maps all memory and CPUs to node 0.
> >>> Using this patch, NUMA can be simulated on single-node arm64 platforms.
> >>>
> >>> Reviewed-by: Robert Richter <rrichter@cavium.com>
> >>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
> >>> ---
> >>>  arch/arm64/Kconfig              |  25 +++
> >>>  arch/arm64/include/asm/mmzone.h |  17 ++
> >>>  arch/arm64/include/asm/numa.h   |  47 +++++
> >>>  arch/arm64/kernel/setup.c       |   4 +
> >>>  arch/arm64/kernel/smp.c         |   2 +
> >>>  arch/arm64/mm/Makefile          |   1 +
> >>>  arch/arm64/mm/init.c            |  30 +++-
> >>>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
> >>>  8 files changed, 506 insertions(+), 4 deletions(-)
> >>>  create mode 100644 arch/arm64/include/asm/mmzone.h
> >>>  create mode 100644 arch/arm64/include/asm/numa.h
> >>>  create mode 100644 arch/arm64/mm/numa.c
> >>>
> >>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >>> index 9ac16a4..7d8fb42 100644
> >>> --- a/arch/arm64/Kconfig
> >>> +++ b/arch/arm64/Kconfig
> >>> @@ -71,6 +71,7 @@ config ARM64
> >>>       select HAVE_GENERIC_DMA_COHERENT
> >>>       select HAVE_HW_BREAKPOINT if PERF_EVENTS
> >>>       select HAVE_MEMBLOCK
> >>> +     select HAVE_MEMBLOCK_NODE_MAP if NUMA
> >>>       select HAVE_PATA_PLATFORM
> >>>       select HAVE_PERF_EVENTS
> >>>       select HAVE_PERF_REGS
> >>> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
> >>>         Say Y here to experiment with turning CPUs off and on.  CPUs
> >>>         can be controlled through /sys/devices/system/cpu.
> >>>
> >>> +# Common NUMA Features
> >>> +config NUMA
> >>> +     bool "Numa Memory Allocation and Scheduler Support"
> >>> +     depends on SMP
> >>> +     help
> >>> +       Enable NUMA (Non Uniform Memory Access) support.
> >>> +
> >>> +       The kernel will try to allocate memory used by a CPU on the
> >>> +       local memory controller of the CPU and add some more
> >>> +       NUMA awareness to the kernel.
> >>
> >> I appreciate that this is copied from x86, but what exactly do you mean
> >> by "memory controller" here?
> > OK, it is fair to just say "local memory".
> >>
> >>> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
> >>> new file mode 100644
> >>> index 0000000..6ddd468
> >>> --- /dev/null
> >>> +++ b/arch/arm64/include/asm/mmzone.h
> >>> @@ -0,0 +1,17 @@
> >>> +#ifndef __ASM_ARM64_MMZONE_H_
> >>> +#define __ASM_ARM64_MMZONE_H_
> >>
> >> Please try to follow the standard naming for header guards under arm64
> >> (yes, it's not perfect, but we've made some effort for consistency).
> > Sure, will follow the naming used in the other arm64 headers.
> >>
> >>> +
> >>> +#ifdef CONFIG_NUMA
> >>> +
> >>> +#include <linux/mmdebug.h>
> >>> +#include <linux/types.h>
> >>> +
> >>> +#include <asm/smp.h>
> >>> +#include <asm/numa.h>
> >>> +
> >>> +extern struct pglist_data *node_data[];
> >>> +
> >>> +#define NODE_DATA(nid)               (node_data[(nid)])
> >>
> >> This is the same as m32r, metag, parisc, powerpc, s390, sh, sparc, tile
> >> and x86. Can we make this the default in the core code instead and then
> >> replace this header file with asm-generic or something?
> > IIUC, it is the same in most, but not all, architectures.

Yes, so we should make the core code take on the behaviour that most
architectures want, and then allow the few remaining architectures that
need to do something differently override those macros. We do this all
over the place in the kernel.
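A minimal sketch of the pattern described above, assuming a hypothetical generic header: the core code supplies the common definition of NODE_DATA() under #ifndef, and an architecture that needs something different defines its own macro before the generic one is seen. The struct below is a stand-in for the kernel's struct pglist_data, and the commented-out override is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct pglist_data (illustrative only). */
struct pglist_data {
	int node_id;
};

#define MAX_NUMNODES 4

/*
 * An architecture header that needs different behaviour would define
 * its own macro first, e.g. (hypothetical):
 * #define NODE_DATA(nid) (arch_node_data_lookup(nid))
 */

/* What a generic header could provide as the default. */
#ifndef NODE_DATA
extern struct pglist_data *node_data[];
#define NODE_DATA(nid) (node_data[(nid)])
#endif

struct pglist_data *node_data[MAX_NUMNODES];

static struct pglist_data node0 = { .node_id = 0 };

/* Hypothetical helper: register node 0 and look it up via the macro. */
int default_node_data_demo(void)
{
	node_data[0] = &node0;
	return NODE_DATA(0)->node_id;
}
```

With this arrangement, arm64 (and the other architectures listed) would not need an mmzone.h of its own just for NODE_DATA().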

> >>
> >>> +
> >>> +#endif /* CONFIG_NUMA */
> >>> +#endif /* __ASM_ARM64_MMZONE_H_ */
> >>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
> >>> new file mode 100644
> >>> index 0000000..c00f3a4
> >>> --- /dev/null
> >>> +++ b/arch/arm64/include/asm/numa.h
> >>> @@ -0,0 +1,47 @@
> >>> +#ifndef _ASM_NUMA_H
> >>> +#define _ASM_NUMA_H
> >>
> >> Same comment on the guards.
> > ok
> >>
> >>> +#include <linux/nodemask.h>
> >>> +#include <asm/topology.h>
> >>> +
> >>> +#ifdef CONFIG_NUMA
> >>> +
> >>> +#define NR_NODE_MEMBLKS              (MAX_NUMNODES * 2)
> >>
> >> This is only used by the ACPI code afaict, so maybe include it when you
> >> add that?
> > ok
> >>
> >>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
> >>
> >> Where is this used?
> > Sorry, it was used in v6 and I missed deleting it.
> >>
> >>> +
> >>> +/* currently, arm64 implements flat NUMA topology */
> >>> +#define parent_node(node)    (node)
> >>> +
> >>> +extern int __node_distance(int from, int to);
> >>> +#define node_distance(a, b) __node_distance(a, b)
> >>> +
> >>> +/* dummy definitions for pci functions */
> >>> +#define pcibus_to_node(node) 0
> >>> +#define cpumask_of_pcibus(bus)       0
> >>
> >> There's a bunch of these dummy definitions already available in
> >> asm-generic/topology.h. Can we use those instead of rolling our own
> >> please?
> These are dummies in this patch; I will post a separate patch for NUMA PCI support.

So you're going to use the generic definitions?
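For comparison, the asm-generic fallbacks, as we understand them, report "no node" (-1, i.e. NUMA_NO_NODE) from pcibus_to_node() and then map such a bus to all CPUs, whereas the dummies in the patch report node 0 and a NULL mask. The user-space model below uses plain unsigned long bitmasks in place of struct cpumask; struct pci_bus, the masks and cpumask_of_node_model() are simplified stand-ins, not the kernel definitions, and bus_cpus() is a hypothetical caller.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)

/* Simplified stand-ins for kernel types (illustrative only). */
struct pci_bus { int number; };
typedef unsigned long cpumask_model_t;	/* one bit per CPU */

static const cpumask_model_t cpu_all_mask_model = 0xfUL;	/* 4 CPUs */
static const cpumask_model_t node_masks[2] = { 0x3UL, 0xcUL };

static const cpumask_model_t *cpumask_of_node_model(int node)
{
	return &node_masks[node];
}

/* Generic-style fallbacks: no bus-to-node information means no node. */
#ifndef pcibus_to_node
#define pcibus_to_node(bus) ((void)(bus), NUMA_NO_NODE)
#endif

#ifndef cpumask_of_pcibus
#define cpumask_of_pcibus(bus)					\
	(pcibus_to_node(bus) == NUMA_NO_NODE ?			\
	 &cpu_all_mask_model :					\
	 cpumask_of_node_model(pcibus_to_node(bus)))
#endif

/* Hypothetical caller: CPUs that may service a device on @bus. */
const cpumask_model_t *bus_cpus(const struct pci_bus *bus)
{
	return cpumask_of_pcibus(bus);
}
```

A caller that dereferences the result gets a usable "any CPU" mask rather than a NULL pointer.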

> >> This is certainly more readable than the non-numa zone_sizes_init. Is
> >> there a reason we can't always select HAVE_MEMBLOCK_NODE_MAP and avoid
> >> having to handle the zone holes explicitly?
> > Yes, I can look at always selecting HAVE_MEMBLOCK_NODE_MAP
> > instead of only for NUMA.
> > I can experiment with using the NUMA zone_sizes_init for the non-NUMA
> > case as well and deleting the current zone_sizes_init.
> If I enable HAVE_MEMBLOCK_NODE_MAP for the non-NUMA case, I see issues in
> sparse_init, and there could be some dependency on the explicit call to
> memory_present() in the non-NUMA case (not sure).
> IMO, this needs to be worked out as a separate patch.

Ok, please can you investigate and send a separate patch then?

> >>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> >>> +/*
> >>> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
> >>> + */
> >>> +const struct cpumask *cpumask_of_node(int node)
> >>> +{
> >>> +
> >>> +     if (WARN_ON(node >= nr_node_ids))
> >>> +             return cpu_none_mask;
> >>> +
> >>> +     if (WARN_ON(node_to_cpumask_map[node] == NULL))
> >>> +             return cpu_online_mask;
> >>> +
> >>> +     return node_to_cpumask_map[node];
> >>> +}
> >>> +EXPORT_SYMBOL(cpumask_of_node);
> >>> +#endif
> >>> +
> >>> +static void map_cpu_to_node(unsigned int cpu, int nid)
> >>> +{
> >>> +     set_cpu_numa_node(cpu, nid);
> >>> +     if (nid >= 0)
> >>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
> >>> +}
> >>> +
> >>> +static void unmap_cpu_to_node(unsigned int cpu)
> >>> +{
> >>> +     int nid = cpu_to_node(cpu);
> >>> +
> >>> +     if (nid >= 0)
> >>> +             cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
> >>> +     set_cpu_numa_node(cpu, NUMA_NO_NODE);
> >>> +}
> >>> +
> >>> +void numa_clear_node(unsigned int cpu)
> >>> +{
> >>> +     unmap_cpu_to_node(cpu);
> >>> +}
> >>> +
> >>> +/*
> >>> + * Allocate node_to_cpumask_map based on number of available nodes
> >>> + * Requires node_possible_map to be valid.
> >>> + *
> >>> + * Note: cpumask_of_node() is not valid until after this is done.
> >>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
> >>> + */
> >>> +static void __init setup_node_to_cpumask_map(void)
> >>> +{
> >>> +     unsigned int cpu;
> >>> +     int node;
> >>> +
> >>> +     /* setup nr_node_ids if not done yet */
> >>> +     if (nr_node_ids == MAX_NUMNODES)
> >>> +             setup_nr_node_ids();
> >>> +
> >>> +     /* allocate and clear the mapping */
> >>> +     for (node = 0; node < nr_node_ids; node++) {
> >>> +             alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
> >>> +             cpumask_clear(node_to_cpumask_map[node]);
> >>> +     }
> >>> +
> >>> +     for_each_possible_cpu(cpu)
> >>> +             set_cpu_numa_node(cpu, NUMA_NO_NODE);
> >>> +
> >>> +     /* cpumask_of_node() will now work */
> >>> +     pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
> >>> +}
> >>> +
> >>> +/*
> >>> + *  Set the cpu to node and mem mapping
> >>> + */
> >>> +void numa_store_cpu_info(unsigned int cpu)
> >>> +{
> >>> +     map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
> >>> +}
> >>> +
> >>> +/**
> >>> + * numa_add_memblk - Set node id to memblk
> >>> + * @nid: NUMA node ID of the new memblk
> >>> + * @start: Start address of the new memblk
> >>> + * @size:  Size of the new memblk
> >>> + *
> >>> + * RETURNS:
> >>> + * 0 on success, -errno on failure.
> >>> + */
> >>> +int __init numa_add_memblk(int nid, u64 start, u64 size)
> >>> +{
> >>> +     int ret;
> >>> +
> >>> +     ret = memblock_set_node(start, size, &memblock.memory, nid);
> >>> +     if (ret < 0) {
> >>> +             pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
> >>> +                     start, (start + size - 1), nid);
> >>> +             return ret;
> >>> +     }
> >>> +
> >>> +     node_set(nid, numa_nodes_parsed);
> >>> +     pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
> >>> +                     start, (start + size - 1), nid);
> >>> +     return ret;
> >>> +}
> >>> +EXPORT_SYMBOL(numa_add_memblk);
> >>> +
> >>> +/* Initialize NODE_DATA for a node on the local memory */
> >>> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
> >>> +{
> >>> +     const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
> >>> +     u64 nd_pa;
> >>> +     void *nd;
> >>> +     int tnid;
> >>> +
> >>> +     pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
> >>> +                     nid, start_pfn << PAGE_SHIFT,
> >>> +                     (end_pfn << PAGE_SHIFT) - 1);
> >>> +
> >>> +     nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
> >>> +     nd = __va(nd_pa);
> >>> +
> >>> +     /* report and initialize */
> >>> +     pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
> >>> +             nd_pa, nd_pa + nd_size - 1);
> >>> +     tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
> >>> +     if (tnid != nid)
> >>> +             pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
> >>> +
> >>> +     node_data[nid] = nd;
> >>> +     memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
> >>> +     NODE_DATA(nid)->node_id = nid;
> >>> +     NODE_DATA(nid)->node_start_pfn = start_pfn;
> >>> +     NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
> >>> +}
> >>> +
> >>> +/**
> >>> + * numa_reset_distance - Reset NUMA distance table
> >>> + *
> >>> + * The current table is freed.
> >>> + * The next numa_set_distance() call will create a new one.
> >>> + */
> >>> +void __init numa_reset_distance(void)
> >>> +{
> >>> +     size_t size;
> >>> +
> >>> +     if (!numa_distance)
> >>> +             return;
> >>> +
> >>> +     size = numa_distance_cnt * numa_distance_cnt *
> >>> +             sizeof(numa_distance[0]);
> >>> +
> >>> +     memblock_free(__pa(numa_distance), size);
> >>> +     numa_distance_cnt = 0;
> >>> +     numa_distance = NULL;
> >>> +}
> >>> +
> >>> +static int __init numa_alloc_distance(void)
> >>> +{
> >>> +     size_t size;
> >>> +     u64 phys;
> >>> +     int i, j;
> >>> +
> >>> +     size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
> >>> +     phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
> >>> +                                   size, PAGE_SIZE);
> >>> +     if (WARN_ON(!phys))
> >>> +             return -ENOMEM;
> >>> +
> >>> +     memblock_reserve(phys, size);
> >>> +
> >>> +     numa_distance = __va(phys);
> >>> +     numa_distance_cnt = nr_node_ids;
> >>> +
> >>> +     /* fill with the default distances */
> >>> +     for (i = 0; i < numa_distance_cnt; i++)
> >>> +             for (j = 0; j < numa_distance_cnt; j++)
> >>> +                     numa_distance[i * numa_distance_cnt + j] = i == j ?
> >>> +                             LOCAL_DISTANCE : REMOTE_DISTANCE;
> >>> +
> >>> +     pr_debug("NUMA: Initialized distance table, cnt=%d\n",
> >>> +                     numa_distance_cnt);
> >>> +
> >>> +     return 0;
> >>> +}
> >>> +
> >>> +/**
> >>> + * numa_set_distance - Set NUMA distance from one NUMA to another
> >>> + * @from: the 'from' node to set distance
> >>> + * @to: the 'to'  node to set distance
> >>> + * @distance: NUMA distance
> >>> + *
> >>> + * Set the distance from node @from to @to to @distance.  If distance table
> >>> + * doesn't exist, one which is large enough to accommodate all the currently
> >>> + * known nodes will be created.
> >>> + *
> >>> + * If such table cannot be allocated, a warning is printed and further
> >>> + * calls are ignored until the distance table is reset with
> >>> + * numa_reset_distance().
> >>> + *
> >>> + * If @from or @to is higher than the highest known node or lower than zero
> >>> + * at the time of table creation or @distance doesn't make sense, the call
> >>> + * is ignored.
> >>> + * This is to allow simplification of specific NUMA config implementations.
> >>> + */
> >>> +void __init numa_set_distance(int from, int to, int distance)
> >>> +{
> >>> +     if (!numa_distance)
> >>> +             return;
> >>> +
> >>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
> >>> +                     from < 0 || to < 0) {
> >>> +             pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
> >>> +                         from, to, distance);
> >>> +             return;
> >>> +     }
> >>> +
> >>> +     if ((u8)distance != distance ||
> >>> +         (from == to && distance != LOCAL_DISTANCE)) {
> >>> +             pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
> >>> +                          from, to, distance);
> >>> +             return;
> >>> +     }
> >>> +
> >>> +     numa_distance[from * numa_distance_cnt + to] = distance;
> >>> +}
> >>> +EXPORT_SYMBOL(numa_set_distance);
> >>> +
> >>> +int __node_distance(int from, int to)
> >>> +{
> >>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt)
> >>> +             return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
> >>> +     return numa_distance[from * numa_distance_cnt + to];
> >>> +}
> >>> +EXPORT_SYMBOL(__node_distance);
> >>
> >> Much of this is simply a direct copy/paste from x86. Why can't it be
> >> moved to common code? I don't see anything arch-specific here.
> > It is not the same for all architectures.

See my earlier comment. Not having identical code for all architectures
doesn't mean it's not worth having a common portion of that in core code.
It makes it easier to maintain, easier to use and easier to extend.
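To show why this code is not arch-specific, here is a minimal userspace model of the flat distance table quoted above. It is an n x n array stored row-major, filled with LOCAL_DISTANCE on the diagonal and REMOTE_DISTANCE elsewhere, and read as table[from * n + to]. It assumes the kernel's default values of 10 and 20. calloc() stands in for the memblock allocation, and the names dist_table, dist_alloc, dist_get and dist_free are hypothetical.

```c
#include <stdlib.h>

/* Default distances (the kernel's defaults are 10 and 20) */
#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

/* Hypothetical userspace model of numa_distance[] / numa_distance_cnt */
struct dist_table {
	unsigned char *d;	/* n * n entries, row-major, like the u8 table */
	int n;
};

/* Allocate an n x n table and fill it with the default distances. */
static int dist_alloc(struct dist_table *t, int n)
{
	int i, j;

	t->d = calloc((size_t)n * n, sizeof(t->d[0]));
	if (!t->d)
		return -1;
	t->n = n;
	for (i = 0; i < n; i++)
		for (j = 0; j < n; j++)
			t->d[i * n + j] = (i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return 0;
}

/* Like __node_distance(): use the defaults for nodes outside the table. */
static int dist_get(const struct dist_table *t, int from, int to)
{
	if (!t->d || from < 0 || to < 0 || from >= t->n || to >= t->n)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return t->d[from * t->n + to];
}

/* Like numa_reset_distance(): free the table and mark it absent. */
static void dist_free(struct dist_table *t)
{
	free(t->d);
	t->d = NULL;
	t->n = 0;
}
```

Unlike the quoted __node_distance(), the model also rejects negative node ids on lookup; that extra check is a choice of the sketch, not of the patch.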

> >>> +/**
> >>> + * dummy_numa_init - Fallback dummy NUMA init
> >>> + *
> >>> + * Used if there's no underlying NUMA architecture, NUMA initialization
> >>> + * fails, or NUMA is disabled on the command line.
> >>> + *
> >>> + * Must online at least one node and add memory blocks that cover all
> >>> + * allowed memory.  This function must not fail.
> >>
> >> Why can't it fail? It looks like the return value is ignored by numa_init.
> > This function adds all memblocks to node 0, so it is unlikely to fail.

Good thing it returns an int, then.
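The return-value point can be illustrated with a minimal sketch of a caller that checks the fallback instead of documenting it as "must not fail". numa_init_sketch() and its function-pointer parameters are hypothetical; in the series, the first call would correspond to the DT-based init added by the later patches and the fallback to dummy_numa_init().

```c
#include <stdio.h>

typedef int (*numa_init_fn)(void);

/*
 * Hypothetical: try the firmware-described init first, fall back to the
 * dummy single-node init, and report (rather than ignore) a failure of
 * the fallback as well. Returns 0 on success or the fallback's error.
 */
static int numa_init_sketch(numa_init_fn fw_init, numa_init_fn dummy_init)
{
	int ret;

	if (fw_init) {
		ret = fw_init();
		if (ret == 0)
			return 0;
		fprintf(stderr, "NUMA: firmware init failed (%d), using dummy\n", ret);
	}

	ret = dummy_init();
	if (ret)
		fprintf(stderr, "NUMA: dummy init failed (%d)\n", ret);
	return ret;
}
```

The caller can then decide what to do with an error (panic, or run without NUMA) instead of silently continuing.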

Will

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
@ 2015-12-22  9:55                   ` Will Deacon
  0 siblings, 0 replies; 38+ messages in thread
From: Will Deacon @ 2015-12-22  9:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 22, 2015 at 03:04:48PM +0530, Ganapatrao Kulkarni wrote:
> On Fri, Dec 18, 2015 at 12:00 AM, Ganapatrao Kulkarni
> <gpkulkarni@gmail.com> wrote:
> > On Thu, Dec 17, 2015 at 10:41 PM, Will Deacon <will.deacon@arm.com> wrote:
> >> This all looks pretty reasonable, but I'd like to see an Ack from a
> >> devicetree maintainer on the binding before I merge anything (and I see
> >> that there are outstanding comments from Rutland on that).
> > IIUC, there are no open comments on the binding.
> > Mark Rutland: please let me know if there are any open comments;
> > otherwise, can you please ack the binding?
> >>
> >> On Tue, Nov 17, 2015 at 10:50:40PM +0530, Ganapatrao Kulkarni wrote:
> > >>> Add NUMA support for arm64-based platforms.
> > >>> By default, this patch adds a dummy NUMA node and
> > >>> maps all memory and CPUs to node 0.
> > >>> Using this patch, NUMA can be simulated on single-node arm64 platforms.
> >>>
> >>> Reviewed-by: Robert Richter <rrichter@cavium.com>
> >>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
> >>> ---
> >>>  arch/arm64/Kconfig              |  25 +++
> >>>  arch/arm64/include/asm/mmzone.h |  17 ++
> >>>  arch/arm64/include/asm/numa.h   |  47 +++++
> >>>  arch/arm64/kernel/setup.c       |   4 +
> >>>  arch/arm64/kernel/smp.c         |   2 +
> >>>  arch/arm64/mm/Makefile          |   1 +
> >>>  arch/arm64/mm/init.c            |  30 +++-
> >>>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
> >>>  8 files changed, 506 insertions(+), 4 deletions(-)
> >>>  create mode 100644 arch/arm64/include/asm/mmzone.h
> >>>  create mode 100644 arch/arm64/include/asm/numa.h
> >>>  create mode 100644 arch/arm64/mm/numa.c
> >>>
> >>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >>> index 9ac16a4..7d8fb42 100644
> >>> --- a/arch/arm64/Kconfig
> >>> +++ b/arch/arm64/Kconfig
> >>> @@ -71,6 +71,7 @@ config ARM64
> >>>       select HAVE_GENERIC_DMA_COHERENT
> >>>       select HAVE_HW_BREAKPOINT if PERF_EVENTS
> >>>       select HAVE_MEMBLOCK
> >>> +     select HAVE_MEMBLOCK_NODE_MAP if NUMA
> >>>       select HAVE_PATA_PLATFORM
> >>>       select HAVE_PERF_EVENTS
> >>>       select HAVE_PERF_REGS
> >>> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
> >>>         Say Y here to experiment with turning CPUs off and on.  CPUs
> >>>         can be controlled through /sys/devices/system/cpu.
> >>>
> >>> +# Common NUMA Features
> >>> +config NUMA
> >>> +     bool "Numa Memory Allocation and Scheduler Support"
> >>> +     depends on SMP
> >>> +     help
> >>> +       Enable NUMA (Non Uniform Memory Access) support.
> >>> +
> >>> +       The kernel will try to allocate memory used by a CPU on the
> >>> +       local memory controller of the CPU and add some more
> >>> +       NUMA awareness to the kernel.
> >>
> >> I appreciate that this is copied from x86, but what exactly do you mean
> >> by "memory controller" here?
> > OK, it is fair to say "local memory" instead.
> >>
> >>> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
> >>> new file mode 100644
> >>> index 0000000..6ddd468
> >>> --- /dev/null
> >>> +++ b/arch/arm64/include/asm/mmzone.h
> >>> @@ -0,0 +1,17 @@
> >>> +#ifndef __ASM_ARM64_MMZONE_H_
> >>> +#define __ASM_ARM64_MMZONE_H_
> >>
> >> Please try to follow the standard naming for header guards under arm64
> >> (yes, it's not perfect, but we've made some effort for consistency).
> > Sure, I will follow the naming used in the other arm64 headers.
> >>
> >>> +
> >>> +#ifdef CONFIG_NUMA
> >>> +
> >>> +#include <linux/mmdebug.h>
> >>> +#include <linux/types.h>
> >>> +
> >>> +#include <asm/smp.h>
> >>> +#include <asm/numa.h>
> >>> +
> >>> +extern struct pglist_data *node_data[];
> >>> +
> >>> +#define NODE_DATA(nid)               (node_data[(nid)])
> >>
> >> This is the same as m32r, metag, parisc, powerpc, s390, sh, sparc, tile
> >> and x86. Can we make this the default in the core code instead and then
> >> replace this header file with asm-generic or something?
> > IIUC, it is the same in most, but not all, architectures.

Yes, so we should make the core code take on the behaviour that most
architectures want, and then allow the few remaining architectures that
need to do something differently override those macros. We do this all
over the place in the kernel.
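The "default in core code, override per architecture" approach is usually done with an #ifndef guard around the default, as in the asm-generic headers. A minimal sketch, assuming a hypothetical generic header that supplies the node_data[]-based NODE_DATA() shared by most architectures; struct pglist_data is cut down to one field and node_id_of() is illustrative only.

```c
/* --- hypothetical arch header: define NODE_DATA here to override --- */
/* #define NODE_DATA(nid) (&contig_page_data) */

/* --- hypothetical generic header --- */
struct pglist_data {
	int node_id;		/* reduced to one field for illustration */
};

extern struct pglist_data *node_data[];

/* Only used if the architecture did not provide its own definition. */
#ifndef NODE_DATA
#define NODE_DATA(nid)	(node_data[(nid)])
#endif

/* --- a user of the macro --- */
static struct pglist_data node0 = { .node_id = 0 };
static struct pglist_data node1 = { .node_id = 1 };
struct pglist_data *node_data[] = { &node0, &node1 };

static int node_id_of(int nid)
{
	return NODE_DATA(nid)->node_id;
}
```

With this layout the arm64 mmzone.h would shrink to nothing, and only the architectures that differ would keep a definition of their own.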

> >>
> >>> +
> >>> +#endif /* CONFIG_NUMA */
> >>> +#endif /* __ASM_ARM64_MMZONE_H_ */
> >>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
> >>> new file mode 100644
> >>> index 0000000..c00f3a4
> >>> --- /dev/null
> >>> +++ b/arch/arm64/include/asm/numa.h
> >>> @@ -0,0 +1,47 @@
> >>> +#ifndef _ASM_NUMA_H
> >>> +#define _ASM_NUMA_H
> >>
> >> Same comment on the guards.
> > ok
> >>
> >>> +#include <linux/nodemask.h>
> >>> +#include <asm/topology.h>
> >>> +
> >>> +#ifdef CONFIG_NUMA
> >>> +
> >>> +#define NR_NODE_MEMBLKS              (MAX_NUMNODES * 2)
> >>
> >> This is only used by the ACPI code afaict, so maybe include it when you
> >> add that?
> > ok
> >>
> >>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
> >>
> >> Where is this used?
> > Sorry, it was used in v6; I missed deleting it.
> >>
> >>> +
> >>> +/* currently, arm64 implements flat NUMA topology */
> >>> +#define parent_node(node)    (node)
> >>> +
> >>> +extern int __node_distance(int from, int to);
> >>> +#define node_distance(a, b) __node_distance(a, b)
> >>> +
> >>> +/* dummy definitions for pci functions */
> >>> +#define pcibus_to_node(node) 0
> >>> +#define cpumask_of_pcibus(bus)       0
> >>
> >> There's a bunch of these dummy definitions already available in
> >> asm-generic/topology.h. Can we use those instead of rolling our own
> >> please?
> These are dummies in this patch; I will post a separate patch for NUMA PCI support.

So you're going to use the generic definitions?

> >> This is certainly more readable than the non-numa zone_sizes_init. Is
> >> there a reason we can't always select HAVE_MEMBLOCK_NODE_MAP and avoid
> >> having to handle the zone holes explicitly?
> > Yes, I can look at always selecting HAVE_MEMBLOCK_NODE_MAP
> > instead of only for NUMA.
> > I can experiment with using the NUMA zone_sizes_init for the non-NUMA case
> > as well and deleting the current zone_sizes_init.
> If I enable HAVE_MEMBLOCK_NODE_MAP for the non-NUMA case, I see issues in
> sparse_init, and there could be some dependency on the explicit call of
> memory_present() (not sure) in the non-NUMA case.
> IMO, this needs to be worked out as a separate patch.

Ok, please can you investigate and send a separate patch then?

> >>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> >>> +/*
> >>> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
> >>> + */
> >>> +const struct cpumask *cpumask_of_node(int node)
> >>> +{
> >>> +
> >>> +     if (WARN_ON(node >= nr_node_ids))
> >>> +             return cpu_none_mask;
> >>> +
> >>> +     if (WARN_ON(node_to_cpumask_map[node] == NULL))
> >>> +             return cpu_online_mask;
> >>> +
> >>> +     return node_to_cpumask_map[node];
> >>> +}
> >>> +EXPORT_SYMBOL(cpumask_of_node);
> >>> +#endif
> >>> +
> >>> +static void map_cpu_to_node(unsigned int cpu, int nid)
> >>> +{
> >>> +     set_cpu_numa_node(cpu, nid);
> >>> +     if (nid >= 0)
> >>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
> >>> +}
> >>> +
> >>> +static void unmap_cpu_to_node(unsigned int cpu)
> >>> +{
> >>> +     int nid = cpu_to_node(cpu);
> >>> +
> >>> +     if (nid >= 0)
> >>> +             cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
> >>> +     set_cpu_numa_node(cpu, NUMA_NO_NODE);
> >>> +}
> >>> +
> >>> +void numa_clear_node(unsigned int cpu)
> >>> +{
> >>> +     unmap_cpu_to_node(cpu);
> >>> +}
> >>> +
> >>> +/*
> >>> + * Allocate node_to_cpumask_map based on number of available nodes
> >>> + * Requires node_possible_map to be valid.
> >>> + *
> >>> + * Note: cpumask_of_node() is not valid until after this is done.
> >>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
> >>> + */
> >>> +static void __init setup_node_to_cpumask_map(void)
> >>> +{
> >>> +     unsigned int cpu;
> >>> +     int node;
> >>> +
> >>> +     /* setup nr_node_ids if not done yet */
> >>> +     if (nr_node_ids == MAX_NUMNODES)
> >>> +             setup_nr_node_ids();
> >>> +
> >>> +     /* allocate and clear the mapping */
> >>> +     for (node = 0; node < nr_node_ids; node++) {
> >>> +             alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
> >>> +             cpumask_clear(node_to_cpumask_map[node]);
> >>> +     }
> >>> +
> >>> +     for_each_possible_cpu(cpu)
> >>> +             set_cpu_numa_node(cpu, NUMA_NO_NODE);
> >>> +
> >>> +     /* cpumask_of_node() will now work */
> >>> +     pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
> >>> +}
> >>> +
> >>> +/*
> >>> + *  Set the cpu to node and mem mapping
> >>> + */
> >>> +void numa_store_cpu_info(unsigned int cpu)
> >>> +{
> >>> +     map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
> >>> +}
> >>> +
> >>> +/**
> >>> + * numa_add_memblk - Set node id to memblk
> >>> + * @nid: NUMA node ID of the new memblk
> >>> + * @start: Start address of the new memblk
> >>> + * @size:  Size of the new memblk
> >>> + *
> >>> + * RETURNS:
> >>> + * 0 on success, -errno on failure.
> >>> + */
> >>> +int __init numa_add_memblk(int nid, u64 start, u64 size)
> >>> +{
> >>> +     int ret;
> >>> +
> >>> +     ret = memblock_set_node(start, size, &memblock.memory, nid);
> >>> +     if (ret < 0) {
> >>> +             pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
> >>> +                     start, (start + size - 1), nid);
> >>> +             return ret;
> >>> +     }
> >>> +
> >>> +     node_set(nid, numa_nodes_parsed);
> >>> +     pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
> >>> +                     start, (start + size - 1), nid);
> >>> +     return ret;
> >>> +}
> >>> +EXPORT_SYMBOL(numa_add_memblk);
> >>> +
> >>> +/* Initialize NODE_DATA for a node on the local memory */
> >>> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
> >>> +{
> >>> +     const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
> >>> +     u64 nd_pa;
> >>> +     void *nd;
> >>> +     int tnid;
> >>> +
> >>> +     pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
> >>> +                     nid, start_pfn << PAGE_SHIFT,
> >>> +                     (end_pfn << PAGE_SHIFT) - 1);
> >>> +
> >>> +     nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
> >>> +     nd = __va(nd_pa);
> >>> +
> >>> +     /* report and initialize */
> >>> +     pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
> >>> +             nd_pa, nd_pa + nd_size - 1);
> >>> +     tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
> >>> +     if (tnid != nid)
> >>> +             pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
> >>> +
> >>> +     node_data[nid] = nd;
> >>> +     memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
> >>> +     NODE_DATA(nid)->node_id = nid;
> >>> +     NODE_DATA(nid)->node_start_pfn = start_pfn;
> >>> +     NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
> >>> +}
> >>> +
> >>> +/**
> >>> + * numa_reset_distance - Reset NUMA distance table
> >>> + *
> >>> + * The current table is freed.
> >>> + * The next numa_set_distance() call will create a new one.
> >>> + */
> >>> +void __init numa_reset_distance(void)
> >>> +{
> >>> +     size_t size;
> >>> +
> >>> +     if (!numa_distance)
> >>> +             return;
> >>> +
> >>> +     size = numa_distance_cnt * numa_distance_cnt *
> >>> +             sizeof(numa_distance[0]);
> >>> +
> >>> +     memblock_free(__pa(numa_distance), size);
> >>> +     numa_distance_cnt = 0;
> >>> +     numa_distance = NULL;
> >>> +}
> >>> +
> >>> +static int __init numa_alloc_distance(void)
> >>> +{
> >>> +     size_t size;
> >>> +     u64 phys;
> >>> +     int i, j;
> >>> +
> >>> +     size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
> >>> +     phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
> >>> +                                   size, PAGE_SIZE);
> >>> +     if (WARN_ON(!phys))
> >>> +             return -ENOMEM;
> >>> +
> >>> +     memblock_reserve(phys, size);
> >>> +
> >>> +     numa_distance = __va(phys);
> >>> +     numa_distance_cnt = nr_node_ids;
> >>> +
> >>> +     /* fill with the default distances */
> >>> +     for (i = 0; i < numa_distance_cnt; i++)
> >>> +             for (j = 0; j < numa_distance_cnt; j++)
> >>> +                     numa_distance[i * numa_distance_cnt + j] = i == j ?
> >>> +                             LOCAL_DISTANCE : REMOTE_DISTANCE;
> >>> +
> >>> +     pr_debug("NUMA: Initialized distance table, cnt=%d\n",
> >>> +                     numa_distance_cnt);
> >>> +
> >>> +     return 0;
> >>> +}
> >>> +
> >>> +/**
> >>> + * numa_set_distance - Set NUMA distance from one NUMA to another
> >>> + * @from: the 'from' node to set distance
> >>> + * @to: the 'to'  node to set distance
> >>> + * @distance: NUMA distance
> >>> + *
> >>> + * Set the distance from node @from to @to to @distance.  If distance table
> >>> + * doesn't exist, one which is large enough to accommodate all the currently
> >>> + * known nodes will be created.
> >>> + *
> >>> + * If such table cannot be allocated, a warning is printed and further
> >>> + * calls are ignored until the distance table is reset with
> >>> + * numa_reset_distance().
> >>> + *
> >>> + * If @from or @to is higher than the highest known node or lower than zero
> >>> + * at the time of table creation or @distance doesn't make sense, the call
> >>> + * is ignored.
> >>> + * This is to allow simplification of specific NUMA config implementations.
> >>> + */
> >>> +void __init numa_set_distance(int from, int to, int distance)
> >>> +{
> >>> +     if (!numa_distance)
> >>> +             return;
> >>> +
> >>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
> >>> +                     from < 0 || to < 0) {
> >>> +             pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
> >>> +                         from, to, distance);
> >>> +             return;
> >>> +     }
> >>> +
> >>> +     if ((u8)distance != distance ||
> >>> +         (from == to && distance != LOCAL_DISTANCE)) {
> >>> +             pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
> >>> +                          from, to, distance);
> >>> +             return;
> >>> +     }
> >>> +
> >>> +     numa_distance[from * numa_distance_cnt + to] = distance;
> >>> +}
> >>> +EXPORT_SYMBOL(numa_set_distance);
> >>> +
> >>> +int __node_distance(int from, int to)
> >>> +{
> >>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt)
> >>> +             return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
> >>> +     return numa_distance[from * numa_distance_cnt + to];
> >>> +}
> >>> +EXPORT_SYMBOL(__node_distance);
> >>
> >> Much of this is simply a direct copy/paste from x86. Why can't it be
> >> moved to common code? I don't see anything arch-specific here.
> > It is not the same for all architectures.

See my earlier comment. Not having identical code for all architectures
doesn't mean it's not worth having a common portion of that in core code.
It makes it easier to maintain, easier to use and easier to extend.
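As one example of a piece that could live in core code, the argument checks in numa_set_distance() above are identical on arm64 and x86. A minimal sketch of them as a standalone predicate; numa_distance_valid() is a hypothetical name, and LOCAL_DISTANCE is assumed to be the kernel default of 10.

```c
#define LOCAL_DISTANCE  10

/*
 * Hypothetical helper: returns 1 if numa_set_distance(from, to, distance)
 * should update a table with cnt nodes, 0 if the call should be ignored.
 * Mirrors the checks in the code quoted above.
 */
static int numa_distance_valid(int from, int to, int distance, int cnt)
{
	/* node ids must index an existing row and column */
	if (from < 0 || to < 0 || from >= cnt || to >= cnt)
		return 0;
	/* the table stores u8, so the value must survive a round trip */
	if ((unsigned char)distance != distance)
		return 0;
	/* a node's distance to itself must be LOCAL_DISTANCE */
	if (from == to && distance != LOCAL_DISTANCE)
		return 0;
	return 1;
}
```

The (unsigned char) round trip rejects both negative values and values above 255, which is what the quoted (u8)distance != distance test does.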

> >>> +/**
> >>> + * dummy_numa_init - Fallback dummy NUMA init
> >>> + *
> >>> + * Used if there's no underlying NUMA architecture, NUMA initialization
> >>> + * fails, or NUMA is disabled on the command line.
> >>> + *
> >>> + * Must online at least one node and add memory blocks that cover all
> >>> + * allowed memory.  This function must not fail.
> >>
> >> Why can't it fail? It looks like the return value is ignored by numa_init.
> > This function adds all memblocks to node 0, so it is unlikely to fail.

Good thing it returns an int, then.

Will

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
  2015-12-22  9:55                   ` Will Deacon
@ 2015-12-22 13:43                       ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-12-22 13:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, Ganapatrao Kulkarni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Catalin Marinas, Grant Likely,
	Leif Lindholm, rfranz-YGCgFSpz5w/QT0dZR+AlfA, Ard Biesheuvel,
	msalter-H+wXaHxf7aLQT0dZR+AlfA, Rob Herring, Steve Capper,
	Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll, Ian Campbell,
	Kumar Gala, Rafael J. Wysocki, Len Brown, Marc Zyngier,
	Robert Richter

On Tue, Dec 22, 2015 at 3:25 PM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
> On Tue, Dec 22, 2015 at 03:04:48PM +0530, Ganapatrao Kulkarni wrote:
>> On Fri, Dec 18, 2015 at 12:00 AM, Ganapatrao Kulkarni
>> <gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> > On Thu, Dec 17, 2015 at 10:41 PM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
>> >> This all looks pretty reasonable, but I'd like to see an Ack from a
>> >> devicetree maintainer on the binding before I merge anything (and I see
>> >> that there are outstanding comments from Rutland on that).
>> > IIUC, there are no open comments on the binding.
>> > Mark Rutland: please let me know if there are any open comments;
>> > otherwise, can you please ack the binding?
>> >>
>> >> On Tue, Nov 17, 2015 at 10:50:40PM +0530, Ganapatrao Kulkarni wrote:
>> >>> Add NUMA support for arm64-based platforms.
>> >>> By default, this patch adds a dummy NUMA node and
>> >>> maps all memory and CPUs to node 0.
>> >>> Using this patch, NUMA can be simulated on single-node arm64 platforms.
>> >>>
>> >>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>> >>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>> >>> ---
>> >>>  arch/arm64/Kconfig              |  25 +++
>> >>>  arch/arm64/include/asm/mmzone.h |  17 ++
>> >>>  arch/arm64/include/asm/numa.h   |  47 +++++
>> >>>  arch/arm64/kernel/setup.c       |   4 +
>> >>>  arch/arm64/kernel/smp.c         |   2 +
>> >>>  arch/arm64/mm/Makefile          |   1 +
>> >>>  arch/arm64/mm/init.c            |  30 +++-
>> >>>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
>> >>>  8 files changed, 506 insertions(+), 4 deletions(-)
>> >>>  create mode 100644 arch/arm64/include/asm/mmzone.h
>> >>>  create mode 100644 arch/arm64/include/asm/numa.h
>> >>>  create mode 100644 arch/arm64/mm/numa.c
>> >>>
>> >>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> >>> index 9ac16a4..7d8fb42 100644
>> >>> --- a/arch/arm64/Kconfig
>> >>> +++ b/arch/arm64/Kconfig
>> >>> @@ -71,6 +71,7 @@ config ARM64
>> >>>       select HAVE_GENERIC_DMA_COHERENT
>> >>>       select HAVE_HW_BREAKPOINT if PERF_EVENTS
>> >>>       select HAVE_MEMBLOCK
>> >>> +     select HAVE_MEMBLOCK_NODE_MAP if NUMA
>> >>>       select HAVE_PATA_PLATFORM
>> >>>       select HAVE_PERF_EVENTS
>> >>>       select HAVE_PERF_REGS
>> >>> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
>> >>>         Say Y here to experiment with turning CPUs off and on.  CPUs
>> >>>         can be controlled through /sys/devices/system/cpu.
>> >>>
>> >>> +# Common NUMA Features
>> >>> +config NUMA
>> >>> +     bool "Numa Memory Allocation and Scheduler Support"
>> >>> +     depends on SMP
>> >>> +     help
>> >>> +       Enable NUMA (Non Uniform Memory Access) support.
>> >>> +
>> >>> +       The kernel will try to allocate memory used by a CPU on the
>> >>> +       local memory controller of the CPU and add some more
>> >>> +       NUMA awareness to the kernel.
>> >>
>> >> I appreciate that this is copied from x86, but what exactly do you mean
>> >> by "memory controller" here?
>> > OK, it is fair to say "local memory" instead.
>> >>
>> >>> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
>> >>> new file mode 100644
>> >>> index 0000000..6ddd468
>> >>> --- /dev/null
>> >>> +++ b/arch/arm64/include/asm/mmzone.h
>> >>> @@ -0,0 +1,17 @@
>> >>> +#ifndef __ASM_ARM64_MMZONE_H_
>> >>> +#define __ASM_ARM64_MMZONE_H_
>> >>
>> >> Please try to follow the standard naming for header guards under arm64
>> >> (yes, it's not perfect, but we've made some effort for consistency).
>> > Sure, I will follow the naming used in the other arm64 headers.
>> >>
>> >>> +
>> >>> +#ifdef CONFIG_NUMA
>> >>> +
>> >>> +#include <linux/mmdebug.h>
>> >>> +#include <linux/types.h>
>> >>> +
>> >>> +#include <asm/smp.h>
>> >>> +#include <asm/numa.h>
>> >>> +
>> >>> +extern struct pglist_data *node_data[];
>> >>> +
>> >>> +#define NODE_DATA(nid)               (node_data[(nid)])
>> >>
>> >> This is the same as m32r, metag, parisc, powerpc, s390, sh, sparc, tile
>> >> and x86. Can we make this the default in the core code instead and then
>> >> replace this header file with asm-generic or something?
>> > IIUC, it is the same in most, but not all, architectures.
>
> Yes, so we should make the core code take on the behaviour that most
> architectures want, and then allow the few remaining architectures that
> need to do something differently override those macros. We do this all
> over the place in the kernel.
I totally agree with you.
We will do this in subsequent clean-up patches, once this series is upstreamed.
Please let us not mix it into this series.
>
>> >>
>> >>> +
>> >>> +#endif /* CONFIG_NUMA */
>> >>> +#endif /* __ASM_ARM64_MMZONE_H_ */
>> >>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>> >>> new file mode 100644
>> >>> index 0000000..c00f3a4
>> >>> --- /dev/null
>> >>> +++ b/arch/arm64/include/asm/numa.h
>> >>> @@ -0,0 +1,47 @@
>> >>> +#ifndef _ASM_NUMA_H
>> >>> +#define _ASM_NUMA_H
>> >>
>> >> Same comment on the guards.
>> > ok
>> >>
>> >>> +#include <linux/nodemask.h>
>> >>> +#include <asm/topology.h>
>> >>> +
>> >>> +#ifdef CONFIG_NUMA
>> >>> +
>> >>> +#define NR_NODE_MEMBLKS              (MAX_NUMNODES * 2)
>> >>
>> >> This is only used by the ACPI code afaict, so maybe include it when you
>> >> add that?
>> > ok
>> >>
>> >>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
>> >>
>> >> Where is this used?
>> > Sorry, it was used in v6; I missed deleting it.
>> >>
>> >>> +
>> >>> +/* currently, arm64 implements flat NUMA topology */
>> >>> +#define parent_node(node)    (node)
>> >>> +
>> >>> +extern int __node_distance(int from, int to);
>> >>> +#define node_distance(a, b) __node_distance(a, b)
>> >>> +
>> >>> +/* dummy definitions for pci functions */
>> >>> +#define pcibus_to_node(node) 0
>> >>> +#define cpumask_of_pcibus(bus)       0
>> >>
>> >> There's a bunch of these dummy definitions already available in
>> >> asm-generic/topology.h. Can we use those instead of rolling our own
>> >> please?
>> These are dummies in this patch; I will post a separate patch for NUMA PCI support.
>
> So you're going to use the generic definitions?
The generic definitions are different from what we need.
For PCI, I have sent a separate patch.
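For comparison, the fallbacks in asm-generic/topology.h (as I read them) make pcibus_to_node() evaluate to -1, meaning "no node", and cpumask_of_pcibus() then fall back to all CPUs; the arm64 dummies quoted above instead return node 0 and a null mask pointer. The userspace model below reproduces the generic shape. cpumask_t, all_cpus (standing in for cpu_all_mask), node_masks and the cpumask_of_node() stub are simplified assumptions, not the kernel's types.

```c
/* Simplified stand-ins for kernel types (assumptions for illustration). */
typedef unsigned long cpumask_t;	/* one bit per CPU */
struct pci_bus;				/* opaque here */

static const cpumask_t all_cpus = 0xfUL;		/* pretend 4 CPUs */
static const cpumask_t node_masks[] = { 0x3UL, 0xcUL };	/* 2 nodes */

static const cpumask_t *cpumask_of_node(int node)
{
	return &node_masks[node];
}

/* Shape of the generic fallbacks: "no node", so use every CPU. */
#ifndef pcibus_to_node
#define pcibus_to_node(bus)	((void)(bus), -1)
#endif

#ifndef cpumask_of_pcibus
#define cpumask_of_pcibus(bus)	(pcibus_to_node(bus) == -1 ?	\
				 &all_cpus :			\
				 cpumask_of_node(pcibus_to_node(bus)))
#endif
```

Because both macros are wrapped in #ifndef, an architecture with real PCI-to-node information can still provide its own definitions.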
>
>> >> This is certainly more readable than the non-numa zone_sizes_init. Is
>> >> there a reason we can't always select HAVE_MEMBLOCK_NODE_MAP and avoid
>> >> having to handle the zone holes explicitly?
>> > Yes, I can look at always selecting HAVE_MEMBLOCK_NODE_MAP
>> > instead of only for NUMA.
>> > I can experiment with using the NUMA zone_sizes_init for the non-NUMA case
>> > as well and deleting the current zone_sizes_init.
>> If I enable HAVE_MEMBLOCK_NODE_MAP for the non-NUMA case, I see issues in
>> sparse_init, and there could be some dependency on the explicit call of
>> memory_present() (not sure) in the non-NUMA case.
>> IMO, this needs to be worked out as a separate patch.
>
> Ok, please can you investigate and send a separate patch then?
Sure, but not in this series, please.
>
>> >>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>> >>> +/*
>> >>> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
>> >>> + */
>> >>> +const struct cpumask *cpumask_of_node(int node)
>> >>> +{
>> >>> +
>> >>> +     if (WARN_ON(node >= nr_node_ids))
>> >>> +             return cpu_none_mask;
>> >>> +
>> >>> +     if (WARN_ON(node_to_cpumask_map[node] == NULL))
>> >>> +             return cpu_online_mask;
>> >>> +
>> >>> +     return node_to_cpumask_map[node];
>> >>> +}
>> >>> +EXPORT_SYMBOL(cpumask_of_node);
>> >>> +#endif
>> >>> +
>> >>> +static void map_cpu_to_node(unsigned int cpu, int nid)
>> >>> +{
>> >>> +     set_cpu_numa_node(cpu, nid);
>> >>> +     if (nid >= 0)
>> >>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
>> >>> +}
>> >>> +
>> >>> +static void unmap_cpu_to_node(unsigned int cpu)
>> >>> +{
>> >>> +     int nid = cpu_to_node(cpu);
>> >>> +
>> >>> +     if (nid >= 0)
>> >>> +             cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
>> >>> +     set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> >>> +}
>> >>> +
>> >>> +void numa_clear_node(unsigned int cpu)
>> >>> +{
>> >>> +     unmap_cpu_to_node(cpu);
>> >>> +}
>> >>> +
>> >>> +/*
>> >>> + * Allocate node_to_cpumask_map based on number of available nodes
>> >>> + * Requires node_possible_map to be valid.
>> >>> + *
>> >>> + * Note: cpumask_of_node() is not valid until after this is done.
>> >>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
>> >>> + */
>> >>> +static void __init setup_node_to_cpumask_map(void)
>> >>> +{
>> >>> +     unsigned int cpu;
>> >>> +     int node;
>> >>> +
>> >>> +     /* setup nr_node_ids if not done yet */
>> >>> +     if (nr_node_ids == MAX_NUMNODES)
>> >>> +             setup_nr_node_ids();
>> >>> +
>> >>> +     /* allocate and clear the mapping */
>> >>> +     for (node = 0; node < nr_node_ids; node++) {
>> >>> +             alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
>> >>> +             cpumask_clear(node_to_cpumask_map[node]);
>> >>> +     }
>> >>> +
>> >>> +     for_each_possible_cpu(cpu)
>> >>> +             set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> >>> +
>> >>> +     /* cpumask_of_node() will now work */
>> >>> +     pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>> >>> +}
>> >>> +
>> >>> +/*
>> >>> + *  Set the cpu to node and mem mapping
>> >>> + */
>> >>> +void numa_store_cpu_info(unsigned int cpu)
>> >>> +{
>> >>> +     map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
>> >>> +}
>> >>> +
>> >>> +/**
>> >>> + * numa_add_memblk - Set node id to memblk
>> >>> + * @nid: NUMA node ID of the new memblk
>> >>> + * @start: Start address of the new memblk
>> >>> + * @size:  Size of the new memblk
>> >>> + *
>> >>> + * RETURNS:
>> >>> + * 0 on success, -errno on failure.
>> >>> + */
>> >>> +int __init numa_add_memblk(int nid, u64 start, u64 size)
>> >>> +{
>> >>> +     int ret;
>> >>> +
>> >>> +     ret = memblock_set_node(start, size, &memblock.memory, nid);
>> >>> +     if (ret < 0) {
>> >>> +             pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>> >>> +                     start, (start + size - 1), nid);
>> >>> +             return ret;
>> >>> +     }
>> >>> +
>> >>> +     node_set(nid, numa_nodes_parsed);
>> >>> +     pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
>> >>> +                     start, (start + size - 1), nid);
>> >>> +     return ret;
>> >>> +}
>> >>> +EXPORT_SYMBOL(numa_add_memblk);
>> >>> +
>> >>> +/* Initialize NODE_DATA for a node on the local memory */
>> >>> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>> >>> +{
>> >>> +     const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
>> >>> +     u64 nd_pa;
>> >>> +     void *nd;
>> >>> +     int tnid;
>> >>> +
>> >>> +     pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> >>> +                     nid, start_pfn << PAGE_SHIFT,
>> >>> +                     (end_pfn << PAGE_SHIFT) - 1);
>> >>> +
>> >>> +     nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>> >>> +     nd = __va(nd_pa);
>> >>> +
>> >>> +     /* report and initialize */
>> >>> +     pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
>> >>> +             nd_pa, nd_pa + nd_size - 1);
>> >>> +     tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>> >>> +     if (tnid != nid)
>> >>> +             pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
>> >>> +
>> >>> +     node_data[nid] = nd;
>> >>> +     memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>> >>> +     NODE_DATA(nid)->node_id = nid;
>> >>> +     NODE_DATA(nid)->node_start_pfn = start_pfn;
>> >>> +     NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
>> >>> +}
>> >>> +
>> >>> +/**
>> >>> + * numa_reset_distance - Reset NUMA distance table
>> >>> + *
>> >>> + * The current table is freed.
>> >>> + * The next numa_set_distance() call will create a new one.
>> >>> + */
>> >>> +void __init numa_reset_distance(void)
>> >>> +{
>> >>> +     size_t size;
>> >>> +
>> >>> +     if (!numa_distance)
>> >>> +             return;
>> >>> +
>> >>> +     size = numa_distance_cnt * numa_distance_cnt *
>> >>> +             sizeof(numa_distance[0]);
>> >>> +
>> >>> +     memblock_free(__pa(numa_distance), size);
>> >>> +     numa_distance_cnt = 0;
>> >>> +     numa_distance = NULL;
>> >>> +}
>> >>> +
>> >>> +static int __init numa_alloc_distance(void)
>> >>> +{
>> >>> +     size_t size;
>> >>> +     u64 phys;
>> >>> +     int i, j;
>> >>> +
>> >>> +     size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
>> >>> +     phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
>> >>> +                                   size, PAGE_SIZE);
>> >>> +     if (WARN_ON(!phys))
>> >>> +             return -ENOMEM;
>> >>> +
>> >>> +     memblock_reserve(phys, size);
>> >>> +
>> >>> +     numa_distance = __va(phys);
>> >>> +     numa_distance_cnt = nr_node_ids;
>> >>> +
>> >>> +     /* fill with the default distances */
>> >>> +     for (i = 0; i < numa_distance_cnt; i++)
>> >>> +             for (j = 0; j < numa_distance_cnt; j++)
>> >>> +                     numa_distance[i * numa_distance_cnt + j] = i == j ?
>> >>> +                             LOCAL_DISTANCE : REMOTE_DISTANCE;
>> >>> +
>> >>> +     pr_debug("NUMA: Initialized distance table, cnt=%d\n",
>> >>> +                     numa_distance_cnt);
>> >>> +
>> >>> +     return 0;
>> >>> +}
>> >>> +
>> >>> +/**
>> >>> + * numa_set_distance - Set NUMA distance from one NUMA to another
>> >>> + * @from: the 'from' node to set distance
>> >>> + * @to: the 'to'  node to set distance
>> >>> + * @distance: NUMA distance
>> >>> + *
>> >>> + * Set the distance from node @from to @to to @distance.  If distance table
>> >>> + * doesn't exist, one which is large enough to accommodate all the currently
>> >>> + * known nodes will be created.
>> >>> + *
>> >>> + * If such table cannot be allocated, a warning is printed and further
>> >>> + * calls are ignored until the distance table is reset with
>> >>> + * numa_reset_distance().
>> >>> + *
>> >>> + * If @from or @to is higher than the highest known node or lower than zero
>> >>> + * at the time of table creation or @distance doesn't make sense, the call
>> >>> + * is ignored.
>> >>> + * This is to allow simplification of specific NUMA config implementations.
>> >>> + */
>> >>> +void __init numa_set_distance(int from, int to, int distance)
>> >>> +{
>> >>> +     if (!numa_distance)
>> >>> +             return;
>> >>> +
>> >>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>> >>> +                     from < 0 || to < 0) {
>> >>> +             pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>> >>> +                         from, to, distance);
>> >>> +             return;
>> >>> +     }
>> >>> +
>> >>> +     if ((u8)distance != distance ||
>> >>> +         (from == to && distance != LOCAL_DISTANCE)) {
>> >>> +             pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>> >>> +                          from, to, distance);
>> >>> +             return;
>> >>> +     }
>> >>> +
>> >>> +     numa_distance[from * numa_distance_cnt + to] = distance;
>> >>> +}
>> >>> +EXPORT_SYMBOL(numa_set_distance);
>> >>> +
>> >>> +int __node_distance(int from, int to)
>> >>> +{
>> >>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt)
>> >>> +             return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
>> >>> +     return numa_distance[from * numa_distance_cnt + to];
>> >>> +}
>> >>> +EXPORT_SYMBOL(__node_distance);
>> >>
>> >> Much of this is simply a direct copy/paste from x86. Why can't it be
>> >> moved to common code? I don't see anything arch-specific here.
>> > It is not the same for all architectures.
>
> See my earlier comment. Not having identical code for all architectures
> doesn't mean it's not worth having a common portion of that in core code.
> It makes it easier to maintain, easier to use and easier to extend.
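For illustration, the distance-table code quoted above has no arm64-specific dependency. Below is a minimal user-space sketch of the same logic. It uses a flat cnt x cnt array indexed as from * cnt + to, with LOCAL_DISTANCE on the diagonal and REMOTE_DISTANCE elsewhere, and applies the same validation as numa_set_distance(). Assumptions: malloc()/free() stand in for the memblock allocator, LOCAL_DISTANCE/REMOTE_DISTANCE are taken as 10/20, and the dist_* helper names are hypothetical. Unlike the quoted __node_distance(), the lookup here also rejects negative node ids.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values of the kernel's LOCAL_DISTANCE / REMOTE_DISTANCE. */
#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

static uint8_t *numa_distance;   /* flat cnt x cnt table, NULL until allocated */
static int numa_distance_cnt;

/* Allocate the table and fill in defaults (cf. numa_alloc_distance()). */
static int dist_alloc(int nr_nodes)
{
	size_t size;
	int i, j;

	assert(nr_nodes > 0);
	size = (size_t)nr_nodes * nr_nodes * sizeof(numa_distance[0]);
	numa_distance = malloc(size);   /* stand-in for memblock_find_in_range() + memblock_reserve() */
	if (!numa_distance)
		return -1;
	numa_distance_cnt = nr_nodes;

	for (i = 0; i < nr_nodes; i++)
		for (j = 0; j < nr_nodes; j++)
			numa_distance[i * nr_nodes + j] =
				i == j ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return 0;
}

/* Free the table (cf. numa_reset_distance()). */
static void dist_reset(void)
{
	free(numa_distance);
	numa_distance = NULL;
	numa_distance_cnt = 0;
}

/* Set one entry; out-of-range ids and nonsensical distances are ignored. */
static void dist_set(int from, int to, int distance)
{
	if (!numa_distance)
		return;
	if (from < 0 || to < 0 ||
	    from >= numa_distance_cnt || to >= numa_distance_cnt)
		return;
	/* must fit in a u8, and a node's distance to itself must be local */
	if ((uint8_t)distance != distance ||
	    (from == to && distance != LOCAL_DISTANCE))
		return;
	numa_distance[from * numa_distance_cnt + to] = (uint8_t)distance;
}

/* Look up an entry; ids outside the table get the default distances. */
static int dist_get(int from, int to)
{
	if (from < 0 || to < 0 ||
	    from >= numa_distance_cnt || to >= numa_distance_cnt)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return numa_distance[from * numa_distance_cnt + to];
}
```

Nothing in this logic depends on the architecture. The only arch-flavoured part is where the table's memory comes from, which could be passed in by the caller.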
>
>> >>> +/**
>> >>> + * dummy_numa_init - Fallback dummy NUMA init
>> >>> + *
>> >>> + * Used if there's no underlying NUMA architecture, NUMA initialization
>> >>> + * fails, or NUMA is disabled on the command line.
>> >>> + *
>> >>> + * Must online at least one node and add memory blocks that cover all
>> >>> + * allowed memory.  This function must not fail.
>> >>
>> >> Why can't it fail? It looks like the return value is ignored by numa_init.
No, numa_init() does not ignore the return value:
        ret = init_func();
        if (ret < 0)
                return ret;


>> > This function adds all memblocks to node 0, so it is unlikely to fail.
>
> Good thing it returns an int, then.
>
> Will
thanks
Ganapat

* [PATCH v7 1/4] arm64, numa: adding numa support for arm64 platforms.
@ 2015-12-22 13:43                       ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 38+ messages in thread
From: Ganapatrao Kulkarni @ 2015-12-22 13:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Dec 22, 2015 at 3:25 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Dec 22, 2015 at 03:04:48PM +0530, Ganapatrao Kulkarni wrote:
>> On Fri, Dec 18, 2015 at 12:00 AM, Ganapatrao Kulkarni
>> <gpkulkarni@gmail.com> wrote:
>> > On Thu, Dec 17, 2015 at 10:41 PM, Will Deacon <will.deacon@arm.com> wrote:
>> >> This all looks pretty reasonable, but I'd like to see an Ack from a
>> >> devicetree maintainer on the binding before I merge anything (and I see
>> >> that there are outstanding comments from Rutland on that).
>> > IIUC, there are no open comments on the binding.
>> > Mark Rutland: please let me know if there are any open comments;
>> > otherwise, could you please Ack the binding?
>> >>
>> >> On Tue, Nov 17, 2015 at 10:50:40PM +0530, Ganapatrao Kulkarni wrote:
>> >>> Add NUMA support for arm64-based platforms.
>> >>> By default, this patch adds a dummy NUMA node and
>> >>> maps all memory and CPUs to node 0.
>> >>> Using this patch, NUMA can be simulated on single-node arm64 platforms.
>> >>>
>> >>> Reviewed-by: Robert Richter <rrichter@cavium.com>
>> >>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
>> >>> ---
>> >>>  arch/arm64/Kconfig              |  25 +++
>> >>>  arch/arm64/include/asm/mmzone.h |  17 ++
>> >>>  arch/arm64/include/asm/numa.h   |  47 +++++
>> >>>  arch/arm64/kernel/setup.c       |   4 +
>> >>>  arch/arm64/kernel/smp.c         |   2 +
>> >>>  arch/arm64/mm/Makefile          |   1 +
>> >>>  arch/arm64/mm/init.c            |  30 +++-
>> >>>  arch/arm64/mm/numa.c            | 384 ++++++++++++++++++++++++++++++++++++++++
>> >>>  8 files changed, 506 insertions(+), 4 deletions(-)
>> >>>  create mode 100644 arch/arm64/include/asm/mmzone.h
>> >>>  create mode 100644 arch/arm64/include/asm/numa.h
>> >>>  create mode 100644 arch/arm64/mm/numa.c
>> >>>
>> >>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> >>> index 9ac16a4..7d8fb42 100644
>> >>> --- a/arch/arm64/Kconfig
>> >>> +++ b/arch/arm64/Kconfig
>> >>> @@ -71,6 +71,7 @@ config ARM64
>> >>>       select HAVE_GENERIC_DMA_COHERENT
>> >>>       select HAVE_HW_BREAKPOINT if PERF_EVENTS
>> >>>       select HAVE_MEMBLOCK
>> >>> +     select HAVE_MEMBLOCK_NODE_MAP if NUMA
>> >>>       select HAVE_PATA_PLATFORM
>> >>>       select HAVE_PERF_EVENTS
>> >>>       select HAVE_PERF_REGS
>> >>> @@ -482,6 +483,30 @@ config HOTPLUG_CPU
>> >>>         Say Y here to experiment with turning CPUs off and on.  CPUs
>> >>>         can be controlled through /sys/devices/system/cpu.
>> >>>
>> >>> +# Common NUMA Features
>> >>> +config NUMA
>> >>> +     bool "Numa Memory Allocation and Scheduler Support"
>> >>> +     depends on SMP
>> >>> +     help
>> >>> +       Enable NUMA (Non Uniform Memory Access) support.
>> >>> +
>> >>> +       The kernel will try to allocate memory used by a CPU on the
>> >>> +       local memory controller of the CPU and add some more
>> >>> +       NUMA awareness to the kernel.
>> >>
>> >> I appreciate that this is copied from x86, but what exactly do you mean
>> >> by "memory controller" here?
>> > OK, it would be fair to just say "local memory".
>> >>
>> >>> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
>> >>> new file mode 100644
>> >>> index 0000000..6ddd468
>> >>> --- /dev/null
>> >>> +++ b/arch/arm64/include/asm/mmzone.h
>> >>> @@ -0,0 +1,17 @@
>> >>> +#ifndef __ASM_ARM64_MMZONE_H_
>> >>> +#define __ASM_ARM64_MMZONE_H_
>> >>
>> >> Please try to follow the standard naming for header guards under arm64
>> >> (yes, it's not perfect, but we've made some effort for consistency).
>> > Sure, I will follow the convention used in the other arm64 headers.
>> >>
>> >>> +
>> >>> +#ifdef CONFIG_NUMA
>> >>> +
>> >>> +#include <linux/mmdebug.h>
>> >>> +#include <linux/types.h>
>> >>> +
>> >>> +#include <asm/smp.h>
>> >>> +#include <asm/numa.h>
>> >>> +
>> >>> +extern struct pglist_data *node_data[];
>> >>> +
>> >>> +#define NODE_DATA(nid)               (node_data[(nid)])
>> >>
>> >> This is the same as m32r, metag, parisc, powerpc, s390, sh, sparc, tile
>> >> and x86. Can we make this the default in the core code instead and then
>> >> replace this header file with asm-generic or something?
>> > IIUC, it is the same in most, but not all, architectures.
>
> Yes, so we should make the core code take on the behaviour that most
> architectures want, and then allow the few remaining architectures that
> need to do something differently to override those macros. We do this all
> over the place in the kernel.
I totally agree with you.
We will do this in subsequent clean-up patches, once this series is upstream.
Please let us not mix it into this series.
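The approach Will describes is the usual kernel convention of a generic default guarded by #ifndef. An architecture overrides it by defining the macro before the generic header is included. Here is a minimal sketch of that pattern. Only the NODE_DATA(nid) definition comes from the patch; struct pglist_data_sketch, MAX_NUMNODES_SKETCH and the two static nodes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct pglist_data_sketch {     /* hypothetical stand-in for pg_data_t */
	int node_id;
};

#define MAX_NUMNODES_SKETCH 4

static struct pglist_data_sketch node0, node1;
/* unlisted entries are implicitly NULL */
static struct pglist_data_sketch *node_data[MAX_NUMNODES_SKETCH] = { &node0, &node1 };

/*
 * Generic default: an architecture header included before this point can
 * provide its own NODE_DATA(); otherwise this definition is used.
 */
#ifndef NODE_DATA
#define NODE_DATA(nid)	(node_data[(nid)])
#endif
```

With this in core code, the many architectures that share the definition drop it from their mmzone.h, and only the outliers keep an override.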
>
>> >>
>> >>> +
>> >>> +#endif /* CONFIG_NUMA */
>> >>> +#endif /* __ASM_ARM64_MMZONE_H_ */
>> >>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>> >>> new file mode 100644
>> >>> index 0000000..c00f3a4
>> >>> --- /dev/null
>> >>> +++ b/arch/arm64/include/asm/numa.h
>> >>> @@ -0,0 +1,47 @@
>> >>> +#ifndef _ASM_NUMA_H
>> >>> +#define _ASM_NUMA_H
>> >>
>> >> Same comment on the guards.
>> > ok
>> >>
>> >>> +#include <linux/nodemask.h>
>> >>> +#include <asm/topology.h>
>> >>> +
>> >>> +#ifdef CONFIG_NUMA
>> >>> +
>> >>> +#define NR_NODE_MEMBLKS              (MAX_NUMNODES * 2)
>> >>
>> >> This is only used by the ACPI code afaict, so maybe include it when you
>> >> add that?
>> > ok
>> >>
>> >>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
>> >>
>> >> Where is this used?
>> > Sorry, it was used in v6; I missed deleting it.
>> >>
>> >>> +
>> >>> +/* currently, arm64 implements flat NUMA topology */
>> >>> +#define parent_node(node)    (node)
>> >>> +
>> >>> +extern int __node_distance(int from, int to);
>> >>> +#define node_distance(a, b) __node_distance(a, b)
>> >>> +
>> >>> +/* dummy definitions for pci functions */
>> >>> +#define pcibus_to_node(node) 0
>> >>> +#define cpumask_of_pcibus(bus)       0
>> >>
>> >> There's a bunch of these dummy definitions already available in
>> >> asm-generic/topology.h. Can we use those instead of rolling our own
>> >> please?
>> These are dummy definitions in this patch; I will post a separate patch for NUMA PCI support.
>
> So you're going to use the generic definitions?
The generic definitions are different from what we need.
For PCI, I have sent a separate patch.
>
>> >> This is certainly more readable than the non-numa zone_sizes_init. Is
>> >> there a reason we can't always select HAVE_MEMBLOCK_NODE_MAP and avoid
>> >> having to handle the zone holes explicitly?
>> > Yes, I can consider selecting HAVE_MEMBLOCK_NODE_MAP always,
>> > instead of only for NUMA.
>> > I can experiment with using the NUMA zone_sizes_init for the non-NUMA
>> > case as well, and delete the current zone_sizes_init.
>> If I enable HAVE_MEMBLOCK_NODE_MAP for the non-NUMA case, I see issues in
>> sparse_init; there could be some dependency on the explicit call to
>> memory_present() (not sure) in the non-NUMA case.
>> IMO, this needs to be worked out as a separate patch.
>
> Ok, please can you investigate and send a separate patch then?
Sure, but not in this series, please.
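For reference, the hole accounting that HAVE_MEMBLOCK_NODE_MAP would let the core mm code handle is simple arithmetic. The holes in a zone equal the pages spanned by the zone minus the pages actually backed by memory. Below is a minimal sketch of that calculation, assuming non-overlapping [start, end) PFN ranges. struct pfn_range and zone_holes() are hypothetical names, not kernel APIs.

```c
#include <assert.h>

struct pfn_range {
	unsigned long start, end;   /* [start, end) in page frame numbers */
};

/* Pages of [zone_start, zone_end) that are not backed by any memory range. */
static unsigned long zone_holes(const struct pfn_range *mem, int nr,
				unsigned long zone_start, unsigned long zone_end)
{
	unsigned long present = 0;
	int i;

	assert(zone_start <= zone_end);
	for (i = 0; i < nr; i++) {
		/* clip each memory range to the zone */
		unsigned long s = mem[i].start > zone_start ? mem[i].start : zone_start;
		unsigned long e = mem[i].end < zone_end ? mem[i].end : zone_end;

		if (s < e)
			present += e - s;
	}
	return (zone_end - zone_start) - present;
}
```

Without the node map, the architecture has to compute something like this itself and pass the result to the core code as an explicit holes array. With the node map, the core derives it from the registered memory ranges.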
>
>> >>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>> >>> +/*
>> >>> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
>> >>> + */
>> >>> +const struct cpumask *cpumask_of_node(int node)
>> >>> +{
>> >>> +
>> >>> +     if (WARN_ON(node >= nr_node_ids))
>> >>> +             return cpu_none_mask;
>> >>> +
>> >>> +     if (WARN_ON(node_to_cpumask_map[node] == NULL))
>> >>> +             return cpu_online_mask;
>> >>> +
>> >>> +     return node_to_cpumask_map[node];
>> >>> +}
>> >>> +EXPORT_SYMBOL(cpumask_of_node);
>> >>> +#endif
>> >>> +
>> >>> +static void map_cpu_to_node(unsigned int cpu, int nid)
>> >>> +{
>> >>> +     set_cpu_numa_node(cpu, nid);
>> >>> +     if (nid >= 0)
>> >>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
>> >>> +}
>> >>> +
>> >>> +static void unmap_cpu_to_node(unsigned int cpu)
>> >>> +{
>> >>> +     int nid = cpu_to_node(cpu);
>> >>> +
>> >>> +     if (nid >= 0)
>> >>> +             cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
>> >>> +     set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> >>> +}
>> >>> +
>> >>> +void numa_clear_node(unsigned int cpu)
>> >>> +{
>> >>> +     unmap_cpu_to_node(cpu);
>> >>> +}
>> >>> +
>> >>> +/*
>> >>> + * Allocate node_to_cpumask_map based on number of available nodes
>> >>> + * Requires node_possible_map to be valid.
>> >>> + *
>> >>> + * Note: cpumask_of_node() is not valid until after this is done.
>> >>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
>> >>> + */
>> >>> +static void __init setup_node_to_cpumask_map(void)
>> >>> +{
>> >>> +     unsigned int cpu;
>> >>> +     int node;
>> >>> +
>> >>> +     /* setup nr_node_ids if not done yet */
>> >>> +     if (nr_node_ids == MAX_NUMNODES)
>> >>> +             setup_nr_node_ids();
>> >>> +
>> >>> +     /* allocate and clear the mapping */
>> >>> +     for (node = 0; node < nr_node_ids; node++) {
>> >>> +             alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
>> >>> +             cpumask_clear(node_to_cpumask_map[node]);
>> >>> +     }
>> >>> +
>> >>> +     for_each_possible_cpu(cpu)
>> >>> +             set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> >>> +
>> >>> +     /* cpumask_of_node() will now work */
>> >>> +     pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>> >>> +}
>> >>> +
>> >>> +/*
>> >>> + *  Set the cpu to node and mem mapping
>> >>> + */
>> >>> +void numa_store_cpu_info(unsigned int cpu)
>> >>> +{
>> >>> +     map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
>> >>> +}
>> >>> +
>> >>> +/**
>> >>> + * numa_add_memblk - Set node id to memblk
>> >>> + * @nid: NUMA node ID of the new memblk
>> >>> + * @start: Start address of the new memblk
>> >>> + * @size:  Size of the new memblk
>> >>> + *
>> >>> + * RETURNS:
>> >>> + * 0 on success, -errno on failure.
>> >>> + */
>> >>> +int __init numa_add_memblk(int nid, u64 start, u64 size)
>> >>> +{
>> >>> +     int ret;
>> >>> +
>> >>> +     ret = memblock_set_node(start, size, &memblock.memory, nid);
>> >>> +     if (ret < 0) {
>> >>> +             pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>> >>> +                     start, (start + size - 1), nid);
>> >>> +             return ret;
>> >>> +     }
>> >>> +
>> >>> +     node_set(nid, numa_nodes_parsed);
>> >>> +     pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
>> >>> +                     start, (start + size - 1), nid);
>> >>> +     return ret;
>> >>> +}
>> >>> +EXPORT_SYMBOL(numa_add_memblk);
>> >>> +
>> >>> +/* Initialize NODE_DATA for a node on the local memory */
>> >>> +static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>> >>> +{
>> >>> +     const size_t nd_size = roundup(sizeof(pg_data_t), SMP_CACHE_BYTES);
>> >>> +     u64 nd_pa;
>> >>> +     void *nd;
>> >>> +     int tnid;
>> >>> +
>> >>> +     pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> >>> +                     nid, start_pfn << PAGE_SHIFT,
>> >>> +                     (end_pfn << PAGE_SHIFT) - 1);
>> >>> +
>> >>> +     nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>> >>> +     nd = __va(nd_pa);
>> >>> +
>> >>> +     /* report and initialize */
>> >>> +     pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
>> >>> +             nd_pa, nd_pa + nd_size - 1);
>> >>> +     tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>> >>> +     if (tnid != nid)
>> >>> +             pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
>> >>> +
>> >>> +     node_data[nid] = nd;
>> >>> +     memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>> >>> +     NODE_DATA(nid)->node_id = nid;
>> >>> +     NODE_DATA(nid)->node_start_pfn = start_pfn;
>> >>> +     NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn;
>> >>> +}
>> >>> +
>> >>> +/**
>> >>> + * numa_reset_distance - Reset NUMA distance table
>> >>> + *
>> >>> + * The current table is freed.
>> >>> + * The next numa_set_distance() call will create a new one.
>> >>> + */
>> >>> +void __init numa_reset_distance(void)
>> >>> +{
>> >>> +     size_t size;
>> >>> +
>> >>> +     if (!numa_distance)
>> >>> +             return;
>> >>> +
>> >>> +     size = numa_distance_cnt * numa_distance_cnt *
>> >>> +             sizeof(numa_distance[0]);
>> >>> +
>> >>> +     memblock_free(__pa(numa_distance), size);
>> >>> +     numa_distance_cnt = 0;
>> >>> +     numa_distance = NULL;
>> >>> +}
>> >>> +
>> >>> +static int __init numa_alloc_distance(void)
>> >>> +{
>> >>> +     size_t size;
>> >>> +     u64 phys;
>> >>> +     int i, j;
>> >>> +
>> >>> +     size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
>> >>> +     phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
>> >>> +                                   size, PAGE_SIZE);
>> >>> +     if (WARN_ON(!phys))
>> >>> +             return -ENOMEM;
>> >>> +
>> >>> +     memblock_reserve(phys, size);
>> >>> +
>> >>> +     numa_distance = __va(phys);
>> >>> +     numa_distance_cnt = nr_node_ids;
>> >>> +
>> >>> +     /* fill with the default distances */
>> >>> +     for (i = 0; i < numa_distance_cnt; i++)
>> >>> +             for (j = 0; j < numa_distance_cnt; j++)
>> >>> +                     numa_distance[i * numa_distance_cnt + j] = i == j ?
>> >>> +                             LOCAL_DISTANCE : REMOTE_DISTANCE;
>> >>> +
>> >>> +     pr_debug("NUMA: Initialized distance table, cnt=%d\n",
>> >>> +                     numa_distance_cnt);
>> >>> +
>> >>> +     return 0;
>> >>> +}
>> >>> +
>> >>> +/**
>> >>> + * numa_set_distance - Set NUMA distance from one NUMA to another
>> >>> + * @from: the 'from' node to set distance
>> >>> + * @to: the 'to'  node to set distance
>> >>> + * @distance: NUMA distance
>> >>> + *
>> >>> + * Set the distance from node @from to @to to @distance.  If distance table
>> >>> + * doesn't exist, one which is large enough to accommodate all the currently
>> >>> + * known nodes will be created.
>> >>> + *
>> >>> + * If such table cannot be allocated, a warning is printed and further
>> >>> + * calls are ignored until the distance table is reset with
>> >>> + * numa_reset_distance().
>> >>> + *
>> >>> + * If @from or @to is higher than the highest known node or lower than zero
>> >>> + * at the time of table creation or @distance doesn't make sense, the call
>> >>> + * is ignored.
>> >>> + * This is to allow simplification of specific NUMA config implementations.
>> >>> + */
>> >>> +void __init numa_set_distance(int from, int to, int distance)
>> >>> +{
>> >>> +     if (!numa_distance)
>> >>> +             return;
>> >>> +
>> >>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>> >>> +                     from < 0 || to < 0) {
>> >>> +             pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>> >>> +                         from, to, distance);
>> >>> +             return;
>> >>> +     }
>> >>> +
>> >>> +     if ((u8)distance != distance ||
>> >>> +         (from == to && distance != LOCAL_DISTANCE)) {
>> >>> +             pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>> >>> +                          from, to, distance);
>> >>> +             return;
>> >>> +     }
>> >>> +
>> >>> +     numa_distance[from * numa_distance_cnt + to] = distance;
>> >>> +}
>> >>> +EXPORT_SYMBOL(numa_set_distance);
>> >>> +
>> >>> +int __node_distance(int from, int to)
>> >>> +{
>> >>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt)
>> >>> +             return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
>> >>> +     return numa_distance[from * numa_distance_cnt + to];
>> >>> +}
>> >>> +EXPORT_SYMBOL(__node_distance);
>> >>
>> >> Much of this is simply a direct copy/paste from x86. Why can't it be
>> >> moved to common code? I don't see anything arch-specific here.
>> > It is not the same for all architectures.
>
> See my earlier comment. Not having identical code for all architectures
> doesn't mean it's not worth having a common portion of that in core code.
> It makes it easier to maintain, easier to use and easier to extend.
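The CPU-to-node bookkeeping quoted above (setup_node_to_cpumask_map(), map_cpu_to_node(), unmap_cpu_to_node()) is similarly generic. Below is a user-space sketch of the same idea. It assumes at most 64 CPUs, so that a uint64_t can stand in for struct cpumask. NR_CPUS_SKETCH, NR_NODES_SKETCH and the setup_map()/map_cpu()/unmap_cpu() helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH   64
#define NR_NODES_SKETCH  4
#define NUMA_NO_NODE     (-1)

static uint64_t node_to_cpumask[NR_NODES_SKETCH];  /* bit n set: CPU n is on the node */
static int cpu_to_node_id[NR_CPUS_SKETCH];

/* cf. setup_node_to_cpumask_map(): empty masks, every CPU unassigned */
static void setup_map(void)
{
	int i;

	for (i = 0; i < NR_NODES_SKETCH; i++)
		node_to_cpumask[i] = 0;
	for (i = 0; i < NR_CPUS_SKETCH; i++)
		cpu_to_node_id[i] = NUMA_NO_NODE;
}

/* cf. map_cpu_to_node(): record the node and add the CPU to its mask */
static void map_cpu(unsigned int cpu, int nid)
{
	assert(cpu < NR_CPUS_SKETCH && nid < NR_NODES_SKETCH);
	cpu_to_node_id[cpu] = nid;
	if (nid >= 0)
		node_to_cpumask[nid] |= UINT64_C(1) << cpu;
}

/* cf. unmap_cpu_to_node() / numa_clear_node(): undo map_cpu() */
static void unmap_cpu(unsigned int cpu)
{
	int nid;

	assert(cpu < NR_CPUS_SKETCH);
	nid = cpu_to_node_id[cpu];
	if (nid >= 0)
		node_to_cpumask[nid] &= ~(UINT64_C(1) << cpu);
	cpu_to_node_id[cpu] = NUMA_NO_NODE;
}
```

The only architecture input is which node each CPU belongs to. The two-way bookkeeping itself is the same everywhere.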
>
>> >>> +/**
>> >>> + * dummy_numa_init - Fallback dummy NUMA init
>> >>> + *
>> >>> + * Used if there's no underlying NUMA architecture, NUMA initialization
>> >>> + * fails, or NUMA is disabled on the command line.
>> >>> + *
>> >>> + * Must online at least one node and add memory blocks that cover all
>> >>> + * allowed memory.  This function must not fail.
>> >>
>> >> Why can't it fail? It looks like the return value is ignored by numa_init.
No, numa_init() does not ignore the return value:
        ret = init_func();
        if (ret < 0)
                return ret;


>> > This function adds all memblocks to node 0, so it is unlikely to fail.
>
> Good thing it returns an int, then.
>
> Will
thanks
Ganapat
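Here is a sketch of what numa_add_memblk() and the NODE_DATA placement check in setup_node_data() rely on. Memory regions carry a node id, and early_pfn_to_nid() maps a page frame back to its node. The user-space model below assumes 4 KiB pages. It simplifies memblock_set_node() by tagging only regions that lie entirely inside the given range, whereas the real function splits regions at the boundaries. struct mem_region, add_memblk() and pfn_to_nid() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12    /* assumed 4 KiB pages */

struct mem_region {             /* hypothetical stand-in for a memblock region */
	uint64_t base, size;
	int nid;
};

/* cf. numa_add_memblk(): tag every region inside [start, start + size) with nid */
static int add_memblk(struct mem_region *r, int nr, int nid,
		      uint64_t start, uint64_t size)
{
	int i, tagged = 0;

	assert(nid >= 0);
	for (i = 0; i < nr; i++) {
		if (r[i].base >= start && r[i].base + r[i].size <= start + size) {
			r[i].nid = nid;
			tagged++;
		}
	}
	return tagged ? 0 : -1;
}

/* cf. early_pfn_to_nid(): which node owns this page frame? (-1 if none) */
static int pfn_to_nid(const struct mem_region *r, int nr, uint64_t pfn)
{
	uint64_t addr = pfn << PAGE_SHIFT_SKETCH;
	int i;

	for (i = 0; i < nr; i++)
		if (addr >= r[i].base && addr < r[i].base + r[i].size)
			return r[i].nid;
	return -1;
}
```

In this model, the "NODE_DATA(%d) on node %d" message in setup_node_data() corresponds to pfn_to_nid(r, nr, nd_pa >> PAGE_SHIFT_SKETCH) != nid. The dummy init corresponds to tagging every region with node 0.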

end of thread, other threads:[~2015-12-22 13:43 UTC | newest]

Thread overview: 38+ messages
2015-11-17 17:20 [PATCH v7 0/4] arm64, numa: Add numa support for arm64 platforms Ganapatrao Kulkarni
2015-11-17 17:20 ` Ganapatrao Kulkarni
2015-11-17 17:20 ` [PATCH v7 1/4] arm64, numa: adding " Ganapatrao Kulkarni
2015-11-17 17:20   ` Ganapatrao Kulkarni
     [not found]   ` <1447780843-9223-2-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
2015-11-27  8:00     ` Shannon Zhao
2015-11-27  8:00       ` Shannon Zhao
     [not found]       ` <56580D80.2050806-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2015-12-01  8:45         ` Ganapatrao Kulkarni
2015-12-01  8:45           ` Ganapatrao Kulkarni
2015-12-17 17:11     ` Will Deacon
2015-12-17 17:11       ` Will Deacon
     [not found]       ` <20151217171131.GC24108-5wv7dgnIgG8@public.gmane.org>
2015-12-17 18:30         ` Ganapatrao Kulkarni
2015-12-17 18:30           ` Ganapatrao Kulkarni
     [not found]           ` <CAFpQJXW0Ac4-3aQLZ_Pw_uG65F-EQmBYk4p-ntUu5tLey2hARA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-22  9:34             ` Ganapatrao Kulkarni
2015-12-22  9:34               ` Ganapatrao Kulkarni
     [not found]               ` <CAFpQJXUoSojdOuZPFEuD+T2DdEv_t3y68osXT8Zja3xG47qVsA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-22  9:55                 ` Will Deacon
2015-12-22  9:55                   ` Will Deacon
     [not found]                   ` <20151222095529.GB32623-5wv7dgnIgG8@public.gmane.org>
2015-12-22 13:43                     ` Ganapatrao Kulkarni
2015-12-22 13:43                       ` Ganapatrao Kulkarni
2015-11-17 17:20 ` [PATCH v7 2/4] Documentation, dt, arm64/arm: dt bindings for numa Ganapatrao Kulkarni
2015-11-17 17:20   ` Ganapatrao Kulkarni
     [not found]   ` <1447780843-9223-3-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
2015-12-11 13:53     ` Mark Rutland
2015-12-11 13:53       ` Mark Rutland
2015-12-11 14:41       ` Ganapatrao Kulkarni
2015-12-11 14:41         ` Ganapatrao Kulkarni
     [not found]         ` <CAFpQJXXopH4_GjE=dX0+NPcfwzRgErEFVMkGd57K+4=YZPDVsw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-17 19:07           ` Mark Rutland
2015-12-17 19:07             ` Mark Rutland
2015-12-18  3:10             ` Ganapatrao Kulkarni
2015-12-18  3:10               ` Ganapatrao Kulkarni
2015-11-17 17:20 ` [PATCH v7 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms Ganapatrao Kulkarni
2015-11-17 17:20   ` Ganapatrao Kulkarni
     [not found]   ` <1447780843-9223-4-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
2015-11-28  9:30     ` Shannon Zhao
2015-11-28  9:30       ` Shannon Zhao
     [not found]       ` <5659741F.9090606-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2015-12-01  8:43         ` Ganapatrao Kulkarni
2015-12-01  8:43           ` Ganapatrao Kulkarni
     [not found] ` <1447780843-9223-1-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
2015-11-17 17:20   ` [PATCH v7 4/4] arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node topology Ganapatrao Kulkarni
2015-11-17 17:20     ` Ganapatrao Kulkarni
2015-12-02 11:19   ` [PATCH v7 0/4] arm64, numa: Add numa support for arm64 platforms Ganapatrao Kulkarni
2015-12-02 11:19     ` Ganapatrao Kulkarni
