devicetree.vger.kernel.org archive mirror
* [PATCH v6 0/4] arm64, numa: Add numa support for arm64 platforms
@ 2015-10-20 10:45 Ganapatrao Kulkarni
  2015-10-20 10:45 ` [PATCH v6 1/4] arm64, numa: adding " Ganapatrao Kulkarni
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Ganapatrao Kulkarni @ 2015-10-20 10:45 UTC (permalink / raw)
  To: linux-arm-kernel, devicetree, Will.Deacon, catalin.marinas,
	grant.likely, leif.lindholm, rfranz, ard.biesheuvel, msalter,
	robh+dt, steve.capper, hanjun.guo, al.stone, arnd, pawel.moll,
	mark.rutland, ijc+devicetree, galak, rjw, lenb, marc.zyngier,
	rrichter, Prasun.Kapoor
  Cc: gpkulkarni

v6:
	- defined and implemented the numa dt binding using the
	  node property "proximity" and the device node "distance-map".
	- renamed dt_numa to of_numa

v5:
        - created a base version of numa.c which creates a dummy numa node
          without using dt on single-socket platforms, then added patches
          for dt support.
        - Incorporated review comments from Hanjun Guo.

v4:
Made changes as per Arnd's review comments.

v3:
Added changes to support numa on arm64-based platforms.
Tested these patches on Cavium's multi-node (2-node topology) platform.
In this patchset, defined and implemented dt bindings for numa mapping
of cores and memory using the device node property arm,associativity.

v2:
Defined and implemented the numa mapping of memory and cores to nodes,
and the proximity distance matrix of nodes.

v1:
Initial patchset to support numa on arm64 platforms.

Note:
        1. This patchset is tested for numa with dt on
           thunderx single-socket and dual-socket boards.
        2. Numa DT booting needs the dt memory nodes, which are deleted by the
           current efi-stub; hence, to try numa with dt, you need to rebase
           onto Ard's patchset:
           http://git.linaro.org/people/ard.biesheuvel/linux-arm.git/shortlog/refs/heads/arm64-uefi-early-fdt-handling
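Once a numa-enabled kernel boots, the resulting topology can be sanity-checked from userspace through the generic Linux NUMA sysfs layout (these are the standard kernel paths, not something added by this patchset); a small sketch that degrades gracefully on non-NUMA systems:

```python
from pathlib import Path

# Standard NUMA sysfs location on Linux; absent on !NUMA kernels.
node_root = Path("/sys/devices/system/node")

online_file = node_root / "online"
# e.g. "0-1" on a 2-node Thunderx system, "0" on a single node.
online = online_file.read_text().strip() if online_file.exists() else "0"
print("online nodes:", online)

# Which cpus landed on which node.
for node in sorted(node_root.glob("node[0-9]*")):
    cpulist = (node / "cpulist").read_text().strip()
    print(node.name, "cpus:", cpulist)
```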

Ganapatrao Kulkarni (4):
  arm64, numa: adding numa support for arm64 platforms.
  Documentation, dt, arm64/arm: dt bindings for numa.
  arm64/arm, numa, dt: adding numa dt binding implementation for arm64
    platforms
  arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node
    topology.

 Documentation/devicetree/bindings/arm/numa.txt  | 275 ++++++++
 arch/arm64/Kconfig                              |  35 +
 arch/arm64/boot/dts/cavium/Makefile             |   2 +-
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts  |  82 +++
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi | 806 ++++++++++++++++++++++++
 arch/arm64/include/asm/mmzone.h                 |  17 +
 arch/arm64/include/asm/numa.h                   |  73 +++
 arch/arm64/kernel/Makefile                      |   1 +
 arch/arm64/kernel/of_numa.c                     | 221 +++++++
 arch/arm64/kernel/setup.c                       |   9 +
 arch/arm64/kernel/smp.c                         |   3 +
 arch/arm64/mm/Makefile                          |   1 +
 arch/arm64/mm/init.c                            |  31 +-
 arch/arm64/mm/numa.c                            | 539 ++++++++++++++++
 14 files changed, 2090 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/numa.txt
 create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
 create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi
 create mode 100644 arch/arm64/include/asm/mmzone.h
 create mode 100644 arch/arm64/include/asm/numa.h
 create mode 100644 arch/arm64/kernel/of_numa.c
 create mode 100644 arch/arm64/mm/numa.c

-- 
1.8.1.4


* [PATCH v6 1/4] arm64, numa: adding numa support for arm64 platforms.
  2015-10-20 10:45 [PATCH v6 0/4] arm64, numa: Add numa support for arm64 platforms Ganapatrao Kulkarni
@ 2015-10-20 10:45 ` Ganapatrao Kulkarni
       [not found]   ` <1445337931-11344-2-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
  2015-10-20 10:45 ` [PATCH v6 2/4] Documentation, dt, arm64/arm: dt bindings for numa Ganapatrao Kulkarni
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Ganapatrao Kulkarni @ 2015-10-20 10:45 UTC (permalink / raw)
  To: linux-arm-kernel, devicetree, Will.Deacon, catalin.marinas,
	grant.likely, leif.lindholm, rfranz, ard.biesheuvel, msalter,
	robh+dt, steve.capper, hanjun.guo, al.stone, arnd, pawel.moll,
	mark.rutland, ijc+devicetree, galak, rjw, lenb, marc.zyngier,
	rrichter, Prasun.Kapoor
  Cc: gpkulkarni

Add numa support for arm64-based platforms.
This patch adds a dummy numa node by default and
maps all memory and cpus to node 0.
Using this patch, numa can be simulated on single-node arm64 platforms.
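In effect, the dummy fallback collapses the whole machine into node 0 and leaves the default distance semantics in place. An illustrative model (plain Python, not kernel code; the function names are mine):

```python
# Kernel default distances (LOCAL_DISTANCE / REMOTE_DISTANCE).
LOCAL_DISTANCE, REMOTE_DISTANCE = 10, 20

def dummy_numa_topology(cpus, memblocks):
    """Model of the dummy fallback: every cpu and memblock lands on node 0."""
    cpu_to_node = {cpu: 0 for cpu in cpus}
    memblk_to_node = {blk: 0 for blk in memblocks}
    return cpu_to_node, memblk_to_node

def node_distance(a, b):
    # With no distance table provided, __node_distance() falls back
    # to LOCAL_DISTANCE on-node and REMOTE_DISTANCE off-node.
    return LOCAL_DISTANCE if a == b else REMOTE_DISTANCE

# A "simulated" single-node machine: 8 cpus, one 2 GB memory block.
cpu_map, mem_map = dummy_numa_topology(range(8), [(0x0, 0x80000000)])
```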

Reviewed-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
---
 arch/arm64/Kconfig              |  25 ++
 arch/arm64/include/asm/mmzone.h |  17 ++
 arch/arm64/include/asm/numa.h   |  63 +++++
 arch/arm64/kernel/setup.c       |   9 +
 arch/arm64/kernel/smp.c         |   2 +
 arch/arm64/mm/Makefile          |   1 +
 arch/arm64/mm/init.c            |  31 ++-
 arch/arm64/mm/numa.c            | 531 ++++++++++++++++++++++++++++++++++++++++
 8 files changed, 675 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/include/asm/mmzone.h
 create mode 100644 arch/arm64/include/asm/numa.h
 create mode 100644 arch/arm64/mm/numa.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7d95663..0f9cdc7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -68,6 +68,7 @@ config ARM64
 	select HAVE_GENERIC_DMA_COHERENT
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS
 	select HAVE_MEMBLOCK
+	select HAVE_MEMBLOCK_NODE_MAP if NUMA
 	select HAVE_PATA_PLATFORM
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_REGS
@@ -414,6 +415,30 @@ config HOTPLUG_CPU
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
 
+# Common NUMA Features
+config NUMA
+	bool "Numa Memory Allocation and Scheduler Support"
+	depends on SMP
+	help
+	  Enable NUMA (Non Uniform Memory Access) support.
+
+	  The kernel will try to allocate memory used by a CPU on the
+	  local memory controller of the CPU and add some more
+	  NUMA awareness to the kernel.
+
+config NODES_SHIFT
+	int "Maximum NUMA Nodes (as a power of 2)"
+	range 1 10
+	default "2"
+	depends on NEED_MULTIPLE_NODES
+	help
+	  Specify the maximum number of NUMA Nodes available on the target
+	  system.  Increases memory reserved to accommodate various tables.
+
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
 source kernel/Kconfig.preempt
 
 config HZ
diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
new file mode 100644
index 0000000..6ddd468
--- /dev/null
+++ b/arch/arm64/include/asm/mmzone.h
@@ -0,0 +1,17 @@
+#ifndef __ASM_ARM64_MMZONE_H_
+#define __ASM_ARM64_MMZONE_H_
+
+#ifdef CONFIG_NUMA
+
+#include <linux/mmdebug.h>
+#include <linux/types.h>
+
+#include <asm/smp.h>
+#include <asm/numa.h>
+
+extern struct pglist_data *node_data[];
+
+#define NODE_DATA(nid)		(node_data[(nid)])
+
+#endif /* CONFIG_NUMA */
+#endif /* __ASM_ARM64_MMZONE_H_ */
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
new file mode 100644
index 0000000..cadbd24
--- /dev/null
+++ b/arch/arm64/include/asm/numa.h
@@ -0,0 +1,63 @@
+#ifndef _ASM_NUMA_H
+#define _ASM_NUMA_H
+
+#include <linux/nodemask.h>
+#include <asm/topology.h>
+
+#ifdef CONFIG_NUMA
+
+#define NR_NODE_MEMBLKS		(MAX_NUMNODES * 2)
+#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
+
+/* currently, arm64 implements flat NUMA topology */
+#define parent_node(node)	(node)
+
+extern int __node_distance(int from, int to);
+#define node_distance(a, b) __node_distance(a, b)
+
+/* dummy definitions for pci functions */
+#define pcibus_to_node(node)	0
+#define cpumask_of_pcibus(bus)	0
+
+struct __node_cpu_hwid {
+	int node_id;    /* logical node containing this CPU */
+	u64 cpu_hwid;   /* MPIDR for this CPU */
+};
+
+struct numa_memblk {
+	u64 start;
+	u64 end;
+	int nid;
+};
+
+struct numa_meminfo {
+	int nr_blks;
+	struct numa_memblk blk[NR_NODE_MEMBLKS];
+};
+
+extern struct __node_cpu_hwid node_cpu_hwid[NR_CPUS];
+extern nodemask_t numa_nodes_parsed __initdata;
+
+/* Mappings between node number and cpus on that node. */
+extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+extern void numa_clear_node(unsigned int cpu);
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+extern const struct cpumask *cpumask_of_node(int node);
+#else
+/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
+static inline const struct cpumask *cpumask_of_node(int node)
+{
+	return node_to_cpumask_map[node];
+}
+#endif
+
+void __init arm64_numa_init(void);
+int __init numa_add_memblk(int nodeid, u64 start, u64 end);
+void __init numa_set_distance(int from, int to, int distance);
+void __init numa_reset_distance(void);
+void numa_store_cpu_info(unsigned int cpu);
+#else	/* CONFIG_NUMA */
+static inline void numa_store_cpu_info(unsigned int cpu)		{ }
+static inline void arm64_numa_init(void)		{ }
+#endif	/* CONFIG_NUMA */
+#endif	/* _ASM_NUMA_H */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index a2794da..4f3623d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -54,6 +54,7 @@
 #include <asm/elf.h>
 #include <asm/cpufeature.h>
 #include <asm/cpu_ops.h>
+#include <asm/numa.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <asm/smp_plat.h>
@@ -485,6 +486,9 @@ static int __init topology_init(void)
 {
 	int i;
 
+	for_each_online_node(i)
+		register_one_node(i);
+
 	for_each_possible_cpu(i) {
 		struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
 		cpu->hotpluggable = 1;
@@ -557,7 +561,12 @@ static int c_show(struct seq_file *m, void *v)
 		 * online processors, looking for lines beginning with
 		 * "processor".  Give glibc what it expects.
 		 */
+#ifdef CONFIG_NUMA
+		seq_printf(m, "processor\t: %d", i);
+		seq_printf(m, " [nid: %d]\n", cpu_to_node(i));
+#else
 		seq_printf(m, "processor\t: %d\n", i);
+#endif
 
 		/*
 		 * Dump out the common processor features in a single line.
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index dbdaacd..985ee04 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -45,6 +45,7 @@
 #include <asm/cputype.h>
 #include <asm/cpu_ops.h>
 #include <asm/mmu_context.h>
+#include <asm/numa.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/processor.h>
@@ -125,6 +126,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 static void smp_store_cpu_info(unsigned int cpuid)
 {
 	store_cpu_topology(cpuid);
+	numa_store_cpu_info(cpuid);
 }
 
 /*
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index 773d37a..bb92d41 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -4,3 +4,4 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   context.o proc.o pageattr.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_ARM64_PTDUMP)	+= dump.o
+obj-$(CONFIG_NUMA)		+= numa.o
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 697a6d0..81a0316 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -37,6 +37,7 @@
 
 #include <asm/fixmap.h>
 #include <asm/memory.h>
+#include <asm/numa.h>
 #include <asm/sections.h>
 #include <asm/setup.h>
 #include <asm/sizes.h>
@@ -77,6 +78,20 @@ static phys_addr_t max_zone_dma_phys(void)
 	return min(offset + (1ULL << 32), memblock_end_of_DRAM());
 }
 
+#ifdef CONFIG_NUMA
+static void __init zone_sizes_init(unsigned long min, unsigned long max)
+{
+	unsigned long max_zone_pfns[MAX_NR_ZONES];
+
+	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
+	if (IS_ENABLED(CONFIG_ZONE_DMA))
+		max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
+	max_zone_pfns[ZONE_NORMAL] = max;
+
+	free_area_init_nodes(max_zone_pfns);
+}
+
+#else
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
 	struct memblock_region *reg;
@@ -115,6 +130,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
 
 	free_area_init_node(0, zone_size, min, zhole_size);
 }
+#endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
 int pfn_valid(unsigned long pfn)
@@ -132,10 +148,15 @@ static void arm64_memory_present(void)
 static void arm64_memory_present(void)
 {
 	struct memblock_region *reg;
+	int nid = 0;
 
-	for_each_memblock(memory, reg)
-		memory_present(0, memblock_region_memory_base_pfn(reg),
-			       memblock_region_memory_end_pfn(reg));
+	for_each_memblock(memory, reg) {
+#ifdef CONFIG_NUMA
+		nid = reg->nid;
+#endif
+		memory_present(nid, memblock_region_memory_base_pfn(reg),
+				memblock_region_memory_end_pfn(reg));
+	}
 }
 #endif
 
@@ -192,6 +213,9 @@ void __init bootmem_init(void)
 
 	early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);
 
+	max_pfn = max_low_pfn = max;
+
+	arm64_numa_init();
 	/*
 	 * Sparsemem tries to allocate bootmem in memory_present(), so must be
 	 * done after the fixed reservations.
@@ -202,7 +226,6 @@ void __init bootmem_init(void)
 	zone_sizes_init(min, max);
 
 	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
-	max_pfn = max_low_pfn = max;
 }
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
new file mode 100644
index 0000000..4dd7436
--- /dev/null
+++ b/arch/arm64/mm/numa.c
@@ -0,0 +1,531 @@
+/*
+ * NUMA support, based on the x86 implementation.
+ *
+ * Copyright (C) 2015 Cavium Inc.
+ * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/memblock.h>
+#include <linux/ctype.h>
+#include <linux/module.h>
+#include <linux/nodemask.h>
+#include <linux/sched.h>
+#include <linux/topology.h>
+#include <linux/mmzone.h>
+
+#include <asm/smp_plat.h>
+
+struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
+EXPORT_SYMBOL(node_data);
+nodemask_t numa_nodes_parsed __initdata;
+struct __node_cpu_hwid node_cpu_hwid[NR_CPUS];
+
+static int numa_off;
+static int numa_distance_cnt;
+static u8 *numa_distance;
+static struct numa_meminfo numa_meminfo;
+
+static __init int numa_parse_early_param(char *opt)
+{
+	if (!opt)
+		return -EINVAL;
+	if (!strncmp(opt, "off", 3)) {
+		pr_info("%s\n", "NUMA turned off");
+		numa_off = 1;
+	}
+	return 0;
+}
+early_param("numa", numa_parse_early_param);
+
+cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+EXPORT_SYMBOL(node_to_cpumask_map);
+
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+/*
+ * Returns a pointer to the bitmask of CPUs on Node 'node'.
+ */
+const struct cpumask *cpumask_of_node(int node)
+{
+	if (node >= nr_node_ids) {
+		pr_warn("cpumask_of_node(%d): node > nr_node_ids(%d)\n",
+			node, nr_node_ids);
+		dump_stack();
+		return cpu_none_mask;
+	}
+	if (node_to_cpumask_map[node] == NULL) {
+		pr_warn("cpumask_of_node(%d): no node_to_cpumask_map!\n",
+			node);
+		dump_stack();
+		return cpu_online_mask;
+	}
+	return node_to_cpumask_map[node];
+}
+EXPORT_SYMBOL(cpumask_of_node);
+#endif
+
+static void map_cpu_to_node(unsigned int cpu, int nid)
+{
+	set_cpu_numa_node(cpu, nid);
+	if (nid >= 0)
+		cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
+}
+
+static void unmap_cpu_to_node(unsigned int cpu)
+{
+	int nid = cpu_to_node(cpu);
+
+	if (nid >= 0)
+		cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
+	set_cpu_numa_node(cpu, NUMA_NO_NODE);
+}
+
+void numa_clear_node(unsigned int cpu)
+{
+	unmap_cpu_to_node(cpu);
+}
+
+/*
+ * Allocate node_to_cpumask_map based on number of available nodes
+ * Requires node_possible_map to be valid.
+ *
+ * Note: cpumask_of_node() is not valid until after this is done.
+ * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
+ */
+static void __init setup_node_to_cpumask_map(void)
+{
+	unsigned int cpu;
+	int node;
+
+	/* setup nr_node_ids if not done yet */
+	if (nr_node_ids == MAX_NUMNODES)
+		setup_nr_node_ids();
+
+	/* allocate the map */
+	for (node = 0; node < nr_node_ids; node++)
+		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
+
+	/* Clear the mapping */
+	for (node = 0; node < nr_node_ids; node++)
+		cpumask_clear(node_to_cpumask_map[node]);
+
+	for_each_possible_cpu(cpu)
+		set_cpu_numa_node(cpu, NUMA_NO_NODE);
+
+	/* cpumask_of_node() will now work */
+	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
+}
+
+/*
+ *  Set the cpu to node and mem mapping
+ */
+void numa_store_cpu_info(unsigned int cpu)
+{
+	map_cpu_to_node(cpu, numa_off ? 0 : node_cpu_hwid[cpu].node_id);
+}
+
+/**
+ * numa_add_memblk_to - Add one numa_memblk to a numa_meminfo
+ */
+
+static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
+				     struct numa_meminfo *mi)
+{
+	/* ignore zero length blks */
+	if (start == end)
+		return 0;
+
+	/* whine about and ignore invalid blks */
+	if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
+		pr_warn("NUMA: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
+				nid, start, end - 1);
+		return 0;
+	}
+
+	if (mi->nr_blks >= NR_NODE_MEMBLKS) {
+		pr_err("NUMA: too many memblk ranges\n");
+		return -EINVAL;
+	}
+
+	pr_info("NUMA: Adding memblock %d [0x%llx - 0x%llx] on node %d\n",
+			mi->nr_blks, start, end, nid);
+	mi->blk[mi->nr_blks].start = start;
+	mi->blk[mi->nr_blks].end = end;
+	mi->blk[mi->nr_blks].nid = nid;
+	mi->nr_blks++;
+	return 0;
+}
+
+/**
+ * numa_add_memblk - Add one numa_memblk to numa_meminfo
+ * @nid: NUMA node ID of the new memblk
+ * @start: Start address of the new memblk
+ * @end: End address of the new memblk
+ *
+ * Add a new memblk to the default numa_meminfo.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+#define MAX_PHYS_ADDR	((phys_addr_t)~0)
+
+int __init numa_add_memblk(int nid, u64 base, u64 end)
+{
+	const u64 phys_offset = __pa(PAGE_OFFSET);
+
+	base &= PAGE_MASK;
+	end &= PAGE_MASK;
+
+	if (base > MAX_PHYS_ADDR) {
+		pr_warn("NUMA: Ignoring memory block 0x%llx - 0x%llx\n",
+				base, base + end);
+		return -ENOMEM;
+	}
+
+	if (base + end > MAX_PHYS_ADDR) {
+		pr_info("NUMA: Ignoring memory range 0x%lx - 0x%llx\n",
+				ULONG_MAX, base + end);
+		end = MAX_PHYS_ADDR - base;
+	}
+
+	if (base + end < phys_offset) {
+		pr_warn("NUMA: Ignoring memory block 0x%llx - 0x%llx\n",
+			   base, base + end);
+		return -ENOMEM;
+	}
+	if (base < phys_offset) {
+		pr_info("NUMA: Ignoring memory range 0x%llx - 0x%llx\n",
+			   base, phys_offset);
+		end -= phys_offset - base;
+		base = phys_offset;
+	}
+
+	return numa_add_memblk_to(nid, base, base + end, &numa_meminfo);
+}
+EXPORT_SYMBOL(numa_add_memblk);
+
+/* Initialize NODE_DATA for a node on the local memory */
+static void __init setup_node_data(int nid, u64 start, u64 end)
+{
+	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+	u64 nd_pa;
+	void *nd;
+	int tnid;
+
+	start = roundup(start, ZONE_ALIGN);
+
+	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
+	       nid, start, end - 1);
+
+	/*
+	 * Allocate node data.  Try node-local memory and then any node.
+	 */
+	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+	if (!nd_pa) {
+		nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
+					      MEMBLOCK_ALLOC_ACCESSIBLE);
+		if (!nd_pa) {
+			pr_err("Cannot find %zu bytes in node %d\n",
+			       nd_size, nid);
+			return;
+		}
+	}
+	nd = __va(nd_pa);
+
+	/* report and initialize */
+	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
+	       nd_pa, nd_pa + nd_size - 1);
+	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
+	if (tnid != nid)
+		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
+
+	node_data[nid] = nd;
+	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
+	NODE_DATA(nid)->node_id = nid;
+	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
+	NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
+
+	node_set_online(nid);
+}
+
+/*
+ * Set nodes, which have memory in @mi, in *@nodemask.
+ */
+static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
+					      const struct numa_meminfo *mi)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
+		if (mi->blk[i].start != mi->blk[i].end &&
+		    mi->blk[i].nid != NUMA_NO_NODE)
+			node_set(mi->blk[i].nid, *nodemask);
+}
+
+/*
+ * Sanity check to catch more bad NUMA configurations (they are amazingly
+ * common).  Make sure the nodes cover all memory.
+ */
+static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
+{
+	u64 numaram, totalram;
+	int i;
+
+	numaram = 0;
+	for (i = 0; i < mi->nr_blks; i++) {
+		u64 s = mi->blk[i].start >> PAGE_SHIFT;
+		u64 e = mi->blk[i].end >> PAGE_SHIFT;
+
+		numaram += e - s;
+		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
+		if ((s64)numaram < 0)
+			numaram = 0;
+	}
+
+	totalram = max_pfn - absent_pages_in_range(0, max_pfn);
+
+	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
+	if ((s64)(totalram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
+		pr_err("NUMA: nodes only cover %lluMB of your %lluMB Total RAM. Not used.\n",
+		       (numaram << PAGE_SHIFT) >> 20,
+		       (totalram << PAGE_SHIFT) >> 20);
+		return false;
+	}
+	return true;
+}
+
+/**
+ * numa_reset_distance - Reset NUMA distance table
+ *
+ * The current table is freed.
+ * The next numa_set_distance() call will create a new one.
+ */
+void __init numa_reset_distance(void)
+{
+	size_t size = numa_distance_cnt * numa_distance_cnt *
+		sizeof(numa_distance[0]);
+
+	/* numa_distance could be 1LU marking allocation failure, test cnt */
+	if (numa_distance_cnt)
+		memblock_free(__pa(numa_distance), size);
+	numa_distance_cnt = 0;
+	numa_distance = NULL;	/* enable table creation */
+}
+
+static int __init numa_alloc_distance(void)
+{
+	nodemask_t nodes_parsed;
+	size_t size;
+	int i, j, cnt = 0;
+	u64 phys;
+
+	/* size the new table and allocate it */
+	nodes_parsed = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
+
+	for_each_node_mask(i, nodes_parsed)
+		cnt = i;
+	cnt++;
+	size = cnt * cnt * sizeof(numa_distance[0]);
+
+	phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
+				      size, PAGE_SIZE);
+	if (!phys) {
+		pr_warn("NUMA: Warning: can't allocate distance table!\n");
+		/* don't retry until explicitly reset */
+		numa_distance = (void *)1LU;
+		return -ENOMEM;
+	}
+	memblock_reserve(phys, size);
+
+	numa_distance = __va(phys);
+	numa_distance_cnt = cnt;
+
+	/* fill with the default distances */
+	for (i = 0; i < cnt; i++)
+		for (j = 0; j < cnt; j++)
+			numa_distance[i * cnt + j] = i == j ?
+				LOCAL_DISTANCE : REMOTE_DISTANCE;
+	pr_debug("NUMA: Initialized distance table, cnt=%d\n", cnt);
+
+	return 0;
+}
+
+/**
+ * numa_set_distance - Set NUMA distance from one NUMA to another
+ * @from: the 'from' node to set distance
+ * @to: the 'to'  node to set distance
+ * @distance: NUMA distance
+ *
+ * Set the distance from node @from to @to to @distance.  If distance table
+ * doesn't exist, one which is large enough to accommodate all the currently
+ * known nodes will be created.
+ *
+ * If such table cannot be allocated, a warning is printed and further
+ * calls are ignored until the distance table is reset with
+ * numa_reset_distance().
+ *
+ * If @from or @to is higher than the highest known node or lower than zero
+ * at the time of table creation or @distance doesn't make sense, the call
+ * is ignored.
+ * This is to allow simplification of specific NUMA config implementations.
+ */
+void __init numa_set_distance(int from, int to, int distance)
+{
+	if (!numa_distance && numa_alloc_distance() < 0)
+		return;
+
+	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
+			from < 0 || to < 0) {
+		pr_warn_once("NUMA: Warning: node ids are out of bounds, from=%d to=%d distance=%d\n",
+			    from, to, distance);
+		return;
+	}
+
+	if ((u8)distance != distance ||
+	    (from == to && distance != LOCAL_DISTANCE)) {
+		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
+			     from, to, distance);
+		return;
+	}
+
+	numa_distance[from * numa_distance_cnt + to] = distance;
+}
+EXPORT_SYMBOL(numa_set_distance);
+
+int __node_distance(int from, int to)
+{
+	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
+		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
+	return numa_distance[from * numa_distance_cnt + to];
+}
+EXPORT_SYMBOL(__node_distance);
+
+static int __init numa_register_memblks(struct numa_meminfo *mi)
+{
+	unsigned long uninitialized_var(pfn_align);
+	int i, nid;
+
+	/* Account for nodes with cpus and no memory */
+	node_possible_map = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&node_possible_map, mi);
+	if (WARN_ON(nodes_empty(node_possible_map)))
+		return -EINVAL;
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *mb = &mi->blk[i];
+
+		memblock_set_node(mb->start, mb->end - mb->start,
+				  &memblock.memory, mb->nid);
+	}
+
+	/*
+	 * If sections array is gonna be used for pfn -> nid mapping, check
+	 * whether its granularity is fine enough.
+	 */
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+	pfn_align = node_map_pfn_alignment();
+	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
+		pr_warn("Node alignment %lluMB < min %lluMB, rejecting NUMA config\n",
+		       PFN_PHYS(pfn_align) >> 20,
+		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
+		return -EINVAL;
+	}
+#endif
+	if (!numa_meminfo_cover_memory(mi))
+		return -EINVAL;
+
+	/* Finally register nodes. */
+	for_each_node_mask(nid, node_possible_map) {
+		u64 start = PFN_PHYS(max_pfn);
+		u64 end = 0;
+
+		for (i = 0; i < mi->nr_blks; i++) {
+			if (nid != mi->blk[i].nid)
+				continue;
+			start = min(mi->blk[i].start, start);
+			end = max(mi->blk[i].end, end);
+		}
+
+		if (start < end)
+			setup_node_data(nid, start, end);
+	}
+
+	/* Dump memblock with node info and return. */
+	memblock_dump_all();
+	return 0;
+}
+
+static int __init numa_init(int (*init_func)(void))
+{
+	int ret;
+
+	nodes_clear(numa_nodes_parsed);
+	nodes_clear(node_possible_map);
+	nodes_clear(node_online_map);
+	numa_reset_distance();
+
+	ret = init_func();
+	if (ret < 0)
+		return ret;
+
+	ret = numa_register_memblks(&numa_meminfo);
+	if (ret < 0)
+		return ret;
+
+	setup_node_to_cpumask_map();
+
+	/* init boot processor */
+	map_cpu_to_node(0, 0);
+
+	return 0;
+}
+
+/**
+ * dummy_numa_init - Fallback dummy NUMA init
+ *
+ * Used if there's no underlying NUMA architecture, NUMA initialization
+ * fails, or NUMA is disabled on the command line.
+ *
+ * Must online at least one node and add memory blocks that cover all
+ * allowed memory.  This function must not fail.
+ */
+static int __init dummy_numa_init(void)
+{
+	pr_info("%s\n", "No NUMA configuration found");
+	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
+	       0LLU, PFN_PHYS(max_pfn) - 1);
+	node_set(0, numa_nodes_parsed);
+	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
+	numa_off = 1;
+
+	return 0;
+}
+
+/**
+ * arm64_numa_init - Initialize NUMA
+ *
+ * Try each configured NUMA initialization method until one succeeds.  The
+ * last fallback is a dummy single-node config encompassing whole memory and
+ * never fails.
+ */
+void __init arm64_numa_init(void)
+{
+	numa_init(dummy_numa_init);
+}
-- 
1.8.1.4


* [PATCH v6 2/4] Documentation, dt, arm64/arm: dt bindings for numa.
  2015-10-20 10:45 [PATCH v6 0/4] arm64, numa: Add numa support for arm64 platforms Ganapatrao Kulkarni
  2015-10-20 10:45 ` [PATCH v6 1/4] arm64, numa: adding " Ganapatrao Kulkarni
@ 2015-10-20 10:45 ` Ganapatrao Kulkarni
       [not found]   ` <1445337931-11344-3-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
  2015-10-20 10:45 ` [PATCH v6 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms Ganapatrao Kulkarni
  2015-10-20 10:45 ` [PATCH v6 4/4] arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node topology Ganapatrao Kulkarni
  3 siblings, 1 reply; 19+ messages in thread
From: Ganapatrao Kulkarni @ 2015-10-20 10:45 UTC (permalink / raw)
  To: linux-arm-kernel, devicetree, Will.Deacon, catalin.marinas,
	grant.likely, leif.lindholm, rfranz, ard.biesheuvel, msalter,
	robh+dt, steve.capper, hanjun.guo, al.stone, arnd, pawel.moll,
	mark.rutland, ijc+devicetree, galak, rjw, lenb, marc.zyngier,
	rrichter, Prasun.Kapoor
  Cc: gpkulkarni

DT bindings for numa mapping of memory, cores and IOs.
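For the 4-node ring example given in the binding text below, the full distance-matrix can be derived mechanically as (minimum hop count) x (per-hop distance). A small sketch, assuming, as the example does, a per-hop distance of 20 and a local distance of 10 (function names are mine):

```python
HOPS = 20   # per-hop distance from the binding example
LOCAL = 10  # local (on-node) distance

def ring_distance(a, b, n=4):
    """Minimum-hop distance between nodes a and b on an n-node ring."""
    if a == b:
        return LOCAL
    # The ring can be traversed either way; take the shorter path.
    hops = min((a - b) % n, (b - a) % n)
    return hops * HOPS

# Reproduce the <from to distance> triplets of the distance-matrix property.
matrix = [(a, b, ring_distance(a, b)) for a in range(4) for b in range(4)]
```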

Reviewed-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
---
 Documentation/devicetree/bindings/arm/numa.txt | 275 +++++++++++++++++++++++++
 1 file changed, 275 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/numa.txt

diff --git a/Documentation/devicetree/bindings/arm/numa.txt b/Documentation/devicetree/bindings/arm/numa.txt
new file mode 100644
index 0000000..f3bc8e6
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/numa.txt
@@ -0,0 +1,275 @@
+==============================================================================
+NUMA binding description.
+==============================================================================
+
+==============================================================================
+1 - Introduction
+==============================================================================
+
+Systems employing a Non Uniform Memory Access (NUMA) architecture contain
+collections of hardware resources, including processors, memory, and I/O
+buses, that together comprise what is commonly known as a NUMA node.
+Processor accesses to memory within the local NUMA node are generally
+faster than accesses to memory outside of the local NUMA node.
+DT defines interfaces that allow the platform to convey NUMA node
+topology information to the OS.
+
+==============================================================================
+2 - proximity
+==============================================================================
+The proximity device node property describes proximity domains within a
+machine. It can be used in device nodes such as cpu, memory and bus to
+map each of them to its respective numa node.
+
+The proximity property is a 32-bit integer that defines the numa node id
+with which the device node has a numa proximity association.
+
+Example:
+	/* numa node 0 */
+	proximity = <0>;
+
+	/* numa node 1 */
+	proximity = <1>;
+
+==============================================================================
+3 - distance-map
+==============================================================================
+
+The device tree node distance-map describes the relative
+distance (memory latency) between all numa nodes.
+
+- distance-matrix
+  This property defines a matrix to describe the relative distances
+  between all numa nodes.
+  It is represented as a list of node pairs and their relative distance.
+
+  Note:
+	1. If there is no distance-map, the system should setup:
+
+		      local/local:  10
+		      local/remote: 20
+	for all node distances.
+
+	2. If both directions between 2 nodes have the same distance, only
+	       one entry is required.
+	3. distance-matrix should have entries in ascending order of nodes.
+	4. Device node distance-map must reside in the root node.
+
+Example:
+	4 nodes connected in mesh/ring topology as below,
+
+		0_______20______1
+		|		|
+		|		|
+	     20	|		|20
+		|		|
+		|		|
+		|_______________|
+		3	20	2
+
+	If the relative distance for each hop is 20,
+	then the inter-node distances for this topology are:
+	      0 -> 1 = 20
+	      1 -> 2 = 20
+	      2 -> 3 = 20
+	      3 -> 0 = 20
+	      0 -> 2 = 40
+	      1 -> 3 = 40
+
+     and the dt representation of this distance matrix is:
+
+		distance-map {
+			 distance-matrix = <0 0  10>,
+					   <0 1  20>,
+					   <0 2  40>,
+					   <0 3  20>,
+					   <1 0  20>,
+					   <1 1  10>,
+					   <1 2  20>,
+					   <1 3  40>,
+					   <2 0  40>,
+					   <2 1  20>,
+					   <2 2  10>,
+					   <2 3  20>,
+					   <3 0  20>,
+					   <3 1  40>,
+					   <3 2  20>,
+					   <3 3  10>;
+		};
+
+Note:
+	 1. Entries such as <0 0 10>, <1 1 10>, <2 2 10> and <3 3 10> are
+	    optional; if absent, the system uses the default local
+	    distance (10).
+	 2. An entry such as <1 0 20> is optional when <0 1 20> is present
+	    and both directions have the same distance.
+
+==============================================================================
+4 - Example dts
+==============================================================================
+
+A 2-socket system consisting of 2 boards connected through a ccn bus,
+each board having one socket/soc with 8 cpus, memory and a pci bus.
+
+	memory@00c00000 {
+		device_type = "memory";
+		reg = <0x0 0x00c00000 0x0 0x80000000>;
+		/* node 0 */
+		proximity = <0>;
+	};
+
+	memory@10000000000 {
+		device_type = "memory";
+		reg = <0x100 0x00000000 0x0 0x80000000>;
+		/* node 1 */
+		proximity = <1>;
+	};
+
+	cpus {
+		#address-cells = <2>;
+		#size-cells = <0>;
+
+		cpu@000 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x000>;
+			enable-method = "psci";
+			/* node 0 */
+			proximity = <0>;
+		};
+		cpu@001 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x001>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@002 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x002>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@003 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x003>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@004 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x004>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@005 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x005>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@006 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x006>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@007 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x007>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@008 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x008>;
+			enable-method = "psci";
+			/* node 1 */
+			proximity = <1>;
+		};
+		cpu@009 {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x009>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@00a {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x00a>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@00b {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x00b>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@00c {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x00c>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@00d {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x00d>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@00e {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x00e>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@00f {
+			device_type = "cpu";
+			compatible =  "arm,armv8";
+			reg = <0x0 0x00f>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+	};
+
+	pcie0: pcie0@0x8480,00000000 {
+		compatible = "arm,armv8";
+		device_type = "pci";
+		bus-range = <0 255>;
+		#size-cells = <2>;
+		#address-cells = <3>;
+		reg = <0x8480 0x00000000 0 0x10000000>;  /* Configuration space */
+		ranges = <0x03000000 0x8010 0x00000000 0x8010 0x00000000 0x70 0x00000000>;
+		/* node 0 */
+		proximity = <0>;
+	};
+
+	pcie1: pcie1@0x9480,00000000 {
+		compatible = "arm,armv8";
+		device_type = "pci";
+		bus-range = <0 255>;
+		#size-cells = <2>;
+		#address-cells = <3>;
+		reg = <0x9480 0x00000000 0 0x10000000>;  /* Configuration space */
+		ranges = <0x03000000 0x9010 0x00000000 0x9010 0x00000000 0x70 0x00000000>;
+		/* node 1 */
+		proximity = <1>;
+	};
+
+	distance-map {
+		distance-matrix = <0 0 10>,
+				  <0 1 20>,
+				  <1 1 10>;
+	};
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms
  2015-10-20 10:45 [PATCH v6 0/4] arm64, numa: Add numa support for arm64 platforms Ganapatrao Kulkarni
  2015-10-20 10:45 ` [PATCH v6 1/4] arm64, numa: adding " Ganapatrao Kulkarni
  2015-10-20 10:45 ` [PATCH v6 2/4] Documentation, dt, arm64/arm: dt bindings for numa Ganapatrao Kulkarni
@ 2015-10-20 10:45 ` Ganapatrao Kulkarni
       [not found]   ` <1445337931-11344-4-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
  2015-10-20 10:45 ` [PATCH v6 4/4] arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node topology Ganapatrao Kulkarni
  3 siblings, 1 reply; 19+ messages in thread
From: Ganapatrao Kulkarni @ 2015-10-20 10:45 UTC (permalink / raw)
  To: linux-arm-kernel, devicetree, Will.Deacon, catalin.marinas,
	grant.likely, leif.lindholm, rfranz, ard.biesheuvel, msalter,
	robh+dt, steve.capper, hanjun.guo, al.stone, arnd, pawel.moll,
	mark.rutland, ijc+devicetree, galak, rjw, lenb, marc.zyngier,
	rrichter, Prasun.Kapoor
  Cc: gpkulkarni

Add numa dt binding support for arm64 based platforms.
dt node parsing for numa topology is done using the device node
property proximity and the distance-map device node.

Reviewed-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
---
 arch/arm64/Kconfig            |  10 ++
 arch/arm64/include/asm/numa.h |  10 ++
 arch/arm64/kernel/Makefile    |   1 +
 arch/arm64/kernel/of_numa.c   | 221 ++++++++++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/smp.c       |   1 +
 arch/arm64/mm/numa.c          |  10 +-
 6 files changed, 252 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kernel/of_numa.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0f9cdc7..6cf8d20 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -426,6 +426,16 @@ config NUMA
 	  local memory controller of the CPU and add some more
 	  NUMA awareness to the kernel.
 
+config OF_NUMA
+	bool "Device Tree NUMA support"
+	depends on NUMA
+	depends on OF
+	default y
+	help
+	  Enable Device Tree NUMA support.
+	  This enables NUMA mapping of cpus, memory and io devices,
+	  and of inter-node distances, using dt bindings.
+
 config NODES_SHIFT
 	int "Maximum NUMA Nodes (as a power of 2)"
 	range 1 10
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
index cadbd24..322da78 100644
--- a/arch/arm64/include/asm/numa.h
+++ b/arch/arm64/include/asm/numa.h
@@ -60,4 +60,14 @@ void numa_store_cpu_info(unsigned int cpu);
 static inline void numa_store_cpu_info(unsigned int cpu)		{ }
 static inline void arm64_numa_init(void)		{ }
 #endif	/* CONFIG_NUMA */
+
+struct device_node;
+#ifdef CONFIG_OF_NUMA
+int __init arm64_of_numa_init(void);
+void __init of_numa_set_node_info(unsigned int cpu,
+		u64 hwid, struct device_node *dn);
+#else
+static inline void of_numa_set_node_info(unsigned int cpu, u64 hwid,
+		struct device_node *dn) { }
+#endif
 #endif	/* _ASM_NUMA_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 22dc9bc..ad1fd72 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -36,6 +36,7 @@ arm64-obj-$(CONFIG_EFI)			+= efi.o efi-stub.o efi-entry.o
 arm64-obj-$(CONFIG_PCI)			+= pci.o
 arm64-obj-$(CONFIG_ARMV8_DEPRECATED)	+= armv8_deprecated.o
 arm64-obj-$(CONFIG_ACPI)		+= acpi.o
+arm64-obj-$(CONFIG_OF_NUMA)		+= of_numa.o
 
 obj-y					+= $(arm64-obj-y) vdso/
 obj-m					+= $(arm64-obj-m)
diff --git a/arch/arm64/kernel/of_numa.c b/arch/arm64/kernel/of_numa.c
new file mode 100644
index 0000000..0a6b2cf
--- /dev/null
+++ b/arch/arm64/kernel/of_numa.c
@@ -0,0 +1,221 @@
+/*
+ * OF NUMA Parsing support.
+ *
+ * Copyright (C) 2015 Cavium Inc.
+ * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/memblock.h>
+#include <linux/ctype.h>
+#include <linux/module.h>
+#include <linux/nodemask.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+#include <asm/smp_plat.h>
+
+/* define default numa node to 0 */
+#define DEFAULT_NODE 0
+
+/*
+ * Returns nid in the range [0..MAX_NUMNODES-1], NUMA_NO_NODE if no
+ * valid proximity entry is found, or DEFAULT_NODE if no proximity
+ * property exists.
+ */
+static int proximity_to_nid(const __be32 *proximity, int length)
+{
+	int nid;
+
+	if (!proximity)
+		return DEFAULT_NODE;
+
+	if (length != sizeof(*proximity)) {
+		pr_warn("NUMA: Invalid proximity length %d found.\n", length);
+		return NUMA_NO_NODE;
+	}
+
+	nid = of_read_number(proximity, 1);
+	if (nid >= MAX_NUMNODES) {
+		pr_warn("NUMA: Invalid numa node %d found.\n", nid);
+		return NUMA_NO_NODE;
+	}
+
+	return nid;
+}
+
+/* Must hold reference to node during call */
+static int of_get_proximity(struct device_node *device)
+{
+	int length;
+	const __be32 *proximity;
+
+	proximity = of_get_property(device, "proximity", &length);
+
+	return proximity_to_nid(proximity, length);
+}
+
+static int early_init_of_get_proximity(unsigned long node)
+{
+	int length;
+	const __be32 *proximity;
+
+	proximity = of_get_flat_dt_prop(node, "proximity", &length);
+
+	return proximity_to_nid(proximity, length);
+}
+
+/* Walk the device tree upwards, looking for a proximity node */
+int of_node_to_nid(struct device_node *device)
+{
+	struct device_node *parent;
+	int nid = NUMA_NO_NODE;
+
+	of_node_get(device);
+	while (device) {
+		const __be32 *proximity;
+		int length;
+
+		proximity = of_get_property(device, "proximity", &length);
+		if (proximity) {
+			nid = proximity_to_nid(proximity, length);
+			break;
+		}
+
+		parent = device;
+		device = of_get_parent(parent);
+		of_node_put(parent);
+	}
+	of_node_put(device);
+
+	return nid;
+}
+
+void __init of_numa_set_node_info(unsigned int cpu,
+		u64 hwid, struct device_node *device)
+{
+	int nid = DEFAULT_NODE;
+
+	if (device)
+		nid = of_get_proximity(device);
+
+	node_cpu_hwid[cpu].node_id = nid;
+	node_cpu_hwid[cpu].cpu_hwid = hwid;
+	node_set(nid, numa_nodes_parsed);
+}
+
+static int __init early_init_parse_memory_node(unsigned long node)
+{
+	const __be32 *reg, *endp;
+	int length;
+	int nid;
+
+	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+
+	/* We are scanning "memory" nodes only */
+	if (type == NULL)
+		return 0;
+	else if (strcmp(type, "memory") != 0)
+		return 0;
+
+	nid = early_init_of_get_proximity(node);
+
+	if (nid == NUMA_NO_NODE)
+		return -EINVAL;
+
+	reg = of_get_flat_dt_prop(node, "reg", &length);
+	endp = reg + (length / sizeof(__be32));
+
+	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
+		u64 base, size;
+		struct memblock_region *mblk;
+
+		base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+		size = dt_mem_next_cell(dt_root_size_cells, &reg);
+		pr_debug("NUMA-DT:  base = %llx , node = %u\n",
+				base, nid);
+
+		for_each_memblock(memory, mblk) {
+			if (mblk->base == base) {
+				node_set(nid, numa_nodes_parsed);
+				numa_add_memblk(nid, mblk->base, mblk->size);
+				break;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int early_init_parse_distance_map(unsigned long node, const char *uname)
+{
+	const __be32 *prop_dist_matrix;
+	int length = 0, i, matrix_count;
+	int nr_size_cells = OF_ROOT_NODE_SIZE_CELLS_DEFAULT;
+
+	if (strcmp(uname, "distance-map") != 0)
+		return 0;
+
+	prop_dist_matrix =
+		of_get_flat_dt_prop(node, "distance-matrix", &length);
+
+	if (!length) {
+		pr_err("NUMA: failed to parse distance-matrix\n");
+		return  -ENODEV;
+	}
+
+	matrix_count = ((length / sizeof(__be32)) / (3 * nr_size_cells));
+
+	if ((matrix_count * sizeof(__be32) * 3 * nr_size_cells) !=  length) {
+		pr_warn("NUMA: invalid distance-matrix length %d\n", length);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < matrix_count; i++) {
+		u32 nodea, nodeb, distance;
+
+		nodea = dt_mem_next_cell(nr_size_cells, &prop_dist_matrix);
+		nodeb = dt_mem_next_cell(nr_size_cells, &prop_dist_matrix);
+		distance = dt_mem_next_cell(nr_size_cells, &prop_dist_matrix);
+		numa_set_distance(nodea, nodeb, distance);
+		pr_debug("NUMA-DT:  distance[node%d -> node%d] = %d\n",
+				nodea, nodeb, distance);
+
+		/* Set default distance of node B->A same as A->B */
+		if (nodeb > nodea)
+			numa_set_distance(nodeb, nodea, distance);
+	}
+
+	return 0;
+}
+
+/**
+ * early_init_of_scan_numa_map - parse memory nodes and the distance map,
+ * mapping each nid to its memory ranges and inter-node distances.
+ */
+int __init early_init_of_scan_numa_map(unsigned long node, const char *uname,
+				     int depth, void *data)
+{
+	int ret;
+
+	ret = early_init_parse_memory_node(node);
+
+	if (!ret)
+		ret = early_init_parse_distance_map(node, uname);
+
+	return ret;
+}
+
+/* DT node parsing and nid mapping are done via early_init_of_scan_numa_map() */
+int __init arm64_of_numa_init(void)
+{
+	return of_scan_flat_dt(early_init_of_scan_numa_map, NULL);
+}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 985ee04..a9d7f93 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -516,6 +516,7 @@ void __init of_parse_and_init_cpus(void)
 
 		pr_debug("cpu logical map 0x%llx\n", hwid);
 		cpu_logical_map(cpu_count) = hwid;
+		of_numa_set_node_info(cpu_count, hwid, dn);
 next:
 		cpu_count++;
 	}
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 4dd7436..ab01551 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -527,5 +527,13 @@ static int __init dummy_numa_init(void)
  */
 void __init arm64_numa_init(void)
 {
-	numa_init(dummy_numa_init);
+	int ret = -ENODEV;
+
+#ifdef CONFIG_OF_NUMA
+	if (!numa_off)
+		ret = numa_init(arm64_of_numa_init);
+#endif
+
+	if (ret)
+		numa_init(dummy_numa_init);
 }
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v6 4/4] arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node topology.
  2015-10-20 10:45 [PATCH v6 0/4] arm64, numa: Add numa support for arm64 platforms Ganapatrao Kulkarni
                   ` (2 preceding siblings ...)
  2015-10-20 10:45 ` [PATCH v6 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms Ganapatrao Kulkarni
@ 2015-10-20 10:45 ` Ganapatrao Kulkarni
  3 siblings, 0 replies; 19+ messages in thread
From: Ganapatrao Kulkarni @ 2015-10-20 10:45 UTC (permalink / raw)
  To: linux-arm-kernel, devicetree, Will.Deacon, catalin.marinas,
	grant.likely, leif.lindholm, rfranz, ard.biesheuvel, msalter,
	robh+dt, steve.capper, hanjun.guo, al.stone, arnd, pawel.moll,
	mark.rutland, ijc+devicetree, galak, rjw, lenb, marc.zyngier,
	rrichter, Prasun.Kapoor
  Cc: gpkulkarni

Add a dts file for Cavium's ThunderX dual-socket platform.

Signed-off-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
---
 arch/arm64/boot/dts/cavium/Makefile             |   2 +-
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts  |  82 +++
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi | 806 ++++++++++++++++++++++++
 3 files changed, 889 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
 create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi

diff --git a/arch/arm64/boot/dts/cavium/Makefile b/arch/arm64/boot/dts/cavium/Makefile
index e34f89d..7fe7067 100644
--- a/arch/arm64/boot/dts/cavium/Makefile
+++ b/arch/arm64/boot/dts/cavium/Makefile
@@ -1,4 +1,4 @@
-dtb-$(CONFIG_ARCH_THUNDER) += thunder-88xx.dtb
+dtb-$(CONFIG_ARCH_THUNDER) += thunder-88xx.dtb thunder-88xx-2n.dtb
 
 always		:= $(dtb-y)
 subdir-y	:= $(dts-dirs)
diff --git a/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
new file mode 100644
index 0000000..4bde994
--- /dev/null
+++ b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
@@ -0,0 +1,82 @@
+/*
+ * Cavium Thunder DTS file - Thunder board description
+ *
+ * Copyright (C) 2014, Cavium Inc.
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This library is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *     You should have received a copy of the GNU General Public
+ *     License along with this library; if not, write to the Free
+ *     Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ *     MA 02110-1301 USA
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+
+/include/ "thunder-88xx-2n.dtsi"
+
+/ {
+	model = "Cavium ThunderX CN88XX board";
+	compatible = "cavium,thunder-88xx";
+
+	aliases {
+		serial0 = &uaa0;
+		serial1 = &uaa1;
+	};
+
+	memory@00000000 {
+		device_type = "memory";
+		reg = <0x0 0x01400000 0x3 0xFEC00000>;
+		/* socket 0 */
+		proximity = <0>;
+	};
+
+	memory@10000000000 {
+		device_type = "memory";
+		reg = <0x100 0x00400000 0x3 0xFFC00000>;
+		 /* socket 1 */
+		proximity = <1>;
+	};
+
+	distance-map {
+		distance-matrix = <0 0  10>,
+				  <0 1  20>,
+				  <1 1  10>;
+	};
+};
diff --git a/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi
new file mode 100644
index 0000000..959a65a
--- /dev/null
+++ b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi
@@ -0,0 +1,806 @@
+/*
+ * Cavium Thunder DTS file - Thunder SoC description
+ *
+ * Copyright (C) 2014, Cavium Inc.
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This library is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *     You should have received a copy of the GNU General Public
+ *     License along with this library; if not, write to the Free
+ *     Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ *     MA 02110-1301 USA
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/ {
+	compatible = "cavium,thunder-88xx";
+	interrupt-parent = <&gic0>;
+	#address-cells = <2>;
+	#size-cells = <2>;
+
+	psci {
+		compatible = "arm,psci-0.2";
+		method = "smc";
+	};
+
+	cpus {
+		#address-cells = <2>;
+		#size-cells = <0>;
+
+		cpu@000 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x000>;
+			enable-method = "psci";
+			/* socket 0 */
+			proximity = <0>;
+		};
+		cpu@001 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x001>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@002 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x002>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@003 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x003>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@004 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x004>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@005 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x005>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@006 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x006>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@007 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x007>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@008 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x008>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@009 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x009>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@00a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00a>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@00b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00b>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@00c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00c>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@00d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00d>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@00e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00e>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@00f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00f>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@100 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x100>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@101 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x101>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@102 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x102>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@103 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x103>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@104 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x104>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@105 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x105>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@106 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x106>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@107 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x107>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@108 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x108>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@109 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x109>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@10a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10a>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@10b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10b>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@10c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10c>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@10d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10d>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@10e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10e>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@10f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10f>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@200 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x200>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@201 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x201>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@202 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x202>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@203 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x203>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@204 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x204>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@205 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x205>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@206 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x206>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@207 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x207>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@208 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x208>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@209 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x209>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@20a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20a>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@20b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20b>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@20c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20c>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@20d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20d>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@20e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20e>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@20f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20f>;
+			enable-method = "psci";
+			proximity = <0>;
+		};
+		cpu@10000 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10000>;
+			enable-method = "psci";
+			/* socket 1 */
+			proximity = <1>;
+		};
+		cpu@10001 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10001>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10002 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10002>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10003 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10003>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10004 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10004>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10005 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10005>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10006 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10006>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10007 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10007>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10008 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10008>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10009 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10009>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1000a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000a>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1000b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000b>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1000c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000c>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1000d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000d>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1000e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000e>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1000f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000f>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10100 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10100>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10101 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10101>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10102 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10102>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10103 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10103>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10104 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10104>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10105 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10105>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10106 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10106>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10107 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10107>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10108 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10108>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10109 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10109>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1010a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010a>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1010b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010b>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1010c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010c>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1010d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010d>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1010e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010e>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1010f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010f>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10200 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10200>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10201 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10201>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10202 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10202>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10203 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10203>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10204 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10204>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10205 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10205>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10206 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10206>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10207 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10207>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10208 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10208>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@10209 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10209>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1020a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020a>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1020b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020b>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1020c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020c>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1020d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020d>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1020e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020e>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+		cpu@1020f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020f>;
+			enable-method = "psci";
+			proximity = <1>;
+		};
+	};
+
+	timer {
+		compatible = "arm,armv8-timer";
+		interrupts = <1 13 0xff01>,
+		             <1 14 0xff01>,
+		             <1 11 0xff01>,
+		             <1 10 0xff01>;
+	};
+
+	soc {
+		compatible = "simple-bus";
+		#address-cells = <2>;
+		#size-cells = <2>;
+		ranges;
+
+		refclk50mhz: refclk50mhz {
+			compatible = "fixed-clock";
+			#clock-cells = <0>;
+			clock-frequency = <50000000>;
+			clock-output-names = "refclk50mhz";
+		};
+
+		gic0: interrupt-controller@8010,00000000 {
+			compatible = "arm,gic-v3";
+			#interrupt-cells = <3>;
+			#address-cells = <2>;
+			#size-cells = <2>;
+			#redistributor-regions = <2>;
+			ranges;
+			interrupt-controller;
+			reg = <0x8010 0x00000000 0x0 0x010000>, /* GICD */
+			      <0x8010 0x80000000 0x0 0x600000>, /* GICR Node 0 */
+			      <0x9010 0x80000000 0x0 0x600000>; /* GICR Node 1 */
+			interrupts = <1 9 0xf04>;
+
+			its: gic-its@8010,00020000 {
+				compatible = "arm,gic-v3-its";
+				msi-controller;
+				reg = <0x8010 0x20000 0x0 0x200000>;
+				proximity = <0>;
+			};
+
+			its1: gic-its@9010,00020000 {
+				compatible = "arm,gic-v3-its";
+				msi-controller;
+				reg = <0x9010 0x20000 0x0 0x200000>;
+				proximity = <1>;
+			};
+		};
+
+		uaa0: serial@87e0,24000000 {
+			compatible = "arm,pl011", "arm,primecell";
+			reg = <0x87e0 0x24000000 0x0 0x1000>;
+			interrupts = <1 21 4>;
+			clocks = <&refclk50mhz>;
+			clock-names = "apb_pclk";
+		};
+
+		uaa1: serial@87e0,25000000 {
+			compatible = "arm,pl011", "arm,primecell";
+			reg = <0x87e0 0x25000000 0x0 0x1000>;
+			interrupts = <1 22 4>;
+			clocks = <&refclk50mhz>;
+			clock-names = "apb_pclk";
+		};
+	};
+};
-- 
1.8.1.4


* Re: [PATCH v6 1/4] arm64, numa: adding numa support for arm64 platforms.
       [not found]   ` <1445337931-11344-2-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
@ 2015-10-20 14:47     ` Mark Rutland
  2015-10-21  8:54       ` Ganapatrao Kulkarni
  2015-10-23 15:11     ` Matthias Brugger
  1 sibling, 1 reply; 19+ messages in thread
From: Mark Rutland @ 2015-10-20 14:47 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will.Deacon-5wv7dgnIgG8,
	catalin.marinas-5wv7dgnIgG8, grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	leif.lindholm-QSEj5FYQhm4dnm+yROfE0A,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A,
	msalter-H+wXaHxf7aLQT0dZR+AlfA, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	steve.capper-QSEj5FYQhm4dnm+yROfE0A,
	hanjun.guo-QSEj5FYQhm4dnm+yROfE0A,
	al.stone-QSEj5FYQhm4dnm+yROfE0A, arnd-r2nGTMty4D4,
	pawel.moll-5wv7dgnIgG8, ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
	galak-sgV2jX0FEOL9JmXXK+q4OQ, rjw-LthD3rsA81gm4RdzfppkhA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, marc.zyngier-5wv7dgnIgG8,
	rrichter-YGCgFSpz5w/QT0dZR+AlfA,
	Prasun.Kapoor-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

Hi,

I'm away for the rest of this week and don't have the time to give this
a full review, but I've given this a first pass and have some high-level
comments:

First, most of this is copy+paste from x86. We should try to share code
rather than duplicating it. Especially as it looks like there's cleanup
(and therefore divergence) that could happen.

Second, this reimplements memblock to associate nids with memory
regions. I think we should keep memblock around for this (I'm under the
impression that Ard also wants that for memory attributes), rather than
creating a new memblock-like API that we then have to reconcile with
actual memblock information.

Third, NAK to the changes to /proc/cpuinfo, please drop that from the
patch. Further comments on that matter are inline below.

On Tue, Oct 20, 2015 at 04:15:28PM +0530, Ganapatrao Kulkarni wrote:
> Adding numa support for arm64 based platforms.
> This patch adds by default the dummy numa node and
> maps all memory and cpus to node 0.
> using this patch, numa can be simulated on single node arm64 platforms.
> 
> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
> ---
>  arch/arm64/Kconfig              |  25 ++
>  arch/arm64/include/asm/mmzone.h |  17 ++
>  arch/arm64/include/asm/numa.h   |  63 +++++
>  arch/arm64/kernel/setup.c       |   9 +
>  arch/arm64/kernel/smp.c         |   2 +
>  arch/arm64/mm/Makefile          |   1 +
>  arch/arm64/mm/init.c            |  31 ++-
>  arch/arm64/mm/numa.c            | 531 ++++++++++++++++++++++++++++++++++++++++
>  8 files changed, 675 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm64/include/asm/mmzone.h
>  create mode 100644 arch/arm64/include/asm/numa.h
>  create mode 100644 arch/arm64/mm/numa.c
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 7d95663..0f9cdc7 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -68,6 +68,7 @@ config ARM64
>  	select HAVE_GENERIC_DMA_COHERENT
>  	select HAVE_HW_BREAKPOINT if PERF_EVENTS
>  	select HAVE_MEMBLOCK
> +	select HAVE_MEMBLOCK_NODE_MAP if NUMA
>  	select HAVE_PATA_PLATFORM
>  	select HAVE_PERF_EVENTS
>  	select HAVE_PERF_REGS
> @@ -414,6 +415,30 @@ config HOTPLUG_CPU
>  	  Say Y here to experiment with turning CPUs off and on.  CPUs
>  	  can be controlled through /sys/devices/system/cpu.
>  
> +# Common NUMA Features
> +config NUMA
> +	bool "Numa Memory Allocation and Scheduler Support"
> +	depends on SMP
> +	help
> +	  Enable NUMA (Non Uniform Memory Access) support.
> +
> +	  The kernel will try to allocate memory used by a CPU on the
> +	  local memory controller of the CPU and add some more
> +	  NUMA awareness to the kernel.
> +
> +config NODES_SHIFT
> +	int "Maximum NUMA Nodes (as a power of 2)"
> +	range 1 10
> +	default "2"
> +	depends on NEED_MULTIPLE_NODES
> +	help
> +	  Specify the maximum number of NUMA Nodes available on the target
> +	  system.  Increases memory reserved to accommodate various tables.

How much memory do we end up requiring per node?

> +
> +config USE_PERCPU_NUMA_NODE_ID
> +	def_bool y
> +	depends on NUMA
> +
>  source kernel/Kconfig.preempt
>  
>  config HZ
> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
> new file mode 100644
> index 0000000..6ddd468
> --- /dev/null
> +++ b/arch/arm64/include/asm/mmzone.h
> @@ -0,0 +1,17 @@
> +#ifndef __ASM_ARM64_MMZONE_H_
> +#define __ASM_ARM64_MMZONE_H_
> +
> +#ifdef CONFIG_NUMA
> +
> +#include <linux/mmdebug.h>
> +#include <linux/types.h>
> +
> +#include <asm/smp.h>
> +#include <asm/numa.h>
> +
> +extern struct pglist_data *node_data[];
> +
> +#define NODE_DATA(nid)		(node_data[(nid)])
> +
> +#endif /* CONFIG_NUMA */
> +#endif /* __ASM_ARM64_MMZONE_H_ */
> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
> new file mode 100644
> index 0000000..cadbd24
> --- /dev/null
> +++ b/arch/arm64/include/asm/numa.h
> @@ -0,0 +1,63 @@
> +#ifndef _ASM_NUMA_H
> +#define _ASM_NUMA_H
> +
> +#include <linux/nodemask.h>
> +#include <asm/topology.h>
> +
> +#ifdef CONFIG_NUMA
> +
> +#define NR_NODE_MEMBLKS		(MAX_NUMNODES * 2)
> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
> +
> +/* currently, arm64 implements flat NUMA topology */
> +#define parent_node(node)	(node)
> +
> +extern int __node_distance(int from, int to);
> +#define node_distance(a, b) __node_distance(a, b)
> +
> +/* dummy definitions for pci functions */
> +#define pcibus_to_node(node)	0
> +#define cpumask_of_pcibus(bus)	0
> +
> +struct __node_cpu_hwid {
> +	int node_id;    /* logical node containing this CPU */
> +	u64 cpu_hwid;   /* MPIDR for this CPU */
> +};

We already have the MPIDR ID in the cpu_logical_map. Please don't
duplicate it here.

As node_cpu_hwid seems to be indexed by logical ID, you can simply use
the same index for the logical map to get the MPIDR ID when necessary.

> +
> +struct numa_memblk {
> +	u64 start;
> +	u64 end;
> +	int nid;
> +};
> +
> +struct numa_meminfo {
> +	int nr_blks;
> +	struct numa_memblk blk[NR_NODE_MEMBLKS];
> +};

I think we should keep the usual memblock around for this. It already
has some nid support.

> +
> +extern struct __node_cpu_hwid node_cpu_hwid[NR_CPUS];
> +extern nodemask_t numa_nodes_parsed __initdata;
> +
> +/* Mappings between node number and cpus on that node. */
> +extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
> +extern void numa_clear_node(unsigned int cpu);
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +extern const struct cpumask *cpumask_of_node(int node);
> +#else
> +/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
> +static inline const struct cpumask *cpumask_of_node(int node)
> +{
> +	return node_to_cpumask_map[node];
> +}
> +#endif
> +
> +void __init arm64_numa_init(void);
> +int __init numa_add_memblk(int nodeid, u64 start, u64 end);
> +void __init numa_set_distance(int from, int to, int distance);
> +void __init numa_reset_distance(void);
> +void numa_store_cpu_info(unsigned int cpu);
> +#else	/* CONFIG_NUMA */
> +static inline void numa_store_cpu_info(unsigned int cpu)		{ }
> +static inline void arm64_numa_init(void)		{ }
> +#endif	/* CONFIG_NUMA */
> +#endif	/* _ASM_NUMA_H */
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index a2794da..4f3623d 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -54,6 +54,7 @@
>  #include <asm/elf.h>
>  #include <asm/cpufeature.h>
>  #include <asm/cpu_ops.h>
> +#include <asm/numa.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
>  #include <asm/smp_plat.h>
> @@ -485,6 +486,9 @@ static int __init topology_init(void)
>  {
>  	int i;
>  
> +	for_each_online_node(i)
> +		register_one_node(i);
> +
>  	for_each_possible_cpu(i) {
>  		struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
>  		cpu->hotpluggable = 1;
> @@ -557,7 +561,12 @@ static int c_show(struct seq_file *m, void *v)
>  		 * online processors, looking for lines beginning with
>  		 * "processor".  Give glibc what it expects.
>  		 */
> +#ifdef CONFIG_NUMA
> +		seq_printf(m, "processor\t: %d", i);
> +		seq_printf(m, " [nid: %d]\n", cpu_to_node(i));
> +#else
>  		seq_printf(m, "processor\t: %d\n", i);
> +#endif

As above, NAK to a /proc/cpuinfo change.

We don't have this on arch/arm and didn't previously have it on arm64,
so it could easily break existing software (both compat and native).
Having the format randomly change based on a config option is also not
great, and there's already been enough pain in this area.

Additionally, other architectures don't have this, so it's clearly not
necessary.

Surely there's a (portable/consistent) sysfs interface that provides the
NUMA information userspace requires? If not, we should add one that
works across architectures.

>  
>  		/*
>  		 * Dump out the common processor features in a single line.
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index dbdaacd..985ee04 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -45,6 +45,7 @@
>  #include <asm/cputype.h>
>  #include <asm/cpu_ops.h>
>  #include <asm/mmu_context.h>
> +#include <asm/numa.h>
>  #include <asm/pgtable.h>
>  #include <asm/pgalloc.h>
>  #include <asm/processor.h>
> @@ -125,6 +126,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  static void smp_store_cpu_info(unsigned int cpuid)
>  {
>  	store_cpu_topology(cpuid);
> +	numa_store_cpu_info(cpuid);
>  }
>  
>  /*
> diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
> index 773d37a..bb92d41 100644
> --- a/arch/arm64/mm/Makefile
> +++ b/arch/arm64/mm/Makefile
> @@ -4,3 +4,4 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
>  				   context.o proc.o pageattr.o
>  obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
>  obj-$(CONFIG_ARM64_PTDUMP)	+= dump.o
> +obj-$(CONFIG_NUMA)		+= numa.o
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 697a6d0..81a0316 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -37,6 +37,7 @@
>  
>  #include <asm/fixmap.h>
>  #include <asm/memory.h>
> +#include <asm/numa.h>
>  #include <asm/sections.h>
>  #include <asm/setup.h>
>  #include <asm/sizes.h>
> @@ -77,6 +78,20 @@ static phys_addr_t max_zone_dma_phys(void)
>  	return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>  }
>  
> +#ifdef CONFIG_NUMA
> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
> +{
> +	unsigned long max_zone_pfns[MAX_NR_ZONES];
> +
> +	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));

You can make this simpler by initialising the variable when defining it:

	unsigned long max_zone_pfns[MAX_NR_ZONES] = { 0 };

> +	if (IS_ENABLED(CONFIG_ZONE_DMA))
> +		max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
> +	max_zone_pfns[ZONE_NORMAL] = max;
> +
> +	free_area_init_nodes(max_zone_pfns);
> +}
> +
> +#else
>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
>  {
>  	struct memblock_region *reg;
> @@ -115,6 +130,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
>  
>  	free_area_init_node(0, zone_size, min, zhole_size);
>  }
> +#endif /* CONFIG_NUMA */
>  
>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>  int pfn_valid(unsigned long pfn)
> @@ -132,10 +148,15 @@ static void arm64_memory_present(void)
>  static void arm64_memory_present(void)
>  {
>  	struct memblock_region *reg;
> +	int nid = 0;
>  
> -	for_each_memblock(memory, reg)
> -		memory_present(0, memblock_region_memory_base_pfn(reg),
> -			       memblock_region_memory_end_pfn(reg));
> +	for_each_memblock(memory, reg) {
> +#ifdef CONFIG_NUMA
> +		nid = reg->nid;
> +#endif
> +		memory_present(nid, memblock_region_memory_base_pfn(reg),
> +				memblock_region_memory_end_pfn(reg));
> +	}
>  }
>  #endif
>  
> @@ -192,6 +213,9 @@ void __init bootmem_init(void)
>  
>  	early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);
>  
> +	max_pfn = max_low_pfn = max;
> +
> +	arm64_numa_init();
>  	/*
>  	 * Sparsemem tries to allocate bootmem in memory_present(), so must be
>  	 * done after the fixed reservations.
> @@ -202,7 +226,6 @@ void __init bootmem_init(void)
>  	zone_sizes_init(min, max);
>  
>  	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
> -	max_pfn = max_low_pfn = max;
>  }
>  
>  #ifndef CONFIG_SPARSEMEM_VMEMMAP
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> new file mode 100644
> index 0000000..4dd7436
> --- /dev/null
> +++ b/arch/arm64/mm/numa.c
> @@ -0,0 +1,531 @@
> +/*
> + * NUMA support, based on the x86 implementation.
> + *
> + * Copyright (C) 2015 Cavium Inc.
> + * Author: Ganapatrao Kulkarni <gkulkarni-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/string.h>
> +#include <linux/init.h>
> +#include <linux/bootmem.h>
> +#include <linux/memblock.h>
> +#include <linux/ctype.h>
> +#include <linux/module.h>
> +#include <linux/nodemask.h>
> +#include <linux/sched.h>
> +#include <linux/topology.h>
> +#include <linux/mmzone.h>

Nit: please sort these alphabetically.

> +
> +#include <asm/smp_plat.h>
> +
> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
> +EXPORT_SYMBOL(node_data);
> +nodemask_t numa_nodes_parsed __initdata;
> +struct __node_cpu_hwid node_cpu_hwid[NR_CPUS];
> +
> +static int numa_off;
> +static int numa_distance_cnt;
> +static u8 *numa_distance;
> +static struct numa_meminfo numa_meminfo;
> +
> +static __init int numa_parse_early_param(char *opt)
> +{
> +	if (!opt)
> +		return -EINVAL;
> +	if (!strncmp(opt, "off", 3)) {
> +		pr_info("%s\n", "NUMA turned off");
> +		numa_off = 1;
> +	}
> +	return 0;
> +}
> +early_param("numa", numa_parse_early_param);
> +
> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
> +EXPORT_SYMBOL(node_to_cpumask_map);
> +
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +/*
> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
> + */
> +const struct cpumask *cpumask_of_node(int node)
> +{
> +	if (node >= nr_node_ids) {
> +		pr_warn("cpumask_of_node(%d): node > nr_node_ids(%d)\n",
> +			node, nr_node_ids);
> +		dump_stack();
> +		return cpu_none_mask;
> +	}

This can be:

	if (WARN_ON(node >= nr_node_ids))
		return cpu_none_mask;

> +	if (node_to_cpumask_map[node] == NULL) {
> +		pr_warn("cpumask_of_node(%d): no node_to_cpumask_map!\n",
> +			node);
> +		dump_stack();
> +		return cpu_online_mask;
> +	}

Likewise:
	
	if (WARN_ON(!node_to_cpumask_map[node]))
		return cpu_online_mask;

> +	return node_to_cpumask_map[node];
> +}
> +EXPORT_SYMBOL(cpumask_of_node);
> +#endif
> +
> +static void map_cpu_to_node(unsigned int cpu, int nid)
> +{
> +	set_cpu_numa_node(cpu, nid);
> +	if (nid >= 0)
> +		cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
> +}
> +
> +static void unmap_cpu_to_node(unsigned int cpu)
> +{
> +	int nid = cpu_to_node(cpu);
> +
> +	if (nid >= 0)
> +		cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
> +	set_cpu_numa_node(cpu, NUMA_NO_NODE);
> +}
> +
> +void numa_clear_node(unsigned int cpu)
> +{
> +	unmap_cpu_to_node(cpu);
> +}
> +
> +/*
> + * Allocate node_to_cpumask_map based on number of available nodes
> + * Requires node_possible_map to be valid.
> + *
> + * Note: cpumask_of_node() is not valid until after this is done.
> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
> + */
> +static void __init setup_node_to_cpumask_map(void)
> +{
> +	unsigned int cpu;
> +	int node;
> +
> +	/* setup nr_node_ids if not done yet */
> +	if (nr_node_ids == MAX_NUMNODES)
> +		setup_nr_node_ids();

Where would this be done otherwise? 

If we can initialise this earlier, what happens if we actually had
MAX_NUMNODES nodes?

> +
> +	/* allocate the map */
> +	for (node = 0; node < nr_node_ids; node++)
> +		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
> +
> +	/* Clear the mapping */
> +	for (node = 0; node < nr_node_ids; node++)
> +		cpumask_clear(node_to_cpumask_map[node]);

Why not do these at the same time?

Can an allocation fail?

> +
> +	for_each_possible_cpu(cpu)
> +		set_cpu_numa_node(cpu, NUMA_NO_NODE);
> +
> +	/* cpumask_of_node() will now work */
> +	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
> +}
> +
> +/*
> + *  Set the cpu to node and mem mapping
> + */
> +void numa_store_cpu_info(unsigned int cpu)
> +{
> +	map_cpu_to_node(cpu, numa_off ? 0 : node_cpu_hwid[cpu].node_id);
> +}
> +
> +/**
> + * numa_add_memblk_to - Add one numa_memblk to a numa_meminfo
> + */
> +
> +static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
> +				     struct numa_meminfo *mi)
> +{
> +	/* ignore zero length blks */
> +	if (start == end)
> +		return 0;
> +
> +	/* whine about and ignore invalid blks */
> +	if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
> +		pr_warn("NUMA: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
> +				nid, start, end - 1);
> +		return 0;
> +	}

When would this happen?

> +
> +	if (mi->nr_blks >= NR_NODE_MEMBLKS) {
> +		pr_err("NUMA: too many memblk ranges\n");
> +		return -EINVAL;
> +	}
> +
> +	pr_info("NUMA: Adding memblock %d [0x%llx - 0x%llx] on node %d\n",
> +			mi->nr_blks, start, end, nid);
> +	mi->blk[mi->nr_blks].start = start;
> +	mi->blk[mi->nr_blks].end = end;
> +	mi->blk[mi->nr_blks].nid = nid;
> +	mi->nr_blks++;
> +	return 0;
> +}

As I mentioned earlier, I think that we should keep the memblock
infrastructure around, and reuse it here.

> +
> +/**
> + * numa_add_memblk - Add one numa_memblk to numa_meminfo
> + * @nid: NUMA node ID of the new memblk
> + * @start: Start address of the new memblk
> + * @end: End address of the new memblk
> + *
> + * Add a new memblk to the default numa_meminfo.
> + *
> + * RETURNS:
> + * 0 on success, -errno on failure.
> + */
> +#define MAX_PHYS_ADDR	((phys_addr_t)~0)

We should probably rethink MAX_MEMBLOCK_ADDR and potentially make it
more generic so we can use it here and elsewhere. See commit
8eafeb4802281651 ("of/fdt: make memblock maximum physical address arch
configurable").

However, that might not matter if we're able to reuse memblock.

> +
> +int __init numa_add_memblk(int nid, u64 base, u64 end)
> +{
> +	const u64 phys_offset = __pa(PAGE_OFFSET);
> +
> +	base &= PAGE_MASK;
> +	end &= PAGE_MASK;
> +
> +	if (base > MAX_PHYS_ADDR) {
> +		pr_warn("NUMA: Ignoring memory block 0x%llx - 0x%llx\n",
> +				base, base + end);
> +		return -ENOMEM;
> +	}
> +
> +	if (base + end > MAX_PHYS_ADDR) {
> +		pr_info("NUMA: Ignoring memory range 0x%lx - 0x%llx\n",
> +				ULONG_MAX, base + end);
> +		end = MAX_PHYS_ADDR - base;
> +	}
> +
> +	if (base + end < phys_offset) {
> +		pr_warn("NUMA: Ignoring memory block 0x%llx - 0x%llx\n",
> +			   base, base + end);
> +		return -ENOMEM;
> +	}
> +	if (base < phys_offset) {
> +		pr_info("NUMA: Ignoring memory range 0x%llx - 0x%llx\n",
> +			   base, phys_offset);
> +		end -= phys_offset - base;
> +		base = phys_offset;
> +	}
> +
> +	return numa_add_memblk_to(nid, base, base + end, &numa_meminfo);
> +}
> +EXPORT_SYMBOL(numa_add_memblk);

I take it this is only used to look up the node for a given memory
region, rather than describing any region as usable by the kernel?
Otherwise that rounding of the base is worrying.

> +
> +/* Initialize NODE_DATA for a node on the local memory */
> +static void __init setup_node_data(int nid, u64 start, u64 end)
> +{
> +	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
> +	u64 nd_pa;
> +	void *nd;
> +	int tnid;
> +
> +	start = roundup(start, ZONE_ALIGN);
> +
> +	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
> +	       nid, start, end - 1);
> +
> +	/*
> +	 * Allocate node data.  Try node-local memory and then any node.
> +	 */
> +	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);

Why was nd_size rounded to PAGE_SIZE earlier if we only care about
SMP_CACHE_BYTES alignment? I had assumed we wanted naturally-aligned
pages, but that doesn't seem to be the case given the above.

> +	if (!nd_pa) {
> +		nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
> +					      MEMBLOCK_ALLOC_ACCESSIBLE);
> +		if (!nd_pa) {
> +			pr_err("Cannot find %zu bytes in node %d\n",
> +			       nd_size, nid);
> +			return;
> +		}
> +	}
> +	nd = __va(nd_pa);

Isn't memblock_alloc_try_nid sufficient for the above?

> +
> +	/* report and initialize */
> +	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
> +	       nd_pa, nd_pa + nd_size - 1);
> +	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
> +	if (tnid != nid)
> +		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
> +
> +	node_data[nid] = nd;
> +	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
> +	NODE_DATA(nid)->node_id = nid;
> +	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
> +	NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
> +
> +	node_set_online(nid);
> +}
> +
> +/*
> + * Set nodes, which have memory in @mi, in *@nodemask.
> + */
> +static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
> +					      const struct numa_meminfo *mi)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
> +		if (mi->blk[i].start != mi->blk[i].end &&
> +		    mi->blk[i].nid != NUMA_NO_NODE)
> +			node_set(mi->blk[i].nid, *nodemask);
> +}
> +
> +/*
> + * Sanity check to catch more bad NUMA configurations (they are amazingly
> + * common).  Make sure the nodes cover all memory.
> + */

This comment is surprising, given this functionality is brand new.

> +static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
> +{
> +	u64 numaram, totalram;
> +	int i;
> +
> +	numaram = 0;
> +	for (i = 0; i < mi->nr_blks; i++) {
> +		u64 s = mi->blk[i].start >> PAGE_SHIFT;
> +		u64 e = mi->blk[i].end >> PAGE_SHIFT;
> +
> +		numaram += e - s;
> +		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
> +		if ((s64)numaram < 0)
> +			numaram = 0;
> +	}
> +
> +	totalram = max_pfn - absent_pages_in_range(0, max_pfn);
> +
> +	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */

We shouldn't rely on magic like this.

Where and why are we "losing" pages, and why is that considered ok?

> +	if ((s64)(totalram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
> +		pr_err("NUMA: nodes only cover %lluMB of your %lluMB Total RAM. Not used.\n",
> +		       (numaram << PAGE_SHIFT) >> 20,
> +		       (totalram << PAGE_SHIFT) >> 20);
> +		return false;
> +	}
> +	return true;
> +}
> +
> +/**
> + * numa_reset_distance - Reset NUMA distance table
> + *
> + * The current table is freed.
> + * The next numa_set_distance() call will create a new one.
> + */
> +void __init numa_reset_distance(void)
> +{
> +	size_t size = numa_distance_cnt * numa_distance_cnt *
> +		sizeof(numa_distance[0]);
> +
> +	/* numa_distance could be 1LU marking allocation failure, test cnt */
> +	if (numa_distance_cnt)
> +		memblock_free(__pa(numa_distance), size);
> +	numa_distance_cnt = 0;
> +	numa_distance = NULL;	/* enable table creation */
> +}
> +
> +static int __init numa_alloc_distance(void)
> +{
> +	nodemask_t nodes_parsed;
> +	size_t size;
> +	int i, j, cnt = 0;
> +	u64 phys;
> +
> +	/* size the new table and allocate it */
> +	nodes_parsed = numa_nodes_parsed;
> +	numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
> +
> +	for_each_node_mask(i, nodes_parsed)
> +		cnt = i;
> +	cnt++;
> +	size = cnt * cnt * sizeof(numa_distance[0]);
> +
> +	phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
> +				      size, PAGE_SIZE);
> +	if (!phys) {
> +		pr_warn("NUMA: Warning: can't allocate distance table!\n");
> +		/* don't retry until explicitly reset */
> +		numa_distance = (void *)1LU;

This doesn't look good. Why do we need to set this to a non-pointer
value?
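For context, the tri-state idiom being questioned can be sketched standalone (this is an illustration, not kernel code; `ALLOC_FAILED` and `table_usable()` are invented names):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the sentinel-pointer idiom used above: the table pointer
 * doubles as a tri-state flag.  NULL means "not yet allocated, try to
 * allocate"; (void *)1 means "allocation failed, do not retry"; any
 * other value is a real table.
 */
#define ALLOC_FAILED ((void *)1UL)

static int *table;          /* NULL: unallocated; ALLOC_FAILED: gave up */
static int real_table[4];   /* stands in for a successful allocation */

static int table_usable(void)
{
	return table != NULL && (void *)table != ALLOC_FAILED;
}
```

The objection stands regardless: overloading the pointer with a magic integer works, but a separate boolean flag would express the same state without the cast trickery.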

Thanks,
Mark.

> +		return -ENOMEM;
> +	}
> +	memblock_reserve(phys, size);
> +
> +	numa_distance = __va(phys);
> +	numa_distance_cnt = cnt;
> +
> +	/* fill with the default distances */
> +	for (i = 0; i < cnt; i++)
> +		for (j = 0; j < cnt; j++)
> +			numa_distance[i * cnt + j] = i == j ?
> +				LOCAL_DISTANCE : REMOTE_DISTANCE;
> +	pr_debug("NUMA: Initialized distance table, cnt=%d\n", cnt);
> +
> +	return 0;
> +}
> +
> +/**
> + * numa_set_distance - Set NUMA distance from one NUMA to another
> + * @from: the 'from' node to set distance
> + * @to: the 'to'  node to set distance
> + * @distance: NUMA distance
> + *
> + * Set the distance from node @from to @to to @distance.  If distance table
> + * doesn't exist, one which is large enough to accommodate all the currently
> + * known nodes will be created.
> + *
> + * If such table cannot be allocated, a warning is printed and further
> + * calls are ignored until the distance table is reset with
> + * numa_reset_distance().
> + *
> + * If @from or @to is higher than the highest known node or lower than zero
> + * at the time of table creation or @distance doesn't make sense, the call
> + * is ignored.
> + * This is to allow simplification of specific NUMA config implementations.
> + */
> +void __init numa_set_distance(int from, int to, int distance)
> +{
> +	if (!numa_distance && numa_alloc_distance() < 0)
> +		return;
> +
> +	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
> +			from < 0 || to < 0) {
> +		pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
> +			    from, to, distance);
> +		return;
> +	}
> +
> +	if ((u8)distance != distance ||
> +	    (from == to && distance != LOCAL_DISTANCE)) {
> +		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
> +			     from, to, distance);
> +		return;
> +	}
> +
> +	numa_distance[from * numa_distance_cnt + to] = distance;
> +}
> +EXPORT_SYMBOL(numa_set_distance);
> +
> +int __node_distance(int from, int to)
> +{
> +	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
> +		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
> +	return numa_distance[from * numa_distance_cnt + to];
> +}
> +EXPORT_SYMBOL(__node_distance);
> +
> +static int __init numa_register_memblks(struct numa_meminfo *mi)
> +{
> +	unsigned long uninitialized_var(pfn_align);
> +	int i, nid;
> +
> +	/* Account for nodes with cpus and no memory */
> +	node_possible_map = numa_nodes_parsed;
> +	numa_nodemask_from_meminfo(&node_possible_map, mi);
> +	if (WARN_ON(nodes_empty(node_possible_map)))
> +		return -EINVAL;
> +
> +	for (i = 0; i < mi->nr_blks; i++) {
> +		struct numa_memblk *mb = &mi->blk[i];
> +
> +		memblock_set_node(mb->start, mb->end - mb->start,
> +				  &memblock.memory, mb->nid);
> +	}
> +
> +	/*
> +	 * If sections array is gonna be used for pfn -> nid mapping, check
> +	 * whether its granularity is fine enough.
> +	 */
> +#ifdef NODE_NOT_IN_PAGE_FLAGS
> +	pfn_align = node_map_pfn_alignment();
> +	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
> +		pr_warn("Node alignment %lluMB < min %lluMB, rejecting NUMA config\n",
> +		       PFN_PHYS(pfn_align) >> 20,
> +		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
> +		return -EINVAL;
> +	}
> +#endif
> +	if (!numa_meminfo_cover_memory(mi))
> +		return -EINVAL;
> +
> +	/* Finally register nodes. */
> +	for_each_node_mask(nid, node_possible_map) {
> +		u64 start = PFN_PHYS(max_pfn);
> +		u64 end = 0;
> +
> +		for (i = 0; i < mi->nr_blks; i++) {
> +			if (nid != mi->blk[i].nid)
> +				continue;
> +			start = min(mi->blk[i].start, start);
> +			end = max(mi->blk[i].end, end);
> +		}
> +
> +		if (start < end)
> +			setup_node_data(nid, start, end);
> +	}
> +
> +	/* Dump memblock with node info and return. */
> +	memblock_dump_all();
> +	return 0;
> +}
> +
> +static int __init numa_init(int (*init_func)(void))
> +{
> +	int ret;
> +
> +	nodes_clear(numa_nodes_parsed);
> +	nodes_clear(node_possible_map);
> +	nodes_clear(node_online_map);
> +	numa_reset_distance();
> +
> +	ret = init_func();
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = numa_register_memblks(&numa_meminfo);
> +	if (ret < 0)
> +		return ret;
> +
> +	setup_node_to_cpumask_map();
> +
> +	/* init boot processor */
> +	map_cpu_to_node(0, 0);
> +
> +	return 0;
> +}
> +
> +/**
> + * dummy_numa_init - Fallback dummy NUMA init
> + *
> + * Used if there's no underlying NUMA architecture, NUMA initialization
> + * fails, or NUMA is disabled on the command line.
> + *
> + * Must online at least one node and add memory blocks that cover all
> + * allowed memory.  This function must not fail.
> + */
> +static int __init dummy_numa_init(void)
> +{
> +	pr_info("%s\n", "No NUMA configuration found");
> +	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> +	       0LLU, PFN_PHYS(max_pfn) - 1);
> +	node_set(0, numa_nodes_parsed);
> +	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
> +	numa_off = 1;
> +
> +	return 0;
> +}
> +
> +/**
> + * arm64_numa_init - Initialize NUMA
> + *
> + * Try each configured NUMA initialization method until one succeeds.  The
> + * last fallback is dummy single node config encompassing whole memory and
> + * never fails.
> + */
> +void __init arm64_numa_init(void)
> +{
> +	numa_init(dummy_numa_init);
> +}
> -- 
> 1.8.1.4
> 
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 2/4] Documentation, dt, arm64/arm: dt bindings for numa.
       [not found]   ` <1445337931-11344-3-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
@ 2015-10-20 15:35     ` Mark Rutland
  2015-10-21  4:27       ` Ganapatrao Kulkarni
  0 siblings, 1 reply; 19+ messages in thread
From: Mark Rutland @ 2015-10-20 15:35 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will.Deacon-5wv7dgnIgG8,
	catalin.marinas-5wv7dgnIgG8, grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	leif.lindholm-QSEj5FYQhm4dnm+yROfE0A,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A,
	msalter-H+wXaHxf7aLQT0dZR+AlfA, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	steve.capper-QSEj5FYQhm4dnm+yROfE0A,
	hanjun.guo-QSEj5FYQhm4dnm+yROfE0A,
	al.stone-QSEj5FYQhm4dnm+yROfE0A, arnd-r2nGTMty4D4,
	pawel.moll-5wv7dgnIgG8, ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
	galak-sgV2jX0FEOL9JmXXK+q4OQ, rjw-LthD3rsA81gm4RdzfppkhA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, marc.zyngier-5wv7dgnIgG8,
	rrichter-YGCgFSpz5w/QT0dZR+AlfA,
	Prasun.Kapoor-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

On Tue, Oct 20, 2015 at 04:15:29PM +0530, Ganapatrao Kulkarni wrote:
> DT bindings for numa mapping of memory, cores and IOs.
> 
> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
> ---
>  Documentation/devicetree/bindings/arm/numa.txt | 275 +++++++++++++++++++++++++
>  1 file changed, 275 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/numa.txt
> 
> diff --git a/Documentation/devicetree/bindings/arm/numa.txt b/Documentation/devicetree/bindings/arm/numa.txt
> new file mode 100644
> index 0000000..f3bc8e6
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/numa.txt
> @@ -0,0 +1,275 @@
> +==============================================================================
> +NUMA binding description.
> +==============================================================================
> +
> +==============================================================================
> +1 - Introduction
> +==============================================================================
> +
> +Systems employing a Non Uniform Memory Access (NUMA) architecture contain
> +collections of hardware resources including processors, memory, and I/O buses,
> +that comprise what is commonly known as a NUMA node.
> +Processor accesses to memory within the local NUMA node are generally faster
> +than processor accesses to memory outside of the local NUMA node.
> +DT defines interfaces that allow the platform to convey NUMA node
> +topology information to OS.
> +
> +==============================================================================
> +2 - proximity
> +==============================================================================
> +The proximity device node property describes proximity domains within a
> +machine. This property can be used in device nodes like cpu, memory, bus and
> +devices to map to respective numa nodes.
> +
> +proximity property is a 32-bit integer which defines numa node id to which
> +this device node has numa proximity association.
> +
> +Example:
> +	/* numa node 0 */
> +	proximity = <0>;
> +
> +	/* numa node 1 */
> +	proximity = <1>;


It would probably be better to call this something like "numa-domain-id"
or "numa-node-id". The "proximity" is a relationship (that's actually
described in the distance map), and it makes it obvious that this is
NUMA related.
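Under that suggestion, the example would read something like this (hypothetical; the property name is only proposed above, not yet part of the binding):

```dts
	/* numa node 0 */
	numa-node-id = <0>;

	/* numa node 1 */
	numa-node-id = <1>;
```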

> +
> +==============================================================================
> +3 - distance-map
> +==============================================================================
> +
> +The device tree node distance-map describes the relative
> +distance (memory latency) between all numa nodes.

Rather than making this another magic name, we should give it a
compatible string. That will also help if/when updating this in future.

> +
> +- distance-matrix
> +  This property defines a matrix to describe the relative distances
> +  between all numa nodes.
> +  It is represented as a list of node pairs and their relative distance.
> +
> +  Note:
> +	1. If there is no distance-map, the system should setup:
> +
> +		      local/local:  10
> +		      local/remote: 20
> +	for all node distances.

I think that either you have both the IDs and a distance map, or we
assume !NUMA (as we currently do). If your system is so trivial that the
above defaults are good enough, it's trivial to write them explicitly.

So I think this should go.

> +
> +	2. If both directions between 2 nodes have the same distance, only
> +	       one entry is required.

So there's a direction implied by each entry? That should be stated
explicitly.

That said, I'm having some difficulty comprehending an asymmetric
distance, and I worry that it's ill-defined.

What does the direction apply to specifically?

How is it to be interpreted?

Assuming I have two domains A and B, and I have:

	distance-matrix = <A B 1>, <B A 255>;

What does that mean for those domains? What's fast and what is slow?

> +	3. distance-matrix should have entries in ascending order of nodes.

s/ascending/lexicographical ascending/, and s/nodes/domain ids/, just to
be explicit.

> +	4. Device node distance-map must reside in the root node.

Presumably there should be no duplicate entries? We should state that
explicitly.

> +
> +Example:
> +	4 nodes connected in mesh/ring topology as below,
> +
> +		0_______20______1
> +		|		|
> +		|		|
> +	     20	|		|20
> +		|		|
> +		|		|
> +		|_______________|
> +		3	20	2
> +
> +	if the relative distance for each hop is 20,
> +	then the inter-node distances for this topology will be,
> +	      0 -> 1 = 20
> +	      1 -> 2 = 20
> +	      2 -> 3 = 20
> +	      3 -> 0 = 20
> +	      0 -> 2 = 40
> +	      1 -> 3 = 40
> +
> +     and dt presentation for this distance matrix is,
> +
> +		distance-map {
> +			 distance-matrix = <0 0  10>,
> +					   <0 1  20>,
> +					   <0 2  40>,
> +					   <0 3  20>,
> +					   <1 0  20>,
> +					   <1 1  10>,
> +					   <1 2  20>,
> +					   <1 3  40>,
> +					   <2 0  40>,
> +					   <2 1  20>,
> +					   <2 2  10>,
> +					   <2 3  20>,
> +					   <3 0  20>,
> +					   <3 1  40>,
> +					   <3 2  20>,
> +					   <3 3  10>;
> +		};
> +
> +Note:
> +	 1. The entries like <0 0> <1 1>  <2 2> <3 3>
> +	    can be omitted, and the system can use the default value (local distance, i.e. 10).

As mentioned above, I think this should go.

Other than the comments above, this is looking promising!

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 2/4] Documentation, dt, arm64/arm: dt bindings for numa.
  2015-10-20 15:35     ` Mark Rutland
@ 2015-10-21  4:27       ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 19+ messages in thread
From: Ganapatrao Kulkarni @ 2015-10-21  4:27 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ganapatrao Kulkarni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Catalin Marinas,
	Grant Likely, Leif Lindholm, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Ard Biesheuvel, msalter-H+wXaHxf7aLQT0dZR+AlfA, Rob Herring,
	Steve Capper, Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll,
	Ian Campbell, Kumar Gala, Rafael J. Wysocki,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, Marc Zyngier, Robert Richter

On Tue, Oct 20, 2015 at 9:05 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
> On Tue, Oct 20, 2015 at 04:15:29PM +0530, Ganapatrao Kulkarni wrote:
>> DT bindings for numa mapping of memory, cores and IOs.
>>
>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>> ---
>>  Documentation/devicetree/bindings/arm/numa.txt | 275 +++++++++++++++++++++++++
>>  1 file changed, 275 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/arm/numa.txt
>>
>> diff --git a/Documentation/devicetree/bindings/arm/numa.txt b/Documentation/devicetree/bindings/arm/numa.txt
>> new file mode 100644
>> index 0000000..f3bc8e6
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/arm/numa.txt
>> @@ -0,0 +1,275 @@
>> +==============================================================================
>> +NUMA binding description.
>> +==============================================================================
>> +
>> +==============================================================================
>> +1 - Introduction
>> +==============================================================================
>> +
>> +Systems employing a Non Uniform Memory Access (NUMA) architecture contain
>> +collections of hardware resources including processors, memory, and I/O buses,
>> +that comprise what is commonly known as a NUMA node.
>> +Processor accesses to memory within the local NUMA node are generally faster
>> +than processor accesses to memory outside of the local NUMA node.
>> +DT defines interfaces that allow the platform to convey NUMA node
>> +topology information to OS.
>> +
>> +==============================================================================
>> +2 - proximity
>> +==============================================================================
>> +The proximity device node property describes proximity domains within a
>> +machine. This property can be used in device nodes like cpu, memory, bus and
>> +devices to map to respective numa nodes.
>> +
>> +proximity property is a 32-bit integer which defines numa node id to which
>> +this device node has numa proximity association.
>> +
>> +Example:
>> +     /* numa node 0 */
>> +     proximity = <0>;
>> +
>> +     /* numa node 1 */
>> +     proximity = <1>;
>
>
> It would probably be better to call this something like "numa-domain-id"
> or "numa-node-id". The "proximity" is a relationship (that's actually
> described in the distance map), and it makes it obvious that this is
> NUMA related.
ok, numa-node-id seems more appropriate.
>
>> +
>> +==============================================================================
>> +3 - distance-map
>> +==============================================================================
>> +
>> +The device tree node distance-map describes the relative
>> +distance (memory latency) between all numa nodes.
>
> Rather than making this another magic name, we should give it a
> compatible string. That will also help if/when updating this in future.
thanks, we can have a compatible string, which will help with future expansion.
>
>> +
>> +- distance-matrix
>> +  This property defines a matrix to describe the relative distances
>> +  between all numa nodes.
>> +  It is represented as a list of node pairs and their relative distance.
>> +
>> +  Note:
>> +     1. If there is no distance-map, the system should setup:
>> +
>> +                   local/local:  10
>> +                   local/remote: 20
>> +     for all node distances.
>
> I think that either you have both the IDs and a distance map, or we
> assume !NUMA (as we currently do). If your system is so trivial that the
> above defaults are good enough, it's trivial to write them explicitly.
it might be trivial to mention this explicitly; however, a 2-node numa
system is definitely not trivial!!
>
> So I think this should go.
ok.
>
>> +
>> +     2. If both directions between 2 nodes have the same distance, only
>> +            one entry is required.
>
> So there's a direction implied by each entry? That should be stated
> explicitly.
ok
>
> That said, I'm having some difficulty comprehending an asymmetric
> distance, and I worry that it's ill-defined.
>
> What does the direction apply to specifically?
>
> How is it to be interpreted?
>
> Assuming I have two domains A and B, and I have:
>
>         distance-matrix = <A B 1>, <B A 255>;
>
> What does that mean for those domains? What's fast and what is slow?
a lower distance value indicates lower inter-node access latency.
for a cpu in node A accessing memory in node B, the latency
would be 1 (low latency);
for the other direction, it is 255 (high latency).

i am not sure how the system behaves with asymmetric distances; however,
the function sched_init_numa checks whether the distances are symmetric,
and prints a warning if they are not.
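The asymmetry described above can be seen in a toy lookup table (a standalone sketch; node ids and distance values are invented, and the `from * cnt + to` indexing mirrors the `numa_distance` flat array from patch 1):

```c
#include <assert.h>

/*
 * Toy version of the flat distance table, with a deliberately
 * asymmetric pair of entries: A -> B is cheap, B -> A is expensive.
 */
enum { DEMO_CNT = 2, NODE_A = 0, NODE_B = 1 };

static const int demo_distance[DEMO_CNT * DEMO_CNT] = {
	/* A->A */ 10,  /* A->B */ 1,
	/* B->A */ 255, /* B->B */ 10,
};

static int demo_node_distance(int from, int to)
{
	return demo_distance[from * DEMO_CNT + to];
}
```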
>
>> +     3. distance-matrix should have entries in ascending order of nodes.
>
> s/ascending/lexicographical ascending/, and s/nodes/domain ids/, just to
> be explicit.
ok
>
>> +     4. Device node distance-map must reside in the root node.
>
> Presumably there should be no duplicate entries? We should state that
> explicitly.
ok
>
>> +
>> +Example:
>> +     4 nodes connected in mesh/ring topology as below,
>> +
>> +             0_______20______1
>> +             |               |
>> +             |               |
>> +          20 |               |20
>> +             |               |
>> +             |               |
>> +             |_______________|
>> +             3       20      2
>> +
>> +     if the relative distance for each hop is 20,
>> +     then the inter-node distances for this topology will be,
>> +           0 -> 1 = 20
>> +           1 -> 2 = 20
>> +           2 -> 3 = 20
>> +           3 -> 0 = 20
>> +           0 -> 2 = 40
>> +           1 -> 3 = 40
>> +
>> +     and dt presentation for this distance matrix is,
>> +
>> +             distance-map {
>> +                      distance-matrix = <0 0  10>,
>> +                                        <0 1  20>,
>> +                                        <0 2  40>,
>> +                                        <0 3  20>,
>> +                                        <1 0  20>,
>> +                                        <1 1  10>,
>> +                                        <1 2  20>,
>> +                                        <1 3  40>,
>> +                                        <2 0  40>,
>> +                                        <2 1  20>,
>> +                                        <2 2  10>,
>> +                                        <2 3  20>,
>> +                                        <3 0  20>,
>> +                                        <3 1  40>,
>> +                                        <3 2  20>,
>> +                                        <3 3  10>;
>> +             };
>> +
>> +Note:
>> +      1. The entries like <0 0> <1 1>  <2 2> <3 3>
>> +         can be omitted, and the system can use the default value (local distance, i.e. 10).
>
> As mentioned above, I think this should go.
ok
>
> Other than the comments above, this is looking promising!
thanks
>
> Thanks,
> Mark.
thanks
Ganapat

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 1/4] arm64, numa: adding numa support for arm64 platforms.
  2015-10-20 14:47     ` Mark Rutland
@ 2015-10-21  8:54       ` Ganapatrao Kulkarni
       [not found]         ` <CAFpQJXUXw2AP-fJR0eLJFnHQML3RbtbP7iNXT3RFtdc+0jzvKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Ganapatrao Kulkarni @ 2015-10-21  8:54 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ganapatrao Kulkarni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Catalin Marinas,
	Grant Likely, Leif Lindholm, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Ard Biesheuvel, msalter-H+wXaHxf7aLQT0dZR+AlfA, Rob Herring,
	Steve Capper, Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll,
	Ian Campbell, Kumar Gala, Rafael J. Wysocki,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, Marc Zyngier, Robert Richter

On Tue, Oct 20, 2015 at 8:17 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
> Hi,
>
> I'm away for the rest of this week and don't have the time to give this
> a full review, but I've given this a first pass and have some high-level
> comments:
>
> First, most of this is copy+paste from x86. We should try to share code
> rather than duplicating it. Especially as it looks like there's cleanup
> (and therefore divergence) that could happen.
this is the arch-specific glue layer for numa.
there might be some common functions, but there will also be arch-specific
hacks/exceptions.
if we try to pull out common code, we will still have some arch-specific
code or end up with unavoidable ifdefs.
IMHO, this code is better kept in arch (in a single file).
>
> Second, this reimplements memblock to associate nids with memory
> regions. I think we should keep memblock around for this (I'm under the
> impression that Ard also wants that for memory attributes), rather than
> creating a new memblock-like API that we then have to reconcile with
> actual memblock information.
thanks, reusing the memblock code/infrastructure seems like a good idea.
will work on this.
>
> Third, NAK to the changes to /proc/cpuinfo, please drop that from the
> patch. Further comments on that matter are inline below.
ok.
>
> On Tue, Oct 20, 2015 at 04:15:28PM +0530, Ganapatrao Kulkarni wrote:
>> Adding numa support for arm64 based platforms.
>> This patch adds by default the dummy numa node and
>> maps all memory and cpus to node 0.
>> using this patch, numa can be simulated on single node arm64 platforms.
>>
>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>> ---
>>  arch/arm64/Kconfig              |  25 ++
>>  arch/arm64/include/asm/mmzone.h |  17 ++
>>  arch/arm64/include/asm/numa.h   |  63 +++++
>>  arch/arm64/kernel/setup.c       |   9 +
>>  arch/arm64/kernel/smp.c         |   2 +
>>  arch/arm64/mm/Makefile          |   1 +
>>  arch/arm64/mm/init.c            |  31 ++-
>>  arch/arm64/mm/numa.c            | 531 ++++++++++++++++++++++++++++++++++++++++
>>  8 files changed, 675 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/mmzone.h
>>  create mode 100644 arch/arm64/include/asm/numa.h
>>  create mode 100644 arch/arm64/mm/numa.c
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 7d95663..0f9cdc7 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -68,6 +68,7 @@ config ARM64
>>       select HAVE_GENERIC_DMA_COHERENT
>>       select HAVE_HW_BREAKPOINT if PERF_EVENTS
>>       select HAVE_MEMBLOCK
>> +     select HAVE_MEMBLOCK_NODE_MAP if NUMA
>>       select HAVE_PATA_PLATFORM
>>       select HAVE_PERF_EVENTS
>>       select HAVE_PERF_REGS
>> @@ -414,6 +415,30 @@ config HOTPLUG_CPU
>>         Say Y here to experiment with turning CPUs off and on.  CPUs
>>         can be controlled through /sys/devices/system/cpu.
>>
>> +# Common NUMA Features
>> +config NUMA
>> +     bool "Numa Memory Allocation and Scheduler Support"
>> +     depends on SMP
>> +     help
>> +       Enable NUMA (Non Uniform Memory Access) support.
>> +
>> +       The kernel will try to allocate memory used by a CPU on the
>> +       local memory controller of the CPU and add some more
>> +       NUMA awareness to the kernel.
>> +
>> +config NODES_SHIFT
>> +     int "Maximum NUMA Nodes (as a power of 2)"
>> +     range 1 10
>> +     default "2"
>> +     depends on NEED_MULTIPLE_NODES
>> +     help
>> +       Specify the maximum number of NUMA Nodes available on the target
>> +       system.  Increases memory reserved to accommodate various tables.
>
> How much memory do we end up requiring per node?
not much. however, it is proportional to the maximum number of nodes
supported by the system.
>
>> +
>> +config USE_PERCPU_NUMA_NODE_ID
>> +     def_bool y
>> +     depends on NUMA
>> +
>>  source kernel/Kconfig.preempt
>>
>>  config HZ
>> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
>> new file mode 100644
>> index 0000000..6ddd468
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/mmzone.h
>> @@ -0,0 +1,17 @@
>> +#ifndef __ASM_ARM64_MMZONE_H_
>> +#define __ASM_ARM64_MMZONE_H_
>> +
>> +#ifdef CONFIG_NUMA
>> +
>> +#include <linux/mmdebug.h>
>> +#include <linux/types.h>
>> +
>> +#include <asm/smp.h>
>> +#include <asm/numa.h>
>> +
>> +extern struct pglist_data *node_data[];
>> +
>> +#define NODE_DATA(nid)               (node_data[(nid)])
>> +
>> +#endif /* CONFIG_NUMA */
>> +#endif /* __ASM_ARM64_MMZONE_H_ */
>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>> new file mode 100644
>> index 0000000..cadbd24
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/numa.h
>> @@ -0,0 +1,63 @@
>> +#ifndef _ASM_NUMA_H
>> +#define _ASM_NUMA_H
>> +
>> +#include <linux/nodemask.h>
>> +#include <asm/topology.h>
>> +
>> +#ifdef CONFIG_NUMA
>> +
>> +#define NR_NODE_MEMBLKS              (MAX_NUMNODES * 2)
>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
>> +
>> +/* currently, arm64 implements flat NUMA topology */
>> +#define parent_node(node)    (node)
>> +
>> +extern int __node_distance(int from, int to);
>> +#define node_distance(a, b) __node_distance(a, b)
>> +
>> +/* dummy definitions for pci functions */
>> +#define pcibus_to_node(node) 0
>> +#define cpumask_of_pcibus(bus)       0
>> +
>> +struct __node_cpu_hwid {
>> +     int node_id;    /* logical node containing this CPU */
>> +     u64 cpu_hwid;   /* MPIDR for this CPU */
>> +};
>
> We already have the MPIDR ID in the cpu_logical_map. Please don't
> duplicate it here.
>
> As node_cpu_hwid seems to be indexed by logical ID, you can simlpy use
> the same index for the logical map to get the MPIDR ID when necessary.
thanks, this solves what we wanted to achieve with this mapping.
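The point can be sketched with toy tables (all names and values here are invented; on arm64 the real mapping is `cpu_logical_map()`): since the logical-cpu to MPIDR mapping already exists, per-cpu NUMA data only needs the node id, with both tables sharing the same logical-cpu index.

```c
#include <assert.h>

/* Two parallel per-cpu tables indexed by logical cpu number. */
enum { DEMO_NR_CPUS = 4 };

static const unsigned long long demo_logical_map[DEMO_NR_CPUS] = {
	0x000, 0x001, 0x100, 0x101	/* fake MPIDR values */
};
static const int demo_cpu_to_node[DEMO_NR_CPUS] = { 0, 0, 1, 1 };

static unsigned long long demo_cpu_logical_map(int cpu)
{
	return demo_logical_map[cpu];
}
```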
>
>> +
>> +struct numa_memblk {
>> +     u64 start;
>> +     u64 end;
>> +     int nid;
>> +};
>> +
>> +struct numa_meminfo {
>> +     int nr_blks;
>> +     struct numa_memblk blk[NR_NODE_MEMBLKS];
>> +};
>
> I think we should keep the usual memblock around for this. It already
> has some nid support.
ok.
>
>> +
>> +extern struct __node_cpu_hwid node_cpu_hwid[NR_CPUS];
>> +extern nodemask_t numa_nodes_parsed __initdata;
>> +
>> +/* Mappings between node number and cpus on that node. */
>> +extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>> +extern void numa_clear_node(unsigned int cpu);
>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>> +extern const struct cpumask *cpumask_of_node(int node);
>> +#else
>> +/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>> +static inline const struct cpumask *cpumask_of_node(int node)
>> +{
>> +     return node_to_cpumask_map[node];
>> +}
>> +#endif
>> +
>> +void __init arm64_numa_init(void);
>> +int __init numa_add_memblk(int nodeid, u64 start, u64 end);
>> +void __init numa_set_distance(int from, int to, int distance);
>> +void __init numa_reset_distance(void);
>> +void numa_store_cpu_info(unsigned int cpu);
>> +#else        /* CONFIG_NUMA */
>> +static inline void numa_store_cpu_info(unsigned int cpu)             { }
>> +static inline void arm64_numa_init(void)             { }
>> +#endif       /* CONFIG_NUMA */
>> +#endif       /* _ASM_NUMA_H */
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index a2794da..4f3623d 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -54,6 +54,7 @@
>>  #include <asm/elf.h>
>>  #include <asm/cpufeature.h>
>>  #include <asm/cpu_ops.h>
>> +#include <asm/numa.h>
>>  #include <asm/sections.h>
>>  #include <asm/setup.h>
>>  #include <asm/smp_plat.h>
>> @@ -485,6 +486,9 @@ static int __init topology_init(void)
>>  {
>>       int i;
>>
>> +     for_each_online_node(i)
>> +             register_one_node(i);
>> +
>>       for_each_possible_cpu(i) {
>>               struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
>>               cpu->hotpluggable = 1;
>> @@ -557,7 +561,12 @@ static int c_show(struct seq_file *m, void *v)
>>                * online processors, looking for lines beginning with
>>                * "processor".  Give glibc what it expects.
>>                */
>> +#ifdef CONFIG_NUMA
>> +             seq_printf(m, "processor\t: %d", i);
>> +             seq_printf(m, " [nid: %d]\n", cpu_to_node(i));
>> +#else
>>               seq_printf(m, "processor\t: %d\n", i);
>> +#endif
>
> As above, NAK to a /proc/cpuinfo change.
>
> We don't have this on arch/arm and didn't previously have it on arm64,
> so it could easily break existing software (both compat and native).
> Having the format randomly change based on a config option is also not
> great, and there's already been enough pain in this area.
>
> Additionally, other architectures don't have this, so it's clearly not
> necessary.
ok, will remove this.
>
> Surely there's a (portable/consistent) sysfs interface that provides the
> NUMA information userspace requires? If not, we should add one that
> works across architectures.
>
>>
>>               /*
>>                * Dump out the common processor features in a single line.
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index dbdaacd..985ee04 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -45,6 +45,7 @@
>>  #include <asm/cputype.h>
>>  #include <asm/cpu_ops.h>
>>  #include <asm/mmu_context.h>
>> +#include <asm/numa.h>
>>  #include <asm/pgtable.h>
>>  #include <asm/pgalloc.h>
>>  #include <asm/processor.h>
>> @@ -125,6 +126,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>>  static void smp_store_cpu_info(unsigned int cpuid)
>>  {
>>       store_cpu_topology(cpuid);
>> +     numa_store_cpu_info(cpuid);
>>  }
>>
>>  /*
>> diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
>> index 773d37a..bb92d41 100644
>> --- a/arch/arm64/mm/Makefile
>> +++ b/arch/arm64/mm/Makefile
>> @@ -4,3 +4,4 @@ obj-y                         := dma-mapping.o extable.o fault.o init.o \
>>                                  context.o proc.o pageattr.o
>>  obj-$(CONFIG_HUGETLB_PAGE)   += hugetlbpage.o
>>  obj-$(CONFIG_ARM64_PTDUMP)   += dump.o
>> +obj-$(CONFIG_NUMA)           += numa.o
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 697a6d0..81a0316 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -37,6 +37,7 @@
>>
>>  #include <asm/fixmap.h>
>>  #include <asm/memory.h>
>> +#include <asm/numa.h>
>>  #include <asm/sections.h>
>>  #include <asm/setup.h>
>>  #include <asm/sizes.h>
>> @@ -77,6 +78,20 @@ static phys_addr_t max_zone_dma_phys(void)
>>       return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>>  }
>>
>> +#ifdef CONFIG_NUMA
>> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
>> +{
>> +     unsigned long max_zone_pfns[MAX_NR_ZONES];
>> +
>> +     memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
>
> You can make this simpler by initialising the variable when defining it:
>
>         unsigned long max_zone_pfs[MAX_NR_ZONES] = { 0 };
ok
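
(For reference, the suggested initialiser form and the original memset are equivalent; a minimal userspace sketch, with MAX_NR_ZONES as a stand-in constant since the kernel headers aren't available here:)

```c
#include <assert.h>
#include <string.h>

#define MAX_NR_ZONES 4  /* stand-in for the kernel constant */

/* returns nonzero if every element of the array is zero */
static int all_zero(const unsigned long *a, int n)
{
	for (int i = 0; i < n; i++)
		if (a[i])
			return 0;
	return 1;
}

static int check_initialiser(void)
{
	unsigned long max_zone_pfns[MAX_NR_ZONES] = { 0 };  /* suggested form */
	unsigned long cleared[MAX_NR_ZONES];

	memset(cleared, 0, sizeof(cleared));                /* original form */
	return all_zero(max_zone_pfns, MAX_NR_ZONES) &&
	       memcmp(max_zone_pfns, cleared, sizeof(cleared)) == 0;
}
```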
>
>> +     if (IS_ENABLED(CONFIG_ZONE_DMA))
>> +             max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
>> +     max_zone_pfns[ZONE_NORMAL] = max;
>> +
>> +     free_area_init_nodes(max_zone_pfns);
>> +}
>> +
>> +#else
>>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
>>  {
>>       struct memblock_region *reg;
>> @@ -115,6 +130,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
>>
>>       free_area_init_node(0, zone_size, min, zhole_size);
>>  }
>> +#endif /* CONFIG_NUMA */
>>
>>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>  int pfn_valid(unsigned long pfn)
>> @@ -132,10 +148,15 @@ static void arm64_memory_present(void)
>>  static void arm64_memory_present(void)
>>  {
>>       struct memblock_region *reg;
>> +     int nid = 0;
>>
>> -     for_each_memblock(memory, reg)
>> -             memory_present(0, memblock_region_memory_base_pfn(reg),
>> -                            memblock_region_memory_end_pfn(reg));
>> +     for_each_memblock(memory, reg) {
>> +#ifdef CONFIG_NUMA
>> +             nid = reg->nid;
>> +#endif
>> +             memory_present(nid, memblock_region_memory_base_pfn(reg),
>> +                             memblock_region_memory_end_pfn(reg));
>> +     }
>>  }
>>  #endif
>>
>> @@ -192,6 +213,9 @@ void __init bootmem_init(void)
>>
>>       early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);
>>
>> +     max_pfn = max_low_pfn = max;
>> +
>> +     arm64_numa_init();
>>       /*
>>        * Sparsemem tries to allocate bootmem in memory_present(), so must be
>>        * done after the fixed reservations.
>> @@ -202,7 +226,6 @@ void __init bootmem_init(void)
>>       zone_sizes_init(min, max);
>>
>>       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
>> -     max_pfn = max_low_pfn = max;
>>  }
>>
>>  #ifndef CONFIG_SPARSEMEM_VMEMMAP
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> new file mode 100644
>> index 0000000..4dd7436
>> --- /dev/null
>> +++ b/arch/arm64/mm/numa.c
>> @@ -0,0 +1,531 @@
>> +/*
>> + * NUMA support, based on the x86 implementation.
>> + *
>> + * Copyright (C) 2015 Cavium Inc.
>> + * Author: Ganapatrao Kulkarni <gkulkarni-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/mm.h>
>> +#include <linux/string.h>
>> +#include <linux/init.h>
>> +#include <linux/bootmem.h>
>> +#include <linux/memblock.h>
>> +#include <linux/ctype.h>
>> +#include <linux/module.h>
>> +#include <linux/nodemask.h>
>> +#include <linux/sched.h>
>> +#include <linux/topology.h>
>> +#include <linux/mmzone.h>
>
> Nit: please sort these alphabetically.
ok
>
>> +
>> +#include <asm/smp_plat.h>
>> +
>> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>> +EXPORT_SYMBOL(node_data);
>> +nodemask_t numa_nodes_parsed __initdata;
>> +struct __node_cpu_hwid node_cpu_hwid[NR_CPUS];
>> +
>> +static int numa_off;
>> +static int numa_distance_cnt;
>> +static u8 *numa_distance;
>> +static struct numa_meminfo numa_meminfo;
>> +
>> +static __init int numa_parse_early_param(char *opt)
>> +{
>> +     if (!opt)
>> +             return -EINVAL;
>> +     if (!strncmp(opt, "off", 3)) {
>> +             pr_info("%s\n", "NUMA turned off");
>> +             numa_off = 1;
>> +     }
>> +     return 0;
>> +}
>> +early_param("numa", numa_parse_early_param);
>> +
>> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>> +EXPORT_SYMBOL(node_to_cpumask_map);
>> +
>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>> +/*
>> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
>> + */
>> +const struct cpumask *cpumask_of_node(int node)
>> +{
>> +     if (node >= nr_node_ids) {
>> +             pr_warn("cpumask_of_node(%d): node > nr_node_ids(%d)\n",
>> +                     node, nr_node_ids);
>> +             dump_stack();
>> +             return cpu_none_mask;
>> +     }
>
> This can be:
>
>         if (WARN_ON(node >= nr_node_ids))
>                 return cpu_none_mask;
>
ok
>> +     if (node_to_cpumask_map[node] == NULL) {
>> +             pr_warn("cpumask_of_node(%d): no node_to_cpumask_map!\n",
>> +                     node);
>> +             dump_stack();
>> +             return cpu_online_mask;
>> +     }
>
> Likewise:
>
>         if (WARN_ON(!node_to_cpumask_map[node]))
>                 return cpu_online_mask;
ok.
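
(A userspace sketch of the simplified checks; WARN_ON is stubbed out here since the kernel macro both warns and returns the condition, and the mask pointers are stand-ins, not the real cpumask API:)

```c
#include <assert.h>
#include <stddef.h>

/* stand-in for the kernel's WARN_ON(): evaluates and returns cond */
#define WARN_ON(cond) (cond)

#define NR_NODE_IDS 4

static const int *fake_map[NR_NODE_IDS]; /* stand-in for node_to_cpumask_map */
static const int none_mask   = 0;        /* stand-in for cpu_none_mask */
static const int online_mask = 1;        /* stand-in for cpu_online_mask */

static const int *cpumask_of_node(int node)
{
	if (WARN_ON(node >= NR_NODE_IDS))
		return &none_mask;
	if (WARN_ON(!fake_map[node]))
		return &online_mask;
	return fake_map[node];
}
```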
>
>> +     return node_to_cpumask_map[node];
>> +}
>> +EXPORT_SYMBOL(cpumask_of_node);
>> +#endif
>> +
>> +static void map_cpu_to_node(unsigned int cpu, int nid)
>> +{
>> +     set_cpu_numa_node(cpu, nid);
>> +     if (nid >= 0)
>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
>> +}
>> +
>> +static void unmap_cpu_to_node(unsigned int cpu)
>> +{
>> +     int nid = cpu_to_node(cpu);
>> +
>> +     if (nid >= 0)
>> +             cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
>> +     set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> +}
>> +
>> +void numa_clear_node(unsigned int cpu)
>> +{
>> +     unmap_cpu_to_node(cpu);
>> +}
>> +
>> +/*
>> + * Allocate node_to_cpumask_map based on number of available nodes
>> + * Requires node_possible_map to be valid.
>> + *
>> + * Note: cpumask_of_node() is not valid until after this is done.
>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
>> + */
>> +static void __init setup_node_to_cpumask_map(void)
>> +{
>> +     unsigned int cpu;
>> +     int node;
>> +
>> +     /* setup nr_node_ids if not done yet */
>> +     if (nr_node_ids == MAX_NUMNODES)
>> +             setup_nr_node_ids();
>
> Where would this be done otherwise?
In free_area_init_nodes().
>
> If we can initialise this earlier, what happens if we actually had
> MAX_NUMNODES nodes?
The table is initialised only for the actual number of nodes; this is just
to avoid unnecessary table allocation.
>
>> +
>> +     /* allocate the map */
>> +     for (node = 0; node < nr_node_ids; node++)
>> +             alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
>> +
>> +     /* Clear the mapping */
>> +     for (node = 0; node < nr_node_ids; node++)
>> +             cpumask_clear(node_to_cpumask_map[node]);
>
> Why not do these at the same time?
ok, can be done.
>
> Can an allocation fail?
Most unlikely; however, I will add a check.
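
(A sketch of the combined allocate-and-clear loop with a failure check; calloc stands in for the bootmem allocator, which zeroes its memory the same way:)

```c
#include <assert.h>
#include <stdlib.h>

#define NR_NODE_IDS 4

static unsigned long *node_map[NR_NODE_IDS];

/* allocate and clear each per-node mask in one pass, checking for failure */
static int setup_node_map(void)
{
	for (int node = 0; node < NR_NODE_IDS; node++) {
		node_map[node] = calloc(1, sizeof(unsigned long)); /* zeroed */
		if (!node_map[node])
			return -1;
	}
	return 0;
}
```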
>
>> +
>> +     for_each_possible_cpu(cpu)
>> +             set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> +
>> +     /* cpumask_of_node() will now work */
>> +     pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>> +}
>> +
>> +/*
>> + *  Set the cpu to node and mem mapping
>> + */
>> +void numa_store_cpu_info(unsigned int cpu)
>> +{
>> +     map_cpu_to_node(cpu, numa_off ? 0 : node_cpu_hwid[cpu].node_id);
>> +}
>> +
>> +/**
>> + * numa_add_memblk_to - Add one numa_memblk to a numa_meminfo
>> + */
>> +
>> +static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
>> +                                  struct numa_meminfo *mi)
>> +{
>> +     /* ignore zero length blks */
>> +     if (start == end)
>> +             return 0;
>> +
>> +     /* whine about and ignore invalid blks */
>> +     if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
>> +             pr_warn("NUMA: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
>> +                             nid, start, end - 1);
>> +             return 0;
>> +     }
>
> When would this happen?
If the ACPI or DT binding is wrong or corrupt.
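
(The sanity check on firmware-provided memblks can be exercised standalone; MAX_NUMNODES is a stand-in value here:)

```c
#include <assert.h>

#define MAX_NUMNODES 4  /* stand-in for the kernel constant */

/* mirror of the patch's checks on an incoming memblk */
static int memblk_is_valid(int nid, unsigned long long start,
			   unsigned long long end)
{
	if (start == end)
		return 0;   /* zero-length blk: silently ignored */
	if (start > end || nid < 0 || nid >= MAX_NUMNODES)
		return 0;   /* malformed firmware data: whine and ignore */
	return 1;
}
```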
>
>> +
>> +     if (mi->nr_blks >= NR_NODE_MEMBLKS) {
>> +             pr_err("NUMA: too many memblk ranges\n");
>> +             return -EINVAL;
>> +     }
>> +
>> +     pr_info("NUMA: Adding memblock %d [0x%llx - 0x%llx] on node %d\n",
>> +                     mi->nr_blks, start, end, nid);
>> +     mi->blk[mi->nr_blks].start = start;
>> +     mi->blk[mi->nr_blks].end = end;
>> +     mi->blk[mi->nr_blks].nid = nid;
>> +     mi->nr_blks++;
>> +     return 0;
>> +}
>
> As I mentioned earlier, I think that we should keep the memblock
> infrastructure around, and reuse it here.
will try as said in the beginning.
>
>> +
>> +/**
>> + * numa_add_memblk - Add one numa_memblk to numa_meminfo
>> + * @nid: NUMA node ID of the new memblk
>> + * @start: Start address of the new memblk
>> + * @end: End address of the new memblk
>> + *
>> + * Add a new memblk to the default numa_meminfo.
>> + *
>> + * RETURNS:
>> + * 0 on success, -errno on failure.
>> + */
>> +#define MAX_PHYS_ADDR        ((phys_addr_t)~0)
>
> We should probably rethink MAX_MEMBLOCK_ADDR and potentially make it
> more generic so we can use it here and elsewhere. See commit
> 8eafeb4802281651 ("of/fdt: make memblock maximum physical address arch
> configurable").
>
> However, that might not matter if we're able to reuse memblock.
ok
>
>> +
>> +int __init numa_add_memblk(int nid, u64 base, u64 end)
>> +{
>> +     const u64 phys_offset = __pa(PAGE_OFFSET);
>> +
>> +     base &= PAGE_MASK;
>> +     end &= PAGE_MASK;
>> +
>> +     if (base > MAX_PHYS_ADDR) {
>> +             pr_warn("NUMA: Ignoring memory block 0x%llx - 0x%llx\n",
>> +                             base, base + end);
>> +             return -ENOMEM;
>> +     }
>> +
>> +     if (base + end > MAX_PHYS_ADDR) {
>> +             pr_info("NUMA: Ignoring memory range 0x%lx - 0x%llx\n",
>> +                             ULONG_MAX, base + end);
>> +             end = MAX_PHYS_ADDR - base;
>> +     }
>> +
>> +     if (base + end < phys_offset) {
>> +             pr_warn("NUMA: Ignoring memory block 0x%llx - 0x%llx\n",
>> +                        base, base + end);
>> +             return -ENOMEM;
>> +     }
>> +     if (base < phys_offset) {
>> +             pr_info("NUMA: Ignoring memory range 0x%llx - 0x%llx\n",
>> +                        base, phys_offset);
>> +             end -= phys_offset - base;
>> +             base = phys_offset;
>> +     }
>> +
>> +     return numa_add_memblk_to(nid, base, base + end, &numa_meminfo);
>> +}
>> +EXPORT_SYMBOL(numa_add_memblk);
>
> I take it this is only used to look up the node for a given memory
> region, rather than any region described as being usable by the kernel?
> Otherwise that rounding of the base is worrying.
Yes, this is only to look up the node for a memory block.
>
>> +
>> +/* Initialize NODE_DATA for a node on the local memory */
>> +static void __init setup_node_data(int nid, u64 start, u64 end)
>> +{
>> +     const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
>> +     u64 nd_pa;
>> +     void *nd;
>> +     int tnid;
>> +
>> +     start = roundup(start, ZONE_ALIGN);
>> +
>> +     pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> +            nid, start, end - 1);
>> +
>> +     /*
>> +      * Allocate node data.  Try node-local memory and then any node.
>> +      */
>> +     nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
>
> Why was nd_size rounded to PAGE_SIZE earlier if we only care about
it is just aligned to pg_data_t too.
> SMP_CACHE_BYTES alignment? I had assumed we wanted naturally-aligned
> pages, but that doesn't seem to be the case given the above.
what is the concern here?
>
>> +     if (!nd_pa) {
>> +             nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
>> +                                           MEMBLOCK_ALLOC_ACCESSIBLE);
>> +             if (!nd_pa) {
>> +                     pr_err("Cannot find %zu bytes in node %d\n",
>> +                            nd_size, nid);
>> +                     return;
>> +             }
>> +     }
>> +     nd = __va(nd_pa);
>
> Isn't memblock_alloc_try_nid sufficient for the above?
thanks, will replace with memblock_alloc_try_nid.
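
(The try-local-then-any fallback that memblock_alloc_try_nid() provides can be sketched in userspace; the per-node allocator below is a hypothetical stand-in, not the memblock API:)

```c
#include <assert.h>

#define NR_NODES 2

static int node_has_memory[NR_NODES] = { 0, 1 }; /* node 0 exhausted */

/* hypothetical per-node allocator: returns the node it allocated on, or -1 */
static int alloc_on_node(int nid)
{
	return node_has_memory[nid] ? nid : -1;
}

/* try node-local memory first, then any node: the alloc_try_nid idea */
static int alloc_try_nid(int nid)
{
	int got = alloc_on_node(nid);

	if (got >= 0)
		return got;
	for (int n = 0; n < NR_NODES; n++) {
		got = alloc_on_node(n);
		if (got >= 0)
			return got;
	}
	return -1;
}
```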
>
>> +
>> +     /* report and initialize */
>> +     pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
>> +            nd_pa, nd_pa + nd_size - 1);
>> +     tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>> +     if (tnid != nid)
>> +             pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
>> +
>> +     node_data[nid] = nd;
>> +     memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>> +     NODE_DATA(nid)->node_id = nid;
>> +     NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
>> +     NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
>> +
>> +     node_set_online(nid);
>> +}
>> +
>> +/*
>> + * Set nodes, which have memory in @mi, in *@nodemask.
>> + */
>> +static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
>> +                                           const struct numa_meminfo *mi)
>> +{
>> +     int i;
>> +
>> +     for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
>> +             if (mi->blk[i].start != mi->blk[i].end &&
>> +                 mi->blk[i].nid != NUMA_NO_NODE)
>> +                     node_set(mi->blk[i].nid, *nodemask);
>> +}
>> +
>> +/*
>> + * Sanity check to catch more bad NUMA configurations (they are amazingly
>> + * common).  Make sure the nodes cover all memory.
>> + */
>
> This comment is surprising, given this functionality is brand new.
sorry. this is from x86, will remove it.
>
>> +static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
>> +{
>> +     u64 numaram, totalram;
>> +     int i;
>> +
>> +     numaram = 0;
>> +     for (i = 0; i < mi->nr_blks; i++) {
>> +             u64 s = mi->blk[i].start >> PAGE_SHIFT;
>> +             u64 e = mi->blk[i].end >> PAGE_SHIFT;
>> +
>> +             numaram += e - s;
>> +             numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
>> +             if ((s64)numaram < 0)
>> +                     numaram = 0;
>> +     }
>> +
>> +     totalram = max_pfn - absent_pages_in_range(0, max_pfn);
>> +
>> +     /* We seem to lose 3 pages somewhere. Allow 1M of slack. */
>
> We shouldn't rely on magic like this.
>
> Where and why are we "losing" pages, and why is that considered ok?
I need to experiment to know whether that is still applicable.
I will remove it if not relevant for us.
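
(The 1M-of-slack comparison itself is simple arithmetic and can be checked in isolation; PAGE_SHIFT is assumed to be 12 for the sketch:)

```c
#include <assert.h>

#define PAGE_SHIFT 12  /* assumed 4K pages for this sketch */

/* mirror of the patch's test: NUMA covers memory if the number of
 * pages not accounted to any node is under 1M worth of pages */
static int numa_covers_memory(unsigned long long numaram_pages,
			      unsigned long long totalram_pages)
{
	return (long long)(totalram_pages - numaram_pages) <
	       (1LL << (20 - PAGE_SHIFT));
}
```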
>
>> +     if ((s64)(totalram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
>> +             pr_err("NUMA: nodes only cover %lluMB of your %lluMB Total RAM. Not used.\n",
>> +                    (numaram << PAGE_SHIFT) >> 20,
>> +                    (totalram << PAGE_SHIFT) >> 20);
>> +             return false;
>> +     }
>> +     return true;
>> +}
>> +
>> +/**
>> + * numa_reset_distance - Reset NUMA distance table
>> + *
>> + * The current table is freed.
>> + * The next numa_set_distance() call will create a new one.
>> + */
>> +void __init numa_reset_distance(void)
>> +{
>> +     size_t size = numa_distance_cnt * numa_distance_cnt *
>> +             sizeof(numa_distance[0]);
>> +
>> +     /* numa_distance could be 1LU marking allocation failure, test cnt */
>> +     if (numa_distance_cnt)
>> +             memblock_free(__pa(numa_distance), size);
>> +     numa_distance_cnt = 0;
>> +     numa_distance = NULL;   /* enable table creation */
>> +}
>> +
>> +static int __init numa_alloc_distance(void)
>> +{
>> +     nodemask_t nodes_parsed;
>> +     size_t size;
>> +     int i, j, cnt = 0;
>> +     u64 phys;
>> +
>> +     /* size the new table and allocate it */
>> +     nodes_parsed = numa_nodes_parsed;
>> +     numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
>> +
>> +     for_each_node_mask(i, nodes_parsed)
>> +             cnt = i;
>> +     cnt++;
>> +     size = cnt * cnt * sizeof(numa_distance[0]);
>> +
>> +     phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
>> +                                   size, PAGE_SIZE);
>> +     if (!phys) {
>> +             pr_warn("NUMA: Warning: can't allocate distance table!\n");
>> +             /* don't retry until explicitly reset */
>> +             numa_distance = (void *)1LU;
>
> This doesn't look good. Why do we need to set this to a non-pointer
> value?
To avoid further allocation attempts unless the table is reset.
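
(A userspace sketch of both the poison-pointer sentinel and the row-major default fill; sizes and distance values are stand-ins and the poison value is never dereferenced:)

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

static unsigned char table_storage[4 * 4];
static unsigned char *numa_distance;  /* NULL: not yet allocated */
static int numa_distance_cnt;
static int alloc_should_fail;         /* simulate allocation failure */

static int numa_alloc_distance(int cnt)
{
	if (alloc_should_fail) {
		numa_distance = (void *)1;  /* poison: suppress retries */
		return -1;
	}
	numa_distance = table_storage;
	numa_distance_cnt = cnt;
	/* fill with the default distances, row-major */
	for (int i = 0; i < cnt; i++)
		for (int j = 0; j < cnt; j++)
			numa_distance[i * cnt + j] =
				(i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return 0;
}
```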
>
> Thanks,
> Mark.
>
>> +             return -ENOMEM;
>> +     }
>> +     memblock_reserve(phys, size);
>> +
>> +     numa_distance = __va(phys);
>> +     numa_distance_cnt = cnt;
>> +
>> +     /* fill with the default distances */
>> +     for (i = 0; i < cnt; i++)
>> +             for (j = 0; j < cnt; j++)
>> +                     numa_distance[i * cnt + j] = i == j ?
>> +                             LOCAL_DISTANCE : REMOTE_DISTANCE;
>> +     pr_debug("NUMA: Initialized distance table, cnt=%d\n", cnt);
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * numa_set_distance - Set NUMA distance from one NUMA to another
>> + * @from: the 'from' node to set distance
>> + * @to: the 'to'  node to set distance
>> + * @distance: NUMA distance
>> + *
>> + * Set the distance from node @from to @to to @distance.  If distance table
>> + * doesn't exist, one which is large enough to accommodate all the currently
>> + * known nodes will be created.
>> + *
>> + * If such table cannot be allocated, a warning is printed and further
>> + * calls are ignored until the distance table is reset with
>> + * numa_reset_distance().
>> + *
>> + * If @from or @to is higher than the highest known node or lower than zero
>> + * at the time of table creation or @distance doesn't make sense, the call
>> + * is ignored.
>> + * This is to allow simplification of specific NUMA config implementations.
>> + */
>> +void __init numa_set_distance(int from, int to, int distance)
>> +{
>> +     if (!numa_distance && numa_alloc_distance() < 0)
>> +             return;
>> +
>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>> +                     from < 0 || to < 0) {
>> +             pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>> +                         from, to, distance);
>> +             return;
>> +     }
>> +
>> +     if ((u8)distance != distance ||
>> +         (from == to && distance != LOCAL_DISTANCE)) {
>> +             pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>> +                          from, to, distance);
>> +             return;
>> +     }
>> +
>> +     numa_distance[from * numa_distance_cnt + to] = distance;
>> +}
>> +EXPORT_SYMBOL(numa_set_distance);
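
(The `(u8)distance != distance` trick above rejects values that would be truncated when stored in the u8 table; a standalone sketch of that validity test, with LOCAL_DISTANCE as a stand-in constant:)

```c
#include <assert.h>

#define LOCAL_DISTANCE 10  /* stand-in for the kernel constant */

/* mirror of the patch's validity test before storing into the u8 table */
static int distance_is_valid(int from, int to, int distance)
{
	if ((unsigned char)distance != distance)
		return 0;   /* would be truncated on store */
	if (from == to && distance != LOCAL_DISTANCE)
		return 0;   /* self-distance must be LOCAL_DISTANCE */
	return 1;
}
```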
>> +
>> +int __node_distance(int from, int to)
>> +{
>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt)
>> +             return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
>> +     return numa_distance[from * numa_distance_cnt + to];
>> +}
>> +EXPORT_SYMBOL(__node_distance);
>> +
>> +static int __init numa_register_memblks(struct numa_meminfo *mi)
>> +{
>> +     unsigned long uninitialized_var(pfn_align);
>> +     int i, nid;
>> +
>> +     /* Account for nodes with cpus and no memory */
>> +     node_possible_map = numa_nodes_parsed;
>> +     numa_nodemask_from_meminfo(&node_possible_map, mi);
>> +     if (WARN_ON(nodes_empty(node_possible_map)))
>> +             return -EINVAL;
>> +
>> +     for (i = 0; i < mi->nr_blks; i++) {
>> +             struct numa_memblk *mb = &mi->blk[i];
>> +
>> +             memblock_set_node(mb->start, mb->end - mb->start,
>> +                               &memblock.memory, mb->nid);
>> +     }
>> +
>> +     /*
>> +      * If sections array is gonna be used for pfn -> nid mapping, check
>> +      * whether its granularity is fine enough.
>> +      */
>> +#ifdef NODE_NOT_IN_PAGE_FLAGS
>> +     pfn_align = node_map_pfn_alignment();
>> +     if (pfn_align && pfn_align < PAGES_PER_SECTION) {
>> +             pr_warn("Node alignment %lluMB < min %lluMB, rejecting NUMA config\n",
>> +                    PFN_PHYS(pfn_align) >> 20,
>> +                    PFN_PHYS(PAGES_PER_SECTION) >> 20);
>> +             return -EINVAL;
>> +     }
>> +#endif
>> +     if (!numa_meminfo_cover_memory(mi))
>> +             return -EINVAL;
>> +
>> +     /* Finally register nodes. */
>> +     for_each_node_mask(nid, node_possible_map) {
>> +             u64 start = PFN_PHYS(max_pfn);
>> +             u64 end = 0;
>> +
>> +             for (i = 0; i < mi->nr_blks; i++) {
>> +                     if (nid != mi->blk[i].nid)
>> +                             continue;
>> +                     start = min(mi->blk[i].start, start);
>> +                     end = max(mi->blk[i].end, end);
>> +             }
>> +
>> +             if (start < end)
>> +                     setup_node_data(nid, start, end);
>> +     }
>> +
>> +     /* Dump memblock with node info and return. */
>> +     memblock_dump_all();
>> +     return 0;
>> +}
>> +
>> +static int __init numa_init(int (*init_func)(void))
>> +{
>> +     int ret;
>> +
>> +     nodes_clear(numa_nodes_parsed);
>> +     nodes_clear(node_possible_map);
>> +     nodes_clear(node_online_map);
>> +     numa_reset_distance();
>> +
>> +     ret = init_func();
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     ret = numa_register_memblks(&numa_meminfo);
>> +     if (ret < 0)
>> +             return ret;
>> +
>> +     setup_node_to_cpumask_map();
>> +
>> +     /* init boot processor */
>> +     map_cpu_to_node(0, 0);
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * dummy_numa_init - Fallback dummy NUMA init
>> + *
>> + * Used if there's no underlying NUMA architecture, NUMA initialization
>> + * fails, or NUMA is disabled on the command line.
>> + *
>> + * Must online at least one node and add memory blocks that cover all
>> + * allowed memory.  This function must not fail.
>> + */
>> +static int __init dummy_numa_init(void)
>> +{
>> +     pr_info("%s\n", "No NUMA configuration found");
>> +     pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
>> +            0LLU, PFN_PHYS(max_pfn) - 1);
>> +     node_set(0, numa_nodes_parsed);
>> +     numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
>> +     numa_off = 1;
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * arm64_numa_init - Initialize NUMA
>> + *
>> + * Try each configured NUMA initialization method until one succeeds.  The
>> + * last fallback is dummy single node config encompassing whole memory and
>> + * never fails.
>> + */
>> +void __init arm64_numa_init(void)
>> +{
>> +     numa_init(dummy_numa_init);
>> +}
>> --
>> 1.8.1.4
>>
thanks
Ganapat
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 1/4] arm64, numa: adding numa support for arm64 platforms.
       [not found]   ` <1445337931-11344-2-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
  2015-10-20 14:47     ` Mark Rutland
@ 2015-10-23 15:11     ` Matthias Brugger
       [not found]       ` <562A4E21.2060600-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 1 reply; 19+ messages in thread
From: Matthias Brugger @ 2015-10-23 15:11 UTC (permalink / raw)
  To: Ganapatrao Kulkarni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will.Deacon-5wv7dgnIgG8,
	catalin.marinas-5wv7dgnIgG8, grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	leif.lindholm-QSEj5FYQhm4dnm+yROfE0A,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A,
	msalter-H+wXaHxf7aLQT0dZR+AlfA, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	steve.capper-QSEj5FYQhm4dnm+yROfE0A,
	hanjun.guo-QSEj5FYQhm4dnm+yROfE0A,
	al.stone-QSEj5FYQhm4dnm+yROfE0A, arnd-r2nGTMty4D4,
	pawel.moll-5wv7dgnIgG8, mark.rutland-5wv7dgnIgG8,
	ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
	galak-sgV2jX0FEOL9JmXXK+q4OQ, rjw-LthD3rsA81gm4RdzfppkhA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, marc.zyngier-5wv7dgnIgG8,
	rrichter-YGCgFSpz5w/QT0dZR+AlfA,
	Prasun.Kapoor-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8
  Cc: gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w



On 20/10/15 12:45, Ganapatrao Kulkarni wrote:
> Adding numa support for arm64 based platforms.
> This patch adds by default the dummy numa node and
> maps all memory and cpus to node 0.
> Using this patch, numa can be simulated on single node arm64 platforms.
>
> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
> ---
>   arch/arm64/Kconfig              |  25 ++
>   arch/arm64/include/asm/mmzone.h |  17 ++
>   arch/arm64/include/asm/numa.h   |  63 +++++
>   arch/arm64/kernel/setup.c       |   9 +
>   arch/arm64/kernel/smp.c         |   2 +
>   arch/arm64/mm/Makefile          |   1 +
>   arch/arm64/mm/init.c            |  31 ++-
>   arch/arm64/mm/numa.c            | 531 ++++++++++++++++++++++++++++++++++++++++
>   8 files changed, 675 insertions(+), 4 deletions(-)
>   create mode 100644 arch/arm64/include/asm/mmzone.h
>   create mode 100644 arch/arm64/include/asm/numa.h
>   create mode 100644 arch/arm64/mm/numa.c
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 7d95663..0f9cdc7 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -68,6 +68,7 @@ config ARM64
>   	select HAVE_GENERIC_DMA_COHERENT
>   	select HAVE_HW_BREAKPOINT if PERF_EVENTS
>   	select HAVE_MEMBLOCK
> +	select HAVE_MEMBLOCK_NODE_MAP if NUMA
>   	select HAVE_PATA_PLATFORM
>   	select HAVE_PERF_EVENTS
>   	select HAVE_PERF_REGS
> @@ -414,6 +415,30 @@ config HOTPLUG_CPU
>   	  Say Y here to experiment with turning CPUs off and on.  CPUs
>   	  can be controlled through /sys/devices/system/cpu.
>
> +# Common NUMA Features
> +config NUMA
> +	bool "Numa Memory Allocation and Scheduler Support"
> +	depends on SMP
> +	help
> +	  Enable NUMA (Non Uniform Memory Access) support.
> +
> +	  The kernel will try to allocate memory used by a CPU on the
> +	  local memory controller of the CPU and add some more
> +	  NUMA awareness to the kernel.
> +
> +config NODES_SHIFT
> +	int "Maximum NUMA Nodes (as a power of 2)"
> +	range 1 10
> +	default "2"
> +	depends on NEED_MULTIPLE_NODES
> +	help
> +	  Specify the maximum number of NUMA Nodes available on the target
> +	  system.  Increases memory reserved to accommodate various tables.
> +
> +config USE_PERCPU_NUMA_NODE_ID
> +	def_bool y
> +	depends on NUMA
> +
>   source kernel/Kconfig.preempt
>
>   config HZ
> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
> new file mode 100644
> index 0000000..6ddd468
> --- /dev/null
> +++ b/arch/arm64/include/asm/mmzone.h
> @@ -0,0 +1,17 @@
> +#ifndef __ASM_ARM64_MMZONE_H_
> +#define __ASM_ARM64_MMZONE_H_
> +
> +#ifdef CONFIG_NUMA
> +
> +#include <linux/mmdebug.h>
> +#include <linux/types.h>
> +
> +#include <asm/smp.h>
> +#include <asm/numa.h>
> +
> +extern struct pglist_data *node_data[];
> +
> +#define NODE_DATA(nid)		(node_data[(nid)])
> +
> +#endif /* CONFIG_NUMA */
> +#endif /* __ASM_ARM64_MMZONE_H_ */
> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
> new file mode 100644
> index 0000000..cadbd24
> --- /dev/null
> +++ b/arch/arm64/include/asm/numa.h
> @@ -0,0 +1,63 @@
> +#ifndef _ASM_NUMA_H
> +#define _ASM_NUMA_H
> +
> +#include <linux/nodemask.h>
> +#include <asm/topology.h>
> +
> +#ifdef CONFIG_NUMA
> +
> +#define NR_NODE_MEMBLKS		(MAX_NUMNODES * 2)
> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
> +
> +/* currently, arm64 implements flat NUMA topology */
> +#define parent_node(node)	(node)
> +
> +extern int __node_distance(int from, int to);
> +#define node_distance(a, b) __node_distance(a, b)
> +
> +/* dummy definitions for pci functions */
> +#define pcibus_to_node(node)	0
> +#define cpumask_of_pcibus(bus)	0
> +
> +struct __node_cpu_hwid {
> +	int node_id;    /* logical node containing this CPU */
> +	u64 cpu_hwid;   /* MPIDR for this CPU */
> +};
> +
> +struct numa_memblk {
> +	u64 start;
> +	u64 end;
> +	int nid;
> +};
> +
> +struct numa_meminfo {
> +	int nr_blks;
> +	struct numa_memblk blk[NR_NODE_MEMBLKS];
> +};
> +
> +extern struct __node_cpu_hwid node_cpu_hwid[NR_CPUS];
> +extern nodemask_t numa_nodes_parsed __initdata;
> +
> +/* Mappings between node number and cpus on that node. */
> +extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
> +extern void numa_clear_node(unsigned int cpu);
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +extern const struct cpumask *cpumask_of_node(int node);
> +#else
> +/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
> +static inline const struct cpumask *cpumask_of_node(int node)
> +{
> +	return node_to_cpumask_map[node];
> +}
> +#endif
> +
> +void __init arm64_numa_init(void);
> +int __init numa_add_memblk(int nodeid, u64 start, u64 end);
> +void __init numa_set_distance(int from, int to, int distance);
> +void __init numa_reset_distance(void);
> +void numa_store_cpu_info(unsigned int cpu);
> +#else	/* CONFIG_NUMA */
> +static inline void numa_store_cpu_info(unsigned int cpu)		{ }
> +static inline void arm64_numa_init(void)		{ }
> +#endif	/* CONFIG_NUMA */
> +#endif	/* _ASM_NUMA_H */
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index a2794da..4f3623d 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -54,6 +54,7 @@
>   #include <asm/elf.h>
>   #include <asm/cpufeature.h>
>   #include <asm/cpu_ops.h>
> +#include <asm/numa.h>
>   #include <asm/sections.h>
>   #include <asm/setup.h>
>   #include <asm/smp_plat.h>
> @@ -485,6 +486,9 @@ static int __init topology_init(void)
>   {
>   	int i;
>
> +	for_each_online_node(i)
> +		register_one_node(i);
> +
>   	for_each_possible_cpu(i) {
>   		struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
>   		cpu->hotpluggable = 1;
> @@ -557,7 +561,12 @@ static int c_show(struct seq_file *m, void *v)
>   		 * online processors, looking for lines beginning with
>   		 * "processor".  Give glibc what it expects.
>   		 */
> +#ifdef CONFIG_NUMA
> +		seq_printf(m, "processor\t: %d", i);
> +		seq_printf(m, " [nid: %d]\n", cpu_to_node(i));
> +#else
>   		seq_printf(m, "processor\t: %d\n", i);
> +#endif
>
>   		/*
>   		 * Dump out the common processor features in a single line.
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index dbdaacd..985ee04 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -45,6 +45,7 @@
>   #include <asm/cputype.h>
>   #include <asm/cpu_ops.h>
>   #include <asm/mmu_context.h>
> +#include <asm/numa.h>
>   #include <asm/pgtable.h>
>   #include <asm/pgalloc.h>
>   #include <asm/processor.h>
> @@ -125,6 +126,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>   static void smp_store_cpu_info(unsigned int cpuid)
>   {
>   	store_cpu_topology(cpuid);
> +	numa_store_cpu_info(cpuid);
>   }
>
>   /*
> diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
> index 773d37a..bb92d41 100644
> --- a/arch/arm64/mm/Makefile
> +++ b/arch/arm64/mm/Makefile
> @@ -4,3 +4,4 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
>   				   context.o proc.o pageattr.o
>   obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
>   obj-$(CONFIG_ARM64_PTDUMP)	+= dump.o
> +obj-$(CONFIG_NUMA)		+= numa.o
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 697a6d0..81a0316 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -37,6 +37,7 @@
>
>   #include <asm/fixmap.h>
>   #include <asm/memory.h>
> +#include <asm/numa.h>
>   #include <asm/sections.h>
>   #include <asm/setup.h>
>   #include <asm/sizes.h>
> @@ -77,6 +78,20 @@ static phys_addr_t max_zone_dma_phys(void)
>   	return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>   }
>
> +#ifdef CONFIG_NUMA
> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
> +{
> +	unsigned long max_zone_pfns[MAX_NR_ZONES];
> +
> +	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
> +	if (IS_ENABLED(CONFIG_ZONE_DMA))
> +		max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
> +	max_zone_pfns[ZONE_NORMAL] = max;
> +
> +	free_area_init_nodes(max_zone_pfns);
> +}
> +
> +#else
>   static void __init zone_sizes_init(unsigned long min, unsigned long max)
>   {
>   	struct memblock_region *reg;
> @@ -115,6 +130,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
>
>   	free_area_init_node(0, zone_size, min, zhole_size);
>   }
> +#endif /* CONFIG_NUMA */
>
>   #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>   int pfn_valid(unsigned long pfn)
> @@ -132,10 +148,15 @@ static void arm64_memory_present(void)
>   static void arm64_memory_present(void)
>   {
>   	struct memblock_region *reg;
> +	int nid = 0;
>
> -	for_each_memblock(memory, reg)
> -		memory_present(0, memblock_region_memory_base_pfn(reg),
> -			       memblock_region_memory_end_pfn(reg));
> +	for_each_memblock(memory, reg) {
> +#ifdef CONFIG_NUMA
> +		nid = reg->nid;
> +#endif
> +		memory_present(nid, memblock_region_memory_base_pfn(reg),
> +				memblock_region_memory_end_pfn(reg));
> +	}
>   }
>   #endif
>
> @@ -192,6 +213,9 @@ void __init bootmem_init(void)
>
>   	early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);
>
> +	max_pfn = max_low_pfn = max;
> +
> +	arm64_numa_init();
>   	/*
>   	 * Sparsemem tries to allocate bootmem in memory_present(), so must be
>   	 * done after the fixed reservations.
> @@ -202,7 +226,6 @@ void __init bootmem_init(void)
>   	zone_sizes_init(min, max);
>
>   	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
> -	max_pfn = max_low_pfn = max;
>   }
>
>   #ifndef CONFIG_SPARSEMEM_VMEMMAP
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> new file mode 100644
> index 0000000..4dd7436
> --- /dev/null
> +++ b/arch/arm64/mm/numa.c
> @@ -0,0 +1,531 @@
> +/*
> + * NUMA support, based on the x86 implementation.
> + *
> + * Copyright (C) 2015 Cavium Inc.
> + * Author: Ganapatrao Kulkarni <gkulkarni-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/string.h>
> +#include <linux/init.h>
> +#include <linux/bootmem.h>
> +#include <linux/memblock.h>
> +#include <linux/ctype.h>
> +#include <linux/module.h>
> +#include <linux/nodemask.h>
> +#include <linux/sched.h>
> +#include <linux/topology.h>
> +#include <linux/mmzone.h>
> +
> +#include <asm/smp_plat.h>
> +
> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
> +EXPORT_SYMBOL(node_data);
> +nodemask_t numa_nodes_parsed __initdata;
> +struct __node_cpu_hwid node_cpu_hwid[NR_CPUS];
> +
> +static int numa_off;
> +static int numa_distance_cnt;
> +static u8 *numa_distance;
> +static struct numa_meminfo numa_meminfo;
> +
> +static __init int numa_parse_early_param(char *opt)
> +{
> +	if (!opt)
> +		return -EINVAL;
> +	if (!strncmp(opt, "off", 3)) {
> +		pr_info("%s\n", "NUMA turned off");
> +		numa_off = 1;
> +	}
> +	return 0;
> +}
> +early_param("numa", numa_parse_early_param);
> +
> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
> +EXPORT_SYMBOL(node_to_cpumask_map);
> +
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +/*
> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
> + */
> +const struct cpumask *cpumask_of_node(int node)
> +{
> +	if (node >= nr_node_ids) {
> +		pr_warn("cpumask_of_node(%d): node > nr_node_ids(%d)\n",
> +			node, nr_node_ids);
> +		dump_stack();
> +		return cpu_none_mask;
> +	}
> +	if (node_to_cpumask_map[node] == NULL) {
> +		pr_warn("cpumask_of_node(%d): no node_to_cpumask_map!\n",
> +			node);
> +		dump_stack();
> +		return cpu_online_mask;
> +	}
> +	return node_to_cpumask_map[node];
> +}
> +EXPORT_SYMBOL(cpumask_of_node);
> +#endif
> +
> +static void map_cpu_to_node(unsigned int cpu, int nid)
> +{
> +	set_cpu_numa_node(cpu, nid);
> +	if (nid >= 0)
> +		cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
> +}
> +
> +static void unmap_cpu_to_node(unsigned int cpu)
> +{
> +	int nid = cpu_to_node(cpu);
> +
> +	if (nid >= 0)
> +		cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
> +	set_cpu_numa_node(cpu, NUMA_NO_NODE);
> +}
> +
> +void numa_clear_node(unsigned int cpu)
> +{
> +	unmap_cpu_to_node(cpu);
> +}
> +
> +/*
> + * Allocate node_to_cpumask_map based on number of available nodes
> + * Requires node_possible_map to be valid.
> + *
> + * Note: cpumask_of_node() is not valid until after this is done.
> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
> + */
> +static void __init setup_node_to_cpumask_map(void)
> +{
> +	unsigned int cpu;
> +	int node;
> +
> +	/* setup nr_node_ids if not done yet */
> +	if (nr_node_ids == MAX_NUMNODES)
> +		setup_nr_node_ids();
> +
> +	/* allocate the map */
> +	for (node = 0; node < nr_node_ids; node++)
> +		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
> +
> +	/* Clear the mapping */
> +	for (node = 0; node < nr_node_ids; node++)
> +		cpumask_clear(node_to_cpumask_map[node]);
> +
> +	for_each_possible_cpu(cpu)
> +		set_cpu_numa_node(cpu, NUMA_NO_NODE);
> +
> +	/* cpumask_of_node() will now work */
> +	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
> +}
> +
> +/*
> + *  Set the cpu to node and mem mapping
> + */
> +void numa_store_cpu_info(unsigned int cpu)
> +{
> +	map_cpu_to_node(cpu, numa_off ? 0 : node_cpu_hwid[cpu].node_id);
> +}
> +
> +/**
> + * numa_add_memblk_to - Add one numa_memblk to a numa_meminfo
> + */
> +
> +static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
> +				     struct numa_meminfo *mi)
> +{
> +	/* ignore zero length blks */
> +	if (start == end)
> +		return 0;
> +
> +	/* whine about and ignore invalid blks */
> +	if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
> +		pr_warn("NUMA: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
> +				nid, start, end - 1);
> +		return 0;
> +	}
> +
> +	if (mi->nr_blks >= NR_NODE_MEMBLKS) {
> +		pr_err("NUMA: too many memblk ranges\n");
> +		return -EINVAL;
> +	}
> +
> +	pr_info("NUMA: Adding memblock %d [0x%llx - 0x%llx] on node %d\n",
> +			mi->nr_blks, start, end, nid);
> +	mi->blk[mi->nr_blks].start = start;
> +	mi->blk[mi->nr_blks].end = end;
> +	mi->blk[mi->nr_blks].nid = nid;
> +	mi->nr_blks++;
> +	return 0;
> +}
> +
> +/**
> + * numa_add_memblk - Add one numa_memblk to numa_meminfo
> + * @nid: NUMA node ID of the new memblk
> + * @start: Start address of the new memblk
> + * @end: End address of the new memblk
> + *
> + * Add a new memblk to the default numa_meminfo.
> + *
> + * RETURNS:
> + * 0 on success, -errno on failure.
> + */
> +#define MAX_PHYS_ADDR	((phys_addr_t)~0)
> +
> +int __init numa_add_memblk(int nid, u64 base, u64 end)
> +{
> +	const u64 phys_offset = __pa(PAGE_OFFSET);
> +
> +	base &= PAGE_MASK;
> +	end &= PAGE_MASK;
> +
> +	if (base > MAX_PHYS_ADDR) {
> +		pr_warn("NUMA: Ignoring memory block 0x%llx - 0x%llx\n",
> +				base, base + end);
> +		return -ENOMEM;
> +	}
> +
> +	if (base + end > MAX_PHYS_ADDR) {
> +		pr_info("NUMA: Ignoring memory range 0x%lx - 0x%llx\n",
> +				ULONG_MAX, base + end);
> +		end = MAX_PHYS_ADDR - base;
> +	}
> +
> +	if (base + end < phys_offset) {
> +		pr_warn("NUMA: Ignoring memory block 0x%llx - 0x%llx\n",
> +			   base, base + end);
> +		return -ENOMEM;
> +	}
> +	if (base < phys_offset) {
> +		pr_info("NUMA: Ignoring memory range 0x%llx - 0x%llx\n",
> +			   base, phys_offset);
> +		end -= phys_offset - base;
> +		base = phys_offset;
> +	}
> +
> +	return numa_add_memblk_to(nid, base, base + end, &numa_meminfo);
> +}
> +EXPORT_SYMBOL(numa_add_memblk);
> +
> +/* Initialize NODE_DATA for a node on the local memory */
> +static void __init setup_node_data(int nid, u64 start, u64 end)
> +{
> +	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
> +	u64 nd_pa;
> +	void *nd;
> +	int tnid;
> +
> +	start = roundup(start, ZONE_ALIGN);
> +
> +	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
> +	       nid, start, end - 1);
> +
> +	/*
> +	 * Allocate node data.  Try node-local memory and then any node.
> +	 */
> +	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
> +	if (!nd_pa) {
> +		nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
> +					      MEMBLOCK_ALLOC_ACCESSIBLE);
> +		if (!nd_pa) {
> +			pr_err("Cannot find %zu bytes in node %d\n",
> +			       nd_size, nid);
> +			return;
> +		}
> +	}
> +	nd = __va(nd_pa);
> +
> +	/* report and initialize */
> +	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
> +	       nd_pa, nd_pa + nd_size - 1);
> +	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
> +	if (tnid != nid)
> +		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
> +
> +	node_data[nid] = nd;
> +	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
> +	NODE_DATA(nid)->node_id = nid;
> +	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
> +	NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
> +
> +	node_set_online(nid);
> +}
> +
> +/*
> + * Set nodes, which have memory in @mi, in *@nodemask.
> + */
> +static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
> +					      const struct numa_meminfo *mi)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
> +		if (mi->blk[i].start != mi->blk[i].end &&
> +		    mi->blk[i].nid != NUMA_NO_NODE)
> +			node_set(mi->blk[i].nid, *nodemask);
> +}
> +
> +/*
> + * Sanity check to catch more bad NUMA configurations (they are amazingly
> + * common).  Make sure the nodes cover all memory.
> + */
> +static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
> +{
> +	u64 numaram, totalram;
> +	int i;
> +
> +	numaram = 0;
> +	for (i = 0; i < mi->nr_blks; i++) {
> +		u64 s = mi->blk[i].start >> PAGE_SHIFT;
> +		u64 e = mi->blk[i].end >> PAGE_SHIFT;
> +
> +		numaram += e - s;
> +		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
> +		if ((s64)numaram < 0)
> +			numaram = 0;
> +	}
> +
> +	totalram = max_pfn - absent_pages_in_range(0, max_pfn);
> +
> +	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
> +	if ((s64)(totalram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
> +		pr_err("NUMA: nodes only cover %lluMB of your %lluMB Total RAM. Not used.\n",
> +		       (numaram << PAGE_SHIFT) >> 20,
> +		       (totalram << PAGE_SHIFT) >> 20);
> +		return false;
> +	}
> +	return true;
> +}
> +
> +/**
> + * numa_reset_distance - Reset NUMA distance table
> + *
> + * The current table is freed.
> + * The next numa_set_distance() call will create a new one.
> + */
> +void __init numa_reset_distance(void)
> +{
> +	size_t size = numa_distance_cnt * numa_distance_cnt *
> +		sizeof(numa_distance[0]);
> +
> +	/* numa_distance could be 1LU marking allocation failure, test cnt */
> +	if (numa_distance_cnt)
> +		memblock_free(__pa(numa_distance), size);
> +	numa_distance_cnt = 0;
> +	numa_distance = NULL;	/* enable table creation */
> +}
> +
> +static int __init numa_alloc_distance(void)
> +{
> +	nodemask_t nodes_parsed;
> +	size_t size;
> +	int i, j, cnt = 0;
> +	u64 phys;
> +
> +	/* size the new table and allocate it */
> +	nodes_parsed = numa_nodes_parsed;
> +	numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
> +
> +	for_each_node_mask(i, nodes_parsed)
> +		cnt = i;
> +	cnt++;
> +	size = cnt * cnt * sizeof(numa_distance[0]);
> +
> +	phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
> +				      size, PAGE_SIZE);
> +	if (!phys) {
> +		pr_warn("NUMA: Warning: can't allocate distance table!\n");
> +		/* don't retry until explicitly reset */
> +		numa_distance = (void *)1LU;
> +		return -ENOMEM;
> +	}
> +	memblock_reserve(phys, size);
> +
> +	numa_distance = __va(phys);
> +	numa_distance_cnt = cnt;
> +
> +	/* fill with the default distances */
> +	for (i = 0; i < cnt; i++)
> +		for (j = 0; j < cnt; j++)
> +			numa_distance[i * cnt + j] = i == j ?
> +				LOCAL_DISTANCE : REMOTE_DISTANCE;
> +	pr_debug("NUMA: Initialized distance table, cnt=%d\n", cnt);
> +
> +	return 0;
> +}
> +
> +/**
> + * numa_set_distance - Set NUMA distance from one NUMA to another
> + * @from: the 'from' node to set distance
> + * @to: the 'to'  node to set distance
> + * @distance: NUMA distance
> + *
> + * Set the distance from node @from to @to to @distance.  If distance table
> + * doesn't exist, one which is large enough to accommodate all the currently
> + * known nodes will be created.
> + *
> + * If such table cannot be allocated, a warning is printed and further
> + * calls are ignored until the distance table is reset with
> + * numa_reset_distance().
> + *
> + * If @from or @to is higher than the highest known node or lower than zero
> + * at the time of table creation or @distance doesn't make sense, the call
> + * is ignored.
> + * This is to allow simplification of specific NUMA config implementations.
> + */
> +void __init numa_set_distance(int from, int to, int distance)
> +{
> +	if (!numa_distance && numa_alloc_distance() < 0)
> +		return;
> +
> +	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
> +			from < 0 || to < 0) {
> +		pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
> +			    from, to, distance);
> +		return;
> +	}
> +
> +	if ((u8)distance != distance ||
> +	    (from == to && distance != LOCAL_DISTANCE)) {
> +		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
> +			     from, to, distance);
> +		return;
> +	}
> +
> +	numa_distance[from * numa_distance_cnt + to] = distance;
> +}
> +EXPORT_SYMBOL(numa_set_distance);
> +
> +int __node_distance(int from, int to)
> +{
> +	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
> +		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
> +	return numa_distance[from * numa_distance_cnt + to];
> +}
> +EXPORT_SYMBOL(__node_distance);
> +
> +static int __init numa_register_memblks(struct numa_meminfo *mi)
> +{
> +	unsigned long uninitialized_var(pfn_align);
> +	int i, nid;
> +
> +	/* Account for nodes with cpus and no memory */
> +	node_possible_map = numa_nodes_parsed;
> +	numa_nodemask_from_meminfo(&node_possible_map, mi);
> +	if (WARN_ON(nodes_empty(node_possible_map)))

This will taint the kernel on all hardware that does not have a NUMA 
architecture but has NUMA support enabled in its config.

Would it be possible to use pr_warn instead?

Regards,
Matthias

> +		return -EINVAL;
> +
> +	for (i = 0; i < mi->nr_blks; i++) {
> +		struct numa_memblk *mb = &mi->blk[i];
> +
> +		memblock_set_node(mb->start, mb->end - mb->start,
> +				  &memblock.memory, mb->nid);
> +	}
> +
> +	/*
> +	 * If sections array is gonna be used for pfn -> nid mapping, check
> +	 * whether its granularity is fine enough.
> +	 */
> +#ifdef NODE_NOT_IN_PAGE_FLAGS
> +	pfn_align = node_map_pfn_alignment();
> +	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
> +		pr_warn("Node alignment %lluMB < min %lluMB, rejecting NUMA config\n",
> +		       PFN_PHYS(pfn_align) >> 20,
> +		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
> +		return -EINVAL;
> +	}
> +#endif
> +	if (!numa_meminfo_cover_memory(mi))
> +		return -EINVAL;
> +
> +	/* Finally register nodes. */
> +	for_each_node_mask(nid, node_possible_map) {
> +		u64 start = PFN_PHYS(max_pfn);
> +		u64 end = 0;
> +
> +		for (i = 0; i < mi->nr_blks; i++) {
> +			if (nid != mi->blk[i].nid)
> +				continue;
> +			start = min(mi->blk[i].start, start);
> +			end = max(mi->blk[i].end, end);
> +		}
> +
> +		if (start < end)
> +			setup_node_data(nid, start, end);
> +	}
> +
> +	/* Dump memblock with node info and return. */
> +	memblock_dump_all();
> +	return 0;
> +}
> +
> +static int __init numa_init(int (*init_func)(void))
> +{
> +	int ret;
> +
> +	nodes_clear(numa_nodes_parsed);
> +	nodes_clear(node_possible_map);
> +	nodes_clear(node_online_map);
> +	numa_reset_distance();
> +
> +	ret = init_func();
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = numa_register_memblks(&numa_meminfo);
> +	if (ret < 0)
> +		return ret;
> +
> +	setup_node_to_cpumask_map();
> +
> +	/* init boot processor */
> +	map_cpu_to_node(0, 0);
> +
> +	return 0;
> +}
> +
> +/**
> + * dummy_numa_init - Fallback dummy NUMA init
> + *
> + * Used if there's no underlying NUMA architecture, NUMA initialization
> + * fails, or NUMA is disabled on the command line.
> + *
> + * Must online at least one node and add memory blocks that cover all
> + * allowed memory.  This function must not fail.
> + */
> +static int __init dummy_numa_init(void)
> +{
> +	pr_info("%s\n", "No NUMA configuration found");
> +	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> +	       0LLU, PFN_PHYS(max_pfn) - 1);
> +	node_set(0, numa_nodes_parsed);
> +	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
> +	numa_off = 1;
> +
> +	return 0;
> +}
> +
> +/**
> + * arm64_numa_init - Initialize NUMA
> + *
> + * Try each configured NUMA initialization method until one succeeds.  The
> + * last fallback is a dummy single node config encompassing whole memory and
> + * never fails.
> + */
> +void __init arm64_numa_init(void)
> +{
> +	numa_init(dummy_numa_init);
> +}
>

* Re: [PATCH v6 1/4] arm64, numa: adding numa support for arm64 platforms.
       [not found]       ` <562A4E21.2060600-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-10-24 10:07         ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 19+ messages in thread
From: Ganapatrao Kulkarni @ 2015-10-24 10:07 UTC (permalink / raw)
  To: Matthias Brugger
  Cc: Ganapatrao Kulkarni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Catalin Marinas,
	Grant Likely, Leif Lindholm, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Ard Biesheuvel, msalter-H+wXaHxf7aLQT0dZR+AlfA, Rob Herring,
	Steve Capper, Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll,
	Mark Rutland, Ian Campbell, Kumar Gala, Rafael J. Wysocki,
	Len Brown, Marc Zyngier

Hi Matthias,

On Fri, Oct 23, 2015 at 8:41 PM, Matthias Brugger
<matthias.bgg-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>
> On 20/10/15 12:45, Ganapatrao Kulkarni wrote:
>>
>> Adding numa support for arm64 based platforms.
>> This patch adds by default the dummy numa node and
>> maps all memory and cpus to node 0.
>> using this patch, numa can be simulated on single node arm64 platforms.
>>
>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>> ---
>>   arch/arm64/Kconfig              |  25 ++
>>   arch/arm64/include/asm/mmzone.h |  17 ++
>>   arch/arm64/include/asm/numa.h   |  63 +++++
>>   arch/arm64/kernel/setup.c       |   9 +
>>   arch/arm64/kernel/smp.c         |   2 +
>>   arch/arm64/mm/Makefile          |   1 +
>>   arch/arm64/mm/init.c            |  31 ++-
>>   arch/arm64/mm/numa.c            | 531
>> ++++++++++++++++++++++++++++++++++++++++
>>   8 files changed, 675 insertions(+), 4 deletions(-)
>>   create mode 100644 arch/arm64/include/asm/mmzone.h
>>   create mode 100644 arch/arm64/include/asm/numa.h
>>   create mode 100644 arch/arm64/mm/numa.c
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 7d95663..0f9cdc7 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -68,6 +68,7 @@ config ARM64
>>         select HAVE_GENERIC_DMA_COHERENT
>>         select HAVE_HW_BREAKPOINT if PERF_EVENTS
>>         select HAVE_MEMBLOCK
>> +       select HAVE_MEMBLOCK_NODE_MAP if NUMA
>>         select HAVE_PATA_PLATFORM
>>         select HAVE_PERF_EVENTS
>>         select HAVE_PERF_REGS
>> @@ -414,6 +415,30 @@ config HOTPLUG_CPU
>>           Say Y here to experiment with turning CPUs off and on.  CPUs
>>           can be controlled through /sys/devices/system/cpu.
>>
>> +# Common NUMA Features
>> +config NUMA
>> +       bool "Numa Memory Allocation and Scheduler Support"
>> +       depends on SMP
>> +       help
>> +         Enable NUMA (Non Uniform Memory Access) support.
>> +
>> +         The kernel will try to allocate memory used by a CPU on the
>> +         local memory controller of the CPU and add some more
>> +         NUMA awareness to the kernel.
>> +
>> +config NODES_SHIFT
>> +       int "Maximum NUMA Nodes (as a power of 2)"
>> +       range 1 10
>> +       default "2"
>> +       depends on NEED_MULTIPLE_NODES
>> +       help
>> +         Specify the maximum number of NUMA Nodes available on the target
>> +         system.  Increases memory reserved to accommodate various
>> tables.
>> +
>> +config USE_PERCPU_NUMA_NODE_ID
>> +       def_bool y
>> +       depends on NUMA
>> +
>>   source kernel/Kconfig.preempt
>>
>>   config HZ
>> diff --git a/arch/arm64/include/asm/mmzone.h
>> b/arch/arm64/include/asm/mmzone.h
>> new file mode 100644
>> index 0000000..6ddd468
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/mmzone.h
>> @@ -0,0 +1,17 @@
>> +#ifndef __ASM_ARM64_MMZONE_H_
>> +#define __ASM_ARM64_MMZONE_H_
>> +
>> +#ifdef CONFIG_NUMA
>> +
>> +#include <linux/mmdebug.h>
>> +#include <linux/types.h>
>> +
>> +#include <asm/smp.h>
>> +#include <asm/numa.h>
>> +
>> +extern struct pglist_data *node_data[];
>> +
>> +#define NODE_DATA(nid)         (node_data[(nid)])
>> +
>> +#endif /* CONFIG_NUMA */
>> +#endif /* __ASM_ARM64_MMZONE_H_ */
>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>> new file mode 100644
>> index 0000000..cadbd24
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/numa.h
>> @@ -0,0 +1,63 @@
>> +#ifndef _ASM_NUMA_H
>> +#define _ASM_NUMA_H
>> +
>> +#include <linux/nodemask.h>
>> +#include <asm/topology.h>
>> +
>> +#ifdef CONFIG_NUMA
>> +
>> +#define NR_NODE_MEMBLKS                (MAX_NUMNODES * 2)
>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
>> +
>> +/* currently, arm64 implements flat NUMA topology */
>> +#define parent_node(node)      (node)
>> +
>> +extern int __node_distance(int from, int to);
>> +#define node_distance(a, b) __node_distance(a, b)
>> +
>> +/* dummy definitions for pci functions */
>> +#define pcibus_to_node(node)   0
>> +#define cpumask_of_pcibus(bus) 0
>> +
>> +struct __node_cpu_hwid {
>> +       int node_id;    /* logical node containing this CPU */
>> +       u64 cpu_hwid;   /* MPIDR for this CPU */
>> +};
>> +
>> +struct numa_memblk {
>> +       u64 start;
>> +       u64 end;
>> +       int nid;
>> +};
>> +
>> +struct numa_meminfo {
>> +       int nr_blks;
>> +       struct numa_memblk blk[NR_NODE_MEMBLKS];
>> +};
>> +
>> +extern struct __node_cpu_hwid node_cpu_hwid[NR_CPUS];
>> +extern nodemask_t numa_nodes_parsed __initdata;
>> +
>> +/* Mappings between node number and cpus on that node. */
>> +extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>> +extern void numa_clear_node(unsigned int cpu);
>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>> +extern const struct cpumask *cpumask_of_node(int node);
>> +#else
>> +/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>> +static inline const struct cpumask *cpumask_of_node(int node)
>> +{
>> +       return node_to_cpumask_map[node];
>> +}
>> +#endif
>> +
>> +void __init arm64_numa_init(void);
>> +int __init numa_add_memblk(int nodeid, u64 start, u64 end);
>> +void __init numa_set_distance(int from, int to, int distance);
>> +void __init numa_reset_distance(void);
>> +void numa_store_cpu_info(unsigned int cpu);
>> +#else  /* CONFIG_NUMA */
>> +static inline void numa_store_cpu_info(unsigned int cpu)               {
>> }
>> +static inline void arm64_numa_init(void)               { }
>> +#endif /* CONFIG_NUMA */
>> +#endif /* _ASM_NUMA_H */
>> [...]
>> +/*
>> + * Allocate node_to_cpumask_map based on number of available nodes
>> + * Requires node_possible_map to be valid.
>> + *
>> + * Note: cpumask_of_node() is not valid until after this is done.
>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
>> + */
>> +static void __init setup_node_to_cpumask_map(void)
>> +{
>> +       unsigned int cpu;
>> +       int node;
>> +
>> +       /* setup nr_node_ids if not done yet */
>> +       if (nr_node_ids == MAX_NUMNODES)
>> +               setup_nr_node_ids();
>> +
>> +       /* allocate the map */
>> +       for (node = 0; node < nr_node_ids; node++)
>> +               alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
>> +
>> +       /* Clear the mapping */
>> +       for (node = 0; node < nr_node_ids; node++)
>> +               cpumask_clear(node_to_cpumask_map[node]);
>> +
>> +       for_each_possible_cpu(cpu)
>> +               set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> +
>> +       /* cpumask_of_node() will now work */
>> +       pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>> +}
>> +
>> +/*
>> + *  Set the cpu to node and mem mapping
>> + */
>> +void numa_store_cpu_info(unsigned int cpu)
>> +{
>> +       map_cpu_to_node(cpu, numa_off ? 0 : node_cpu_hwid[cpu].node_id);
>> +}
>> +
>> +/**
>> + * numa_add_memblk_to - Add one numa_memblk to a numa_meminfo
>> + */
>> +
>> +static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
>> +                                    struct numa_meminfo *mi)
>> +{
>> +       /* ignore zero length blks */
>> +       if (start == end)
>> +               return 0;
>> +
>> +       /* whine about and ignore invalid blks */
>> +       if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
>> +               pr_warn("NUMA: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
>> +                               nid, start, end - 1);
>> +               return 0;
>> +       }
>> +
>> +       if (mi->nr_blks >= NR_NODE_MEMBLKS) {
>> +               pr_err("NUMA: too many memblk ranges\n");
>> +               return -EINVAL;
>> +       }
>> +
>> +       pr_info("NUMA: Adding memblock %d [0x%llx - 0x%llx] on node %d\n",
>> +                       mi->nr_blks, start, end, nid);
>> +       mi->blk[mi->nr_blks].start = start;
>> +       mi->blk[mi->nr_blks].end = end;
>> +       mi->blk[mi->nr_blks].nid = nid;
>> +       mi->nr_blks++;
>> +       return 0;
>> +}
>> +
>> +/**
>> + * numa_add_memblk - Add one numa_memblk to numa_meminfo
>> + * @nid: NUMA node ID of the new memblk
>> + * @start: Start address of the new memblk
>> + * @end: End address of the new memblk
>> + *
>> + * Add a new memblk to the default numa_meminfo.
>> + *
>> + * RETURNS:
>> + * 0 on success, -errno on failure.
>> + */
>> +#define MAX_PHYS_ADDR  ((phys_addr_t)~0)
>> +
>> +int __init numa_add_memblk(int nid, u64 base, u64 end)
>> +{
>> +       const u64 phys_offset = __pa(PAGE_OFFSET);
>> +
>> +       base &= PAGE_MASK;
>> +       end &= PAGE_MASK;
>> +
>> +       if (base > MAX_PHYS_ADDR) {
>> +               pr_warn("NUMA: Ignoring memory block 0x%llx - 0x%llx\n",
>> +                               base, base + end);
>> +               return -ENOMEM;
>> +       }
>> +
>> +       if (base + end > MAX_PHYS_ADDR) {
>> +               pr_info("NUMA: Ignoring memory range 0x%lx - 0x%llx\n",
>> +                               ULONG_MAX, base + end);
>> +               end = MAX_PHYS_ADDR - base;
>> +       }
>> +
>> +       if (base + end < phys_offset) {
>> +               pr_warn("NUMA: Ignoring memory block 0x%llx - 0x%llx\n",
>> +                          base, base + end);
>> +               return -ENOMEM;
>> +       }
>> +       if (base < phys_offset) {
>> +               pr_info("NUMA: Ignoring memory range 0x%llx - 0x%llx\n",
>> +                          base, phys_offset);
>> +               end -= phys_offset - base;
>> +               base = phys_offset;
>> +       }
>> +
>> +       return numa_add_memblk_to(nid, base, base + end, &numa_meminfo);
>> +}
>> +EXPORT_SYMBOL(numa_add_memblk);
>> +
>> +/* Initialize NODE_DATA for a node on the local memory */
>> +static void __init setup_node_data(int nid, u64 start, u64 end)
>> +{
>> +       const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
>> +       u64 nd_pa;
>> +       void *nd;
>> +       int tnid;
>> +
>> +       start = roundup(start, ZONE_ALIGN);
>> +
>> +       pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> +              nid, start, end - 1);
>> +
>> +       /*
>> +        * Allocate node data.  Try node-local memory and then any node.
>> +        */
>> +       nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
>> +       if (!nd_pa) {
>> +               nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
>> +                                             MEMBLOCK_ALLOC_ACCESSIBLE);
>> +               if (!nd_pa) {
>> +                       pr_err("Cannot find %zu bytes in node %d\n",
>> +                              nd_size, nid);
>> +                       return;
>> +               }
>> +       }
>> +       nd = __va(nd_pa);
>> +
>> +       /* report and initialize */
>> +       pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
>> +              nd_pa, nd_pa + nd_size - 1);
>> +       tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>> +       if (tnid != nid)
>> +               pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
>> +
>> +       node_data[nid] = nd;
>> +       memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>> +       NODE_DATA(nid)->node_id = nid;
>> +       NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
>> +       NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
>> +
>> +       node_set_online(nid);
>> +}
>> +
>> +/*
>> + * Set nodes, which have memory in @mi, in *@nodemask.
>> + */
>> +static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
>> +                                             const struct numa_meminfo *mi)
>> +{
>> +       int i;
>> +
>> +       for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
>> +               if (mi->blk[i].start != mi->blk[i].end &&
>> +                   mi->blk[i].nid != NUMA_NO_NODE)
>> +                       node_set(mi->blk[i].nid, *nodemask);
>> +}
>> +
>> +/*
>> + * Sanity check to catch more bad NUMA configurations (they are amazingly
>> + * common).  Make sure the nodes cover all memory.
>> + */
>> +static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
>> +{
>> +       u64 numaram, totalram;
>> +       int i;
>> +
>> +       numaram = 0;
>> +       for (i = 0; i < mi->nr_blks; i++) {
>> +               u64 s = mi->blk[i].start >> PAGE_SHIFT;
>> +               u64 e = mi->blk[i].end >> PAGE_SHIFT;
>> +
>> +               numaram += e - s;
>> +               numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
>> +               if ((s64)numaram < 0)
>> +                       numaram = 0;
>> +       }
>> +
>> +       totalram = max_pfn - absent_pages_in_range(0, max_pfn);
>> +
>> +       /* We seem to lose 3 pages somewhere. Allow 1M of slack. */
>> +       if ((s64)(totalram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
>> +               pr_err("NUMA: nodes only cover %lluMB of your %lluMB Total RAM. Not used.\n",
>> +                      (numaram << PAGE_SHIFT) >> 20,
>> +                      (totalram << PAGE_SHIFT) >> 20);
>> +               return false;
>> +       }
>> +       return true;
>> +}
>> +
>> +/**
>> + * numa_reset_distance - Reset NUMA distance table
>> + *
>> + * The current table is freed.
>> + * The next numa_set_distance() call will create a new one.
>> + */
>> +void __init numa_reset_distance(void)
>> +{
>> +       size_t size = numa_distance_cnt * numa_distance_cnt *
>> +               sizeof(numa_distance[0]);
>> +
>> +       /* numa_distance could be 1LU marking allocation failure, test cnt */
>> +       if (numa_distance_cnt)
>> +               memblock_free(__pa(numa_distance), size);
>> +       numa_distance_cnt = 0;
>> +       numa_distance = NULL;   /* enable table creation */
>> +}
>> +
>> +static int __init numa_alloc_distance(void)
>> +{
>> +       nodemask_t nodes_parsed;
>> +       size_t size;
>> +       int i, j, cnt = 0;
>> +       u64 phys;
>> +
>> +       /* size the new table and allocate it */
>> +       nodes_parsed = numa_nodes_parsed;
>> +       numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
>> +
>> +       for_each_node_mask(i, nodes_parsed)
>> +               cnt = i;
>> +       cnt++;
>> +       size = cnt * cnt * sizeof(numa_distance[0]);
>> +
>> +       phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
>> +                                     size, PAGE_SIZE);
>> +       if (!phys) {
>> +               pr_warn("NUMA: Warning: can't allocate distance table!\n");
>> +               /* don't retry until explicitly reset */
>> +               numa_distance = (void *)1LU;
>> +               return -ENOMEM;
>> +       }
>> +       memblock_reserve(phys, size);
>> +
>> +       numa_distance = __va(phys);
>> +       numa_distance_cnt = cnt;
>> +
>> +       /* fill with the default distances */
>> +       for (i = 0; i < cnt; i++)
>> +               for (j = 0; j < cnt; j++)
>> +                       numa_distance[i * cnt + j] = i == j ?
>> +                               LOCAL_DISTANCE : REMOTE_DISTANCE;
>> +       pr_debug("NUMA: Initialized distance table, cnt=%d\n", cnt);
>> +
>> +       return 0;
>> +}
>> +
>> +/**
>> + * numa_set_distance - Set NUMA distance from one NUMA to another
>> + * @from: the 'from' node to set distance
>> + * @to: the 'to'  node to set distance
>> + * @distance: NUMA distance
>> + *
>> + * Set the distance from node @from to @to to @distance.  If distance table
>> + * doesn't exist, one which is large enough to accommodate all the currently
>> + * known nodes will be created.
>> + *
>> + * If such table cannot be allocated, a warning is printed and further
>> + * calls are ignored until the distance table is reset with
>> + * numa_reset_distance().
>> + *
>> + * If @from or @to is higher than the highest known node or lower than zero
>> + * at the time of table creation or @distance doesn't make sense, the call
>> + * is ignored.
>> + * This is to allow simplification of specific NUMA config implementations.
>> + */
>> +void __init numa_set_distance(int from, int to, int distance)
>> +{
>> +       if (!numa_distance && numa_alloc_distance() < 0)
>> +               return;
>> +
>> +       if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>> +                       from < 0 || to < 0) {
>> +               pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>> +                           from, to, distance);
>> +               return;
>> +       }
>> +
>> +       if ((u8)distance != distance ||
>> +           (from == to && distance != LOCAL_DISTANCE)) {
>> +               pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>> +                            from, to, distance);
>> +               return;
>> +       }
>> +
>> +       numa_distance[from * numa_distance_cnt + to] = distance;
>> +}
>> +EXPORT_SYMBOL(numa_set_distance);
>> +
>> +int __node_distance(int from, int to)
>> +{
>> +       if (from >= numa_distance_cnt || to >= numa_distance_cnt)
>> +               return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
>> +       return numa_distance[from * numa_distance_cnt + to];
>> +}
>> +EXPORT_SYMBOL(__node_distance);
>> +
>> +static int __init numa_register_memblks(struct numa_meminfo *mi)
>> +{
>> +       unsigned long uninitialized_var(pfn_align);
>> +       int i, nid;
>> +
>> +       /* Account for nodes with cpus and no memory */
>> +       node_possible_map = numa_nodes_parsed;
>> +       numa_nodemask_from_meminfo(&node_possible_map, mi);
>> +       if (WARN_ON(nodes_empty(node_possible_map)))
>
>
> This will taint the kernel on all hardware which does not have a NUMA
> architecture but has the NUMA support turned on in its config.
>
> Would it be possible to use pr_warn instead?
thanks for the review.
this warning will not hit on non-NUMA systems; for a non-NUMA system it will
create a single NUMA node (node0).
this warning will hit only if the proximity property is corrupt or has an
invalid value.
>
> Regrads,
> Matthias
>
>
>> +               return -EINVAL;
>> +
>> +       for (i = 0; i < mi->nr_blks; i++) {
>> +               struct numa_memblk *mb = &mi->blk[i];
>> +
>> +               memblock_set_node(mb->start, mb->end - mb->start,
>> +                                 &memblock.memory, mb->nid);
>> +       }
>> +
>> +       /*
>> +        * If sections array is gonna be used for pfn -> nid mapping, check
>> +        * whether its granularity is fine enough.
>> +        */
>> +#ifdef NODE_NOT_IN_PAGE_FLAGS
>> +       pfn_align = node_map_pfn_alignment();
>> +       if (pfn_align && pfn_align < PAGES_PER_SECTION) {
>> +               pr_warn("Node alignment %lluMB < min %lluMB, rejecting NUMA config\n",
>> +                      PFN_PHYS(pfn_align) >> 20,
>> +                      PFN_PHYS(PAGES_PER_SECTION) >> 20);
>> +               return -EINVAL;
>> +       }
>> +#endif
>> +       if (!numa_meminfo_cover_memory(mi))
>> +               return -EINVAL;
>> +
>> +       /* Finally register nodes. */
>> +       for_each_node_mask(nid, node_possible_map) {
>> +               u64 start = PFN_PHYS(max_pfn);
>> +               u64 end = 0;
>> +
>> +               for (i = 0; i < mi->nr_blks; i++) {
>> +                       if (nid != mi->blk[i].nid)
>> +                               continue;
>> +                       start = min(mi->blk[i].start, start);
>> +                       end = max(mi->blk[i].end, end);
>> +               }
>> +
>> +               if (start < end)
>> +                       setup_node_data(nid, start, end);
>> +       }
>> +
>> +       /* Dump memblock with node info and return. */
>> +       memblock_dump_all();
>> +       return 0;
>> +}
>> +
>> +static int __init numa_init(int (*init_func)(void))
>> +{
>> +       int ret;
>> +
>> +       nodes_clear(numa_nodes_parsed);
>> +       nodes_clear(node_possible_map);
>> +       nodes_clear(node_online_map);
>> +       numa_reset_distance();
>> +
>> +       ret = init_func();
>> +       if (ret < 0)
>> +               return ret;
>> +
>> +       ret = numa_register_memblks(&numa_meminfo);
>> +       if (ret < 0)
>> +               return ret;
>> +
>> +       setup_node_to_cpumask_map();
>> +
>> +       /* init boot processor */
>> +       map_cpu_to_node(0, 0);
>> +
>> +       return 0;
>> +}
>> +
>> +/**
>> + * dummy_numa_init - Fallback dummy NUMA init
>> + *
>> + * Used if there's no underlying NUMA architecture, NUMA initialization
>> + * fails, or NUMA is disabled on the command line.
>> + *
>> + * Must online at least one node and add memory blocks that cover all
>> + * allowed memory.  This function must not fail.
>> + */
>> +static int __init dummy_numa_init(void)
>> +{
>> +       pr_info("%s\n", "No NUMA configuration found");
>> +       pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
>> +              0LLU, PFN_PHYS(max_pfn) - 1);
>> +       node_set(0, numa_nodes_parsed);
>> +       numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
>> +       numa_off = 1;
>> +
>> +       return 0;
>> +}
>> +
>> +/**
>> + * arm64_numa_init - Initialize NUMA
>> + *
>> + * Try each configured NUMA initialization method until one succeeds. The
>> + * last fallback is a dummy single-node config encompassing the whole memory
>> + * and never fails.
>> + */
>> +void __init arm64_numa_init(void)
>> +{
>> +       numa_init(dummy_numa_init);
>> +}
>>
>
thanks
Ganapat
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms
       [not found]   ` <1445337931-11344-4-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
@ 2015-10-26  1:44     ` Ming Lei
       [not found]       ` <CACVXFVMiZs_g3qw99-PhxLiX5oLe8_z5cOAfJZ_4-09jN2koCQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2015-10-26  1:44 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: linux-arm-kernel, devicetree-u79uwXL29TY76Z2rM5mHXA, Will Deacon,
	Catalin Marinas, Grant Likely, Leif Lindholm,
	rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A, Mark Salter, Rob Herring,
	Steve Capper, Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll,
	Mark Rutland, ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
	galak-sgV2jX0FEOL9JmXXK+q4OQ, Rafael J. Wysocki, Len Brown,
	marc.zyngier-5wv7dgnIgG8, rrichter-YGCgFSpz5w/QT0dZR+AlfA,
	Prasun Kapoor, gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

On Tue, Oct 20, 2015 at 6:45 PM, Ganapatrao Kulkarni
<gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org> wrote:
> Adding numa dt binding support for arm64 based platforms.
> dt node parsing for numa topology is done using device property
> proximity and device node distance-map.
>
> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
> ---
>  arch/arm64/Kconfig            |  10 ++
>  arch/arm64/include/asm/numa.h |  10 ++
>  arch/arm64/kernel/Makefile    |   1 +
>  arch/arm64/kernel/of_numa.c   | 221 ++++++++++++++++++++++++++++++++++++++++++
>  arch/arm64/kernel/smp.c       |   1 +
>  arch/arm64/mm/numa.c          |  10 +-
>  6 files changed, 252 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/kernel/of_numa.c
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0f9cdc7..6cf8d20 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -426,6 +426,16 @@ config NUMA
>           local memory controller of the CPU and add some more
>           NUMA awareness to the kernel.
>
> +config OF_NUMA
> +       bool "Device Tree NUMA support"
> +       depends on NUMA
> +       depends on OF
> +       default y
> +       help
> +         Enable Device Tree NUMA support.
> +         This enables the numa mapping of cpu, memory, io and
> +         inter node distances using dt bindings.

Enabling the above config option can cause numa_init() warning in
numa-less arm64 system, please see the following report:

          https://bugs.launchpad.net/bugs/1509221

Thanks,


* Re: [PATCH v6 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms
       [not found]       ` <CACVXFVMiZs_g3qw99-PhxLiX5oLe8_z5cOAfJZ_4-09jN2koCQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-10-26  3:50         ` Ganapatrao Kulkarni
       [not found]           ` <CAFpQJXX8VXX+gTBn-+7MepSAOzbOuc-yWJe4f=B7MH_Nj7FsKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Ganapatrao Kulkarni @ 2015-10-26  3:50 UTC (permalink / raw)
  To: Ming Lei
  Cc: Ganapatrao Kulkarni, linux-arm-kernel,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Catalin Marinas,
	Grant Likely, Leif Lindholm, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Ard Biesheuvel, Mark Salter, Rob Herring, Steve Capper,
	Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll, Mark Rutland,
	Ian Campbell, Kumar Gala, Rafael J. Wysocki, Len Brown,
	marc.zyngier-5wv7dgnIgG8, Robert

On Mon, Oct 26, 2015 at 7:14 AM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> On Tue, Oct 20, 2015 at 6:45 PM, Ganapatrao Kulkarni
> <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org> wrote:
>> Adding numa dt binding support for arm64 based platforms.
>> dt node parsing for numa topology is done using device property
>> proximity and device node distance-map.
>>
>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>> ---
>>  arch/arm64/Kconfig            |  10 ++
>>  arch/arm64/include/asm/numa.h |  10 ++
>>  arch/arm64/kernel/Makefile    |   1 +
>>  arch/arm64/kernel/of_numa.c   | 221 ++++++++++++++++++++++++++++++++++++++++++
>>  arch/arm64/kernel/smp.c       |   1 +
>>  arch/arm64/mm/numa.c          |  10 +-
>>  6 files changed, 252 insertions(+), 1 deletion(-)
>>  create mode 100644 arch/arm64/kernel/of_numa.c
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 0f9cdc7..6cf8d20 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -426,6 +426,16 @@ config NUMA
>>           local memory controller of the CPU and add some more
>>           NUMA awareness to the kernel.
>>
>> +config OF_NUMA
>> +       bool "Device Tree NUMA support"
>> +       depends on NUMA
>> +       depends on OF
>> +       default y
>> +       help
>> +         Enable Device Tree NUMA support.
>> +         This enables the numa mapping of cpu, memory, io and
>> +         inter node distances using dt bindings.
>
> Enabling the above config option can cause numa_init() warning in
> numa-less arm64 system, please see the following report:
this is taken care of. which version are you using?

>
>           https://bugs.launchpad.net/bugs/1509221
>
> Thanks,

Thanks
Ganapat


* Re: [PATCH v6 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms
       [not found]           ` <CAFpQJXX8VXX+gTBn-+7MepSAOzbOuc-yWJe4f=B7MH_Nj7FsKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-10-26  4:53             ` Ming Lei
       [not found]               ` <CACVXFVPoZqjFeqK7=M7wK0JQ0iS9tYoN8BHsYMc25OL-4j=e1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2015-10-26  4:53 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Pawel Moll, Al Stone,
	Prasun Kapoor, Mark Salter, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Len Brown, devicetree-u79uwXL29TY76Z2rM5mHXA, Steve Capper,
	Arnd Bergmann, Ian Campbell, marc.zyngier-5wv7dgnIgG8,
	Leif Lindholm, Robert Richter, Grant Likely, Rob Herring,
	linux-arm-kernel, Ard Biesheuvel, Rafael J. Wysocki, Hanjun Guo,
	Kumar

On Mon, Oct 26, 2015 at 11:50 AM, Ganapatrao Kulkarni
<gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Mon, Oct 26, 2015 at 7:14 AM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>> On Tue, Oct 20, 2015 at 6:45 PM, Ganapatrao Kulkarni
>> <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org> wrote:
>>> Adding numa dt binding support for arm64 based platforms.
>>> dt node parsing for numa topology is done using device property
>>> proximity and device node distance-map.
>>>
>>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>>> ---
>>>  arch/arm64/Kconfig            |  10 ++
>>>  arch/arm64/include/asm/numa.h |  10 ++
>>>  arch/arm64/kernel/Makefile    |   1 +
>>>  arch/arm64/kernel/of_numa.c   | 221 ++++++++++++++++++++++++++++++++++++++++++
>>>  arch/arm64/kernel/smp.c       |   1 +
>>>  arch/arm64/mm/numa.c          |  10 +-
>>>  6 files changed, 252 insertions(+), 1 deletion(-)
>>>  create mode 100644 arch/arm64/kernel/of_numa.c
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index 0f9cdc7..6cf8d20 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -426,6 +426,16 @@ config NUMA
>>>           local memory controller of the CPU and add some more
>>>           NUMA awareness to the kernel.
>>>
>>> +config OF_NUMA
>>> +       bool "Device Tree NUMA support"
>>> +       depends on NUMA
>>> +       depends on OF
>>> +       default y
>>> +       help
>>> +         Enable Device Tree NUMA support.
>>> +         This enables the numa mapping of cpu, memory, io and
>>> +         inter node distances using dt bindings.
>>
>> Enabling the above config option can cause numa_init() warning in
>> numa-less arm64 system, please see the following report:
> this is taken care of. which version are you using?

V5.

>
>>
>>           https://bugs.launchpad.net/bugs/1509221
>>
>> Thanks,
>
> Thanks
> Ganapat
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH v6 1/4] arm64, numa: adding numa support for arm64 platforms.
       [not found]         ` <CAFpQJXUXw2AP-fJR0eLJFnHQML3RbtbP7iNXT3RFtdc+0jzvKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-10-26  5:45           ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 19+ messages in thread
From: Ganapatrao Kulkarni @ 2015-10-26  5:45 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ganapatrao Kulkarni,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Catalin Marinas,
	Grant Likely, Leif Lindholm, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Ard Biesheuvel, msalter-H+wXaHxf7aLQT0dZR+AlfA, Rob Herring,
	Steve Capper, Hanjun Guo, Al Stone, Arnd Bergmann, Pawel Moll,
	Ian Campbell, Kumar Gala, Rafael J. Wysocki, Len Brown,
	Marc Zyngier, Robert Richter

On Wed, Oct 21, 2015 at 2:24 PM, Ganapatrao Kulkarni
<gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Oct 20, 2015 at 8:17 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
>> Hi,
>>
>> I'm away for the rest of this week and don't have the time to give this
>> a full review, but I've given this a first pass and have some high-level
>> comments:
>>
>> First, most of this is copy+paste from x86. We should try to share code
>> rather than duplicating it. Especially as it looks like there's cleanup
>> (and therefore divergence) that could happen.
> this is the arch-specific glue layer for NUMA.
> there might be some common functions, but there will also be arch-specific
> hacks/exceptions.
> if we try to pull out the common code, we will still have some arch-specific
> code or end up with unavoidable ifdefs.
> IMHO, this code is better kept in arch (in a single file).
>>
>> Second, this reimplements memblock to associate nids with memory
>> regions. I think we should keep memblock around for this (I'm under the
>> impression that Ard also wants that for memory attributes), rather than
>> creating a new memblock-like API that we then have to reconcile with
>> actual memblock information.
> thanks, this seems to be a good idea to reuse the memblock code/infrastructure.
> will work on this.
this I have added and it will be part of the v7 patchset.
please review the other changes.
>>
>> Third, NAK to the changes to /proc/cpuinfo, please drop that from the
>> patch. Further comments on that matter are inline below.
> ok.
>>
>> On Tue, Oct 20, 2015 at 04:15:28PM +0530, Ganapatrao Kulkarni wrote:
>>> Adding numa support for arm64 based platforms.
>>> This patch adds by default the dummy numa node and
>>> maps all memory and cpus to node 0.
>>> using this patch, numa can be simulated on single node arm64 platforms.
>>>
>>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>>> ---
>>>  arch/arm64/Kconfig              |  25 ++
>>>  arch/arm64/include/asm/mmzone.h |  17 ++
>>>  arch/arm64/include/asm/numa.h   |  63 +++++
>>>  arch/arm64/kernel/setup.c       |   9 +
>>>  arch/arm64/kernel/smp.c         |   2 +
>>>  arch/arm64/mm/Makefile          |   1 +
>>>  arch/arm64/mm/init.c            |  31 ++-
>>>  arch/arm64/mm/numa.c            | 531 ++++++++++++++++++++++++++++++++++++++++
>>>  8 files changed, 675 insertions(+), 4 deletions(-)
>>>  create mode 100644 arch/arm64/include/asm/mmzone.h
>>>  create mode 100644 arch/arm64/include/asm/numa.h
>>>  create mode 100644 arch/arm64/mm/numa.c
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index 7d95663..0f9cdc7 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -68,6 +68,7 @@ config ARM64
>>>       select HAVE_GENERIC_DMA_COHERENT
>>>       select HAVE_HW_BREAKPOINT if PERF_EVENTS
>>>       select HAVE_MEMBLOCK
>>> +     select HAVE_MEMBLOCK_NODE_MAP if NUMA
>>>       select HAVE_PATA_PLATFORM
>>>       select HAVE_PERF_EVENTS
>>>       select HAVE_PERF_REGS
>>> @@ -414,6 +415,30 @@ config HOTPLUG_CPU
>>>         Say Y here to experiment with turning CPUs off and on.  CPUs
>>>         can be controlled through /sys/devices/system/cpu.
>>>
>>> +# Common NUMA Features
>>> +config NUMA
>>> +     bool "Numa Memory Allocation and Scheduler Support"
>>> +     depends on SMP
>>> +     help
>>> +       Enable NUMA (Non Uniform Memory Access) support.
>>> +
>>> +       The kernel will try to allocate memory used by a CPU on the
>>> +       local memory controller of the CPU and add some more
>>> +       NUMA awareness to the kernel.
>>> +
>>> +config NODES_SHIFT
>>> +     int "Maximum NUMA Nodes (as a power of 2)"
>>> +     range 1 10
>>> +     default "2"
>>> +     depends on NEED_MULTIPLE_NODES
>>> +     help
>>> +       Specify the maximum number of NUMA Nodes available on the target
>>> +       system.  Increases memory reserved to accommodate various tables.
>>
>> How much memory do we end up requiring per node?
> Not much; however, it is proportional to the maximum number of nodes
> supported by the system.
>>
>>> +
>>> +config USE_PERCPU_NUMA_NODE_ID
>>> +     def_bool y
>>> +     depends on NUMA
>>> +
>>>  source kernel/Kconfig.preempt
>>>
>>>  config HZ
>>> diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
>>> new file mode 100644
>>> index 0000000..6ddd468
>>> --- /dev/null
>>> +++ b/arch/arm64/include/asm/mmzone.h
>>> @@ -0,0 +1,17 @@
>>> +#ifndef __ASM_ARM64_MMZONE_H_
>>> +#define __ASM_ARM64_MMZONE_H_
>>> +
>>> +#ifdef CONFIG_NUMA
>>> +
>>> +#include <linux/mmdebug.h>
>>> +#include <linux/types.h>
>>> +
>>> +#include <asm/smp.h>
>>> +#include <asm/numa.h>
>>> +
>>> +extern struct pglist_data *node_data[];
>>> +
>>> +#define NODE_DATA(nid)               (node_data[(nid)])
>>> +
>>> +#endif /* CONFIG_NUMA */
>>> +#endif /* __ASM_ARM64_MMZONE_H_ */
>>> diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
>>> new file mode 100644
>>> index 0000000..cadbd24
>>> --- /dev/null
>>> +++ b/arch/arm64/include/asm/numa.h
>>> @@ -0,0 +1,63 @@
>>> +#ifndef _ASM_NUMA_H
>>> +#define _ASM_NUMA_H
>>> +
>>> +#include <linux/nodemask.h>
>>> +#include <asm/topology.h>
>>> +
>>> +#ifdef CONFIG_NUMA
>>> +
>>> +#define NR_NODE_MEMBLKS              (MAX_NUMNODES * 2)
>>> +#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
>>> +
>>> +/* currently, arm64 implements flat NUMA topology */
>>> +#define parent_node(node)    (node)
>>> +
>>> +extern int __node_distance(int from, int to);
>>> +#define node_distance(a, b) __node_distance(a, b)
>>> +
>>> +/* dummy definitions for pci functions */
>>> +#define pcibus_to_node(node) 0
>>> +#define cpumask_of_pcibus(bus)       0
>>> +
>>> +struct __node_cpu_hwid {
>>> +     int node_id;    /* logical node containing this CPU */
>>> +     u64 cpu_hwid;   /* MPIDR for this CPU */
>>> +};
>>
>> We already have the MPIDR ID in the cpu_logical_map. Please don't
>> duplicate it here.
>>
>> As node_cpu_hwid seems to be indexed by logical ID, you can simply use
>> the same index for the logical map to get the MPIDR ID when necessary.
> thanks, this solves what we wanted to achieve with this mapping.
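
To illustrate Mark's suggestion, here is a minimal userspace sketch (the array contents and helper names are hypothetical; the kernel accesses the real map through cpu_logical_map()). Because both tables are indexed by the logical CPU ID, storing only the node ID per CPU is enough, and the MPIDR can be recovered from the existing map:

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4

/* mock of the kernel's cpu_logical_map: logical cpu id -> MPIDR */
static uint64_t cpu_logical_map[NR_CPUS] = { 0x000, 0x001, 0x100, 0x101 };

/*
 * Instead of struct __node_cpu_hwid { node_id; cpu_hwid; },
 * only the per-cpu node id needs to be stored.
 */
static int cpu_to_node_map[NR_CPUS] = { 0, 0, 1, 1 };

/* the same logical index recovers the MPIDR without duplicating it */
static uint64_t cpu_hwid(unsigned int cpu)
{
	return cpu_logical_map[cpu];
}

static int cpu_node(unsigned int cpu)
{
	return cpu_to_node_map[cpu];
}
```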
>>
>>> +
>>> +struct numa_memblk {
>>> +     u64 start;
>>> +     u64 end;
>>> +     int nid;
>>> +};
>>> +
>>> +struct numa_meminfo {
>>> +     int nr_blks;
>>> +     struct numa_memblk blk[NR_NODE_MEMBLKS];
>>> +};
>>
>> I think we should keep the usual memblock around for this. It already
>> has some nid support.
> ok.
>>
>>> +
>>> +extern struct __node_cpu_hwid node_cpu_hwid[NR_CPUS];
>>> +extern nodemask_t numa_nodes_parsed __initdata;
>>> +
>>> +/* Mappings between node number and cpus on that node. */
>>> +extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>>> +extern void numa_clear_node(unsigned int cpu);
>>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>>> +extern const struct cpumask *cpumask_of_node(int node);
>>> +#else
>>> +/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>>> +static inline const struct cpumask *cpumask_of_node(int node)
>>> +{
>>> +     return node_to_cpumask_map[node];
>>> +}
>>> +#endif
>>> +
>>> +void __init arm64_numa_init(void);
>>> +int __init numa_add_memblk(int nodeid, u64 start, u64 end);
>>> +void __init numa_set_distance(int from, int to, int distance);
>>> +void __init numa_reset_distance(void);
>>> +void numa_store_cpu_info(unsigned int cpu);
>>> +#else        /* CONFIG_NUMA */
>>> +static inline void numa_store_cpu_info(unsigned int cpu)             { }
>>> +static inline void arm64_numa_init(void)             { }
>>> +#endif       /* CONFIG_NUMA */
>>> +#endif       /* _ASM_NUMA_H */
>>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>> index a2794da..4f3623d 100644
>>> --- a/arch/arm64/kernel/setup.c
>>> +++ b/arch/arm64/kernel/setup.c
>>> @@ -54,6 +54,7 @@
>>>  #include <asm/elf.h>
>>>  #include <asm/cpufeature.h>
>>>  #include <asm/cpu_ops.h>
>>> +#include <asm/numa.h>
>>>  #include <asm/sections.h>
>>>  #include <asm/setup.h>
>>>  #include <asm/smp_plat.h>
>>> @@ -485,6 +486,9 @@ static int __init topology_init(void)
>>>  {
>>>       int i;
>>>
>>> +     for_each_online_node(i)
>>> +             register_one_node(i);
>>> +
>>>       for_each_possible_cpu(i) {
>>>               struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
>>>               cpu->hotpluggable = 1;
>>> @@ -557,7 +561,12 @@ static int c_show(struct seq_file *m, void *v)
>>>                * online processors, looking for lines beginning with
>>>                * "processor".  Give glibc what it expects.
>>>                */
>>> +#ifdef CONFIG_NUMA
>>> +             seq_printf(m, "processor\t: %d", i);
>>> +             seq_printf(m, " [nid: %d]\n", cpu_to_node(i));
>>> +#else
>>>               seq_printf(m, "processor\t: %d\n", i);
>>> +#endif
>>
>> As above, NAK to a /proc/cpuinfo change.
>>
>> We don't have this on arch/arm and didn't previously have it on arm64,
>> so it could easily break existing software (both compat and native).
>> Having the format randomly change based on a config option is also not
>> great, and there's already been enough pain in this area.
>>
>> Additionally, other architectures don't have this, so it's clearly not
>> necessary.
> ok, will remove this.
>>
>> Surely there's a (portable/consistent) sysfs interface that provides the
>> NUMA information userspace requires? If not, we should add one that
>> works across architectures.
>>
>>>
>>>               /*
>>>                * Dump out the common processor features in a single line.
>>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>>> index dbdaacd..985ee04 100644
>>> --- a/arch/arm64/kernel/smp.c
>>> +++ b/arch/arm64/kernel/smp.c
>>> @@ -45,6 +45,7 @@
>>>  #include <asm/cputype.h>
>>>  #include <asm/cpu_ops.h>
>>>  #include <asm/mmu_context.h>
>>> +#include <asm/numa.h>
>>>  #include <asm/pgtable.h>
>>>  #include <asm/pgalloc.h>
>>>  #include <asm/processor.h>
>>> @@ -125,6 +126,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>>>  static void smp_store_cpu_info(unsigned int cpuid)
>>>  {
>>>       store_cpu_topology(cpuid);
>>> +     numa_store_cpu_info(cpuid);
>>>  }
>>>
>>>  /*
>>> diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
>>> index 773d37a..bb92d41 100644
>>> --- a/arch/arm64/mm/Makefile
>>> +++ b/arch/arm64/mm/Makefile
>>> @@ -4,3 +4,4 @@ obj-y                         := dma-mapping.o extable.o fault.o init.o \
>>>                                  context.o proc.o pageattr.o
>>>  obj-$(CONFIG_HUGETLB_PAGE)   += hugetlbpage.o
>>>  obj-$(CONFIG_ARM64_PTDUMP)   += dump.o
>>> +obj-$(CONFIG_NUMA)           += numa.o
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 697a6d0..81a0316 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -37,6 +37,7 @@
>>>
>>>  #include <asm/fixmap.h>
>>>  #include <asm/memory.h>
>>> +#include <asm/numa.h>
>>>  #include <asm/sections.h>
>>>  #include <asm/setup.h>
>>>  #include <asm/sizes.h>
>>> @@ -77,6 +78,20 @@ static phys_addr_t max_zone_dma_phys(void)
>>>       return min(offset + (1ULL << 32), memblock_end_of_DRAM());
>>>  }
>>>
>>> +#ifdef CONFIG_NUMA
>>> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
>>> +{
>>> +     unsigned long max_zone_pfns[MAX_NR_ZONES];
>>> +
>>> +     memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
>>
>> You can make this simpler by initialising the variable when defining it:
>>
>>         unsigned long max_zone_pfns[MAX_NR_ZONES] = { 0 };
> ok
>>
>>> +     if (IS_ENABLED(CONFIG_ZONE_DMA))
>>> +             max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
>>> +     max_zone_pfns[ZONE_NORMAL] = max;
>>> +
>>> +     free_area_init_nodes(max_zone_pfns);
>>> +}
>>> +
>>> +#else
>>>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
>>>  {
>>>       struct memblock_region *reg;
>>> @@ -115,6 +130,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
>>>
>>>       free_area_init_node(0, zone_size, min, zhole_size);
>>>  }
>>> +#endif /* CONFIG_NUMA */
>>>
>>>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>>  int pfn_valid(unsigned long pfn)
>>> @@ -132,10 +148,15 @@ static void arm64_memory_present(void)
>>>  static void arm64_memory_present(void)
>>>  {
>>>       struct memblock_region *reg;
>>> +     int nid = 0;
>>>
>>> -     for_each_memblock(memory, reg)
>>> -             memory_present(0, memblock_region_memory_base_pfn(reg),
>>> -                            memblock_region_memory_end_pfn(reg));
>>> +     for_each_memblock(memory, reg) {
>>> +#ifdef CONFIG_NUMA
>>> +             nid = reg->nid;
>>> +#endif
>>> +             memory_present(nid, memblock_region_memory_base_pfn(reg),
>>> +                             memblock_region_memory_end_pfn(reg));
>>> +     }
>>>  }
>>>  #endif
>>>
>>> @@ -192,6 +213,9 @@ void __init bootmem_init(void)
>>>
>>>       early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);
>>>
>>> +     max_pfn = max_low_pfn = max;
>>> +
>>> +     arm64_numa_init();
>>>       /*
>>>        * Sparsemem tries to allocate bootmem in memory_present(), so must be
>>>        * done after the fixed reservations.
>>> @@ -202,7 +226,6 @@ void __init bootmem_init(void)
>>>       zone_sizes_init(min, max);
>>>
>>>       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
>>> -     max_pfn = max_low_pfn = max;
>>>  }
>>>
>>>  #ifndef CONFIG_SPARSEMEM_VMEMMAP
>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>> new file mode 100644
>>> index 0000000..4dd7436
>>> --- /dev/null
>>> +++ b/arch/arm64/mm/numa.c
>>> @@ -0,0 +1,531 @@
>>> +/*
>>> + * NUMA support, based on the x86 implementation.
>>> + *
>>> + * Copyright (C) 2015 Cavium Inc.
>>> + * Author: Ganapatrao Kulkarni <gkulkarni-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2 as
>>> + * published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#include <linux/kernel.h>
>>> +#include <linux/mm.h>
>>> +#include <linux/string.h>
>>> +#include <linux/init.h>
>>> +#include <linux/bootmem.h>
>>> +#include <linux/memblock.h>
>>> +#include <linux/ctype.h>
>>> +#include <linux/module.h>
>>> +#include <linux/nodemask.h>
>>> +#include <linux/sched.h>
>>> +#include <linux/topology.h>
>>> +#include <linux/mmzone.h>
>>
>> Nit: please sort these alphabetically.
> ok
>>
>>> +
>>> +#include <asm/smp_plat.h>
>>> +
>>> +struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>>> +EXPORT_SYMBOL(node_data);
>>> +nodemask_t numa_nodes_parsed __initdata;
>>> +struct __node_cpu_hwid node_cpu_hwid[NR_CPUS];
>>> +
>>> +static int numa_off;
>>> +static int numa_distance_cnt;
>>> +static u8 *numa_distance;
>>> +static struct numa_meminfo numa_meminfo;
>>> +
>>> +static __init int numa_parse_early_param(char *opt)
>>> +{
>>> +     if (!opt)
>>> +             return -EINVAL;
>>> +     if (!strncmp(opt, "off", 3)) {
>>> +             pr_info("%s\n", "NUMA turned off");
>>> +             numa_off = 1;
>>> +     }
>>> +     return 0;
>>> +}
>>> +early_param("numa", numa_parse_early_param);
>>> +
>>> +cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
>>> +EXPORT_SYMBOL(node_to_cpumask_map);
>>> +
>>> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
>>> +/*
>>> + * Returns a pointer to the bitmask of CPUs on Node 'node'.
>>> + */
>>> +const struct cpumask *cpumask_of_node(int node)
>>> +{
>>> +     if (node >= nr_node_ids) {
>>> +             pr_warn("cpumask_of_node(%d): node > nr_node_ids(%d)\n",
>>> +                     node, nr_node_ids);
>>> +             dump_stack();
>>> +             return cpu_none_mask;
>>> +     }
>>
>> This can be:
>>
>>         if (WARN_ON(node >= nr_node_ids))
>>                 return cpu_none_mask;
>>
> ok
>>> +     if (node_to_cpumask_map[node] == NULL) {
>>> +             pr_warn("cpumask_of_node(%d): no node_to_cpumask_map!\n",
>>> +                     node);
>>> +             dump_stack();
>>> +             return cpu_online_mask;
>>> +     }
>>
>> Likewise:
>>
>>         if (WARN_ON(!node_to_cpumask_map[node]))
>>                 return cpu_online_mask;
> ok.
>>
>>> +     return node_to_cpumask_map[node];
>>> +}
>>> +EXPORT_SYMBOL(cpumask_of_node);
>>> +#endif
>>> +
>>> +static void map_cpu_to_node(unsigned int cpu, int nid)
>>> +{
>>> +     set_cpu_numa_node(cpu, nid);
>>> +     if (nid >= 0)
>>> +             cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
>>> +}
>>> +
>>> +static void unmap_cpu_to_node(unsigned int cpu)
>>> +{
>>> +     int nid = cpu_to_node(cpu);
>>> +
>>> +     if (nid >= 0)
>>> +             cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
>>> +     set_cpu_numa_node(cpu, NUMA_NO_NODE);
>>> +}
>>> +
>>> +void numa_clear_node(unsigned int cpu)
>>> +{
>>> +     unmap_cpu_to_node(cpu);
>>> +}
>>> +
>>> +/*
>>> + * Allocate node_to_cpumask_map based on number of available nodes
>>> + * Requires node_possible_map to be valid.
>>> + *
>>> + * Note: cpumask_of_node() is not valid until after this is done.
>>> + * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
>>> + */
>>> +static void __init setup_node_to_cpumask_map(void)
>>> +{
>>> +     unsigned int cpu;
>>> +     int node;
>>> +
>>> +     /* setup nr_node_ids if not done yet */
>>> +     if (nr_node_ids == MAX_NUMNODES)
>>> +             setup_nr_node_ids();
>>
>> Where would this be done otherwise?
> In the function free_area_init_nodes().
nr_node_ids is initialized to MAX_NUMNODES; setup_nr_node_ids() is called
to set it to the actual number of nodes, based on node_possible_map.
>>
>> If we can initialise this earlier, what happens if we actually had
>> MAX_NUMNODES nodes?
> The tables are initialized for the actual number of nodes; this is just
> to avoid unnecessary table allocation.
>>
>>> +
>>> +     /* allocate the map */
>>> +     for (node = 0; node < nr_node_ids; node++)
>>> +             alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
>>> +
>>> +     /* Clear the mapping */
>>> +     for (node = 0; node < nr_node_ids; node++)
>>> +             cpumask_clear(node_to_cpumask_map[node]);
>>
>> Why not do these at the same time?
> ok, can be done.
>>
>> Can an allocation fail?
> Most unlikely; however, I will add a check.
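
A minimal userspace sketch of the combined allocate-and-clear loop with a failure check (the mask size and names here are assumptions; the kernel variant would use alloc_bootmem_cpumask_var() and cpumask_clear()):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_NODES 4
#define CPUMASK_BYTES 16	/* assumed mask size for the sketch */

static unsigned char *node_to_cpumask_map[NR_NODES];

/* allocate and clear each node mask in one pass, checking for failure */
static int setup_node_to_cpumask_map(void)
{
	int node;

	for (node = 0; node < NR_NODES; node++) {
		node_to_cpumask_map[node] = malloc(CPUMASK_BYTES);
		if (!node_to_cpumask_map[node])
			return -1;	/* an allocation can fail */
		memset(node_to_cpumask_map[node], 0, CPUMASK_BYTES);
	}
	return 0;
}
```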
>>
>>> +
>>> +     for_each_possible_cpu(cpu)
>>> +             set_cpu_numa_node(cpu, NUMA_NO_NODE);
>>> +
>>> +     /* cpumask_of_node() will now work */
>>> +     pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>>> +}
>>> +
>>> +/*
>>> + *  Set the cpu to node and mem mapping
>>> + */
>>> +void numa_store_cpu_info(unsigned int cpu)
>>> +{
>>> +     map_cpu_to_node(cpu, numa_off ? 0 : node_cpu_hwid[cpu].node_id);
>>> +}
>>> +
>>> +/**
>>> + * numa_add_memblk_to - Add one numa_memblk to a numa_meminfo
>>> + */
>>> +
>>> +static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
>>> +                                  struct numa_meminfo *mi)
>>> +{
>>> +     /* ignore zero length blks */
>>> +     if (start == end)
>>> +             return 0;
>>> +
>>> +     /* whine about and ignore invalid blks */
>>> +     if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
>>> +             pr_warn("NUMA: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
>>> +                             nid, start, end - 1);
>>> +             return 0;
>>> +     }
>>
>> When would this happen?
> If the ACPI or DT binding is wrong or corrupt.
>>
>>> +
>>> +     if (mi->nr_blks >= NR_NODE_MEMBLKS) {
>>> +             pr_err("NUMA: too many memblk ranges\n");
>>> +             return -EINVAL;
>>> +     }
>>> +
>>> +     pr_info("NUMA: Adding memblock %d [0x%llx - 0x%llx] on node %d\n",
>>> +                     mi->nr_blks, start, end, nid);
>>> +     mi->blk[mi->nr_blks].start = start;
>>> +     mi->blk[mi->nr_blks].end = end;
>>> +     mi->blk[mi->nr_blks].nid = nid;
>>> +     mi->nr_blks++;
>>> +     return 0;
>>> +}
>>
>> As I mentioned earlier, I think that we should keep the memblock
>> infrastructure around, and reuse it here.
> will try as said in the beginning.
>>
>>> +
>>> +/**
>>> + * numa_add_memblk - Add one numa_memblk to numa_meminfo
>>> + * @nid: NUMA node ID of the new memblk
>>> + * @start: Start address of the new memblk
>>> + * @end: End address of the new memblk
>>> + *
>>> + * Add a new memblk to the default numa_meminfo.
>>> + *
>>> + * RETURNS:
>>> + * 0 on success, -errno on failure.
>>> + */
>>> +#define MAX_PHYS_ADDR        ((phys_addr_t)~0)
>>
>> We should probably rethink MAX_MEMBLOCK_ADDR and potentially make it
>> more generic so we can use it here and elsewhere. See commit
>> 8eafeb4802281651 ("of/fdt: make memblock maximum physical address arch
>> configurable").
>>
>> However, that might not matter if we're able to reuse memblock.
> ok
>>
>>> +
>>> +int __init numa_add_memblk(int nid, u64 base, u64 end)
>>> +{
>>> +     const u64 phys_offset = __pa(PAGE_OFFSET);
>>> +
>>> +     base &= PAGE_MASK;
>>> +     end &= PAGE_MASK;
>>> +
>>> +     if (base > MAX_PHYS_ADDR) {
>>> +             pr_warn("NUMA: Ignoring memory block 0x%llx - 0x%llx\n",
>>> +                             base, base + end);
>>> +             return -ENOMEM;
>>> +     }
>>> +
>>> +     if (base + end > MAX_PHYS_ADDR) {
>>> +             pr_info("NUMA: Ignoring memory range 0x%lx - 0x%llx\n",
>>> +                             ULONG_MAX, base + end);
>>> +             end = MAX_PHYS_ADDR - base;
>>> +     }
>>> +
>>> +     if (base + end < phys_offset) {
>>> +             pr_warn("NUMA: Ignoring memory block 0x%llx - 0x%llx\n",
>>> +                        base, base + end);
>>> +             return -ENOMEM;
>>> +     }
>>> +     if (base < phys_offset) {
>>> +             pr_info("NUMA: Ignoring memory range 0x%llx - 0x%llx\n",
>>> +                        base, phys_offset);
>>> +             end -= phys_offset - base;
>>> +             base = phys_offset;
>>> +     }
>>> +
>>> +     return numa_add_memblk_to(nid, base, base + end, &numa_meminfo);
>>> +}
>>> +EXPORT_SYMBOL(numa_add_memblk);
>>
>> I take it this is only used to look up the node for a given memory
>> region, rather than any region described being usable by the kernel?
>> Otherwise that rounding of the base is worrying.
> Yes, this is only to look up the node for a memory block.
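
The clamping that numa_add_memblk() performs can be sketched in isolation (a simplified userspace model with an assumed address limit, treating the range as [start, end) rather than the patch's base/size mix):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PHYS_ADDR 0x1000000000ULL	/* assumed limit for the sketch */

/* clamp [start, end) to [lo, MAX_PHYS_ADDR); return 0 if nothing is left */
static int clamp_range(uint64_t *start, uint64_t *end, uint64_t lo)
{
	if (*start >= MAX_PHYS_ADDR || *end <= lo)
		return 0;			/* entirely outside: ignore */
	if (*end > MAX_PHYS_ADDR)
		*end = MAX_PHYS_ADDR;		/* trim the top */
	if (*start < lo)
		*start = lo;			/* trim the bottom */
	return 1;
}
```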
>>
>>> +
>>> +/* Initialize NODE_DATA for a node on the local memory */
>>> +static void __init setup_node_data(int nid, u64 start, u64 end)
>>> +{
>>> +     const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
>>> +     u64 nd_pa;
>>> +     void *nd;
>>> +     int tnid;
>>> +
>>> +     start = roundup(start, ZONE_ALIGN);
>>> +
>>> +     pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>>> +            nid, start, end - 1);
>>> +
>>> +     /*
>>> +      * Allocate node data.  Try node-local memory and then any node.
>>> +      */
>>> +     nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
>>
>> Why was nd_size rounded to PAGE_SIZE earlier if we only care about
> It is just sizeof(pg_data_t) rounded up to PAGE_SIZE.
>> SMP_CACHE_BYTES alignment? I had assumed we wanted naturally-aligned
>> pages, but that doesn't seem to be the case given the above.
> what is the concern here?
>>
>>> +     if (!nd_pa) {
>>> +             nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
>>> +                                           MEMBLOCK_ALLOC_ACCESSIBLE);
>>> +             if (!nd_pa) {
>>> +                     pr_err("Cannot find %zu bytes in node %d\n",
>>> +                            nd_size, nid);
>>> +                     return;
>>> +             }
>>> +     }
>>> +     nd = __va(nd_pa);
>>
>> Isn't memblock_alloc_try_nid sufficient for the above?
> thanks, will replace with memblock_alloc_try_nid.
>>
>>> +
>>> +     /* report and initialize */
>>> +     pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
>>> +            nd_pa, nd_pa + nd_size - 1);
>>> +     tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>>> +     if (tnid != nid)
>>> +             pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
>>> +
>>> +     node_data[nid] = nd;
>>> +     memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>>> +     NODE_DATA(nid)->node_id = nid;
>>> +     NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
>>> +     NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
>>> +
>>> +     node_set_online(nid);
>>> +}
>>> +
>>> +/*
>>> + * Set nodes, which have memory in @mi, in *@nodemask.
>>> + */
>>> +static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
>>> +                                           const struct numa_meminfo *mi)
>>> +{
>>> +     int i;
>>> +
>>> +     for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
>>> +             if (mi->blk[i].start != mi->blk[i].end &&
>>> +                 mi->blk[i].nid != NUMA_NO_NODE)
>>> +                     node_set(mi->blk[i].nid, *nodemask);
>>> +}
>>> +
>>> +/*
>>> + * Sanity check to catch more bad NUMA configurations (they are amazingly
>>> + * common).  Make sure the nodes cover all memory.
>>> + */
>>
>> This comment is surprising, given this functionality is brand new.
> Sorry, this comment is from x86; I will remove it.
>>
>>> +static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
>>> +{
>>> +     u64 numaram, totalram;
>>> +     int i;
>>> +
>>> +     numaram = 0;
>>> +     for (i = 0; i < mi->nr_blks; i++) {
>>> +             u64 s = mi->blk[i].start >> PAGE_SHIFT;
>>> +             u64 e = mi->blk[i].end >> PAGE_SHIFT;
>>> +
>>> +             numaram += e - s;
>>> +             numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
>>> +             if ((s64)numaram < 0)
>>> +                     numaram = 0;
>>> +     }
>>> +
>>> +     totalram = max_pfn - absent_pages_in_range(0, max_pfn);
>>> +
>>> +     /* We seem to lose 3 pages somewhere. Allow 1M of slack. */
>>
>> We shouldn't rely on magic like this.
>>
>> Where and why are we "losing" pages, and why is that considered ok?
> I need to experiment to see whether that is still applicable here;
> will remove it if not relevant for us.
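
The slack check under discussion compares the NUMA-covered page count with the total page count and rejects the configuration when the shortfall reaches 1 MiB. A userspace sketch of just that comparison (PAGE_SHIFT assumed to be 12 for the example):

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define SLACK_PAGES (1 << (20 - PAGE_SHIFT))	/* 1 MiB in pages */

/* return 1 if the NUMA-described pages cover total RAM within 1 MiB */
static int covers_memory(long long numaram, long long totalram)
{
	return (totalram - numaram) < SLACK_PAGES;
}
```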
>>
>>> +     if ((s64)(totalram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
>>> +             pr_err("NUMA: nodes only cover %lluMB of your %lluMB Total RAM. Not used.\n",
>>> +                    (numaram << PAGE_SHIFT) >> 20,
>>> +                    (totalram << PAGE_SHIFT) >> 20);
>>> +             return false;
>>> +     }
>>> +     return true;
>>> +}
>>> +
>>> +/**
>>> + * numa_reset_distance - Reset NUMA distance table
>>> + *
>>> + * The current table is freed.
>>> + * The next numa_set_distance() call will create a new one.
>>> + */
>>> +void __init numa_reset_distance(void)
>>> +{
>>> +     size_t size = numa_distance_cnt * numa_distance_cnt *
>>> +             sizeof(numa_distance[0]);
>>> +
>>> +     /* numa_distance could be 1LU marking allocation failure, test cnt */
>>> +     if (numa_distance_cnt)
>>> +             memblock_free(__pa(numa_distance), size);
>>> +     numa_distance_cnt = 0;
>>> +     numa_distance = NULL;   /* enable table creation */
>>> +}
>>> +
>>> +static int __init numa_alloc_distance(void)
>>> +{
>>> +     nodemask_t nodes_parsed;
>>> +     size_t size;
>>> +     int i, j, cnt = 0;
>>> +     u64 phys;
>>> +
>>> +     /* size the new table and allocate it */
>>> +     nodes_parsed = numa_nodes_parsed;
>>> +     numa_nodemask_from_meminfo(&nodes_parsed, &numa_meminfo);
>>> +
>>> +     for_each_node_mask(i, nodes_parsed)
>>> +             cnt = i;
>>> +     cnt++;
>>> +     size = cnt * cnt * sizeof(numa_distance[0]);
>>> +
>>> +     phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
>>> +                                   size, PAGE_SIZE);
>>> +     if (!phys) {
>>> +             pr_warn("NUMA: Warning: can't allocate distance table!\n");
>>> +             /* don't retry until explicitly reset */
>>> +             numa_distance = (void *)1LU;
>>
>> This doesn't look good. Why do we need to set this to a non-pointer
>> value?
> To avoid further attempts unless the table is explicitly reset.
>>
>> Thanks,
>> Mark.
>>
>>> +             return -ENOMEM;
>>> +     }
>>> +     memblock_reserve(phys, size);
>>> +
>>> +     numa_distance = __va(phys);
>>> +     numa_distance_cnt = cnt;
>>> +
>>> +     /* fill with the default distances */
>>> +     for (i = 0; i < cnt; i++)
>>> +             for (j = 0; j < cnt; j++)
>>> +                     numa_distance[i * cnt + j] = i == j ?
>>> +                             LOCAL_DISTANCE : REMOTE_DISTANCE;
>>> +     pr_debug("NUMA: Initialized distance table, cnt=%d\n", cnt);
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +/**
>>> + * numa_set_distance - Set NUMA distance from one NUMA to another
>>> + * @from: the 'from' node to set distance
>>> + * @to: the 'to'  node to set distance
>>> + * @distance: NUMA distance
>>> + *
>>> + * Set the distance from node @from to @to to @distance.  If distance table
>>> + * doesn't exist, one which is large enough to accommodate all the currently
>>> + * known nodes will be created.
>>> + *
>>> + * If such table cannot be allocated, a warning is printed and further
>>> + * calls are ignored until the distance table is reset with
>>> + * numa_reset_distance().
>>> + *
>>> + * If @from or @to is higher than the highest known node or lower than zero
>>> + * at the time of table creation or @distance doesn't make sense, the call
>>> + * is ignored.
>>> + * This is to allow simplification of specific NUMA config implementations.
>>> + */
>>> +void __init numa_set_distance(int from, int to, int distance)
>>> +{
>>> +     if (!numa_distance && numa_alloc_distance() < 0)
>>> +             return;
>>> +
>>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>>> +                     from < 0 || to < 0) {
>>> +             pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>>> +                         from, to, distance);
>>> +             return;
>>> +     }
>>> +
>>> +     if ((u8)distance != distance ||
>>> +         (from == to && distance != LOCAL_DISTANCE)) {
>>> +             pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>>> +                          from, to, distance);
>>> +             return;
>>> +     }
>>> +
>>> +     numa_distance[from * numa_distance_cnt + to] = distance;
>>> +}
>>> +EXPORT_SYMBOL(numa_set_distance);
>>> +
>>> +int __node_distance(int from, int to)
>>> +{
>>> +     if (from >= numa_distance_cnt || to >= numa_distance_cnt)
>>> +             return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
>>> +     return numa_distance[from * numa_distance_cnt + to];
>>> +}
>>> +EXPORT_SYMBOL(__node_distance);
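
The flat N x N distance table above can be modelled in a small userspace sketch (malloc stands in for the memblock allocation; the defaults follow the LOCAL_DISTANCE/REMOTE_DISTANCE convention, with out-of-range lookups falling back to the defaults as in __node_distance()):

```c
#include <assert.h>
#include <stdlib.h>

#define LOCAL_DISTANCE  10
#define REMOTE_DISTANCE 20

static unsigned char *numa_distance;
static int numa_distance_cnt;

/* allocate a cnt x cnt table filled with the default distances */
static int alloc_distance(int cnt)
{
	int i, j;

	numa_distance = malloc(cnt * cnt);
	if (!numa_distance)
		return -1;
	numa_distance_cnt = cnt;
	for (i = 0; i < cnt; i++)
		for (j = 0; j < cnt; j++)
			numa_distance[i * cnt + j] =
				(i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return 0;
}

/* mirror of __node_distance(): out-of-range nodes get the defaults */
static int node_distance(int from, int to)
{
	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
	return numa_distance[from * numa_distance_cnt + to];
}
```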
>>> +
>>> +static int __init numa_register_memblks(struct numa_meminfo *mi)
>>> +{
>>> +     unsigned long uninitialized_var(pfn_align);
>>> +     int i, nid;
>>> +
>>> +     /* Account for nodes with cpus and no memory */
>>> +     node_possible_map = numa_nodes_parsed;
>>> +     numa_nodemask_from_meminfo(&node_possible_map, mi);
>>> +     if (WARN_ON(nodes_empty(node_possible_map)))
>>> +             return -EINVAL;
>>> +
>>> +     for (i = 0; i < mi->nr_blks; i++) {
>>> +             struct numa_memblk *mb = &mi->blk[i];
>>> +
>>> +             memblock_set_node(mb->start, mb->end - mb->start,
>>> +                               &memblock.memory, mb->nid);
>>> +     }
>>> +
>>> +     /*
>>> +      * If sections array is gonna be used for pfn -> nid mapping, check
>>> +      * whether its granularity is fine enough.
>>> +      */
>>> +#ifdef NODE_NOT_IN_PAGE_FLAGS
>>> +     pfn_align = node_map_pfn_alignment();
>>> +     if (pfn_align && pfn_align < PAGES_PER_SECTION) {
>>> +             pr_warn("Node alignment %lluMB < min %lluMB, rejecting NUMA config\n",
>>> +                    PFN_PHYS(pfn_align) >> 20,
>>> +                    PFN_PHYS(PAGES_PER_SECTION) >> 20);
>>> +             return -EINVAL;
>>> +     }
>>> +#endif
>>> +     if (!numa_meminfo_cover_memory(mi))
>>> +             return -EINVAL;
>>> +
>>> +     /* Finally register nodes. */
>>> +     for_each_node_mask(nid, node_possible_map) {
>>> +             u64 start = PFN_PHYS(max_pfn);
>>> +             u64 end = 0;
>>> +
>>> +             for (i = 0; i < mi->nr_blks; i++) {
>>> +                     if (nid != mi->blk[i].nid)
>>> +                             continue;
>>> +                     start = min(mi->blk[i].start, start);
>>> +                     end = max(mi->blk[i].end, end);
>>> +             }
>>> +
>>> +             if (start < end)
>>> +                     setup_node_data(nid, start, end);
>>> +     }
>>> +
>>> +     /* Dump memblock with node info and return. */
>>> +     memblock_dump_all();
>>> +     return 0;
>>> +}
>>> +
>>> +static int __init numa_init(int (*init_func)(void))
>>> +{
>>> +     int ret;
>>> +
>>> +     nodes_clear(numa_nodes_parsed);
>>> +     nodes_clear(node_possible_map);
>>> +     nodes_clear(node_online_map);
>>> +     numa_reset_distance();
>>> +
>>> +     ret = init_func();
>>> +     if (ret < 0)
>>> +             return ret;
>>> +
>>> +     ret = numa_register_memblks(&numa_meminfo);
>>> +     if (ret < 0)
>>> +             return ret;
>>> +
>>> +     setup_node_to_cpumask_map();
>>> +
>>> +     /* init boot processor */
>>> +     map_cpu_to_node(0, 0);
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +/**
>>> + * dummy_numa_init - Fallback dummy NUMA init
>>> + *
>>> + * Used if there's no underlying NUMA architecture, NUMA initialization
>>> + * fails, or NUMA is disabled on the command line.
>>> + *
>>> + * Must online at least one node and add memory blocks that cover all
>>> + * allowed memory.  This function must not fail.
>>> + */
>>> +static int __init dummy_numa_init(void)
>>> +{
>>> +     pr_info("%s\n", "No NUMA configuration found");
>>> +     pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
>>> +            0LLU, PFN_PHYS(max_pfn) - 1);
>>> +     node_set(0, numa_nodes_parsed);
>>> +     numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
>>> +     numa_off = 1;
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +/**
>>> + * arm64_numa_init - Initialize NUMA
>>> + *
>>> + * Try each configured NUMA initialization method until one succeeds.  The
>>> + * last fallback is dummy single node config encompassing whole memory and
>>> + * never fails.
>>> + */
>>> +void __init arm64_numa_init(void)
>>> +{
>>> +     numa_init(dummy_numa_init);
>>> +}
>>> --
>>> 1.8.1.4
>>>
> thanks
> Ganapat
thanks
Ganapat

* Re: [PATCH v6 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms
       [not found]               ` <CACVXFVPoZqjFeqK7=M7wK0JQ0iS9tYoN8BHsYMc25OL-4j=e1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-10-26  7:20                 ` Ming Lei
       [not found]                   ` <CACVXFVO5QGu9z1C49Wos_A5MpPafABL+WDidiHp3S2v1xUsRZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2015-10-26  7:20 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Pawel Moll, Al Stone,
	Prasun Kapoor, Mark Salter, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Len Brown, devicetree-u79uwXL29TY76Z2rM5mHXA, Steve Capper,
	Arnd Bergmann, Ian Campbell, marc.zyngier-5wv7dgnIgG8,
	Leif Lindholm, Robert Richter, Grant Likely, Rob Herring,
	linux-arm-kernel, Ard Biesheuvel, Rafael J. Wysocki, Hanjun Guo,
	Kumar

On Mon, Oct 26, 2015 at 12:53 PM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> On Mon, Oct 26, 2015 at 11:50 AM, Ganapatrao Kulkarni
> <gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Mon, Oct 26, 2015 at 7:14 AM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>>> On Tue, Oct 20, 2015 at 6:45 PM, Ganapatrao Kulkarni
>>> <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org> wrote:
>>>> Adding numa dt binding support for arm64 based platforms.
>>>> dt node parsing for numa topology is done using device property
>>>> proximity and device node distance-map.
>>>>
>>>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>>>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>>>> ---
>>>>  arch/arm64/Kconfig            |  10 ++
>>>>  arch/arm64/include/asm/numa.h |  10 ++
>>>>  arch/arm64/kernel/Makefile    |   1 +
>>>>  arch/arm64/kernel/of_numa.c   | 221 ++++++++++++++++++++++++++++++++++++++++++
>>>>  arch/arm64/kernel/smp.c       |   1 +
>>>>  arch/arm64/mm/numa.c          |  10 +-
>>>>  6 files changed, 252 insertions(+), 1 deletion(-)
>>>>  create mode 100644 arch/arm64/kernel/of_numa.c
>>>>
>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>> index 0f9cdc7..6cf8d20 100644
>>>> --- a/arch/arm64/Kconfig
>>>> +++ b/arch/arm64/Kconfig
>>>> @@ -426,6 +426,16 @@ config NUMA
>>>>           local memory controller of the CPU and add some more
>>>>           NUMA awareness to the kernel.
>>>>
>>>> +config OF_NUMA
>>>> +       bool "Device Tree NUMA support"
>>>> +       depends on NUMA
>>>> +       depends on OF
>>>> +       default y
>>>> +       help
>>>> +         Enable Device Tree NUMA support.
>>>> +         This enables the numa mapping of cpu, memory, io and
>>>> +         inter node distances using dt bindings.
>>>
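
The binding the quoted patch describes — a `proximity` property on cpu and memory nodes plus a `distance-map` node — might look roughly like the following DTS fragment. This is a hypothetical two-node sketch: the property and node names follow the patch description, but the `compatible` strings, addresses, and distance values are illustrative, not taken from the actual v6 binding document.

```dts
/* Hypothetical two-socket sketch of the proposed binding: each cpu and
 * memory node carries a "proximity" property naming its NUMA node, and
 * a distance-map node gives the inter-node distances as
 * <from to distance> triplets. All values are illustrative. */

memory@0 {
	device_type = "memory";
	reg = <0x0 0x00000000 0x0 0x80000000>;	/* node 0 memory */
	proximity = <0>;
};

memory@10000000000 {
	device_type = "memory";
	reg = <0x100 0x00000000 0x0 0x80000000>;	/* node 1 memory */
	proximity = <1>;
};

cpus {
	cpu@0 {
		device_type = "cpu";
		reg = <0x0 0x000>;
		proximity = <0>;
	};
	cpu@1 {
		device_type = "cpu";
		reg = <0x0 0x001>;
		proximity = <1>;
	};
};

distance-map {
	/* local distance 10, remote 20, matching common ACPI SLIT defaults */
	distance-matrix = <0 0 10>,
			  <0 1 20>,
			  <1 0 20>,
			  <1 1 10>;
};
```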
>>> Enabling the above config option can cause numa_init() warning in
>>> numa-less arm64 system, please see the following report:
>> this has been taken care of. Which version are you using?
>
> V5.

The issue persists on V6 too, so it looks like V7 is needed.

>
>>
>>>
>>>           https://bugs.launchpad.net/bugs/1509221
>>>
>>> Thanks,
>>
>> Thanks
>> Ganapat
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms
       [not found]                   ` <CACVXFVO5QGu9z1C49Wos_A5MpPafABL+WDidiHp3S2v1xUsRZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-10-26  8:24                     ` Ganapatrao Kulkarni
       [not found]                       ` <CAFpQJXXN-xixG2k7JJuJeCbLkGmR4RKX3heEeaOCpWBVNL4Mog-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Ganapatrao Kulkarni @ 2015-10-26  8:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Pawel Moll, Al Stone,
	Prasun Kapoor, Mark Salter, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Len Brown, devicetree-u79uwXL29TY76Z2rM5mHXA, Steve Capper,
	Arnd Bergmann, Ian Campbell, marc.zyngier-5wv7dgnIgG8,
	Leif Lindholm, Robert Richter, Grant Likely, Rob Herring,
	linux-arm-kernel, Ard Biesheuvel, Rafael J. Wysocki, Hanjun Guo,
	Kumar

On Mon, Oct 26, 2015 at 12:50 PM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> On Mon, Oct 26, 2015 at 12:53 PM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>> On Mon, Oct 26, 2015 at 11:50 AM, Ganapatrao Kulkarni
>> <gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> On Mon, Oct 26, 2015 at 7:14 AM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>>>> On Tue, Oct 20, 2015 at 6:45 PM, Ganapatrao Kulkarni
>>>> <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org> wrote:
>>>>> Adding numa dt binding support for arm64 based platforms.
>>>>> dt node parsing for numa topology is done using device property
>>>>> proximity and device node distance-map.
>>>>>
>>>>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>>>>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>>>>> [...]
>>>>
>>>> Enabling the above config option can cause numa_init() warning in
>>>> numa-less arm64 system, please see the following report:
>>> this is taken care. which version are you using?
>>
>> V5.
>
> The issue persists on V6 too, so looks V7 is needed.
Can you please share the boot log with v6?
You can send it to me directly.
>
>>
>>>
>>>>
>>>>           https://bugs.launchpad.net/bugs/1509221
>>>>
>>>> Thanks,
>>>
>>> Thanks
>>> Ganapat
>>>
>>> _______________________________________________
>>> linux-arm-kernel mailing list
>>> linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms
       [not found]                       ` <CAFpQJXXN-xixG2k7JJuJeCbLkGmR4RKX3heEeaOCpWBVNL4Mog-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-10-26  9:00                         ` Ming Lei
       [not found]                           ` <CACVXFVO=WuC01-B6Dxdi6CxT=B0kDJXG4VFsYxyEarfQwPaVqQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2015-10-26  9:00 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Pawel Moll, Al Stone,
	Prasun Kapoor, Mark Salter, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Len Brown, devicetree-u79uwXL29TY76Z2rM5mHXA, Steve Capper,
	Arnd Bergmann, Ian Campbell, marc.zyngier-5wv7dgnIgG8,
	Leif Lindholm, Robert Richter, Grant Likely, Rob Herring,
	linux-arm-kernel, Ard Biesheuvel, Rafael J. Wysocki, Hanjun Guo,
	Kumar

On Mon, Oct 26, 2015 at 4:24 PM, Ganapatrao Kulkarni
<gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Mon, Oct 26, 2015 at 12:50 PM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>> On Mon, Oct 26, 2015 at 12:53 PM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>>> On Mon, Oct 26, 2015 at 11:50 AM, Ganapatrao Kulkarni
>>> <gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>> On Mon, Oct 26, 2015 at 7:14 AM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>>>>> On Tue, Oct 20, 2015 at 6:45 PM, Ganapatrao Kulkarni
>>>>> <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org> wrote:
>>>>>> Adding numa dt binding support for arm64 based platforms.
>>>>>> dt node parsing for numa topology is done using device property
>>>>>> proximity and device node distance-map.
>>>>>>
>>>>>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>>>>>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>>>>>> [...]
>>>>>
>>>>> Enabling the above config option can cause numa_init() warning in
>>>>> numa-less arm64 system, please see the following report:
>>>> this is taken care. which version are you using?
>>>
>>> V5.
>>
>> The issue persists on V6 too, so looks V7 is needed.
> can you please share the boot log with v6.
> you can send to me only.

http://kernel.ubuntu.com/~ming/bugs/1509221/4.3-rc7-numa.dmesg

>>
>>>
>>>>
>>>>>
>>>>>           https://bugs.launchpad.net/bugs/1509221
>>>>>
>>>>> Thanks,
>>>>
>>>> Thanks
>>>> Ganapat
>>>>
>>>> _______________________________________________
>>>> linux-arm-kernel mailing list
>>>> linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
>>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v6 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms
       [not found]                           ` <CACVXFVO=WuC01-B6Dxdi6CxT=B0kDJXG4VFsYxyEarfQwPaVqQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-10-26  9:24                             ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 19+ messages in thread
From: Ganapatrao Kulkarni @ 2015-10-26  9:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Pawel Moll, Al Stone,
	Prasun Kapoor, Mark Salter, rfranz-YGCgFSpz5w/QT0dZR+AlfA,
	Len Brown, devicetree-u79uwXL29TY76Z2rM5mHXA, Steve Capper,
	Arnd Bergmann, Ian Campbell, marc.zyngier-5wv7dgnIgG8,
	Leif Lindholm, Robert Richter, Grant Likely, Rob Herring,
	linux-arm-kernel, Ard Biesheuvel, Rafael J. Wysocki, Hanjun Guo,
	Kumar

On Mon, Oct 26, 2015 at 2:30 PM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
> On Mon, Oct 26, 2015 at 4:24 PM, Ganapatrao Kulkarni
> <gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Mon, Oct 26, 2015 at 12:50 PM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>>> On Mon, Oct 26, 2015 at 12:53 PM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>>>> On Mon, Oct 26, 2015 at 11:50 AM, Ganapatrao Kulkarni
>>>> <gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>> On Mon, Oct 26, 2015 at 7:14 AM, Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org> wrote:
>>>>>> On Tue, Oct 20, 2015 at 6:45 PM, Ganapatrao Kulkarni
>>>>>> <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org> wrote:
>>>>>>> Adding numa dt binding support for arm64 based platforms.
>>>>>>> dt node parsing for numa topology is done using device property
>>>>>>> proximity and device node distance-map.
>>>>>>>
>>>>>>> Reviewed-by: Robert Richter <rrichter-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>>>>>>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>>>>>>> [...]
>>>>>>
>>>>>> Enabling the above config option can cause numa_init() warning in
>>>>>> numa-less arm64 system, please see the following report:
>>>>> this is taken care. which version are you using?
>>>>
>>>> V5.
>>>
>>> The issue persists on V6 too, so looks V7 is needed.
>> can you please share the boot log with v6.
>> you can send to me only.
>
> http://kernel.ubuntu.com/~ming/bugs/1509221/4.3-rc7-numa.dmesg
It is because the memory nodes are deleted in the mainline kernel, so
no memory nodes get added to NUMA, which expects at least one.
Thanks, this will be taken care of in v7.
>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>           https://bugs.launchpad.net/bugs/1509221
>>>>>>
>>>>>> Thanks,
>>>>>
>>>>> Thanks
>>>>> Ganapat
>>>>>
>>>>> _______________________________________________
>>>>> linux-arm-kernel mailing list
>>>>> linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
>>>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-10-26  9:24 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-20 10:45 [PATCH v6 0/4] arm64, numa: Add numa support for arm64 platforms Ganapatrao Kulkarni
2015-10-20 10:45 ` [PATCH v6 1/4] arm64, numa: adding " Ganapatrao Kulkarni
     [not found]   ` <1445337931-11344-2-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
2015-10-20 14:47     ` Mark Rutland
2015-10-21  8:54       ` Ganapatrao Kulkarni
     [not found]         ` <CAFpQJXUXw2AP-fJR0eLJFnHQML3RbtbP7iNXT3RFtdc+0jzvKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-26  5:45           ` Ganapatrao Kulkarni
2015-10-23 15:11     ` Matthias Brugger
     [not found]       ` <562A4E21.2060600-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-10-24 10:07         ` Ganapatrao Kulkarni
2015-10-20 10:45 ` [PATCH v6 2/4] Documentation, dt, arm64/arm: dt bindings for numa Ganapatrao Kulkarni
     [not found]   ` <1445337931-11344-3-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
2015-10-20 15:35     ` Mark Rutland
2015-10-21  4:27       ` Ganapatrao Kulkarni
2015-10-20 10:45 ` [PATCH v6 3/4] arm64/arm, numa, dt: adding numa dt binding implementation for arm64 platforms Ganapatrao Kulkarni
     [not found]   ` <1445337931-11344-4-git-send-email-gkulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
2015-10-26  1:44     ` Ming Lei
     [not found]       ` <CACVXFVMiZs_g3qw99-PhxLiX5oLe8_z5cOAfJZ_4-09jN2koCQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-26  3:50         ` Ganapatrao Kulkarni
     [not found]           ` <CAFpQJXX8VXX+gTBn-+7MepSAOzbOuc-yWJe4f=B7MH_Nj7FsKw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-26  4:53             ` Ming Lei
     [not found]               ` <CACVXFVPoZqjFeqK7=M7wK0JQ0iS9tYoN8BHsYMc25OL-4j=e1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-26  7:20                 ` Ming Lei
     [not found]                   ` <CACVXFVO5QGu9z1C49Wos_A5MpPafABL+WDidiHp3S2v1xUsRZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-26  8:24                     ` Ganapatrao Kulkarni
     [not found]                       ` <CAFpQJXXN-xixG2k7JJuJeCbLkGmR4RKX3heEeaOCpWBVNL4Mog-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-26  9:00                         ` Ming Lei
     [not found]                           ` <CACVXFVO=WuC01-B6Dxdi6CxT=B0kDJXG4VFsYxyEarfQwPaVqQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-26  9:24                             ` Ganapatrao Kulkarni
2015-10-20 10:45 ` [PATCH v6 4/4] arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node topology Ganapatrao Kulkarni
