* [PATCH V2 0/8] MIPS: Loongson-3: Add NUMA and Loongson-3B support
@ 2014-04-13  0:24 Huacai Chen
  2014-04-13  0:24 ` [PATCH V2 1/8] MIPS: Support hard limit of cpu count (nr_cpu_ids) Huacai Chen
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Huacai Chen @ 2014-04-13  0:24 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: John Crispin, Steven J. Hill, Aurelien Jarno, linux-mips,
	Fuxin Zhang, Zhangjin Wu, Huacai Chen, Hongliang Tao, Hua Yan

This patchset is prepared for the upcoming 3.16 release of Linux/MIPS. It
adds NUMA and Loongson-3B support. Multiple Loongson-3A chips can be
interconnected via the HT0 bus, forming a CC-NUMA system in which every
chip (node) has its own local memory and cache coherency is maintained
by hardware. Loongson-3B is an 8-core processor that looks like two
Loongson-3A chips integrated into one package: its 8 cores are split
into two groups (two NUMA nodes).
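
The topology described above can be sketched as a mapping from a logical
CPU number to a node id and a package id. This is an illustration only,
not code from the series: struct chip_layout and both helper names are
hypothetical, and the Loongson-3B values (4 cores per node, 8 per
package) are an assumption derived from the "8 cores in two groups"
description in this cover letter.

```c
#include <assert.h>

/* Hypothetical description of one chip model (illustrative names). */
struct chip_layout {
	int cores_per_node;	/* cores sharing one NUMA node */
	int cores_per_package;	/* cores sharing one physical chip */
};

/* Loongson-3A: one chip is one node with 4 cores. */
static const struct chip_layout loongson_3a = {
	.cores_per_node = 4,
	.cores_per_package = 4,
};

/* Loongson-3B (assumed): one chip holds two 4-core nodes. */
static const struct chip_layout loongson_3b = {
	.cores_per_node = 4,
	.cores_per_package = 8,
};

/* Node that a logical CPU belongs to, assuming contiguous numbering. */
static int cpu_to_node_id(const struct chip_layout *c, int cpu)
{
	assert(c->cores_per_node > 0 && cpu >= 0);
	return cpu / c->cores_per_node;
}

/* Physical package that a logical CPU belongs to. */
static int cpu_to_package_id(const struct chip_layout *c, int cpu)
{
	assert(c->cores_per_package > 0 && cpu >= 0);
	return cpu / c->cores_per_package;
}
```

With these assumed values, CPU 5 lands on node 1 in both layouts, but in
package 1 on a two-chip Loongson-3A system and in package 0 on a single
Loongson-3B.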

V1 -> V2:
1. Rework the first patch.
2. Use compat NUMA-related syscalls for the N32/O32 ABIs.
3. Drop the patch "MIPS: Loongson: Make CPU name more clear".

Huacai Chen (8):
 MIPS: Support hard limit of cpu count (nr_cpu_ids).
 MIPS: Support CPU topology files in sysfs.
 MIPS: Loongson: Modify ChipConfig register definition.
 MIPS: Add NUMA support for Loongson-3.
 MIPS: Add numa api support.
 MIPS: Add Loongson-3B support.
 MIPS: Loongson-3: Enable the COP2 usage.
 MIPS: Loongson: Rename CONFIG_LEMOTE_MACH3A to CONFIG_LOONGSON_MACH3X.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Hongliang Tao <taohl@lemote.com>
Signed-off-by: Hua Yan <yanh@lemote.com> 
---
 arch/mips/Kconfig                                  |    7 +-
 arch/mips/configs/loongson3_defconfig              |    2 +-
 arch/mips/include/asm/addrspace.h                  |    6 +
 arch/mips/include/asm/cop2.h                       |    8 +
 arch/mips/include/asm/cpu-info.h                   |    1 +
 arch/mips/include/asm/cpu.h                        |    2 +
 arch/mips/include/asm/mach-loongson/boot_param.h   |    4 +
 .../include/asm/mach-loongson/kernel-entry-init.h  |   51 +++
 arch/mips/include/asm/mach-loongson/loongson.h     |   11 +-
 arch/mips/include/asm/mach-loongson/machine.h      |    4 +-
 arch/mips/include/asm/mach-loongson/mmzone.h       |   51 +++
 arch/mips/include/asm/mach-loongson/topology.h     |   23 ++
 arch/mips/include/asm/smp.h                        |    6 +
 arch/mips/include/asm/sparsemem.h                  |    5 +
 arch/mips/kernel/cpu-probe.c                       |    6 +
 arch/mips/kernel/proc.c                            |    1 +
 arch/mips/kernel/scall32-o32.S                     |    4 +-
 arch/mips/kernel/scall64-64.S                      |    4 +-
 arch/mips/kernel/scall64-n32.S                     |   10 +-
 arch/mips/kernel/scall64-o32.S                     |    8 +-
 arch/mips/kernel/setup.c                           |   22 +-
 arch/mips/kernel/smp.c                             |   26 ++-
 arch/mips/loongson/Kconfig                         |    9 +-
 arch/mips/loongson/Platform                        |    2 +-
 arch/mips/loongson/common/env.c                    |   49 +++-
 arch/mips/loongson/common/init.c                   |    4 +
 arch/mips/loongson/common/pm.c                     |    8 +-
 arch/mips/loongson/lemote-2f/clock.c               |    4 +-
 arch/mips/loongson/lemote-2f/reset.c               |    2 +-
 arch/mips/loongson/loongson-3/Makefile             |    4 +-
 arch/mips/loongson/loongson-3/cop2-ex.c            |   63 ++++
 arch/mips/loongson/loongson-3/irq.c                |   26 +-
 arch/mips/loongson/loongson-3/numa.c               |  290 +++++++++++++++
 arch/mips/loongson/loongson-3/smp.c                |  387 +++++++++++++++-----
 arch/mips/loongson/loongson-3/smp.h                |   37 +-
 arch/mips/pci/Makefile                             |    2 +-
 drivers/cpufreq/loongson2_cpufreq.c                |    6 +-
 37 files changed, 997 insertions(+), 158 deletions(-)
 create mode 100644 arch/mips/include/asm/mach-loongson/kernel-entry-init.h
 create mode 100644 arch/mips/include/asm/mach-loongson/mmzone.h
 create mode 100644 arch/mips/include/asm/mach-loongson/topology.h
 create mode 100644 arch/mips/loongson/loongson-3/cop2-ex.c
 create mode 100644 arch/mips/loongson/loongson-3/numa.c
--
1.7.7.3

* [PATCH V2 1/8] MIPS: Support hard limit of cpu count (nr_cpu_ids)
  2014-04-13  0:24 [PATCH V2 0/8] MIPS: Loongson-3: Add NUMA and Loongson-3B support Huacai Chen
@ 2014-04-13  0:24 ` Huacai Chen
  2014-04-14 14:48   ` Andreas Herrmann
  2014-04-13  0:24 ` [PATCH V2 2/8] MIPS: Support CPU topology files in sysfs Huacai Chen
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Huacai Chen @ 2014-04-13  0:24 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: John Crispin, Steven J. Hill, Aurelien Jarno, linux-mips,
	Fuxin Zhang, Zhangjin Wu, Huacai Chen

On MIPS, only the soft limit on the CPU count (maxcpus) currently takes
effect. This patch enables the hard limit (nr_cpus) as well. Processor
cores beyond maxcpus but within nr_cpus can be brought up later via CPU
hotplug. The code is borrowed from x86.
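
The clamping described above can be modelled in plain userspace C. This
is a minimal sketch, not the kernel code: MAX_CPUS stands in for NR_CPUS,
a bool array stands in for the possible-CPU mask, and the function name
clamp_possible is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 16	/* stand-in for NR_CPUS (assumption) */

/*
 * Mark CPUs [0, min(detected, hard_limit)) as possible and clear the
 * rest. Returns the resulting number of possible CPUs, which plays the
 * role of the updated nr_cpu_ids.
 */
static int clamp_possible(bool possible_map[MAX_CPUS], int detected,
			  int hard_limit)
{
	int i, possible = detected;

	assert(detected >= 0 && detected <= MAX_CPUS);
	assert(hard_limit >= 1);

	if (possible > hard_limit)
		possible = hard_limit;

	for (i = 0; i < possible; i++)
		possible_map[i] = true;
	for (; i < MAX_CPUS; i++)
		possible_map[i] = false;

	return possible;
}
```

For example, with 8 detected cores and a hard limit of 4, CPUs 0-3 stay
possible and CPUs 4-7 are cleared, so they cannot be hotplugged later.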

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/kernel/setup.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index a842154..2f01201 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -729,6 +729,25 @@ static void __init resource_init(void)
 	}
 }
 
+#ifdef CONFIG_SMP
+static void __init prefill_possible_map(void)
+{
+	int i, possible = num_possible_cpus();
+
+	if (possible > nr_cpu_ids)
+		possible = nr_cpu_ids;
+
+	for (i = 0; i < possible; i++)
+		set_cpu_possible(i, true);
+	for (; i < NR_CPUS; i++)
+		set_cpu_possible(i, false);
+
+	nr_cpu_ids = possible;
+}
+#else
+static inline void prefill_possible_map(void) {}
+#endif
+
 void __init setup_arch(char **cmdline_p)
 {
 	cpu_probe();
@@ -752,6 +771,7 @@ void __init setup_arch(char **cmdline_p)
 
 	resource_init();
 	plat_smp_setup();
+	prefill_possible_map();
 
 	cpu_cache_init();
 }
-- 
1.7.7.3

* [PATCH V2 2/8] MIPS: Support CPU topology files in sysfs
  2014-04-13  0:24 [PATCH V2 0/8] MIPS: Loongson-3: Add NUMA and Loongson-3B support Huacai Chen
  2014-04-13  0:24 ` [PATCH V2 1/8] MIPS: Support hard limit of cpu count (nr_cpu_ids) Huacai Chen
@ 2014-04-13  0:24 ` Huacai Chen
  2014-04-14 15:04   ` Andreas Herrmann
  2014-04-13  0:24 ` [PATCH V2 3/8] MIPS: Loongson: Modify ChipConfig register definition Huacai Chen
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Huacai Chen @ 2014-04-13  0:24 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: John Crispin, Steven J. Hill, Aurelien Jarno, linux-mips,
	Fuxin Zhang, Zhangjin Wu, Huacai Chen

This patch prepares for Loongson's NUMA support. It provides meaningful
sysfs files such as physical_package_id, core_id, core_siblings and
thread_siblings in /sys/devices/system/cpu/cpu?/topology.
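
From userspace, the files named above can be read with ordinary stdio
calls. The following is a rough sketch under the assumption that the
integer attributes (physical_package_id, core_id) contain a single
decimal number; the helper names topology_path and read_topology_int are
hypothetical and not part of this patch.

```c
#include <assert.h>
#include <stdio.h>

/* Build the sysfs path of one topology attribute; returns snprintf()'s result. */
static int topology_path(char *buf, size_t len, int cpu, const char *attr)
{
	assert(buf != NULL && attr != NULL);
	return snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/topology/%s",
			cpu, attr);
}

/*
 * Read an integer attribute such as core_id for the given CPU.
 * Returns 0 on success and stores the value in *value, or -1 if the
 * file is missing or cannot be parsed.
 */
static int read_topology_int(int cpu, const char *attr, int *value)
{
	char path[128];
	FILE *f;
	int n, ok;

	n = topology_path(path, sizeof(path), cpu, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

The mask attributes (core_siblings, thread_siblings) are not plain
integers, so they would need a different parser than the one sketched
here.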

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/cpu-info.h |    1 +
 arch/mips/include/asm/smp.h      |    6 ++++++
 arch/mips/kernel/proc.c          |    1 +
 arch/mips/kernel/smp.c           |   26 +++++++++++++++++++++++++-
 4 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/arch/mips/include/asm/cpu-info.h b/arch/mips/include/asm/cpu-info.h
index dc2135b..2dfa00b 100644
--- a/arch/mips/include/asm/cpu-info.h
+++ b/arch/mips/include/asm/cpu-info.h
@@ -61,6 +61,7 @@ struct cpuinfo_mips {
 	struct cache_desc	scache; /* Secondary cache */
 	struct cache_desc	tcache; /* Tertiary/split secondary cache */
 	int			srsets; /* Shadow register sets */
+	int			package;/* physical package number */
 	int			core;	/* physical core number */
 #ifdef CONFIG_64BIT
 	int			vmbits; /* Virtual memory size in bits */
diff --git a/arch/mips/include/asm/smp.h b/arch/mips/include/asm/smp.h
index efa02ac..fea4051 100644
--- a/arch/mips/include/asm/smp.h
+++ b/arch/mips/include/asm/smp.h
@@ -22,6 +22,7 @@
 
 extern int smp_num_siblings;
 extern cpumask_t cpu_sibling_map[];
+extern cpumask_t cpu_core_map[];
 
 #define raw_smp_processor_id() (current_thread_info()->cpu)
 
@@ -36,6 +37,11 @@ extern int __cpu_logical_map[NR_CPUS];
 
 #define NO_PROC_ID	(-1)
 
+#define topology_physical_package_id(cpu)	(cpu_data[cpu].package)
+#define topology_core_id(cpu)			(cpu_data[cpu].core)
+#define topology_core_cpumask(cpu)		(&cpu_core_map[cpu])
+#define topology_thread_cpumask(cpu)		(&cpu_sibling_map[cpu])
+
 #define SMP_RESCHEDULE_YOURSELF 0x1	/* XXX braindead */
 #define SMP_CALL_FUNCTION	0x2
 /* Octeon - Tell another core to flush its icache */
diff --git a/arch/mips/kernel/proc.c b/arch/mips/kernel/proc.c
index 037a44d..62c4439 100644
--- a/arch/mips/kernel/proc.c
+++ b/arch/mips/kernel/proc.c
@@ -123,6 +123,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 		      cpu_data[n].srsets);
 	seq_printf(m, "kscratch registers\t: %d\n",
 		      hweight8(cpu_data[n].kscratch_mask));
+	seq_printf(m, "package\t\t\t: %d\n", cpu_data[n].package);
 	seq_printf(m, "core\t\t\t: %d\n", cpu_data[n].core);
 
 	sprintf(fmt, "VCE%%c exceptions\t\t: %s\n",
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 0a022ee..0fa5429 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -63,9 +63,16 @@ EXPORT_SYMBOL(smp_num_siblings);
 cpumask_t cpu_sibling_map[NR_CPUS] __read_mostly;
 EXPORT_SYMBOL(cpu_sibling_map);
 
+/* representing the core map of multi-core chips of each logical CPU */
+cpumask_t cpu_core_map[NR_CPUS] __read_mostly;
+EXPORT_SYMBOL(cpu_core_map);
+
 /* representing cpus for which sibling maps can be computed */
 static cpumask_t cpu_sibling_setup_map;
 
+/* representing cpus for which core maps can be computed */
+static cpumask_t cpu_core_setup_map;
+
 static inline void set_cpu_sibling_map(int cpu)
 {
 	int i;
@@ -74,7 +81,8 @@ static inline void set_cpu_sibling_map(int cpu)
 
 	if (smp_num_siblings > 1) {
 		for_each_cpu_mask(i, cpu_sibling_setup_map) {
-			if (cpu_data[cpu].core == cpu_data[i].core) {
+			if (cpu_data[cpu].package == cpu_data[i].package &&
+				    cpu_data[cpu].core == cpu_data[i].core) {
 				cpu_set(i, cpu_sibling_map[cpu]);
 				cpu_set(cpu, cpu_sibling_map[i]);
 			}
@@ -83,6 +91,20 @@ static inline void set_cpu_sibling_map(int cpu)
 		cpu_set(cpu, cpu_sibling_map[cpu]);
 }
 
+static inline void set_cpu_core_map(int cpu)
+{
+	int i;
+
+	cpu_set(cpu, cpu_core_setup_map);
+
+	for_each_cpu_mask(i, cpu_core_setup_map) {
+		if (cpu_data[cpu].package == cpu_data[i].package) {
+			cpu_set(i, cpu_core_map[cpu]);
+			cpu_set(cpu, cpu_core_map[i]);
+		}
+	}
+}
+
 struct plat_smp_ops *mp_ops;
 EXPORT_SYMBOL(mp_ops);
 
@@ -129,6 +151,7 @@ asmlinkage void start_secondary(void)
 	set_cpu_online(cpu, true);
 
 	set_cpu_sibling_map(cpu);
+	set_cpu_core_map(cpu);
 
 	cpu_set(cpu, cpu_callin_map);
 
@@ -183,6 +206,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	current_thread_info()->cpu = 0;
 	mp_ops->prepare_cpus(max_cpus);
 	set_cpu_sibling_map(0);
+	set_cpu_core_map(0);
 #ifndef CONFIG_HOTPLUG_CPU
 	init_cpu_present(cpu_possible_mask);
 #endif
-- 
1.7.7.3

* [PATCH V2 3/8] MIPS: Loongson: Modify ChipConfig register definition
  2014-04-13  0:24 [PATCH V2 0/8] MIPS: Loongson-3: Add NUMA and Loongson-3B support Huacai Chen
  2014-04-13  0:24 ` [PATCH V2 1/8] MIPS: Support hard limit of cpu count (nr_cpu_ids) Huacai Chen
  2014-04-13  0:24 ` [PATCH V2 2/8] MIPS: Support CPU topology files in sysfs Huacai Chen
@ 2014-04-13  0:24 ` Huacai Chen
  2014-04-13  0:24 ` [PATCH V2 4/8] MIPS: Add NUMA support for Loongson-3 Huacai Chen
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2014-04-13  0:24 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: John Crispin, Steven J. Hill, Aurelien Jarno, linux-mips,
	Fuxin Zhang, Zhangjin Wu, Huacai Chen

This patch prepares for multi-chip interconnection. Since each chip has
its own ChipConfig register, LOONGSON_CHIPCFG should be an array indexed
by chip.
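
The four Loongson-3A addresses assigned in the env.c hunk of this patch
differ only in the bits from bit 44 upwards, so they can also be derived
from the package id. The sketch below shows that arithmetic; it is an
illustration, not a proposed change, and the names CHIPCFG_BASE,
PACKAGE_SHIFT and chipcfg_addr are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PACKAGES	4
#define PACKAGE_SHIFT	44	/* inferred from the literal addresses */
#define CHIPCFG_BASE	UINT64_C(0x900000001fe00180)

/* Address of the ChipConfig register of physical package 'id'. */
static uint64_t chipcfg_addr(unsigned int id)
{
	assert(id < MAX_PACKAGES);
	return CHIPCFG_BASE | ((uint64_t)id << PACKAGE_SHIFT);
}
```

For package 1 this yields 0x900010001fe00180, matching the value written
to loongson_chipcfg[1] in the patch.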

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/mach-loongson/loongson.h |    7 +++++--
 arch/mips/loongson/common/env.c                |   11 +++++++++++
 arch/mips/loongson/common/pm.c                 |    8 ++++----
 arch/mips/loongson/lemote-2f/clock.c           |    4 ++--
 arch/mips/loongson/lemote-2f/reset.c           |    2 +-
 arch/mips/loongson/loongson-3/smp.c            |    4 ++--
 drivers/cpufreq/loongson2_cpufreq.c            |    6 +++---
 7 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/arch/mips/include/asm/mach-loongson/loongson.h b/arch/mips/include/asm/mach-loongson/loongson.h
index f3fd1eb..a1c76ca 100644
--- a/arch/mips/include/asm/mach-loongson/loongson.h
+++ b/arch/mips/include/asm/mach-loongson/loongson.h
@@ -249,8 +249,11 @@ static inline void do_perfcnt_IRQ(void)
 #define LOONGSON_PXARB_CFG		LOONGSON_REG(LOONGSON_REGBASE + 0x68)
 #define LOONGSON_PXARB_STATUS		LOONGSON_REG(LOONGSON_REGBASE + 0x6c)
 
-/* Chip Config */
-#define LOONGSON_CHIPCFG0		LOONGSON_REG(LOONGSON_REGBASE + 0x80)
+#define MAX_PACKAGES 4
+
+/* Chip Config register of each physical cpu package, PRid >= Loongson-2F */
+extern u64 loongson_chipcfg[MAX_PACKAGES];
+#define LOONGSON_CHIPCFG(id) (*(volatile u32 *)(loongson_chipcfg[id]))
 
 /* pcimap */
 
diff --git a/arch/mips/loongson/common/env.c b/arch/mips/loongson/common/env.c
index 0c543ea..dc59241 100644
--- a/arch/mips/loongson/common/env.c
+++ b/arch/mips/loongson/common/env.c
@@ -27,6 +27,8 @@ EXPORT_SYMBOL(cpu_clock_freq);
 struct efi_memory_map_loongson *loongson_memmap;
 struct loongson_system_configuration loongson_sysconf;
 
+u64 loongson_chipcfg[MAX_PACKAGES] = {0xffffffffbfc00180};
+
 #define parse_even_earlier(res, option, p)				\
 do {									\
 	unsigned int tmp __maybe_unused;				\
@@ -77,6 +79,15 @@ void __init prom_init_env(void)
 
 	cpu_clock_freq = ecpu->cpu_clock_freq;
 	loongson_sysconf.cputype = ecpu->cputype;
+	if (ecpu->cputype == Loongson_3A) {
+		loongson_chipcfg[0] = 0x900000001fe00180;
+		loongson_chipcfg[1] = 0x900010001fe00180;
+		loongson_chipcfg[2] = 0x900020001fe00180;
+		loongson_chipcfg[3] = 0x900030001fe00180;
+	} else {
+		loongson_chipcfg[0] = 0x900000001fe00180;
+	}
+
 	loongson_sysconf.nr_cpus = ecpu->nr_cpus;
 	if (ecpu->nr_cpus > NR_CPUS || ecpu->nr_cpus == 0)
 		loongson_sysconf.nr_cpus = NR_CPUS;
diff --git a/arch/mips/loongson/common/pm.c b/arch/mips/loongson/common/pm.c
index f55e07a..a6b67cc 100644
--- a/arch/mips/loongson/common/pm.c
+++ b/arch/mips/loongson/common/pm.c
@@ -79,7 +79,7 @@ int __weak wakeup_loongson(void)
 static void wait_for_wakeup_events(void)
 {
 	while (!wakeup_loongson())
-		LOONGSON_CHIPCFG0 &= ~0x7;
+		LOONGSON_CHIPCFG(0) &= ~0x7;
 }
 
 /*
@@ -102,15 +102,15 @@ static void loongson_suspend_enter(void)
 
 	stop_perf_counters();
 
-	cached_cpu_freq = LOONGSON_CHIPCFG0;
+	cached_cpu_freq = LOONGSON_CHIPCFG(0);
 
 	/* Put CPU into wait mode */
-	LOONGSON_CHIPCFG0 &= ~0x7;
+	LOONGSON_CHIPCFG(0) &= ~0x7;
 
 	/* wait for the given events to wakeup cpu from wait mode */
 	wait_for_wakeup_events();
 
-	LOONGSON_CHIPCFG0 = cached_cpu_freq;
+	LOONGSON_CHIPCFG(0) = cached_cpu_freq;
 	mmiowb();
 }
 
diff --git a/arch/mips/loongson/lemote-2f/clock.c b/arch/mips/loongson/lemote-2f/clock.c
index 7d8c9cc..2e2067b 100644
--- a/arch/mips/loongson/lemote-2f/clock.c
+++ b/arch/mips/loongson/lemote-2f/clock.c
@@ -120,10 +120,10 @@ int clk_set_rate(struct clk *clk, unsigned long rate)
 
 	clk->rate = rate;
 
-	regval = LOONGSON_CHIPCFG0;
+	regval = LOONGSON_CHIPCFG(0);
 	regval = (regval & ~0x7) |
 		(loongson2_clockmod_table[i].driver_data - 1);
-	LOONGSON_CHIPCFG0 = regval;
+	LOONGSON_CHIPCFG(0) = regval;
 
 	return ret;
 }
diff --git a/arch/mips/loongson/lemote-2f/reset.c b/arch/mips/loongson/lemote-2f/reset.c
index 90962a3..79ac694 100644
--- a/arch/mips/loongson/lemote-2f/reset.c
+++ b/arch/mips/loongson/lemote-2f/reset.c
@@ -28,7 +28,7 @@ static void reset_cpu(void)
 	 * reset cpu to full speed, this is needed when enabling cpu frequency
 	 * scalling
 	 */
-	LOONGSON_CHIPCFG0 |= 0x7;
+	LOONGSON_CHIPCFG(0) |= 0x7;
 }
 
 /* reset support for fuloong2f */
diff --git a/arch/mips/loongson/loongson-3/smp.c b/arch/mips/loongson/loongson-3/smp.c
index c665fe1..1d120d3 100644
--- a/arch/mips/loongson/loongson-3/smp.c
+++ b/arch/mips/loongson/loongson-3/smp.c
@@ -406,12 +406,12 @@ static int loongson3_cpu_callback(struct notifier_block *nfb,
 	case CPU_POST_DEAD:
 	case CPU_POST_DEAD_FROZEN:
 		pr_info("Disable clock for CPU#%d\n", cpu);
-		LOONGSON_CHIPCFG0 &= ~(1 << (12 + cpu));
+		LOONGSON_CHIPCFG(0) &= ~(1 << (12 + cpu));
 		break;
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
 		pr_info("Enable clock for CPU#%d\n", cpu);
-		LOONGSON_CHIPCFG0 |= 1 << (12 + cpu);
+		LOONGSON_CHIPCFG(0) |= 1 << (12 + cpu);
 		break;
 	}
 
diff --git a/drivers/cpufreq/loongson2_cpufreq.c b/drivers/cpufreq/loongson2_cpufreq.c
index fe891ed..82a7f7f 100644
--- a/drivers/cpufreq/loongson2_cpufreq.c
+++ b/drivers/cpufreq/loongson2_cpufreq.c
@@ -148,9 +148,9 @@ static void loongson2_cpu_wait(void)
 	u32 cpu_freq;
 
 	spin_lock_irqsave(&loongson2_wait_lock, flags);
-	cpu_freq = LOONGSON_CHIPCFG0;
-	LOONGSON_CHIPCFG0 &= ~0x7;	/* Put CPU into wait mode */
-	LOONGSON_CHIPCFG0 = cpu_freq;	/* Restore CPU state */
+	cpu_freq = LOONGSON_CHIPCFG(0);
+	LOONGSON_CHIPCFG(0) &= ~0x7;	/* Put CPU into wait mode */
+	LOONGSON_CHIPCFG(0) = cpu_freq;	/* Restore CPU state */
 	spin_unlock_irqrestore(&loongson2_wait_lock, flags);
 	local_irq_enable();
 }
-- 
1.7.7.3

* [PATCH V2 4/8] MIPS: Add NUMA support for Loongson-3
  2014-04-13  0:24 [PATCH V2 0/8] MIPS: Loongson-3: Add NUMA and Loongson-3B support Huacai Chen
                   ` (2 preceding siblings ...)
  2014-04-13  0:24 ` [PATCH V2 3/8] MIPS: Loongson: Modify ChipConfig register definition Huacai Chen
@ 2014-04-13  0:24 ` Huacai Chen
  2014-06-03 22:47   ` Ralf Baechle
  2014-04-13  0:24 ` [PATCH V2 5/8] MIPS: Add numa api support Huacai Chen
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Huacai Chen @ 2014-04-13  0:24 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: John Crispin, Steven J. Hill, Aurelien Jarno, linux-mips,
	Fuxin Zhang, Zhangjin Wu, Huacai Chen

Multiple Loongson-3A chips can be interconnected via the HT0 bus. This is
a CC-NUMA system in which every chip (node) has its own local memory and
cache coherency is maintained by hardware. The 64-bit physical memory
address format is as follows:

0x-0000-YZZZ-ZZZZ-ZZZZ

The high 16 bits must be 0, which means the real physical address
supported by Loongson-3 is 48 bits wide. The "Y" bits are the base
address of each node and can also be regarded as the node id. The "Z"
bits are the address offset within a node, which means every node has a
44-bit address space.
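
The address format above can be illustrated with a small userspace
sketch that splits a physical address into node id and in-node offset
and puts it back together. The shift of 44 matches NODE_ADDRSPACE_SHIFT
in this patch; the function names pa_node, pa_offset and make_pa are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NODE_ADDRSPACE_SHIFT	44
#define NODE_OFFSET_MASK	((UINT64_C(1) << NODE_ADDRSPACE_SHIFT) - 1)

/* Node id: the "Y" nibble, bits 47..44 of the physical address. */
static unsigned int pa_node(uint64_t pa)
{
	assert((pa >> 48) == 0);	/* high 16 bits must be zero */
	return (unsigned int)(pa >> NODE_ADDRSPACE_SHIFT) & 0xf;
}

/* Offset within the node: the "Z" bits, bits 43..0. */
static uint64_t pa_offset(uint64_t pa)
{
	return pa & NODE_OFFSET_MASK;
}

/* Compose a physical address from a node id and an in-node offset. */
static uint64_t make_pa(unsigned int node, uint64_t offset)
{
	assert(node <= 0xf && offset <= NODE_OFFSET_MASK);
	return ((uint64_t)node << NODE_ADDRSPACE_SHIFT) | offset;
}
```

A 44-bit offset gives each node a 16 TiB window, so the address
0x100080000000 decodes to node 1, offset 0x80000000.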

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/Kconfig                                  |    7 +-
 arch/mips/include/asm/addrspace.h                  |    6 +
 arch/mips/include/asm/mach-loongson/boot_param.h   |    3 +
 .../include/asm/mach-loongson/kernel-entry-init.h  |   51 ++++
 arch/mips/include/asm/mach-loongson/mmzone.h       |   51 ++++
 arch/mips/include/asm/mach-loongson/topology.h     |   23 ++
 arch/mips/include/asm/sparsemem.h                  |    5 +
 arch/mips/kernel/setup.c                           |    2 +-
 arch/mips/loongson/Kconfig                         |    1 +
 arch/mips/loongson/common/env.c                    |    7 +
 arch/mips/loongson/common/init.c                   |    4 +
 arch/mips/loongson/loongson-3/Makefile             |    2 +
 arch/mips/loongson/loongson-3/numa.c               |  290 ++++++++++++++++++++
 arch/mips/loongson/loongson-3/smp.c                |    8 +-
 14 files changed, 456 insertions(+), 4 deletions(-)
 create mode 100644 arch/mips/include/asm/mach-loongson/kernel-entry-init.h
 create mode 100644 arch/mips/include/asm/mach-loongson/mmzone.h
 create mode 100644 arch/mips/include/asm/mach-loongson/topology.h
 create mode 100644 arch/mips/loongson/loongson-3/numa.c

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 5cd695f..ffac45b 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2233,9 +2233,14 @@ config SYS_SUPPORTS_NUMA
 	bool
 
 config NODES_SHIFT
-	int
+	int "Maximum Number of NUMA Nodes Shift"
+	range 1 10
 	default "6"
 	depends on NEED_MULTIPLE_NODES
+	help
+	  This option specifies the maximum number of available NUMA nodes
+	  on the target system. MAX_NUMNODES will be 2^(This value).
+	  If in doubt, use the default.
 
 config HW_PERF_EVENTS
 	bool "Enable hardware performance counter support for perf events"
diff --git a/arch/mips/include/asm/addrspace.h b/arch/mips/include/asm/addrspace.h
index 3f74545..091b317 100644
--- a/arch/mips/include/asm/addrspace.h
+++ b/arch/mips/include/asm/addrspace.h
@@ -51,8 +51,14 @@
  * Returns the physical address of a CKSEGx / XKPHYS address
  */
 #define CPHYSADDR(a)		((_ACAST32_(a)) & 0x1fffffff)
+
+#ifndef CONFIG_NUMA
 #define XPHYSADDR(a)		((_ACAST64_(a)) &			\
 				 _CONST64_(0x000000ffffffffff))
+#else
+#define XPHYSADDR(a)		((_ACAST64_(a)) &			\
+				 _CONST64_(0x0000ffffffffffff))
+#endif
 
 #ifdef CONFIG_64BIT
 
diff --git a/arch/mips/include/asm/mach-loongson/boot_param.h b/arch/mips/include/asm/mach-loongson/boot_param.h
index 829a7ec..8b06c96 100644
--- a/arch/mips/include/asm/mach-loongson/boot_param.h
+++ b/arch/mips/include/asm/mach-loongson/boot_param.h
@@ -146,6 +146,9 @@ struct boot_params {
 
 struct loongson_system_configuration {
 	u32 nr_cpus;
+	u32 nr_nodes;
+	int cores_per_node;
+	int cores_per_package;
 	enum loongson_cpu_type cputype;
 	u64 ht_control_base;
 	u64 pci_mem_start_addr;
diff --git a/arch/mips/include/asm/mach-loongson/kernel-entry-init.h b/arch/mips/include/asm/mach-loongson/kernel-entry-init.h
new file mode 100644
index 0000000..d7abef5
--- /dev/null
+++ b/arch/mips/include/asm/mach-loongson/kernel-entry-init.h
@@ -0,0 +1,51 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2005 Embedded Alley Solutions, Inc
+ * Copyright (C) 2005 Ralf Baechle (ralf@linux-mips.org)
+ * Copyright (C) 2009 Jiajie Chen (chenjiajie@cse.buaa.edu.cn)
+ */
+#ifndef __ASM_MACH_LOONGSON_KERNEL_ENTRY_H
+#define __ASM_MACH_LOONGSON_KERNEL_ENTRY_H
+
+/*
+ * Override macros used in arch/mips/kernel/head.S.
+ */
+	.macro	kernel_entry_setup
+#ifdef CONFIG_NUMA
+	.set	push
+	.set	mips64
+	/* Set LPA on LOONGSON3 config3 */
+	mfc0	t0, $16, 3
+	or	t0, (0x1 << 7)
+	mtc0	t0, $16, 3
+	/* Set ELPA on LOONGSON3 pagegrain */
+	li	t0, (0x1 << 29)
+	mtc0	t0, $5, 1
+	_ehb
+	.set	pop
+#endif
+	.endm
+
+/*
+ * Do SMP slave processor setup.
+ */
+	.macro	smp_slave_setup
+#ifdef CONFIG_NUMA
+	.set	push
+	.set	mips64
+	/* Set LPA on LOONGSON3 config3 */
+	mfc0	t0, $16, 3
+	or	t0, (0x1 << 7)
+	mtc0	t0, $16, 3
+	/* Set ELPA on LOONGSON3 pagegrain */
+	li	t0, (0x1 << 29)
+	mtc0	t0, $5, 1
+	_ehb
+	.set	pop
+#endif
+	.endm
+
+#endif /* __ASM_MACH_LOONGSON_KERNEL_ENTRY_H */
diff --git a/arch/mips/include/asm/mach-loongson/mmzone.h b/arch/mips/include/asm/mach-loongson/mmzone.h
new file mode 100644
index 0000000..be9bad4
--- /dev/null
+++ b/arch/mips/include/asm/mach-loongson/mmzone.h
@@ -0,0 +1,51 @@
+/*
+ * Copyright (C) 2010 Loongson Inc. & Insititute of Computing Technology
+ * Author:  Gao Xiang, gaoxiang@ict.ac.cn
+ *          Meng Xiaofu, Zhang Shuangshuang
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+#ifndef _ASM_MACH_MMZONE_H
+#define _ASM_MACH_MMZONE_H
+
+#include <boot_param.h>
+#define NODE_ADDRSPACE_SHIFT 44
+#define NODE0_ADDRSPACE_OFFSET 0x000000000000UL
+#define NODE1_ADDRSPACE_OFFSET 0x100000000000UL
+#define NODE2_ADDRSPACE_OFFSET 0x200000000000UL
+#define NODE3_ADDRSPACE_OFFSET 0x300000000000UL
+
+#define pa_to_nid(addr)  (((addr) & 0xf00000000000) >> NODE_ADDRSPACE_SHIFT)
+
+#define LEVELS_PER_SLICE 128
+
+struct slice_data {
+	unsigned long irq_enable_mask[2];
+	int level_to_irq[LEVELS_PER_SLICE];
+};
+
+struct hub_data {
+	cpumask_t	h_cpus;
+	unsigned long slice_map;
+	unsigned long irq_alloc_mask[2];
+	struct slice_data slice[2];
+};
+
+struct node_data {
+	struct pglist_data pglist;
+	struct hub_data hub;
+	cpumask_t cpumask;
+};
+
+extern struct node_data *__node_data[];
+
+#define NODE_DATA(n)		(&__node_data[(n)]->pglist)
+#define hub_data(n)		(&__node_data[(n)]->hub)
+
+extern void setup_zero_pages(void);
+extern void __init prom_init_numa_memory(void);
+
+#endif /* _ASM_MACH_MMZONE_H */
diff --git a/arch/mips/include/asm/mach-loongson/topology.h b/arch/mips/include/asm/mach-loongson/topology.h
new file mode 100644
index 0000000..5598ba7
--- /dev/null
+++ b/arch/mips/include/asm/mach-loongson/topology.h
@@ -0,0 +1,23 @@
+#ifndef _ASM_MACH_TOPOLOGY_H
+#define _ASM_MACH_TOPOLOGY_H
+
+#ifdef CONFIG_NUMA
+
+#define cpu_to_node(cpu)	((cpu) >> 2)
+#define parent_node(node)	(node)
+#define cpumask_of_node(node)	(&__node_data[(node)]->cpumask)
+
+struct pci_bus;
+extern int pcibus_to_node(struct pci_bus *);
+
+#define cpumask_of_pcibus(bus)	(cpu_online_mask)
+
+extern unsigned char __node_distances[MAX_NUMNODES][MAX_NUMNODES];
+
+#define node_distance(from, to)	(__node_distances[(from)][(to)])
+
+#endif
+
+#include <asm-generic/topology.h>
+
+#endif /* _ASM_MACH_TOPOLOGY_H */
diff --git a/arch/mips/include/asm/sparsemem.h b/arch/mips/include/asm/sparsemem.h
index d2da53c..c001a90 100644
--- a/arch/mips/include/asm/sparsemem.h
+++ b/arch/mips/include/asm/sparsemem.h
@@ -11,7 +11,12 @@
 #else
 # define SECTION_SIZE_BITS	28
 #endif
+
+#ifdef CONFIG_NUMA
+#define MAX_PHYSMEM_BITS	48
+#else
 #define MAX_PHYSMEM_BITS	35
+#endif
 
 #endif /* CONFIG_SPARSEMEM */
 #endif /* _MIPS_SPARSEMEM_H */
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 2f01201..7c1fe2b 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -282,7 +282,7 @@ static unsigned long __init init_initrd(void)
  * Initialize the bootmem allocator. It also setup initrd related data
  * if needed.
  */
-#ifdef CONFIG_SGI_IP27
+#if defined(CONFIG_SGI_IP27) || (defined(CONFIG_CPU_LOONGSON3) && defined(CONFIG_NUMA))
 
 static void __init bootmem_init(void)
 {
diff --git a/arch/mips/loongson/Kconfig b/arch/mips/loongson/Kconfig
index 603d79a..976c858 100644
--- a/arch/mips/loongson/Kconfig
+++ b/arch/mips/loongson/Kconfig
@@ -79,6 +79,7 @@ config LEMOTE_MACH3A
 	select SYS_HAS_EARLY_PRINTK
 	select SYS_SUPPORTS_SMP
 	select SYS_SUPPORTS_HOTPLUG_CPU
+	select SYS_SUPPORTS_NUMA
 	select SYS_SUPPORTS_64BIT_KERNEL
 	select SYS_SUPPORTS_HIGHMEM
 	select SYS_SUPPORTS_LITTLE_ENDIAN
diff --git a/arch/mips/loongson/common/env.c b/arch/mips/loongson/common/env.c
index dc59241..33a13b9 100644
--- a/arch/mips/loongson/common/env.c
+++ b/arch/mips/loongson/common/env.c
@@ -80,17 +80,24 @@ void __init prom_init_env(void)
 	cpu_clock_freq = ecpu->cpu_clock_freq;
 	loongson_sysconf.cputype = ecpu->cputype;
 	if (ecpu->cputype == Loongson_3A) {
+		loongson_sysconf.cores_per_node = 4;
+		loongson_sysconf.cores_per_package = 4;
 		loongson_chipcfg[0] = 0x900000001fe00180;
 		loongson_chipcfg[1] = 0x900010001fe00180;
 		loongson_chipcfg[2] = 0x900020001fe00180;
 		loongson_chipcfg[3] = 0x900030001fe00180;
 	} else {
+		loongson_sysconf.cores_per_node = 1;
+		loongson_sysconf.cores_per_package = 1;
 		loongson_chipcfg[0] = 0x900000001fe00180;
 	}
 
 	loongson_sysconf.nr_cpus = ecpu->nr_cpus;
 	if (ecpu->nr_cpus > NR_CPUS || ecpu->nr_cpus == 0)
 		loongson_sysconf.nr_cpus = NR_CPUS;
+	loongson_sysconf.nr_nodes = (loongson_sysconf.nr_cpus +
+		loongson_sysconf.cores_per_node - 1) /
+		loongson_sysconf.cores_per_node;
 
 	loongson_sysconf.pci_mem_start_addr = eirq_source->pci_mem_start_addr;
 	loongson_sysconf.pci_mem_end_addr = eirq_source->pci_mem_end_addr;
diff --git a/arch/mips/loongson/common/init.c b/arch/mips/loongson/common/init.c
index f37fe54..f6af3ab 100644
--- a/arch/mips/loongson/common/init.c
+++ b/arch/mips/loongson/common/init.c
@@ -30,7 +30,11 @@ void __init prom_init(void)
 	set_io_port_base((unsigned long)
 		ioremap(LOONGSON_PCIIO_BASE, LOONGSON_PCIIO_SIZE));
 
+#ifdef CONFIG_NUMA
+	prom_init_numa_memory();
+#else
 	prom_init_memory();
+#endif
 
 	/*init the uart base address */
 	prom_init_uart_base();
diff --git a/arch/mips/loongson/loongson-3/Makefile b/arch/mips/loongson/loongson-3/Makefile
index 70152b2..471b0f2a 100644
--- a/arch/mips/loongson/loongson-3/Makefile
+++ b/arch/mips/loongson/loongson-3/Makefile
@@ -4,3 +4,5 @@
 obj-y			+= irq.o
 
 obj-$(CONFIG_SMP)	+= smp.o
+
+obj-$(CONFIG_NUMA)	+= numa.o
diff --git a/arch/mips/loongson/loongson-3/numa.c b/arch/mips/loongson/loongson-3/numa.c
new file mode 100644
index 0000000..d667db1
--- /dev/null
+++ b/arch/mips/loongson/loongson-3/numa.c
@@ -0,0 +1,290 @@
+/*
+ * Copyright (C) 2010 Loongson Inc. & Insititute of Computing Technology
+ * Author:  Gao Xiang, gaoxiang@ict.ac.cn
+ *          Meng Xiaofu, Zhang Shuangshuang
+ *          Chen Huacai, chenhc@lemote.com
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/module.h>
+#include <linux/nodemask.h>
+#include <linux/swap.h>
+#include <linux/memblock.h>
+#include <linux/bootmem.h>
+#include <linux/pfn.h>
+#include <linux/highmem.h>
+#include <asm/page.h>
+#include <asm/pgalloc.h>
+#include <asm/sections.h>
+#include <linux/bootmem.h>
+#include <linux/init.h>
+#include <linux/irq.h>
+#include <asm/bootinfo.h>
+#include <asm/mc146818-time.h>
+#include <asm/time.h>
+#include <asm/wbflush.h>
+#include <boot_param.h>
+
+static struct node_data prealloc__node_data[MAX_NUMNODES];
+unsigned char __node_distances[MAX_NUMNODES][MAX_NUMNODES];
+struct node_data *__node_data[MAX_NUMNODES];
+EXPORT_SYMBOL(__node_data);
+
+static void enable_lpa(void)
+{
+	unsigned long value;
+
+	value = __read_32bit_c0_register($16, 3);
+	value |= 0x00000080;
+	__write_32bit_c0_register($16, 3, value);
+	value = __read_32bit_c0_register($16, 3);
+	pr_info("CP0_Config3: CP0 16.3 (0x%lx)\n", value);
+
+	value = __read_32bit_c0_register($5, 1);
+	value |= 0x20000000;
+	__write_32bit_c0_register($5, 1, value);
+	value = __read_32bit_c0_register($5, 1);
+	pr_info("CP0_PageGrain: CP0 5.1 (0x%lx)\n", value);
+}
+
+static void cpu_node_probe(void)
+{
+	int i;
+
+	nodes_clear(node_possible_map);
+	nodes_clear(node_online_map);
+	for (i = 0; i < loongson_sysconf.nr_nodes; i++) {
+		node_set_state(num_online_nodes(), N_POSSIBLE);
+		node_set_online(num_online_nodes());
+	}
+
+	pr_info("NUMA: Discovered %d cpus on %d nodes\n",
+		loongson_sysconf.nr_cpus, num_online_nodes());
+}
+
+static int __init compute_node_distance(int row, int col)
+{
+	int package_row = row * loongson_sysconf.cores_per_node /
+				loongson_sysconf.cores_per_package;
+	int package_col = col * loongson_sysconf.cores_per_node /
+				loongson_sysconf.cores_per_package;
+
+	if (col == row)
+		return 0;
+	else if (package_row == package_col)
+		return 40;
+	else
+		return 100;
+}
+
+static void __init init_topology_matrix(void)
+{
+	int row, col;
+
+	for (row = 0; row < MAX_NUMNODES; row++)
+		for (col = 0; col < MAX_NUMNODES; col++)
+			__node_distances[row][col] = -1;
+
+	for_each_online_node(row) {
+		for_each_online_node(col) {
+			__node_distances[row][col] =
+				compute_node_distance(row, col);
+		}
+	}
+}
+
+static unsigned long nid_to_addroffset(unsigned int nid)
+{
+	unsigned long result;
+	switch (nid) {
+	case 0:
+	default:
+		result = NODE0_ADDRSPACE_OFFSET;
+		break;
+	case 1:
+		result = NODE1_ADDRSPACE_OFFSET;
+		break;
+	case 2:
+		result = NODE2_ADDRSPACE_OFFSET;
+		break;
+	case 3:
+		result = NODE3_ADDRSPACE_OFFSET;
+		break;
+	}
+	return result;
+}
+
+static void __init szmem(unsigned int node)
+{
+	u32 i, mem_type;
+	static unsigned long num_physpages = 0;
+	u64 node_id, node_psize, start_pfn, end_pfn, mem_start, mem_size;
+
+	/* Parse memory information and activate */
+	for (i = 0; i < loongson_memmap->nr_map; i++) {
+		node_id = loongson_memmap->map[i].node_id;
+		if (node_id != node)
+			continue;
+
+		mem_type = loongson_memmap->map[i].mem_type;
+		mem_size = loongson_memmap->map[i].mem_size;
+		mem_start = loongson_memmap->map[i].mem_start;
+
+		switch (mem_type) {
+		case SYSTEM_RAM_LOW:
+			start_pfn = ((node_id << 44) + mem_start) >> PAGE_SHIFT;
+			node_psize = (mem_size << 20) >> PAGE_SHIFT;
+			end_pfn  = start_pfn + node_psize;
+			num_physpages += node_psize;
+			pr_info("Node%d: mem_type:%d, mem_start:0x%llx, mem_size:0x%llx MB\n",
+				(u32)node_id, mem_type, mem_start, mem_size);
+			pr_info("       start_pfn:0x%llx, end_pfn:0x%llx, num_physpages:0x%lx\n",
+				start_pfn, end_pfn, num_physpages);
+			add_memory_region((node_id << 44) + mem_start,
+				(u64)mem_size << 20, BOOT_MEM_RAM);
+			memblock_add_node(PFN_PHYS(start_pfn),
+				PFN_PHYS(end_pfn - start_pfn), node);
+			break;
+		case SYSTEM_RAM_HIGH:
+			start_pfn = ((node_id << 44) + mem_start) >> PAGE_SHIFT;
+			node_psize = (mem_size << 20) >> PAGE_SHIFT;
+			end_pfn  = start_pfn + node_psize;
+			num_physpages += node_psize;
+			pr_info("Node%d: mem_type:%d, mem_start:0x%llx, mem_size:0x%llx MB\n",
+				(u32)node_id, mem_type, mem_start, mem_size);
+			pr_info("       start_pfn:0x%llx, end_pfn:0x%llx, num_physpages:0x%lx\n",
+				start_pfn, end_pfn, num_physpages);
+			add_memory_region((node_id << 44) + mem_start,
+				(u64)mem_size << 20, BOOT_MEM_RAM);
+			memblock_add_node(PFN_PHYS(start_pfn),
+				PFN_PHYS(end_pfn - start_pfn), node);
+			break;
+		case MEM_RESERVED:
+			pr_info("Node%d: mem_type:%d, mem_start:0x%llx, mem_size:0x%llx MB\n",
+				(u32)node_id, mem_type, mem_start, mem_size);
+			add_memory_region((node_id << 44) + mem_start,
+				(u64)mem_size << 20, BOOT_MEM_RESERVED);
+			memblock_reserve(((node_id << 44) + mem_start),
+				mem_size << 20);
+			break;
+		}
+	}
+}
+
+static void __init node_mem_init(unsigned int node)
+{
+	unsigned long bootmap_size;
+	unsigned long node_addrspace_offset;
+	unsigned long start_pfn, end_pfn, freepfn;
+
+	node_addrspace_offset = nid_to_addroffset(node);
+	pr_info("Node%d's addrspace_offset is 0x%lx\n",
+			node, node_addrspace_offset);
+
+	get_pfn_range_for_nid(node, &start_pfn, &end_pfn);
+	freepfn = start_pfn;
+	if (node == 0)
+		freepfn = PFN_UP(__pa_symbol(&_end)); /* kernel end address */
+	pr_info("Node%d: start_pfn=0x%lx, end_pfn=0x%lx, freepfn=0x%lx\n",
+		node, start_pfn, end_pfn, freepfn);
+
+	__node_data[node] = prealloc__node_data + node;
+
+	NODE_DATA(node)->bdata = &bootmem_node_data[node];
+	NODE_DATA(node)->node_start_pfn = start_pfn;
+	NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
+
+	bootmap_size = init_bootmem_node(NODE_DATA(node), freepfn,
+					start_pfn, end_pfn);
+	free_bootmem_with_active_regions(node, end_pfn);
+	if (node == 0) /* used by finalize_initrd() */
+		max_low_pfn = end_pfn;
+
+	/* This is reserved for the kernel and bdata->node_bootmem_map */
+	reserve_bootmem_node(NODE_DATA(node), start_pfn << PAGE_SHIFT,
+		((freepfn - start_pfn) << PAGE_SHIFT) + bootmap_size,
+		BOOTMEM_DEFAULT);
+
+	if (node == 0 && node_end_pfn(0) >= (0xffffffff >> PAGE_SHIFT)) {
+		/* Reserve 0xff800000~0xffffffff for RS780E integrated GPU */
+		reserve_bootmem_node(NODE_DATA(node),
+				(node_addrspace_offset | 0xff800000),
+				8 << 20, BOOTMEM_DEFAULT);
+	}
+
+	sparse_memory_present_with_active_regions(node);
+}
+
+static __init void prom_meminit(void)
+{
+	unsigned int node, cpu;
+
+	cpu_node_probe();
+	init_topology_matrix();
+
+	for (node = 0; node < loongson_sysconf.nr_nodes; node++) {
+		if (node_online(node)) {
+			szmem(node);
+			node_mem_init(node);
+			cpus_clear(__node_data[(node)]->cpumask);
+		}
+	}
+	for (cpu = 0; cpu < loongson_sysconf.nr_cpus; cpu++) {
+		node = cpu / loongson_sysconf.cores_per_node;
+		if (node >= num_online_nodes())
+			node = 0;
+		pr_info("NUMA: set cpumask cpu %d on node %d\n", cpu, node);
+		cpu_set(cpu, __node_data[(node)]->cpumask);
+	}
+}
+
+void __init paging_init(void)
+{
+	unsigned node;
+	unsigned long zones_size[MAX_NR_ZONES] = {0, };
+
+	pagetable_init();
+
+	for_each_online_node(node) {
+		unsigned long  start_pfn, end_pfn;
+
+		get_pfn_range_for_nid(node, &start_pfn, &end_pfn);
+
+		if (end_pfn > max_low_pfn)
+			max_low_pfn = end_pfn;
+	}
+#ifdef CONFIG_ZONE_DMA32
+	zones_size[ZONE_DMA32] = MAX_DMA32_PFN;
+#endif
+	zones_size[ZONE_NORMAL] = max_low_pfn;
+	free_area_init_nodes(zones_size);
+}
+
+void __init mem_init(void)
+{
+	high_memory = (void *) __va(get_num_physpages() << PAGE_SHIFT);
+	free_all_bootmem();
+	setup_zero_pages();	/* This comes from node 0 */
+	mem_init_print_info(NULL);
+}
+
+/* All PCI devices belong to logical Node-0 */
+int pcibus_to_node(struct pci_bus *bus)
+{
+	return 0;
+}
+EXPORT_SYMBOL(pcibus_to_node);
+
+void __init prom_init_numa_memory(void)
+{
+	enable_lpa();
+	prom_meminit();
+}
+EXPORT_SYMBOL(prom_init_numa_memory);
diff --git a/arch/mips/loongson/loongson-3/smp.c b/arch/mips/loongson/loongson-3/smp.c
index 1d120d3..f99122c 100644
--- a/arch/mips/loongson/loongson-3/smp.c
+++ b/arch/mips/loongson/loongson-3/smp.c
@@ -203,6 +203,8 @@ static void loongson3_init_secondary(void)
 	for (i = 0; i < loongson_sysconf.nr_cpus; i++)
 		loongson3_ipi_write32(0xffffffff, ipi_en0_regs[i]);
 
+	cpu_data[cpu].package = cpu / loongson_sysconf.cores_per_package;
+	cpu_data[cpu].core = cpu % loongson_sysconf.cores_per_package;
 	per_cpu(cpu_state, cpu) = CPU_ONLINE;
 
 	i = 0;
@@ -401,17 +403,19 @@ static int loongson3_cpu_callback(struct notifier_block *nfb,
 	unsigned long action, void *hcpu)
 {
 	unsigned int cpu = (unsigned long)hcpu;
+	uint64_t core_id = cpu_data[cpu].core;
+	uint64_t package_id = cpu_data[cpu].package;
 
 	switch (action) {
 	case CPU_POST_DEAD:
 	case CPU_POST_DEAD_FROZEN:
 		pr_info("Disable clock for CPU#%d\n", cpu);
-		LOONGSON_CHIPCFG(0) &= ~(1 << (12 + cpu));
+		LOONGSON_CHIPCFG(package_id) &= ~(1 << (12 + core_id));
 		break;
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
 		pr_info("Enable clock for CPU#%d\n", cpu);
-		LOONGSON_CHIPCFG(0) |= 1 << (12 + cpu);
+		LOONGSON_CHIPCFG(package_id) |= 1 << (12 + core_id);
 		break;
 	}
 
-- 
1.7.7.3


* [PATCH V2 5/8] MIPS: Add numa api support
  2014-04-13  0:24 [PATCH V2 0/8] MIPS: Loongson-3: Add NUMA and Loongson-3B support Huacai Chen
                   ` (3 preceding siblings ...)
  2014-04-13  0:24 ` [PATCH V2 4/8] MIPS: Add NUMA support for Loongson-3 Huacai Chen
@ 2014-04-13  0:24 ` Huacai Chen
  2014-04-13  0:24 ` [PATCH V2 6/8] MIPS: Add Loongson-3B support Huacai Chen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2014-04-13  0:24 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: John Crispin, Steven J. Hill, Aurelien Jarno, linux-mips,
	Fuxin Zhang, Zhangjin Wu, Huacai Chen

Enable sys_mbind()/sys_get_mempolicy()/sys_set_mempolicy() for the O32, N32
and N64 ABIs. O32/N32 tasks on a 64-bit kernel should also use the compat
versions of sys_migrate_pages()/sys_move_pages(), so fix that as well.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/kernel/scall32-o32.S |    4 ++--
 arch/mips/kernel/scall64-64.S  |    4 ++--
 arch/mips/kernel/scall64-n32.S |   10 +++++-----
 arch/mips/kernel/scall64-o32.S |    8 ++++----
 4 files changed, 13 insertions(+), 13 deletions(-)
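
Note on why the compat entry points matter: a 32-bit (O32/N32) task passes
its nodemask as an array of 32-bit words, while a 64-bit kernel reads node
masks as 64-bit words, so the words have to be repacked before the native
code can use them. The sketch below is a user-space illustration of that
repacking only. The helper name widen_compat_bitmap() is hypothetical and
is not the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): repack a bitmap stored as
 * 32-bit words into 64-bit words. Word 2*i supplies the low half and
 * word 2*i+1 the high half of destination word i.
 */
static void widen_compat_bitmap(uint64_t *dst, const uint32_t *src,
				size_t nbits)
{
	size_t nwords32 = (nbits + 31) / 32;	/* source words in use */
	size_t nwords64 = (nbits + 63) / 64;	/* destination words */
	size_t i;

	for (i = 0; i < nwords64; i++) {
		uint64_t lo = src[2 * i];
		/* The last destination word may only have a low half. */
		uint64_t hi = (2 * i + 1 < nwords32) ? src[2 * i + 1] : 0;

		dst[i] = lo | (hi << 32);
	}
}
```

Handing the 32-bit array straight to a native handler would read two
adjacent 32-bit words as one 64-bit word without this repacking, which is
the mismatch the compat_sys_* table entries avoid.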

diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index fdc70b4..7f7e2fb 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -495,8 +495,8 @@ EXPORT(sys_call_table)
 	PTR	sys_tgkill
 	PTR	sys_utimes
 	PTR	sys_mbind
-	PTR	sys_ni_syscall			/* sys_get_mempolicy */
-	PTR	sys_ni_syscall			/* 4270 sys_set_mempolicy */
+	PTR	sys_get_mempolicy
+	PTR	sys_set_mempolicy		/* 4270 */
 	PTR	sys_mq_open
 	PTR	sys_mq_unlink
 	PTR	sys_mq_timedsend
diff --git a/arch/mips/kernel/scall64-64.S b/arch/mips/kernel/scall64-64.S
index dd99c328..a4baf06 100644
--- a/arch/mips/kernel/scall64-64.S
+++ b/arch/mips/kernel/scall64-64.S
@@ -347,8 +347,8 @@ EXPORT(sys_call_table)
 	PTR	sys_tgkill			/* 5225 */
 	PTR	sys_utimes
 	PTR	sys_mbind
-	PTR	sys_ni_syscall			/* sys_get_mempolicy */
-	PTR	sys_ni_syscall			/* sys_set_mempolicy */
+	PTR	sys_get_mempolicy
+	PTR	sys_set_mempolicy
 	PTR	sys_mq_open			/* 5230 */
 	PTR	sys_mq_unlink
 	PTR	sys_mq_timedsend
diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
index f68d2f4..6811d35 100644
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -339,9 +339,9 @@ EXPORT(sysn32_call_table)
 	PTR	compat_sys_clock_nanosleep
 	PTR	sys_tgkill
 	PTR	compat_sys_utimes		/* 6230 */
-	PTR	sys_ni_syscall			/* sys_mbind */
-	PTR	sys_ni_syscall			/* sys_get_mempolicy */
-	PTR	sys_ni_syscall			/* sys_set_mempolicy */
+	PTR	compat_sys_mbind
+	PTR	compat_sys_get_mempolicy
+	PTR	compat_sys_set_mempolicy
 	PTR	compat_sys_mq_open
 	PTR	sys_mq_unlink			/* 6235 */
 	PTR	compat_sys_mq_timedsend
@@ -358,7 +358,7 @@ EXPORT(sysn32_call_table)
 	PTR	sys_inotify_init
 	PTR	sys_inotify_add_watch
 	PTR	sys_inotify_rm_watch
-	PTR	sys_migrate_pages		/* 6250 */
+	PTR	compat_sys_migrate_pages	/* 6250 */
 	PTR	sys_openat
 	PTR	sys_mkdirat
 	PTR	sys_mknodat
@@ -379,7 +379,7 @@ EXPORT(sysn32_call_table)
 	PTR	sys_sync_file_range
 	PTR	sys_tee
 	PTR	compat_sys_vmsplice		/* 6270 */
-	PTR	sys_move_pages
+	PTR	compat_sys_move_pages
 	PTR	compat_sys_set_robust_list
 	PTR	compat_sys_get_robust_list
 	PTR	compat_sys_kexec_load
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index 70f6ace..221abd1 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -473,9 +473,9 @@ EXPORT(sys32_call_table)
 	PTR	compat_sys_clock_nanosleep	/* 4265 */
 	PTR	sys_tgkill
 	PTR	compat_sys_utimes
-	PTR	sys_ni_syscall			/* sys_mbind */
-	PTR	sys_ni_syscall			/* sys_get_mempolicy */
-	PTR	sys_ni_syscall			/* 4270 sys_set_mempolicy */
+	PTR	compat_sys_mbind
+	PTR	compat_sys_get_mempolicy
+	PTR	compat_sys_set_mempolicy	/* 4270 */
 	PTR	compat_sys_mq_open
 	PTR	sys_mq_unlink
 	PTR	compat_sys_mq_timedsend
@@ -492,7 +492,7 @@ EXPORT(sys32_call_table)
 	PTR	sys_inotify_init
 	PTR	sys_inotify_add_watch		/* 4285 */
 	PTR	sys_inotify_rm_watch
-	PTR	sys_migrate_pages
+	PTR	compat_sys_migrate_pages
 	PTR	compat_sys_openat
 	PTR	sys_mkdirat
 	PTR	sys_mknodat			/* 4290 */
-- 
1.7.7.3


* [PATCH V2 6/8] MIPS: Add Loongson-3B support
  2014-04-13  0:24 [PATCH V2 0/8] MIPS: Loongson-3: Add NUMA and Loongson-3B support Huacai Chen
                   ` (4 preceding siblings ...)
  2014-04-13  0:24 ` [PATCH V2 5/8] MIPS: Add numa api support Huacai Chen
@ 2014-04-13  0:24 ` Huacai Chen
  2014-04-13  0:24 ` [PATCH V2 7/8] MIPS: Loongson-3: Enable the COP2 usage Huacai Chen
  2014-04-13  0:24 ` [PATCH V2 8/8] MIPS: Loongson: Rename CONFIG_LEMOTE_MACH3A to CONFIG_LOONGSON_MACH3X Huacai Chen
  7 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2014-04-13  0:24 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: John Crispin, Steven J. Hill, Aurelien Jarno, linux-mips,
	Fuxin Zhang, Zhangjin Wu, Huacai Chen

Loongson-3B is an 8-core processor. In general it looks like two
Loongson-3A chips integrated in one package: the 8 cores are split into
two groups (two NUMA nodes), and each node has its own local memory.

Of course there are some differences between one Loongson-3B and two
Loongson-3A chips. E.g., the base addresses of the IPI registers of each
node are not the same; Loongson-3A uses the ChipConfig register to
enable/disable the clock, but Loongson-3B uses the FreqControl register
instead.

There are two revisions of Loongson-3B. The first is called
Loongson-3B1000; it runs at 1GHz and has PRid 0x6306. The second is called
Loongson-3B1500; it runs at 1.5GHz and has PRid 0x6307. Both revisions
have a bug that prevents the clock from being disabled at runtime, but
this will be fixed in the future.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/cpu.h                      |    2 +
 arch/mips/include/asm/mach-loongson/boot_param.h |    1 +
 arch/mips/include/asm/mach-loongson/loongson.h   |    4 +
 arch/mips/kernel/cpu-probe.c                     |    6 +
 arch/mips/loongson/common/env.c                  |   31 ++-
 arch/mips/loongson/loongson-3/irq.c              |   26 ++-
 arch/mips/loongson/loongson-3/smp.c              |  387 ++++++++++++++++------
 arch/mips/loongson/loongson-3/smp.h              |   37 +-
 8 files changed, 372 insertions(+), 122 deletions(-)
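
Note on the topology arithmetic: the series derives a CPU's node, package
and core from its logical number using cores_per_node and
cores_per_package (4/4 on Loongson-3A, 4/8 on Loongson-3B). The sketch
below restates that arithmetic as stand-alone C. struct cpu_pos,
locate_cpu() and clock_enable_bit() are hypothetical names used only for
illustration; the bit position follows the ChipConfig hotplug path
(1 << (12 + core_id)) from earlier in the series.

```c
#include <stdint.h>

/* Hypothetical container for the position of one logical CPU. */
struct cpu_pos {
	unsigned int node;	/* NUMA node */
	unsigned int package;	/* physical chip */
	unsigned int core;	/* core index within the package */
};

/* Same divisions and modulo the series applies to the logical CPU id. */
static struct cpu_pos locate_cpu(unsigned int cpu,
				 unsigned int cores_per_node,
				 unsigned int cores_per_package)
{
	struct cpu_pos pos;

	pos.node = cpu / cores_per_node;
	pos.package = cpu / cores_per_package;
	pos.core = cpu % cores_per_package;
	return pos;
}

/* Per-core clock bit as used with ChipConfig: bit 12 + core. */
static uint32_t clock_enable_bit(unsigned int core)
{
	return (uint32_t)1 << (12 + core);
}
```

With the Loongson-3A values (4, 4), CPU 5 lands on node 1, package 1,
core 1. With the Loongson-3B values (4, 8), the same CPU is on node 1 but
package 0, core 5, which is why node and package are tracked separately.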

diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
index 530eb8b..630a4c6 100644
--- a/arch/mips/include/asm/cpu.h
+++ b/arch/mips/include/asm/cpu.h
@@ -232,6 +232,8 @@
 #define PRID_REV_LOONGSON2E	0x0002
 #define PRID_REV_LOONGSON2F	0x0003
 #define PRID_REV_LOONGSON3A	0x0005
+#define PRID_REV_LOONGSON3B_R1	0x0006
+#define PRID_REV_LOONGSON3B_R2	0x0007
 
 /*
  * Older processors used to encode processor version and revision in two
diff --git a/arch/mips/include/asm/mach-loongson/boot_param.h b/arch/mips/include/asm/mach-loongson/boot_param.h
index 8b06c96..3388fc5 100644
--- a/arch/mips/include/asm/mach-loongson/boot_param.h
+++ b/arch/mips/include/asm/mach-loongson/boot_param.h
@@ -163,4 +163,5 @@ struct loongson_system_configuration {
 
 extern struct efi_memory_map_loongson *loongson_memmap;
 extern struct loongson_system_configuration loongson_sysconf;
+extern int cpuhotplug_workaround;
 #endif
diff --git a/arch/mips/include/asm/mach-loongson/loongson.h b/arch/mips/include/asm/mach-loongson/loongson.h
index a1c76ca..92bf76c 100644
--- a/arch/mips/include/asm/mach-loongson/loongson.h
+++ b/arch/mips/include/asm/mach-loongson/loongson.h
@@ -255,6 +255,10 @@ static inline void do_perfcnt_IRQ(void)
 extern u64 loongson_chipcfg[MAX_PACKAGES];
 #define LOONGSON_CHIPCFG(id) (*(volatile u32 *)(loongson_chipcfg[id]))
 
+/* Freq Control register of each physical cpu package, PRid >= Loongson-3B */
+extern u64 loongson_freqctrl[MAX_PACKAGES];
+#define LOONGSON_FREQCTRL(id) (*(volatile u32 *)(loongson_freqctrl[id]))
+
 /* pcimap */
 
 #define LOONGSON_PCIMAP_PCIMAP_LO0	0x0000003f
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 6e8fb85..585f996 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -755,6 +755,12 @@ static inline void cpu_probe_legacy(struct cpuinfo_mips *c, unsigned int cpu)
 			__cpu_name[cpu] = "ICT Loongson-3";
 			set_elf_platform(cpu, "loongson3a");
 			break;
+		case PRID_REV_LOONGSON3B_R1:
+		case PRID_REV_LOONGSON3B_R2:
+			c->cputype = CPU_LOONGSON3;
+			__cpu_name[cpu] = "ICT Loongson-3";
+			set_elf_platform(cpu, "loongson3b");
+			break;
 		}
 
 		set_isa(c, MIPS_CPU_ISA_III);
diff --git a/arch/mips/loongson/common/env.c b/arch/mips/loongson/common/env.c
index 33a13b9..f152285 100644
--- a/arch/mips/loongson/common/env.c
+++ b/arch/mips/loongson/common/env.c
@@ -28,6 +28,10 @@ struct efi_memory_map_loongson *loongson_memmap;
 struct loongson_system_configuration loongson_sysconf;
 
 u64 loongson_chipcfg[MAX_PACKAGES] = {0xffffffffbfc00180};
+u64 loongson_freqctrl[MAX_PACKAGES];
+
+unsigned long long smp_group[4];
+int cpuhotplug_workaround = 0;
 
 #define parse_even_earlier(res, option, p)				\
 do {									\
@@ -82,10 +86,32 @@ void __init prom_init_env(void)
 	if (ecpu->cputype == Loongson_3A) {
 		loongson_sysconf.cores_per_node = 4;
 		loongson_sysconf.cores_per_package = 4;
+		smp_group[0] = 0x900000003ff01000;
+		smp_group[1] = 0x900010003ff01000;
+		smp_group[2] = 0x900020003ff01000;
+		smp_group[3] = 0x900030003ff01000;
 		loongson_chipcfg[0] = 0x900000001fe00180;
 		loongson_chipcfg[1] = 0x900010001fe00180;
 		loongson_chipcfg[2] = 0x900020001fe00180;
 		loongson_chipcfg[3] = 0x900030001fe00180;
+		loongson_sysconf.ht_control_base = 0x90000EFDFB000000;
+	} else if (ecpu->cputype == Loongson_3B) {
+		loongson_sysconf.cores_per_node = 4; /* One chip has 2 nodes */
+		loongson_sysconf.cores_per_package = 8;
+		smp_group[0] = 0x900000003ff01000;
+		smp_group[1] = 0x900010003ff05000;
+		smp_group[2] = 0x900020003ff09000;
+		smp_group[3] = 0x900030003ff0d000;
+		loongson_chipcfg[0] = 0x900000001fe00180;
+		loongson_chipcfg[1] = 0x900020001fe00180;
+		loongson_chipcfg[2] = 0x900040001fe00180;
+		loongson_chipcfg[3] = 0x900060001fe00180;
+		loongson_freqctrl[0] = 0x900000001fe001d0;
+		loongson_freqctrl[1] = 0x900020001fe001d0;
+		loongson_freqctrl[2] = 0x900040001fe001d0;
+		loongson_freqctrl[3] = 0x900060001fe001d0;
+		loongson_sysconf.ht_control_base = 0x90001EFDFB000000;
+		cpuhotplug_workaround = 1;
 	} else {
 		loongson_sysconf.cores_per_node = 1;
 		loongson_sysconf.cores_per_package = 1;
@@ -111,7 +137,6 @@ void __init prom_init_env(void)
 	loongson_sysconf.poweroff_addr = boot_p->reset_system.Shutdown;
 	loongson_sysconf.suspend_addr = boot_p->reset_system.DoSuspend;
 
-	loongson_sysconf.ht_control_base = 0x90000EFDFB000000;
 	loongson_sysconf.vgabios_addr = boot_p->efi.smbios.vga_bios;
 	pr_debug("Shutdown Addr: %llx, Restart Addr: %llx, VBIOS Addr: %llx\n",
 		loongson_sysconf.poweroff_addr, loongson_sysconf.restart_addr,
@@ -129,6 +154,10 @@ void __init prom_init_env(void)
 		case PRID_REV_LOONGSON3A:
 			cpu_clock_freq = 900000000;
 			break;
+		case PRID_REV_LOONGSON3B_R1:
+		case PRID_REV_LOONGSON3B_R2:
+			cpu_clock_freq = 1000000000;
+			break;
 		default:
 			cpu_clock_freq = 100000000;
 			break;
diff --git a/arch/mips/loongson/loongson-3/irq.c b/arch/mips/loongson/loongson-3/irq.c
index f240828..ca1c62a 100644
--- a/arch/mips/loongson/loongson-3/irq.c
+++ b/arch/mips/loongson/loongson-3/irq.c
@@ -7,6 +7,8 @@
 #include <asm/i8259.h>
 #include <asm/mipsregs.h>
 
+#include "smp.h"
+
 unsigned int ht_irq[] = {1, 3, 4, 5, 6, 7, 8, 12, 14, 15};
 
 static void ht_irqdispatch(void)
@@ -53,9 +55,15 @@ static inline void mask_loongson_irq(struct irq_data *d)
 	/* Workaround: UART IRQ may deliver to any core */
 	if (d->irq == LOONGSON_UART_IRQ) {
 		int cpu = smp_processor_id();
-
-		LOONGSON_INT_ROUTER_INTENCLR = 1 << 10;
-		LOONGSON_INT_ROUTER_LPC = 0x10 + (1<<cpu);
+		int node_id = cpu / loongson_sysconf.cores_per_node;
+		int core_id = cpu % loongson_sysconf.cores_per_node;
+		u64 intenclr_addr = smp_group[node_id] |
+			(u64)(&LOONGSON_INT_ROUTER_INTENCLR);
+		u64 introuter_lpc_addr = smp_group[node_id] |
+			(u64)(&LOONGSON_INT_ROUTER_LPC);
+
+		*(volatile u32 *)intenclr_addr = 1 << 10;
+		*(volatile u8 *)introuter_lpc_addr = 0x10 + (1<<core_id);
 	}
 }
 
@@ -64,9 +72,15 @@ static inline void unmask_loongson_irq(struct irq_data *d)
 	/* Workaround: UART IRQ may deliver to any core */
 	if (d->irq == LOONGSON_UART_IRQ) {
 		int cpu = smp_processor_id();
-
-		LOONGSON_INT_ROUTER_INTENSET = 1 << 10;
-		LOONGSON_INT_ROUTER_LPC = 0x10 + (1<<cpu);
+		int node_id = cpu / loongson_sysconf.cores_per_node;
+		int core_id = cpu % loongson_sysconf.cores_per_node;
+		u64 intenset_addr = smp_group[node_id] |
+			(u64)(&LOONGSON_INT_ROUTER_INTENSET);
+		u64 introuter_lpc_addr = smp_group[node_id] |
+			(u64)(&LOONGSON_INT_ROUTER_LPC);
+
+		*(volatile u32 *)intenset_addr = 1 << 10;
+		*(volatile u8 *)introuter_lpc_addr = 0x10 + (1<<core_id);
 	}
 
 	set_c0_status(0x100 << (d->irq - MIPS_CPU_IRQ_BASE));
diff --git a/arch/mips/loongson/loongson-3/smp.c b/arch/mips/loongson/loongson-3/smp.c
index f99122c..8089f5f 100644
--- a/arch/mips/loongson/loongson-3/smp.c
+++ b/arch/mips/loongson/loongson-3/smp.c
@@ -31,6 +31,12 @@
 DEFINE_PER_CPU(int, cpu_state);
 DEFINE_PER_CPU(uint32_t, core0_c0count);
 
+static void *ipi_set0_regs[16];
+static void *ipi_clear0_regs[16];
+static void *ipi_status0_regs[16];
+static void *ipi_en0_regs[16];
+static void *ipi_mailbox_buf[16];
+
 /* read a 32bit value from ipi register */
 #define loongson3_ipi_read32(addr) readl(addr)
 /* read a 64bit value from ipi register */
@@ -48,100 +54,185 @@ DEFINE_PER_CPU(uint32_t, core0_c0count);
 		__wbflush();			\
 	} while (0)
 
-static void *ipi_set0_regs[] = {
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE0_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE1_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE2_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE3_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE0_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE1_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE2_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE3_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE0_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE1_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE2_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE3_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE0_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE1_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE2_OFFSET + SET0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE3_OFFSET + SET0),
-};
+static void ipi_set0_regs_init(void)
+{
+	ipi_set0_regs[0] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE0_OFFSET + SET0);
+	ipi_set0_regs[1] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE1_OFFSET + SET0);
+	ipi_set0_regs[2] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE2_OFFSET + SET0);
+	ipi_set0_regs[3] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE3_OFFSET + SET0);
+	ipi_set0_regs[4] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE0_OFFSET + SET0);
+	ipi_set0_regs[5] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE1_OFFSET + SET0);
+	ipi_set0_regs[6] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE2_OFFSET + SET0);
+	ipi_set0_regs[7] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE3_OFFSET + SET0);
+	ipi_set0_regs[8] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE0_OFFSET + SET0);
+	ipi_set0_regs[9] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE1_OFFSET + SET0);
+	ipi_set0_regs[10] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE2_OFFSET + SET0);
+	ipi_set0_regs[11] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE3_OFFSET + SET0);
+	ipi_set0_regs[12] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE0_OFFSET + SET0);
+	ipi_set0_regs[13] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE1_OFFSET + SET0);
+	ipi_set0_regs[14] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE2_OFFSET + SET0);
+	ipi_set0_regs[15] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE3_OFFSET + SET0);
+}
 
-static void *ipi_clear0_regs[] = {
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE0_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE1_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE2_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE3_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE0_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE1_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE2_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE3_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE0_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE1_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE2_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE3_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE0_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE1_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE2_OFFSET + CLEAR0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE3_OFFSET + CLEAR0),
-};
+static void ipi_clear0_regs_init(void)
+{
+	ipi_clear0_regs[0] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE0_OFFSET + CLEAR0);
+	ipi_clear0_regs[1] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE1_OFFSET + CLEAR0);
+	ipi_clear0_regs[2] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE2_OFFSET + CLEAR0);
+	ipi_clear0_regs[3] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE3_OFFSET + CLEAR0);
+	ipi_clear0_regs[4] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE0_OFFSET + CLEAR0);
+	ipi_clear0_regs[5] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE1_OFFSET + CLEAR0);
+	ipi_clear0_regs[6] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE2_OFFSET + CLEAR0);
+	ipi_clear0_regs[7] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE3_OFFSET + CLEAR0);
+	ipi_clear0_regs[8] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE0_OFFSET + CLEAR0);
+	ipi_clear0_regs[9] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE1_OFFSET + CLEAR0);
+	ipi_clear0_regs[10] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE2_OFFSET + CLEAR0);
+	ipi_clear0_regs[11] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE3_OFFSET + CLEAR0);
+	ipi_clear0_regs[12] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE0_OFFSET + CLEAR0);
+	ipi_clear0_regs[13] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE1_OFFSET + CLEAR0);
+	ipi_clear0_regs[14] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE2_OFFSET + CLEAR0);
+	ipi_clear0_regs[15] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE3_OFFSET + CLEAR0);
+}
 
-static void *ipi_status0_regs[] = {
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE0_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE1_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE2_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE3_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE0_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE1_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE2_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE3_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE0_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE1_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE2_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE3_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE0_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE1_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE2_OFFSET + STATUS0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE3_OFFSET + STATUS0),
-};
+static void ipi_status0_regs_init(void)
+{
+	ipi_status0_regs[0] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE0_OFFSET + STATUS0);
+	ipi_status0_regs[1] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE1_OFFSET + STATUS0);
+	ipi_status0_regs[2] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE2_OFFSET + STATUS0);
+	ipi_status0_regs[3] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE3_OFFSET + STATUS0);
+	ipi_status0_regs[4] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE0_OFFSET + STATUS0);
+	ipi_status0_regs[5] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE1_OFFSET + STATUS0);
+	ipi_status0_regs[6] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE2_OFFSET + STATUS0);
+	ipi_status0_regs[7] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE3_OFFSET + STATUS0);
+	ipi_status0_regs[8] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE0_OFFSET + STATUS0);
+	ipi_status0_regs[9] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE1_OFFSET + STATUS0);
+	ipi_status0_regs[10] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE2_OFFSET + STATUS0);
+	ipi_status0_regs[11] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE3_OFFSET + STATUS0);
+	ipi_status0_regs[12] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE0_OFFSET + STATUS0);
+	ipi_status0_regs[13] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE1_OFFSET + STATUS0);
+	ipi_status0_regs[14] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE2_OFFSET + STATUS0);
+	ipi_status0_regs[15] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE3_OFFSET + STATUS0);
+}
 
-static void *ipi_en0_regs[] = {
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE0_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE1_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE2_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE3_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE0_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE1_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE2_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE3_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE0_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE1_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE2_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE3_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE0_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE1_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE2_OFFSET + EN0),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE3_OFFSET + EN0),
-};
+static void ipi_en0_regs_init(void)
+{
+	ipi_en0_regs[0] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE0_OFFSET + EN0);
+	ipi_en0_regs[1] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE1_OFFSET + EN0);
+	ipi_en0_regs[2] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE2_OFFSET + EN0);
+	ipi_en0_regs[3] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE3_OFFSET + EN0);
+	ipi_en0_regs[4] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE0_OFFSET + EN0);
+	ipi_en0_regs[5] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE1_OFFSET + EN0);
+	ipi_en0_regs[6] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE2_OFFSET + EN0);
+	ipi_en0_regs[7] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE3_OFFSET + EN0);
+	ipi_en0_regs[8] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE0_OFFSET + EN0);
+	ipi_en0_regs[9] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE1_OFFSET + EN0);
+	ipi_en0_regs[10] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE2_OFFSET + EN0);
+	ipi_en0_regs[11] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE3_OFFSET + EN0);
+	ipi_en0_regs[12] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE0_OFFSET + EN0);
+	ipi_en0_regs[13] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE1_OFFSET + EN0);
+	ipi_en0_regs[14] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE2_OFFSET + EN0);
+	ipi_en0_regs[15] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE3_OFFSET + EN0);
+}
 
-static void *ipi_mailbox_buf[] = {
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE0_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE1_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE2_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP0_BASE + SMP_CORE3_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE0_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE1_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE2_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP1_BASE + SMP_CORE3_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE0_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE1_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE2_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP2_BASE + SMP_CORE3_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE0_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE1_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE2_OFFSET + BUF),
-	(void *)(SMP_CORE_GROUP3_BASE + SMP_CORE3_OFFSET + BUF),
-};
+static void ipi_mailbox_buf_init(void)
+{
+	ipi_mailbox_buf[0] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE0_OFFSET + BUF);
+	ipi_mailbox_buf[1] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE1_OFFSET + BUF);
+	ipi_mailbox_buf[2] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE2_OFFSET + BUF);
+	ipi_mailbox_buf[3] = (void *)
+		(SMP_CORE_GROUP0_BASE + SMP_CORE3_OFFSET + BUF);
+	ipi_mailbox_buf[4] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE0_OFFSET + BUF);
+	ipi_mailbox_buf[5] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE1_OFFSET + BUF);
+	ipi_mailbox_buf[6] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE2_OFFSET + BUF);
+	ipi_mailbox_buf[7] = (void *)
+		(SMP_CORE_GROUP1_BASE + SMP_CORE3_OFFSET + BUF);
+	ipi_mailbox_buf[8] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE0_OFFSET + BUF);
+	ipi_mailbox_buf[9] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE1_OFFSET + BUF);
+	ipi_mailbox_buf[10] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE2_OFFSET + BUF);
+	ipi_mailbox_buf[11] = (void *)
+		(SMP_CORE_GROUP2_BASE + SMP_CORE3_OFFSET + BUF);
+	ipi_mailbox_buf[12] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE0_OFFSET + BUF);
+	ipi_mailbox_buf[13] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE1_OFFSET + BUF);
+	ipi_mailbox_buf[14] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE2_OFFSET + BUF);
+	ipi_mailbox_buf[15] = (void *)
+		(SMP_CORE_GROUP3_BASE + SMP_CORE3_OFFSET + BUF);
+}
 
 /*
  * Simple enough, just poke the appropriate ipi register
@@ -248,6 +339,11 @@ static void __init loongson3_smp_setup(void)
 		__cpu_number_map[i] = ++num;
 		__cpu_logical_map[num] = i;
 	}
+	ipi_set0_regs_init();
+	ipi_clear0_regs_init();
+	ipi_status0_regs_init();
+	ipi_en0_regs_init();
+	ipi_mailbox_buf_init();
 	pr_info("Detected %i available secondary CPU(s)\n", num);
 }
 
@@ -322,7 +418,7 @@ static void loongson3_cpu_die(unsigned int cpu)
  * flush all L1 entries at first. Then, another core (usually Core 0) can
  * safely disable the clock of the target core. loongson3_play_dead() is
  * called via CKSEG1 (uncached and unmmaped) */
-static void loongson3_play_dead(int *state_addr)
+static void loongson3a_play_dead(int *state_addr)
 {
 	register int val;
 	register long cpuid, core, node, count;
@@ -384,6 +480,70 @@ static void loongson3_play_dead(int *state_addr)
 		: "a1");
 }
 
+static void loongson3b_play_dead(int *state_addr)
+{
+	register int val;
+	register long cpuid, core, node, count;
+	register void *addr, *base, *initfunc;
+
+	__asm__ __volatile__(
+		"   .set push                     \n"
+		"   .set noreorder                \n"
+		"   li %[addr], 0x80000000        \n" /* KSEG0 */
+		"1: cache 0, 0(%[addr])           \n" /* flush L1 ICache */
+		"   cache 0, 1(%[addr])           \n"
+		"   cache 0, 2(%[addr])           \n"
+		"   cache 0, 3(%[addr])           \n"
+		"   cache 1, 0(%[addr])           \n" /* flush L1 DCache */
+		"   cache 1, 1(%[addr])           \n"
+		"   cache 1, 2(%[addr])           \n"
+		"   cache 1, 3(%[addr])           \n"
+		"   addiu %[sets], %[sets], -1    \n"
+		"   bnez  %[sets], 1b             \n"
+		"   addiu %[addr], %[addr], 0x20  \n"
+		"   li    %[val], 0x7             \n" /* *state_addr = CPU_DEAD; */
+		"   sw    %[val], (%[state_addr]) \n"
+		"   sync                          \n"
+		"   cache 21, (%[state_addr])     \n" /* flush entry of *state_addr */
+		"   .set pop                      \n"
+		: [addr] "=&r" (addr), [val] "=&r" (val)
+		: [state_addr] "r" (state_addr),
+		  [sets] "r" (cpu_data[smp_processor_id()].dcache.sets));
+
+	__asm__ __volatile__(
+		"   .set push                         \n"
+		"   .set noreorder                    \n"
+		"   .set mips64                       \n"
+		"   mfc0  %[cpuid], $15, 1            \n"
+		"   andi  %[cpuid], 0x3ff             \n"
+		"   dli   %[base], 0x900000003ff01000 \n"
+		"   andi  %[core], %[cpuid], 0x3      \n"
+		"   sll   %[core], 8                  \n" /* get core id */
+		"   or    %[base], %[base], %[core]   \n"
+		"   andi  %[node], %[cpuid], 0xc      \n"
+		"   dsll  %[node], 42                 \n" /* get node id */
+		"   or    %[base], %[base], %[node]   \n"
+		"   dsrl  %[node], 30                 \n" /* 15:14 */
+		"   or    %[base], %[base], %[node]   \n"
+		"1: li    %[count], 0x100             \n" /* wait for init loop */
+		"2: bnez  %[count], 2b                \n" /* limit mailbox access */
+		"   addiu %[count], -1                \n"
+		"   ld    %[initfunc], 0x20(%[base])  \n" /* get PC via mailbox */
+		"   beqz  %[initfunc], 1b             \n"
+		"   nop                               \n"
+		"   ld    $sp, 0x28(%[base])          \n" /* get SP via mailbox */
+		"   ld    $gp, 0x30(%[base])          \n" /* get GP via mailbox */
+		"   ld    $a1, 0x38(%[base])          \n"
+		"   jr    %[initfunc]                 \n" /* jump to initial PC */
+		"   nop                               \n"
+		"   .set pop                          \n"
+		: [core] "=&r" (core), [node] "=&r" (node),
+		  [base] "=&r" (base), [cpuid] "=&r" (cpuid),
+		  [count] "=&r" (count), [initfunc] "=&r" (initfunc)
+		: /* No Input */
+		: "a1");
+}
+
 void play_dead(void)
 {
 	int *state_addr;
@@ -391,31 +551,64 @@ void play_dead(void)
 	void (*play_dead_at_ckseg1)(int *);
 
 	idle_task_exit();
-	play_dead_at_ckseg1 =
-		(void *)CKSEG1ADDR((unsigned long)loongson3_play_dead);
+	switch (loongson_sysconf.cputype) {
+	case Loongson_3A:
+	default:
+		play_dead_at_ckseg1 =
+			(void *)CKSEG1ADDR((unsigned long)loongson3a_play_dead);
+		break;
+	case Loongson_3B:
+		play_dead_at_ckseg1 =
+			(void *)CKSEG1ADDR((unsigned long)loongson3b_play_dead);
+		break;
+	}
 	state_addr = &per_cpu(cpu_state, cpu);
 	mb();
 	play_dead_at_ckseg1(state_addr);
 }
 
+void loongson3_disable_clock(int cpu)
+{
+	uint64_t core_id = cpu_data[cpu].core;
+	uint64_t package_id = cpu_data[cpu].package;
+
+	if (loongson_sysconf.cputype == Loongson_3A) {
+		LOONGSON_CHIPCFG(package_id) &= ~(1 << (12 + core_id));
+	} else if (loongson_sysconf.cputype == Loongson_3B) {
+		if (!cpuhotplug_workaround)
+			LOONGSON_FREQCTRL(package_id) &= ~(1 << (core_id * 4 + 3));
+	}
+}
+
+void loongson3_enable_clock(int cpu)
+{
+	uint64_t core_id = cpu_data[cpu].core;
+	uint64_t package_id = cpu_data[cpu].package;
+
+	if (loongson_sysconf.cputype == Loongson_3A) {
+		LOONGSON_CHIPCFG(package_id) |= 1 << (12 + core_id);
+	} else if (loongson_sysconf.cputype == Loongson_3B) {
+		if (!cpuhotplug_workaround)
+			LOONGSON_FREQCTRL(package_id) |= 1 << (core_id * 4 + 3);
+	}
+}
+
 #define CPU_POST_DEAD_FROZEN	(CPU_POST_DEAD | CPU_TASKS_FROZEN)
 static int loongson3_cpu_callback(struct notifier_block *nfb,
 	unsigned long action, void *hcpu)
 {
 	unsigned int cpu = (unsigned long)hcpu;
-	uint64_t core_id = cpu_data[cpu].core;
-	uint64_t package_id = cpu_data[cpu].package;
 
 	switch (action) {
 	case CPU_POST_DEAD:
 	case CPU_POST_DEAD_FROZEN:
 		pr_info("Disable clock for CPU#%d\n", cpu);
-		LOONGSON_CHIPCFG(package_id) &= ~(1 << (12 + core_id));
+		loongson3_disable_clock(cpu);
 		break;
 	case CPU_UP_PREPARE:
 	case CPU_UP_PREPARE_FROZEN:
 		pr_info("Enable clock for CPU#%d\n", cpu);
-		LOONGSON_CHIPCFG(package_id) |= 1 << (12 + core_id);
+		loongson3_enable_clock(cpu);
 		break;
 	}
 
diff --git a/arch/mips/loongson/loongson-3/smp.h b/arch/mips/loongson/loongson-3/smp.h
index 3453e8c..d98ff65 100644
--- a/arch/mips/loongson/loongson-3/smp.h
+++ b/arch/mips/loongson/loongson-3/smp.h
@@ -1,29 +1,30 @@
 #ifndef __LOONGSON_SMP_H_
 #define __LOONGSON_SMP_H_
 
-/* for Loongson-3A smp support */
+/* for Loongson-3 smp support */
+extern unsigned long long smp_group[4];
 
 /* 4 groups(nodes) in maximum in numa case */
-#define  SMP_CORE_GROUP0_BASE    0x900000003ff01000
-#define  SMP_CORE_GROUP1_BASE    0x900010003ff01000
-#define  SMP_CORE_GROUP2_BASE    0x900020003ff01000
-#define  SMP_CORE_GROUP3_BASE    0x900030003ff01000
+#define SMP_CORE_GROUP0_BASE	(smp_group[0])
+#define SMP_CORE_GROUP1_BASE	(smp_group[1])
+#define SMP_CORE_GROUP2_BASE	(smp_group[2])
+#define SMP_CORE_GROUP3_BASE	(smp_group[3])
 
 /* 4 cores in each group(node) */
-#define  SMP_CORE0_OFFSET  0x000
-#define  SMP_CORE1_OFFSET  0x100
-#define  SMP_CORE2_OFFSET  0x200
-#define  SMP_CORE3_OFFSET  0x300
+#define SMP_CORE0_OFFSET  0x000
+#define SMP_CORE1_OFFSET  0x100
+#define SMP_CORE2_OFFSET  0x200
+#define SMP_CORE3_OFFSET  0x300
 
 /* ipi registers offsets */
-#define  STATUS0  0x00
-#define  EN0      0x04
-#define  SET0     0x08
-#define  CLEAR0   0x0c
-#define  STATUS1  0x10
-#define  MASK1    0x14
-#define  SET1     0x18
-#define  CLEAR1   0x1c
-#define  BUF      0x20
+#define STATUS0  0x00
+#define EN0      0x04
+#define SET0     0x08
+#define CLEAR0   0x0c
+#define STATUS1  0x10
+#define MASK1    0x14
+#define SET1     0x18
+#define CLEAR1   0x1c
+#define BUF      0x20
 
 #endif
-- 
1.7.7.3


* [PATCH V2 7/8] MIPS: Loongson-3: Enable the COP2 usage
  2014-04-13  0:24 [PATCH V2 0/8] MIPS: Loongson-3: Add NUMA and Loongson-3B support Huacai Chen
                   ` (5 preceding siblings ...)
  2014-04-13  0:24 ` [PATCH V2 6/8] MIPS: Add Loongson-3B support Huacai Chen
@ 2014-04-13  0:24 ` Huacai Chen
  2014-04-13  0:24 ` [PATCH V2 8/8] MIPS: Loongson: Rename CONFIG_LEMOTE_MACH3A to CONFIG_LOONGSON_MACH3X Huacai Chen
  7 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2014-04-13  0:24 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: John Crispin, Steven J. Hill, Aurelien Jarno, linux-mips,
	Fuxin Zhang, Zhangjin Wu, Huacai Chen

Loongson-3 has some specific instructions (MMI/SIMD) in coprocessor 2.
COP2 isn't independent because it shares COP1 (FPU)'s registers. This
patch enables COP2 usage so user-space programs can use the MMI/SIMD
instructions. When a COP2 exception happens, we enable both COP1 (FPU)
and COP2; only in this way can the FP context be saved and restored
correctly.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/cop2.h            |    8 ++++
 arch/mips/loongson/loongson-3/Makefile  |    2 +-
 arch/mips/loongson/loongson-3/cop2-ex.c |   63 +++++++++++++++++++++++++++++++
 3 files changed, 72 insertions(+), 1 deletions(-)
 create mode 100644 arch/mips/loongson/loongson-3/cop2-ex.c

diff --git a/arch/mips/include/asm/cop2.h b/arch/mips/include/asm/cop2.h
index c1516cc..d035298 100644
--- a/arch/mips/include/asm/cop2.h
+++ b/arch/mips/include/asm/cop2.h
@@ -32,6 +32,14 @@ extern void nlm_cop2_restore(struct nlm_cop2_state *);
 #define cop2_present		1
 #define cop2_lazy_restore	0
 
+#elif defined(CONFIG_CPU_LOONGSON3)
+
+#define cop2_save(r)
+#define cop2_restore(r)
+
+#define cop2_present		1
+#define cop2_lazy_restore	1
+
 #else
 
 #define cop2_present		0
diff --git a/arch/mips/loongson/loongson-3/Makefile b/arch/mips/loongson/loongson-3/Makefile
index 471b0f2a..b4df775 100644
--- a/arch/mips/loongson/loongson-3/Makefile
+++ b/arch/mips/loongson/loongson-3/Makefile
@@ -1,7 +1,7 @@
 #
 # Makefile for Loongson-3 family machines
 #
-obj-y			+= irq.o
+obj-y			+= irq.o cop2-ex.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 
diff --git a/arch/mips/loongson/loongson-3/cop2-ex.c b/arch/mips/loongson/loongson-3/cop2-ex.c
new file mode 100644
index 0000000..9182e8d
--- /dev/null
+++ b/arch/mips/loongson/loongson-3/cop2-ex.c
@@ -0,0 +1,63 @@
+/*
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2014 Lemote Corporation.
+ *   written by Huacai Chen <chenhc@lemote.com>
+ *
+ * based on arch/mips/cavium-octeon/cpu.c
+ * Copyright (C) 2009 Wind River Systems,
+ *   written by Ralf Baechle <ralf@linux-mips.org>
+ */
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/notifier.h>
+
+#include <asm/fpu.h>
+#include <asm/cop2.h>
+#include <asm/current.h>
+#include <asm/mipsregs.h>
+
+static int loongson_cu2_call(struct notifier_block *nfb, unsigned long action,
+	void *data)
+{
+	int fpu_enabled;
+	int fr = !test_thread_flag(TIF_32BIT_FPREGS);
+
+	switch (action) {
+	case CU2_EXCEPTION:
+		preempt_disable();
+		fpu_enabled = read_c0_status() & ST0_CU1;
+		if (!fr)
+			set_c0_status(ST0_CU1 | ST0_CU2);
+		else
+			set_c0_status(ST0_CU1 | ST0_CU2 | ST0_FR);
+		enable_fpu_hazard();
+		KSTK_STATUS(current) |= (ST0_CU1 | ST0_CU2);
+		if (fr)
+			KSTK_STATUS(current) |= ST0_FR;
+		else
+			KSTK_STATUS(current) &= ~ST0_FR;
+		/* If FPU is enabled, we needn't init or restore fp */
+		if (!fpu_enabled) {
+			set_thread_flag(TIF_USEDFPU);
+			if (!used_math()) {
+				_init_fpu();
+				set_used_math();
+			} else
+				_restore_fp(current);
+		}
+		preempt_enable();
+
+		return NOTIFY_STOP;	/* Don't call default notifier */
+	}
+
+	return NOTIFY_OK;		/* Let default notifier send signals */
+}
+
+static int __init loongson_cu2_setup(void)
+{
+	return cu2_notifier(loongson_cu2_call, 0);
+}
+early_initcall(loongson_cu2_setup);
-- 
1.7.7.3


* [PATCH V2 8/8] MIPS: Loongson: Rename CONFIG_LEMOTE_MACH3A to CONFIG_LOONGSON_MACH3X
  2014-04-13  0:24 [PATCH V2 0/8] MIPS: Loongson-3: Add NUMA and Loongson-3B support Huacai Chen
                   ` (6 preceding siblings ...)
  2014-04-13  0:24 ` [PATCH V2 7/8] MIPS: Loongson-3: Enable the COP2 usage Huacai Chen
@ 2014-04-13  0:24 ` Huacai Chen
  7 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2014-04-13  0:24 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: John Crispin, Steven J. Hill, Aurelien Jarno, linux-mips,
	Fuxin Zhang, Zhangjin Wu, Huacai Chen

Since this CONFIG option will be used for both Loongson-3A/3B machines,
and not all Loongson-3 machines are produced by Lemote, we rename
CONFIG_LEMOTE_MACH3A to CONFIG_LOONGSON_MACH3X.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/configs/loongson3_defconfig         |    2 +-
 arch/mips/include/asm/mach-loongson/machine.h |    4 ++--
 arch/mips/loongson/Kconfig                    |    8 ++++----
 arch/mips/loongson/Platform                   |    2 +-
 arch/mips/pci/Makefile                        |    2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/mips/configs/loongson3_defconfig b/arch/mips/configs/loongson3_defconfig
index ea1761f..130e31b 100644
--- a/arch/mips/configs/loongson3_defconfig
+++ b/arch/mips/configs/loongson3_defconfig
@@ -1,6 +1,6 @@
 CONFIG_MACH_LOONGSON=y
 CONFIG_SWIOTLB=y
-CONFIG_LEMOTE_MACH3A=y
+CONFIG_LOONGSON_MACH3X=y
 CONFIG_CPU_LOONGSON3=y
 CONFIG_64BIT=y
 CONFIG_PAGE_SIZE_16KB=y
diff --git a/arch/mips/include/asm/mach-loongson/machine.h b/arch/mips/include/asm/mach-loongson/machine.h
index 1b1f592..228e3784 100644
--- a/arch/mips/include/asm/mach-loongson/machine.h
+++ b/arch/mips/include/asm/mach-loongson/machine.h
@@ -24,10 +24,10 @@
 
 #endif
 
-#ifdef CONFIG_LEMOTE_MACH3A
+#ifdef CONFIG_LOONGSON_MACH3X
 
 #define LOONGSON_MACHTYPE MACH_LEMOTE_A1101
 
-#endif /* CONFIG_LEMOTE_MACH3A */
+#endif /* CONFIG_LOONGSON_MACH3X */
 
 #endif /* __ASM_MACH_LOONGSON_MACHINE_H */
diff --git a/arch/mips/loongson/Kconfig b/arch/mips/loongson/Kconfig
index 976c858..1c1595b 100644
--- a/arch/mips/loongson/Kconfig
+++ b/arch/mips/loongson/Kconfig
@@ -60,8 +60,8 @@ config LEMOTE_MACH2F
 	  These family machines include fuloong2f mini PC, yeeloong2f notebook,
 	  LingLoong allinone PC and so forth.
 
-config LEMOTE_MACH3A
-	bool "Lemote Loongson 3A family machines"
+config LOONGSON_MACH3X
+	bool "Generic Loongson 3 family machines"
 	select ARCH_SPARSEMEM_ENABLE
 	select GENERIC_ISA_DMA_SUPPORT_BROKEN
 	select BOOT_ELF32
@@ -87,8 +87,8 @@ config LEMOTE_MACH3A
 	select ZONE_DMA32
 	select LEFI_FIRMWARE_INTERFACE
 	help
-		Lemote Loongson 3A family machines utilize the 3A revision of
-		Loongson processor and RS780/SBX00 chipset.
+		Generic Loongson 3 family machines utilize the 3A/3B revision
+		of Loongson processor and RS780/SBX00 chipset.
 endchoice
 
 config CS5536
diff --git a/arch/mips/loongson/Platform b/arch/mips/loongson/Platform
index 6205372..0ac20eb 100644
--- a/arch/mips/loongson/Platform
+++ b/arch/mips/loongson/Platform
@@ -30,4 +30,4 @@ platform-$(CONFIG_MACH_LOONGSON) += loongson/
 cflags-$(CONFIG_MACH_LOONGSON) += -I$(srctree)/arch/mips/include/asm/mach-loongson -mno-branch-likely
 load-$(CONFIG_LEMOTE_FULOONG2E) += 0xffffffff80100000
 load-$(CONFIG_LEMOTE_MACH2F) += 0xffffffff80200000
-load-$(CONFIG_CPU_LOONGSON3) += 0xffffffff80200000
+load-$(CONFIG_LOONGSON_MACH3X) += 0xffffffff80200000
diff --git a/arch/mips/pci/Makefile b/arch/mips/pci/Makefile
index d61138a..afb5324 100644
--- a/arch/mips/pci/Makefile
+++ b/arch/mips/pci/Makefile
@@ -29,7 +29,7 @@ obj-$(CONFIG_LASAT)		+= pci-lasat.o
 obj-$(CONFIG_MIPS_COBALT)	+= fixup-cobalt.o
 obj-$(CONFIG_LEMOTE_FULOONG2E)	+= fixup-fuloong2e.o ops-loongson2.o
 obj-$(CONFIG_LEMOTE_MACH2F)	+= fixup-lemote2f.o ops-loongson2.o
-obj-$(CONFIG_LEMOTE_MACH3A)	+= fixup-loongson3.o ops-loongson3.o
+obj-$(CONFIG_LOONGSON_MACH3X)	+= fixup-loongson3.o ops-loongson3.o
 obj-$(CONFIG_MIPS_MALTA)	+= fixup-malta.o pci-malta.o
 obj-$(CONFIG_PMC_MSP7120_GW)	+= fixup-pmcmsp.o ops-pmcmsp.o
 obj-$(CONFIG_PMC_MSP7120_EVAL)	+= fixup-pmcmsp.o ops-pmcmsp.o
-- 
1.7.7.3


* Re: [PATCH V2 1/8] MIPS: Support hard limit of cpu count (nr_cpu_ids)
  2014-04-13  0:24 ` [PATCH V2 1/8] MIPS: Support hard limit of cpu count (nr_cpu_ids) Huacai Chen
@ 2014-04-14 14:48   ` Andreas Herrmann
  0 siblings, 0 replies; 16+ messages in thread
From: Andreas Herrmann @ 2014-04-14 14:48 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Ralf Baechle, John Crispin, Steven J. Hill, Aurelien Jarno,
	linux-mips, Fuxin Zhang, Zhangjin Wu, Andreas Herrmann

On Sun, Apr 13, 2014 at 08:24:15AM +0800, Huacai Chen wrote:
> On MIPS currently, only the soft limit of cpu count (maxcpus) has its
> effect, this patch enable the hard limit (nr_cpus) as well. Processor
> cores which greater than maxcpus and less than nr_cpus can be taken up
> via cpu hotplug. The code is borrowed from X86.
> 
> Signed-off-by: Huacai Chen <chenhc@lemote.com>

Reviewed-by: Andreas Herrmann <andreas.herrmann@caviumnetworks.com>

W/o this patch nr_cpus had no effect and all CPUs present on a chip
(even if greater than or equal to nr_cpus) could be taken online with
CPU hotplug.  With the patch nr_cpus takes effect.

Only nitpick: I find the name of the function somewhat misleading.
I think it's rather a kind of fixup (to factor in nr_cpus) after
platform smp_setup might have already "prefilled"
cpu_possible_mask. At least in case of cavium-octeon this is the case.
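
The clamping behaviour described above can be sketched in plain C. This
is only a user-space model under stated assumptions: a 64-bit integer
stands in for cpu_possible_mask, and NR_CPUS, count_possible() and
clamp_possible() are hypothetical names chosen for illustration, not
kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 16	/* illustrative compile-time maximum, not from the patch */

static_assert(NR_CPUS <= 64, "the model keeps the mask in one 64-bit word");

/* Hypothetical stand-in for num_possible_cpus(): count the set bits. */
static int count_possible(uint64_t mask)
{
	int n = 0;

	for (int i = 0; i < NR_CPUS; i++)
		if (mask & (UINT64_C(1) << i))
			n++;
	return n;
}

/*
 * Model of the fixup: clamp the number of possible CPUs to *nr_cpu_ids,
 * keep only CPUs 0..possible-1 in the mask, and write the clamped count
 * back through nr_cpu_ids.
 */
static uint64_t clamp_possible(uint64_t mask, int *nr_cpu_ids)
{
	int possible = count_possible(mask);

	if (possible > *nr_cpu_ids)
		possible = *nr_cpu_ids;
	*nr_cpu_ids = possible;

	return possible ? (UINT64_C(1) << possible) - 1 : 0;
}
```

With eight CPUs prefilled by the platform code and nr_cpus=4, the model
leaves only CPUs 0-3 possible, which is why the higher-numbered CPUs can
no longer be brought online through hotplug.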


Thanks,
Andreas

> ---
>  arch/mips/kernel/setup.c |   20 ++++++++++++++++++++
>  1 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index a842154..2f01201 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -729,6 +729,25 @@ static void __init resource_init(void)
>  	}
>  }
>  
> +#ifdef CONFIG_SMP
> +static void __init prefill_possible_map(void)
> +{
> +	int i, possible = num_possible_cpus();
> +
> +	if (possible > nr_cpu_ids)
> +		possible = nr_cpu_ids;
> +
> +	for (i = 0; i < possible; i++)
> +		set_cpu_possible(i, true);
> +	for (; i < NR_CPUS; i++)
> +		set_cpu_possible(i, false);
> +
> +	nr_cpu_ids = possible;
> +}
> +#else
> +static inline void prefill_possible_map(void) {}
> +#endif
> +
>  void __init setup_arch(char **cmdline_p)
>  {
>  	cpu_probe();
> @@ -752,6 +771,7 @@ void __init setup_arch(char **cmdline_p)
>  
>  	resource_init();
>  	plat_smp_setup();
> +	prefill_possible_map();
>  
>  	cpu_cache_init();
>  }
> -- 
> 1.7.7.3
> 
> 


* Re: [PATCH V2 2/8] MIPS: Support CPU topology files in sysfs
  2014-04-13  0:24 ` [PATCH V2 2/8] MIPS: Support CPU topology files in sysfs Huacai Chen
@ 2014-04-14 15:04   ` Andreas Herrmann
  0 siblings, 0 replies; 16+ messages in thread
From: Andreas Herrmann @ 2014-04-14 15:04 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Ralf Baechle, John Crispin, Steven J. Hill, Aurelien Jarno,
	linux-mips, Fuxin Zhang, Zhangjin Wu, Andreas Herrmann

On Sun, Apr 13, 2014 at 08:24:16AM +0800, Huacai Chen wrote:
> This patch is prepared for Loongson's NUMA support, it offer meaningful
> sysfs files such as physical_package_id, core_id, core_siblings and
> thread_siblings in /sys/devices/system/cpu/cpu?/topology.
> 
> Signed-off-by: Huacai Chen <chenhc@lemote.com>

Reviewed-by: Andreas Herrmann <andreas.herrmann@caviumnetworks.com>


Thanks,
Andreas

> ---
>  arch/mips/include/asm/cpu-info.h |    1 +
>  arch/mips/include/asm/smp.h      |    6 ++++++
>  arch/mips/kernel/proc.c          |    1 +
>  arch/mips/kernel/smp.c           |   26 +++++++++++++++++++++++++-
>  4 files changed, 33 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/mips/include/asm/cpu-info.h b/arch/mips/include/asm/cpu-info.h
> index dc2135b..2dfa00b 100644
> --- a/arch/mips/include/asm/cpu-info.h
> +++ b/arch/mips/include/asm/cpu-info.h
> @@ -61,6 +61,7 @@ struct cpuinfo_mips {
>  	struct cache_desc	scache; /* Secondary cache */
>  	struct cache_desc	tcache; /* Tertiary/split secondary cache */
>  	int			srsets; /* Shadow register sets */
> +	int			package;/* physical package number */
>  	int			core;	/* physical core number */
>  #ifdef CONFIG_64BIT
>  	int			vmbits; /* Virtual memory size in bits */
> diff --git a/arch/mips/include/asm/smp.h b/arch/mips/include/asm/smp.h
> index efa02ac..fea4051 100644
> --- a/arch/mips/include/asm/smp.h
> +++ b/arch/mips/include/asm/smp.h
> @@ -22,6 +22,7 @@
>  
>  extern int smp_num_siblings;
>  extern cpumask_t cpu_sibling_map[];
> +extern cpumask_t cpu_core_map[];
>  
>  #define raw_smp_processor_id() (current_thread_info()->cpu)
>  
> @@ -36,6 +37,11 @@ extern int __cpu_logical_map[NR_CPUS];
>  
>  #define NO_PROC_ID	(-1)
>  
> +#define topology_physical_package_id(cpu)	(cpu_data[cpu].package)
> +#define topology_core_id(cpu)			(cpu_data[cpu].core)
> +#define topology_core_cpumask(cpu)		(&cpu_core_map[cpu])
> +#define topology_thread_cpumask(cpu)		(&cpu_sibling_map[cpu])
> +
>  #define SMP_RESCHEDULE_YOURSELF 0x1	/* XXX braindead */
>  #define SMP_CALL_FUNCTION	0x2
>  /* Octeon - Tell another core to flush its icache */
> diff --git a/arch/mips/kernel/proc.c b/arch/mips/kernel/proc.c
> index 037a44d..62c4439 100644
> --- a/arch/mips/kernel/proc.c
> +++ b/arch/mips/kernel/proc.c
> @@ -123,6 +123,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
>  		      cpu_data[n].srsets);
>  	seq_printf(m, "kscratch registers\t: %d\n",
>  		      hweight8(cpu_data[n].kscratch_mask));
> +	seq_printf(m, "package\t\t\t: %d\n", cpu_data[n].package);
>  	seq_printf(m, "core\t\t\t: %d\n", cpu_data[n].core);
>  
>  	sprintf(fmt, "VCE%%c exceptions\t\t: %s\n",
> diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
> index 0a022ee..0fa5429 100644
> --- a/arch/mips/kernel/smp.c
> +++ b/arch/mips/kernel/smp.c
> @@ -63,9 +63,16 @@ EXPORT_SYMBOL(smp_num_siblings);
>  cpumask_t cpu_sibling_map[NR_CPUS] __read_mostly;
>  EXPORT_SYMBOL(cpu_sibling_map);
>  
> +/* representing the core map of multi-core chips of each logical CPU */
> +cpumask_t cpu_core_map[NR_CPUS] __read_mostly;
> +EXPORT_SYMBOL(cpu_core_map);
> +
>  /* representing cpus for which sibling maps can be computed */
>  static cpumask_t cpu_sibling_setup_map;
>  
> +/* representing cpus for which core maps can be computed */
> +static cpumask_t cpu_core_setup_map;
> +
>  static inline void set_cpu_sibling_map(int cpu)
>  {
>  	int i;
> @@ -74,7 +81,8 @@ static inline void set_cpu_sibling_map(int cpu)
>  
>  	if (smp_num_siblings > 1) {
>  		for_each_cpu_mask(i, cpu_sibling_setup_map) {
> -			if (cpu_data[cpu].core == cpu_data[i].core) {
> +			if (cpu_data[cpu].package == cpu_data[i].package &&
> +				    cpu_data[cpu].core == cpu_data[i].core) {
>  				cpu_set(i, cpu_sibling_map[cpu]);
>  				cpu_set(cpu, cpu_sibling_map[i]);
>  			}
> @@ -83,6 +91,20 @@ static inline void set_cpu_sibling_map(int cpu)
>  		cpu_set(cpu, cpu_sibling_map[cpu]);
>  }
>  
> +static inline void set_cpu_core_map(int cpu)
> +{
> +	int i;
> +
> +	cpu_set(cpu, cpu_core_setup_map);
> +
> +	for_each_cpu_mask(i, cpu_core_setup_map) {
> +		if (cpu_data[cpu].package == cpu_data[i].package) {
> +			cpu_set(i, cpu_core_map[cpu]);
> +			cpu_set(cpu, cpu_core_map[i]);
> +		}
> +	}
> +}
> +
>  struct plat_smp_ops *mp_ops;
>  EXPORT_SYMBOL(mp_ops);
>  
> @@ -129,6 +151,7 @@ asmlinkage void start_secondary(void)
>  	set_cpu_online(cpu, true);
>  
>  	set_cpu_sibling_map(cpu);
> +	set_cpu_core_map(cpu);
>  
>  	cpu_set(cpu, cpu_callin_map);
>  
> @@ -183,6 +206,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  	current_thread_info()->cpu = 0;
>  	mp_ops->prepare_cpus(max_cpus);
>  	set_cpu_sibling_map(0);
> +	set_cpu_core_map(0);
>  #ifndef CONFIG_HOTPLUG_CPU
>  	init_cpu_present(cpu_possible_mask);
>  #endif
> -- 
> 1.7.7.3
> 
> 


* Re: [PATCH V2 4/8] MIPS: Add NUMA support for Loongson-3
  2014-04-13  0:24 ` [PATCH V2 4/8] MIPS: Add NUMA support for Loongson-3 Huacai Chen
@ 2014-06-03 22:47   ` Ralf Baechle
  2014-06-03 23:47     ` David Daney
  0 siblings, 1 reply; 16+ messages in thread
From: Ralf Baechle @ 2014-06-03 22:47 UTC (permalink / raw)
  To: Huacai Chen
  Cc: John Crispin, Steven J. Hill, Aurelien Jarno, linux-mips,
	Fuxin Zhang, Zhangjin Wu

On Sun, Apr 13, 2014 at 08:24:18AM +0800, Huacai Chen wrote:

> Multiple Loongson-3A chips can be interconnected with HT0-bus. This is
> a CC-NUMA system that every chip (node) has its own local memory and
> cache coherency is maintained by hardware. The 64-bit physical memory
> address format is as follows:
> 
> 0x-0000-YZZZ-ZZZZ-ZZZZ
> 
> The high 16 bits should be 0, which means the real physical address
> supported by Loongson-3 is 48-bit. The "Y" bits is the base address of
> each node, which can be also considered as the node-id. The "Z" bits is
> the address offset within a node, which means every node has a 44 bits
> address space.
> 
> Signed-off-by: Huacai Chen <chenhc@lemote.com>
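
For readers following the address layout quoted above, here is a
minimal sketch of how a 48-bit physical address would split into node
id and in-node offset. The macro and function names (NODE_SHIFT,
NODE_BITS, paddr_to_node(), paddr_to_offset(), node_base()) are
hypothetical and only illustrate the 0x-0000-YZZZ-ZZZZ-ZZZZ format; they
are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NODE_SHIFT	44	/* "Z" bits: 44-bit offset within a node */
#define NODE_BITS	4	/* "Y" bits: one hex digit of node id */

static_assert(NODE_SHIFT + NODE_BITS == 48, "layout covers 48 physical bits");

/* Extract the node id (the "Y" digit) from a physical address. */
static unsigned int paddr_to_node(uint64_t paddr)
{
	return (unsigned int)((paddr >> NODE_SHIFT) & ((1u << NODE_BITS) - 1));
}

/* Extract the offset within the node (the "Z" digits). */
static uint64_t paddr_to_offset(uint64_t paddr)
{
	return paddr & ((UINT64_C(1) << NODE_SHIFT) - 1);
}

/* Base physical address of a node's 44-bit window. */
static uint64_t node_base(unsigned int node)
{
	return (uint64_t)node << NODE_SHIFT;
}
```

For example, physical address 0x000010003ff01000 decodes to node 1 with
offset 0x3ff01000 under this layout.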

> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -2233,9 +2233,14 @@ config SYS_SUPPORTS_NUMA
>  	bool
>  
>  config NODES_SHIFT
> -	int
> +	int "Maximum Number of NUMA Nodes Shift"
> +	range 1 10
>  	default "6"
>  	depends on NEED_MULTIPLE_NODES
> +	help
> +	  This option specifies the maximum number of available NUMA nodes
> +	  on the target system. MAX_NUMNODES will be 2^(This value).
> +	  If in doubt, use the default.

I always feel a bit uneasy to present options such as NODES_SHIFT to the
user.

> --- a/arch/mips/include/asm/addrspace.h
> +++ b/arch/mips/include/asm/addrspace.h
> @@ -51,8 +51,14 @@
>   * Returns the physical address of a CKSEGx / XKPHYS address
>   */
>  #define CPHYSADDR(a)		((_ACAST32_(a)) & 0x1fffffff)
> +
> +#ifndef CONFIG_NUMA
>  #define XPHYSADDR(a)		((_ACAST64_(a)) &			\
>  				 _CONST64_(0x000000ffffffffff))
> +#else
> +#define XPHYSADDR(a)		((_ACAST64_(a)) &			\
> +				 _CONST64_(0x0000ffffffffffff))
> +#endif

The mask in XPHYSADDR is a function of the processor architecture, not
implementation, not NUMA.  The latest version of the MIPS architecture
permits PABITS to be as large as 49 bits, so the mask should be
0x0001ffffffffffff.  Always.
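
As a quick check of the arithmetic, the mask for a given number of
physical address bits can be computed as below. This is a user-space
sketch only; xphys_mask() and xphysaddr() are hypothetical helper names
that mirror what the XPHYSADDR() macro does, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* A 49-bit mask is 0x0001ffffffffffff, checked at compile time. */
static_assert(((UINT64_C(1) << 49) - 1) == UINT64_C(0x0001ffffffffffff),
	      "49-bit physical address mask");

/* Mask covering the low `pabits` physical address bits (1 <= pabits <= 63). */
static uint64_t xphys_mask(unsigned int pabits)
{
	return (UINT64_C(1) << pabits) - 1;
}

/* Strip the upper (segment and cache attribute) bits of an XKPHYS address. */
static uint64_t xphysaddr(uint64_t vaddr, unsigned int pabits)
{
	return vaddr & xphys_mask(pabits);
}
```

With pabits = 40 this yields the existing 0x000000ffffffffff mask, with
48 the mask proposed in the patch, and with 49 the architectural
maximum, which still maps 0x900010003ff01000 to 0x000010003ff01000.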

> diff --git a/arch/mips/include/asm/sparsemem.h b/arch/mips/include/asm/sparsemem.h
> index d2da53c..c001a90 100644
> --- a/arch/mips/include/asm/sparsemem.h
> +++ b/arch/mips/include/asm/sparsemem.h
> @@ -11,7 +11,12 @@
>  #else
>  # define SECTION_SIZE_BITS	28
>  #endif
> +
> +#ifdef CONFIG_NUMA
> +#define MAX_PHYSMEM_BITS	48
> +#else
>  #define MAX_PHYSMEM_BITS	35
> +#endif

Essentially the same comment as for XPHYSADDR above.

  Ralf


* Re: [PATCH V2 4/8] MIPS: Add NUMA support for Loongson-3
  2014-06-03 22:47   ` Ralf Baechle
@ 2014-06-03 23:47     ` David Daney
  2014-06-04  6:46       ` Ralf Baechle
  0 siblings, 1 reply; 16+ messages in thread
From: David Daney @ 2014-06-03 23:47 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Huacai Chen, John Crispin, Steven J. Hill, Aurelien Jarno,
	linux-mips, Fuxin Zhang, Zhangjin Wu

On 06/03/2014 03:47 PM, Ralf Baechle wrote:
[...]
>> --- a/arch/mips/include/asm/addrspace.h
>> +++ b/arch/mips/include/asm/addrspace.h
>> @@ -51,8 +51,14 @@
>>    * Returns the physical address of a CKSEGx / XKPHYS address
>>    */
>>   #define CPHYSADDR(a)		((_ACAST32_(a)) & 0x1fffffff)
>> +
>> +#ifndef CONFIG_NUMA
>>   #define XPHYSADDR(a)		((_ACAST64_(a)) &			\
>>   				 _CONST64_(0x000000ffffffffff))
>> +#else
>> +#define XPHYSADDR(a)		((_ACAST64_(a)) &			\
>> +				 _CONST64_(0x0000ffffffffffff))
>> +#endif
>
> The mask in XPHYSADDR is a function of the processor architecture, not
> implementation, not NUMA.  The latest version of the MIPS architecture
> permits PABITS to be as large as 49 bits, so the mask should be
> 0x0001ffffffffffff.  Always.
>
>> diff --git a/arch/mips/include/asm/sparsemem.h b/arch/mips/include/asm/sparsemem.h
>> index d2da53c..c001a90 100644
>> --- a/arch/mips/include/asm/sparsemem.h
>> +++ b/arch/mips/include/asm/sparsemem.h
>> @@ -11,7 +11,12 @@
>>   #else
>>   # define SECTION_SIZE_BITS	28
>>   #endif
>> +
>> +#ifdef CONFIG_NUMA
>> +#define MAX_PHYSMEM_BITS	48
>> +#else
>>   #define MAX_PHYSMEM_BITS	35
>> +#endif
>
> Essentially the same comment as for XPHYSADDR above.

Are you saying to change it to 49 unconditionally for all configurations?

That would work for OCTEON too, where we have had to increase it to 42.

What are the implications for kernel data structures if this is set many 
orders of magnitude greater than the actual number of bits used on a system?


>
>    Ralf
>
>
>


* Re: [PATCH V2 4/8] MIPS: Add NUMA support for Loongson-3
  2014-06-03 23:47     ` David Daney
@ 2014-06-04  6:46       ` Ralf Baechle
  2014-06-04 17:22         ` David Daney
  0 siblings, 1 reply; 16+ messages in thread
From: Ralf Baechle @ 2014-06-04  6:46 UTC (permalink / raw)
  To: David Daney
  Cc: Huacai Chen, John Crispin, Steven J. Hill, Aurelien Jarno,
	linux-mips, Fuxin Zhang, Zhangjin Wu

On Tue, Jun 03, 2014 at 04:47:52PM -0700, David Daney wrote:

> On 06/03/2014 03:47 PM, Ralf Baechle wrote:
> [...]
> >>--- a/arch/mips/include/asm/addrspace.h
> >>+++ b/arch/mips/include/asm/addrspace.h
> >>@@ -51,8 +51,14 @@
> >>   * Returns the physical address of a CKSEGx / XKPHYS address
> >>   */
> >>  #define CPHYSADDR(a)		((_ACAST32_(a)) & 0x1fffffff)
> >>+
> >>+#ifndef CONFIG_NUMA
> >>  #define XPHYSADDR(a)		((_ACAST64_(a)) &			\
> >>  				 _CONST64_(0x000000ffffffffff))
> >>+#else
> >>+#define XPHYSADDR(a)		((_ACAST64_(a)) &			\
> >>+				 _CONST64_(0x0000ffffffffffff))
> >>+#endif
> >
> >The mask in XPHYSADDR is a function of the processor architecture, not
> >implementation, not NUMA.  The latest version of the MIPS architecture
> >permits PABITS to be as large as 49 bits, so the mask should be
> >0x0001ffffffffffff.  Always.
> >
> >>diff --git a/arch/mips/include/asm/sparsemem.h b/arch/mips/include/asm/sparsemem.h
> >>index d2da53c..c001a90 100644
> >>--- a/arch/mips/include/asm/sparsemem.h
> >>+++ b/arch/mips/include/asm/sparsemem.h
> >>@@ -11,7 +11,12 @@
> >>  #else
> >>  # define SECTION_SIZE_BITS	28
> >>  #endif
> >>+
> >>+#ifdef CONFIG_NUMA
> >>+#define MAX_PHYSMEM_BITS	48
> >>+#else
> >>  #define MAX_PHYSMEM_BITS	35
> >>+#endif
> >
> >Essentially the same comment as for XPHYSADDR above.
> 
> Are you saying to change it to 49 unconditionally for all configurations?
> 
> That would work for OCTEON too, where we have had to increase it to 42.
> 
> What are the implications for kernel data structures if this is set
> many orders of magnitude greater than the actual number of bits used
> on a system?

Shouldn't make a significant difference; the value is used to compute certain
limits in sparse.c and a bitmap in zsmalloc.c which is used only when
CONFIG_ZSMALLOC is enabled.
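
To put a rough number on the sparse.c side: as far as I recall, the generic
sparsemem code derives the section count from SECTIONS_SHIFT =
MAX_PHYSMEM_BITS - SECTION_SIZE_BITS.  The sketch below only redoes that
arithmetic in userspace; nr_mem_sections() is a hypothetical helper name, not
a kernel function, and SECTION_SIZE_BITS is fixed at the non-64KB value of 28.

```c
#include <stdint.h>

/* Value quoted in this thread for the non-64KB-page configuration. */
#define SECTION_SIZE_BITS 28

/*
 * Hypothetical helper: number of sparsemem sections needed to cover
 * max_physmem_bits of physical address space.  Assumes the generic rule
 * SECTIONS_SHIFT = MAX_PHYSMEM_BITS - SECTION_SIZE_BITS and that
 * max_physmem_bits is at least SECTION_SIZE_BITS and well below 64.
 */
static uint64_t nr_mem_sections(unsigned int max_physmem_bits)
{
	return UINT64_C(1) << (max_physmem_bits - SECTION_SIZE_BITS);
}
```

Going from 35 to 49 bits multiplies the section count by 2^14, which matters
mostly for how the section table is sized, not for per-page structures.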

A more important value which I haven't noticed the Loongson patches to
modify is SECTION_SIZE_BITS in <asm/sparsemem.h>:

#if defined(CONFIG_MIPS_HUGE_TLB_SUPPORT) && defined(CONFIG_PAGE_SIZE_64KB)
# define SECTION_SIZE_BITS      29
#else
# define SECTION_SIZE_BITS      28
#endif

Don't ask me why its definition depends on MIPS_HUGE_TLB_SUPPORT and
PAGE_SIZE_64KB - the value describes the largest chunk of contiguous
memory (that is for example memory per node) and that doesn't depend
on these CONFIG_* symbols.

  Ralf


* Re: [PATCH V2 4/8] MIPS: Add NUMA support for Loongson-3
  2014-06-04  6:46       ` Ralf Baechle
@ 2014-06-04 17:22         ` David Daney
  2014-06-05  9:15           ` Huacai Chen
  0 siblings, 1 reply; 16+ messages in thread
From: David Daney @ 2014-06-04 17:22 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Huacai Chen, John Crispin, Steven J. Hill, Aurelien Jarno,
	linux-mips, Fuxin Zhang, Zhangjin Wu

On 06/03/2014 11:46 PM, Ralf Baechle wrote:
>
> A more important value which I haven't noticed the Loongson patches to
> modify is SECTION_SIZE_BITS in <asm/sparsemem.h>:
>
> #if defined(CONFIG_MIPS_HUGE_TLB_SUPPORT) && defined(CONFIG_PAGE_SIZE_64KB)
> # define SECTION_SIZE_BITS      29
> #else
> # define SECTION_SIZE_BITS      28
> #endif
>
> Don't ask me why its definition depends on MIPS_HUGE_TLB_SUPPORT and
> PAGE_SIZE_64KB - the value describes the largest chunk of contiguous
> memory (that is for example memory per node) and that doesn't depend
> on these CONFIG_* symbols.
>

I think I can answer that.  I believe we do the same thing for OCTEON.

IIRC, with SPARSEMEM, you cannot allocate high-order pages that span
multiple sections.  Therefore the sections have to be at least as large
as a huge page.  In the case of CONFIG_PAGE_SIZE_64KB, the huge pages
are 512MB, which doesn't fit in a 28-bit (256MB) section.
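
A quick sketch of that arithmetic, assuming 8-byte PTEs so that one page of
PTEs maps 2^(PAGE_SHIFT - 3) pages and a huge page covers
PAGE_SHIFT + PAGE_SHIFT - 3 bits.  The helper names are made up for
illustration and are not kernel symbols.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: huge page shift for a given base page shift.
 * Assumption: 8-byte PTEs, so a page table page holds
 * 2^(page_shift - 3) entries and one huge page spans that many pages.
 */
static unsigned int huge_page_shift(unsigned int page_shift)
{
	return page_shift + page_shift - 3;
}

/* True if one huge page fits inside a single sparsemem section. */
static bool huge_page_fits_section(unsigned int page_shift,
				   unsigned int section_size_bits)
{
	return huge_page_shift(page_shift) <= section_size_bits;
}
```

With 64KB pages (shift 16) this gives a huge page shift of 29, i.e. 512MB,
so a 28-bit section is too small and 29 is the minimum that works.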

David.


>    Ralf
>
>


* Re: [PATCH V2 4/8] MIPS: Add NUMA support for Loongson-3
  2014-06-04 17:22         ` David Daney
@ 2014-06-05  9:15           ` Huacai Chen
  0 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2014-06-05  9:15 UTC (permalink / raw)
  To: David Daney
  Cc: Ralf Baechle, John Crispin, Steven J. Hill, Aurelien Jarno,
	Linux MIPS Mailing List, Fuxin Zhang, Zhangjin Wu

Now what should I do?  Change MAX_PHYSMEM_BITS and XPHYSADDR to 49
bits in a separate patch?
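
For reference, a small sketch of how the masks discussed in this thread follow
from the number of physical address bits.  physaddr_mask() is a hypothetical
name used only for illustration; the actual change would simply put the
constant into XPHYSADDR.

```c
#include <stdint.h>

/*
 * Hypothetical helper: mask that keeps the low 'pabits' bits of a
 * 64-bit address.  The pabits >= 64 case is handled separately because
 * shifting a 64-bit value by 64 is undefined in C.
 */
static uint64_t physaddr_mask(unsigned int pabits)
{
	return (pabits >= 64) ? UINT64_MAX : ((UINT64_C(1) << pabits) - 1);
}
```

So 40 bits gives the current 0x000000ffffffffff, 48 bits gives the
0x0000ffffffffffff from the NUMA patch, and 49 bits gives the suggested
0x0001ffffffffffff.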

Huacai

On Thu, Jun 5, 2014 at 1:22 AM, David Daney <ddaney.cavm@gmail.com> wrote:
> On 06/03/2014 11:46 PM, Ralf Baechle wrote:
>>
>>
>> A more important value which I haven't noticed the Loongson patches to
>> modify is SECTION_SIZE_BITS in <asm/sparsemem.h>:
>>
>> #if defined(CONFIG_MIPS_HUGE_TLB_SUPPORT) && defined(CONFIG_PAGE_SIZE_64KB)
>> # define SECTION_SIZE_BITS      29
>> #else
>> # define SECTION_SIZE_BITS      28
>> #endif
>>
>> Don't ask me why its definition depends on MIPS_HUGE_TLB_SUPPORT and
>> PAGE_SIZE_64KB - the value describes the largest chunk of contiguous
>> memory (that is for example memory per node) and that doesn't depend
>> on these CONFIG_* symbols.
>>
>
> I think I can answer that.  We do the same thing for OCTEON I think.
>
> IIRC, with SPARSEMEM, you cannot allocate high order pages that span
> multiple sections.  Therefore you have to have the sections be at least as
> large as a huge page.  in the case of CONFIG_PAGE_SIZE_64KB, the huge pages
> are 512MB which doesn't fit in 28 bits.
>
> David.
>
>
>>    Ralf
>>
>>
>
>


end of thread, other threads:[~2014-06-05  9:15 UTC | newest]

Thread overview: 16+ messages
2014-04-13  0:24 [PATCH V2 0/8] MIPS: Loongson-3: Add NUMA and Loongson-3B support Huacai Chen
2014-04-13  0:24 ` [PATCH V2 1/8] MIPS: Support hard limit of cpu count (nr_cpu_ids) Huacai Chen
2014-04-14 14:48   ` Andreas Herrmann
2014-04-13  0:24 ` [PATCH V2 2/8] MIPS: Support CPU topology files in sysfs Huacai Chen
2014-04-14 15:04   ` Andreas Herrmann
2014-04-13  0:24 ` [PATCH V2 3/8] MIPS: Loongson: Modify ChipConfig register definition Huacai Chen
2014-04-13  0:24 ` [PATCH V2 4/8] MIPS: Add NUMA support for Loongson-3 Huacai Chen
2014-06-03 22:47   ` Ralf Baechle
2014-06-03 23:47     ` David Daney
2014-06-04  6:46       ` Ralf Baechle
2014-06-04 17:22         ` David Daney
2014-06-05  9:15           ` Huacai Chen
2014-04-13  0:24 ` [PATCH V2 5/8] MIPS: Add numa api support Huacai Chen
2014-04-13  0:24 ` [PATCH V2 6/8] MIPS: Add Loongson-3B support Huacai Chen
2014-04-13  0:24 ` [PATCH V2 7/8] MIPS: Loongson-3: Enable the COP2 usage Huacai Chen
2014-04-13  0:24 ` [PATCH V2 8/8] MIPS: Loongson: Rename CONFIG_LEMOTE_MACH3A to CONFIG_LOONGSON_MACH3X Huacai Chen
