All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V4 00/10] MIPS: Loongson: new features and improvements
@ 2018-09-05  9:33 Huacai Chen
  2018-09-05  9:33 ` [PATCH V4 01/10] MIPS: Loongson-3: Enable Store Fill Buffer at runtime Huacai Chen
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Huacai Chen @ 2018-09-05  9:33 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Paul Burton, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, Huacai Chen

This patchset is prepared for the next 4.20 release for Linux/MIPS. It
enable Loongson-3's SFB at runtime, adds "model name" and "CPU MHz"
knobs in /proc/cpuinfo which is needed by some userspace tools, adds
Loongson-3 kexec/kdump support, fixes CPU UART and BRIDGE irq delivery
problem, and introduces WAR_LLSC_MB to improve stability.

V1 -> V2:
1, Add Loongson-3A R3.1 basic support.
2, Fix CPU UART irq delivery problem.
3, Improve code and descriptions (Thank James Hogan).
4, Sync the code to upstream.

V2 -> V3:
1, Remove merged patches.
2, Improve code and descriptions (Thank James Hogan).
3, Sync the code to upstream.

V3 -> V4:
1, Remove merged patches.
2, Improve kdump support.
3, Sync the code to upstream.

Huacai Chen(10):
 MIPS: Loongson-3: Enable Store Fill Buffer at runtime.
 MIPS: c-r4k: Add r4k_blast_scache_node for Loongson-3.
 MIPS: Ensure pmd_present() returns false after pmd_mknotpresent().
 MIPS: Add __cpu_full_name[] to make CPU names more human-readable.
 MIPS: Align kernel load address to 64KB.
 MIPS: Reserve extra memory for crash dump.
 MIPS: Loongson: Add kexec/kdump support.
 MIPS: Loongson-3: Fix CPU UART irq delivery problem.
 MIPS: Loongson-3: Fix BRIDGE irq delivery problem.
 MIPS: Loongson: Introduce and use WAR_LLSC_MB.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/boot/compressed/calc_vmlinuz_load_addr.c |   7 +-
 arch/mips/include/asm/atomic.h                     |  36 +++++--
 arch/mips/include/asm/barrier.h                    |   6 ++
 arch/mips/include/asm/bitops.h                     |  15 +++
 arch/mips/include/asm/cmpxchg.h                    |   9 +-
 arch/mips/include/asm/cpu-info.h                   |   2 +
 arch/mips/include/asm/edac.h                       |   5 +-
 arch/mips/include/asm/futex.h                      |  18 ++--
 arch/mips/include/asm/io.h                         |   2 +-
 arch/mips/include/asm/kexec.h                      |  15 +++
 arch/mips/include/asm/local.h                      |  10 +-
 arch/mips/include/asm/mach-loongson64/boot_param.h |   1 +
 arch/mips/include/asm/mach-loongson64/irq.h        |   2 +-
 .../asm/mach-loongson64/kernel-entry-init.h        |  16 ++-
 arch/mips/include/asm/mach-loongson64/mmzone.h     |   1 +
 arch/mips/include/asm/mmzone.h                     |   8 ++
 arch/mips/include/asm/pgtable-64.h                 |   5 +
 arch/mips/include/asm/pgtable.h                    |   5 +-
 arch/mips/include/asm/r4kcache.h                   |  25 +++++
 arch/mips/include/asm/time.h                       |   2 +
 arch/mips/kernel/cpu-probe.c                       |  25 +++--
 arch/mips/kernel/proc.c                            |   7 ++
 arch/mips/kernel/relocate_kernel.S                 |  26 +++++
 arch/mips/kernel/setup.c                           |  55 ++++++++++
 arch/mips/kernel/syscall.c                         |   2 +
 arch/mips/kernel/time.c                            |   2 +
 arch/mips/loongson64/Platform                      |   3 +
 arch/mips/loongson64/common/env.c                  |  20 ++++
 arch/mips/loongson64/common/reset.c                | 120 +++++++++++++++++++++
 arch/mips/loongson64/loongson-3/irq.c              |  56 ++--------
 arch/mips/loongson64/loongson-3/smp.c              |   6 ++
 arch/mips/loongson64/loongson-3/smp.h              |   1 +
 arch/mips/mm/c-r4k.c                               |  44 ++++++--
 arch/mips/mm/tlbex.c                               |  11 ++
 34 files changed, 476 insertions(+), 92 deletions(-)
--
2.7.0

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH V4 01/10] MIPS: Loongson-3: Enable Store Fill Buffer at runtime
  2018-09-05  9:33 [PATCH V4 00/10] MIPS: Loongson: new features and improvements Huacai Chen
@ 2018-09-05  9:33 ` Huacai Chen
  2018-09-05 16:54   ` Paul Burton
  2018-09-05  9:33 ` [PATCH 02/10] MIPS: c-r4k: Add r4k_blast_scache_node for Loongson-3 Huacai Chen
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Huacai Chen @ 2018-09-05  9:33 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Paul Burton, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, Huacai Chen

New Loongson-3 (Loongson-3A R2, Loongson-3A R3, and newer) has SFB
(Store Fill Buffer) which can improve the performance of memory access.
Now, SFB enablement is controlled by CONFIG_LOONGSON3_ENHANCEMENT, and
the generic kernel has no benefit from SFB (even it is running on a new
Loongson-3 machine). With this patch, we can enable SFB at runtime by
detecting the CPU type (the expense is war_io_reorder_wmb() will always
be a 'sync', which will hurt the performance of old Loongson-3).

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/io.h                               |  2 +-
 .../mips/include/asm/mach-loongson64/kernel-entry-init.h | 16 ++++++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
index 44f766b..eb357c9 100644
--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -289,7 +289,7 @@ static inline void iounmap(const volatile void __iomem *addr)
 #undef __IS_KSEG1
 }
 
-#if defined(CONFIG_CPU_CAVIUM_OCTEON) || defined(CONFIG_LOONGSON3_ENHANCEMENT)
+#if defined(CONFIG_CPU_CAVIUM_OCTEON) || defined(CONFIG_CPU_LOONGSON3)
 #define war_io_reorder_wmb()		wmb()
 #else
 #define war_io_reorder_wmb()		barrier()
diff --git a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
index 3127391..cbac603 100644
--- a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
+++ b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
@@ -11,6 +11,8 @@
 #ifndef __ASM_MACH_LOONGSON64_KERNEL_ENTRY_H
 #define __ASM_MACH_LOONGSON64_KERNEL_ENTRY_H
 
+#include <asm/cpu.h>
+
 /*
  * Override macros used in arch/mips/kernel/head.S.
  */
@@ -26,12 +28,15 @@
 	mfc0	t0, CP0_PAGEGRAIN
 	or	t0, (0x1 << 29)
 	mtc0	t0, CP0_PAGEGRAIN
-#ifdef CONFIG_LOONGSON3_ENHANCEMENT
 	/* Enable STFill Buffer */
+	mfc0	t0, CP0_PRID
+	andi	t0, (PRID_IMP_MASK | PRID_REV_MASK)
+	slti	t0, (PRID_IMP_LOONGSON_64 | PRID_REV_LOONGSON3A_R2)
+	bnez	t0, 1f
 	mfc0	t0, CP0_CONFIG6
 	or	t0, 0x100
 	mtc0	t0, CP0_CONFIG6
-#endif
+1:
 	_ehb
 	.set	pop
 #endif
@@ -52,12 +57,15 @@
 	mfc0	t0, CP0_PAGEGRAIN
 	or	t0, (0x1 << 29)
 	mtc0	t0, CP0_PAGEGRAIN
-#ifdef CONFIG_LOONGSON3_ENHANCEMENT
 	/* Enable STFill Buffer */
+	mfc0	t0, CP0_PRID
+	andi	t0, (PRID_IMP_MASK | PRID_REV_MASK)
+	slti	t0, (PRID_IMP_LOONGSON_64 | PRID_REV_LOONGSON3A_R2)
+	bnez	t0, 1f
 	mfc0	t0, CP0_CONFIG6
 	or	t0, 0x100
 	mtc0	t0, CP0_CONFIG6
-#endif
+1:
 	_ehb
 	.set	pop
 #endif
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 02/10] MIPS: c-r4k: Add r4k_blast_scache_node for Loongson-3
  2018-09-05  9:33 [PATCH V4 00/10] MIPS: Loongson: new features and improvements Huacai Chen
  2018-09-05  9:33 ` [PATCH V4 01/10] MIPS: Loongson-3: Enable Store Fill Buffer at runtime Huacai Chen
@ 2018-09-05  9:33 ` Huacai Chen
  2018-09-26 21:47   ` Paul Burton
  2018-09-05  9:33 ` [PATCH V4 03/10] MIPS: Ensure pmd_present() returns false after pmd_mknotpresent() Huacai Chen
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Huacai Chen @ 2018-09-05  9:33 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Paul Burton, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, Huacai Chen, # 3 . 15+

For multi-node Loongson-3 (NUMA configuration), r4k_blast_scache() can
only flush Node-0's scache. So we add r4k_blast_scache_node() by using
(CAC_BASE | (node_id << NODE_ADDRSPACE_SHIFT)) instead of CKSEG0 as the
start address.

Cc: <stable@vger.kernel.org> # 3.15+
Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/mach-loongson64/mmzone.h |  1 +
 arch/mips/include/asm/mmzone.h                 |  8 +++++
 arch/mips/include/asm/r4kcache.h               | 25 +++++++++++++++
 arch/mips/mm/c-r4k.c                           | 44 ++++++++++++++++++++++----
 4 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/arch/mips/include/asm/mach-loongson64/mmzone.h b/arch/mips/include/asm/mach-loongson64/mmzone.h
index c9f7e23..59c8b11 100644
--- a/arch/mips/include/asm/mach-loongson64/mmzone.h
+++ b/arch/mips/include/asm/mach-loongson64/mmzone.h
@@ -21,6 +21,7 @@
 #define NODE3_ADDRSPACE_OFFSET 0x300000000000UL
 
 #define pa_to_nid(addr)  (((addr) & 0xf00000000000) >> NODE_ADDRSPACE_SHIFT)
+#define nid_to_addrbase(nid) ((nid) << NODE_ADDRSPACE_SHIFT)
 
 #define LEVELS_PER_SLICE 128
 
diff --git a/arch/mips/include/asm/mmzone.h b/arch/mips/include/asm/mmzone.h
index f085fba..2a0fe1d 100644
--- a/arch/mips/include/asm/mmzone.h
+++ b/arch/mips/include/asm/mmzone.h
@@ -9,6 +9,14 @@
 #include <asm/page.h>
 #include <mmzone.h>
 
+#ifndef pa_to_nid
+#define pa_to_nid(addr) 0
+#endif
+
+#ifndef nid_to_addrbase
+#define nid_to_addrbase(nid) 0
+#endif
+
 #ifdef CONFIG_DISCONTIGMEM
 
 #define pfn_to_nid(pfn)		pa_to_nid((pfn) << PAGE_SHIFT)
diff --git a/arch/mips/include/asm/r4kcache.h b/arch/mips/include/asm/r4kcache.h
index 7f12d7e..e3f70dc 100644
--- a/arch/mips/include/asm/r4kcache.h
+++ b/arch/mips/include/asm/r4kcache.h
@@ -747,4 +747,29 @@ __BUILD_BLAST_CACHE_RANGE(s, scache, Hit_Writeback_Inv_SD, , )
 __BUILD_BLAST_CACHE_RANGE(inv_d, dcache, Hit_Invalidate_D, , )
 __BUILD_BLAST_CACHE_RANGE(inv_s, scache, Hit_Invalidate_SD, , )
 
+/* Currently, this is very specific to Loongson-3 */
+#define __BUILD_BLAST_CACHE_NODE(pfx, desc, indexop, hitop, lsize)	\
+static inline void blast_##pfx##cache##lsize##_node(long node)		\
+{									\
+	unsigned long start = CAC_BASE | nid_to_addrbase(node);		\
+	unsigned long end = start + current_cpu_data.desc.waysize;	\
+	unsigned long ws_inc = 1UL << current_cpu_data.desc.waybit;	\
+	unsigned long ws_end = current_cpu_data.desc.ways <<		\
+			       current_cpu_data.desc.waybit;		\
+	unsigned long ws, addr;						\
+									\
+	__##pfx##flush_prologue						\
+									\
+	for (ws = 0; ws < ws_end; ws += ws_inc)				\
+		for (addr = start; addr < end; addr += lsize * 32)	\
+			cache##lsize##_unroll32(addr|ws, indexop);	\
+									\
+	__##pfx##flush_epilogue						\
+}
+
+__BUILD_BLAST_CACHE_NODE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 16)
+__BUILD_BLAST_CACHE_NODE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 32)
+__BUILD_BLAST_CACHE_NODE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 64)
+__BUILD_BLAST_CACHE_NODE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 128)
+
 #endif /* _ASM_R4KCACHE_H */
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index a9ef057..05a539d 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -459,11 +459,28 @@ static void r4k_blast_scache_setup(void)
 		r4k_blast_scache = blast_scache128;
 }
 
+static void (*r4k_blast_scache_node)(long node);
+
+static void r4k_blast_scache_node_setup(void)
+{
+	unsigned long sc_lsize = cpu_scache_line_size();
+
+	if (current_cpu_type() != CPU_LOONGSON3)
+		r4k_blast_scache_node = (void *)cache_noop;
+	else if (sc_lsize == 16)
+		r4k_blast_scache_node = blast_scache16_node;
+	else if (sc_lsize == 32)
+		r4k_blast_scache_node = blast_scache32_node;
+	else if (sc_lsize == 64)
+		r4k_blast_scache_node = blast_scache64_node;
+	else if (sc_lsize == 128)
+		r4k_blast_scache_node = blast_scache128_node;
+}
+
 static inline void local_r4k___flush_cache_all(void * args)
 {
 	switch (current_cpu_type()) {
 	case CPU_LOONGSON2:
-	case CPU_LOONGSON3:
 	case CPU_R4000SC:
 	case CPU_R4000MC:
 	case CPU_R4400SC:
@@ -480,6 +497,11 @@ static inline void local_r4k___flush_cache_all(void * args)
 		r4k_blast_scache();
 		break;
 
+	case CPU_LOONGSON3:
+		/* Use get_ebase_cpunum() for both NUMA=y/n */
+		r4k_blast_scache_node(get_ebase_cpunum() >> 2);
+		break;
+
 	case CPU_BMIPS5000:
 		r4k_blast_scache();
 		__sync();
@@ -840,10 +862,14 @@ static void r4k_dma_cache_wback_inv(unsigned long addr, unsigned long size)
 
 	preempt_disable();
 	if (cpu_has_inclusive_pcaches) {
-		if (size >= scache_size)
-			r4k_blast_scache();
-		else
+		if (size >= scache_size) {
+			if (current_cpu_type() != CPU_LOONGSON3)
+				r4k_blast_scache();
+			else
+				r4k_blast_scache_node(pa_to_nid(addr));
+		} else {
 			blast_scache_range(addr, addr + size);
+		}
 		preempt_enable();
 		__sync();
 		return;
@@ -877,9 +903,12 @@ static void r4k_dma_cache_inv(unsigned long addr, unsigned long size)
 
 	preempt_disable();
 	if (cpu_has_inclusive_pcaches) {
-		if (size >= scache_size)
-			r4k_blast_scache();
-		else {
+		if (size >= scache_size) {
+			if (current_cpu_type() != CPU_LOONGSON3)
+				r4k_blast_scache();
+			else
+				r4k_blast_scache_node(pa_to_nid(addr));
+		} else {
 			/*
 			 * There is no clearly documented alignment requirement
 			 * for the cache instruction on MIPS processors and
@@ -1918,6 +1947,7 @@ void r4k_cache_init(void)
 	r4k_blast_scache_page_setup();
 	r4k_blast_scache_page_indexed_setup();
 	r4k_blast_scache_setup();
+	r4k_blast_scache_node_setup();
 #ifdef CONFIG_EVA
 	r4k_blast_dcache_user_page_setup();
 	r4k_blast_icache_user_page_setup();
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V4 03/10] MIPS: Ensure pmd_present() returns false after pmd_mknotpresent()
  2018-09-05  9:33 [PATCH V4 00/10] MIPS: Loongson: new features and improvements Huacai Chen
  2018-09-05  9:33 ` [PATCH V4 01/10] MIPS: Loongson-3: Enable Store Fill Buffer at runtime Huacai Chen
  2018-09-05  9:33 ` [PATCH 02/10] MIPS: c-r4k: Add r4k_blast_scache_node for Loongson-3 Huacai Chen
@ 2018-09-05  9:33 ` Huacai Chen
  2018-09-05  9:33 ` [PATCH V4 04/10] MIPS: Add __cpu_full_name[] to make CPU names more human-readable Huacai Chen
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2018-09-05  9:33 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Paul Burton, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, Huacai Chen, # 3 . 8+

This patch is borrowed from ARM64 to ensure pmd_present() returns false
after pmd_mknotpresent(). This is needed for THP.

Cc: <stable@vger.kernel.org> # 3.8+
References: 5bb1cc0ff9a6 ("arm64: Ensure pmd_present() returns false after pmd_mknotpresent()")
Reviewed-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/pgtable-64.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/mips/include/asm/pgtable-64.h b/arch/mips/include/asm/pgtable-64.h
index 0036ea0..93a9dce 100644
--- a/arch/mips/include/asm/pgtable-64.h
+++ b/arch/mips/include/asm/pgtable-64.h
@@ -265,6 +265,11 @@ static inline int pmd_bad(pmd_t pmd)
 
 static inline int pmd_present(pmd_t pmd)
 {
+#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
+	if (unlikely(pmd_val(pmd) & _PAGE_HUGE))
+		return pmd_val(pmd) & _PAGE_PRESENT;
+#endif
+
 	return pmd_val(pmd) != (unsigned long) invalid_pte_table;
 }
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V4 04/10] MIPS: Add __cpu_full_name[] to make CPU names more human-readable
  2018-09-05  9:33 [PATCH V4 00/10] MIPS: Loongson: new features and improvements Huacai Chen
                   ` (2 preceding siblings ...)
  2018-09-05  9:33 ` [PATCH V4 03/10] MIPS: Ensure pmd_present() returns false after pmd_mknotpresent() Huacai Chen
@ 2018-09-05  9:33 ` Huacai Chen
  2018-09-05  9:33 ` [PATCH V4 05/10] MIPS: Align kernel load address to 64KB Huacai Chen
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2018-09-05  9:33 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Paul Burton, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, Huacai Chen

In /proc/cpuinfo, we keep "cpu model" as is, since GCC should use it
for -march=native. Besides, we add __cpu_full_name[] to describe the
processor in a more human-readable manner. The full name is displayed
as "model name" in cpuinfo, which is needed by some userspace tools
such as gnome-system-monitor.

The CPU frequency in "model name" is the default value (highest), and
there is also a "CPU MHz" whose value can be changed by cpufreq.

This is only used by Loongson now (ICT is dropped in cpu name, and cpu
name can be overwritten by BIOS).

Reviewed-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/cpu-info.h                   |  2 ++
 arch/mips/include/asm/mach-loongson64/boot_param.h |  1 +
 arch/mips/include/asm/time.h                       |  2 ++
 arch/mips/kernel/cpu-probe.c                       | 25 ++++++++++++++++------
 arch/mips/kernel/proc.c                            |  7 ++++++
 arch/mips/kernel/time.c                            |  2 ++
 arch/mips/loongson64/common/env.c                  | 13 +++++++++++
 arch/mips/loongson64/loongson-3/smp.c              |  1 +
 arch/mips/loongson64/loongson-3/smp.h              |  1 +
 9 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/asm/cpu-info.h b/arch/mips/include/asm/cpu-info.h
index a41059d..6128a37 100644
--- a/arch/mips/include/asm/cpu-info.h
+++ b/arch/mips/include/asm/cpu-info.h
@@ -116,7 +116,9 @@ extern void cpu_probe(void);
 extern void cpu_report(void);
 
 extern const char *__cpu_name[];
+extern const char *__cpu_full_name[];
 #define cpu_name_string()	__cpu_name[raw_smp_processor_id()]
+#define cpu_full_name_string()	__cpu_full_name[raw_smp_processor_id()]
 
 struct seq_file;
 struct notifier_block;
diff --git a/arch/mips/include/asm/mach-loongson64/boot_param.h b/arch/mips/include/asm/mach-loongson64/boot_param.h
index 8c286be..1d69e4c 100644
--- a/arch/mips/include/asm/mach-loongson64/boot_param.h
+++ b/arch/mips/include/asm/mach-loongson64/boot_param.h
@@ -58,6 +58,7 @@ struct efi_cpuinfo_loongson {
 	u16 reserved_cores_mask;
 	u32 cpu_clock_freq; /* cpu_clock */
 	u32 nr_cpus;
+	char cpuname[64];
 } __packed;
 
 #define MAX_UARTS 64
diff --git a/arch/mips/include/asm/time.h b/arch/mips/include/asm/time.h
index b85ec64..5cbfbec 100644
--- a/arch/mips/include/asm/time.h
+++ b/arch/mips/include/asm/time.h
@@ -26,6 +26,8 @@ extern spinlock_t rtc_lock;
  */
 extern void plat_time_init(void);
 
+extern unsigned int mips_cpu_frequency;
+
 /*
  * mips_hpt_frequency - must be set if you intend to use an R4k-compatible
  * counter as a timer interrupt source.
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index d535fc7..22d9906 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -1472,30 +1472,40 @@ static inline void cpu_probe_legacy(struct cpuinfo_mips *c, unsigned int cpu)
 		switch (c->processor_id & PRID_REV_MASK) {
 		case PRID_REV_LOONGSON2E:
 			c->cputype = CPU_LOONGSON2;
-			__cpu_name[cpu] = "ICT Loongson-2";
+			__cpu_name[cpu] = "Loongson-2";
 			set_elf_platform(cpu, "loongson2e");
 			set_isa(c, MIPS_CPU_ISA_III);
 			c->fpu_msk31 |= FPU_CSR_CONDX;
+			__cpu_full_name[cpu] = "Loongson-2E";
 			break;
 		case PRID_REV_LOONGSON2F:
 			c->cputype = CPU_LOONGSON2;
-			__cpu_name[cpu] = "ICT Loongson-2";
+			__cpu_name[cpu] = "Loongson-2";
 			set_elf_platform(cpu, "loongson2f");
 			set_isa(c, MIPS_CPU_ISA_III);
 			c->fpu_msk31 |= FPU_CSR_CONDX;
+			__cpu_full_name[cpu] = "Loongson-2F";
 			break;
 		case PRID_REV_LOONGSON3A_R1:
 			c->cputype = CPU_LOONGSON3;
-			__cpu_name[cpu] = "ICT Loongson-3";
+			__cpu_name[cpu] = "Loongson-3";
 			set_elf_platform(cpu, "loongson3a");
 			set_isa(c, MIPS_CPU_ISA_M64R1);
+			__cpu_full_name[cpu] = "Loongson-3A R1 (Loongson-3A1000)";
 			break;
 		case PRID_REV_LOONGSON3B_R1:
+			c->cputype = CPU_LOONGSON3;
+			__cpu_name[cpu] = "Loongson-3";
+			set_elf_platform(cpu, "loongson3b");
+			set_isa(c, MIPS_CPU_ISA_M64R1);
+			__cpu_full_name[cpu] = "Loongson-3B R1 (Loongson-3B1000)";
+			break;
 		case PRID_REV_LOONGSON3B_R2:
 			c->cputype = CPU_LOONGSON3;
-			__cpu_name[cpu] = "ICT Loongson-3";
+			__cpu_name[cpu] = "Loongson-3";
 			set_elf_platform(cpu, "loongson3b");
 			set_isa(c, MIPS_CPU_ISA_M64R1);
+			__cpu_full_name[cpu] = "Loongson-3B R2 (Loongson-3B1500)";
 			break;
 		}
 
@@ -1845,16 +1855,18 @@ static inline void cpu_probe_loongson(struct cpuinfo_mips *c, unsigned int cpu)
 		switch (c->processor_id & PRID_REV_MASK) {
 		case PRID_REV_LOONGSON3A_R2:
 			c->cputype = CPU_LOONGSON3;
-			__cpu_name[cpu] = "ICT Loongson-3";
+			__cpu_name[cpu] = "Loongson-3";
 			set_elf_platform(cpu, "loongson3a");
 			set_isa(c, MIPS_CPU_ISA_M64R2);
+			__cpu_full_name[cpu] = "Loongson-3A R2 (Loongson-3A2000)";
 			break;
 		case PRID_REV_LOONGSON3A_R3_0:
 		case PRID_REV_LOONGSON3A_R3_1:
 			c->cputype = CPU_LOONGSON3;
-			__cpu_name[cpu] = "ICT Loongson-3";
+			__cpu_name[cpu] = "Loongson-3";
 			set_elf_platform(cpu, "loongson3a");
 			set_isa(c, MIPS_CPU_ISA_M64R2);
+			__cpu_full_name[cpu] = "Loongson-3A R3 (Loongson-3A3000)";
 			break;
 		}
 
@@ -1974,6 +1986,7 @@ EXPORT_SYMBOL(__ua_limit);
 #endif
 
 const char *__cpu_name[NR_CPUS];
+const char *__cpu_full_name[NR_CPUS];
 const char *__elf_platform;
 
 void cpu_probe(void)
diff --git a/arch/mips/kernel/proc.c b/arch/mips/kernel/proc.c
index b2de408..b4c31b0 100644
--- a/arch/mips/kernel/proc.c
+++ b/arch/mips/kernel/proc.c
@@ -15,6 +15,7 @@
 #include <asm/mipsregs.h>
 #include <asm/processor.h>
 #include <asm/prom.h>
+#include <asm/time.h>
 
 unsigned int vced_count, vcei_count;
 
@@ -63,6 +64,12 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 	seq_printf(m, fmt, __cpu_name[n],
 		      (version >> 4) & 0x0f, version & 0x0f,
 		      (fp_vers >> 4) & 0x0f, fp_vers & 0x0f);
+	if (__cpu_full_name[n])
+		seq_printf(m, "model name\t\t: %s\n", __cpu_full_name[n]);
+	if (mips_cpu_frequency)
+		seq_printf(m, "CPU MHz\t\t\t: %u.%02u\n",
+			      mips_cpu_frequency / 1000000,
+			      (mips_cpu_frequency / 10000) % 100);
 	seq_printf(m, "BogoMIPS\t\t: %u.%02u\n",
 		      cpu_data[n].udelay_val / (500000/HZ),
 		      (cpu_data[n].udelay_val / (5000/HZ)) % 100);
diff --git a/arch/mips/kernel/time.c b/arch/mips/kernel/time.c
index bfe02de..fc0323e 100644
--- a/arch/mips/kernel/time.c
+++ b/arch/mips/kernel/time.c
@@ -54,6 +54,8 @@ EXPORT_SYMBOL(perf_irq);
  * 2) calculate a couple of cached variables for later usage
  */
 
+unsigned int mips_cpu_frequency;
+EXPORT_SYMBOL_GPL(mips_cpu_frequency);
 unsigned int mips_hpt_frequency;
 EXPORT_SYMBOL_GPL(mips_hpt_frequency);
 
diff --git a/arch/mips/loongson64/common/env.c b/arch/mips/loongson64/common/env.c
index 8f68ee0..2928ac5 100644
--- a/arch/mips/loongson64/common/env.c
+++ b/arch/mips/loongson64/common/env.c
@@ -18,6 +18,7 @@
  * option) any later version.
  */
 #include <linux/export.h>
+#include <asm/time.h>
 #include <asm/bootinfo.h>
 #include <loongson.h>
 #include <boot_param.h>
@@ -25,6 +26,7 @@
 
 u32 cpu_clock_freq;
 EXPORT_SYMBOL(cpu_clock_freq);
+char cpu_full_name[64];
 struct efi_memory_map_loongson *loongson_memmap;
 struct loongson_system_configuration loongson_sysconf;
 
@@ -45,6 +47,7 @@ do {									\
 void __init prom_init_env(void)
 {
 	/* pmon passes arguments in 32bit pointers */
+	char freq[12];
 	unsigned int processor_id;
 
 #ifndef CONFIG_LEFI_FIRMWARE_INTERFACE
@@ -151,6 +154,10 @@ void __init prom_init_env(void)
 	loongson_sysconf.nr_nodes = (loongson_sysconf.nr_cpus +
 		loongson_sysconf.cores_per_node - 1) /
 		loongson_sysconf.cores_per_node;
+	if (!strncmp(ecpu->cpuname, "Loongson", 8))
+		strncpy(cpu_full_name, ecpu->cpuname, sizeof(cpu_full_name));
+	if (cpu_full_name[0] == 0)
+		strncpy(cpu_full_name, __cpu_full_name[0], sizeof(cpu_full_name));
 
 	loongson_sysconf.pci_mem_start_addr = eirq_source->pci_mem_start_addr;
 	loongson_sysconf.pci_mem_end_addr = eirq_source->pci_mem_end_addr;
@@ -211,5 +218,11 @@ void __init prom_init_env(void)
 			break;
 		}
 	}
+	mips_cpu_frequency = cpu_clock_freq;
 	pr_info("CpuClock = %u\n", cpu_clock_freq);
+
+	/* Append default cpu frequency with round-off */
+	sprintf(freq, " @ %uMHz", (cpu_clock_freq + 500000) / 1000000);
+	strncat(cpu_full_name, freq, sizeof(cpu_full_name));
+	__cpu_full_name[0] = cpu_full_name;
 }
diff --git a/arch/mips/loongson64/loongson-3/smp.c b/arch/mips/loongson64/loongson-3/smp.c
index fea95d0..470e9c1 100644
--- a/arch/mips/loongson64/loongson-3/smp.c
+++ b/arch/mips/loongson64/loongson-3/smp.c
@@ -340,6 +340,7 @@ static void loongson3_init_secondary(void)
 		initcount = core0_c0count[cpu] + i/2;
 
 	write_c0_count(initcount);
+	__cpu_full_name[cpu] = cpu_full_name;
 }
 
 static void loongson3_smp_finish(void)
diff --git a/arch/mips/loongson64/loongson-3/smp.h b/arch/mips/loongson64/loongson-3/smp.h
index 957bde8..6112d9d 100644
--- a/arch/mips/loongson64/loongson-3/smp.h
+++ b/arch/mips/loongson64/loongson-3/smp.h
@@ -3,6 +3,7 @@
 #define __LOONGSON_SMP_H_
 
 /* for Loongson-3 smp support */
+extern char cpu_full_name[64];
 extern unsigned long long smp_group[4];
 
 /* 4 groups(nodes) in maximum in numa case */
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V4 05/10] MIPS: Align kernel load address to 64KB
  2018-09-05  9:33 [PATCH V4 00/10] MIPS: Loongson: new features and improvements Huacai Chen
                   ` (3 preceding siblings ...)
  2018-09-05  9:33 ` [PATCH V4 04/10] MIPS: Add __cpu_full_name[] to make CPU names more human-readable Huacai Chen
@ 2018-09-05  9:33 ` Huacai Chen
  2018-09-05  9:33 ` [PATCH V4 06/10] MIPS: Reserve extra memory for crash dump Huacai Chen
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2018-09-05  9:33 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Paul Burton, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, Huacai Chen, # 2 . 6 . 36+

KEXEC needs the new kernel's load address to be aligned on a page
boundary (see sanity_check_segment_list()), but on MIPS the default
vmlinuz load address is only explicitly aligned to 16 bytes.

Since the largest PAGE_SIZE supported by MIPS kernels is 64KB, increase
the alignment calculated by calc_vmlinuz_load_addr to 64KB.

Cc: <stable@vger.kernel.org>  # 2.6.36+
Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/boot/compressed/calc_vmlinuz_load_addr.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/mips/boot/compressed/calc_vmlinuz_load_addr.c b/arch/mips/boot/compressed/calc_vmlinuz_load_addr.c
index 37fe58c..542c3ed 100644
--- a/arch/mips/boot/compressed/calc_vmlinuz_load_addr.c
+++ b/arch/mips/boot/compressed/calc_vmlinuz_load_addr.c
@@ -13,6 +13,7 @@
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include "../../../../include/linux/sizes.h"
 
 int main(int argc, char *argv[])
 {
@@ -45,11 +46,11 @@ int main(int argc, char *argv[])
 	vmlinuz_load_addr = vmlinux_load_addr + vmlinux_size;
 
 	/*
-	 * Align with 16 bytes: "greater than that used for any standard data
-	 * types by a MIPS compiler." -- See MIPS Run Linux (Second Edition).
+	 * Align with 64KB: KEXEC needs load sections to be aligned to PAGE_SIZE,
+	 * which may be as large as 64KB depending on the kernel configuration.
 	 */
 
-	vmlinuz_load_addr += (16 - vmlinux_size % 16);
+	vmlinuz_load_addr += (SZ_64K - vmlinux_size % SZ_64K);
 
 	printf("0x%llx\n", vmlinuz_load_addr);
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V4 06/10] MIPS: Reserve extra memory for crash dump
  2018-09-05  9:33 [PATCH V4 00/10] MIPS: Loongson: new features and improvements Huacai Chen
                   ` (4 preceding siblings ...)
  2018-09-05  9:33 ` [PATCH V4 05/10] MIPS: Align kernel load address to 64KB Huacai Chen
@ 2018-09-05  9:33 ` Huacai Chen
  2018-09-05  9:33 ` [PATCH V4 07/10] MIPS: Loongson64: Add kexec/kdump support Huacai Chen
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2018-09-05  9:33 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Paul Burton, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, Huacai Chen

Traditionally, MIPS's contiguous low memory can be as less as 256M, so
crashkernel=X@Y may be unable to large enough in some cases. Moreover,
for the "multi numa node + sparse memory model" case, it is attempt to
allocate section_mem_maps on every node. Thus, if the total memory of a
node is more than 1GB, we reserve the top 128MB for the crash kernel.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/kernel/setup.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index c71d1eb..9f2d5d9 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -28,6 +28,7 @@
 #include <linux/dma-contiguous.h>
 #include <linux/decompress/generic.h>
 #include <linux/of_fdt.h>
+#include <linux/crash_dump.h>
 
 #include <asm/addrspace.h>
 #include <asm/bootinfo.h>
@@ -830,6 +831,52 @@ static void __init request_crashkernel(struct resource *res)
 #define BUILTIN_EXTEND_WITH_PROM	\
 	IS_ENABLED(CONFIG_MIPS_CMDLINE_BUILTIN_EXTEND)
 
+/* Traditionally, MIPS's contiguous low memory is 256M, so crashkernel=X@Y is
+ * unable to be large enough in some cases. Thus, if the total memory of a node
+ * is more than 1GB, we reserve the top 128MB for the crash kernel */
+static void reserve_crashm_region(int node, unsigned long s0, unsigned long e0)
+{
+#ifdef CONFIG_KEXEC
+	if (crashk_res.start == crashk_res.end)
+		return;
+
+	if ((e0 - s0) <= (SZ_1G >> PAGE_SHIFT))
+		return;
+
+	s0 = e0 - (SZ_128M >> PAGE_SHIFT);
+
+	reserve_bootmem_node(NODE_DATA(node), PFN_PHYS(s0),
+			(e0 - s0) << PAGE_SHIFT, BOOTMEM_DEFAULT);
+#endif
+}
+
+static void reserve_oldmem_region(int node, unsigned long s0, unsigned long e0)
+{
+#ifdef CONFIG_CRASH_DUMP
+	unsigned long s1, e1;
+
+	if (!is_kdump_kernel())
+		return;
+
+	if ((e0 - s0) > (SZ_1G >> PAGE_SHIFT))
+		e0 = e0 - (SZ_128M >> PAGE_SHIFT);
+
+	/* boot_mem_map.map[0] is crashk_res reserved by primary kernel */
+	s1 = PFN_UP(boot_mem_map.map[0].addr);
+	e1 = PFN_DOWN(boot_mem_map.map[0].addr + boot_mem_map.map[0].size);
+
+	if (node == 0) {
+		reserve_bootmem_node(NODE_DATA(node), PFN_PHYS(s0),
+				(s1 - s0) << PAGE_SHIFT, BOOTMEM_DEFAULT);
+		reserve_bootmem_node(NODE_DATA(node), PFN_PHYS(e1),
+				(e0 - e1) << PAGE_SHIFT, BOOTMEM_DEFAULT);
+	} else {
+		reserve_bootmem_node(NODE_DATA(node), PFN_PHYS(s0),
+				(e0 - s0) << PAGE_SHIFT, BOOTMEM_DEFAULT);
+	}
+#endif
+}
+
 /*
  * arch_mem_init - initialize memory management subsystem
  *
@@ -842,6 +843,8 @@ static void __init request_crashkernel(struct resource *res)
  */
 static void __init arch_mem_init(char **cmdline_p)
 {
+	unsigned int node;
+	unsigned long start_pfn, end_pfn;
 	struct memblock_region *reg;
 	extern void plat_mem_setup(void);
 
@@ -923,6 +972,12 @@ static void __init arch_mem_init(char **cmdline_p)
 				crashk_res.end - crashk_res.start + 1,
 				BOOTMEM_DEFAULT);
 #endif
+	for_each_online_node(node) {
+		get_pfn_range_for_nid(node, &start_pfn, &end_pfn);
+		reserve_crashm_region(node, start_pfn, end_pfn);
+		reserve_oldmem_region(node, start_pfn, end_pfn);
+	}
+
 	device_tree_init();
 	sparse_init();
 	plat_swiotlb_setup();
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V4 07/10] MIPS: Loongson64: Add kexec/kdump support
  2018-09-05  9:33 [PATCH V4 00/10] MIPS: Loongson: new features and improvements Huacai Chen
                   ` (5 preceding siblings ...)
  2018-09-05  9:33 ` [PATCH V4 06/10] MIPS: Reserve extra memory for crash dump Huacai Chen
@ 2018-09-05  9:33 ` Huacai Chen
  2018-09-05  9:33 ` [PATCH V4 08/10] MIPS: Loongson-3: Fix CPU UART irq delivery problem Huacai Chen
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2018-09-05  9:33 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Paul Burton, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, Huacai Chen, Eric Biederman

Add kexec/kdump support for Loongson64 by:
1, Set maximum physical address to 0x20000000000L (2^41, Node-0's
   address space for Loongson64) in CONFIG_64BIT;
2, Provide Loongson-specific kexec functions: loongson_kexec_prepare,
   loongson_kexec_shutdown and loongson_crash_shutdown;
3, Provide Loongson-specific code in kexec_smp_wait;
4, Clear mailbox in loongson3_smp_setup() since KEXEC bypass BIOS;
5, KEXEC always run at boot-cpu, but KDUMP may triggered at non-boot-
   cpu. Loongson64 assume boot-cpu is the first possible cpu, so fix
   boot_cpu_id in prom_init_env();

Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/kexec.h         |  15 +++++
 arch/mips/kernel/relocate_kernel.S    |  26 ++++++++
 arch/mips/loongson64/common/env.c     |   7 ++
 arch/mips/loongson64/common/reset.c   | 120 ++++++++++++++++++++++++++++++++++
 arch/mips/loongson64/loongson-3/smp.c |   5 ++
 5 files changed, 173 insertions(+)

diff --git a/arch/mips/include/asm/kexec.h b/arch/mips/include/asm/kexec.h
index 493a3cc..ffd4cee 100644
--- a/arch/mips/include/asm/kexec.h
+++ b/arch/mips/include/asm/kexec.h
@@ -11,12 +11,26 @@
 
 #include <asm/stacktrace.h>
 
+#ifdef CONFIG_32BIT
+
 /* Maximum physical address we can use pages from */
 #define KEXEC_SOURCE_MEMORY_LIMIT (0x20000000)
 /* Maximum address we can reach in physical address mode */
 #define KEXEC_DESTINATION_MEMORY_LIMIT (0x20000000)
  /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT (0x20000000)
+
+#else
+
+/* Maximum physical address we can use pages from */
+#define KEXEC_SOURCE_MEMORY_LIMIT (0x20000000000L)
+/* Maximum address we can reach in physical address mode */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (0x20000000000L)
+ /* Maximum address we can use for the control code buffer */
+#define KEXEC_CONTROL_MEMORY_LIMIT (0x20000000000L)
+
+#endif
+
 /* Reserve 3*4096 bytes for board-specific info */
 #define KEXEC_CONTROL_PAGE_SIZE (4096 + 3*4096)
 
@@ -36,6 +50,7 @@ static inline void crash_setup_regs(struct pt_regs *newregs,
 #ifdef CONFIG_KEXEC
 struct kimage;
 extern unsigned long kexec_args[4];
+extern const size_t relocate_new_kernel_size;
 extern int (*_machine_kexec_prepare)(struct kimage *);
 extern void (*_machine_kexec_shutdown)(void);
 extern void (*_machine_crash_shutdown)(struct pt_regs *regs);
diff --git a/arch/mips/kernel/relocate_kernel.S b/arch/mips/kernel/relocate_kernel.S
index 419c921..da281c5 100644
--- a/arch/mips/kernel/relocate_kernel.S
+++ b/arch/mips/kernel/relocate_kernel.S
@@ -135,6 +135,32 @@ LEAF(kexec_smp_wait)
 #else
 	sync
 #endif
+
+#ifdef CONFIG_CPU_LOONGSON3
+	/* s0:prid s1:initfn */
+	/* t0:base t1:cpuid t2:node t9:count */
+	mfc0  t1, CP0_EBASE
+	andi  t1, MIPS_EBASE_CPUNUM
+	dli   t0, 0x900000003ff01000 /* mailbox base */
+	dins  t0, t1, 8, 2        /* insert core id*/
+	dext  t2, t1, 2, 2
+	dins  t0, t2, 44, 2       /* insert node id */
+	mfc0  s0, CP0_PRID
+	andi  s0, s0, 0xf
+	blt   s0, 0x6, 1f         /* Loongson-3A1000 */
+	bgt   s0, 0x7, 1f         /* Loongson-3A2000/3A3000 */
+	dins  t0, t2, 14, 2       /* Loongson-3B1000/3B1500 need bit 15~14 */
+1:	li    t9, 0x100           /* wait for init loop */
+2:	addiu t9, -1              /* limit mailbox access */
+	bnez  t9, 2b
+	ld    s1, 0x20(t0)        /* get PC via mailbox */
+	beqz  s1, 1b
+	ld    sp, 0x28(t0)        /* get SP via mailbox */
+	ld    gp, 0x30(t0)        /* get GP via mailbox */
+	ld    a1, 0x38(t0)
+	jr    s1                  /* jump to initial PC */
+#endif
+
 	j		s1
 	END(kexec_smp_wait)
 #endif
diff --git a/arch/mips/loongson64/common/env.c b/arch/mips/loongson64/common/env.c
index 2928ac5..4382e70 100644
--- a/arch/mips/loongson64/common/env.c
+++ b/arch/mips/loongson64/common/env.c
@@ -149,6 +149,13 @@ void __init prom_init_env(void)
 	loongson_sysconf.nr_cpus = ecpu->nr_cpus;
 	loongson_sysconf.boot_cpu_id = ecpu->cpu_startup_core_id;
 	loongson_sysconf.reserved_cpus_mask = ecpu->reserved_cores_mask;
+#ifdef CONFIG_KEXEC
+	loongson_sysconf.boot_cpu_id = get_ebase_cpunum();
+	loongson_sysconf.reserved_cpus_mask |=
+		(1 << loongson_sysconf.boot_cpu_id) - 1;
+	pr_info("Boot CPU ID is being fixed from %d to %d\n",
+		ecpu->cpu_startup_core_id, loongson_sysconf.boot_cpu_id);
+#endif
 	if (ecpu->nr_cpus > NR_CPUS || ecpu->nr_cpus == 0)
 		loongson_sysconf.nr_cpus = NR_CPUS;
 	loongson_sysconf.nr_nodes = (loongson_sysconf.nr_cpus +
diff --git a/arch/mips/loongson64/common/reset.c b/arch/mips/loongson64/common/reset.c
index a60715e..3d4a190 100644
--- a/arch/mips/loongson64/common/reset.c
+++ b/arch/mips/loongson64/common/reset.c
@@ -9,9 +9,14 @@
  * Copyright (C) 2009 Lemote, Inc.
  * Author: Zhangjin Wu, wuzhangjin@gmail.com
  */
+#include <linux/cpu.h>
+#include <linux/delay.h>
 #include <linux/init.h>
+#include <linux/kexec.h>
 #include <linux/pm.h>
+#include <linux/slab.h>
 
+#include <asm/bootinfo.h>
 #include <asm/idle.h>
 #include <asm/reboot.h>
 
@@ -80,12 +85,127 @@ static void loongson_halt(void)
 	}
 }
 
+#ifdef CONFIG_KEXEC
+
+/* 0X80000000~0X80200000 is safe */
+#define MAX_ARGS	64
+#define KEXEC_CTRL_CODE	0xFFFFFFFF80100000UL
+#define KEXEC_ARGV_ADDR	0xFFFFFFFF80108000UL
+#define KEXEC_ARGV_SIZE	3060
+#define KEXEC_ENVP_SIZE	4500
+
+void *kexec_argv;
+void *kexec_envp;
+
+static int loongson_kexec_prepare(struct kimage *image)
+{
+	int i, argc = 0;
+	unsigned int *argv;
+	char *str, *ptr, *bootloader = "kexec";
+
+	/* argv at offset 0, argv[] at offset KEXEC_ARGV_SIZE/2 */
+	argv = (unsigned int *)kexec_argv;
+	argv[argc++] = (unsigned int)(KEXEC_ARGV_ADDR + KEXEC_ARGV_SIZE/2);
+
+	for (i = 0; i < image->nr_segments; i++) {
+		if (!strncmp(bootloader, (char *)image->segment[i].buf,
+				strlen(bootloader))) {
+			/*
+			 * convert command line string to array
+			 * of parameters (as bootloader does).
+			 */
+			int offt;
+			memcpy(kexec_argv + KEXEC_ARGV_SIZE/2,
+				image->segment[i].buf, KEXEC_ARGV_SIZE/2);
+			str = (char *)kexec_argv + KEXEC_ARGV_SIZE/2;
+			ptr = strchr(str, ' ');
+
+			while (ptr && (argc < MAX_ARGS)) {
+				*ptr = '\0';
+				if (ptr[1] != ' ') {
+					offt = (int)(ptr - str + 1);
+					argv[argc] = KEXEC_ARGV_ADDR +
+						KEXEC_ARGV_SIZE/2 + offt;
+					argc++;
+				}
+				ptr = strchr(ptr + 1, ' ');
+			}
+			break;
+		}
+	}
+
+	kexec_args[0] = argc;
+	kexec_args[1] = fw_arg1;
+	kexec_args[2] = fw_arg2;
+	image->control_code_page = virt_to_page((void *)KEXEC_CTRL_CODE);
+
+	return 0;
+}
+
+#ifdef CONFIG_SMP
+static void kexec_smp_down(void *ignored)
+{
+	int cpu = smp_processor_id();
+
+	local_irq_disable();
+	set_cpu_online(cpu, false);
+	while (!atomic_read(&kexec_ready_to_reboot))
+		cpu_relax();
+
+	asm volatile (
+	"	sync					\n"
+	"	synci	($0)				\n");
+
+	relocated_kexec_smp_wait(NULL);
+}
+#endif
+
+static void loongson_kexec_shutdown(void)
+{
+#ifdef CONFIG_SMP
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		if (!cpu_online(cpu))
+			cpu_up(cpu); /* All cpus go to reboot_code_buffer */
+
+	smp_call_function(kexec_smp_down, NULL, 0);
+	smp_wmb();
+	while (num_online_cpus() > 1) {
+		mdelay(1);
+		cpu_relax();
+	}
+#endif
+	memcpy((void *)fw_arg1, kexec_argv, KEXEC_ARGV_SIZE);
+	memcpy((void *)fw_arg2, kexec_envp, KEXEC_ENVP_SIZE);
+}
+
+static void loongson_crash_shutdown(struct pt_regs *regs)
+{
+	default_machine_crash_shutdown(regs);
+	memcpy((void *)fw_arg1, kexec_argv, KEXEC_ARGV_SIZE);
+	memcpy((void *)fw_arg2, kexec_envp, KEXEC_ENVP_SIZE);
+}
+
+#endif
+
 static int __init mips_reboot_setup(void)
 {
 	_machine_restart = loongson_restart;
 	_machine_halt = loongson_halt;
 	pm_power_off = loongson_poweroff;
 
+#ifdef CONFIG_KEXEC
+	kexec_argv = kmalloc(KEXEC_ARGV_SIZE, GFP_KERNEL);
+	kexec_envp = kmalloc(KEXEC_ENVP_SIZE, GFP_KERNEL);
+	fw_arg1 = KEXEC_ARGV_ADDR;
+	memcpy(kexec_envp, (void *)fw_arg2, KEXEC_ENVP_SIZE);
+
+	_machine_kexec_prepare = loongson_kexec_prepare;
+	_machine_kexec_shutdown = loongson_kexec_shutdown;
+	_machine_crash_shutdown = loongson_crash_shutdown;
+#endif
+
 	return 0;
 }
 
diff --git a/arch/mips/loongson64/loongson-3/smp.c b/arch/mips/loongson64/loongson-3/smp.c
index 470e9c1..594fa17 100644
--- a/arch/mips/loongson64/loongson-3/smp.c
+++ b/arch/mips/loongson64/loongson-3/smp.c
@@ -387,6 +387,11 @@ static void __init loongson3_smp_setup(void)
 	ipi_status0_regs_init();
 	ipi_en0_regs_init();
 	ipi_mailbox_buf_init();
+
+	/* BIOS clear the mailbox, but KEXEC bypass BIOS so clear here */
+	for (i = 0; i < loongson_sysconf.nr_cpus; i++)
+		loongson3_ipi_write64(0, (void *)(ipi_mailbox_buf[i]+0x0));
+
 	cpu_set_core(&cpu_data[0],
 		     cpu_logical_map(0) % loongson_sysconf.cores_per_package);
 	cpu_data[0].package = cpu_logical_map(0) / loongson_sysconf.cores_per_package;
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V4 08/10] MIPS: Loongson-3: Fix CPU UART irq delivery problem
  2018-09-05  9:33 [PATCH V4 00/10] MIPS: Loongson: new features and improvements Huacai Chen
                   ` (6 preceding siblings ...)
  2018-09-05  9:33 ` [PATCH V4 07/10] MIPS: Loongson64: Add kexec/kdump support Huacai Chen
@ 2018-09-05  9:33 ` Huacai Chen
  2018-09-05  9:33 ` [PATCH V4 09/10] MIPS: Loongson-3: Fix BRIDGE " Huacai Chen
  2018-09-05  9:33 ` [PATCH V4 10/10] MIPS: Loongson: Introduce and use WAR_LLSC_MB Huacai Chen
  9 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2018-09-05  9:33 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Paul Burton, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, Huacai Chen, stable

Masking/unmasking the CPU UART irq in CP0_Status (and redirecting it to
other CPUs) may cause interrupts be lost, especially in multi-package
machines (Package-0's UART irq cannot be delivered to others). So make
mask_loongson_irq() and unmask_loongson_irq() be no-ops.

The original problem (UART IRQ may deliver to any core) is also because
of masking/unmasking the CPU UART irq in CP0_Status. So it is safe to
remove all of the stuff.

Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/loongson64/loongson-3/irq.c | 43 +++--------------------------------
 1 file changed, 3 insertions(+), 40 deletions(-)

diff --git a/arch/mips/loongson64/loongson-3/irq.c b/arch/mips/loongson64/loongson-3/irq.c
index cbeb20f..2e115ab 100644
--- a/arch/mips/loongson64/loongson-3/irq.c
+++ b/arch/mips/loongson64/loongson-3/irq.c
@@ -102,45 +102,8 @@ static struct irqaction cascade_irqaction = {
 	.name = "cascade",
 };
 
-static inline void mask_loongson_irq(struct irq_data *d)
-{
-	clear_c0_status(0x100 << (d->irq - MIPS_CPU_IRQ_BASE));
-	irq_disable_hazard();
-
-	/* Workaround: UART IRQ may deliver to any core */
-	if (d->irq == LOONGSON_UART_IRQ) {
-		int cpu = smp_processor_id();
-		int node_id = cpu_logical_map(cpu) / loongson_sysconf.cores_per_node;
-		int core_id = cpu_logical_map(cpu) % loongson_sysconf.cores_per_node;
-		u64 intenclr_addr = smp_group[node_id] |
-			(u64)(&LOONGSON_INT_ROUTER_INTENCLR);
-		u64 introuter_lpc_addr = smp_group[node_id] |
-			(u64)(&LOONGSON_INT_ROUTER_LPC);
-
-		*(volatile u32 *)intenclr_addr = 1 << 10;
-		*(volatile u8 *)introuter_lpc_addr = 0x10 + (1<<core_id);
-	}
-}
-
-static inline void unmask_loongson_irq(struct irq_data *d)
-{
-	/* Workaround: UART IRQ may deliver to any core */
-	if (d->irq == LOONGSON_UART_IRQ) {
-		int cpu = smp_processor_id();
-		int node_id = cpu_logical_map(cpu) / loongson_sysconf.cores_per_node;
-		int core_id = cpu_logical_map(cpu) % loongson_sysconf.cores_per_node;
-		u64 intenset_addr = smp_group[node_id] |
-			(u64)(&LOONGSON_INT_ROUTER_INTENSET);
-		u64 introuter_lpc_addr = smp_group[node_id] |
-			(u64)(&LOONGSON_INT_ROUTER_LPC);
-
-		*(volatile u32 *)intenset_addr = 1 << 10;
-		*(volatile u8 *)introuter_lpc_addr = 0x10 + (1<<core_id);
-	}
-
-	set_c0_status(0x100 << (d->irq - MIPS_CPU_IRQ_BASE));
-	irq_enable_hazard();
-}
+static inline void mask_loongson_irq(struct irq_data *d) { }
+static inline void unmask_loongson_irq(struct irq_data *d) { }
 
  /* For MIPS IRQs which shared by all cores */
 static struct irq_chip loongson_irq_chip = {
@@ -183,7 +146,7 @@ void __init mach_init_irq(void)
 	chip->irq_set_affinity = plat_set_irq_affinity;
 
 	irq_set_chip_and_handler(LOONGSON_UART_IRQ,
-			&loongson_irq_chip, handle_level_irq);
+			&loongson_irq_chip, handle_percpu_irq);
 
 	/* setup HT1 irq */
 	setup_irq(LOONGSON_HT1_IRQ, &cascade_irqaction);
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V4 09/10] MIPS: Loongson-3: Fix BRIDGE irq delivery problem
  2018-09-05  9:33 [PATCH V4 00/10] MIPS: Loongson: new features and improvements Huacai Chen
                   ` (7 preceding siblings ...)
  2018-09-05  9:33 ` [PATCH V4 08/10] MIPS: Loongson-3: Fix CPU UART irq delivery problem Huacai Chen
@ 2018-09-05  9:33 ` Huacai Chen
  2018-09-05  9:33 ` [PATCH V4 10/10] MIPS: Loongson: Introduce and use WAR_LLSC_MB Huacai Chen
  9 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2018-09-05  9:33 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Paul Burton, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, Huacai Chen, stable

After commit e509bd7da149dc349160 ("genirq: Allow migration of chained
interrupts by installing default action") Loongson-3 fails at here:

setup_irq(LOONGSON_HT1_IRQ, &cascade_irqaction);

This is because both chained_action and cascade_irqaction don't have
IRQF_SHARED flag. This will cause Loongson-3 resume fails because HPET
timer interrupt can't be delivered during S3. So we set the irqchip of
the chained irq to loongson_irq_chip which doesn't disable the chained
irq in CP0.Status.

Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/mach-loongson64/irq.h |  2 +-
 arch/mips/loongson64/loongson-3/irq.c       | 13 +++----------
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/mips/include/asm/mach-loongson64/irq.h b/arch/mips/include/asm/mach-loongson64/irq.h
index 3644b68..be9f727 100644
--- a/arch/mips/include/asm/mach-loongson64/irq.h
+++ b/arch/mips/include/asm/mach-loongson64/irq.h
@@ -10,7 +10,7 @@
 #define MIPS_CPU_IRQ_BASE 56
 
 #define LOONGSON_UART_IRQ   (MIPS_CPU_IRQ_BASE + 2) /* UART */
-#define LOONGSON_HT1_IRQ    (MIPS_CPU_IRQ_BASE + 3) /* HT1 */
+#define LOONGSON_BRIDGE_IRQ (MIPS_CPU_IRQ_BASE + 3) /* CASCADE */
 #define LOONGSON_TIMER_IRQ  (MIPS_CPU_IRQ_BASE + 7) /* CPU Timer */
 
 #define LOONGSON_HT1_CFG_BASE		loongson_sysconf.ht_control_base
diff --git a/arch/mips/loongson64/loongson-3/irq.c b/arch/mips/loongson64/loongson-3/irq.c
index 2e115ab..5605061 100644
--- a/arch/mips/loongson64/loongson-3/irq.c
+++ b/arch/mips/loongson64/loongson-3/irq.c
@@ -96,12 +96,6 @@ void mach_irq_dispatch(unsigned int pending)
 	}
 }
 
-static struct irqaction cascade_irqaction = {
-	.handler = no_action,
-	.flags = IRQF_NO_SUSPEND,
-	.name = "cascade",
-};
-
 static inline void mask_loongson_irq(struct irq_data *d) { }
 static inline void unmask_loongson_irq(struct irq_data *d) { }
 
@@ -147,11 +141,10 @@ void __init mach_init_irq(void)
 
 	irq_set_chip_and_handler(LOONGSON_UART_IRQ,
 			&loongson_irq_chip, handle_percpu_irq);
+	irq_set_chip_and_handler(LOONGSON_BRIDGE_IRQ,
+			&loongson_irq_chip, handle_percpu_irq);
 
-	/* setup HT1 irq */
-	setup_irq(LOONGSON_HT1_IRQ, &cascade_irqaction);
-
-	set_c0_status(STATUSF_IP2 | STATUSF_IP6);
+	set_c0_status(STATUSF_IP2 | STATUSF_IP3 | STATUSF_IP6);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V4 10/10] MIPS: Loongson: Introduce and use WAR_LLSC_MB
  2018-09-05  9:33 [PATCH V4 00/10] MIPS: Loongson: new features and improvements Huacai Chen
                   ` (8 preceding siblings ...)
  2018-09-05  9:33 ` [PATCH V4 09/10] MIPS: Loongson-3: Fix BRIDGE " Huacai Chen
@ 2018-09-05  9:33 ` Huacai Chen
  9 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2018-09-05  9:33 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Paul Burton, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, Huacai Chen

On the Loongson-2G/2H/3A/3B there is a hardware flaw that ll/sc and
lld/scd is very weak ordering. We should add sync instructions before
each ll/lld and after the last sc/scd to workaround. Otherwise, this
flaw will cause deadlock occationally (e.g. when doing heavy load test
with LTP).

This patch is not a minimal change (it is very difficult to make a
minimal change), but it is a safest change.

Why disable fix-loongson3-llsc in compiler?
Because compiler fix will cause problems in kernel's .fixup section.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/atomic.h  | 36 ++++++++++++++++++++++++++++--------
 arch/mips/include/asm/barrier.h |  6 ++++++
 arch/mips/include/asm/bitops.h  | 15 +++++++++++++++
 arch/mips/include/asm/cmpxchg.h |  9 +++++++--
 arch/mips/include/asm/edac.h    |  5 ++++-
 arch/mips/include/asm/futex.h   | 18 ++++++++++++------
 arch/mips/include/asm/local.h   | 10 ++++++++--
 arch/mips/include/asm/pgtable.h |  5 ++++-
 arch/mips/kernel/syscall.c      |  2 ++
 arch/mips/loongson64/Platform   |  3 +++
 arch/mips/mm/tlbex.c            | 11 +++++++++++
 11 files changed, 100 insertions(+), 20 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index d4ea7a5..9e91641 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -60,13 +60,16 @@ static __inline__ void atomic_##op(int i, atomic_t * v)			      \
 									      \
 		__asm__ __volatile__(					      \
 		"	.set	"MIPS_ISA_LEVEL"			\n"   \
-		"1:	ll	%0, %1		# atomic_" #op "	\n"   \
+		"1:				# atomic_" #op "	\n"   \
+		__WAR_LLSC_MB						      \
+		"	ll	%0, %1					\n"   \
 		"	" #asm_op " %0, %2				\n"   \
 		"	sc	%0, %1					\n"   \
 		"\t" __scbeqz "	%0, 1b					\n"   \
 		"	.set	mips0					\n"   \
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
 		: "Ir" (i));						      \
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");	      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -86,7 +89,9 @@ static __inline__ int atomic_##op##_return_relaxed(int i, atomic_t * v)	      \
 									      \
 		__asm__ __volatile__(					      \
 		"	.set	"MIPS_ISA_LEVEL"			\n"   \
-		"1:	ll	%1, %2		# atomic_" #op "_return	\n"   \
+		"1:				# atomic_" #op "_return	\n"   \
+		__WAR_LLSC_MB						      \
+		"	ll	%1, %2					\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	sc	%0, %2					\n"   \
 		"\t" __scbeqz "	%0, 1b					\n"   \
@@ -118,7 +123,9 @@ static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v)	      \
 									      \
 		__asm__ __volatile__(					      \
 		"	.set	"MIPS_ISA_LEVEL"			\n"   \
-		"1:	ll	%1, %2		# atomic_fetch_" #op "	\n"   \
+		"1:				# atomic_fetch_" #op "	\n"   \
+		__WAR_LLSC_MB						      \
+		"	ll	%1, %2					\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	sc	%0, %2					\n"   \
 		"\t" __scbeqz "	%0, 1b					\n"   \
@@ -127,6 +134,7 @@ static __inline__ int atomic_fetch_##op##_relaxed(int i, atomic_t * v)	      \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
 		: "Ir" (i));						      \
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");	      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -189,7 +197,9 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v)
 
 		__asm__ __volatile__(
 		"	.set	"MIPS_ISA_LEVEL"			\n"
-		"1:	ll	%1, %2		# atomic_sub_if_positive\n"
+		"1:				# atomic_sub_if_positive\n"
+		__WAR_LLSC_MB
+		"	ll	%1, %2					\n"
 		"	.set	mips0					\n"
 		"	subu	%0, %1, %3				\n"
 		"	move	%1, %0					\n"
@@ -253,13 +263,16 @@ static __inline__ void atomic64_##op(long i, atomic64_t * v)		      \
 									      \
 		__asm__ __volatile__(					      \
 		"	.set	"MIPS_ISA_LEVEL"			\n"   \
-		"1:	lld	%0, %1		# atomic64_" #op "	\n"   \
+		"1:				# atomic64_" #op "	\n"   \
+		__WAR_LLSC_MB						      \
+		"	lld	%0, %1					\n"   \
 		"	" #asm_op " %0, %2				\n"   \
 		"	scd	%0, %1					\n"   \
 		"\t" __scbeqz "	%0, 1b					\n"   \
 		"	.set	mips0					\n"   \
 		: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (v->counter)	      \
 		: "Ir" (i));						      \
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");	      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -279,7 +292,9 @@ static __inline__ long atomic64_##op##_return_relaxed(long i, atomic64_t * v) \
 									      \
 		__asm__ __volatile__(					      \
 		"	.set	"MIPS_ISA_LEVEL"			\n"   \
-		"1:	lld	%1, %2		# atomic64_" #op "_return\n"  \
+		"1:				# atomic64_" #op "_return\n"  \
+		__WAR_LLSC_MB						      \
+		"	lld	%1, %2					\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	scd	%0, %2					\n"   \
 		"\t" __scbeqz "	%0, 1b					\n"   \
@@ -311,7 +326,9 @@ static __inline__ long atomic64_fetch_##op##_relaxed(long i, atomic64_t * v)  \
 									      \
 		__asm__ __volatile__(					      \
 		"	.set	"MIPS_ISA_LEVEL"			\n"   \
-		"1:	lld	%1, %2		# atomic64_fetch_" #op "\n"   \
+		"1:				# atomic64_fetch_" #op "\n"   \
+		__WAR_LLSC_MB						      \
+		"	lld	%1, %2					\n"   \
 		"	" #asm_op " %0, %1, %3				\n"   \
 		"	scd	%0, %2					\n"   \
 		"\t" __scbeqz "	%0, 1b					\n"   \
@@ -320,6 +337,7 @@ static __inline__ long atomic64_fetch_##op##_relaxed(long i, atomic64_t * v)  \
 		: "=&r" (result), "=&r" (temp),				      \
 		  "+" GCC_OFF_SMALL_ASM() (v->counter)			      \
 		: "Ir" (i));						      \
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");	      \
 	} else {							      \
 		unsigned long flags;					      \
 									      \
@@ -383,7 +401,9 @@ static __inline__ long atomic64_sub_if_positive(long i, atomic64_t * v)
 
 		__asm__ __volatile__(
 		"	.set	"MIPS_ISA_LEVEL"			\n"
-		"1:	lld	%1, %2		# atomic64_sub_if_positive\n"
+		"1:				# atomic64_sub_if_positive\n"
+		__WAR_LLSC_MB
+		"	lld	%1, %2					\n"
 		"	dsubu	%0, %1, %3				\n"
 		"	move	%1, %0					\n"
 		"	bltz	%0, 1f					\n"
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index a5eb1bb..d892094 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -203,6 +203,12 @@
 #define __WEAK_LLSC_MB		"		\n"
 #endif
 
+#if defined(CONFIG_LOONGSON3) && defined(CONFIG_SMP) /* Loongson-3's LLSC workaround */
+#define __WAR_LLSC_MB		"	sync	\n"
+#else
+#define __WAR_LLSC_MB		"		\n"
+#endif
+
 #define smp_llsc_mb()	__asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
 
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index da1b8718..0612cff 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -70,17 +70,20 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 	} else if (kernel_uses_llsc && __builtin_constant_p(bit)) {
 		do {
 			__asm__ __volatile__(
+			__WAR_LLSC_MB
 			"	" __LL "%0, %1		# set_bit	\n"
 			"	" __INS "%0, %3, %2, 1			\n"
 			"	" __SC "%0, %1				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
 			: "ir" (bit), "r" (~0));
 		} while (unlikely(!temp));
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");
 #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
 	} else if (kernel_uses_llsc) {
 		do {
 			__asm__ __volatile__(
 			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
+			__WAR_LLSC_MB
 			"	" __LL "%0, %1		# set_bit	\n"
 			"	or	%0, %2				\n"
 			"	" __SC	"%0, %1				\n"
@@ -88,6 +91,7 @@ static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
 			: "ir" (1UL << bit));
 		} while (unlikely(!temp));
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");
 	} else
 		__mips_set_bit(nr, addr);
 }
@@ -122,17 +126,20 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 	} else if (kernel_uses_llsc && __builtin_constant_p(bit)) {
 		do {
 			__asm__ __volatile__(
+			__WAR_LLSC_MB
 			"	" __LL "%0, %1		# clear_bit	\n"
 			"	" __INS "%0, $0, %2, 1			\n"
 			"	" __SC "%0, %1				\n"
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
 			: "ir" (bit));
 		} while (unlikely(!temp));
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");
 #endif /* CONFIG_CPU_MIPSR2 || CONFIG_CPU_MIPSR6 */
 	} else if (kernel_uses_llsc) {
 		do {
 			__asm__ __volatile__(
 			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
+			__WAR_LLSC_MB
 			"	" __LL "%0, %1		# clear_bit	\n"
 			"	and	%0, %2				\n"
 			"	" __SC "%0, %1				\n"
@@ -140,6 +147,7 @@ static inline void clear_bit(unsigned long nr, volatile unsigned long *addr)
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
 			: "ir" (~(1UL << bit)));
 		} while (unlikely(!temp));
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");
 	} else
 		__mips_clear_bit(nr, addr);
 }
@@ -191,6 +199,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 		do {
 			__asm__ __volatile__(
 			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
+			__WAR_LLSC_MB
 			"	" __LL "%0, %1		# change_bit	\n"
 			"	xor	%0, %2				\n"
 			"	" __SC	"%0, %1				\n"
@@ -198,6 +207,7 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr)
 			: "=&r" (temp), "+" GCC_OFF_SMALL_ASM() (*m)
 			: "ir" (1UL << bit));
 		} while (unlikely(!temp));
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");
 	} else
 		__mips_change_bit(nr, addr);
 }
@@ -240,6 +250,7 @@ static inline int test_and_set_bit(unsigned long nr,
 		do {
 			__asm__ __volatile__(
 			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
+			__WAR_LLSC_MB
 			"	" __LL "%0, %1	# test_and_set_bit	\n"
 			"	or	%2, %0, %3			\n"
 			"	" __SC	"%2, %1				\n"
@@ -294,6 +305,7 @@ static inline int test_and_set_bit_lock(unsigned long nr,
 		do {
 			__asm__ __volatile__(
 			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
+			__WAR_LLSC_MB
 			"	" __LL "%0, %1	# test_and_set_bit	\n"
 			"	or	%2, %0, %3			\n"
 			"	" __SC	"%2, %1				\n"
@@ -350,6 +362,7 @@ static inline int test_and_clear_bit(unsigned long nr,
 
 		do {
 			__asm__ __volatile__(
+			__WAR_LLSC_MB
 			"	" __LL	"%0, %1 # test_and_clear_bit	\n"
 			"	" __EXT "%2, %0, %3, 1			\n"
 			"	" __INS "%0, $0, %3, 1			\n"
@@ -366,6 +379,7 @@ static inline int test_and_clear_bit(unsigned long nr,
 		do {
 			__asm__ __volatile__(
 			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
+			__WAR_LLSC_MB
 			"	" __LL	"%0, %1 # test_and_clear_bit	\n"
 			"	or	%2, %0, %3			\n"
 			"	xor	%2, %3				\n"
@@ -423,6 +437,7 @@ static inline int test_and_change_bit(unsigned long nr,
 		do {
 			__asm__ __volatile__(
 			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
+			__WAR_LLSC_MB
 			"	" __LL	"%0, %1 # test_and_change_bit	\n"
 			"	xor	%2, %0, %3			\n"
 			"	" __SC	"\t%2, %1			\n"
diff --git a/arch/mips/include/asm/cmpxchg.h b/arch/mips/include/asm/cmpxchg.h
index 89e9fb7..1f9c091 100644
--- a/arch/mips/include/asm/cmpxchg.h
+++ b/arch/mips/include/asm/cmpxchg.h
@@ -48,7 +48,9 @@ extern unsigned long __xchg_called_with_bad_pointer(void)
 		"	.set	push				\n"	\
 		"	.set	noat				\n"	\
 		"	.set	" MIPS_ISA_ARCH_LEVEL "		\n"	\
-		"1:	" ld "	%0, %2		# __xchg_asm	\n"	\
+		"1:				# __xchg_asm	\n"	\
+		__WAR_LLSC_MB						\
+		"	" ld "	%0, %2				\n"	\
 		"	.set	mips0				\n"	\
 		"	move	$1, %z3				\n"	\
 		"	.set	" MIPS_ISA_ARCH_LEVEL "		\n"	\
@@ -118,7 +120,9 @@ static inline unsigned long __xchg(volatile void *ptr, unsigned long x,
 		"	.set	push				\n"	\
 		"	.set	noat				\n"	\
 		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"	\
-		"1:	" ld "	%0, %2		# __cmpxchg_asm \n"	\
+		"1:				# __cmpxchg_asm \n"	\
+		__WAR_LLSC_MB						\
+		"	" ld "	%0, %2				\n"	\
 		"	bne	%0, %z3, 2f			\n"	\
 		"	.set	mips0				\n"	\
 		"	move	$1, %z4				\n"	\
@@ -127,6 +131,7 @@ static inline unsigned long __xchg(volatile void *ptr, unsigned long x,
 		"\t" __scbeqz "	$1, 1b				\n"	\
 		"	.set	pop				\n"	\
 		"2:						\n"	\
+		__WAR_LLSC_MB						\
 		: "=&r" (__ret), "=" GCC_OFF_SMALL_ASM() (*m)		\
 		: GCC_OFF_SMALL_ASM() (*m), "Jr" (old), "Jr" (new)		\
 		: "memory");						\
diff --git a/arch/mips/include/asm/edac.h b/arch/mips/include/asm/edac.h
index fc46776..b07a3a9 100644
--- a/arch/mips/include/asm/edac.h
+++ b/arch/mips/include/asm/edac.h
@@ -22,13 +22,16 @@ static inline void edac_atomic_scrub(void *va, u32 size)
 
 		__asm__ __volatile__ (
 		"	.set	mips2					\n"
-		"1:	ll	%0, %1		# edac_atomic_scrub	\n"
+		"1:				# edac_atomic_scrub	\n"
+		__WAR_LLSC_MB
+		"	ll	%0, %1					\n"
 		"	addu	%0, $0					\n"
 		"	sc	%0, %1					\n"
 		"	beqz	%0, 1b					\n"
 		"	.set	mips0					\n"
 		: "=&r" (temp), "=" GCC_OFF_SMALL_ASM() (*virt_addr)
 		: GCC_OFF_SMALL_ASM() (*virt_addr));
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");
 
 		virt_addr++;
 	}
diff --git a/arch/mips/include/asm/futex.h b/arch/mips/include/asm/futex.h
index a9e61ea..d497ee9 100644
--- a/arch/mips/include/asm/futex.h
+++ b/arch/mips/include/asm/futex.h
@@ -54,7 +54,9 @@
 		"	.set	push				\n"	\
 		"	.set	noat				\n"	\
 		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"	\
-		"1:	"user_ll("%1", "%4")" # __futex_atomic_op\n"	\
+		"1:				 # __futex_atomic_op\n"	\
+		__WAR_LLSC_MB						\
+		"	"user_ll("%1", "%4")"			\n"	\
 		"	.set	mips0				\n"	\
 		"	" insn	"				\n"	\
 		"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"	\
@@ -70,8 +72,9 @@
 		"	j	3b				\n"	\
 		"	.previous				\n"	\
 		"	.section __ex_table,\"a\"		\n"	\
-		"	"__UA_ADDR "\t1b, 4b			\n"	\
-		"	"__UA_ADDR "\t2b, 4b			\n"	\
+		"	"__UA_ADDR "\t(1b + 0), 4b		\n"	\
+		"	"__UA_ADDR "\t(1b + 4), 4b		\n"	\
+		"	"__UA_ADDR "\t(2b + 0), 4b		\n"	\
 		"	.previous				\n"	\
 		: "=r" (ret), "=&r" (oldval),				\
 		  "=" GCC_OFF_SMALL_ASM() (*uaddr)				\
@@ -167,7 +170,9 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 		"	.set	push					\n"
 		"	.set	noat					\n"
 		"	.set	"MIPS_ISA_ARCH_LEVEL"			\n"
-		"1:	"user_ll("%1", "%3")"				\n"
+		"1:							\n"
+		__WAR_LLSC_MB
+		"	"user_ll("%1", "%3")"				\n"
 		"	bne	%1, %z4, 3f				\n"
 		"	.set	mips0					\n"
 		"	move	$1, %z5					\n"
@@ -183,8 +188,9 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 		"	j	3b					\n"
 		"	.previous					\n"
 		"	.section __ex_table,\"a\"			\n"
-		"	"__UA_ADDR "\t1b, 4b				\n"
-		"	"__UA_ADDR "\t2b, 4b				\n"
+		"	"__UA_ADDR "\t(1b + 0), 4b			\n"
+		"	"__UA_ADDR "\t(1b + 4), 4b			\n"
+		"	"__UA_ADDR "\t(2b + 0), 4b			\n"
 		"	.previous					\n"
 		: "+r" (ret), "=&r" (val), "=" GCC_OFF_SMALL_ASM() (*uaddr)
 		: GCC_OFF_SMALL_ASM() (*uaddr), "Jr" (oldval), "Jr" (newval),
diff --git a/arch/mips/include/asm/local.h b/arch/mips/include/asm/local.h
index ac8264e..4105b7a 100644
--- a/arch/mips/include/asm/local.h
+++ b/arch/mips/include/asm/local.h
@@ -50,7 +50,9 @@ static __inline__ long local_add_return(long i, local_t * l)
 
 		__asm__ __volatile__(
 		"	.set	"MIPS_ISA_ARCH_LEVEL"			\n"
-		"1:"	__LL	"%1, %2		# local_add_return	\n"
+		"1:				# local_add_return	\n"
+			__WAR_LLSC_MB
+			__LL	"%1, %2					\n"
 		"	addu	%0, %1, %3				\n"
 			__SC	"%0, %2					\n"
 		"	beqz	%0, 1b					\n"
@@ -59,6 +61,7 @@ static __inline__ long local_add_return(long i, local_t * l)
 		: "=&r" (result), "=&r" (temp), "=m" (l->a.counter)
 		: "Ir" (i), "m" (l->a.counter)
 		: "memory");
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");
 	} else {
 		unsigned long flags;
 
@@ -95,7 +98,9 @@ static __inline__ long local_sub_return(long i, local_t * l)
 
 		__asm__ __volatile__(
 		"	.set	"MIPS_ISA_ARCH_LEVEL"			\n"
-		"1:"	__LL	"%1, %2		# local_sub_return	\n"
+		"1:				# local_sub_return	\n"
+			__WAR_LLSC_MB
+			__LL	"%1, %2					\n"
 		"	subu	%0, %1, %3				\n"
 			__SC	"%0, %2					\n"
 		"	beqz	%0, 1b					\n"
@@ -104,6 +109,7 @@ static __inline__ long local_sub_return(long i, local_t * l)
 		: "=&r" (result), "=&r" (temp), "=m" (l->a.counter)
 		: "Ir" (i), "m" (l->a.counter)
 		: "memory");
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");
 	} else {
 		unsigned long flags;
 
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 129e032..e189384 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -233,7 +233,9 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 			"	.set	"MIPS_ISA_ARCH_LEVEL"		\n"
 			"	.set	push				\n"
 			"	.set	noreorder			\n"
-			"1:"	__LL	"%[tmp], %[buddy]		\n"
+			"1:						\n"
+				__WAR_LLSC_MB
+				__LL	"%[tmp], %[buddy]		\n"
 			"	bnez	%[tmp], 2f			\n"
 			"	 or	%[tmp], %[tmp], %[global]	\n"
 				__SC	"%[tmp], %[buddy]		\n"
@@ -244,6 +246,7 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 			"	.set	mips0				\n"
 			: [buddy] "+m" (buddy->pte), [tmp] "=&r" (tmp)
 			: [global] "r" (page_global));
+			__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");
 		}
 #else /* !CONFIG_SMP */
 		if (pte_none(*buddy))
diff --git a/arch/mips/kernel/syscall.c b/arch/mips/kernel/syscall.c
index 69c17b5..743714a 100644
--- a/arch/mips/kernel/syscall.c
+++ b/arch/mips/kernel/syscall.c
@@ -135,6 +135,7 @@ static inline int mips_atomic_set(unsigned long addr, unsigned long new)
 		"	.set	"MIPS_ISA_ARCH_LEVEL"			\n"
 		"	li	%[err], 0				\n"
 		"1:							\n"
+		__WAR_LLSC_MB
 		user_ll("%[old]", "(%[addr])")
 		"	move	%[tmp], %[new]				\n"
 		"2:							\n"
@@ -158,6 +159,7 @@ static inline int mips_atomic_set(unsigned long addr, unsigned long new)
 		  [new] "r" (new),
 		  [efault] "i" (-EFAULT)
 		: "memory");
+		__asm__ __volatile__(__WAR_LLSC_MB : : :"memory");
 	} else {
 		do {
 			preempt_disable();
diff --git a/arch/mips/loongson64/Platform b/arch/mips/loongson64/Platform
index 0fce460..3700dcf 100644
--- a/arch/mips/loongson64/Platform
+++ b/arch/mips/loongson64/Platform
@@ -23,6 +23,9 @@ ifdef CONFIG_CPU_LOONGSON2F_WORKAROUNDS
 endif
 
 cflags-$(CONFIG_CPU_LOONGSON3)	+= -Wa,--trap
+ifneq ($(call as-option,-Wa$(comma)-mfix-loongson3-llsc,),)
+  cflags-$(CONFIG_CPU_LOONGSON3) += -Wa$(comma)-mno-fix-loongson3-llsc
+endif
 #
 # binutils from v2.25 on and gcc starting from v4.9.0 treat -march=loongson3a
 # as MIPS64 R2; older versions as just R1.  This leaves the possibility open
diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index 0677142..74941b2 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -931,6 +931,8 @@ build_get_pgd_vmalloc64(u32 **p, struct uasm_label **l, struct uasm_reloc **r,
 		 * to mimic that here by taking a load/istream page
 		 * fault.
 		 */
+		if (current_cpu_type() == CPU_LOONGSON3)
+			uasm_i_sync(p, 0);
 		UASM_i_LA(p, ptr, (unsigned long)tlb_do_page_fault_0);
 		uasm_i_jr(p, ptr);
 
@@ -1555,6 +1557,7 @@ static void build_loongson3_tlb_refill_handler(void)
 
 	if (check_for_high_segbits) {
 		uasm_l_large_segbits_fault(&l, p);
+		uasm_i_sync(&p, 0);
 		UASM_i_LA(&p, K1, (unsigned long)tlb_do_page_fault_0);
 		uasm_i_jr(&p, K1);
 		uasm_i_nop(&p);
@@ -1645,6 +1648,8 @@ static void
 iPTE_LW(u32 **p, unsigned int pte, unsigned int ptr)
 {
 #ifdef CONFIG_SMP
+	if (current_cpu_type() == CPU_LOONGSON3)
+		uasm_i_sync(p, 0);
 # ifdef CONFIG_PHYS_ADDR_T_64BIT
 	if (cpu_has_64bits)
 		uasm_i_lld(p, pte, 0, ptr);
@@ -2258,6 +2263,8 @@ static void build_r4000_tlb_load_handler(void)
 #endif
 
 	uasm_l_nopage_tlbl(&l, p);
+	if (current_cpu_type() == CPU_LOONGSON3)
+		uasm_i_sync(&p, 0);
 	build_restore_work_registers(&p);
 #ifdef CONFIG_CPU_MICROMIPS
 	if ((unsigned long)tlb_do_page_fault_0 & 1) {
@@ -2312,6 +2319,8 @@ static void build_r4000_tlb_store_handler(void)
 #endif
 
 	uasm_l_nopage_tlbs(&l, p);
+	if (current_cpu_type() == CPU_LOONGSON3)
+		uasm_i_sync(&p, 0);
 	build_restore_work_registers(&p);
 #ifdef CONFIG_CPU_MICROMIPS
 	if ((unsigned long)tlb_do_page_fault_1 & 1) {
@@ -2367,6 +2376,8 @@ static void build_r4000_tlb_modify_handler(void)
 #endif
 
 	uasm_l_nopage_tlbm(&l, p);
+	if (current_cpu_type() == CPU_LOONGSON3)
+		uasm_i_sync(&p, 0);
 	build_restore_work_registers(&p);
 #ifdef CONFIG_CPU_MICROMIPS
 	if ((unsigned long)tlb_do_page_fault_1 & 1) {
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH V4 01/10] MIPS: Loongson-3: Enable Store Fill Buffer at runtime
  2018-09-05  9:33 ` [PATCH V4 01/10] MIPS: Loongson-3: Enable Store Fill Buffer at runtime Huacai Chen
@ 2018-09-05 16:54   ` Paul Burton
  2018-09-06  1:33     ` Huacai Chen
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Burton @ 2018-09-05 16:54 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Ralf Baechle, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen

Hi Huacai,

On Wed, Sep 05, 2018 at 05:33:01PM +0800, Huacai Chen wrote:
> New Loongson-3 (Loongson-3A R2, Loongson-3A R3, and newer) has SFB
> (Store Fill Buffer) which can improve the performance of memory access.
> Now, SFB enablement is controlled by CONFIG_LOONGSON3_ENHANCEMENT, and
> the generic kernel has no benefit from SFB (even it is running on a new
> Loongson-3 machine). With this patch, we can enable SFB at runtime by
> detecting the CPU type (the expense is war_io_reorder_wmb() will always
> be a 'sync', which will hurt the performance of old Loongson-3).

This looks unchanged since v3, and I didn't see a response to my email
here:

    https://marc.info/?l=linux-mips&m=153248530725061&w=2

I still haven't seen any explanation for why you can't just do this as a
one-liner in C, the same way we enable tons of other CPU features during
cpu_probe().

I'm not saying I'll never accept the assembly version, but I want an
explanation for why it's necessary first. If that's not something you
can give, at least describe in the commit message what goes wrong when
you try to do it in C as justification for not doing it that way.

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V4 01/10] MIPS: Loongson-3: Enable Store Fill Buffer at runtime
  2018-09-05 16:54   ` Paul Burton
@ 2018-09-06  1:33     ` Huacai Chen
  2018-09-18 23:24       ` Paul Burton
  0 siblings, 1 reply; 16+ messages in thread
From: Huacai Chen @ 2018-09-06  1:33 UTC (permalink / raw)
  To: Paul Burton
  Cc: Ralf Baechle, James Hogan, Linux MIPS Mailing List, Fuxin Zhang,
	Zhangjin Wu

On Thu, Sep 6, 2018 at 12:54 AM, Paul Burton <paul.burton@mips.com> wrote:
> Hi Huacai,
>
> On Wed, Sep 05, 2018 at 05:33:01PM +0800, Huacai Chen wrote:
>> New Loongson-3 (Loongson-3A R2, Loongson-3A R3, and newer) has SFB
>> (Store Fill Buffer) which can improve the performance of memory access.
>> Now, SFB enablement is controlled by CONFIG_LOONGSON3_ENHANCEMENT, and
>> the generic kernel has no benefit from SFB (even it is running on a new
>> Loongson-3 machine). With this patch, we can enable SFB at runtime by
>> detecting the CPU type (the expense is war_io_reorder_wmb() will always
>> be a 'sync', which will hurt the performance of old Loongson-3).
>
> This looks unchanged since v3, and I didn't see a response to my email
> here:
>
>     https://marc.info/?l=linux-mips&m=153248530725061&w=2
>
> I still haven't seen any explanation for why you can't just do this as a
> one-liner in C, the same way we enable tons of other CPU features during
> cpu_probe().
>
> I'm not saying I'll never accept the assembly version, but I want an
> explanation for why it's necessary first. If that's not something you
> can give, at least describe in the commit message what goes wrong when
> you try to do it in C as justification for not doing it that way.
In practise, I found that sometimes there are boot failures if I
enable SFB/LPA in cpu_probe(). I don't know why because processorr
designers also haven't give me an explaination, but I think this may
have some relationships to speculative execution.

Huacai

>
> Thanks,
>     Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V4 01/10] MIPS: Loongson-3: Enable Store Fill Buffer at runtime
  2018-09-06  1:33     ` Huacai Chen
@ 2018-09-18 23:24       ` Paul Burton
  0 siblings, 0 replies; 16+ messages in thread
From: Paul Burton @ 2018-09-18 23:24 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Ralf Baechle, James Hogan, Linux MIPS Mailing List, Fuxin Zhang,
	Zhangjin Wu

Hi Huacai,

On Thu, Sep 06, 2018 at 09:33:15AM +0800, Huacai Chen wrote:
> On Thu, Sep 6, 2018 at 12:54 AM, Paul Burton <paul.burton@mips.com> wrote:
> > On Wed, Sep 05, 2018 at 05:33:01PM +0800, Huacai Chen wrote:
> >> New Loongson-3 (Loongson-3A R2, Loongson-3A R3, and newer) has SFB
> >> (Store Fill Buffer) which can improve the performance of memory access.
> >> Now, SFB enablement is controlled by CONFIG_LOONGSON3_ENHANCEMENT, and
> >> the generic kernel has no benefit from SFB (even it is running on a new
> >> Loongson-3 machine). With this patch, we can enable SFB at runtime by
> >> detecting the CPU type (the expense is war_io_reorder_wmb() will always
> >> be a 'sync', which will hurt the performance of old Loongson-3).
> >
> > This looks unchanged since v3, and I didn't see a response to my email
> > here:
> >
> >     https://marc.info/?l=linux-mips&m=153248530725061&w=2
> >
> > I still haven't seen any explanation for why you can't just do this as a
> > one-liner in C, the same way we enable tons of other CPU features during
> > cpu_probe().
> >
> > I'm not saying I'll never accept the assembly version, but I want an
> > explanation for why it's necessary first. If that's not something you
> > can give, at least describe in the commit message what goes wrong when
> > you try to do it in C as justification for not doing it that way.
>
> In practise, I found that sometimes there are boot failures if I
> enable SFB/LPA in cpu_probe(). I don't know why because processorr
> designers also haven't give me an explaination, but I think this may
> have some relationships to speculative execution.

OK, applied to mips-next for 4.20 with that noted in the commit message.

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 02/10] MIPS: c-r4k: Add r4k_blast_scache_node for Loongson-3
  2018-09-05  9:33 ` [PATCH 02/10] MIPS: c-r4k: Add r4k_blast_scache_node for Loongson-3 Huacai Chen
@ 2018-09-26 21:47   ` Paul Burton
  2018-09-27  7:25     ` Huacai Chen
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Burton @ 2018-09-26 21:47 UTC (permalink / raw)
  To: Huacai Chen, Christoph Hellwig, Marek Szyprowski, Robin Murphy
  Cc: Ralf Baechle, James Hogan, linux-mips, Fuxin Zhang, Zhangjin Wu,
	Huacai Chen, # 3 . 15+

Hi Huacai,

Copying DMA mapping maintainers for any input they may have.

On Wed, Sep 05, 2018 at 05:33:02PM +0800, Huacai Chen wrote:
>  static inline void local_r4k___flush_cache_all(void * args)
>  {
>  	switch (current_cpu_type()) {
>  	case CPU_LOONGSON2:
> -	case CPU_LOONGSON3:
>  	case CPU_R4000SC:
>  	case CPU_R4000MC:
>  	case CPU_R4400SC:
> @@ -480,6 +497,11 @@ static inline void local_r4k___flush_cache_all(void * args)
>  		r4k_blast_scache();
>  		break;
>  
> +	case CPU_LOONGSON3:
> +		/* Use get_ebase_cpunum() for both NUMA=y/n */
> +		r4k_blast_scache_node(get_ebase_cpunum() >> 2);
> +		break;
> +

I wonder if we could instead just include the node ID bits in
INDEX_BASE? Then we could continue using r4k_blast_scache() here as
usual.

>  	case CPU_BMIPS5000:
>  		r4k_blast_scache();
>  		__sync();
> @@ -840,10 +862,14 @@ static void r4k_dma_cache_wback_inv(unsigned long addr, unsigned long size)
>  
>  	preempt_disable();
>  	if (cpu_has_inclusive_pcaches) {
> -		if (size >= scache_size)
> -			r4k_blast_scache();
> -		else
> +		if (size >= scache_size) {
> +			if (current_cpu_type() != CPU_LOONGSON3)
> +				r4k_blast_scache();
> +			else
> +				r4k_blast_scache_node(pa_to_nid(addr));
> +		} else {
>  			blast_scache_range(addr, addr + size);
> +		}
>  		preempt_enable();
>  		__sync();
>  		return;

Hmm, so if I understand correctly this will writeback+invalidate the L2
for one node only? ie. you just changed which node that is.

I'm presuming L2 ops performed in one node aren't broadcast to other
nodes, otherwise this patch is pointless?

Thus presumably L2 caches in other nodes may contain stale data, right?
Or even worse, dirty data which may get written back at any moment?

I'm not sure this is safe - do you need to operate on all L2 caches in
the system here?

I also wonder whether it would be cleaner for Loongson3 to provide a
custom struct dma_map_ops to implement this, rather than adding the
condition to the generic implementation.

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 02/10] MIPS: c-r4k: Add r4k_blast_scache_node for Loongson-3
  2018-09-26 21:47   ` Paul Burton
@ 2018-09-27  7:25     ` Huacai Chen
  0 siblings, 0 replies; 16+ messages in thread
From: Huacai Chen @ 2018-09-27  7:25 UTC (permalink / raw)
  To: Paul Burton
  Cc: Christoph Hellwig, Marek Szyprowski, Robin Murphy, Ralf Baechle,
	James Hogan, Linux MIPS Mailing List, zhangfx, wu zhangjin,
	stable

On Thu, Sep 27, 2018 at 5:47 AM Paul Burton <paul.burton@mips.com> wrote:
>
> Hi Huacai,
>
> Copying DMA mapping maintainers for any input they may have.
>
> On Wed, Sep 05, 2018 at 05:33:02PM +0800, Huacai Chen wrote:
> >  static inline void local_r4k___flush_cache_all(void * args)
> >  {
> >       switch (current_cpu_type()) {
> >       case CPU_LOONGSON2:
> > -     case CPU_LOONGSON3:
> >       case CPU_R4000SC:
> >       case CPU_R4000MC:
> >       case CPU_R4400SC:
> > @@ -480,6 +497,11 @@ static inline void local_r4k___flush_cache_all(void * args)
> >               r4k_blast_scache();
> >               break;
> >
> > +     case CPU_LOONGSON3:
> > +             /* Use get_ebase_cpunum() for both NUMA=y/n */
> > +             r4k_blast_scache_node(get_ebase_cpunum() >> 2);
> > +             break;
> > +
>
> I wonder if we could instead just include the node ID bits in
> INDEX_BASE? Then we could continue using r4k_blast_scache() here as
> usual.
Yes, but it has no advantages.

>
> >       case CPU_BMIPS5000:
> >               r4k_blast_scache();
> >               __sync();
> > @@ -840,10 +862,14 @@ static void r4k_dma_cache_wback_inv(unsigned long addr, unsigned long size)
> >
> >       preempt_disable();
> >       if (cpu_has_inclusive_pcaches) {
> > -             if (size >= scache_size)
> > -                     r4k_blast_scache();
> > -             else
> > +             if (size >= scache_size) {
> > +                     if (current_cpu_type() != CPU_LOONGSON3)
> > +                             r4k_blast_scache();
> > +                     else
> > +                             r4k_blast_scache_node(pa_to_nid(addr));
> > +             } else {
> >                       blast_scache_range(addr, addr + size);
> > +             }
> >               preempt_enable();
> >               __sync();
> >               return;
>
> Hmm, so if I understand correctly this will writeback+invalidate the L2
> for one node only? ie. you just changed which node that is.
>
> I'm presuming L2 ops performed in one node aren't broadcast to other
> nodes, otherwise this patch is pointless?
>
> Thus presumably L2 caches in other nodes may contain stale data, right?
> Or even worse, dirty data which may get written back at any moment?
>
> I'm not sure this is safe - do you need to operate on all L2 caches in
> the system here?
>
> I also wonder whether it would be cleaner for Loongson3 to provide a
> custom struct dma_map_ops to implement this, rather than adding the
> condition to the generic implementation.
In Loongson-3, L2 cache is shared by all nodes, that means a memory
address has only one copy in L2 (node-0's memory only cached in
node-0's L2, node-1's memory only cached in node-1's memory. If node-0
want to access node-1's memory, it may hit in node-1's L2).

Huacai

>
> Thanks,
>     Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-09-27 13:36 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-05  9:33 [PATCH V4 00/10] MIPS: Loongson: new features and improvements Huacai Chen
2018-09-05  9:33 ` [PATCH V4 01/10] MIPS: Loongson-3: Enable Store Fill Buffer at runtime Huacai Chen
2018-09-05 16:54   ` Paul Burton
2018-09-06  1:33     ` Huacai Chen
2018-09-18 23:24       ` Paul Burton
2018-09-05  9:33 ` [PATCH 02/10] MIPS: c-r4k: Add r4k_blast_scache_node for Loongson-3 Huacai Chen
2018-09-26 21:47   ` Paul Burton
2018-09-27  7:25     ` Huacai Chen
2018-09-05  9:33 ` [PATCH V4 03/10] MIPS: Ensure pmd_present() returns false after pmd_mknotpresent() Huacai Chen
2018-09-05  9:33 ` [PATCH V4 04/10] MIPS: Add __cpu_full_name[] to make CPU names more human-readable Huacai Chen
2018-09-05  9:33 ` [PATCH V4 05/10] MIPS: Align kernel load address to 64KB Huacai Chen
2018-09-05  9:33 ` [PATCH V4 06/10] MIPS: Reserve extra memory for crash dump Huacai Chen
2018-09-05  9:33 ` [PATCH V4 07/10] MIPS: Loongson64: Add kexec/kdump support Huacai Chen
2018-09-05  9:33 ` [PATCH V4 08/10] MIPS: Loongson-3: Fix CPU UART irq delivery problem Huacai Chen
2018-09-05  9:33 ` [PATCH V4 09/10] MIPS: Loongson-3: Fix BRIDGE " Huacai Chen
2018-09-05  9:33 ` [PATCH V4 10/10] MIPS: Loongson: Introduce and use WAR_LLSC_MB Huacai Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.