* [PATCH 0/6] MIPS: Loongson: Add Loongson-3A R2 support
@ 2016-01-26 13:26 Huacai Chen
  2016-01-26 13:26 ` [PATCH 1/6] MIPS: Loongson: Add Loongson-3A R2 basic support Huacai Chen
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Huacai Chen @ 2016-01-26 13:26 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J. Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

This patchset is prepared for the upcoming 4.6 release of Linux/MIPS.
It adds Loongson-3A R2 (Loongson-3A2000) support and fixes a potential
bug related to the FTLB.

Loongson-3 CPU family:

Code-name       Brand-name       PRId
Loongson-3A R1  Loongson-3A1000  0x6305
Loongson-3A R2  Loongson-3A2000  0x6308
Loongson-3B R1  Loongson-3B1000  0x6306
Loongson-3B R2  Loongson-3B1500  0x6307

Features of the R2 revision of Loongson-3A:
1. The primary cache comprises I-Cache, D-Cache and V-Cache (Victim Cache).
2. I-Cache, D-Cache and V-Cache are 16-way set-associative with a 64-byte line size.
3. 64 VTLB (classic TLB) entries and 1024 FTLB entries (8-way set-associative).
4. Supports the DSP/DSPv2 instructions, the UserLocal register and Read-Inhibit/Execute-Inhibit.

Huacai Chen (6):
 MIPS: Loongson: Add Loongson-3A R2 basic support.
 MIPS: Loongson: Invalidate special TLBs when needed.
 MIPS: Loongson-3: Fast TLB refill handler.
 MIPS: tlbex: Fix bugs in tlbchange handler.
 MIPS: Loongson: Introduce and use cpu_has_coherent_cache feature.
 MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/Kconfig                                  |  19 +++
 arch/mips/include/asm/cacheops.h                   |   6 +
 arch/mips/include/asm/cpu-features.h               |   6 +
 arch/mips/include/asm/cpu-info.h                   |   1 +
 arch/mips/include/asm/cpu.h                        |   6 +-
 arch/mips/include/asm/hazards.h                    |   7 +-
 arch/mips/include/asm/io.h                         |  10 +-
 arch/mips/include/asm/irqflags.h                   |   5 +
 .../asm/mach-loongson64/cpu-feature-overrides.h    |  13 +-
 .../asm/mach-loongson64/kernel-entry-init.h        |  18 ++-
 arch/mips/include/asm/mipsregs.h                   |   8 ++
 arch/mips/include/asm/pgtable-bits.h               |   8 +-
 arch/mips/include/asm/pgtable.h                    |   4 +-
 arch/mips/include/asm/uasm.h                       |   3 +-
 arch/mips/include/uapi/asm/inst.h                  |  10 ++
 arch/mips/kernel/cpu-probe.c                       |  40 +++++-
 arch/mips/kernel/idle.c                            |   5 +
 arch/mips/kernel/traps.c                           |   3 +-
 arch/mips/loongson64/common/env.c                  |   7 +-
 arch/mips/loongson64/loongson-3/smp.c              | 106 ++++++++++++--
 arch/mips/mm/c-r4k.c                               |  51 +++++++
 arch/mips/mm/page.c                                |   9 ++
 arch/mips/mm/tlb-r4k.c                             |  27 ++--
 arch/mips/mm/tlbex.c                               | 157 ++++++++++++++++++++-
 arch/mips/mm/uasm-mips.c                           |   2 +
 arch/mips/mm/uasm.c                                |   3 +
 drivers/platform/mips/cpu_hwmon.c                  |   4 +-
 27 files changed, 478 insertions(+), 60 deletions(-)
--
2.4.6

* [PATCH 1/6] MIPS: Loongson: Add Loongson-3A R2 basic support
  2016-01-26 13:26 [PATCH 0/6] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
@ 2016-01-26 13:26 ` Huacai Chen
  2016-01-26 13:26 ` [PATCH 2/6] MIPS: Loongson: Invalidate special TLBs when needed Huacai Chen
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Huacai Chen @ 2016-01-26 13:26 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J. Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

Loongson-3 CPU family:

Code-name       Brand-name       PRId
Loongson-3A R1  Loongson-3A1000  0x6305
Loongson-3A R2  Loongson-3A2000  0x6308
Loongson-3B R1  Loongson-3B1000  0x6306
Loongson-3B R2  Loongson-3B1500  0x6307

Features of the R2 revision of Loongson-3A:
1. The primary cache comprises I-Cache, D-Cache and V-Cache (Victim Cache).
2. I-Cache, D-Cache and V-Cache are 16-way set-associative with a 64-byte line size.
3. 64 VTLB (classic TLB) entries and 1024 FTLB entries (8-way set-associative).
4. Supports the DSP/DSPv2 instructions, the UserLocal register and Read-Inhibit/Execute-Inhibit.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/Kconfig                                  |   1 +
 arch/mips/include/asm/cacheops.h                   |   6 ++
 arch/mips/include/asm/cpu-info.h                   |   1 +
 arch/mips/include/asm/cpu.h                        |   4 +-
 .../asm/mach-loongson64/cpu-feature-overrides.h    |  12 +--
 .../asm/mach-loongson64/kernel-entry-init.h        |   6 +-
 arch/mips/include/asm/mipsregs.h                   |   2 +
 arch/mips/include/asm/pgtable-bits.h               |   8 +-
 arch/mips/include/asm/pgtable.h                    |   4 +-
 arch/mips/kernel/cpu-probe.c                       |  38 +++++++-
 arch/mips/kernel/idle.c                            |   5 +
 arch/mips/kernel/traps.c                           |   3 +-
 arch/mips/loongson64/common/env.c                  |   7 +-
 arch/mips/loongson64/loongson-3/smp.c              | 106 +++++++++++++++++++--
 arch/mips/mm/c-r4k.c                               |  27 ++++++
 arch/mips/mm/tlbex.c                               |   2 +-
 drivers/platform/mips/cpu_hwmon.c                  |   4 +-
 17 files changed, 201 insertions(+), 35 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 57a945e..15faaf0 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1343,6 +1343,7 @@ config CPU_LOONGSON3
 	select CPU_SUPPORTS_HUGEPAGES
 	select WEAK_ORDERING
 	select WEAK_REORDERING_BEYOND_LLSC
+	select MIPS_PGD_C0_CONTEXT
 	select ARCH_REQUIRE_GPIOLIB
 	help
 		The Loongson 3 processor implements the MIPS64R2 instruction
diff --git a/arch/mips/include/asm/cacheops.h b/arch/mips/include/asm/cacheops.h
index c3212ff..8031fbc 100644
--- a/arch/mips/include/asm/cacheops.h
+++ b/arch/mips/include/asm/cacheops.h
@@ -21,6 +21,7 @@
 #define Cache_I				0x00
 #define Cache_D				0x01
 #define Cache_T				0x02
+#define Cache_V				0x02 /* Loongson-3 */
 #define Cache_S				0x03
 
 #define Index_Writeback_Inv		0x00
@@ -107,4 +108,9 @@
  */
 #define Hit_Invalidate_I_Loongson2	(Cache_I | 0x00)
 
+/*
+ * Loongson3-specific cacheops
+ */
+#define Index_Writeback_Inv_V		(Cache_V | Index_Writeback_Inv)
+
 #endif	/* __ASM_CACHEOPS_H */
diff --git a/arch/mips/include/asm/cpu-info.h b/arch/mips/include/asm/cpu-info.h
index e7dc785..6fd7b8bd 100644
--- a/arch/mips/include/asm/cpu-info.h
+++ b/arch/mips/include/asm/cpu-info.h
@@ -60,6 +60,7 @@ struct cpuinfo_mips {
 	int			tlbsizeftlbways;
 	struct cache_desc	icache; /* Primary I-cache */
 	struct cache_desc	dcache; /* Primary D or combined I/D cache */
+	struct cache_desc	vcache; /* Victim cache, between pcache and scache */
 	struct cache_desc	scache; /* Secondary cache */
 	struct cache_desc	tcache; /* Tertiary/split secondary cache */
 	int			srsets; /* Shadow register sets */
diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
index a97ca97..c4f7983 100644
--- a/arch/mips/include/asm/cpu.h
+++ b/arch/mips/include/asm/cpu.h
@@ -42,6 +42,7 @@
 #define PRID_COMP_LEXRA		0x0b0000
 #define PRID_COMP_NETLOGIC	0x0c0000
 #define PRID_COMP_CAVIUM	0x0d0000
+#define PRID_COMP_LOONGSON	0x140000
 #define PRID_COMP_INGENIC_D0	0xd00000	/* JZ4740, JZ4750 */
 #define PRID_COMP_INGENIC_D1	0xd10000	/* JZ4770, JZ4775 */
 #define PRID_COMP_INGENIC_E1	0xe10000	/* JZ4780 */
@@ -237,9 +238,10 @@
 #define PRID_REV_LOONGSON1B	0x0020
 #define PRID_REV_LOONGSON2E	0x0002
 #define PRID_REV_LOONGSON2F	0x0003
-#define PRID_REV_LOONGSON3A	0x0005
+#define PRID_REV_LOONGSON3A_R1	0x0005
 #define PRID_REV_LOONGSON3B_R1	0x0006
 #define PRID_REV_LOONGSON3B_R2	0x0007
+#define PRID_REV_LOONGSON3A_R2	0x0008
 
 /*
  * Older processors used to encode processor version and revision in two
diff --git a/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h b/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
index 98963c2..c3406db 100644
--- a/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
@@ -16,11 +16,6 @@
 #ifndef __ASM_MACH_LOONGSON64_CPU_FEATURE_OVERRIDES_H
 #define __ASM_MACH_LOONGSON64_CPU_FEATURE_OVERRIDES_H
 
-#define cpu_dcache_line_size()	32
-#define cpu_icache_line_size()	32
-#define cpu_scache_line_size()	32
-
-
 #define cpu_has_32fpr		1
 #define cpu_has_3k_cache	0
 #define cpu_has_4k_cache	1
@@ -31,8 +26,6 @@
 #define cpu_has_counter		1
 #define cpu_has_dc_aliases	(PAGE_SIZE < 0x4000)
 #define cpu_has_divec		0
-#define cpu_has_dsp		0
-#define cpu_has_dsp2		0
 #define cpu_has_ejtag		0
 #define cpu_has_ic_fills_f_dc	0
 #define cpu_has_inclusive_pcaches	1
@@ -40,15 +33,11 @@
 #define cpu_has_mcheck		0
 #define cpu_has_mdmx		0
 #define cpu_has_mips16		0
-#define cpu_has_mips32r2	0
 #define cpu_has_mips3d		0
-#define cpu_has_mips64r2	0
 #define cpu_has_mipsmt		0
-#define cpu_has_prefetch	0
 #define cpu_has_smartmips	0
 #define cpu_has_tlb		1
 #define cpu_has_tx39_cache	0
-#define cpu_has_userlocal	0
 #define cpu_has_vce		0
 #define cpu_has_veic		0
 #define cpu_has_vint		0
@@ -57,5 +46,6 @@
 #define cpu_has_local_ebase	0
 
 #define cpu_has_wsbh		IS_ENABLED(CONFIG_CPU_LOONGSON3)
+#define cpu_hwrena_impl_bits	0xc0000000
 
 #endif /* __ASM_MACH_LOONGSON64_CPU_FEATURE_OVERRIDES_H */
diff --git a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
index 3f2f84f..da83482 100644
--- a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
+++ b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
@@ -23,7 +23,8 @@
 	or	t0, (0x1 << 7)
 	mtc0	t0, $16, 3
 	/* Set ELPA on LOONGSON3 pagegrain */
-	li	t0, (0x1 << 29)
+	mfc0	t0, $5, 1
+	or	t0, (0x1 << 29)
 	mtc0	t0, $5, 1
 	_ehb
 	.set	pop
@@ -42,7 +43,8 @@
 	or	t0, (0x1 << 7)
 	mtc0	t0, $16, 3
 	/* Set ELPA on LOONGSON3 pagegrain */
-	li	t0, (0x1 << 29)
+	mfc0	t0, $5, 1
+	or	t0, (0x1 << 29)
 	mtc0	t0, $5, 1
 	_ehb
 	.set	pop
diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h
index 3ad19ad..9290fd4 100644
--- a/arch/mips/include/asm/mipsregs.h
+++ b/arch/mips/include/asm/mipsregs.h
@@ -633,6 +633,8 @@
 #define MIPS_CONF6_SYND		(_ULCAST_(1) << 13)
 /* proAptiv FTLB on/off bit */
 #define MIPS_CONF6_FTLBEN	(_ULCAST_(1) << 15)
+/* Loongson-3 FTLB on/off bit */
+#define MIPS_CONF6_FTLBDIS	(_ULCAST_(1) << 22)
 /* FTLB probability bits */
 #define MIPS_CONF6_FTLBP_SHIFT	(16)
 
diff --git a/arch/mips/include/asm/pgtable-bits.h b/arch/mips/include/asm/pgtable-bits.h
index 97b3138..32b77bd 100644
--- a/arch/mips/include/asm/pgtable-bits.h
+++ b/arch/mips/include/asm/pgtable-bits.h
@@ -113,7 +113,7 @@
 #define _PAGE_PRESENT_SHIFT	0
 #define _PAGE_PRESENT		(1 << _PAGE_PRESENT_SHIFT)
 /* R2 or later cores check for RI/XI support to determine _PAGE_READ */
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
+#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3)
 #define _PAGE_WRITE_SHIFT	(_PAGE_PRESENT_SHIFT + 1)
 #define _PAGE_WRITE		(1 << _PAGE_WRITE_SHIFT)
 #else
@@ -133,7 +133,7 @@
 #define _PAGE_HUGE		(1 << _PAGE_HUGE_SHIFT)
 #endif	/* CONFIG_64BIT && CONFIG_MIPS_HUGE_TLB_SUPPORT */
 
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
+#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3)
 /* XI - page cannot be executed */
 #ifdef _PAGE_HUGE_SHIFT
 #define _PAGE_NO_EXEC_SHIFT	(_PAGE_HUGE_SHIFT + 1)
@@ -147,7 +147,7 @@
 #define _PAGE_READ		(cpu_has_rixi ? 0 : (1 << _PAGE_READ_SHIFT))
 #define _PAGE_NO_READ_SHIFT	_PAGE_READ_SHIFT
 #define _PAGE_NO_READ		(cpu_has_rixi ? (1 << _PAGE_READ_SHIFT) : 0)
-#endif	/* defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) */
+#endif	/* defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3) */
 
 #if defined(_PAGE_NO_READ_SHIFT)
 #define _PAGE_GLOBAL_SHIFT	(_PAGE_NO_READ_SHIFT + 1)
@@ -198,7 +198,7 @@
  */
 static inline uint64_t pte_to_entrylo(unsigned long pte_val)
 {
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
+#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3)
 	if (cpu_has_rixi) {
 		int sa;
 #ifdef CONFIG_32BIT
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 9a4fe01..35cd713 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -353,7 +353,7 @@ static inline pte_t pte_mkdirty(pte_t pte)
 static inline pte_t pte_mkyoung(pte_t pte)
 {
 	pte_val(pte) |= _PAGE_ACCESSED;
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
+#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3)
 	if (!(pte_val(pte) & _PAGE_NO_READ))
 		pte_val(pte) |= _PAGE_SILENT_READ;
 	else
@@ -542,7 +542,7 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
 {
 	pmd_val(pmd) |= _PAGE_ACCESSED;
 
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
+#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3)
 	if (!(pmd_val(pmd) & _PAGE_NO_READ))
 		pmd_val(pmd) |= _PAGE_SILENT_READ;
 	else
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index b725b71..9e963f2 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -561,6 +561,16 @@ static int set_ftlb_enable(struct cpuinfo_mips *c, int enable)
 		write_c0_config7(config | (calculate_ftlb_probability(c)
 					   << MIPS_CONF7_FTLBP_SHIFT));
 		break;
+	case CPU_LOONGSON3:
+		/* Loongson-3 cores use Config6 to enable the FTLB */
+		config = read_c0_config6();
+		if (enable)
+			/* Enable FTLB */
+			write_c0_config6(config & ~MIPS_CONF6_FTLBDIS);
+		else
+			/* Disable FTLB */
+			write_c0_config6(config | MIPS_CONF6_FTLBDIS);
+		break;
 	default:
 		return 1;
 	}
@@ -1172,7 +1182,7 @@ static inline void cpu_probe_legacy(struct cpuinfo_mips *c, unsigned int cpu)
 			set_isa(c, MIPS_CPU_ISA_III);
 			c->fpu_msk31 |= FPU_CSR_CONDX;
 			break;
-		case PRID_REV_LOONGSON3A:
+		case PRID_REV_LOONGSON3A_R1:
 			c->cputype = CPU_LOONGSON3;
 			__cpu_name[cpu] = "ICT Loongson-3";
 			set_elf_platform(cpu, "loongson3a");
@@ -1493,6 +1503,29 @@ platform:
 	}
 }
 
+static inline void cpu_probe_loongson(struct cpuinfo_mips *c, unsigned int cpu)
+{
+	switch (c->processor_id & PRID_IMP_MASK) {
+	case PRID_IMP_LOONGSON_64:  /* Loongson-2/3 */
+		switch (c->processor_id & PRID_REV_MASK) {
+		case PRID_REV_LOONGSON3A_R2:
+			c->cputype = CPU_LOONGSON3;
+			__cpu_name[cpu] = "ICT Loongson-3";
+			set_elf_platform(cpu, "loongson3a");
+			set_isa(c, MIPS_CPU_ISA_M64R2);
+			break;
+		}
+
+		decode_configs(c);
+		c->options |= MIPS_CPU_TLBINV;
+		c->writecombine = _CACHE_UNCACHED_ACCELERATED;
+		break;
+	default:
+		panic("Unknown Loongson Processor ID!");
+		break;
+	}
+}
+
 static inline void cpu_probe_ingenic(struct cpuinfo_mips *c, unsigned int cpu)
 {
 	decode_configs(c);
@@ -1640,6 +1673,9 @@ void cpu_probe(void)
 	case PRID_COMP_CAVIUM:
 		cpu_probe_cavium(c, cpu);
 		break;
+	case PRID_COMP_LOONGSON:
+		cpu_probe_loongson(c, cpu);
+		break;
 	case PRID_COMP_INGENIC_D0:
 	case PRID_COMP_INGENIC_D1:
 	case PRID_COMP_INGENIC_E1:
diff --git a/arch/mips/kernel/idle.c b/arch/mips/kernel/idle.c
index 46794d6..9ae2c46 100644
--- a/arch/mips/kernel/idle.c
+++ b/arch/mips/kernel/idle.c
@@ -181,6 +181,11 @@ void __init check_wait(void)
 	case CPU_XLP:
 		cpu_wait = r4k_wait;
 		break;
+	case CPU_LOONGSON3:
+		if ((read_c0_prid() & 0xf) == PRID_REV_LOONGSON3A_R2)
+			cpu_wait = r4k_wait;
+		break;
+
 	case CPU_BMIPS5000:
 		cpu_wait = r4k_wait_irqoff;
 		break;
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index bafcb7a..165db67 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -1783,7 +1783,8 @@ asmlinkage void do_ftlb(void)
 
 	/* For the moment, report the problem and hang. */
 	if ((cpu_has_mips_r2_r6) &&
-	    ((current_cpu_data.processor_id & 0xff0000) == PRID_COMP_MIPS)) {
+	    (((current_cpu_data.processor_id & 0xff0000) == PRID_COMP_MIPS) ||
+	    ((current_cpu_data.processor_id & 0xff0000) == PRID_COMP_LOONGSON))) {
 		pr_err("FTLB error exception, cp0_ecc=0x%08x:\n",
 		       read_c0_ecc());
 		pr_err("cp0_errorepc == %0*lx\n", field, read_c0_errorepc());
diff --git a/arch/mips/loongson64/common/env.c b/arch/mips/loongson64/common/env.c
index d6d07ad..57d590a 100644
--- a/arch/mips/loongson64/common/env.c
+++ b/arch/mips/loongson64/common/env.c
@@ -105,6 +105,10 @@ void __init prom_init_env(void)
 		loongson_chiptemp[1] = 0x900010001fe0019c;
 		loongson_chiptemp[2] = 0x900020001fe0019c;
 		loongson_chiptemp[3] = 0x900030001fe0019c;
+		loongson_freqctrl[0] = 0x900000001fe001d0;
+		loongson_freqctrl[1] = 0x900010001fe001d0;
+		loongson_freqctrl[2] = 0x900020001fe001d0;
+		loongson_freqctrl[3] = 0x900030001fe001d0;
 		loongson_sysconf.ht_control_base = 0x90000EFDFB000000;
 		loongson_sysconf.workarounds = WORKAROUND_CPUFREQ;
 	} else if (ecpu->cputype == Loongson_3B) {
@@ -187,7 +191,8 @@ void __init prom_init_env(void)
 		case PRID_REV_LOONGSON2F:
 			cpu_clock_freq = 797000000;
 			break;
-		case PRID_REV_LOONGSON3A:
+		case PRID_REV_LOONGSON3A_R1:
+		case PRID_REV_LOONGSON3A_R2:
 			cpu_clock_freq = 900000000;
 			break;
 		case PRID_REV_LOONGSON3B_R1:
diff --git a/arch/mips/loongson64/loongson-3/smp.c b/arch/mips/loongson64/loongson-3/smp.c
index 509832a9..ef28d6a 100644
--- a/arch/mips/loongson64/loongson-3/smp.c
+++ b/arch/mips/loongson64/loongson-3/smp.c
@@ -440,7 +440,7 @@ static void loongson3_cpu_die(unsigned int cpu)
  * flush all L1 entries at first. Then, another core (usually Core 0) can
  * safely disable the clock of the target core. loongson3_play_dead() is
  * called via CKSEG1 (uncached and unmmaped) */
-static void loongson3a_play_dead(int *state_addr)
+static void loongson3a_r1_play_dead(int *state_addr)
 {
 	register int val;
 	register long cpuid, core, node, count;
@@ -502,6 +502,89 @@ static void loongson3a_play_dead(int *state_addr)
 		: "a1");
 }
 
+static void loongson3a_r2_play_dead(int *state_addr)
+{
+	register int val;
+	register long cpuid, core, node, count;
+	register void *addr, *base, *initfunc;
+
+	__asm__ __volatile__(
+		"   .set push                     \n"
+		"   .set noreorder                \n"
+		"   li %[addr], 0x80000000        \n" /* KSEG0 */
+		"1: cache 0, 0(%[addr])           \n" /* flush L1 ICache */
+		"   cache 0, 1(%[addr])           \n"
+		"   cache 0, 2(%[addr])           \n"
+		"   cache 0, 3(%[addr])           \n"
+		"   cache 1, 0(%[addr])           \n" /* flush L1 DCache */
+		"   cache 1, 1(%[addr])           \n"
+		"   cache 1, 2(%[addr])           \n"
+		"   cache 1, 3(%[addr])           \n"
+		"   addiu %[sets], %[sets], -1    \n"
+		"   bnez  %[sets], 1b             \n"
+		"   addiu %[addr], %[addr], 0x40  \n"
+		"   li %[addr], 0x80000000        \n" /* KSEG0 */
+		"2: cache 2, 0(%[addr])           \n" /* flush L1 VCache */
+		"   cache 2, 1(%[addr])           \n"
+		"   cache 2, 2(%[addr])           \n"
+		"   cache 2, 3(%[addr])           \n"
+		"   cache 2, 4(%[addr])           \n"
+		"   cache 2, 5(%[addr])           \n"
+		"   cache 2, 6(%[addr])           \n"
+		"   cache 2, 7(%[addr])           \n"
+		"   cache 2, 8(%[addr])           \n"
+		"   cache 2, 9(%[addr])           \n"
+		"   cache 2, 10(%[addr])          \n"
+		"   cache 2, 11(%[addr])          \n"
+		"   cache 2, 12(%[addr])          \n"
+		"   cache 2, 13(%[addr])          \n"
+		"   cache 2, 14(%[addr])          \n"
+		"   cache 2, 15(%[addr])          \n"
+		"   addiu %[vsets], %[vsets], -1  \n"
+		"   bnez  %[vsets], 2b            \n"
+		"   addiu %[addr], %[addr], 0x40  \n"
+		"   li    %[val], 0x7             \n" /* *state_addr = CPU_DEAD; */
+		"   sw    %[val], (%[state_addr]) \n"
+		"   sync                          \n"
+		"   cache 21, (%[state_addr])     \n" /* flush entry of *state_addr */
+		"   .set pop                      \n"
+		: [addr] "=&r" (addr), [val] "=&r" (val)
+		: [state_addr] "r" (state_addr),
+		  [sets] "r" (cpu_data[smp_processor_id()].dcache.sets),
+		  [vsets] "r" (cpu_data[smp_processor_id()].vcache.sets));
+
+	__asm__ __volatile__(
+		"   .set push                         \n"
+		"   .set noreorder                    \n"
+		"   .set mips64                       \n"
+		"   mfc0  %[cpuid], $15, 1            \n"
+		"   andi  %[cpuid], 0x3ff             \n"
+		"   dli   %[base], 0x900000003ff01000 \n"
+		"   andi  %[core], %[cpuid], 0x3      \n"
+		"   sll   %[core], 8                  \n" /* get core id */
+		"   or    %[base], %[base], %[core]   \n"
+		"   andi  %[node], %[cpuid], 0xc      \n"
+		"   dsll  %[node], 42                 \n" /* get node id */
+		"   or    %[base], %[base], %[node]   \n"
+		"1: li    %[count], 0x100             \n" /* wait for init loop */
+		"2: bnez  %[count], 2b                \n" /* limit mailbox access */
+		"   addiu %[count], -1                \n"
+		"   ld    %[initfunc], 0x20(%[base])  \n" /* get PC via mailbox */
+		"   beqz  %[initfunc], 1b             \n"
+		"   nop                               \n"
+		"   ld    $sp, 0x28(%[base])          \n" /* get SP via mailbox */
+		"   ld    $gp, 0x30(%[base])          \n" /* get GP via mailbox */
+		"   ld    $a1, 0x38(%[base])          \n"
+		"   jr    %[initfunc]                 \n" /* jump to initial PC */
+		"   nop                               \n"
+		"   .set pop                          \n"
+		: [core] "=&r" (core), [node] "=&r" (node),
+		  [base] "=&r" (base), [cpuid] "=&r" (cpuid),
+		  [count] "=&r" (count), [initfunc] "=&r" (initfunc)
+		: /* No Input */
+		: "a1");
+}
+
 static void loongson3b_play_dead(int *state_addr)
 {
 	register int val;
@@ -573,13 +656,18 @@ void play_dead(void)
 	void (*play_dead_at_ckseg1)(int *);
 
 	idle_task_exit();
-	switch (loongson_sysconf.cputype) {
-	case Loongson_3A:
+	switch (read_c0_prid() & 0xf) {
+	case PRID_REV_LOONGSON3A_R1:
 	default:
 		play_dead_at_ckseg1 =
-			(void *)CKSEG1ADDR((unsigned long)loongson3a_play_dead);
+			(void *)CKSEG1ADDR((unsigned long)loongson3a_r1_play_dead);
+		break;
+	case PRID_REV_LOONGSON3A_R2:
+		play_dead_at_ckseg1 =
+			(void *)CKSEG1ADDR((unsigned long)loongson3a_r2_play_dead);
 		break;
-	case Loongson_3B:
+	case PRID_REV_LOONGSON3B_R1:
+	case PRID_REV_LOONGSON3B_R2:
 		play_dead_at_ckseg1 =
 			(void *)CKSEG1ADDR((unsigned long)loongson3b_play_dead);
 		break;
@@ -594,9 +682,9 @@ void loongson3_disable_clock(int cpu)
 	uint64_t core_id = cpu_data[cpu].core;
 	uint64_t package_id = cpu_data[cpu].package;
 
-	if (loongson_sysconf.cputype == Loongson_3A) {
+	if ((read_c0_prid() & 0xf) == PRID_REV_LOONGSON3A_R1) {
 		LOONGSON_CHIPCFG(package_id) &= ~(1 << (12 + core_id));
-	} else if (loongson_sysconf.cputype == Loongson_3B) {
+	} else {
 		if (!(loongson_sysconf.workarounds & WORKAROUND_CPUHOTPLUG))
 			LOONGSON_FREQCTRL(package_id) &= ~(1 << (core_id * 4 + 3));
 	}
@@ -607,9 +695,9 @@ void loongson3_enable_clock(int cpu)
 	uint64_t core_id = cpu_data[cpu].core;
 	uint64_t package_id = cpu_data[cpu].package;
 
-	if (loongson_sysconf.cputype == Loongson_3A) {
+	if ((read_c0_prid() & 0xf) == PRID_REV_LOONGSON3A_R1) {
 		LOONGSON_CHIPCFG(package_id) |= 1 << (12 + core_id);
-	} else if (loongson_sysconf.cputype == Loongson_3B) {
+	} else {
 		if (!(loongson_sysconf.workarounds & WORKAROUND_CPUHOTPLUG))
 			LOONGSON_FREQCTRL(package_id) |= 1 << (core_id * 4 + 3);
 	}
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index caac3d7..2abc73d 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -77,6 +77,7 @@ static inline void r4k_on_each_cpu(void (*func) (void *info), void *info)
  */
 static unsigned long icache_size __read_mostly;
 static unsigned long dcache_size __read_mostly;
+static unsigned long vcache_size __read_mostly;
 static unsigned long scache_size __read_mostly;
 
 /*
@@ -1328,6 +1329,31 @@ static void probe_pcache(void)
 	       c->dcache.linesz);
 }
 
+static void probe_vcache(void)
+{
+	struct cpuinfo_mips *c = &current_cpu_data;
+	unsigned int config2, lsize;
+
+	if (current_cpu_type() != CPU_LOONGSON3)
+		return;
+
+	config2 = read_c0_config2();
+	if ((lsize = ((config2 >> 20) & 15)))
+		c->vcache.linesz = 2 << lsize;
+	else
+		c->vcache.linesz = lsize;
+
+	c->vcache.sets = 64 << ((config2 >> 24) & 15);
+	c->vcache.ways = 1 + ((config2 >> 16) & 15);
+
+	vcache_size = c->vcache.sets * c->vcache.ways * c->vcache.linesz;
+
+	c->vcache.waybit = 0;
+
+	pr_info("Unified victim cache %ldkB %s, linesize %d bytes.\n",
+		vcache_size >> 10, way_string[c->vcache.ways], c->vcache.linesz);
+}
+
 /*
  * If you even _breathe_ on this function, look at the gcc output and make sure
  * it does not pop things on and off the stack for the cache sizing loop that
@@ -1650,6 +1676,7 @@ void r4k_cache_init(void)
 	struct cpuinfo_mips *c = &current_cpu_data;
 
 	probe_pcache();
+	probe_vcache();
 	setup_scache();
 
 	r4k_blast_dcache_page_setup();
diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index 5a04b6f..e3574f4 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -241,7 +241,7 @@ static void output_pgtable_bits_defines(void)
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	pr_define("_PAGE_HUGE_SHIFT %d\n", _PAGE_HUGE_SHIFT);
 #endif
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
+#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3)
 	if (cpu_has_rixi) {
 #ifdef _PAGE_NO_EXEC_SHIFT
 		pr_define("_PAGE_NO_EXEC_SHIFT %d\n", _PAGE_NO_EXEC_SHIFT);
diff --git a/drivers/platform/mips/cpu_hwmon.c b/drivers/platform/mips/cpu_hwmon.c
index 0f6c63e..7c56d71 100644
--- a/drivers/platform/mips/cpu_hwmon.c
+++ b/drivers/platform/mips/cpu_hwmon.c
@@ -20,9 +20,9 @@ int loongson3_cpu_temp(int cpu)
 	u32 reg;
 
 	reg = LOONGSON_CHIPTEMP(cpu);
-	if (loongson_sysconf.cputype == Loongson_3A)
+	if ((read_c0_prid() & 0xf) == PRID_REV_LOONGSON3A_R1)
 		reg = (reg >> 8) & 0xff;
-	else if (loongson_sysconf.cputype == Loongson_3B)
+	else
 		reg = ((reg >> 8) & 0xff) - 100;
 
 	return (int)reg * 1000;
-- 
2.4.6

* [PATCH 2/6] MIPS: Loongson: Invalidate special TLBs when needed
  2016-01-26 13:26 [PATCH 0/6] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
  2016-01-26 13:26 ` [PATCH 1/6] MIPS: Loongson: Add Loongson-3A R2 basic support Huacai Chen
@ 2016-01-26 13:26 ` Huacai Chen
  2016-01-26 13:26 ` [PATCH 3/6] MIPS: Loongson-3: Fast TLB refill handler Huacai Chen
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Huacai Chen @ 2016-01-26 13:26 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J. Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

Loongson-2 has a 4-entry ITLB which is a subset of the JTLB; Loongson-3 has
a 4-entry ITLB and a 4-entry DTLB which are subsets of the JTLB. Because the
ITLB/DTLB are not completely transparent to software, we should write the
Diag register to invalidate them whenever the JTLB is flushed.

For Loongson-3A R2 (and newer), we should also invalidate the ITLB, DTLB,
VTLB and FTLB before the FTLB is enabled or disabled.
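
For reference, a minimal C sketch of the Diag register usage added below;
the bit assignments are taken from this patch's hunks, and the helper name
is only illustrative (the real code lives in the renamed flush_spec_tlb()
and in set_ftlb_enable()):

	/* Sketch only: mirrors the Diag writes introduced by this patch. */
	static inline void loongson_invalidate_special_tlbs(void)
	{
		switch (current_cpu_type()) {
		case CPU_LOONGSON2:
			write_c0_diag(0x4);	/* invalidate the 4-entry ITLB */
			break;
		case CPU_LOONGSON3:
			write_c0_diag(0xc);	/* invalidate ITLB and DTLB */
			break;
		}
	}

	/*
	 * On Loongson-3A R2, set_ftlb_enable() additionally flushes all
	 * special TLBs before toggling the FTLB:
	 *	write_c0_diag(1 << 2 | 1 << 3 | 1 << 12 | 1 << 13);
	 * i.e. ITLB, DTLB, VTLB and FTLB respectively.
	 */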

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/kernel/cpu-probe.c |  2 ++
 arch/mips/mm/tlb-r4k.c       | 27 +++++++++++++++------------
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 9e963f2..2c5ec1e 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -562,6 +562,8 @@ static int set_ftlb_enable(struct cpuinfo_mips *c, int enable)
 					   << MIPS_CONF7_FTLBP_SHIFT));
 		break;
 	case CPU_LOONGSON3:
+		/* Flush ITLB, DTLB, VTLB and FTLB */
+		write_c0_diag(1<<2 | 1<<3 | 1<<12 | 1<<13);
 		/* Loongson-3 cores use Config6 to enable the FTLB */
 		config = read_c0_config6();
 		if (enable)
diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index 5037d58..8baa288 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -27,25 +27,28 @@
 extern void build_tlb_refill_handler(void);
 
 /*
- * LOONGSON2/3 has a 4 entry itlb which is a subset of dtlb,
- * unfortunately, itlb is not totally transparent to software.
+ * LOONGSON-2 has a 4 entry itlb which is a subset of jtlb, LOONGSON-3 has
+ * a 4 entry itlb and a 4 entry dtlb which are subsets of jtlb. Unfortunately,
+ * itlb/dtlb are not totally transparent to software.
  */
-static inline void flush_itlb(void)
+static inline void flush_spec_tlb(void)
 {
 	switch (current_cpu_type()) {
 	case CPU_LOONGSON2:
+		write_c0_diag(0x4);
+		break;
 	case CPU_LOONGSON3:
-		write_c0_diag(4);
+		write_c0_diag(0xc);
 		break;
 	default:
 		break;
 	}
 }
 
-static inline void flush_itlb_vm(struct vm_area_struct *vma)
+static inline void flush_spec_tlb_vm(struct vm_area_struct *vma)
 {
 	if (vma->vm_flags & VM_EXEC)
-		flush_itlb();
+		flush_spec_tlb();
 }
 
 void local_flush_tlb_all(void)
@@ -92,7 +95,7 @@ void local_flush_tlb_all(void)
 	tlbw_use_hazard();
 	write_c0_entryhi(old_ctx);
 	htw_start();
-	flush_itlb();
+	flush_spec_tlb();
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(local_flush_tlb_all);
@@ -158,7 +161,7 @@ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		} else {
 			drop_mmu_context(mm, cpu);
 		}
-		flush_itlb();
+		flush_spec_tlb();
 		local_irq_restore(flags);
 	}
 }
@@ -204,7 +207,7 @@ void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	} else {
 		local_flush_tlb_all();
 	}
-	flush_itlb();
+	flush_spec_tlb();
 	local_irq_restore(flags);
 }
 
@@ -239,7 +242,7 @@ void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 	finish:
 		write_c0_entryhi(oldpid);
 		htw_start();
-		flush_itlb_vm(vma);
+		flush_spec_tlb_vm(vma);
 		local_irq_restore(flags);
 	}
 }
@@ -273,7 +276,7 @@ void local_flush_tlb_one(unsigned long page)
 	}
 	write_c0_entryhi(oldpid);
 	htw_start();
-	flush_itlb();
+	flush_spec_tlb();
 	local_irq_restore(flags);
 }
 
@@ -356,7 +359,7 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
 	}
 	tlbw_use_hazard();
 	htw_start();
-	flush_itlb_vm(vma);
+	flush_spec_tlb_vm(vma);
 	local_irq_restore(flags);
 }
 
-- 
2.4.6

* [PATCH 3/6] MIPS: Loongson-3: Fast TLB refill handler
  2016-01-26 13:26 [PATCH 0/6] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
  2016-01-26 13:26 ` [PATCH 1/6] MIPS: Loongson: Add Loongson-3A R2 basic support Huacai Chen
  2016-01-26 13:26 ` [PATCH 2/6] MIPS: Loongson: Invalidate special TLBs when needed Huacai Chen
@ 2016-01-26 13:26 ` Huacai Chen
  2016-01-26 13:26 ` [PATCH 4/6] MIPS: tlbex: Fix bugs in tlbchange handler Huacai Chen
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Huacai Chen @ 2016-01-26 13:26 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J. Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

Loongson-3A R2 has PWBase/PWField/PWSize/PWCtl registers in CP0 (very similar
to the hardware page table walker, HTW) and the lwdir/lwpte/lddir/ldpte
instructions, which can be used to build a fast TLB refill handler.
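
To make the uasm sequence below easier to follow, here is a rough,
hand-written view of the generated handler. It is a sketch only, based on
the comments in build_loongson3_tlb_refill_handler(); the segbits check and
the page-mask restore are omitted, and the exact EntryLo semantics of ldpte
are Loongson-specific:

	/*
	 * Approximate shape of the emitted Loongson-3 refill handler:
	 *
	 *	dmfc0	k1, C0_PGD	# current page directory (CP0 $9 sel 7)
	 *	lddir	k0, k1, 3	# index the global page directory
	 *	lddir	k1, k0, 1	# index the middle directory (unless folded)
	 *	ldpte	k1, 0		# load the even PTE (fills EntryLo0)
	 *	ldpte	k1, 1		# load the odd PTE (fills EntryLo1)
	 *	tlbwr			# write a random VTLB/FTLB entry
	 *	eret
	 */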

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/cpu-features.h |   3 +
 arch/mips/include/asm/cpu.h          |   1 +
 arch/mips/include/asm/mipsregs.h     |   6 ++
 arch/mips/include/asm/uasm.h         |   3 +-
 arch/mips/include/uapi/asm/inst.h    |  10 +++
 arch/mips/kernel/cpu-probe.c         |   2 +-
 arch/mips/mm/tlbex.c                 | 124 ++++++++++++++++++++++++++++++++++-
 arch/mips/mm/uasm-mips.c             |   2 +
 arch/mips/mm/uasm.c                  |   3 +
 9 files changed, 149 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/cpu-features.h b/arch/mips/include/asm/cpu-features.h
index eeec8c8..e0ba50a 100644
--- a/arch/mips/include/asm/cpu-features.h
+++ b/arch/mips/include/asm/cpu-features.h
@@ -35,6 +35,9 @@
 #ifndef cpu_has_htw
 #define cpu_has_htw		(cpu_data[0].options & MIPS_CPU_HTW)
 #endif
+#ifndef cpu_has_ldpte
+#define cpu_has_ldpte		(cpu_data[0].options & MIPS_CPU_LDPTE)
+#endif
 #ifndef cpu_has_rixiex
 #define cpu_has_rixiex		(cpu_data[0].options & MIPS_CPU_RIXIEX)
 #endif
diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
index c4f7983..5f50551 100644
--- a/arch/mips/include/asm/cpu.h
+++ b/arch/mips/include/asm/cpu.h
@@ -390,6 +390,7 @@ enum cpu_type_enum {
 #define MIPS_CPU_FTLB		0x20000000000ull /* CPU has Fixed-page-size TLB */
 #define MIPS_CPU_NAN_LEGACY	0x40000000000ull /* Legacy NaN implemented */
 #define MIPS_CPU_NAN_2008	0x80000000000ull /* 2008 NaN implemented */
+#define MIPS_CPU_LDPTE		0x100000000000ull /* CPU has ldpte/lddir instructions */
 
 /*
  * CPU ASE encodings
diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h
index 9290fd4..8affca2 100644
--- a/arch/mips/include/asm/mipsregs.h
+++ b/arch/mips/include/asm/mipsregs.h
@@ -1444,6 +1444,12 @@ do {									\
 #define read_c0_pwctl()		__read_32bit_c0_register($6, 6)
 #define write_c0_pwctl(val)	__write_32bit_c0_register($6, 6, val)
 
+#define read_c0_pgd()		__read_64bit_c0_register($9, 7)
+#define write_c0_pgd(val)	__write_64bit_c0_register($9, 7, val)
+
+#define read_c0_kpgd()		__read_64bit_c0_register($31, 7)
+#define write_c0_kpgd(val)	__write_64bit_c0_register($31, 7, val)
+
 /* Cavium OCTEON (cnMIPS) */
 #define read_c0_cvmcount()	__read_ulong_c0_register($9, 6)
 #define write_c0_cvmcount(val)	__write_ulong_c0_register($9, 6, val)
diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index fc1cdd2..b6ecfee 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -171,7 +171,8 @@ Ip_u2u1(_wsbh);
 Ip_u3u1u2(_xor);
 Ip_u2u1u3(_xori);
 Ip_u2u1(_yield);
-
+Ip_u1u2(_ldpte);
+Ip_u2u1u3(_lddir);
 
 /* Handle labels. */
 struct uasm_label {
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index ddea53e..3bb8cd9 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -204,6 +204,16 @@ enum mad_func {
 };
 
 /*
+ * func field for page table walker (Loongson-3).
+ */
+enum ptw_func {
+	lwdir_op = 0x00,
+	lwpte_op = 0x01,
+	lddir_op = 0x02,
+	ldpte_op = 0x03,
+};
+
+/*
  * func field for special3 lx opcodes (Cavium Octeon).
  */
 enum lx_func {
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 2c5ec1e..de43940 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -1519,7 +1519,7 @@ static inline void cpu_probe_loongson(struct cpuinfo_mips *c, unsigned int cpu)
 		}
 
 		decode_configs(c);
-		c->options |= MIPS_CPU_TLBINV;
+		c->options |= MIPS_CPU_TLBINV | MIPS_CPU_LDPTE;
 		c->writecombine = _CACHE_UNCACHED_ACCELERATED;
 		break;
 	default:
diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index e3574f4..d0975cd 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -284,7 +284,12 @@ static inline void dump_handler(const char *symbol, const u32 *handler, int coun
 #define C0_ENTRYLO1	3, 0
 #define C0_CONTEXT	4, 0
 #define C0_PAGEMASK	5, 0
+#define C0_PWBASE	5, 5
+#define C0_PWFIELD	5, 6
+#define C0_PWSIZE	5, 7
+#define C0_PWCTL	6, 6
 #define C0_BADVADDR	8, 0
+#define C0_PGD		9, 7
 #define C0_ENTRYHI	10, 0
 #define C0_EPC		14, 0
 #define C0_XCONTEXT	20, 0
@@ -808,7 +813,10 @@ build_get_pmde64(u32 **p, struct uasm_label **l, struct uasm_reloc **r,
 
 	if (pgd_reg != -1) {
 		/* pgd is in pgd_reg */
-		UASM_i_MFC0(p, ptr, c0_kscratch(), pgd_reg);
+		if (cpu_has_ldpte)
+			UASM_i_MFC0(p, ptr, C0_PWBASE);
+		else
+			UASM_i_MFC0(p, ptr, c0_kscratch(), pgd_reg);
 	} else {
 #if defined(CONFIG_MIPS_PGD_C0_CONTEXT)
 		/*
@@ -1421,6 +1429,108 @@ static void build_r4000_tlb_refill_handler(void)
 	dump_handler("r4000_tlb_refill", (u32 *)ebase, 64);
 }
 
+static void setup_pw(void)
+{
+	unsigned long pgd_i, pgd_w;
+#ifndef __PAGETABLE_PMD_FOLDED
+	unsigned long pmd_i, pmd_w;
+#endif
+	unsigned long pt_i, pt_w;
+	unsigned long pte_i, pte_w;
+#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
+	unsigned long psn;
+
+	psn = ilog2(_PAGE_HUGE);     /* bit used to indicate huge page */
+#endif
+	pgd_i = PGDIR_SHIFT;  /* 1st level PGD */
+#ifndef __PAGETABLE_PMD_FOLDED
+	pgd_w = PGDIR_SHIFT - PMD_SHIFT + PGD_ORDER;
+
+	pmd_i = PMD_SHIFT;    /* 2nd level PMD */
+	pmd_w = PMD_SHIFT - PAGE_SHIFT;
+#else
+	pgd_w = PGDIR_SHIFT - PAGE_SHIFT + PGD_ORDER;
+#endif
+
+	pt_i  = PAGE_SHIFT;    /* 3rd level PTE */
+	pt_w  = PAGE_SHIFT - 3;
+
+	pte_i = ilog2(_PAGE_GLOBAL);
+	pte_w = 0;
+
+#ifndef __PAGETABLE_PMD_FOLDED
+	write_c0_pwfield(pgd_i << 24 | pmd_i << 12 | pt_i << 6 | pte_i);
+	write_c0_pwsize(1 << 30 | pgd_w << 24 | pmd_w << 12 | pt_w << 6 | pte_w);
+#else
+	write_c0_pwfield(pgd_i << 24 | pt_i << 6 | pte_i);
+	write_c0_pwsize(1 << 30 | pgd_w << 24 | pt_w << 6 | pte_w);
+#endif
+
+#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
+	write_c0_pwctl(1 << 6 | psn);
+#endif
+	write_c0_kpgd(swapper_pg_dir);
+	kscratch_used_mask |= (1 << 7); /* KScratch6 is used for KPGD */
+}
+
+static void build_loongson3_tlb_refill_handler(void)
+{
+	u32 *p = tlb_handler;
+	struct uasm_label *l = labels;
+	struct uasm_reloc *r = relocs;
+
+	memset(labels, 0, sizeof(labels));
+	memset(relocs, 0, sizeof(relocs));
+	memset(tlb_handler, 0, sizeof(tlb_handler));
+
+	if (check_for_high_segbits) {
+		uasm_i_dmfc0(&p, K0, C0_BADVADDR);
+		uasm_i_dsrl_safe(&p, K1, K0, PGDIR_SHIFT + PGD_ORDER + PAGE_SHIFT - 3);
+		uasm_il_beqz(&p, &r, K1, label_vmalloc);
+		uasm_i_nop(&p);
+
+		uasm_il_bgez(&p, &r, K0, label_large_segbits_fault);
+		uasm_i_nop(&p);
+		uasm_l_vmalloc(&l, p);
+	}
+
+	uasm_i_dmfc0(&p, K1, C0_PGD);
+
+	uasm_i_lddir(&p, K0, K1, 3);  /* global page dir */
+#ifndef __PAGETABLE_PMD_FOLDED
+	uasm_i_lddir(&p, K1, K0, 1);  /* middle page dir */
+#endif
+	uasm_i_ldpte(&p, K1, 0);      /* even */
+	uasm_i_ldpte(&p, K1, 1);      /* odd */
+	uasm_i_tlbwr(&p);
+
+	/* restore page mask */
+	if (PM_DEFAULT_MASK >> 16) {
+		uasm_i_lui(&p, K0, PM_DEFAULT_MASK >> 16);
+		uasm_i_ori(&p, K0, K0, PM_DEFAULT_MASK & 0xffff);
+		uasm_i_mtc0(&p, K0, C0_PAGEMASK);
+	} else if (PM_DEFAULT_MASK) {
+		uasm_i_ori(&p, K0, 0, PM_DEFAULT_MASK);
+		uasm_i_mtc0(&p, K0, C0_PAGEMASK);
+	} else {
+		uasm_i_mtc0(&p, 0, C0_PAGEMASK);
+	}
+
+	uasm_i_eret(&p);
+
+	if (check_for_high_segbits) {
+		uasm_l_large_segbits_fault(&l, p);
+		UASM_i_LA(&p, K1, (unsigned long)tlb_do_page_fault_0);
+		uasm_i_jr(&p, K1);
+		uasm_i_nop(&p);
+	}
+
+	uasm_resolve_relocs(relocs, labels);
+	memcpy((void *)(ebase + 0x80), tlb_handler, 0x80);
+	local_flush_icache_range(ebase + 0x80, ebase + 0x100);
+	dump_handler("loongson3_tlb_refill", (u32 *)(ebase + 0x80), 32);
+}
+
 extern u32 handle_tlbl[], handle_tlbl_end[];
 extern u32 handle_tlbs[], handle_tlbs_end[];
 extern u32 handle_tlbm[], handle_tlbm_end[];
@@ -1468,7 +1578,10 @@ static void build_setup_pgd(void)
 	} else {
 		/* PGD in c0_KScratch */
 		uasm_i_jr(&p, 31);
-		UASM_i_MTC0(&p, a0, c0_kscratch(), pgd_reg);
+		if (cpu_has_ldpte)
+			UASM_i_MTC0(&p, a0, C0_PWBASE);
+		else
+			UASM_i_MTC0(&p, a0, c0_kscratch(), pgd_reg);
 	}
 #else
 #ifdef CONFIG_SMP
@@ -2437,13 +2550,18 @@ void build_tlb_refill_handler(void)
 		break;
 
 	default:
+		if (cpu_has_ldpte)
+			setup_pw();
+
 		if (!run_once) {
 			scratch_reg = allocate_kscratch();
 			build_setup_pgd();
 			build_r4000_tlb_load_handler();
 			build_r4000_tlb_store_handler();
 			build_r4000_tlb_modify_handler();
-			if (!cpu_has_local_ebase)
+			if (cpu_has_ldpte)
+				build_loongson3_tlb_refill_handler();
+			else if (!cpu_has_local_ebase)
 				build_r4000_tlb_refill_handler();
 			flush_tlb_handlers();
 			run_once++;
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index b4a83789..9c2220a 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -153,6 +153,8 @@ static struct insn insn_table[] = {
 	{ insn_xori,  M(xori_op, 0, 0, 0, 0, 0),  RS | RT | UIMM },
 	{ insn_xor,  M(spec_op, 0, 0, 0, 0, xor_op),  RS | RT | RD },
 	{ insn_yield, M(spec3_op, 0, 0, 0, 0, yield_op), RS | RD },
+	{ insn_ldpte, M(lwc2_op, 0, 0, 0, ldpte_op, mult_op), RS | RD },
+	{ insn_lddir, M(lwc2_op, 0, 0, 0, lddir_op, mult_op), RS | RT | RD },
 	{ insn_invalid, 0, 0 }
 };
 
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 319051c..ad718de 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -60,6 +60,7 @@ enum opcode {
 	insn_sltiu, insn_sltu, insn_sra, insn_srl, insn_srlv, insn_subu,
 	insn_sw, insn_sync, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi,
 	insn_tlbwr, insn_wait, insn_wsbh, insn_xor, insn_xori, insn_yield,
+	insn_lddir, insn_ldpte,
 };
 
 struct insn {
@@ -335,6 +336,8 @@ I_u1u2s3(_bbit0);
 I_u1u2s3(_bbit1);
 I_u3u1u2(_lwx)
 I_u3u1u2(_ldx)
+I_u1u2(_ldpte)
+I_u2u1u3(_lddir)
 
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
 #include <asm/octeon/octeon.h>
-- 
2.4.6

* [PATCH 4/6] MIPS: tlbex: Fix bugs in tlbchange handler
  2016-01-26 13:26 [PATCH 0/6] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
                   ` (2 preceding siblings ...)
  2016-01-26 13:26 ` [PATCH 3/6] MIPS: Loongson-3: Fast TLB refill handler Huacai Chen
@ 2016-01-26 13:26 ` Huacai Chen
  2016-01-26 21:15   ` David Daney
  2016-01-27  5:50   ` Joshua Kinard
  2016-01-26 13:26 ` [PATCH 5/6] MIPS: Loongson: Introduce and use cpu_has_coherent_cache feature Huacai Chen
  2016-01-26 13:26 ` [PATCH 6/6] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT Huacai Chen
  5 siblings, 2 replies; 21+ messages in thread
From: Huacai Chen @ 2016-01-26 13:26 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J. Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

If a TLB miss is triggered while EXL=1, the TLB refill exception is handled
as a TLB invalid exception, so tlbp may fail. In that situation the
CP0_Index register doesn't contain a valid value. This may not be a problem
for the VTLB since it is fully associative; however, the FTLB is
set-associative, so not every TLB entry is usable for a given address.
Thus, we should use tlbwr instead of tlbwi when tlbp fails.

There is a similar case for huge pages, so build_huge_tlb_write_entry() is
also modified. If wmode != tlb_random, the caller is the TLB invalid
handler, and we should select tlbwr/tlbwi depending on the tlbp result.
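
In C-level pseudocode, the tail generated after this change roughly does
the following; this is a sketch of the emitted assembly, not new kernel
code, with tlb_write_indexed()/tlb_write_random() standing in for
tlbwi/tlbwr:

	/* Pick tlbwi vs. tlbwr based on the tlbp result. */
	if ((int)read_c0_index() < 0)	/* probe failed: Index.P is set */
		tlb_write_random();	/* tlbwr: no valid slot to overwrite */
	else
		tlb_write_indexed();	/* tlbwi: overwrite the matched entry */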

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/mm/tlbex.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index d0975cd..da68ffb 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -173,7 +173,10 @@ enum label_id {
 	label_large_segbits_fault,
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	label_tlb_huge_update,
+	label_tail_huge_miss,
+	label_tail_huge_done,
 #endif
+	label_tail_miss,
 };
 
 UASM_L_LA(_second_part)
@@ -192,7 +195,10 @@ UASM_L_LA(_r3000_write_probe_fail)
 UASM_L_LA(_large_segbits_fault)
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 UASM_L_LA(_tlb_huge_update)
+UASM_L_LA(_tail_huge_miss)
+UASM_L_LA(_tail_huge_done)
 #endif
+UASM_L_LA(_tail_miss)
 
 static int hazard_instance;
 
@@ -706,8 +712,24 @@ static void build_huge_tlb_write_entry(u32 **p, struct uasm_label **l,
 	uasm_i_ori(p, tmp, tmp, PM_HUGE_MASK & 0xffff);
 	uasm_i_mtc0(p, tmp, C0_PAGEMASK);
 
-	build_tlb_write_entry(p, l, r, wmode);
+	if (wmode == tlb_random) { /* Caller is TLB Refill Handler */
+		build_tlb_write_entry(p, l, r, wmode);
+		build_restore_pagemask(p, r, tmp, label_leave, restore_scratch);
+		return;
+	}
+
+	/* Caller is TLB Load/Store/Modify Handler */
+	uasm_i_mfc0(p, tmp, C0_INDEX);
+	uasm_il_bltz(p, r, tmp, label_tail_huge_miss);
+	uasm_i_nop(p);
+	build_tlb_write_entry(p, l, r, tlb_indexed);
+	uasm_il_b(p, r, label_tail_huge_done);
+	uasm_i_nop(p);
+
+	uasm_l_tail_huge_miss(l, *p);
+	build_tlb_write_entry(p, l, r, tlb_random);
 
+	uasm_l_tail_huge_done(l, *p);
 	build_restore_pagemask(p, r, tmp, label_leave, restore_scratch);
 }
 
@@ -2026,7 +2048,14 @@ build_r4000_tlbchange_handler_tail(u32 **p, struct uasm_label **l,
 	uasm_i_ori(p, ptr, ptr, sizeof(pte_t));
 	uasm_i_xori(p, ptr, ptr, sizeof(pte_t));
 	build_update_entries(p, tmp, ptr);
+	uasm_i_mfc0(p, ptr, C0_INDEX);
+	uasm_il_bltz(p, r, ptr, label_tail_miss);
+	uasm_i_nop(p);
 	build_tlb_write_entry(p, l, r, tlb_indexed);
+	uasm_il_b(p, r, label_leave);
+	uasm_i_nop(p);
+	uasm_l_tail_miss(l, *p);
+	build_tlb_write_entry(p, l, r, tlb_random);
 	uasm_l_leave(l, *p);
 	build_restore_work_registers(p);
 	uasm_i_eret(p); /* return from trap */
-- 
2.4.6

* [PATCH 5/6] MIPS: Loongson: Introduce and use cpu_has_coherent_cache feature
  2016-01-26 13:26 [PATCH 0/6] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
                   ` (3 preceding siblings ...)
  2016-01-26 13:26 ` [PATCH 4/6] MIPS: tlbex: Fix bugs in tlbchange handler Huacai Chen
@ 2016-01-26 13:26 ` Huacai Chen
  2016-01-26 13:42     ` James Hogan
  2016-01-26 13:26 ` [PATCH 6/6] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT Huacai Chen
  5 siblings, 1 reply; 21+ messages in thread
From: Huacai Chen @ 2016-01-26 13:26 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J. Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

Loongson-3 maintains cache coherency in hardware, so we introduce a CPU
feature named cpu_has_coherent_cache and use it to make MIPS's cache
flushing functions return early.
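
The pattern applied throughout c-r4k.c is a simple early return; in this
sketch, r4k_flush_cache_foo() is just a placeholder for the functions
touched below:

	static void r4k_flush_cache_foo(void)
	{
		/* Hardware keeps the caches coherent, skip the software flush. */
		if (cpu_has_coherent_cache)
			return;

		/* ... existing cache flushing code ... */
	}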

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/cpu-features.h                |  3 +++
 arch/mips/include/asm/cpu.h                         |  1 +
 .../asm/mach-loongson64/cpu-feature-overrides.h     |  1 +
 arch/mips/mm/c-r4k.c                                | 21 +++++++++++++++++++++
 4 files changed, 26 insertions(+)

diff --git a/arch/mips/include/asm/cpu-features.h b/arch/mips/include/asm/cpu-features.h
index e0ba50a..1ec3dea 100644
--- a/arch/mips/include/asm/cpu-features.h
+++ b/arch/mips/include/asm/cpu-features.h
@@ -148,6 +148,9 @@
 #ifndef cpu_has_xpa
 #define cpu_has_xpa		(cpu_data[0].options & MIPS_CPU_XPA)
 #endif
+#ifndef cpu_has_coherent_cache
+#define cpu_has_coherent_cache	(cpu_data[0].options & MIPS_CPU_CACHE_COHERENT)
+#endif
 #ifndef cpu_has_vtag_icache
 #define cpu_has_vtag_icache	(cpu_data[0].icache.flags & MIPS_CACHE_VTAG)
 #endif
diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
index 5f50551..28471f0 100644
--- a/arch/mips/include/asm/cpu.h
+++ b/arch/mips/include/asm/cpu.h
@@ -391,6 +391,7 @@ enum cpu_type_enum {
 #define MIPS_CPU_NAN_LEGACY	0x40000000000ull /* Legacy NaN implemented */
 #define MIPS_CPU_NAN_2008	0x80000000000ull /* 2008 NaN implemented */
 #define MIPS_CPU_LDPTE		0x100000000000ull /* CPU has ldpte/lddir instructions */
+#define MIPS_CPU_CACHE_COHERENT	0x200000000000ull /* CPU maintains cache coherency by hardware */
 
 /*
  * CPU ASE encodings
diff --git a/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h b/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
index c3406db..647d952 100644
--- a/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
@@ -46,6 +46,7 @@
 #define cpu_has_local_ebase	0
 
 #define cpu_has_wsbh		IS_ENABLED(CONFIG_CPU_LOONGSON3)
+#define cpu_has_coherent_cache	IS_ENABLED(CONFIG_CPU_LOONGSON3)
 #define cpu_hwrena_impl_bits	0xc0000000
 
 #endif /* __ASM_MACH_LOONGSON64_CPU_FEATURE_OVERRIDES_H */
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index 2abc73d..65fb28c 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -429,6 +429,9 @@ static void r4k_blast_scache_setup(void)
 
 static inline void local_r4k___flush_cache_all(void * args)
 {
+	if (cpu_has_coherent_cache)
+		return;
+
 	switch (current_cpu_type()) {
 	case CPU_LOONGSON2:
 	case CPU_LOONGSON3:
@@ -457,6 +460,9 @@ static inline void local_r4k___flush_cache_all(void * args)
 
 static void r4k___flush_cache_all(void)
 {
+	if (cpu_has_coherent_cache)
+		return;
+
 	r4k_on_each_cpu(local_r4k___flush_cache_all, NULL);
 }
 
@@ -503,6 +509,9 @@ static void r4k_flush_cache_range(struct vm_area_struct *vma,
 {
 	int exec = vma->vm_flags & VM_EXEC;
 
+	if (cpu_has_coherent_cache)
+		return;
+
 	if (cpu_has_dc_aliases || (exec && !cpu_has_ic_fills_f_dc))
 		r4k_on_each_cpu(local_r4k_flush_cache_range, vma);
 }
@@ -627,6 +636,9 @@ static void r4k_flush_cache_page(struct vm_area_struct *vma,
 {
 	struct flush_cache_page_args args;
 
+	if (cpu_has_coherent_cache)
+		return;
+
 	args.vma = vma;
 	args.addr = addr;
 	args.pfn = pfn;
@@ -636,11 +648,17 @@ static void r4k_flush_cache_page(struct vm_area_struct *vma,
 
 static inline void local_r4k_flush_data_cache_page(void * addr)
 {
+	if (cpu_has_coherent_cache)
+		return;
+
 	r4k_blast_dcache_page((unsigned long) addr);
 }
 
 static void r4k_flush_data_cache_page(unsigned long addr)
 {
+	if (cpu_has_coherent_cache)
+		return;
+
 	if (in_atomic())
 		local_r4k_flush_data_cache_page((void *)addr);
 	else
@@ -825,6 +843,9 @@ static void local_r4k_flush_cache_sigtramp(void * arg)
 
 static void r4k_flush_cache_sigtramp(unsigned long addr)
 {
+	if (cpu_has_coherent_cache)
+		return;
+
 	r4k_on_each_cpu(local_r4k_flush_cache_sigtramp, (void *) addr);
 }
 
-- 
2.4.6

* [PATCH 6/6] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT
  2016-01-26 13:26 [PATCH 0/6] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
                   ` (4 preceding siblings ...)
  2016-01-26 13:26 ` [PATCH 5/6] MIPS: Loongson: Introduce and use cpu_has_coherent_cache feature Huacai Chen
@ 2016-01-26 13:26 ` Huacai Chen
  2016-01-26 14:19     ` James Hogan
  5 siblings, 1 reply; 21+ messages in thread
From: Huacai Chen @ 2016-01-26 13:26 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J. Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

Newer Loongson 3 CPUs (Loongson-3A R2 and later, as opposed to Loongson-3A
R1, Loongson-3B R1 and Loongson-3B R2) have many enhancements, such as the
FTLB, the L1 VCache, the EI/DI/Wait/Prefetch instructions, the DSP/DSPv2
ASEs, the UserLocal register, Read-Inhibit/Execute-Inhibit, the SFB (Store
Fill Buffer) and fast TLB refill support.

This patch introduces a config option, CONFIG_LOONGSON3_ENHANCEMENT, to
enable those enhancements which cannot be probed at run time. If you want a
generic kernel that runs on all Loongson 3 machines, say 'N' here. If you
want a high-performance kernel that runs only on the newer Loongson 3
machines, say 'Y' here.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/Kconfig                                      | 18 ++++++++++++++++++
 arch/mips/include/asm/hazards.h                        |  7 ++++---
 arch/mips/include/asm/io.h                             | 10 +++++-----
 arch/mips/include/asm/irqflags.h                       |  5 +++++
 .../include/asm/mach-loongson64/kernel-entry-init.h    | 12 ++++++++++++
 arch/mips/mm/c-r4k.c                                   |  3 +++
 arch/mips/mm/page.c                                    |  9 +++++++++
 7 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 15faaf0..e6d6f7b 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1349,6 +1349,24 @@ config CPU_LOONGSON3
 		The Loongson 3 processor implements the MIPS64R2 instruction
 		set with many extensions.
 
+config LOONGSON3_ENHANCEMENT
+	bool "New Loongson 3 CPU Enhancements"
+	default n
+	select CPU_MIPSR2
+	select CPU_HAS_PREFETCH
+	depends on CPU_LOONGSON3
+	help
+	  New Loongson 3 CPU (since Loongson-3A R2, as opposed to Loongson-3A
+	  R1, Loongson-3B R1 and Loongson-3B R2) has many enhancements, such as
+	  FTLB, L1-VCache, EI/DI/Wait/Prefetch instruction, DSP/DSPv2 ASE, User
+	  Local register, Read-Inhibit/Execute-Inhibit, SFB (Store Fill Buffer),
+	  Fast TLB refill support, etc.
+
+	  This option enable those enhancements which cannot be probed at run
+	  time. If you want a generic kernel to run on all Loongson 3 machines,
+	  please say 'N' here. If you want a high-performance kernel to run on
+	  new Loongson 3 machines only, please say 'Y' here.
+
 config CPU_LOONGSON2E
 	bool "Loongson 2E"
 	depends on SYS_HAS_CPU_LOONGSON2E
diff --git a/arch/mips/include/asm/hazards.h b/arch/mips/include/asm/hazards.h
index 7b99efd..dbb1eb6 100644
--- a/arch/mips/include/asm/hazards.h
+++ b/arch/mips/include/asm/hazards.h
@@ -22,7 +22,8 @@
 /*
  * TLB hazards
  */
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) && !defined(CONFIG_CPU_CAVIUM_OCTEON)
+#if (defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)) && \
+	!defined(CONFIG_CPU_CAVIUM_OCTEON) && !defined(CONFIG_LOONGSON3_ENHANCEMENT)
 
 /*
  * MIPSR2 defines ehb for hazard avoidance
@@ -155,8 +156,8 @@ do {									\
 } while (0)
 
 #elif defined(CONFIG_MIPS_ALCHEMY) || defined(CONFIG_CPU_CAVIUM_OCTEON) || \
-	defined(CONFIG_CPU_LOONGSON2) || defined(CONFIG_CPU_R10000) || \
-	defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR)
+	defined(CONFIG_CPU_LOONGSON2) || defined(CONFIG_LOONGSON3_ENHANCEMENT) || \
+	defined(CONFIG_CPU_R10000) || defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR)
 
 /*
  * R10000 rocks - all hazards handled in hardware, so this becomes a nobrainer.
diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
index 2b4dc7a..ecabc00 100644
--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -304,10 +304,10 @@ static inline void iounmap(const volatile void __iomem *addr)
 #undef __IS_KSEG1
 }
 
-#ifdef CONFIG_CPU_CAVIUM_OCTEON
-#define war_octeon_io_reorder_wmb()		wmb()
+#if defined(CONFIG_CPU_CAVIUM_OCTEON) || defined(CONFIG_LOONGSON3_ENHANCEMENT)
+#define war_io_reorder_wmb()		wmb()
 #else
-#define war_octeon_io_reorder_wmb()		do { } while (0)
+#define war_io_reorder_wmb()		do { } while (0)
 #endif
 
 #define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, irq)			\
@@ -318,7 +318,7 @@ static inline void pfx##write##bwlq(type val,				\
 	volatile type *__mem;						\
 	type __val;							\
 									\
-	war_octeon_io_reorder_wmb();					\
+	war_io_reorder_wmb();					\
 									\
 	__mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem));	\
 									\
@@ -387,7 +387,7 @@ static inline void pfx##out##bwlq##p(type val, unsigned long port)	\
 	volatile type *__addr;						\
 	type __val;							\
 									\
-	war_octeon_io_reorder_wmb();					\
+	war_io_reorder_wmb();					\
 									\
 	__addr = (void *)__swizzle_addr_##bwlq(mips_io_port_base + port); \
 									\
diff --git a/arch/mips/include/asm/irqflags.h b/arch/mips/include/asm/irqflags.h
index 65c351e..12f80b5 100644
--- a/arch/mips/include/asm/irqflags.h
+++ b/arch/mips/include/asm/irqflags.h
@@ -41,7 +41,12 @@ static inline unsigned long arch_local_irq_save(void)
 	"	.set	push						\n"
 	"	.set	reorder						\n"
 	"	.set	noat						\n"
+#if defined(CONFIG_LOONGSON3_ENHANCEMENT)
+	"	mfc0	%[flags], $12					\n"
+	"	di							\n"
+#else
 	"	di	%[flags]					\n"
+#endif
 	"	andi	%[flags], 1					\n"
 	"	" __stringify(__irq_disable_hazard) "			\n"
 	"	.set	pop						\n"
diff --git a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
index da83482..8393bc54 100644
--- a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
+++ b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
@@ -26,6 +26,12 @@
 	mfc0	t0, $5, 1
 	or	t0, (0x1 << 29)
 	mtc0	t0, $5, 1
+#ifdef CONFIG_LOONGSON3_ENHANCEMENT
+	/* Enable STFill Buffer */
+	mfc0	t0, $16, 6
+	or	t0, 0x100
+	mtc0	t0, $16, 6
+#endif
 	_ehb
 	.set	pop
 #endif
@@ -46,6 +52,12 @@
 	mfc0	t0, $5, 1
 	or	t0, (0x1 << 29)
 	mtc0	t0, $5, 1
+#ifdef CONFIG_LOONGSON3_ENHANCEMENT
+	/* Enable STFill Buffer */
+	mfc0	t0, $16, 6
+	or	t0, 0x100
+	mtc0	t0, $16, 6
+#endif
 	_ehb
 	.set	pop
 #endif
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index 65fb28c..903d8da 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -1170,6 +1170,9 @@ static void probe_pcache(void)
 					  c->dcache.ways *
 					  c->dcache.linesz;
 		c->dcache.waybit = 0;
+#ifdef CONFIG_CPU_HAS_PREFETCH
+		c->options |= MIPS_CPU_PREFETCH;
+#endif
 		break;
 
 	case CPU_CAVIUM_OCTEON3:
diff --git a/arch/mips/mm/page.c b/arch/mips/mm/page.c
index 885d73f..c41953c 100644
--- a/arch/mips/mm/page.c
+++ b/arch/mips/mm/page.c
@@ -188,6 +188,15 @@ static void set_prefetch_parameters(void)
 			}
 			break;
 
+		case CPU_LOONGSON3:
+			/* Loongson-3 only support the Pref_Load/Pref_Store. */
+			pref_bias_clear_store = 128;
+			pref_bias_copy_load = 128;
+			pref_bias_copy_store = 128;
+			pref_src_mode = Pref_Load;
+			pref_dst_mode = Pref_Store;
+			break;
+
 		default:
 			pref_bias_clear_store = 128;
 			pref_bias_copy_load = 256;
-- 
2.4.6

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] MIPS: Loongson: Introduce and use cpu_has_coherent_cache feature
@ 2016-01-26 13:42     ` James Hogan
  0 siblings, 0 replies; 21+ messages in thread
From: James Hogan @ 2016-01-26 13:42 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Ralf Baechle, Aurelien Jarno, Steven J. Hill, linux-mips,
	Fuxin Zhang, Zhangjin Wu

[-- Attachment #1: Type: text/plain, Size: 5045 bytes --]

Hi,

On Tue, Jan 26, 2016 at 09:26:23PM +0800, Huacai Chen wrote:
> Loongson-3 maintains cache coherency by hardware. So we introduce a cpu
> feature named cpu_has_coherent_cache and use it to modify MIPS's cache
> flushing functions.

This is rather ambiguous (the phrase "cache coherency" can be associated
with dcache coherency between cores). Are you saying that the icache is
coherent with the dcache, such that writes to dcache are immediately
visible to instruction fetches on all CPUs in the system without any
icache flushing?

If so, I think that needs clarifying, e.g. cpu_has_coherent_icache.

Perhaps it should be an icache flag alongside MIPS_CACHE_IC_F_DC, which is
already intended to avoid the dcache flushes, but not necessarily the
icache flushes.
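
For illustration, a rough sketch of what that could look like (the flag
and macro names below are invented; nothing like them exists in the tree
yet):

  /* Invented flag; the value is only illustrative. */
  #define MIPS_CACHE_IC_COHERENT	0x00000080

  #ifndef cpu_has_coherent_icache
  #define cpu_has_coherent_icache \
  	(cpu_data[0].icache.flags & MIPS_CACHE_IC_COHERENT)
  #endif

  static void r4k_flush_cache_sigtramp(unsigned long addr)
  {
  	/* Stores are already visible to ifetch on every CPU. */
  	if (cpu_has_coherent_icache)
  		return;

  	r4k_on_each_cpu(local_r4k_flush_cache_sigtramp, (void *) addr);
  }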

Cheers
James

> 
> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> ---
>  arch/mips/include/asm/cpu-features.h                |  3 +++
>  arch/mips/include/asm/cpu.h                         |  1 +
>  .../asm/mach-loongson64/cpu-feature-overrides.h     |  1 +
>  arch/mips/mm/c-r4k.c                                | 21 +++++++++++++++++++++
>  4 files changed, 26 insertions(+)
> 
> diff --git a/arch/mips/include/asm/cpu-features.h b/arch/mips/include/asm/cpu-features.h
> index e0ba50a..1ec3dea 100644
> --- a/arch/mips/include/asm/cpu-features.h
> +++ b/arch/mips/include/asm/cpu-features.h
> @@ -148,6 +148,9 @@
>  #ifndef cpu_has_xpa
>  #define cpu_has_xpa		(cpu_data[0].options & MIPS_CPU_XPA)
>  #endif
> +#ifndef cpu_has_coherent_cache
> +#define cpu_has_coherent_cache	(cpu_data[0].options & MIPS_CPU_CACHE_COHERENT)
> +#endif
>  #ifndef cpu_has_vtag_icache
>  #define cpu_has_vtag_icache	(cpu_data[0].icache.flags & MIPS_CACHE_VTAG)
>  #endif
> diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
> index 5f50551..28471f0 100644
> --- a/arch/mips/include/asm/cpu.h
> +++ b/arch/mips/include/asm/cpu.h
> @@ -391,6 +391,7 @@ enum cpu_type_enum {
>  #define MIPS_CPU_NAN_LEGACY	0x40000000000ull /* Legacy NaN implemented */
>  #define MIPS_CPU_NAN_2008	0x80000000000ull /* 2008 NaN implemented */
>  #define MIPS_CPU_LDPTE		0x100000000000ull /* CPU has ldpte/lddir instructions */
> +#define MIPS_CPU_CACHE_COHERENT	0x200000000000ull /* CPU maintains cache coherency by hardware */
>  
>  /*
>   * CPU ASE encodings
> diff --git a/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h b/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
> index c3406db..647d952 100644
> --- a/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
> +++ b/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
> @@ -46,6 +46,7 @@
>  #define cpu_has_local_ebase	0
>  
>  #define cpu_has_wsbh		IS_ENABLED(CONFIG_CPU_LOONGSON3)
> +#define cpu_has_coherent_cache	IS_ENABLED(CONFIG_CPU_LOONGSON3)
>  #define cpu_hwrena_impl_bits	0xc0000000
>  
>  #endif /* __ASM_MACH_LOONGSON64_CPU_FEATURE_OVERRIDES_H */
> diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
> index 2abc73d..65fb28c 100644
> --- a/arch/mips/mm/c-r4k.c
> +++ b/arch/mips/mm/c-r4k.c
> @@ -429,6 +429,9 @@ static void r4k_blast_scache_setup(void)
>  
>  static inline void local_r4k___flush_cache_all(void * args)
>  {
> +	if (cpu_has_coherent_cache)
> +		return;
> +
>  	switch (current_cpu_type()) {
>  	case CPU_LOONGSON2:
>  	case CPU_LOONGSON3:
> @@ -457,6 +460,9 @@ static inline void local_r4k___flush_cache_all(void * args)
>  
>  static void r4k___flush_cache_all(void)
>  {
> +	if (cpu_has_coherent_cache)
> +		return;
> +
>  	r4k_on_each_cpu(local_r4k___flush_cache_all, NULL);
>  }
>  
> @@ -503,6 +509,9 @@ static void r4k_flush_cache_range(struct vm_area_struct *vma,
>  {
>  	int exec = vma->vm_flags & VM_EXEC;
>  
> +	if (cpu_has_coherent_cache)
> +		return;
> +
>  	if (cpu_has_dc_aliases || (exec && !cpu_has_ic_fills_f_dc))
>  		r4k_on_each_cpu(local_r4k_flush_cache_range, vma);
>  }
> @@ -627,6 +636,9 @@ static void r4k_flush_cache_page(struct vm_area_struct *vma,
>  {
>  	struct flush_cache_page_args args;
>  
> +	if (cpu_has_coherent_cache)
> +		return;
> +
>  	args.vma = vma;
>  	args.addr = addr;
>  	args.pfn = pfn;
> @@ -636,11 +648,17 @@ static void r4k_flush_cache_page(struct vm_area_struct *vma,
>  
>  static inline void local_r4k_flush_data_cache_page(void * addr)
>  {
> +	if (cpu_has_coherent_cache)
> +		return;
> +
>  	r4k_blast_dcache_page((unsigned long) addr);
>  }
>  
>  static void r4k_flush_data_cache_page(unsigned long addr)
>  {
> +	if (cpu_has_coherent_cache)
> +		return;
> +
>  	if (in_atomic())
>  		local_r4k_flush_data_cache_page((void *)addr);
>  	else
> @@ -825,6 +843,9 @@ static void local_r4k_flush_cache_sigtramp(void * arg)
>  
>  static void r4k_flush_cache_sigtramp(unsigned long addr)
>  {
> +	if (cpu_has_coherent_cache)
> +		return;
> +
>  	r4k_on_each_cpu(local_r4k_flush_cache_sigtramp, (void *) addr);
>  }
>  
> -- 
> 2.4.6
> 
> 
> 
> 
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT
@ 2016-01-26 14:19     ` James Hogan
  0 siblings, 0 replies; 21+ messages in thread
From: James Hogan @ 2016-01-26 14:19 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Ralf Baechle, Aurelien Jarno, Steven J. Hill, linux-mips,
	Fuxin Zhang, Zhangjin Wu

[-- Attachment #1: Type: text/plain, Size: 7987 bytes --]

On Tue, Jan 26, 2016 at 09:26:24PM +0800, Huacai Chen wrote:
> New Loongson 3 CPU (since Loongson-3A R2, as opposed to Loongson-3A R1,
> Loongson-3B R1 and Loongson-3B R2) has many enhancements, such as FTLB,
> L1-VCache, EI/DI/Wait/Prefetch instruction, DSP/DSPv2 ASE, User Local
> register, Read-Inhibit/Execute-Inhibit, SFB (Store Fill Buffer), Fast
> TLB refill support, etc.
> 
> This patch introduce a config option, CONFIG_LOONGSON3_ENHANCEMENT, to
> enable those enhancements which cannot be probed at run time. If you
> want a generic kernel to run on all Loongson 3 machines, please say 'N'
> here. If you want a high-performance kernel to run on new Loongson 3
> machines only, please say 'Y' here.
> 
> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> ---
>  arch/mips/Kconfig                                      | 18 ++++++++++++++++++
>  arch/mips/include/asm/hazards.h                        |  7 ++++---
>  arch/mips/include/asm/io.h                             | 10 +++++-----
>  arch/mips/include/asm/irqflags.h                       |  5 +++++
>  .../include/asm/mach-loongson64/kernel-entry-init.h    | 12 ++++++++++++
>  arch/mips/mm/c-r4k.c                                   |  3 +++
>  arch/mips/mm/page.c                                    |  9 +++++++++
>  7 files changed, 56 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 15faaf0..e6d6f7b 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -1349,6 +1349,24 @@ config CPU_LOONGSON3
>  		The Loongson 3 processor implements the MIPS64R2 instruction
>  		set with many extensions.
>  
> +config LOONGSON3_ENHANCEMENT
> +	bool "New Loongson 3 CPU Enhancements"
> +	default n

no need, n is the default.

> +	select CPU_MIPSR2
> +	select CPU_HAS_PREFETCH
> +	depends on CPU_LOONGSON3
> +	help
> +	  New Loongson 3 CPU (since Loongson-3A R2, as opposed to Loongson-3A
> +	  R1, Loongson-3B R1 and Loongson-3B R2) has many enhancements, such as
> +	  FTLB, L1-VCache, EI/DI/Wait/Prefetch instruction, DSP/DSPv2 ASE, User
> +	  Local register, Read-Inhibit/Execute-Inhibit, SFB (Store Fill Buffer),
> +	  Fast TLB refill support, etc.
> +
> +	  This option enable those enhancements which cannot be probed at run
> +	  time. If you want a generic kernel to run on all Loongson 3 machines,
> +	  please say 'N' here. If you want a high-performance kernel to run on
> +	  new Loongson 3 machines only, please say 'Y' here.
> +
>  config CPU_LOONGSON2E
>  	bool "Loongson 2E"
>  	depends on SYS_HAS_CPU_LOONGSON2E
> diff --git a/arch/mips/include/asm/hazards.h b/arch/mips/include/asm/hazards.h
> index 7b99efd..dbb1eb6 100644
> --- a/arch/mips/include/asm/hazards.h
> +++ b/arch/mips/include/asm/hazards.h
> @@ -22,7 +22,8 @@
>  /*
>   * TLB hazards
>   */
> -#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) && !defined(CONFIG_CPU_CAVIUM_OCTEON)
> +#if (defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)) && \
> +	!defined(CONFIG_CPU_CAVIUM_OCTEON) && !defined(CONFIG_LOONGSON3_ENHANCEMENT)
>  
>  /*
>   * MIPSR2 defines ehb for hazard avoidance
> @@ -155,8 +156,8 @@ do {									\
>  } while (0)
>  
>  #elif defined(CONFIG_MIPS_ALCHEMY) || defined(CONFIG_CPU_CAVIUM_OCTEON) || \
> -	defined(CONFIG_CPU_LOONGSON2) || defined(CONFIG_CPU_R10000) || \
> -	defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR)
> +	defined(CONFIG_CPU_LOONGSON2) || defined(CONFIG_LOONGSON3_ENHANCEMENT) || \
> +	defined(CONFIG_CPU_R10000) || defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR)
>  
>  /*
>   * R10000 rocks - all hazards handled in hardware, so this becomes a nobrainer.
> diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
> index 2b4dc7a..ecabc00 100644
> --- a/arch/mips/include/asm/io.h
> +++ b/arch/mips/include/asm/io.h
> @@ -304,10 +304,10 @@ static inline void iounmap(const volatile void __iomem *addr)
>  #undef __IS_KSEG1
>  }
>  
> -#ifdef CONFIG_CPU_CAVIUM_OCTEON
> -#define war_octeon_io_reorder_wmb()		wmb()
> +#if defined(CONFIG_CPU_CAVIUM_OCTEON) || defined(CONFIG_LOONGSON3_ENHANCEMENT)
> +#define war_io_reorder_wmb()		wmb()
>  #else
> -#define war_octeon_io_reorder_wmb()		do { } while (0)
> +#define war_io_reorder_wmb()		do { } while (0)
>  #endif

Doesn't this slow things down when enabled, or is it required due to
STFill buffer being enabled or something?

>  
>  #define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, irq)			\
> @@ -318,7 +318,7 @@ static inline void pfx##write##bwlq(type val,				\
>  	volatile type *__mem;						\
>  	type __val;							\
>  									\
> -	war_octeon_io_reorder_wmb();					\
> +	war_io_reorder_wmb();					\
>  									\
>  	__mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem));	\
>  									\
> @@ -387,7 +387,7 @@ static inline void pfx##out##bwlq##p(type val, unsigned long port)	\
>  	volatile type *__addr;						\
>  	type __val;							\
>  									\
> -	war_octeon_io_reorder_wmb();					\
> +	war_io_reorder_wmb();					\
>  									\
>  	__addr = (void *)__swizzle_addr_##bwlq(mips_io_port_base + port); \
>  									\
> diff --git a/arch/mips/include/asm/irqflags.h b/arch/mips/include/asm/irqflags.h
> index 65c351e..12f80b5 100644
> --- a/arch/mips/include/asm/irqflags.h
> +++ b/arch/mips/include/asm/irqflags.h
> @@ -41,7 +41,12 @@ static inline unsigned long arch_local_irq_save(void)
>  	"	.set	push						\n"
>  	"	.set	reorder						\n"
>  	"	.set	noat						\n"
> +#if defined(CONFIG_LOONGSON3_ENHANCEMENT)
> +	"	mfc0	%[flags], $12					\n"
> +	"	di							\n"

Does this somehow help performance, or is it necessary when STFill
buffer is enabled?

> +#else
>  	"	di	%[flags]					\n"
> +#endif
>  	"	andi	%[flags], 1					\n"
>  	"	" __stringify(__irq_disable_hazard) "			\n"
>  	"	.set	pop						\n"
> diff --git a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
> index da83482..8393bc54 100644
> --- a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
> +++ b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
> @@ -26,6 +26,12 @@
>  	mfc0	t0, $5, 1
>  	or	t0, (0x1 << 29)
>  	mtc0	t0, $5, 1
> +#ifdef CONFIG_LOONGSON3_ENHANCEMENT
> +	/* Enable STFill Buffer */
> +	mfc0	t0, $16, 6
> +	or	t0, 0x100
> +	mtc0	t0, $16, 6
> +#endif
>  	_ehb
>  	.set	pop
>  #endif
> @@ -46,6 +52,12 @@
>  	mfc0	t0, $5, 1
>  	or	t0, (0x1 << 29)
>  	mtc0	t0, $5, 1
> +#ifdef CONFIG_LOONGSON3_ENHANCEMENT
> +	/* Enable STFill Buffer */
> +	mfc0	t0, $16, 6
> +	or	t0, 0x100
> +	mtc0	t0, $16, 6
> +#endif

What does the STFill buffer do?

Given that you can get a portable kernel without this, can this not be
done from C code depending on the PRid?
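
Something along these lines, say (just a sketch -- the helper is
invented, and bit 8 of Config6 is simply the bit the patch sets):

  /* Sketch only: run-time enabling of the store-fill buffer from C. */
  static void loongson3_enable_sfb(void)
  {
  	unsigned int config6;

  	/* Loongson-3A R2 reports PRId 0x6308 (see the cover letter). */
  	if ((read_c0_prid() & 0xffff) != 0x6308)
  		return;

  	config6 = __read_32bit_c0_register($16, 6);
  	config6 |= 1 << 8;		/* STFill buffer enable */
  	__write_32bit_c0_register($16, 6, config6);
  	back_to_back_c0_hazard();
  }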

>  	_ehb
>  	.set	pop
>  #endif
> diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
> index 65fb28c..903d8da 100644
> --- a/arch/mips/mm/c-r4k.c
> +++ b/arch/mips/mm/c-r4k.c
> @@ -1170,6 +1170,9 @@ static void probe_pcache(void)
>  					  c->dcache.ways *
>  					  c->dcache.linesz;
>  		c->dcache.waybit = 0;
> +#ifdef CONFIG_CPU_HAS_PREFETCH
> +		c->options |= MIPS_CPU_PREFETCH;
> +#endif

Can't do that based on PRid?

Cheers
James

>  		break;
>  
>  	case CPU_CAVIUM_OCTEON3:
> diff --git a/arch/mips/mm/page.c b/arch/mips/mm/page.c
> index 885d73f..c41953c 100644
> --- a/arch/mips/mm/page.c
> +++ b/arch/mips/mm/page.c
> @@ -188,6 +188,15 @@ static void set_prefetch_parameters(void)
>  			}
>  			break;
>  
> +		case CPU_LOONGSON3:
> +			/* Loongson-3 only support the Pref_Load/Pref_Store. */
> +			pref_bias_clear_store = 128;
> +			pref_bias_copy_load = 128;
> +			pref_bias_copy_store = 128;
> +			pref_src_mode = Pref_Load;
> +			pref_dst_mode = Pref_Store;
> +			break;
> +
>  		default:
>  			pref_bias_clear_store = 128;
>  			pref_bias_copy_load = 256;
> -- 
> 2.4.6
> 
> 
> 
> 
> 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] MIPS: tlbex: Fix bugs in tlbchange handler
  2016-01-26 13:26 ` [PATCH 4/6] MIPS: tlbex: Fix bugs in tlbchange handler Huacai Chen
@ 2016-01-26 21:15   ` David Daney
  2016-01-27  4:53     ` Huacai Chen
  2016-01-27  5:50   ` Joshua Kinard
  1 sibling, 1 reply; 21+ messages in thread
From: David Daney @ 2016-01-26 21:15 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Ralf Baechle, Aurelien Jarno, Steven J. Hill, linux-mips,
	Fuxin Zhang, Zhangjin Wu

On 01/26/2016 05:26 AM, Huacai Chen wrote:
> If a tlb miss triggered when EXL=1,

How is that possible?  The exception handlers are not in mapped memory, 
and we clear EXL very early in the exception handlers.

In valid code, how are you getting TLB related exceptions when EXL=1?

> tlb refill exception is treated as
> tlb invalid exception, so tlbp may fails. In this situation, CP0_Index
> register doesn't contain a valid value. This may not be a problem for
> VTLB since it is fully-associative. However, FTLB is set-associative so
> not every tlb entry is valid for a specific address. Thus, we should
> use tlbwr instead of tlbwi when tlbp fails.
>
> There is a similar case for huge page, so build_huge_tlb_write_entry()
> is also modified. If wmode != tlb_random, that means the caller is tlb
> invalid handler, we should select tlbr/tlbi depend on the tlbp result.
>
> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> ---
>   arch/mips/mm/tlbex.c | 31 ++++++++++++++++++++++++++++++-
>   1 file changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
> index d0975cd..da68ffb 100644
> --- a/arch/mips/mm/tlbex.c
> +++ b/arch/mips/mm/tlbex.c
> @@ -173,7 +173,10 @@ enum label_id {
>   	label_large_segbits_fault,
>   #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
>   	label_tlb_huge_update,
> +	label_tail_huge_miss,
> +	label_tail_huge_done,
>   #endif
> +	label_tail_miss,
>   };
>
>   UASM_L_LA(_second_part)
> @@ -192,7 +195,10 @@ UASM_L_LA(_r3000_write_probe_fail)
>   UASM_L_LA(_large_segbits_fault)
>   #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
>   UASM_L_LA(_tlb_huge_update)
> +UASM_L_LA(_tail_huge_miss)
> +UASM_L_LA(_tail_huge_done)
>   #endif
> +UASM_L_LA(_tail_miss)
>
>   static int hazard_instance;
>
> @@ -706,8 +712,24 @@ static void build_huge_tlb_write_entry(u32 **p, struct uasm_label **l,
>   	uasm_i_ori(p, tmp, tmp, PM_HUGE_MASK & 0xffff);
>   	uasm_i_mtc0(p, tmp, C0_PAGEMASK);
>
> -	build_tlb_write_entry(p, l, r, wmode);
> +	if (wmode == tlb_random) { /* Caller is TLB Refill Handler */
> +		build_tlb_write_entry(p, l, r, wmode);
> +		build_restore_pagemask(p, r, tmp, label_leave, restore_scratch);
> +		return;
> +	}
> +
> +	/* Caller is TLB Load/Store/Modify Handler */
> +	uasm_i_mfc0(p, tmp, C0_INDEX);
> +	uasm_il_bltz(p, r, tmp, label_tail_huge_miss);
> +	uasm_i_nop(p);
> +	build_tlb_write_entry(p, l, r, tlb_indexed);
> +	uasm_il_b(p, r, label_tail_huge_done);
> +	uasm_i_nop(p);
> +
> +	uasm_l_tail_huge_miss(l, *p);
> +	build_tlb_write_entry(p, l, r, tlb_random);
>
> +	uasm_l_tail_huge_done(l, *p);
>   	build_restore_pagemask(p, r, tmp, label_leave, restore_scratch);
>   }
>
> @@ -2026,7 +2048,14 @@ build_r4000_tlbchange_handler_tail(u32 **p, struct uasm_label **l,
>   	uasm_i_ori(p, ptr, ptr, sizeof(pte_t));
>   	uasm_i_xori(p, ptr, ptr, sizeof(pte_t));
>   	build_update_entries(p, tmp, ptr);
> +	uasm_i_mfc0(p, ptr, C0_INDEX);
> +	uasm_il_bltz(p, r, ptr, label_tail_miss);
> +	uasm_i_nop(p);
>   	build_tlb_write_entry(p, l, r, tlb_indexed);
> +	uasm_il_b(p, r, label_leave);
> +	uasm_i_nop(p);
> +	uasm_l_tail_miss(l, *p);
> +	build_tlb_write_entry(p, l, r, tlb_random);
>   	uasm_l_leave(l, *p);
>   	build_restore_work_registers(p);
>   	uasm_i_eret(p); /* return from trap */
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] MIPS: tlbex: Fix bugs in tlbchange handler
  2016-01-26 21:15   ` David Daney
@ 2016-01-27  4:53     ` Huacai Chen
  0 siblings, 0 replies; 21+ messages in thread
From: Huacai Chen @ 2016-01-27  4:53 UTC (permalink / raw)
  To: David Daney
  Cc: Ralf Baechle, Aurelien Jarno, Steven J. Hill,
	Linux MIPS Mailing List, Fuxin Zhang, Zhangjin Wu

When an unaligned access is triggered, do_ade() will access the user
address with EXL=1, and that may trigger a TLB refill.
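
Roughly, in pseudo-C of the architected vectoring (vector and ebase here
are only illustrative; this is not kernel code):

  /*
   * With EXL already set, a TLB miss never reaches the refill handler;
   * it takes the general exception vector and is handled as a TLB
   * invalid exception, where tlbp may not find a matching entry.
   */
  if (!(read_c0_status() & ST0_EXL))
  	vector = ebase + 0x000;	/* TLB refill (0x080 for the XTLB refill) */
  else
  	vector = ebase + 0x180;	/* general vector -> TLB load/store handler */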

Huacai

On Wed, Jan 27, 2016 at 5:15 AM, David Daney <ddaney.cavm@gmail.com> wrote:
> On 01/26/2016 05:26 AM, Huacai Chen wrote:
>>
>> If a tlb miss triggered when EXL=1,
>
>
> How is that possible?  The exception handlers are not in mapped memory, and
> we clear EXL very early in the exception handlers.
>
> In valid code, how are you getting TLB related exceptions when EXL=1?
>
>
>> tlb refill exception is treated as
>> tlb invalid exception, so tlbp may fails. In this situation, CP0_Index
>> register doesn't contain a valid value. This may not be a problem for
>> VTLB since it is fully-associative. However, FTLB is set-associative so
>> not every tlb entry is valid for a specific address. Thus, we should
>> use tlbwr instead of tlbwi when tlbp fails.
>>
>> There is a similar case for huge page, so build_huge_tlb_write_entry()
>> is also modified. If wmode != tlb_random, that means the caller is tlb
>> invalid handler, we should select tlbr/tlbi depend on the tlbp result.
>>
>> Signed-off-by: Huacai Chen <chenhc@lemote.com>
>> ---
>>   arch/mips/mm/tlbex.c | 31 ++++++++++++++++++++++++++++++-
>>   1 file changed, 30 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
>> index d0975cd..da68ffb 100644
>> --- a/arch/mips/mm/tlbex.c
>> +++ b/arch/mips/mm/tlbex.c
>> @@ -173,7 +173,10 @@ enum label_id {
>>         label_large_segbits_fault,
>>   #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
>>         label_tlb_huge_update,
>> +       label_tail_huge_miss,
>> +       label_tail_huge_done,
>>   #endif
>> +       label_tail_miss,
>>   };
>>
>>   UASM_L_LA(_second_part)
>> @@ -192,7 +195,10 @@ UASM_L_LA(_r3000_write_probe_fail)
>>   UASM_L_LA(_large_segbits_fault)
>>   #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
>>   UASM_L_LA(_tlb_huge_update)
>> +UASM_L_LA(_tail_huge_miss)
>> +UASM_L_LA(_tail_huge_done)
>>   #endif
>> +UASM_L_LA(_tail_miss)
>>
>>   static int hazard_instance;
>>
>> @@ -706,8 +712,24 @@ static void build_huge_tlb_write_entry(u32 **p,
>> struct uasm_label **l,
>>         uasm_i_ori(p, tmp, tmp, PM_HUGE_MASK & 0xffff);
>>         uasm_i_mtc0(p, tmp, C0_PAGEMASK);
>>
>> -       build_tlb_write_entry(p, l, r, wmode);
>> +       if (wmode == tlb_random) { /* Caller is TLB Refill Handler */
>> +               build_tlb_write_entry(p, l, r, wmode);
>> +               build_restore_pagemask(p, r, tmp, label_leave,
>> restore_scratch);
>> +               return;
>> +       }
>> +
>> +       /* Caller is TLB Load/Store/Modify Handler */
>> +       uasm_i_mfc0(p, tmp, C0_INDEX);
>> +       uasm_il_bltz(p, r, tmp, label_tail_huge_miss);
>> +       uasm_i_nop(p);
>> +       build_tlb_write_entry(p, l, r, tlb_indexed);
>> +       uasm_il_b(p, r, label_tail_huge_done);
>> +       uasm_i_nop(p);
>> +
>> +       uasm_l_tail_huge_miss(l, *p);
>> +       build_tlb_write_entry(p, l, r, tlb_random);
>>
>> +       uasm_l_tail_huge_done(l, *p);
>>         build_restore_pagemask(p, r, tmp, label_leave, restore_scratch);
>>   }
>>
>> @@ -2026,7 +2048,14 @@ build_r4000_tlbchange_handler_tail(u32 **p, struct
>> uasm_label **l,
>>         uasm_i_ori(p, ptr, ptr, sizeof(pte_t));
>>         uasm_i_xori(p, ptr, ptr, sizeof(pte_t));
>>         build_update_entries(p, tmp, ptr);
>> +       uasm_i_mfc0(p, ptr, C0_INDEX);
>> +       uasm_il_bltz(p, r, ptr, label_tail_miss);
>> +       uasm_i_nop(p);
>>         build_tlb_write_entry(p, l, r, tlb_indexed);
>> +       uasm_il_b(p, r, label_leave);
>> +       uasm_i_nop(p);
>> +       uasm_l_tail_miss(l, *p);
>> +       build_tlb_write_entry(p, l, r, tlb_random);
>>         uasm_l_leave(l, *p);
>>         build_restore_work_registers(p);
>>         uasm_i_eret(p); /* return from trap */
>>
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] MIPS: Loongson: Introduce and use cpu_has_coherent_cache feature
  2016-01-26 13:42     ` James Hogan
  (?)
@ 2016-01-27  4:58     ` Huacai Chen
  2016-01-27 11:15       ` James Hogan
  -1 siblings, 1 reply; 21+ messages in thread
From: Huacai Chen @ 2016-01-27  4:58 UTC (permalink / raw)
  To: James Hogan
  Cc: Ralf Baechle, Aurelien Jarno, Steven J. Hill,
	Linux MIPS Mailing List, Fuxin Zhang, Zhangjin Wu

"cache coherency" here means the coherency across cores, not ic/dc
coherency, could you please suggest a suitable name?

Huacai

On Tue, Jan 26, 2016 at 9:42 PM, James Hogan <james.hogan@imgtec.com> wrote:
> Hi,
>
> On Tue, Jan 26, 2016 at 09:26:23PM +0800, Huacai Chen wrote:
>> Loongson-3 maintains cache coherency by hardware. So we introduce a cpu
>> feature named cpu_has_coherent_cache and use it to modify MIPS's cache
>> flushing functions.
>
> This is rather ambiguous (the phrase "cache coherency" can be associated
> with dcache coherency between cores). Are you saying that the icache is
> coherent with the dcache, such that writes to dcache are immediately
> visible to instruction fetches on all CPUs in the system without any
> icache flushing?
>
> If so, I think that needs clarifying, e.g. cpu_has_coherent_icache.
>
> Perhaps it should be an icache flag alongside MIPS_CACHE_IC_F_DC, which is
> already intended to avoid the dcache flushes, but not necessarily the
> icache flushes.
>
> Cheers
> James
>
>>
>> Signed-off-by: Huacai Chen <chenhc@lemote.com>
>> ---
>>  arch/mips/include/asm/cpu-features.h                |  3 +++
>>  arch/mips/include/asm/cpu.h                         |  1 +
>>  .../asm/mach-loongson64/cpu-feature-overrides.h     |  1 +
>>  arch/mips/mm/c-r4k.c                                | 21 +++++++++++++++++++++
>>  4 files changed, 26 insertions(+)
>>
>> diff --git a/arch/mips/include/asm/cpu-features.h b/arch/mips/include/asm/cpu-features.h
>> index e0ba50a..1ec3dea 100644
>> --- a/arch/mips/include/asm/cpu-features.h
>> +++ b/arch/mips/include/asm/cpu-features.h
>> @@ -148,6 +148,9 @@
>>  #ifndef cpu_has_xpa
>>  #define cpu_has_xpa          (cpu_data[0].options & MIPS_CPU_XPA)
>>  #endif
>> +#ifndef cpu_has_coherent_cache
>> +#define cpu_has_coherent_cache       (cpu_data[0].options & MIPS_CPU_CACHE_COHERENT)
>> +#endif
>>  #ifndef cpu_has_vtag_icache
>>  #define cpu_has_vtag_icache  (cpu_data[0].icache.flags & MIPS_CACHE_VTAG)
>>  #endif
>> diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
>> index 5f50551..28471f0 100644
>> --- a/arch/mips/include/asm/cpu.h
>> +++ b/arch/mips/include/asm/cpu.h
>> @@ -391,6 +391,7 @@ enum cpu_type_enum {
>>  #define MIPS_CPU_NAN_LEGACY  0x40000000000ull /* Legacy NaN implemented */
>>  #define MIPS_CPU_NAN_2008    0x80000000000ull /* 2008 NaN implemented */
>>  #define MIPS_CPU_LDPTE               0x100000000000ull /* CPU has ldpte/lddir instructions */
>> +#define MIPS_CPU_CACHE_COHERENT      0x200000000000ull /* CPU maintains cache coherency by hardware */
>>
>>  /*
>>   * CPU ASE encodings
>> diff --git a/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h b/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
>> index c3406db..647d952 100644
>> --- a/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
>> +++ b/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
>> @@ -46,6 +46,7 @@
>>  #define cpu_has_local_ebase  0
>>
>>  #define cpu_has_wsbh         IS_ENABLED(CONFIG_CPU_LOONGSON3)
>> +#define cpu_has_coherent_cache       IS_ENABLED(CONFIG_CPU_LOONGSON3)
>>  #define cpu_hwrena_impl_bits 0xc0000000
>>
>>  #endif /* __ASM_MACH_LOONGSON64_CPU_FEATURE_OVERRIDES_H */
>> diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
>> index 2abc73d..65fb28c 100644
>> --- a/arch/mips/mm/c-r4k.c
>> +++ b/arch/mips/mm/c-r4k.c
>> @@ -429,6 +429,9 @@ static void r4k_blast_scache_setup(void)
>>
>>  static inline void local_r4k___flush_cache_all(void * args)
>>  {
>> +     if (cpu_has_coherent_cache)
>> +             return;
>> +
>>       switch (current_cpu_type()) {
>>       case CPU_LOONGSON2:
>>       case CPU_LOONGSON3:
>> @@ -457,6 +460,9 @@ static inline void local_r4k___flush_cache_all(void * args)
>>
>>  static void r4k___flush_cache_all(void)
>>  {
>> +     if (cpu_has_coherent_cache)
>> +             return;
>> +
>>       r4k_on_each_cpu(local_r4k___flush_cache_all, NULL);
>>  }
>>
>> @@ -503,6 +509,9 @@ static void r4k_flush_cache_range(struct vm_area_struct *vma,
>>  {
>>       int exec = vma->vm_flags & VM_EXEC;
>>
>> +     if (cpu_has_coherent_cache)
>> +             return;
>> +
>>       if (cpu_has_dc_aliases || (exec && !cpu_has_ic_fills_f_dc))
>>               r4k_on_each_cpu(local_r4k_flush_cache_range, vma);
>>  }
>> @@ -627,6 +636,9 @@ static void r4k_flush_cache_page(struct vm_area_struct *vma,
>>  {
>>       struct flush_cache_page_args args;
>>
>> +     if (cpu_has_coherent_cache)
>> +             return;
>> +
>>       args.vma = vma;
>>       args.addr = addr;
>>       args.pfn = pfn;
>> @@ -636,11 +648,17 @@ static void r4k_flush_cache_page(struct vm_area_struct *vma,
>>
>>  static inline void local_r4k_flush_data_cache_page(void * addr)
>>  {
>> +     if (cpu_has_coherent_cache)
>> +             return;
>> +
>>       r4k_blast_dcache_page((unsigned long) addr);
>>  }
>>
>>  static void r4k_flush_data_cache_page(unsigned long addr)
>>  {
>> +     if (cpu_has_coherent_cache)
>> +             return;
>> +
>>       if (in_atomic())
>>               local_r4k_flush_data_cache_page((void *)addr);
>>       else
>> @@ -825,6 +843,9 @@ static void local_r4k_flush_cache_sigtramp(void * arg)
>>
>>  static void r4k_flush_cache_sigtramp(unsigned long addr)
>>  {
>> +     if (cpu_has_coherent_cache)
>> +             return;
>> +
>>       r4k_on_each_cpu(local_r4k_flush_cache_sigtramp, (void *) addr);
>>  }
>>
>> --
>> 2.4.6
>>
>>
>>
>>
>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT
  2016-01-26 14:19     ` James Hogan
  (?)
@ 2016-01-27  5:02     ` Huacai Chen
  2016-01-27 11:18       ` James Hogan
  -1 siblings, 1 reply; 21+ messages in thread
From: Huacai Chen @ 2016-01-27  5:02 UTC (permalink / raw)
  To: James Hogan
  Cc: Ralf Baechle, Aurelien Jarno, Steven J. Hill,
	Linux MIPS Mailing List, Fuxin Zhang, Zhangjin Wu

The STFill buffer sits between the core and the L1 cache and lets memory
accesses complete out of order, so writel/outl need a barrier. Loongson 3
also has a bug where di cannot save the IRQ flags, so we need an mfc0
first.
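
As an example of the kind of driver sequence the extra barrier protects
(the descriptor layout and register names below are made up):

  /* ring_desc, DESC_HW_OWNED and DOORBELL_REG are invented for illustration. */
  static void post_descriptor(struct ring_desc *desc, dma_addr_t dma_addr,
  			      u32 len, void __iomem *ioaddr)
  {
  	desc->addr  = dma_addr;		/* normal cached stores */
  	desc->len   = len;
  	desc->flags = DESC_HW_OWNED;

  	/*
  	 * The uncached doorbell store must not become visible before the
  	 * descriptor stores above; with the SFB enabled, that ordering is
  	 * exactly what the wmb() in war_io_reorder_wmb() provides.
  	 */
  	writel(1, ioaddr + DOORBELL_REG);
  }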

On Tue, Jan 26, 2016 at 10:19 PM, James Hogan <james.hogan@imgtec.com> wrote:
> On Tue, Jan 26, 2016 at 09:26:24PM +0800, Huacai Chen wrote:
>> New Loongson 3 CPU (since Loongson-3A R2, as opposed to Loongson-3A R1,
>> Loongson-3B R1 and Loongson-3B R2) has many enhancements, such as FTLB,
>> L1-VCache, EI/DI/Wait/Prefetch instruction, DSP/DSPv2 ASE, User Local
>> register, Read-Inhibit/Execute-Inhibit, SFB (Store Fill Buffer), Fast
>> TLB refill support, etc.
>>
>> This patch introduce a config option, CONFIG_LOONGSON3_ENHANCEMENT, to
>> enable those enhancements which cannot be probed at run time. If you
>> want a generic kernel to run on all Loongson 3 machines, please say 'N'
>> here. If you want a high-performance kernel to run on new Loongson 3
>> machines only, please say 'Y' here.
>>
>> Signed-off-by: Huacai Chen <chenhc@lemote.com>
>> ---
>>  arch/mips/Kconfig                                      | 18 ++++++++++++++++++
>>  arch/mips/include/asm/hazards.h                        |  7 ++++---
>>  arch/mips/include/asm/io.h                             | 10 +++++-----
>>  arch/mips/include/asm/irqflags.h                       |  5 +++++
>>  .../include/asm/mach-loongson64/kernel-entry-init.h    | 12 ++++++++++++
>>  arch/mips/mm/c-r4k.c                                   |  3 +++
>>  arch/mips/mm/page.c                                    |  9 +++++++++
>>  7 files changed, 56 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
>> index 15faaf0..e6d6f7b 100644
>> --- a/arch/mips/Kconfig
>> +++ b/arch/mips/Kconfig
>> @@ -1349,6 +1349,24 @@ config CPU_LOONGSON3
>>               The Loongson 3 processor implements the MIPS64R2 instruction
>>               set with many extensions.
>>
>> +config LOONGSON3_ENHANCEMENT
>> +     bool "New Loongson 3 CPU Enhancements"
>> +     default n
>
> no need, n is the default.
>
>> +     select CPU_MIPSR2
>> +     select CPU_HAS_PREFETCH
>> +     depends on CPU_LOONGSON3
>> +     help
>> +       New Loongson 3 CPU (since Loongson-3A R2, as opposed to Loongson-3A
>> +       R1, Loongson-3B R1 and Loongson-3B R2) has many enhancements, such as
>> +       FTLB, L1-VCache, EI/DI/Wait/Prefetch instruction, DSP/DSPv2 ASE, User
>> +       Local register, Read-Inhibit/Execute-Inhibit, SFB (Store Fill Buffer),
>> +       Fast TLB refill support, etc.
>> +
>> +       This option enable those enhancements which cannot be probed at run
>> +       time. If you want a generic kernel to run on all Loongson 3 machines,
>> +       please say 'N' here. If you want a high-performance kernel to run on
>> +       new Loongson 3 machines only, please say 'Y' here.
>> +
>>  config CPU_LOONGSON2E
>>       bool "Loongson 2E"
>>       depends on SYS_HAS_CPU_LOONGSON2E
>> diff --git a/arch/mips/include/asm/hazards.h b/arch/mips/include/asm/hazards.h
>> index 7b99efd..dbb1eb6 100644
>> --- a/arch/mips/include/asm/hazards.h
>> +++ b/arch/mips/include/asm/hazards.h
>> @@ -22,7 +22,8 @@
>>  /*
>>   * TLB hazards
>>   */
>> -#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) && !defined(CONFIG_CPU_CAVIUM_OCTEON)
>> +#if (defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)) && \
>> +     !defined(CONFIG_CPU_CAVIUM_OCTEON) && !defined(CONFIG_LOONGSON3_ENHANCEMENT)
>>
>>  /*
>>   * MIPSR2 defines ehb for hazard avoidance
>> @@ -155,8 +156,8 @@ do {                                                                      \
>>  } while (0)
>>
>>  #elif defined(CONFIG_MIPS_ALCHEMY) || defined(CONFIG_CPU_CAVIUM_OCTEON) || \
>> -     defined(CONFIG_CPU_LOONGSON2) || defined(CONFIG_CPU_R10000) || \
>> -     defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR)
>> +     defined(CONFIG_CPU_LOONGSON2) || defined(CONFIG_LOONGSON3_ENHANCEMENT) || \
>> +     defined(CONFIG_CPU_R10000) || defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR)
>>
>>  /*
>>   * R10000 rocks - all hazards handled in hardware, so this becomes a nobrainer.
>> diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
>> index 2b4dc7a..ecabc00 100644
>> --- a/arch/mips/include/asm/io.h
>> +++ b/arch/mips/include/asm/io.h
>> @@ -304,10 +304,10 @@ static inline void iounmap(const volatile void __iomem *addr)
>>  #undef __IS_KSEG1
>>  }
>>
>> -#ifdef CONFIG_CPU_CAVIUM_OCTEON
>> -#define war_octeon_io_reorder_wmb()          wmb()
>> +#if defined(CONFIG_CPU_CAVIUM_OCTEON) || defined(CONFIG_LOONGSON3_ENHANCEMENT)
>> +#define war_io_reorder_wmb()         wmb()
>>  #else
>> -#define war_octeon_io_reorder_wmb()          do { } while (0)
>> +#define war_io_reorder_wmb()         do { } while (0)
>>  #endif
>
> Doesn't this slow things down when enabled, or is it required due to
> STFill buffer being enabled or something?
>
>>
>>  #define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, irq)                  \
>> @@ -318,7 +318,7 @@ static inline void pfx##write##bwlq(type val,                             \
>>       volatile type *__mem;                                           \
>>       type __val;                                                     \
>>                                                                       \
>> -     war_octeon_io_reorder_wmb();                                    \
>> +     war_io_reorder_wmb();                                   \
>>                                                                       \
>>       __mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem));    \
>>                                                                       \
>> @@ -387,7 +387,7 @@ static inline void pfx##out##bwlq##p(type val, unsigned long port)        \
>>       volatile type *__addr;                                          \
>>       type __val;                                                     \
>>                                                                       \
>> -     war_octeon_io_reorder_wmb();                                    \
>> +     war_io_reorder_wmb();                                   \
>>                                                                       \
>>       __addr = (void *)__swizzle_addr_##bwlq(mips_io_port_base + port); \
>>                                                                       \
>> diff --git a/arch/mips/include/asm/irqflags.h b/arch/mips/include/asm/irqflags.h
>> index 65c351e..12f80b5 100644
>> --- a/arch/mips/include/asm/irqflags.h
>> +++ b/arch/mips/include/asm/irqflags.h
>> @@ -41,7 +41,12 @@ static inline unsigned long arch_local_irq_save(void)
>>       "       .set    push                                            \n"
>>       "       .set    reorder                                         \n"
>>       "       .set    noat                                            \n"
>> +#if defined(CONFIG_LOONGSON3_ENHANCEMENT)
>> +     "       mfc0    %[flags], $12                                   \n"
>> +     "       di                                                      \n"
>
> Does this somehow help performance, or is it necessary when STFill
> buffer is enabled?
>
>> +#else
>>       "       di      %[flags]                                        \n"
>> +#endif
>>       "       andi    %[flags], 1                                     \n"
>>       "       " __stringify(__irq_disable_hazard) "                   \n"
>>       "       .set    pop                                             \n"
>> diff --git a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
>> index da83482..8393bc54 100644
>> --- a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
>> +++ b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
>> @@ -26,6 +26,12 @@
>>       mfc0    t0, $5, 1
>>       or      t0, (0x1 << 29)
>>       mtc0    t0, $5, 1
>> +#ifdef CONFIG_LOONGSON3_ENHANCEMENT
>> +     /* Enable STFill Buffer */
>> +     mfc0    t0, $16, 6
>> +     or      t0, 0x100
>> +     mtc0    t0, $16, 6
>> +#endif
>>       _ehb
>>       .set    pop
>>  #endif
>> @@ -46,6 +52,12 @@
>>       mfc0    t0, $5, 1
>>       or      t0, (0x1 << 29)
>>       mtc0    t0, $5, 1
>> +#ifdef CONFIG_LOONGSON3_ENHANCEMENT
>> +     /* Enable STFill Buffer */
>> +     mfc0    t0, $16, 6
>> +     or      t0, 0x100
>> +     mtc0    t0, $16, 6
>> +#endif
>
> What does the STFill buffer do?
>
> Given that you can get a portable kernel without this, can this not be
> done from C code depending on the PRid?
>
>>       _ehb
>>       .set    pop
>>  #endif
>> diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
>> index 65fb28c..903d8da 100644
>> --- a/arch/mips/mm/c-r4k.c
>> +++ b/arch/mips/mm/c-r4k.c
>> @@ -1170,6 +1170,9 @@ static void probe_pcache(void)
>>                                         c->dcache.ways *
>>                                         c->dcache.linesz;
>>               c->dcache.waybit = 0;
>> +#ifdef CONFIG_CPU_HAS_PREFETCH
>> +             c->options |= MIPS_CPU_PREFETCH;
>> +#endif
>
> Can't do that based on PRid?
>
> Cheers
> James
>
>>               break;
>>
>>       case CPU_CAVIUM_OCTEON3:
>> diff --git a/arch/mips/mm/page.c b/arch/mips/mm/page.c
>> index 885d73f..c41953c 100644
>> --- a/arch/mips/mm/page.c
>> +++ b/arch/mips/mm/page.c
>> @@ -188,6 +188,15 @@ static void set_prefetch_parameters(void)
>>                       }
>>                       break;
>>
>> +             case CPU_LOONGSON3:
>> +                     /* Loongson-3 only support the Pref_Load/Pref_Store. */
>> +                     pref_bias_clear_store = 128;
>> +                     pref_bias_copy_load = 128;
>> +                     pref_bias_copy_store = 128;
>> +                     pref_src_mode = Pref_Load;
>> +                     pref_dst_mode = Pref_Store;
>> +                     break;
>> +
>>               default:
>>                       pref_bias_clear_store = 128;
>>                       pref_bias_copy_load = 256;
>> --
>> 2.4.6
>>
>>
>>
>>
>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] MIPS: tlbex: Fix bugs in tlbchange handler
  2016-01-26 13:26 ` [PATCH 4/6] MIPS: tlbex: Fix bugs in tlbchange handler Huacai Chen
  2016-01-26 21:15   ` David Daney
@ 2016-01-27  5:50   ` Joshua Kinard
  2016-01-27 10:22     ` Joshua Kinard
  1 sibling, 1 reply; 21+ messages in thread
From: Joshua Kinard @ 2016-01-27  5:50 UTC (permalink / raw)
  To: Huacai Chen, Ralf Baechle
  Cc: Aurelien Jarno, Steven J. Hill, linux-mips, Fuxin Zhang, Zhangjin Wu

On 01/26/2016 08:26, Huacai Chen wrote:
> If a tlb miss triggered when EXL=1, tlb refill exception is treated as
> tlb invalid exception, so tlbp may fails. In this situation, CP0_Index
> register doesn't contain a valid value. This may not be a problem for
> VTLB since it is fully-associative. However, FTLB is set-associative so
> not every tlb entry is valid for a specific address. Thus, we should
> use tlbwr instead of tlbwi when tlbp fails.
> 
> There is a similar case for huge page, so build_huge_tlb_write_entry()
> is also modified. If wmode != tlb_random, that means the caller is tlb
> invalid handler, we should select tlbr/tlbi depend on the tlbp result.

This patch triggered instruction bus errors on my Octane (R14000) once it
switched to userland, using PAGE_SIZE_64KB, MAX_ORDER=14, and THP.  Without the
patch, that same configuration boots fine, so I suspect this change isn't
appropriate for all systems.

--J


> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> ---
>  arch/mips/mm/tlbex.c | 31 ++++++++++++++++++++++++++++++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
> index d0975cd..da68ffb 100644
> --- a/arch/mips/mm/tlbex.c
> +++ b/arch/mips/mm/tlbex.c
> @@ -173,7 +173,10 @@ enum label_id {
>  	label_large_segbits_fault,
>  #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
>  	label_tlb_huge_update,
> +	label_tail_huge_miss,
> +	label_tail_huge_done,
>  #endif
> +	label_tail_miss,
>  };
>  
>  UASM_L_LA(_second_part)
> @@ -192,7 +195,10 @@ UASM_L_LA(_r3000_write_probe_fail)
>  UASM_L_LA(_large_segbits_fault)
>  #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
>  UASM_L_LA(_tlb_huge_update)
> +UASM_L_LA(_tail_huge_miss)
> +UASM_L_LA(_tail_huge_done)
>  #endif
> +UASM_L_LA(_tail_miss)
>  
>  static int hazard_instance;
>  
> @@ -706,8 +712,24 @@ static void build_huge_tlb_write_entry(u32 **p, struct uasm_label **l,
>  	uasm_i_ori(p, tmp, tmp, PM_HUGE_MASK & 0xffff);
>  	uasm_i_mtc0(p, tmp, C0_PAGEMASK);
>  
> -	build_tlb_write_entry(p, l, r, wmode);
> +	if (wmode == tlb_random) { /* Caller is TLB Refill Handler */
> +		build_tlb_write_entry(p, l, r, wmode);
> +		build_restore_pagemask(p, r, tmp, label_leave, restore_scratch);
> +		return;
> +	}
> +
> +	/* Caller is TLB Load/Store/Modify Handler */
> +	uasm_i_mfc0(p, tmp, C0_INDEX);
> +	uasm_il_bltz(p, r, tmp, label_tail_huge_miss);
> +	uasm_i_nop(p);
> +	build_tlb_write_entry(p, l, r, tlb_indexed);
> +	uasm_il_b(p, r, label_tail_huge_done);
> +	uasm_i_nop(p);
> +
> +	uasm_l_tail_huge_miss(l, *p);
> +	build_tlb_write_entry(p, l, r, tlb_random);
>  
> +	uasm_l_tail_huge_done(l, *p);
>  	build_restore_pagemask(p, r, tmp, label_leave, restore_scratch);
>  }
>  
> @@ -2026,7 +2048,14 @@ build_r4000_tlbchange_handler_tail(u32 **p, struct uasm_label **l,
>  	uasm_i_ori(p, ptr, ptr, sizeof(pte_t));
>  	uasm_i_xori(p, ptr, ptr, sizeof(pte_t));
>  	build_update_entries(p, tmp, ptr);
> +	uasm_i_mfc0(p, ptr, C0_INDEX);
> +	uasm_il_bltz(p, r, ptr, label_tail_miss);
> +	uasm_i_nop(p);
>  	build_tlb_write_entry(p, l, r, tlb_indexed);
> +	uasm_il_b(p, r, label_leave);
> +	uasm_i_nop(p);
> +	uasm_l_tail_miss(l, *p);
> +	build_tlb_write_entry(p, l, r, tlb_random);
>  	uasm_l_leave(l, *p);
>  	build_restore_work_registers(p);
>  	uasm_i_eret(p); /* return from trap */
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] MIPS: tlbex: Fix bugs in tlbchange handler
  2016-01-27  5:50   ` Joshua Kinard
@ 2016-01-27 10:22     ` Joshua Kinard
  0 siblings, 0 replies; 21+ messages in thread
From: Joshua Kinard @ 2016-01-27 10:22 UTC (permalink / raw)
  To: Huacai Chen, Ralf Baechle
  Cc: Aurelien Jarno, Steven J. Hill, linux-mips, Fuxin Zhang, Zhangjin Wu

On 01/27/2016 00:50, Joshua Kinard wrote:
> On 01/26/2016 08:26, Huacai Chen wrote:
>> If a tlb miss is triggered when EXL=1, tlb refill exception is treated as
>> tlb invalid exception, so tlbp may fail. In this situation, CP0_Index
>> register doesn't contain a valid value. This may not be a problem for
>> VTLB since it is fully-associative. However, FTLB is set-associative so
>> not every tlb entry is valid for a specific address. Thus, we should
>> use tlbwr instead of tlbwi when tlbp fails.
>>
>> There is a similar case for huge pages, so build_huge_tlb_write_entry()
>> is also modified. If wmode != tlb_random, that means the caller is the tlb
>> invalid handler, so we should select tlbwr/tlbwi depending on the tlbp result.
> 
> This patch triggered instruction bus errors on my Octane (R14000) once it
> switched to userland, using PAGE_SIZE_64KB, MAX_ORDER=14, and THP.  Without the
> patch, that same configuration boots fine, so I suspect this change isn't
> appropriate for all systems.
> 
> --J

Scratch that, it was the heisenbugs again.  Transparent Hugepages are
fundamentally broken on R10K CPUs; I'm fairly certain of that.  You have a
random chance of rebooting and getting a completely stable userland bringup,
and on a subsequent reboot, crashing and burning with instruction bus errors.
It was simple chance that I rebooted into a kernel with this patch applied and
triggered the IBEs.

So I retested this patch after killing THP in my kernel with fire, and am not
seeing any ill effects thus far.

--J

Tested-by: Joshua Kinard <kumba@gentoo.org>


>> Signed-off-by: Huacai Chen <chenhc@lemote.com>
>> ---
>>  arch/mips/mm/tlbex.c | 31 ++++++++++++++++++++++++++++++-
>>  1 file changed, 30 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
>> index d0975cd..da68ffb 100644
>> --- a/arch/mips/mm/tlbex.c
>> +++ b/arch/mips/mm/tlbex.c
>> @@ -173,7 +173,10 @@ enum label_id {
>>  	label_large_segbits_fault,
>>  #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
>>  	label_tlb_huge_update,
>> +	label_tail_huge_miss,
>> +	label_tail_huge_done,
>>  #endif
>> +	label_tail_miss,
>>  };
>>  
>>  UASM_L_LA(_second_part)
>> @@ -192,7 +195,10 @@ UASM_L_LA(_r3000_write_probe_fail)
>>  UASM_L_LA(_large_segbits_fault)
>>  #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
>>  UASM_L_LA(_tlb_huge_update)
>> +UASM_L_LA(_tail_huge_miss)
>> +UASM_L_LA(_tail_huge_done)
>>  #endif
>> +UASM_L_LA(_tail_miss)
>>  
>>  static int hazard_instance;
>>  
>> @@ -706,8 +712,24 @@ static void build_huge_tlb_write_entry(u32 **p, struct uasm_label **l,
>>  	uasm_i_ori(p, tmp, tmp, PM_HUGE_MASK & 0xffff);
>>  	uasm_i_mtc0(p, tmp, C0_PAGEMASK);
>>  
>> -	build_tlb_write_entry(p, l, r, wmode);
>> +	if (wmode == tlb_random) { /* Caller is TLB Refill Handler */
>> +		build_tlb_write_entry(p, l, r, wmode);
>> +		build_restore_pagemask(p, r, tmp, label_leave, restore_scratch);
>> +		return;
>> +	}
>> +
>> +	/* Caller is TLB Load/Store/Modify Handler */
>> +	uasm_i_mfc0(p, tmp, C0_INDEX);
>> +	uasm_il_bltz(p, r, tmp, label_tail_huge_miss);
>> +	uasm_i_nop(p);
>> +	build_tlb_write_entry(p, l, r, tlb_indexed);
>> +	uasm_il_b(p, r, label_tail_huge_done);
>> +	uasm_i_nop(p);
>> +
>> +	uasm_l_tail_huge_miss(l, *p);
>> +	build_tlb_write_entry(p, l, r, tlb_random);
>>  
>> +	uasm_l_tail_huge_done(l, *p);
>>  	build_restore_pagemask(p, r, tmp, label_leave, restore_scratch);
>>  }
>>  
>> @@ -2026,7 +2048,14 @@ build_r4000_tlbchange_handler_tail(u32 **p, struct uasm_label **l,
>>  	uasm_i_ori(p, ptr, ptr, sizeof(pte_t));
>>  	uasm_i_xori(p, ptr, ptr, sizeof(pte_t));
>>  	build_update_entries(p, tmp, ptr);
>> +	uasm_i_mfc0(p, ptr, C0_INDEX);
>> +	uasm_il_bltz(p, r, ptr, label_tail_miss);
>> +	uasm_i_nop(p);
>>  	build_tlb_write_entry(p, l, r, tlb_indexed);
>> +	uasm_il_b(p, r, label_leave);
>> +	uasm_i_nop(p);
>> +	uasm_l_tail_miss(l, *p);
>> +	build_tlb_write_entry(p, l, r, tlb_random);
>>  	uasm_l_leave(l, *p);
>>  	build_restore_work_registers(p);
>>  	uasm_i_eret(p); /* return from trap */
>>
> 
> 
> 


-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] MIPS: Loongson: Introduce and use cpu_has_coherent_cache feature
  2016-01-27  4:58     ` Huacai Chen
@ 2016-01-27 11:15       ` James Hogan
  2016-01-28  7:38         ` Huacai Chen
  0 siblings, 1 reply; 21+ messages in thread
From: James Hogan @ 2016-01-27 11:15 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Ralf Baechle, Aurelien Jarno, Steven J. Hill,
	Linux MIPS Mailing List, Fuxin Zhang, Zhangjin Wu

[-- Attachment #1: Type: text/plain, Size: 2540 bytes --]

Hi Huacai,

On Wed, Jan 27, 2016 at 12:58:42PM +0800, Huacai Chen wrote:
> "cache coherency" here means the coherency across cores, not ic/dc
> coherency, could you please suggest a suitable name?

Data cache coherency across cores is pretty much a requirement for SMP.
It looks more like for various reasons you can skip the cache management
functions, e.g.

> >> @@ -503,6 +509,9 @@ static void r4k_flush_cache_range(struct vm_area_struct *vma,
> >>  {
> >>       int exec = vma->vm_flags & VM_EXEC;
> >>
> >> +     if (cpu_has_coherent_cache)
> >> +             return;

This seems to suggest:
1) Your dcaches don't alias.
2) Your icache is coherent with your dcache, otherwise you would need to
   flush the icache here so that mprotect RW->RX makes code executable
   without risk of stale lines existing in icache.

So, is that the case?

If so, it would seem better to ensure that cpu_has_dc_aliases evaluates
to false, and add a similar one for icache coherency, hence my original
suggestion.
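
I.e., roughly something like this in the Loongson cpu-feature-overrides.h
(only a sketch; the icache feature name below is made up, whatever it ends up
being called):

        #define cpu_has_dc_aliases      0       /* no dcache aliases on Loongson-3 */
        #define cpu_has_ic_coherent_dc  1       /* hypothetical new feature test */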


> >> +
> >>       if (cpu_has_dc_aliases || (exec && !cpu_has_ic_fills_f_dc))
> >>               r4k_on_each_cpu(local_r4k_flush_cache_range, vma);

(Note, this cpu_has_ic_fills_f_dc check is wrong, it shouldn't prevent
icache flush, see http://patchwork.linux-mips.org/patch/12179/)

Cheers
James

> >>  }
> >> @@ -627,6 +636,9 @@ static void r4k_flush_cache_page(struct vm_area_struct *vma,
> >>  {
> >>       struct flush_cache_page_args args;
> >>
> >> +     if (cpu_has_coherent_cache)
> >> +             return;
> >> +
> >>       args.vma = vma;
> >>       args.addr = addr;
> >>       args.pfn = pfn;
> >> @@ -636,11 +648,17 @@ static void r4k_flush_cache_page(struct vm_area_struct *vma,
> >>
> >>  static inline void local_r4k_flush_data_cache_page(void * addr)
> >>  {
> >> +     if (cpu_has_coherent_cache)
> >> +             return;
> >> +
> >>       r4k_blast_dcache_page((unsigned long) addr);
> >>  }
> >>
> >>  static void r4k_flush_data_cache_page(unsigned long addr)
> >>  {
> >> +     if (cpu_has_coherent_cache)
> >> +             return;
> >> +
> >>       if (in_atomic())
> >>               local_r4k_flush_data_cache_page((void *)addr);
> >>       else
> >> @@ -825,6 +843,9 @@ static void local_r4k_flush_cache_sigtramp(void * arg)
> >>
> >>  static void r4k_flush_cache_sigtramp(unsigned long addr)
> >>  {
> >> +     if (cpu_has_coherent_cache)
> >> +             return;
> >> +
> >>       r4k_on_each_cpu(local_r4k_flush_cache_sigtramp, (void *) addr);
> >>  }
> >>
> >> --
> >> 2.4.6
> >>
> >>
> >>
> >>
> >>

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT
  2016-01-27  5:02     ` Huacai Chen
@ 2016-01-27 11:18       ` James Hogan
  2016-01-28  7:48         ` Huacai Chen
  0 siblings, 1 reply; 21+ messages in thread
From: James Hogan @ 2016-01-27 11:18 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Ralf Baechle, Aurelien Jarno, Steven J. Hill,
	Linux MIPS Mailing List, Fuxin Zhang, Zhangjin Wu

[-- Attachment #1: Type: text/plain, Size: 10963 bytes --]

On Wed, Jan 27, 2016 at 01:02:38PM +0800, Huacai Chen wrote:
> The STFill buffer sits between the core and the L1 cache and can reorder
> memory accesses, so writel/outl need a barrier. Loongson 3 has a bug
> where di cannot save the irq flags, so we need an mfc0.
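
Restating the first point as I understand it, as a sketch with made-up names:
with the store-fill buffer enabled, an ordinary cached store may still be
sitting in the buffer when a later uncached write leaves the core, so the
accessor has to begin with a wmb():

        desc->len = len;                /* ordinary cached store */
        wmb();                          /* order it before the MMIO write below */
        writel(START_DMA, dev->ctrl);   /* uncached store the device acts on */

Otherwise the device can see the doorbell before the descriptor update.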

Shouldn't it use that even without CONFIG_LOONGSON3_ENHANCEMENT then, so
as not to break the "generic kernel to run on all Loongson 3 machines"?

Cheers
James

> 
> On Tue, Jan 26, 2016 at 10:19 PM, James Hogan <james.hogan@imgtec.com> wrote:
> > On Tue, Jan 26, 2016 at 09:26:24PM +0800, Huacai Chen wrote:
> >> New Loongson 3 CPU (since Loongson-3A R2, as opposed to Loongson-3A R1,
> >> Loongson-3B R1 and Loongson-3B R2) has many enhancements, such as FTLB,
> >> L1-VCache, EI/DI/Wait/Prefetch instruction, DSP/DSPv2 ASE, User Local
> >> register, Read-Inhibit/Execute-Inhibit, SFB (Store Fill Buffer), Fast
> >> TLB refill support, etc.
> >>
> >> This patch introduces a config option, CONFIG_LOONGSON3_ENHANCEMENT, to
> >> enable those enhancements which cannot be probed at run time. If you
> >> want a generic kernel to run on all Loongson 3 machines, please say 'N'
> >> here. If you want a high-performance kernel to run on new Loongson 3
> >> machines only, please say 'Y' here.
> >>
> >> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> >> ---
> >>  arch/mips/Kconfig                                      | 18 ++++++++++++++++++
> >>  arch/mips/include/asm/hazards.h                        |  7 ++++---
> >>  arch/mips/include/asm/io.h                             | 10 +++++-----
> >>  arch/mips/include/asm/irqflags.h                       |  5 +++++
> >>  .../include/asm/mach-loongson64/kernel-entry-init.h    | 12 ++++++++++++
> >>  arch/mips/mm/c-r4k.c                                   |  3 +++
> >>  arch/mips/mm/page.c                                    |  9 +++++++++
> >>  7 files changed, 56 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> >> index 15faaf0..e6d6f7b 100644
> >> --- a/arch/mips/Kconfig
> >> +++ b/arch/mips/Kconfig
> >> @@ -1349,6 +1349,24 @@ config CPU_LOONGSON3
> >>               The Loongson 3 processor implements the MIPS64R2 instruction
> >>               set with many extensions.
> >>
> >> +config LOONGSON3_ENHANCEMENT
> >> +     bool "New Loongson 3 CPU Enhancements"
> >> +     default n
> >
> > no need, n is the default.
> >
> >> +     select CPU_MIPSR2
> >> +     select CPU_HAS_PREFETCH
> >> +     depends on CPU_LOONGSON3
> >> +     help
> >> +       New Loongson 3 CPU (since Loongson-3A R2, as opposed to Loongson-3A
> >> +       R1, Loongson-3B R1 and Loongson-3B R2) has many enhancements, such as
> >> +       FTLB, L1-VCache, EI/DI/Wait/Prefetch instruction, DSP/DSPv2 ASE, User
> >> +       Local register, Read-Inhibit/Execute-Inhibit, SFB (Store Fill Buffer),
> >> +       Fast TLB refill support, etc.
> >> +
> >> +       This option enable those enhancements which cannot be probed at run
> >> +       time. If you want a generic kernel to run on all Loongson 3 machines,
> >> +       please say 'N' here. If you want a high-performance kernel to run on
> >> +       new Loongson 3 machines only, please say 'Y' here.
> >> +
> >>  config CPU_LOONGSON2E
> >>       bool "Loongson 2E"
> >>       depends on SYS_HAS_CPU_LOONGSON2E
> >> diff --git a/arch/mips/include/asm/hazards.h b/arch/mips/include/asm/hazards.h
> >> index 7b99efd..dbb1eb6 100644
> >> --- a/arch/mips/include/asm/hazards.h
> >> +++ b/arch/mips/include/asm/hazards.h
> >> @@ -22,7 +22,8 @@
> >>  /*
> >>   * TLB hazards
> >>   */
> >> -#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) && !defined(CONFIG_CPU_CAVIUM_OCTEON)
> >> +#if (defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)) && \
> >> +     !defined(CONFIG_CPU_CAVIUM_OCTEON) && !defined(CONFIG_LOONGSON3_ENHANCEMENT)
> >>
> >>  /*
> >>   * MIPSR2 defines ehb for hazard avoidance
> >> @@ -155,8 +156,8 @@ do {                                                                      \
> >>  } while (0)
> >>
> >>  #elif defined(CONFIG_MIPS_ALCHEMY) || defined(CONFIG_CPU_CAVIUM_OCTEON) || \
> >> -     defined(CONFIG_CPU_LOONGSON2) || defined(CONFIG_CPU_R10000) || \
> >> -     defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR)
> >> +     defined(CONFIG_CPU_LOONGSON2) || defined(CONFIG_LOONGSON3_ENHANCEMENT) || \
> >> +     defined(CONFIG_CPU_R10000) || defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR)
> >>
> >>  /*
> >>   * R10000 rocks - all hazards handled in hardware, so this becomes a nobrainer.
> >> diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
> >> index 2b4dc7a..ecabc00 100644
> >> --- a/arch/mips/include/asm/io.h
> >> +++ b/arch/mips/include/asm/io.h
> >> @@ -304,10 +304,10 @@ static inline void iounmap(const volatile void __iomem *addr)
> >>  #undef __IS_KSEG1
> >>  }
> >>
> >> -#ifdef CONFIG_CPU_CAVIUM_OCTEON
> >> -#define war_octeon_io_reorder_wmb()          wmb()
> >> +#if defined(CONFIG_CPU_CAVIUM_OCTEON) || defined(CONFIG_LOONGSON3_ENHANCEMENT)
> >> +#define war_io_reorder_wmb()         wmb()
> >>  #else
> >> -#define war_octeon_io_reorder_wmb()          do { } while (0)
> >> +#define war_io_reorder_wmb()         do { } while (0)
> >>  #endif
> >
> > Doesn't this slow things down when enabled, or is it required due to
> > STFill buffer being enabled or something?
> >
> >>
> >>  #define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, irq)                  \
> >> @@ -318,7 +318,7 @@ static inline void pfx##write##bwlq(type val,                             \
> >>       volatile type *__mem;                                           \
> >>       type __val;                                                     \
> >>                                                                       \
> >> -     war_octeon_io_reorder_wmb();                                    \
> >> +     war_io_reorder_wmb();                                   \
> >>                                                                       \
> >>       __mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem));    \
> >>                                                                       \
> >> @@ -387,7 +387,7 @@ static inline void pfx##out##bwlq##p(type val, unsigned long port)        \
> >>       volatile type *__addr;                                          \
> >>       type __val;                                                     \
> >>                                                                       \
> >> -     war_octeon_io_reorder_wmb();                                    \
> >> +     war_io_reorder_wmb();                                   \
> >>                                                                       \
> >>       __addr = (void *)__swizzle_addr_##bwlq(mips_io_port_base + port); \
> >>                                                                       \
> >> diff --git a/arch/mips/include/asm/irqflags.h b/arch/mips/include/asm/irqflags.h
> >> index 65c351e..12f80b5 100644
> >> --- a/arch/mips/include/asm/irqflags.h
> >> +++ b/arch/mips/include/asm/irqflags.h
> >> @@ -41,7 +41,12 @@ static inline unsigned long arch_local_irq_save(void)
> >>       "       .set    push                                            \n"
> >>       "       .set    reorder                                         \n"
> >>       "       .set    noat                                            \n"
> >> +#if defined(CONFIG_LOONGSON3_ENHANCEMENT)
> >> +     "       mfc0    %[flags], $12                                   \n"
> >> +     "       di                                                      \n"
> >
> > Does this somehow help performance, or is it necessary when STFill
> > buffer is enabled?
> >
> >> +#else
> >>       "       di      %[flags]                                        \n"
> >> +#endif
> >>       "       andi    %[flags], 1                                     \n"
> >>       "       " __stringify(__irq_disable_hazard) "                   \n"
> >>       "       .set    pop                                             \n"
> >> diff --git a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
> >> index da83482..8393bc54 100644
> >> --- a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
> >> +++ b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
> >> @@ -26,6 +26,12 @@
> >>       mfc0    t0, $5, 1
> >>       or      t0, (0x1 << 29)
> >>       mtc0    t0, $5, 1
> >> +#ifdef CONFIG_LOONGSON3_ENHANCEMENT
> >> +     /* Enable STFill Buffer */
> >> +     mfc0    t0, $16, 6
> >> +     or      t0, 0x100
> >> +     mtc0    t0, $16, 6
> >> +#endif
> >>       _ehb
> >>       .set    pop
> >>  #endif
> >> @@ -46,6 +52,12 @@
> >>       mfc0    t0, $5, 1
> >>       or      t0, (0x1 << 29)
> >>       mtc0    t0, $5, 1
> >> +#ifdef CONFIG_LOONGSON3_ENHANCEMENT
> >> +     /* Enable STFill Buffer */
> >> +     mfc0    t0, $16, 6
> >> +     or      t0, 0x100
> >> +     mtc0    t0, $16, 6
> >> +#endif
> >
> > What does the STFill buffer do?
> >
> > Given that you can get a portable kernel without this, can this not be
> > done from C code depending on the PRid?
> >
> >>       _ehb
> >>       .set    pop
> >>  #endif
> >> diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
> >> index 65fb28c..903d8da 100644
> >> --- a/arch/mips/mm/c-r4k.c
> >> +++ b/arch/mips/mm/c-r4k.c
> >> @@ -1170,6 +1170,9 @@ static void probe_pcache(void)
> >>                                         c->dcache.ways *
> >>                                         c->dcache.linesz;
> >>               c->dcache.waybit = 0;
> >> +#ifdef CONFIG_CPU_HAS_PREFETCH
> >> +             c->options |= MIPS_CPU_PREFETCH;
> >> +#endif
> >
> > Can't do that based on PRid?
> >
> > Cheers
> > James
> >
> >>               break;
> >>
> >>       case CPU_CAVIUM_OCTEON3:
> >> diff --git a/arch/mips/mm/page.c b/arch/mips/mm/page.c
> >> index 885d73f..c41953c 100644
> >> --- a/arch/mips/mm/page.c
> >> +++ b/arch/mips/mm/page.c
> >> @@ -188,6 +188,15 @@ static void set_prefetch_parameters(void)
> >>                       }
> >>                       break;
> >>
> >> +             case CPU_LOONGSON3:
> >> +                     /* Loongson-3 only support the Pref_Load/Pref_Store. */
> >> +                     pref_bias_clear_store = 128;
> >> +                     pref_bias_copy_load = 128;
> >> +                     pref_bias_copy_store = 128;
> >> +                     pref_src_mode = Pref_Load;
> >> +                     pref_dst_mode = Pref_Store;
> >> +                     break;
> >> +
> >>               default:
> >>                       pref_bias_clear_store = 128;
> >>                       pref_bias_copy_load = 256;
> >> --
> >> 2.4.6
> >>
> >>
> >>
> >>
> >>

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] MIPS: Loongson: Introduce and use cpu_has_coherent_cache feature
  2016-01-27 11:15       ` James Hogan
@ 2016-01-28  7:38         ` Huacai Chen
  0 siblings, 0 replies; 21+ messages in thread
From: Huacai Chen @ 2016-01-28  7:38 UTC (permalink / raw)
  To: James Hogan
  Cc: Ralf Baechle, Aurelien Jarno, Steven J. Hill,
	Linux MIPS Mailing List, Fuxin Zhang, Zhangjin Wu

Hi, James,

First, I want to update some information:
1) Loongson-3's dcaches don't alias (if PAGE_SIZE >= 16K).
2) Loongson-3's icache is coherent with its dcache.
3) Loongson-3 maintains cache coherency across cores.

Even setting cpu_has_dc_aliases to 0 and cpu_has_ic_fills_f_dc to 1 is
not enough, because I also want to skip flush_cache_all(). So maybe
cpu_has_coherent_cache is a good name?
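
Something like the following is what I have in mind, only as a rough sketch
of the plumbing (the real users are the early returns in the hunks quoted
below):

        /* asm/cpu-features.h: default to 0, so other platforms are unaffected */
        #ifndef cpu_has_coherent_cache
        #define cpu_has_coherent_cache  0
        #endif

        /* asm/mach-loongson64/cpu-feature-overrides.h */
        #define cpu_has_coherent_cache  1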

Huacai

On Wed, Jan 27, 2016 at 7:15 PM, James Hogan <james.hogan@imgtec.com> wrote:
> Hi Huacai,
>
> On Wed, Jan 27, 2016 at 12:58:42PM +0800, Huacai Chen wrote:
>> "cache coherency" here means the coherency across cores, not ic/dc
>> coherency, could you please suggest a suitable name?
>
> Data cache coherency across cores is pretty much a requirement for SMP.
> It looks more like for various reasons you can skip the cache management
> functions, e.g.
>
>> >> @@ -503,6 +509,9 @@ static void r4k_flush_cache_range(struct vm_area_struct *vma,
>> >>  {
>> >>       int exec = vma->vm_flags & VM_EXEC;
>> >>
>> >> +     if (cpu_has_coherent_cache)
>> >> +             return;
>
> This seems to suggest:
> 1) Your dcaches don't alias.
> 2) Your icache is coherent with your dcache, otherwise you would need to
>    flush the icache here so that mprotect RW->RX makes code executable
>    without risk of stale lines existing in icache.
>
> So, is that the case?
>
> If so, it would seem better to ensure that cpu_has_dc_aliases evaluates
> to false, and add a similar one for icache coherency, hence my original
> suggestion.
>
>
>> >> +
>> >>       if (cpu_has_dc_aliases || (exec && !cpu_has_ic_fills_f_dc))
>> >>               r4k_on_each_cpu(local_r4k_flush_cache_range, vma);
>
> (Note, this cpu_has_ic_fills_f_dc check is wrong, it shouldn't prevent
> icache flush, see http://patchwork.linux-mips.org/patch/12179/)
>
> Cheers
> James
>
>> >>  }
>> >> @@ -627,6 +636,9 @@ static void r4k_flush_cache_page(struct vm_area_struct *vma,
>> >>  {
>> >>       struct flush_cache_page_args args;
>> >>
>> >> +     if (cpu_has_coherent_cache)
>> >> +             return;
>> >> +
>> >>       args.vma = vma;
>> >>       args.addr = addr;
>> >>       args.pfn = pfn;
>> >> @@ -636,11 +648,17 @@ static void r4k_flush_cache_page(struct vm_area_struct *vma,
>> >>
>> >>  static inline void local_r4k_flush_data_cache_page(void * addr)
>> >>  {
>> >> +     if (cpu_has_coherent_cache)
>> >> +             return;
>> >> +
>> >>       r4k_blast_dcache_page((unsigned long) addr);
>> >>  }
>> >>
>> >>  static void r4k_flush_data_cache_page(unsigned long addr)
>> >>  {
>> >> +     if (cpu_has_coherent_cache)
>> >> +             return;
>> >> +
>> >>       if (in_atomic())
>> >>               local_r4k_flush_data_cache_page((void *)addr);
>> >>       else
>> >> @@ -825,6 +843,9 @@ static void local_r4k_flush_cache_sigtramp(void * arg)
>> >>
>> >>  static void r4k_flush_cache_sigtramp(unsigned long addr)
>> >>  {
>> >> +     if (cpu_has_coherent_cache)
>> >> +             return;
>> >> +
>> >>       r4k_on_each_cpu(local_r4k_flush_cache_sigtramp, (void *) addr);
>> >>  }
>> >>
>> >> --
>> >> 2.4.6
>> >>
>> >>
>> >>
>> >>
>> >>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT
  2016-01-27 11:18       ` James Hogan
@ 2016-01-28  7:48         ` Huacai Chen
  0 siblings, 0 replies; 21+ messages in thread
From: Huacai Chen @ 2016-01-28  7:48 UTC (permalink / raw)
  To: James Hogan
  Cc: Ralf Baechle, Aurelien Jarno, Steven J. Hill,
	Linux MIPS Mailing List, Fuxin Zhang, Zhangjin Wu

Hi, James,

CONFIG_CPU_MIPSR2 is only selected by CONFIG_LOONGSON3_ENHANCEMENT, so
Loongson-3 doesn't use ei/di at all without
CONFIG_LOONGSON3_ENHANCEMENT.
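
To spell out why (a simplified sketch of the structure in asm/irqflags.h,
not the exact code):

        #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
        /* inline asm versions using di/ei: the code patch 6/6 touches */
        #else
        /* out-of-line C versions from mips-atomic.c that only read and
         * write CP0_Status and never emit di/ei */
        unsigned long arch_local_irq_save(void);
        void arch_local_irq_restore(unsigned long flags);
        #endif

So a kernel built without CONFIG_LOONGSON3_ENHANCEMENT (and therefore without
CONFIG_CPU_MIPSR2) never compiles the di path in the first place.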

Huacai

On Wed, Jan 27, 2016 at 7:18 PM, James Hogan <james.hogan@imgtec.com> wrote:
> On Wed, Jan 27, 2016 at 01:02:38PM +0800, Huacai Chen wrote:
>> The STFill buffer sits between the core and the L1 cache and can reorder
>> memory accesses, so writel/outl need a barrier. Loongson 3 has a bug
>> where di cannot save the irq flags, so we need an mfc0.
>
> Shouldn't it use that even without CONFIG_LOONGSON3_ENHANCEMENT then, so
> as not to break the "generic kernel to run on all Loongson 3 machines"?
>
> Cheers
> James
>
>>
>> On Tue, Jan 26, 2016 at 10:19 PM, James Hogan <james.hogan@imgtec.com> wrote:
>> > On Tue, Jan 26, 2016 at 09:26:24PM +0800, Huacai Chen wrote:
>> >> New Loongson 3 CPU (since Loongson-3A R2, as opposed to Loongson-3A R1,
>> >> Loongson-3B R1 and Loongson-3B R2) has many enhancements, such as FTLB,
>> >> L1-VCache, EI/DI/Wait/Prefetch instruction, DSP/DSPv2 ASE, User Local
>> >> register, Read-Inhibit/Execute-Inhibit, SFB (Store Fill Buffer), Fast
>> >> TLB refill support, etc.
>> >>
>> >> This patch introduces a config option, CONFIG_LOONGSON3_ENHANCEMENT, to
>> >> enable those enhancements which cannot be probed at run time. If you
>> >> want a generic kernel to run on all Loongson 3 machines, please say 'N'
>> >> here. If you want a high-performance kernel to run on new Loongson 3
>> >> machines only, please say 'Y' here.
>> >>
>> >> Signed-off-by: Huacai Chen <chenhc@lemote.com>
>> >> ---
>> >>  arch/mips/Kconfig                                      | 18 ++++++++++++++++++
>> >>  arch/mips/include/asm/hazards.h                        |  7 ++++---
>> >>  arch/mips/include/asm/io.h                             | 10 +++++-----
>> >>  arch/mips/include/asm/irqflags.h                       |  5 +++++
>> >>  .../include/asm/mach-loongson64/kernel-entry-init.h    | 12 ++++++++++++
>> >>  arch/mips/mm/c-r4k.c                                   |  3 +++
>> >>  arch/mips/mm/page.c                                    |  9 +++++++++
>> >>  7 files changed, 56 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
>> >> index 15faaf0..e6d6f7b 100644
>> >> --- a/arch/mips/Kconfig
>> >> +++ b/arch/mips/Kconfig
>> >> @@ -1349,6 +1349,24 @@ config CPU_LOONGSON3
>> >>               The Loongson 3 processor implements the MIPS64R2 instruction
>> >>               set with many extensions.
>> >>
>> >> +config LOONGSON3_ENHANCEMENT
>> >> +     bool "New Loongson 3 CPU Enhancements"
>> >> +     default n
>> >
>> > no need, n is the default.
>> >
>> >> +     select CPU_MIPSR2
>> >> +     select CPU_HAS_PREFETCH
>> >> +     depends on CPU_LOONGSON3
>> >> +     help
>> >> +       New Loongson 3 CPU (since Loongson-3A R2, as opposed to Loongson-3A
>> >> +       R1, Loongson-3B R1 and Loongson-3B R2) has many enhancements, such as
>> >> +       FTLB, L1-VCache, EI/DI/Wait/Prefetch instruction, DSP/DSPv2 ASE, User
>> >> +       Local register, Read-Inhibit/Execute-Inhibit, SFB (Store Fill Buffer),
>> >> +       Fast TLB refill support, etc.
>> >> +
>> >> +       This option enable those enhancements which cannot be probed at run
>> >> +       time. If you want a generic kernel to run on all Loongson 3 machines,
>> >> +       please say 'N' here. If you want a high-performance kernel to run on
>> >> +       new Loongson 3 machines only, please say 'Y' here.
>> >> +
>> >>  config CPU_LOONGSON2E
>> >>       bool "Loongson 2E"
>> >>       depends on SYS_HAS_CPU_LOONGSON2E
>> >> diff --git a/arch/mips/include/asm/hazards.h b/arch/mips/include/asm/hazards.h
>> >> index 7b99efd..dbb1eb6 100644
>> >> --- a/arch/mips/include/asm/hazards.h
>> >> +++ b/arch/mips/include/asm/hazards.h
>> >> @@ -22,7 +22,8 @@
>> >>  /*
>> >>   * TLB hazards
>> >>   */
>> >> -#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) && !defined(CONFIG_CPU_CAVIUM_OCTEON)
>> >> +#if (defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)) && \
>> >> +     !defined(CONFIG_CPU_CAVIUM_OCTEON) && !defined(CONFIG_LOONGSON3_ENHANCEMENT)
>> >>
>> >>  /*
>> >>   * MIPSR2 defines ehb for hazard avoidance
>> >> @@ -155,8 +156,8 @@ do {                                                                      \
>> >>  } while (0)
>> >>
>> >>  #elif defined(CONFIG_MIPS_ALCHEMY) || defined(CONFIG_CPU_CAVIUM_OCTEON) || \
>> >> -     defined(CONFIG_CPU_LOONGSON2) || defined(CONFIG_CPU_R10000) || \
>> >> -     defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR)
>> >> +     defined(CONFIG_CPU_LOONGSON2) || defined(CONFIG_LOONGSON3_ENHANCEMENT) || \
>> >> +     defined(CONFIG_CPU_R10000) || defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR)
>> >>
>> >>  /*
>> >>   * R10000 rocks - all hazards handled in hardware, so this becomes a nobrainer.
>> >> diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
>> >> index 2b4dc7a..ecabc00 100644
>> >> --- a/arch/mips/include/asm/io.h
>> >> +++ b/arch/mips/include/asm/io.h
>> >> @@ -304,10 +304,10 @@ static inline void iounmap(const volatile void __iomem *addr)
>> >>  #undef __IS_KSEG1
>> >>  }
>> >>
>> >> -#ifdef CONFIG_CPU_CAVIUM_OCTEON
>> >> -#define war_octeon_io_reorder_wmb()          wmb()
>> >> +#if defined(CONFIG_CPU_CAVIUM_OCTEON) || defined(CONFIG_LOONGSON3_ENHANCEMENT)
>> >> +#define war_io_reorder_wmb()         wmb()
>> >>  #else
>> >> -#define war_octeon_io_reorder_wmb()          do { } while (0)
>> >> +#define war_io_reorder_wmb()         do { } while (0)
>> >>  #endif
>> >
>> > Doesn't this slow things down when enabled, or is it required due to
>> > STFill buffer being enabled or something?
>> >
>> >>
>> >>  #define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, irq)                  \
>> >> @@ -318,7 +318,7 @@ static inline void pfx##write##bwlq(type val,                             \
>> >>       volatile type *__mem;                                           \
>> >>       type __val;                                                     \
>> >>                                                                       \
>> >> -     war_octeon_io_reorder_wmb();                                    \
>> >> +     war_io_reorder_wmb();                                   \
>> >>                                                                       \
>> >>       __mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem));    \
>> >>                                                                       \
>> >> @@ -387,7 +387,7 @@ static inline void pfx##out##bwlq##p(type val, unsigned long port)        \
>> >>       volatile type *__addr;                                          \
>> >>       type __val;                                                     \
>> >>                                                                       \
>> >> -     war_octeon_io_reorder_wmb();                                    \
>> >> +     war_io_reorder_wmb();                                   \
>> >>                                                                       \
>> >>       __addr = (void *)__swizzle_addr_##bwlq(mips_io_port_base + port); \
>> >>                                                                       \
>> >> diff --git a/arch/mips/include/asm/irqflags.h b/arch/mips/include/asm/irqflags.h
>> >> index 65c351e..12f80b5 100644
>> >> --- a/arch/mips/include/asm/irqflags.h
>> >> +++ b/arch/mips/include/asm/irqflags.h
>> >> @@ -41,7 +41,12 @@ static inline unsigned long arch_local_irq_save(void)
>> >>       "       .set    push                                            \n"
>> >>       "       .set    reorder                                         \n"
>> >>       "       .set    noat                                            \n"
>> >> +#if defined(CONFIG_LOONGSON3_ENHANCEMENT)
>> >> +     "       mfc0    %[flags], $12                                   \n"
>> >> +     "       di                                                      \n"
>> >
>> > Does this somehow help performance, or is it necessary when STFill
>> > buffer is enabled?
>> >
>> >> +#else
>> >>       "       di      %[flags]                                        \n"
>> >> +#endif
>> >>       "       andi    %[flags], 1                                     \n"
>> >>       "       " __stringify(__irq_disable_hazard) "                   \n"
>> >>       "       .set    pop                                             \n"
>> >> diff --git a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
>> >> index da83482..8393bc54 100644
>> >> --- a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
>> >> +++ b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
>> >> @@ -26,6 +26,12 @@
>> >>       mfc0    t0, $5, 1
>> >>       or      t0, (0x1 << 29)
>> >>       mtc0    t0, $5, 1
>> >> +#ifdef CONFIG_LOONGSON3_ENHANCEMENT
>> >> +     /* Enable STFill Buffer */
>> >> +     mfc0    t0, $16, 6
>> >> +     or      t0, 0x100
>> >> +     mtc0    t0, $16, 6
>> >> +#endif
>> >>       _ehb
>> >>       .set    pop
>> >>  #endif
>> >> @@ -46,6 +52,12 @@
>> >>       mfc0    t0, $5, 1
>> >>       or      t0, (0x1 << 29)
>> >>       mtc0    t0, $5, 1
>> >> +#ifdef CONFIG_LOONGSON3_ENHANCEMENT
>> >> +     /* Enable STFill Buffer */
>> >> +     mfc0    t0, $16, 6
>> >> +     or      t0, 0x100
>> >> +     mtc0    t0, $16, 6
>> >> +#endif
>> >
>> > What does the STFill buffer do?
>> >
>> > Given that you can get a portable kernel without this, can this not be
>> > done from C code depending on the PRid?
>> >
>> >>       _ehb
>> >>       .set    pop
>> >>  #endif
>> >> diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
>> >> index 65fb28c..903d8da 100644
>> >> --- a/arch/mips/mm/c-r4k.c
>> >> +++ b/arch/mips/mm/c-r4k.c
>> >> @@ -1170,6 +1170,9 @@ static void probe_pcache(void)
>> >>                                         c->dcache.ways *
>> >>                                         c->dcache.linesz;
>> >>               c->dcache.waybit = 0;
>> >> +#ifdef CONFIG_CPU_HAS_PREFETCH
>> >> +             c->options |= MIPS_CPU_PREFETCH;
>> >> +#endif
>> >
>> > Can't do that based on PRid?
>> >
>> > Cheers
>> > James
>> >
>> >>               break;
>> >>
>> >>       case CPU_CAVIUM_OCTEON3:
>> >> diff --git a/arch/mips/mm/page.c b/arch/mips/mm/page.c
>> >> index 885d73f..c41953c 100644
>> >> --- a/arch/mips/mm/page.c
>> >> +++ b/arch/mips/mm/page.c
>> >> @@ -188,6 +188,15 @@ static void set_prefetch_parameters(void)
>> >>                       }
>> >>                       break;
>> >>
>> >> +             case CPU_LOONGSON3:
>> >> +                     /* Loongson-3 only support the Pref_Load/Pref_Store. */
>> >> +                     pref_bias_clear_store = 128;
>> >> +                     pref_bias_copy_load = 128;
>> >> +                     pref_bias_copy_store = 128;
>> >> +                     pref_src_mode = Pref_Load;
>> >> +                     pref_dst_mode = Pref_Store;
>> >> +                     break;
>> >> +
>> >>               default:
>> >>                       pref_bias_clear_store = 128;
>> >>                       pref_bias_copy_load = 256;
>> >> --
>> >> 2.4.6
>> >>
>> >>
>> >>
>> >>
>> >>

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-01-28  7:48 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-26 13:26 [PATCH 0/6] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
2016-01-26 13:26 ` [PATCH 1/6] MIPS: Loongson: Add Loongson-3A R2 basic support Huacai Chen
2016-01-26 13:26 ` [PATCH 2/6] MIPS: Loongson: Invalidate special TLBs when needed Huacai Chen
2016-01-26 13:26 ` [PATCH 3/6] MIPS: Loongson-3: Fast TLB refill handler Huacai Chen
2016-01-26 13:26 ` [PATCH 4/6] MIPS: tlbex: Fix bugs in tlbchange handler Huacai Chen
2016-01-26 21:15   ` David Daney
2016-01-27  4:53     ` Huacai Chen
2016-01-27  5:50   ` Joshua Kinard
2016-01-27 10:22     ` Joshua Kinard
2016-01-26 13:26 ` [PATCH 5/6] MIPS: Loongson: Introduce and use cpu_has_coherent_cache feature Huacai Chen
2016-01-26 13:42   ` James Hogan
2016-01-26 13:42     ` James Hogan
2016-01-27  4:58     ` Huacai Chen
2016-01-27 11:15       ` James Hogan
2016-01-28  7:38         ` Huacai Chen
2016-01-26 13:26 ` [PATCH 6/6] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT Huacai Chen
2016-01-26 14:19   ` James Hogan
2016-01-26 14:19     ` James Hogan
2016-01-27  5:02     ` Huacai Chen
2016-01-27 11:18       ` James Hogan
2016-01-28  7:48         ` Huacai Chen
