All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V4 0/5] MIPS: Loongson: Add Loongson-3A R2 support
@ 2016-02-27 10:07 Huacai Chen
  2016-02-27 10:07 ` [PATCH V4 1/5] MIPS: Loongson: Add Loongson-3A R2 basic support Huacai Chen
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Huacai Chen @ 2016-02-27 10:07 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J . Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

This patchset is is prepared for the next 4.6 release for Linux/MIPS.
It adds Loongson-3A R2 (Loongson-3A2000) support and fixes a potential
bug related to FTLB.

Loongson-3 CPU family:

Code-name       Brand-name       PRId
Loongson-3A R1  Loongson-3A1000  0x6305
Loongson-3A R2  Loongson-3A2000  0x6308
Loongson-3B R1  Loongson-3B1000  0x6306
Loongson-3B R2  Loongson-3B1500  0x6307

Features of R2 revision of Loongson-3A:
1, Primary cache includes I-Cache, D-Cache and V-Cache (Victim Cache).
2, I-Cache, D-Cache and V-Cache are 16-way set-associative, linesize is 64 Bytes.
3, 64 entries of VTLB (classic TLB), 1024 entries of FTLB (8-way set-associative).
4, Support DSP/DSPv2 instructions, UserLocal register and Read-Inhibit/Execute-Inhibit.

V1 -> V2:
1, Probe MIPS_CPU_PREFETCH by PRId.
2, Use PRID_REV_MASK instead of hardcode.
3, Update commit messages to avoid confusion.

V2 -> V3:
1, Remove the 4th patch since it is a bugfix not only for Loongson.
2, Split the 5th patch and remove the generic part since that is not only for Loongson.

V3 -> V4:
1, Rework by setting the relevant handlers to `cache_noop' in `r4k_cache_init'.

Huacai Chen(5):
 MIPS: Loongson: Add Loongson-3A R2 basic support.
 MIPS: Loongson-3: Set cache flush handlers to cache_noop.
 MIPS: Loongson: Invalidate special TLBs when needed.
 MIPS: Loongson-3: Fast TLB refill handler.
 MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/Kconfig                                  |  19 ++++
 arch/mips/include/asm/cacheops.h                   |   6 +
 arch/mips/include/asm/cpu-features.h               |   3 +
 arch/mips/include/asm/cpu-info.h                   |   1 +
 arch/mips/include/asm/cpu.h                        |   5 +-
 arch/mips/include/asm/hazards.h                    |   7 +-
 arch/mips/include/asm/io.h                         |  10 +-
 arch/mips/include/asm/irqflags.h                   |   5 +
 .../asm/mach-loongson64/cpu-feature-overrides.h    |  12 +-
 .../asm/mach-loongson64/kernel-entry-init.h        |  18 ++-
 arch/mips/include/asm/mipsregs.h                   |   8 ++
 arch/mips/include/asm/pgtable-bits.h               |   8 +-
 arch/mips/include/asm/pgtable.h                    |   4 +-
 arch/mips/include/asm/uasm.h                       |   3 +-
 arch/mips/include/uapi/asm/inst.h                  |  10 ++
 arch/mips/kernel/cpu-probe.c                       |  40 ++++++-
 arch/mips/kernel/idle.c                            |   5 +
 arch/mips/kernel/traps.c                           |   3 +-
 arch/mips/loongson64/common/env.c                  |   7 +-
 arch/mips/loongson64/loongson-3/smp.c              | 106 +++++++++++++++--
 arch/mips/mm/c-r4k.c                               |  43 +++++++
 arch/mips/mm/page.c                                |   9 ++
 arch/mips/mm/tlb-r4k.c                             |  27 +++--
 arch/mips/mm/tlbex.c                               | 126 ++++++++++++++++++++-
 arch/mips/mm/uasm-mips.c                           |   2 +
 arch/mips/mm/uasm.c                                |   3 +
 drivers/platform/mips/cpu_hwmon.c                  |   4 +-
 27 files changed, 435 insertions(+), 59 deletions(-)
--
2.7.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH V4 1/5] MIPS: Loongson: Add Loongson-3A R2 basic support
  2016-02-27 10:07 [PATCH V4 0/5] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
@ 2016-02-27 10:07 ` Huacai Chen
  2016-02-27 10:07 ` [PATCH V4 2/5] MIPS: Loongson-3: Set cache flush handlers to cache_noop Huacai Chen
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Huacai Chen @ 2016-02-27 10:07 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J . Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

Loongson-3 CPU family:

Code-name       Brand-name       PRId
Loongson-3A R1  Loongson-3A1000  0x6305
Loongson-3A R2  Loongson-3A2000  0x6308
Loongson-3B R1  Loongson-3B1000  0x6306
Loongson-3B R2  Loongson-3B1500  0x6307

Features of R2 revision of Loongson-3A:
1, Primary cache includes I-Cache, D-Cache and V-Cache (Victim Cache).
2, I-Cache, D-Cache and V-Cache are 16-way set-associative, linesize is 64 Bytes.
3, 64 entries of VTLB (classic TLB), 1024 entries of FTLB (8-way set-associative).
4, Support DSP/DSPv2 instructions, UserLocal register and Read-Inhibit/Execute-Inhibit.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/Kconfig                                  |   1 +
 arch/mips/include/asm/cacheops.h                   |   6 ++
 arch/mips/include/asm/cpu-info.h                   |   1 +
 arch/mips/include/asm/cpu.h                        |   4 +-
 .../asm/mach-loongson64/cpu-feature-overrides.h    |  12 +--
 .../asm/mach-loongson64/kernel-entry-init.h        |   6 +-
 arch/mips/include/asm/mipsregs.h                   |   2 +
 arch/mips/include/asm/pgtable-bits.h               |   8 +-
 arch/mips/include/asm/pgtable.h                    |   4 +-
 arch/mips/kernel/cpu-probe.c                       |  38 +++++++-
 arch/mips/kernel/idle.c                            |   5 +
 arch/mips/kernel/traps.c                           |   3 +-
 arch/mips/loongson64/common/env.c                  |   7 +-
 arch/mips/loongson64/loongson-3/smp.c              | 106 +++++++++++++++++++--
 arch/mips/mm/c-r4k.c                               |  27 ++++++
 arch/mips/mm/tlbex.c                               |   2 +-
 drivers/platform/mips/cpu_hwmon.c                  |   4 +-
 17 files changed, 201 insertions(+), 35 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index c5bf89cb..38c0b11 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1344,6 +1344,7 @@ config CPU_LOONGSON3
 	select CPU_SUPPORTS_HUGEPAGES
 	select WEAK_ORDERING
 	select WEAK_REORDERING_BEYOND_LLSC
+	select MIPS_PGD_C0_CONTEXT
 	select ARCH_REQUIRE_GPIOLIB
 	help
 		The Loongson 3 processor implements the MIPS64R2 instruction
diff --git a/arch/mips/include/asm/cacheops.h b/arch/mips/include/asm/cacheops.h
index c3212ff..8031fbc 100644
--- a/arch/mips/include/asm/cacheops.h
+++ b/arch/mips/include/asm/cacheops.h
@@ -21,6 +21,7 @@
 #define Cache_I				0x00
 #define Cache_D				0x01
 #define Cache_T				0x02
+#define Cache_V				0x02 /* Loongson-3 */
 #define Cache_S				0x03
 
 #define Index_Writeback_Inv		0x00
@@ -107,4 +108,9 @@
  */
 #define Hit_Invalidate_I_Loongson2	(Cache_I | 0x00)
 
+/*
+ * Loongson3-specific cacheops
+ */
+#define Index_Writeback_Inv_V		(Cache_V | Index_Writeback_Inv)
+
 #endif	/* __ASM_CACHEOPS_H */
diff --git a/arch/mips/include/asm/cpu-info.h b/arch/mips/include/asm/cpu-info.h
index e7dc785..6fd7b8bd 100644
--- a/arch/mips/include/asm/cpu-info.h
+++ b/arch/mips/include/asm/cpu-info.h
@@ -60,6 +60,7 @@ struct cpuinfo_mips {
 	int			tlbsizeftlbways;
 	struct cache_desc	icache; /* Primary I-cache */
 	struct cache_desc	dcache; /* Primary D or combined I/D cache */
+	struct cache_desc	vcache; /* Victim cache, between pcache and scache */
 	struct cache_desc	scache; /* Secondary cache */
 	struct cache_desc	tcache; /* Tertiary/split secondary cache */
 	int			srsets; /* Shadow register sets */
diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
index 7bea0f3..68807b8 100644
--- a/arch/mips/include/asm/cpu.h
+++ b/arch/mips/include/asm/cpu.h
@@ -42,6 +42,7 @@
 #define PRID_COMP_LEXRA		0x0b0000
 #define PRID_COMP_NETLOGIC	0x0c0000
 #define PRID_COMP_CAVIUM	0x0d0000
+#define PRID_COMP_LOONGSON	0x140000
 #define PRID_COMP_INGENIC_D0	0xd00000	/* JZ4740, JZ4750 */
 #define PRID_COMP_INGENIC_D1	0xd10000	/* JZ4770, JZ4775 */
 #define PRID_COMP_INGENIC_E1	0xe10000	/* JZ4780 */
@@ -239,9 +240,10 @@
 #define PRID_REV_LOONGSON1B	0x0020
 #define PRID_REV_LOONGSON2E	0x0002
 #define PRID_REV_LOONGSON2F	0x0003
-#define PRID_REV_LOONGSON3A	0x0005
+#define PRID_REV_LOONGSON3A_R1	0x0005
 #define PRID_REV_LOONGSON3B_R1	0x0006
 #define PRID_REV_LOONGSON3B_R2	0x0007
+#define PRID_REV_LOONGSON3A_R2	0x0008
 
 /*
  * Older processors used to encode processor version and revision in two
diff --git a/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h b/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
index 98963c2..c3406db 100644
--- a/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h
@@ -16,11 +16,6 @@
 #ifndef __ASM_MACH_LOONGSON64_CPU_FEATURE_OVERRIDES_H
 #define __ASM_MACH_LOONGSON64_CPU_FEATURE_OVERRIDES_H
 
-#define cpu_dcache_line_size()	32
-#define cpu_icache_line_size()	32
-#define cpu_scache_line_size()	32
-
-
 #define cpu_has_32fpr		1
 #define cpu_has_3k_cache	0
 #define cpu_has_4k_cache	1
@@ -31,8 +26,6 @@
 #define cpu_has_counter		1
 #define cpu_has_dc_aliases	(PAGE_SIZE < 0x4000)
 #define cpu_has_divec		0
-#define cpu_has_dsp		0
-#define cpu_has_dsp2		0
 #define cpu_has_ejtag		0
 #define cpu_has_ic_fills_f_dc	0
 #define cpu_has_inclusive_pcaches	1
@@ -40,15 +33,11 @@
 #define cpu_has_mcheck		0
 #define cpu_has_mdmx		0
 #define cpu_has_mips16		0
-#define cpu_has_mips32r2	0
 #define cpu_has_mips3d		0
-#define cpu_has_mips64r2	0
 #define cpu_has_mipsmt		0
-#define cpu_has_prefetch	0
 #define cpu_has_smartmips	0
 #define cpu_has_tlb		1
 #define cpu_has_tx39_cache	0
-#define cpu_has_userlocal	0
 #define cpu_has_vce		0
 #define cpu_has_veic		0
 #define cpu_has_vint		0
@@ -57,5 +46,6 @@
 #define cpu_has_local_ebase	0
 
 #define cpu_has_wsbh		IS_ENABLED(CONFIG_CPU_LOONGSON3)
+#define cpu_hwrena_impl_bits	0xc0000000
 
 #endif /* __ASM_MACH_LOONGSON64_CPU_FEATURE_OVERRIDES_H */
diff --git a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
index 3f2f84f..da83482 100644
--- a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
+++ b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
@@ -23,7 +23,8 @@
 	or	t0, (0x1 << 7)
 	mtc0	t0, $16, 3
 	/* Set ELPA on LOONGSON3 pagegrain */
-	li	t0, (0x1 << 29)
+	mfc0	t0, $5, 1
+	or	t0, (0x1 << 29)
 	mtc0	t0, $5, 1
 	_ehb
 	.set	pop
@@ -42,7 +43,8 @@
 	or	t0, (0x1 << 7)
 	mtc0	t0, $16, 3
 	/* Set ELPA on LOONGSON3 pagegrain */
-	li	t0, (0x1 << 29)
+	mfc0	t0, $5, 1
+	or	t0, (0x1 << 29)
 	mtc0	t0, $5, 1
 	_ehb
 	.set	pop
diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h
index 3ad19ad..9290fd4 100644
--- a/arch/mips/include/asm/mipsregs.h
+++ b/arch/mips/include/asm/mipsregs.h
@@ -633,6 +633,8 @@
 #define MIPS_CONF6_SYND		(_ULCAST_(1) << 13)
 /* proAptiv FTLB on/off bit */
 #define MIPS_CONF6_FTLBEN	(_ULCAST_(1) << 15)
+/* Loongson-3 FTLB on/off bit */
+#define MIPS_CONF6_FTLBDIS	(_ULCAST_(1) << 22)
 /* FTLB probability bits */
 #define MIPS_CONF6_FTLBP_SHIFT	(16)
 
diff --git a/arch/mips/include/asm/pgtable-bits.h b/arch/mips/include/asm/pgtable-bits.h
index 97b3138..32b77bd 100644
--- a/arch/mips/include/asm/pgtable-bits.h
+++ b/arch/mips/include/asm/pgtable-bits.h
@@ -113,7 +113,7 @@
 #define _PAGE_PRESENT_SHIFT	0
 #define _PAGE_PRESENT		(1 << _PAGE_PRESENT_SHIFT)
 /* R2 or later cores check for RI/XI support to determine _PAGE_READ */
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
+#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3)
 #define _PAGE_WRITE_SHIFT	(_PAGE_PRESENT_SHIFT + 1)
 #define _PAGE_WRITE		(1 << _PAGE_WRITE_SHIFT)
 #else
@@ -133,7 +133,7 @@
 #define _PAGE_HUGE		(1 << _PAGE_HUGE_SHIFT)
 #endif	/* CONFIG_64BIT && CONFIG_MIPS_HUGE_TLB_SUPPORT */
 
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
+#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3)
 /* XI - page cannot be executed */
 #ifdef _PAGE_HUGE_SHIFT
 #define _PAGE_NO_EXEC_SHIFT	(_PAGE_HUGE_SHIFT + 1)
@@ -147,7 +147,7 @@
 #define _PAGE_READ		(cpu_has_rixi ? 0 : (1 << _PAGE_READ_SHIFT))
 #define _PAGE_NO_READ_SHIFT	_PAGE_READ_SHIFT
 #define _PAGE_NO_READ		(cpu_has_rixi ? (1 << _PAGE_READ_SHIFT) : 0)
-#endif	/* defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) */
+#endif	/* defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3) */
 
 #if defined(_PAGE_NO_READ_SHIFT)
 #define _PAGE_GLOBAL_SHIFT	(_PAGE_NO_READ_SHIFT + 1)
@@ -198,7 +198,7 @@
  */
 static inline uint64_t pte_to_entrylo(unsigned long pte_val)
 {
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
+#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3)
 	if (cpu_has_rixi) {
 		int sa;
 #ifdef CONFIG_32BIT
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 9a4fe01..35cd713 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -353,7 +353,7 @@ static inline pte_t pte_mkdirty(pte_t pte)
 static inline pte_t pte_mkyoung(pte_t pte)
 {
 	pte_val(pte) |= _PAGE_ACCESSED;
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
+#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3)
 	if (!(pte_val(pte) & _PAGE_NO_READ))
 		pte_val(pte) |= _PAGE_SILENT_READ;
 	else
@@ -542,7 +542,7 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
 {
 	pmd_val(pmd) |= _PAGE_ACCESSED;
 
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
+#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3)
 	if (!(pmd_val(pmd) & _PAGE_NO_READ))
 		pmd_val(pmd) |= _PAGE_SILENT_READ;
 	else
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 9ad6157..2a2ae86 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -561,6 +561,16 @@ static int set_ftlb_enable(struct cpuinfo_mips *c, int enable)
 		write_c0_config7(config | (calculate_ftlb_probability(c)
 					   << MIPS_CONF7_FTLBP_SHIFT));
 		break;
+	case CPU_LOONGSON3:
+		/* Loongson-3 cores use Config6 to enable the FTLB */
+		config = read_c0_config6();
+		if (enable)
+			/* Enable FTLB */
+			write_c0_config6(config & ~MIPS_CONF6_FTLBDIS);
+		else
+			/* Disable FTLB */
+			write_c0_config6(config | MIPS_CONF6_FTLBDIS);
+		break;
 	default:
 		return 1;
 	}
@@ -1172,7 +1182,7 @@ static inline void cpu_probe_legacy(struct cpuinfo_mips *c, unsigned int cpu)
 			set_isa(c, MIPS_CPU_ISA_III);
 			c->fpu_msk31 |= FPU_CSR_CONDX;
 			break;
-		case PRID_REV_LOONGSON3A:
+		case PRID_REV_LOONGSON3A_R1:
 			c->cputype = CPU_LOONGSON3;
 			__cpu_name[cpu] = "ICT Loongson-3";
 			set_elf_platform(cpu, "loongson3a");
@@ -1495,6 +1505,29 @@ platform:
 	}
 }
 
+static inline void cpu_probe_loongson(struct cpuinfo_mips *c, unsigned int cpu)
+{
+	switch (c->processor_id & PRID_IMP_MASK) {
+	case PRID_IMP_LOONGSON_64:  /* Loongson-2/3 */
+		switch (c->processor_id & PRID_REV_MASK) {
+		case PRID_REV_LOONGSON3A_R2:
+			c->cputype = CPU_LOONGSON3;
+			__cpu_name[cpu] = "ICT Loongson-3";
+			set_elf_platform(cpu, "loongson3a");
+			set_isa(c, MIPS_CPU_ISA_M64R2);
+			break;
+		}
+
+		decode_configs(c);
+		c->options |= MIPS_CPU_TLBINV;
+		c->writecombine = _CACHE_UNCACHED_ACCELERATED;
+		break;
+	default:
+		panic("Unknown Loongson Processor ID!");
+		break;
+	}
+}
+
 static inline void cpu_probe_ingenic(struct cpuinfo_mips *c, unsigned int cpu)
 {
 	decode_configs(c);
@@ -1642,6 +1675,9 @@ void cpu_probe(void)
 	case PRID_COMP_CAVIUM:
 		cpu_probe_cavium(c, cpu);
 		break;
+	case PRID_COMP_LOONGSON:
+		cpu_probe_loongson(c, cpu);
+		break;
 	case PRID_COMP_INGENIC_D0:
 	case PRID_COMP_INGENIC_D1:
 	case PRID_COMP_INGENIC_E1:
diff --git a/arch/mips/kernel/idle.c b/arch/mips/kernel/idle.c
index 46794d6..60ab4c4 100644
--- a/arch/mips/kernel/idle.c
+++ b/arch/mips/kernel/idle.c
@@ -181,6 +181,11 @@ void __init check_wait(void)
 	case CPU_XLP:
 		cpu_wait = r4k_wait;
 		break;
+	case CPU_LOONGSON3:
+		if ((c->processor_id & PRID_REV_MASK) >= PRID_REV_LOONGSON3A_R2)
+			cpu_wait = r4k_wait;
+		break;
+
 	case CPU_BMIPS5000:
 		cpu_wait = r4k_wait_irqoff;
 		break;
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index c01f615..e49e7bc 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -1770,7 +1770,8 @@ asmlinkage void do_ftlb(void)
 
 	/* For the moment, report the problem and hang. */
 	if ((cpu_has_mips_r2_r6) &&
-	    ((current_cpu_data.processor_id & 0xff0000) == PRID_COMP_MIPS)) {
+	    (((current_cpu_data.processor_id & 0xff0000) == PRID_COMP_MIPS) ||
+	    ((current_cpu_data.processor_id & 0xff0000) == PRID_COMP_LOONGSON))) {
 		pr_err("FTLB error exception, cp0_ecc=0x%08x:\n",
 		       read_c0_ecc());
 		pr_err("cp0_errorepc == %0*lx\n", field, read_c0_errorepc());
diff --git a/arch/mips/loongson64/common/env.c b/arch/mips/loongson64/common/env.c
index d6d07ad..57d590a 100644
--- a/arch/mips/loongson64/common/env.c
+++ b/arch/mips/loongson64/common/env.c
@@ -105,6 +105,10 @@ void __init prom_init_env(void)
 		loongson_chiptemp[1] = 0x900010001fe0019c;
 		loongson_chiptemp[2] = 0x900020001fe0019c;
 		loongson_chiptemp[3] = 0x900030001fe0019c;
+		loongson_freqctrl[0] = 0x900000001fe001d0;
+		loongson_freqctrl[1] = 0x900010001fe001d0;
+		loongson_freqctrl[2] = 0x900020001fe001d0;
+		loongson_freqctrl[3] = 0x900030001fe001d0;
 		loongson_sysconf.ht_control_base = 0x90000EFDFB000000;
 		loongson_sysconf.workarounds = WORKAROUND_CPUFREQ;
 	} else if (ecpu->cputype == Loongson_3B) {
@@ -187,7 +191,8 @@ void __init prom_init_env(void)
 		case PRID_REV_LOONGSON2F:
 			cpu_clock_freq = 797000000;
 			break;
-		case PRID_REV_LOONGSON3A:
+		case PRID_REV_LOONGSON3A_R1:
+		case PRID_REV_LOONGSON3A_R2:
 			cpu_clock_freq = 900000000;
 			break;
 		case PRID_REV_LOONGSON3B_R1:
diff --git a/arch/mips/loongson64/loongson-3/smp.c b/arch/mips/loongson64/loongson-3/smp.c
index b913cd2..e59759a 100644
--- a/arch/mips/loongson64/loongson-3/smp.c
+++ b/arch/mips/loongson64/loongson-3/smp.c
@@ -439,7 +439,7 @@ static void loongson3_cpu_die(unsigned int cpu)
  * flush all L1 entries at first. Then, another core (usually Core 0) can
  * safely disable the clock of the target core. loongson3_play_dead() is
  * called via CKSEG1 (uncached and unmmaped) */
-static void loongson3a_play_dead(int *state_addr)
+static void loongson3a_r1_play_dead(int *state_addr)
 {
 	register int val;
 	register long cpuid, core, node, count;
@@ -501,6 +501,89 @@ static void loongson3a_play_dead(int *state_addr)
 		: "a1");
 }
 
+static void loongson3a_r2_play_dead(int *state_addr)
+{
+	register int val;
+	register long cpuid, core, node, count;
+	register void *addr, *base, *initfunc;
+
+	__asm__ __volatile__(
+		"   .set push                     \n"
+		"   .set noreorder                \n"
+		"   li %[addr], 0x80000000        \n" /* KSEG0 */
+		"1: cache 0, 0(%[addr])           \n" /* flush L1 ICache */
+		"   cache 0, 1(%[addr])           \n"
+		"   cache 0, 2(%[addr])           \n"
+		"   cache 0, 3(%[addr])           \n"
+		"   cache 1, 0(%[addr])           \n" /* flush L1 DCache */
+		"   cache 1, 1(%[addr])           \n"
+		"   cache 1, 2(%[addr])           \n"
+		"   cache 1, 3(%[addr])           \n"
+		"   addiu %[sets], %[sets], -1    \n"
+		"   bnez  %[sets], 1b             \n"
+		"   addiu %[addr], %[addr], 0x40  \n"
+		"   li %[addr], 0x80000000        \n" /* KSEG0 */
+		"2: cache 2, 0(%[addr])           \n" /* flush L1 VCache */
+		"   cache 2, 1(%[addr])           \n"
+		"   cache 2, 2(%[addr])           \n"
+		"   cache 2, 3(%[addr])           \n"
+		"   cache 2, 4(%[addr])           \n"
+		"   cache 2, 5(%[addr])           \n"
+		"   cache 2, 6(%[addr])           \n"
+		"   cache 2, 7(%[addr])           \n"
+		"   cache 2, 8(%[addr])           \n"
+		"   cache 2, 9(%[addr])           \n"
+		"   cache 2, 10(%[addr])          \n"
+		"   cache 2, 11(%[addr])          \n"
+		"   cache 2, 12(%[addr])          \n"
+		"   cache 2, 13(%[addr])          \n"
+		"   cache 2, 14(%[addr])          \n"
+		"   cache 2, 15(%[addr])          \n"
+		"   addiu %[vsets], %[vsets], -1  \n"
+		"   bnez  %[vsets], 2b            \n"
+		"   addiu %[addr], %[addr], 0x40  \n"
+		"   li    %[val], 0x7             \n" /* *state_addr = CPU_DEAD; */
+		"   sw    %[val], (%[state_addr]) \n"
+		"   sync                          \n"
+		"   cache 21, (%[state_addr])     \n" /* flush entry of *state_addr */
+		"   .set pop                      \n"
+		: [addr] "=&r" (addr), [val] "=&r" (val)
+		: [state_addr] "r" (state_addr),
+		  [sets] "r" (cpu_data[smp_processor_id()].dcache.sets),
+		  [vsets] "r" (cpu_data[smp_processor_id()].vcache.sets));
+
+	__asm__ __volatile__(
+		"   .set push                         \n"
+		"   .set noreorder                    \n"
+		"   .set mips64                       \n"
+		"   mfc0  %[cpuid], $15, 1            \n"
+		"   andi  %[cpuid], 0x3ff             \n"
+		"   dli   %[base], 0x900000003ff01000 \n"
+		"   andi  %[core], %[cpuid], 0x3      \n"
+		"   sll   %[core], 8                  \n" /* get core id */
+		"   or    %[base], %[base], %[core]   \n"
+		"   andi  %[node], %[cpuid], 0xc      \n"
+		"   dsll  %[node], 42                 \n" /* get node id */
+		"   or    %[base], %[base], %[node]   \n"
+		"1: li    %[count], 0x100             \n" /* wait for init loop */
+		"2: bnez  %[count], 2b                \n" /* limit mailbox access */
+		"   addiu %[count], -1                \n"
+		"   ld    %[initfunc], 0x20(%[base])  \n" /* get PC via mailbox */
+		"   beqz  %[initfunc], 1b             \n"
+		"   nop                               \n"
+		"   ld    $sp, 0x28(%[base])          \n" /* get SP via mailbox */
+		"   ld    $gp, 0x30(%[base])          \n" /* get GP via mailbox */
+		"   ld    $a1, 0x38(%[base])          \n"
+		"   jr    %[initfunc]                 \n" /* jump to initial PC */
+		"   nop                               \n"
+		"   .set pop                          \n"
+		: [core] "=&r" (core), [node] "=&r" (node),
+		  [base] "=&r" (base), [cpuid] "=&r" (cpuid),
+		  [count] "=&r" (count), [initfunc] "=&r" (initfunc)
+		: /* No Input */
+		: "a1");
+}
+
 static void loongson3b_play_dead(int *state_addr)
 {
 	register int val;
@@ -572,13 +655,18 @@ void play_dead(void)
 	void (*play_dead_at_ckseg1)(int *);
 
 	idle_task_exit();
-	switch (loongson_sysconf.cputype) {
-	case Loongson_3A:
+	switch (read_c0_prid() & PRID_REV_MASK) {
+	case PRID_REV_LOONGSON3A_R1:
 	default:
 		play_dead_at_ckseg1 =
-			(void *)CKSEG1ADDR((unsigned long)loongson3a_play_dead);
+			(void *)CKSEG1ADDR((unsigned long)loongson3a_r1_play_dead);
+		break;
+	case PRID_REV_LOONGSON3A_R2:
+		play_dead_at_ckseg1 =
+			(void *)CKSEG1ADDR((unsigned long)loongson3a_r2_play_dead);
 		break;
-	case Loongson_3B:
+	case PRID_REV_LOONGSON3B_R1:
+	case PRID_REV_LOONGSON3B_R2:
 		play_dead_at_ckseg1 =
 			(void *)CKSEG1ADDR((unsigned long)loongson3b_play_dead);
 		break;
@@ -593,9 +681,9 @@ void loongson3_disable_clock(int cpu)
 	uint64_t core_id = cpu_data[cpu].core;
 	uint64_t package_id = cpu_data[cpu].package;
 
-	if (loongson_sysconf.cputype == Loongson_3A) {
+	if ((read_c0_prid() & PRID_REV_MASK) == PRID_REV_LOONGSON3A_R1) {
 		LOONGSON_CHIPCFG(package_id) &= ~(1 << (12 + core_id));
-	} else if (loongson_sysconf.cputype == Loongson_3B) {
+	} else {
 		if (!(loongson_sysconf.workarounds & WORKAROUND_CPUHOTPLUG))
 			LOONGSON_FREQCTRL(package_id) &= ~(1 << (core_id * 4 + 3));
 	}
@@ -606,9 +694,9 @@ void loongson3_enable_clock(int cpu)
 	uint64_t core_id = cpu_data[cpu].core;
 	uint64_t package_id = cpu_data[cpu].package;
 
-	if (loongson_sysconf.cputype == Loongson_3A) {
+	if ((read_c0_prid() & PRID_REV_MASK) == PRID_REV_LOONGSON3A_R1) {
 		LOONGSON_CHIPCFG(package_id) |= 1 << (12 + core_id);
-	} else if (loongson_sysconf.cputype == Loongson_3B) {
+	} else {
 		if (!(loongson_sysconf.workarounds & WORKAROUND_CPUHOTPLUG))
 			LOONGSON_FREQCTRL(package_id) |= 1 << (core_id * 4 + 3);
 	}
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index caac3d7..2abc73d 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -77,6 +77,7 @@ static inline void r4k_on_each_cpu(void (*func) (void *info), void *info)
  */
 static unsigned long icache_size __read_mostly;
 static unsigned long dcache_size __read_mostly;
+static unsigned long vcache_size __read_mostly;
 static unsigned long scache_size __read_mostly;
 
 /*
@@ -1328,6 +1329,31 @@ static void probe_pcache(void)
 	       c->dcache.linesz);
 }
 
+static void probe_vcache(void)
+{
+	struct cpuinfo_mips *c = &current_cpu_data;
+	unsigned int config2, lsize;
+
+	if (current_cpu_type() != CPU_LOONGSON3)
+		return;
+
+	config2 = read_c0_config2();
+	if ((lsize = ((config2 >> 20) & 15)))
+		c->vcache.linesz = 2 << lsize;
+	else
+		c->vcache.linesz = lsize;
+
+	c->vcache.sets = 64 << ((config2 >> 24) & 15);
+	c->vcache.ways = 1 + ((config2 >> 16) & 15);
+
+	vcache_size = c->vcache.sets * c->vcache.ways * c->vcache.linesz;
+
+	c->vcache.waybit = 0;
+
+	pr_info("Unified victim cache %ldkB %s, linesize %d bytes.\n",
+		vcache_size >> 10, way_string[c->vcache.ways], c->vcache.linesz);
+}
+
 /*
  * If you even _breathe_ on this function, look at the gcc output and make sure
  * it does not pop things on and off the stack for the cache sizing loop that
@@ -1650,6 +1676,7 @@ void r4k_cache_init(void)
 	struct cpuinfo_mips *c = &current_cpu_data;
 
 	probe_pcache();
+	probe_vcache();
 	setup_scache();
 
 	r4k_blast_dcache_page_setup();
diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index 84c6e3f..b1a712f 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -241,7 +241,7 @@ static void output_pgtable_bits_defines(void)
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	pr_define("_PAGE_HUGE_SHIFT %d\n", _PAGE_HUGE_SHIFT);
 #endif
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
+#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) || defined(CONFIG_CPU_LOONGSON3)
 	if (cpu_has_rixi) {
 #ifdef _PAGE_NO_EXEC_SHIFT
 		pr_define("_PAGE_NO_EXEC_SHIFT %d\n", _PAGE_NO_EXEC_SHIFT);
diff --git a/drivers/platform/mips/cpu_hwmon.c b/drivers/platform/mips/cpu_hwmon.c
index 0f6c63e..8c5072b 100644
--- a/drivers/platform/mips/cpu_hwmon.c
+++ b/drivers/platform/mips/cpu_hwmon.c
@@ -20,9 +20,9 @@ int loongson3_cpu_temp(int cpu)
 	u32 reg;
 
 	reg = LOONGSON_CHIPTEMP(cpu);
-	if (loongson_sysconf.cputype == Loongson_3A)
+	if ((read_c0_prid() & PRID_REV_MASK) == PRID_REV_LOONGSON3A_R1)
 		reg = (reg >> 8) & 0xff;
-	else if (loongson_sysconf.cputype == Loongson_3B)
+	else
 		reg = ((reg >> 8) & 0xff) - 100;
 
 	return (int)reg * 1000;
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH V4 2/5] MIPS: Loongson-3: Set cache flush handlers to cache_noop
  2016-02-27 10:07 [PATCH V4 0/5] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
  2016-02-27 10:07 ` [PATCH V4 1/5] MIPS: Loongson: Add Loongson-3A R2 basic support Huacai Chen
@ 2016-02-27 10:07 ` Huacai Chen
  2016-02-27 10:07 ` [PATCH V4 3/5] MIPS: Loongson: Invalidate special TLBs when needed Huacai Chen
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Huacai Chen @ 2016-02-27 10:07 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J . Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

Loongson-3 maintains cache coherency by hardware, this means:
 1) It's icache is coherent with dcache.
 2) It's dcaches don't alias (maybe depend on PAGE_SIZE).
 3) It maintains cache coherency across cores (and for DMA).

So we can skip most cache flush operations by setting relevant handlers
to `cache_noop' in `r4k_cache_init'.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/mm/c-r4k.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index 2abc73d..8df4a94 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -1777,6 +1777,20 @@ void r4k_cache_init(void)
 		/* Optimization: an L2 flush implicitly flushes the L1 */
 		current_cpu_data.options |= MIPS_CPU_INCLUSIVE_CACHES;
 		break;
+	case CPU_LOONGSON3:
+		/* Loongson-3 maintains cache coherency by hardware */
+		__flush_cache_all	= cache_noop;
+		__flush_cache_vmap	= cache_noop;
+		__flush_cache_vunmap	= cache_noop;
+		__flush_kernel_vmap_range = (void *)cache_noop;
+		flush_cache_mm		= (void *)cache_noop;
+		flush_cache_page	= (void *)cache_noop;
+		flush_cache_range	= (void *)cache_noop;
+		flush_cache_sigtramp	= (void *)cache_noop;
+		flush_icache_all	= (void *)cache_noop;
+		flush_data_cache_page	= (void *)cache_noop;
+		local_flush_data_cache_page	= (void *)cache_noop;
+		break;
 	}
 }
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH V4 3/5] MIPS: Loongson: Invalidate special TLBs when needed
  2016-02-27 10:07 [PATCH V4 0/5] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
  2016-02-27 10:07 ` [PATCH V4 1/5] MIPS: Loongson: Add Loongson-3A R2 basic support Huacai Chen
  2016-02-27 10:07 ` [PATCH V4 2/5] MIPS: Loongson-3: Set cache flush handlers to cache_noop Huacai Chen
@ 2016-02-27 10:07 ` Huacai Chen
  2016-03-02 10:41   ` Ralf Baechle
  2016-02-27 10:07 ` [PATCH V4 4/5] MIPS: Loongson-3: Fast TLB refill handler Huacai Chen
  2016-02-27 10:07 ` [PATCH V4 5/5] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT Huacai Chen
  4 siblings, 1 reply; 8+ messages in thread
From: Huacai Chen @ 2016-02-27 10:07 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J . Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

Loongson-2 has a 4 entry itlb which is a subset of jtlb, Loongson-3 has
a 4 entry itlb and a 4 entry dtlb which are subsets of jtlb. We should
write diag register to invalidate itlb/dtlb when flushing jtlb because
itlb/dtlb are not totally transparent to software.

For Loongson-3A R2 (and newer), we should invalidate ITLB, DTLB, VTLB
and FTLB before we enable/disable FTLB.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/kernel/cpu-probe.c |  2 ++
 arch/mips/mm/tlb-r4k.c       | 27 +++++++++++++++------------
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 2a2ae86..ef605e2 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -562,6 +562,8 @@ static int set_ftlb_enable(struct cpuinfo_mips *c, int enable)
 					   << MIPS_CONF7_FTLBP_SHIFT));
 		break;
 	case CPU_LOONGSON3:
+		/* Flush ITLB, DTLB, VTLB and FTLB */
+		write_c0_diag(1<<2 | 1<<3 | 1<<12 | 1<<13);
 		/* Loongson-3 cores use Config6 to enable the FTLB */
 		config = read_c0_config6();
 		if (enable)
diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
index c17d762..7593529 100644
--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -28,25 +28,28 @@
 extern void build_tlb_refill_handler(void);
 
 /*
- * LOONGSON2/3 has a 4 entry itlb which is a subset of dtlb,
- * unfortunately, itlb is not totally transparent to software.
+ * LOONGSON-2 has a 4 entry itlb which is a subset of jtlb, LOONGSON-3 has
+ * a 4 entry itlb and a 4 entry dtlb which are subsets of jtlb. Unfortunately,
+ * itlb/dtlb are not totally transparent to software.
  */
-static inline void flush_itlb(void)
+static inline void flush_spec_tlb(void)
 {
 	switch (current_cpu_type()) {
 	case CPU_LOONGSON2:
+		write_c0_diag(0x4);
+		break;
 	case CPU_LOONGSON3:
-		write_c0_diag(4);
+		write_c0_diag(0xc);
 		break;
 	default:
 		break;
 	}
 }
 
-static inline void flush_itlb_vm(struct vm_area_struct *vma)
+static inline void flush_spec_tlb_vm(struct vm_area_struct *vma)
 {
 	if (vma->vm_flags & VM_EXEC)
-		flush_itlb();
+		flush_spec_tlb();
 }
 
 void local_flush_tlb_all(void)
@@ -93,7 +96,7 @@ void local_flush_tlb_all(void)
 	tlbw_use_hazard();
 	write_c0_entryhi(old_ctx);
 	htw_start();
-	flush_itlb();
+	flush_spec_tlb();
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL(local_flush_tlb_all);
@@ -159,7 +162,7 @@ void local_flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 		} else {
 			drop_mmu_context(mm, cpu);
 		}
-		flush_itlb();
+		flush_spec_tlb();
 		local_irq_restore(flags);
 	}
 }
@@ -205,7 +208,7 @@ void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	} else {
 		local_flush_tlb_all();
 	}
-	flush_itlb();
+	flush_spec_tlb();
 	local_irq_restore(flags);
 }
 
@@ -240,7 +243,7 @@ void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long page)
 	finish:
 		write_c0_entryhi(oldpid);
 		htw_start();
-		flush_itlb_vm(vma);
+		flush_spec_tlb_vm(vma);
 		local_irq_restore(flags);
 	}
 }
@@ -274,7 +277,7 @@ void local_flush_tlb_one(unsigned long page)
 	}
 	write_c0_entryhi(oldpid);
 	htw_start();
-	flush_itlb();
+	flush_spec_tlb();
 	local_irq_restore(flags);
 }
 
@@ -357,7 +360,7 @@ void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t pte)
 	}
 	tlbw_use_hazard();
 	htw_start();
-	flush_itlb_vm(vma);
+	flush_spec_tlb_vm(vma);
 	local_irq_restore(flags);
 }
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH V4 4/5] MIPS: Loongson-3: Fast TLB refill handler
  2016-02-27 10:07 [PATCH V4 0/5] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
                   ` (2 preceding siblings ...)
  2016-02-27 10:07 ` [PATCH V4 3/5] MIPS: Loongson: Invalidate special TLBs when needed Huacai Chen
@ 2016-02-27 10:07 ` Huacai Chen
  2016-02-27 10:07 ` [PATCH V4 5/5] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT Huacai Chen
  4 siblings, 0 replies; 8+ messages in thread
From: Huacai Chen @ 2016-02-27 10:07 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J . Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

Loongson-3A R2 has pwbase/pwfield/pwsize/pwctl registers in CP0 (this
is very similar to HTW) and lwdir/lwpte/lddir/ldpte instructions which
can be used for fast TLB refill.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/include/asm/cpu-features.h |   3 +
 arch/mips/include/asm/cpu.h          |   1 +
 arch/mips/include/asm/mipsregs.h     |   6 ++
 arch/mips/include/asm/uasm.h         |   3 +-
 arch/mips/include/uapi/asm/inst.h    |  10 +++
 arch/mips/kernel/cpu-probe.c         |   2 +-
 arch/mips/mm/tlbex.c                 | 124 ++++++++++++++++++++++++++++++++++-
 arch/mips/mm/uasm-mips.c             |   2 +
 arch/mips/mm/uasm.c                  |   3 +
 9 files changed, 149 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/cpu-features.h b/arch/mips/include/asm/cpu-features.h
index eeec8c8..e0ba50a 100644
--- a/arch/mips/include/asm/cpu-features.h
+++ b/arch/mips/include/asm/cpu-features.h
@@ -35,6 +35,9 @@
 #ifndef cpu_has_htw
 #define cpu_has_htw		(cpu_data[0].options & MIPS_CPU_HTW)
 #endif
+#ifndef cpu_has_ldpte
+#define cpu_has_ldpte		(cpu_data[0].options & MIPS_CPU_LDPTE)
+#endif
 #ifndef cpu_has_rixiex
 #define cpu_has_rixiex		(cpu_data[0].options & MIPS_CPU_RIXIEX)
 #endif
diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
index 68807b8..67bbff5 100644
--- a/arch/mips/include/asm/cpu.h
+++ b/arch/mips/include/asm/cpu.h
@@ -392,6 +392,7 @@ enum cpu_type_enum {
 #define MIPS_CPU_FTLB		0x20000000000ull /* CPU has Fixed-page-size TLB */
 #define MIPS_CPU_NAN_LEGACY	0x40000000000ull /* Legacy NaN implemented */
 #define MIPS_CPU_NAN_2008	0x80000000000ull /* 2008 NaN implemented */
+#define MIPS_CPU_LDPTE		0x100000000000ull /* CPU has ldpte/lddir instructions */
 
 /*
  * CPU ASE encodings
diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h
index 9290fd4..8affca2 100644
--- a/arch/mips/include/asm/mipsregs.h
+++ b/arch/mips/include/asm/mipsregs.h
@@ -1444,6 +1444,12 @@ do {									\
 #define read_c0_pwctl()		__read_32bit_c0_register($6, 6)
 #define write_c0_pwctl(val)	__write_32bit_c0_register($6, 6, val)
 
+#define read_c0_pgd()		__read_64bit_c0_register($9, 7)
+#define write_c0_pgd(val)	__write_64bit_c0_register($9, 7, val)
+
+#define read_c0_kpgd()		__read_64bit_c0_register($31, 7)
+#define write_c0_kpgd(val)	__write_64bit_c0_register($31, 7, val)
+
 /* Cavium OCTEON (cnMIPS) */
 #define read_c0_cvmcount()	__read_ulong_c0_register($9, 6)
 #define write_c0_cvmcount(val)	__write_ulong_c0_register($9, 6, val)
diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
index fc1cdd2..b6ecfee 100644
--- a/arch/mips/include/asm/uasm.h
+++ b/arch/mips/include/asm/uasm.h
@@ -171,7 +171,8 @@ Ip_u2u1(_wsbh);
 Ip_u3u1u2(_xor);
 Ip_u2u1u3(_xori);
 Ip_u2u1(_yield);
-
+Ip_u1u2(_ldpte);
+Ip_u2u1u3(_lddir);
 
 /* Handle labels. */
 struct uasm_label {
diff --git a/arch/mips/include/uapi/asm/inst.h b/arch/mips/include/uapi/asm/inst.h
index ddea53e..3bb8cd9 100644
--- a/arch/mips/include/uapi/asm/inst.h
+++ b/arch/mips/include/uapi/asm/inst.h
@@ -204,6 +204,16 @@ enum mad_func {
 };
 
 /*
+ * func field for page table walker (Loongson-3).
+ */
+enum ptw_func {
+	lwdir_op = 0x00,
+	lwpte_op = 0x01,
+	lddir_op = 0x02,
+	ldpte_op = 0x03,
+};
+
+/*
  * func field for special3 lx opcodes (Cavium Octeon).
  */
 enum lx_func {
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index ef605e2..aa2e615 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -1521,7 +1521,7 @@ static inline void cpu_probe_loongson(struct cpuinfo_mips *c, unsigned int cpu)
 		}
 
 		decode_configs(c);
-		c->options |= MIPS_CPU_TLBINV;
+		c->options |= MIPS_CPU_TLBINV | MIPS_CPU_LDPTE;
 		c->writecombine = _CACHE_UNCACHED_ACCELERATED;
 		break;
 	default:
diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index b1a712f..7bc8978 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -284,7 +284,12 @@ static inline void dump_handler(const char *symbol, const u32 *handler, int coun
 #define C0_ENTRYLO1	3, 0
 #define C0_CONTEXT	4, 0
 #define C0_PAGEMASK	5, 0
+#define C0_PWBASE	5, 5
+#define C0_PWFIELD	5, 6
+#define C0_PWSIZE	5, 7
+#define C0_PWCTL	6, 6
 #define C0_BADVADDR	8, 0
+#define C0_PGD		9, 7
 #define C0_ENTRYHI	10, 0
 #define C0_EPC		14, 0
 #define C0_XCONTEXT	20, 0
@@ -808,7 +813,10 @@ build_get_pmde64(u32 **p, struct uasm_label **l, struct uasm_reloc **r,
 
 	if (pgd_reg != -1) {
 		/* pgd is in pgd_reg */
-		UASM_i_MFC0(p, ptr, c0_kscratch(), pgd_reg);
+		if (cpu_has_ldpte)
+			UASM_i_MFC0(p, ptr, C0_PWBASE);
+		else
+			UASM_i_MFC0(p, ptr, c0_kscratch(), pgd_reg);
 	} else {
 #if defined(CONFIG_MIPS_PGD_C0_CONTEXT)
 		/*
@@ -1421,6 +1429,108 @@ static void build_r4000_tlb_refill_handler(void)
 	dump_handler("r4000_tlb_refill", (u32 *)ebase, 64);
 }
 
+static void setup_pw(void)
+{
+	unsigned long pgd_i, pgd_w;
+#ifndef __PAGETABLE_PMD_FOLDED
+	unsigned long pmd_i, pmd_w;
+#endif
+	unsigned long pt_i, pt_w;
+	unsigned long pte_i, pte_w;
+#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
+	unsigned long psn;
+
+	psn = ilog2(_PAGE_HUGE);     /* bit used to indicate huge page */
+#endif
+	pgd_i = PGDIR_SHIFT;  /* 1st level PGD */
+#ifndef __PAGETABLE_PMD_FOLDED
+	pgd_w = PGDIR_SHIFT - PMD_SHIFT + PGD_ORDER;
+
+	pmd_i = PMD_SHIFT;    /* 2nd level PMD */
+	pmd_w = PMD_SHIFT - PAGE_SHIFT;
+#else
+	pgd_w = PGDIR_SHIFT - PAGE_SHIFT + PGD_ORDER;
+#endif
+
+	pt_i  = PAGE_SHIFT;    /* 3rd level PTE */
+	pt_w  = PAGE_SHIFT - 3;
+
+	pte_i = ilog2(_PAGE_GLOBAL);
+	pte_w = 0;
+
+#ifndef __PAGETABLE_PMD_FOLDED
+	write_c0_pwfield(pgd_i << 24 | pmd_i << 12 | pt_i << 6 | pte_i);
+	write_c0_pwsize(1 << 30 | pgd_w << 24 | pmd_w << 12 | pt_w << 6 | pte_w);
+#else
+	write_c0_pwfield(pgd_i << 24 | pt_i << 6 | pte_i);
+	write_c0_pwsize(1 << 30 | pgd_w << 24 | pt_w << 6 | pte_w);
+#endif
+
+#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
+	write_c0_pwctl(1 << 6 | psn);
+#endif
+	write_c0_kpgd(swapper_pg_dir);
+	kscratch_used_mask |= (1 << 7); /* KScratch6 is used for KPGD */
+}
+
+static void build_loongson3_tlb_refill_handler(void)
+{
+	u32 *p = tlb_handler;
+	struct uasm_label *l = labels;
+	struct uasm_reloc *r = relocs;
+
+	memset(labels, 0, sizeof(labels));
+	memset(relocs, 0, sizeof(relocs));
+	memset(tlb_handler, 0, sizeof(tlb_handler));
+
+	if (check_for_high_segbits) {
+		uasm_i_dmfc0(&p, K0, C0_BADVADDR);
+		uasm_i_dsrl_safe(&p, K1, K0, PGDIR_SHIFT + PGD_ORDER + PAGE_SHIFT - 3);
+		uasm_il_beqz(&p, &r, K1, label_vmalloc);
+		uasm_i_nop(&p);
+
+		uasm_il_bgez(&p, &r, K0, label_large_segbits_fault);
+		uasm_i_nop(&p);
+		uasm_l_vmalloc(&l, p);
+	}
+
+	uasm_i_dmfc0(&p, K1, C0_PGD);
+
+	uasm_i_lddir(&p, K0, K1, 3);  /* global page dir */
+#ifndef __PAGETABLE_PMD_FOLDED
+	uasm_i_lddir(&p, K1, K0, 1);  /* middle page dir */
+#endif
+	uasm_i_ldpte(&p, K1, 0);      /* even */
+	uasm_i_ldpte(&p, K1, 1);      /* odd */
+	uasm_i_tlbwr(&p);
+
+	/* restore page mask */
+	if (PM_DEFAULT_MASK >> 16) {
+		uasm_i_lui(&p, K0, PM_DEFAULT_MASK >> 16);
+		uasm_i_ori(&p, K0, K0, PM_DEFAULT_MASK & 0xffff);
+		uasm_i_mtc0(&p, K0, C0_PAGEMASK);
+	} else if (PM_DEFAULT_MASK) {
+		uasm_i_ori(&p, K0, 0, PM_DEFAULT_MASK);
+		uasm_i_mtc0(&p, K0, C0_PAGEMASK);
+	} else {
+		uasm_i_mtc0(&p, 0, C0_PAGEMASK);
+	}
+
+	uasm_i_eret(&p);
+
+	if (check_for_high_segbits) {
+		uasm_l_large_segbits_fault(&l, p);
+		UASM_i_LA(&p, K1, (unsigned long)tlb_do_page_fault_0);
+		uasm_i_jr(&p, K1);
+		uasm_i_nop(&p);
+	}
+
+	uasm_resolve_relocs(relocs, labels);
+	memcpy((void *)(ebase + 0x80), tlb_handler, 0x80);
+	local_flush_icache_range(ebase + 0x80, ebase + 0x100);
+	dump_handler("loongson3_tlb_refill", (u32 *)(ebase + 0x80), 32);
+}
+
 extern u32 handle_tlbl[], handle_tlbl_end[];
 extern u32 handle_tlbs[], handle_tlbs_end[];
 extern u32 handle_tlbm[], handle_tlbm_end[];
@@ -1468,7 +1578,10 @@ static void build_setup_pgd(void)
 	} else {
 		/* PGD in c0_KScratch */
 		uasm_i_jr(&p, 31);
-		UASM_i_MTC0(&p, a0, c0_kscratch(), pgd_reg);
+		if (cpu_has_ldpte)
+			UASM_i_MTC0(&p, a0, C0_PWBASE);
+		else
+			UASM_i_MTC0(&p, a0, c0_kscratch(), pgd_reg);
 	}
 #else
 #ifdef CONFIG_SMP
@@ -2437,13 +2550,18 @@ void build_tlb_refill_handler(void)
 		break;
 
 	default:
+		if (cpu_has_ldpte)
+			setup_pw();
+
 		if (!run_once) {
 			scratch_reg = allocate_kscratch();
 			build_setup_pgd();
 			build_r4000_tlb_load_handler();
 			build_r4000_tlb_store_handler();
 			build_r4000_tlb_modify_handler();
-			if (!cpu_has_local_ebase)
+			if (cpu_has_ldpte)
+				build_loongson3_tlb_refill_handler();
+			else if (!cpu_has_local_ebase)
 				build_r4000_tlb_refill_handler();
 			flush_tlb_handlers();
 			run_once++;
diff --git a/arch/mips/mm/uasm-mips.c b/arch/mips/mm/uasm-mips.c
index b4a83789..9c2220a 100644
--- a/arch/mips/mm/uasm-mips.c
+++ b/arch/mips/mm/uasm-mips.c
@@ -153,6 +153,8 @@ static struct insn insn_table[] = {
 	{ insn_xori,  M(xori_op, 0, 0, 0, 0, 0),  RS | RT | UIMM },
 	{ insn_xor,  M(spec_op, 0, 0, 0, 0, xor_op),  RS | RT | RD },
 	{ insn_yield, M(spec3_op, 0, 0, 0, 0, yield_op), RS | RD },
+	{ insn_ldpte, M(lwc2_op, 0, 0, 0, ldpte_op, mult_op), RS | RD },
+	{ insn_lddir, M(lwc2_op, 0, 0, 0, lddir_op, mult_op), RS | RT | RD },
 	{ insn_invalid, 0, 0 }
 };
 
diff --git a/arch/mips/mm/uasm.c b/arch/mips/mm/uasm.c
index 319051c..ad718de 100644
--- a/arch/mips/mm/uasm.c
+++ b/arch/mips/mm/uasm.c
@@ -60,6 +60,7 @@ enum opcode {
 	insn_sltiu, insn_sltu, insn_sra, insn_srl, insn_srlv, insn_subu,
 	insn_sw, insn_sync, insn_syscall, insn_tlbp, insn_tlbr, insn_tlbwi,
 	insn_tlbwr, insn_wait, insn_wsbh, insn_xor, insn_xori, insn_yield,
+	insn_lddir, insn_ldpte,
 };
 
 struct insn {
@@ -335,6 +336,8 @@ I_u1u2s3(_bbit0);
 I_u1u2s3(_bbit1);
 I_u3u1u2(_lwx)
 I_u3u1u2(_ldx)
+I_u1u2(_ldpte)
+I_u2u1u3(_lddir)
 
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
 #include <asm/octeon/octeon.h>
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH V4 5/5] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT
  2016-02-27 10:07 [PATCH V4 0/5] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
                   ` (3 preceding siblings ...)
  2016-02-27 10:07 ` [PATCH V4 4/5] MIPS: Loongson-3: Fast TLB refill handler Huacai Chen
@ 2016-02-27 10:07 ` Huacai Chen
  4 siblings, 0 replies; 8+ messages in thread
From: Huacai Chen @ 2016-02-27 10:07 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J . Hill, linux-mips, Fuxin Zhang,
	Zhangjin Wu, Huacai Chen

New Loongson 3 CPU (since Loongson-3A R2, as opposed to Loongson-3A R1,
Loongson-3B R1 and Loongson-3B R2) has many enhancements, such as FTLB,
L1-VCache, EI/DI/Wait/Prefetch instruction, DSP/DSPv2 ASE, User Local
register, Read-Inhibit/Execute-Inhibit, SFB (Store Fill Buffer), Fast
TLB refill support, etc.

This patch introduce a config option, CONFIG_LOONGSON3_ENHANCEMENT, to
enable those enhancements which cannot be probed at run time. If you
want a generic kernel to run on all Loongson 3 machines, please say 'N'
here. If you want a high-performance kernel to run on new Loongson 3
machines only, please say 'Y' here.

Some additional explanations:
1) SFB locates between core and L1 cache, it causes memory access out
   of order, so writel/outl (and other similar functions) need a I/O
   reorder barrier.
2) Loongson 3 has a bug that di instruction can not save the irqflag,
   so arch_local_irq_save() is modified. Since CPU_MIPSR2 is selected
   by CONFIG_LOONGSON3_ENHANCEMENT, generic kernel doesn't use ei/di
   at all.
3) CPU_HAS_PREFETCH is selected by CONFIG_LOONGSON3_ENHANCEMENT, so
   MIPS_CPU_PREFETCH (used by uasm) probing is also put in this patch.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 arch/mips/Kconfig                                      | 18 ++++++++++++++++++
 arch/mips/include/asm/hazards.h                        |  7 ++++---
 arch/mips/include/asm/io.h                             | 10 +++++-----
 arch/mips/include/asm/irqflags.h                       |  5 +++++
 .../include/asm/mach-loongson64/kernel-entry-init.h    | 12 ++++++++++++
 arch/mips/mm/c-r4k.c                                   |  2 ++
 arch/mips/mm/page.c                                    |  9 +++++++++
 7 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 38c0b11..68ec57a 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1350,6 +1350,24 @@ config CPU_LOONGSON3
 		The Loongson 3 processor implements the MIPS64R2 instruction
 		set with many extensions.
 
+config LOONGSON3_ENHANCEMENT
+	bool "New Loongson 3 CPU Enhancements"
+	default n
+	select CPU_MIPSR2
+	select CPU_HAS_PREFETCH
+	depends on CPU_LOONGSON3
+	help
+	  New Loongson 3 CPU (since Loongson-3A R2, as opposed to Loongson-3A
+	  R1, Loongson-3B R1 and Loongson-3B R2) has many enhancements, such as
+	  FTLB, L1-VCache, EI/DI/Wait/Prefetch instruction, DSP/DSPv2 ASE, User
+	  Local register, Read-Inhibit/Execute-Inhibit, SFB (Store Fill Buffer),
+	  Fast TLB refill support, etc.
+
+	  This option enable those enhancements which cannot be probed at run
+	  time. If you want a generic kernel to run on all Loongson 3 machines,
+	  please say 'N' here. If you want a high-performance kernel to run on
+	  new Loongson 3 machines only, please say 'Y' here.
+
 config CPU_LOONGSON2E
 	bool "Loongson 2E"
 	depends on SYS_HAS_CPU_LOONGSON2E
diff --git a/arch/mips/include/asm/hazards.h b/arch/mips/include/asm/hazards.h
index 7b99efd..dbb1eb6 100644
--- a/arch/mips/include/asm/hazards.h
+++ b/arch/mips/include/asm/hazards.h
@@ -22,7 +22,8 @@
 /*
  * TLB hazards
  */
-#if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6) && !defined(CONFIG_CPU_CAVIUM_OCTEON)
+#if (defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)) && \
+	!defined(CONFIG_CPU_CAVIUM_OCTEON) && !defined(CONFIG_LOONGSON3_ENHANCEMENT)
 
 /*
  * MIPSR2 defines ehb for hazard avoidance
@@ -155,8 +156,8 @@ do {									\
 } while (0)
 
 #elif defined(CONFIG_MIPS_ALCHEMY) || defined(CONFIG_CPU_CAVIUM_OCTEON) || \
-	defined(CONFIG_CPU_LOONGSON2) || defined(CONFIG_CPU_R10000) || \
-	defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR)
+	defined(CONFIG_CPU_LOONGSON2) || defined(CONFIG_LOONGSON3_ENHANCEMENT) || \
+	defined(CONFIG_CPU_R10000) || defined(CONFIG_CPU_R5500) || defined(CONFIG_CPU_XLR)
 
 /*
  * R10000 rocks - all hazards handled in hardware, so this becomes a nobrainer.
diff --git a/arch/mips/include/asm/io.h b/arch/mips/include/asm/io.h
index 2b4dc7a..ecabc00 100644
--- a/arch/mips/include/asm/io.h
+++ b/arch/mips/include/asm/io.h
@@ -304,10 +304,10 @@ static inline void iounmap(const volatile void __iomem *addr)
 #undef __IS_KSEG1
 }
 
-#ifdef CONFIG_CPU_CAVIUM_OCTEON
-#define war_octeon_io_reorder_wmb()		wmb()
+#if defined(CONFIG_CPU_CAVIUM_OCTEON) || defined(CONFIG_LOONGSON3_ENHANCEMENT)
+#define war_io_reorder_wmb()		wmb()
 #else
-#define war_octeon_io_reorder_wmb()		do { } while (0)
+#define war_io_reorder_wmb()		do { } while (0)
 #endif
 
 #define __BUILD_MEMORY_SINGLE(pfx, bwlq, type, irq)			\
@@ -318,7 +318,7 @@ static inline void pfx##write##bwlq(type val,				\
 	volatile type *__mem;						\
 	type __val;							\
 									\
-	war_octeon_io_reorder_wmb();					\
+	war_io_reorder_wmb();					\
 									\
 	__mem = (void *)__swizzle_addr_##bwlq((unsigned long)(mem));	\
 									\
@@ -387,7 +387,7 @@ static inline void pfx##out##bwlq##p(type val, unsigned long port)	\
 	volatile type *__addr;						\
 	type __val;							\
 									\
-	war_octeon_io_reorder_wmb();					\
+	war_io_reorder_wmb();					\
 									\
 	__addr = (void *)__swizzle_addr_##bwlq(mips_io_port_base + port); \
 									\
diff --git a/arch/mips/include/asm/irqflags.h b/arch/mips/include/asm/irqflags.h
index 65c351e..9d3610b 100644
--- a/arch/mips/include/asm/irqflags.h
+++ b/arch/mips/include/asm/irqflags.h
@@ -41,7 +41,12 @@ static inline unsigned long arch_local_irq_save(void)
 	"	.set	push						\n"
 	"	.set	reorder						\n"
 	"	.set	noat						\n"
+#if defined(CONFIG_CPU_LOONGSON3)
+	"	mfc0	%[flags], $12					\n"
+	"	di							\n"
+#else
 	"	di	%[flags]					\n"
+#endif
 	"	andi	%[flags], 1					\n"
 	"	" __stringify(__irq_disable_hazard) "			\n"
 	"	.set	pop						\n"
diff --git a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
index da83482..8393bc54 100644
--- a/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
+++ b/arch/mips/include/asm/mach-loongson64/kernel-entry-init.h
@@ -26,6 +26,12 @@
 	mfc0	t0, $5, 1
 	or	t0, (0x1 << 29)
 	mtc0	t0, $5, 1
+#ifdef CONFIG_LOONGSON3_ENHANCEMENT
+	/* Enable STFill Buffer */
+	mfc0	t0, $16, 6
+	or	t0, 0x100
+	mtc0	t0, $16, 6
+#endif
 	_ehb
 	.set	pop
 #endif
@@ -46,6 +52,12 @@
 	mfc0	t0, $5, 1
 	or	t0, (0x1 << 29)
 	mtc0	t0, $5, 1
+#ifdef CONFIG_LOONGSON3_ENHANCEMENT
+	/* Enable STFill Buffer */
+	mfc0	t0, $16, 6
+	or	t0, 0x100
+	mtc0	t0, $16, 6
+#endif
 	_ehb
 	.set	pop
 #endif
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index 8df4a94..51cc4f0 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -1149,6 +1149,8 @@ static void probe_pcache(void)
 					  c->dcache.ways *
 					  c->dcache.linesz;
 		c->dcache.waybit = 0;
+		if ((prid & PRID_REV_MASK) >= PRID_REV_LOONGSON3A_R2)
+			c->options |= MIPS_CPU_PREFETCH;
 		break;
 
 	case CPU_CAVIUM_OCTEON3:
diff --git a/arch/mips/mm/page.c b/arch/mips/mm/page.c
index 885d73f..c41953c 100644
--- a/arch/mips/mm/page.c
+++ b/arch/mips/mm/page.c
@@ -188,6 +188,15 @@ static void set_prefetch_parameters(void)
 			}
 			break;
 
+		case CPU_LOONGSON3:
+			/* Loongson-3 only support the Pref_Load/Pref_Store. */
+			pref_bias_clear_store = 128;
+			pref_bias_copy_load = 128;
+			pref_bias_copy_store = 128;
+			pref_src_mode = Pref_Load;
+			pref_dst_mode = Pref_Store;
+			break;
+
 		default:
 			pref_bias_clear_store = 128;
 			pref_bias_copy_load = 256;
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH V4 3/5] MIPS: Loongson: Invalidate special TLBs when needed
  2016-02-27 10:07 ` [PATCH V4 3/5] MIPS: Loongson: Invalidate special TLBs when needed Huacai Chen
@ 2016-03-02 10:41   ` Ralf Baechle
  2016-03-02 10:51     ` Huacai Chen
  0 siblings, 1 reply; 8+ messages in thread
From: Ralf Baechle @ 2016-03-02 10:41 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Aurelien Jarno, Steven J . Hill, linux-mips, Fuxin Zhang, Zhangjin Wu

On Sat, Feb 27, 2016 at 06:07:36PM +0800, Huacai Chen wrote:

> Loongson-2 has a 4 entry itlb which is a subset of jtlb, Loongson-3 has
> a 4 entry itlb and a 4 entry dtlb which are subsets of jtlb. We should
> write diag register to invalidate itlb/dtlb when flushing jtlb because
> itlb/dtlb are not totally transparent to software.
> 
> For Loongson-3A R2 (and newer), we should invalidate ITLB, DTLB, VTLB
> and FTLB before we enable/disable FTLB.
> 
> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> ---
>  arch/mips/kernel/cpu-probe.c |  2 ++
>  arch/mips/mm/tlb-r4k.c       | 27 +++++++++++++++------------
>  2 files changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
> index 2a2ae86..ef605e2 100644
> --- a/arch/mips/kernel/cpu-probe.c
> +++ b/arch/mips/kernel/cpu-probe.c
> @@ -562,6 +562,8 @@ static int set_ftlb_enable(struct cpuinfo_mips *c, int enable)
>  					   << MIPS_CONF7_FTLBP_SHIFT));
>  		break;
>  	case CPU_LOONGSON3:
> +		/* Flush ITLB, DTLB, VTLB and FTLB */
> +		write_c0_diag(1<<2 | 1<<3 | 1<<12 | 1<<13);

Too many magic numbers.  Could you use defines for the magic numbers you're
writing to these registers?

>  		/* Loongson-3 cores use Config6 to enable the FTLB */
>  		config = read_c0_config6();
>  		if (enable)
> diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
> index c17d762..7593529 100644
> --- a/arch/mips/mm/tlb-r4k.c
> +++ b/arch/mips/mm/tlb-r4k.c
> @@ -28,25 +28,28 @@
>  extern void build_tlb_refill_handler(void);
>  
>  /*
> - * LOONGSON2/3 has a 4 entry itlb which is a subset of dtlb,
> - * unfortunately, itlb is not totally transparent to software.
> + * LOONGSON-2 has a 4 entry itlb which is a subset of jtlb, LOONGSON-3 has
> + * a 4 entry itlb and a 4 entry dtlb which are subsets of jtlb. Unfortunately,
> + * itlb/dtlb are not totally transparent to software.
>   */
> -static inline void flush_itlb(void)
> +static inline void flush_spec_tlb(void)
>  {
>  	switch (current_cpu_type()) {
>  	case CPU_LOONGSON2:
> +		write_c0_diag(0x4);

Same here.

> +		break;
>  	case CPU_LOONGSON3:
> -		write_c0_diag(4);
> +		write_c0_diag(0xc);

And here.

Also, why did the magic number change from 4 to 0xc?

>  		break;
>  	default:
>  		break;
>  	}
>  }
>  
> -static inline void flush_itlb_vm(struct vm_area_struct *vma)
> +static inline void flush_spec_tlb_vm(struct vm_area_struct *vma)
>  {
>  	if (vma->vm_flags & VM_EXEC)
> -		flush_itlb();
> +		flush_spec_tlb();

Hm..  "spec tlb" is not very descriptive in my opinion.  How about
renameing this function to flush_micro_tlb().

  Ralf

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V4 3/5] MIPS: Loongson: Invalidate special TLBs when needed
  2016-03-02 10:41   ` Ralf Baechle
@ 2016-03-02 10:51     ` Huacai Chen
  0 siblings, 0 replies; 8+ messages in thread
From: Huacai Chen @ 2016-03-02 10:51 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Aurelien Jarno, Steven J . Hill, Linux MIPS Mailing List,
	Fuxin Zhang, Zhangjin Wu

On Wed, Mar 2, 2016 at 6:41 PM, Ralf Baechle <ralf@linux-mips.org> wrote:
> On Sat, Feb 27, 2016 at 06:07:36PM +0800, Huacai Chen wrote:
>
>> Loongson-2 has a 4 entry itlb which is a subset of jtlb, Loongson-3 has
>> a 4 entry itlb and a 4 entry dtlb which are subsets of jtlb. We should
>> write diag register to invalidate itlb/dtlb when flushing jtlb because
>> itlb/dtlb are not totally transparent to software.
>>
>> For Loongson-3A R2 (and newer), we should invalidate ITLB, DTLB, VTLB
>> and FTLB before we enable/disable FTLB.
>>
>> Signed-off-by: Huacai Chen <chenhc@lemote.com>
>> ---
>>  arch/mips/kernel/cpu-probe.c |  2 ++
>>  arch/mips/mm/tlb-r4k.c       | 27 +++++++++++++++------------
>>  2 files changed, 17 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
>> index 2a2ae86..ef605e2 100644
>> --- a/arch/mips/kernel/cpu-probe.c
>> +++ b/arch/mips/kernel/cpu-probe.c
>> @@ -562,6 +562,8 @@ static int set_ftlb_enable(struct cpuinfo_mips *c, int enable)
>>                                          << MIPS_CONF7_FTLBP_SHIFT));
>>               break;
>>       case CPU_LOONGSON3:
>> +             /* Flush ITLB, DTLB, VTLB and FTLB */
>> +             write_c0_diag(1<<2 | 1<<3 | 1<<12 | 1<<13);
>
> Too many magic numbers.  Could you use defines for the magic numbers you're
> writing to these registers?
OK, I know, but if I define macros, put them in tlb.h or tlbflush.h or
loongson.h in mach-loongson64?

>
>>               /* Loongson-3 cores use Config6 to enable the FTLB */
>>               config = read_c0_config6();
>>               if (enable)
>> diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
>> index c17d762..7593529 100644
>> --- a/arch/mips/mm/tlb-r4k.c
>> +++ b/arch/mips/mm/tlb-r4k.c
>> @@ -28,25 +28,28 @@
>>  extern void build_tlb_refill_handler(void);
>>
>>  /*
>> - * LOONGSON2/3 has a 4 entry itlb which is a subset of dtlb,
>> - * unfortunately, itlb is not totally transparent to software.
>> + * LOONGSON-2 has a 4 entry itlb which is a subset of jtlb, LOONGSON-3 has
>> + * a 4 entry itlb and a 4 entry dtlb which are subsets of jtlb. Unfortunately,
>> + * itlb/dtlb are not totally transparent to software.
>>   */
>> -static inline void flush_itlb(void)
>> +static inline void flush_spec_tlb(void)
>>  {
>>       switch (current_cpu_type()) {
>>       case CPU_LOONGSON2:
>> +             write_c0_diag(0x4);
>
> Same here.
>
>> +             break;
>>       case CPU_LOONGSON3:
>> -             write_c0_diag(4);
>> +             write_c0_diag(0xc);
>
> And here.
>
> Also, why did the magic number change from 4 to 0xc?
0x4 means flush itlb only, 0xc means flush itlb and dtlb.

>
>>               break;
>>       default:
>>               break;
>>       }
>>  }
>>
>> -static inline void flush_itlb_vm(struct vm_area_struct *vma)
>> +static inline void flush_spec_tlb_vm(struct vm_area_struct *vma)
>>  {
>>       if (vma->vm_flags & VM_EXEC)
>> -             flush_itlb();
>> +             flush_spec_tlb();
>
> Hm..  "spec tlb" is not very descriptive in my opinion.  How about
> renameing this function to flush_micro_tlb().
OK, I will change the name.

>
>   Ralf
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-03-02 10:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-27 10:07 [PATCH V4 0/5] MIPS: Loongson: Add Loongson-3A R2 support Huacai Chen
2016-02-27 10:07 ` [PATCH V4 1/5] MIPS: Loongson: Add Loongson-3A R2 basic support Huacai Chen
2016-02-27 10:07 ` [PATCH V4 2/5] MIPS: Loongson-3: Set cache flush handlers to cache_noop Huacai Chen
2016-02-27 10:07 ` [PATCH V4 3/5] MIPS: Loongson: Invalidate special TLBs when needed Huacai Chen
2016-03-02 10:41   ` Ralf Baechle
2016-03-02 10:51     ` Huacai Chen
2016-02-27 10:07 ` [PATCH V4 4/5] MIPS: Loongson-3: Fast TLB refill handler Huacai Chen
2016-02-27 10:07 ` [PATCH V4 5/5] MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT Huacai Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.