* [PATCH 0/6] MIPS: TLB exception handler fixes & optimisation
@ 2017-06-02 22:38 ` Paul Burton
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Burton @ 2017-06-02 22:38 UTC (permalink / raw)
  To: linux-mips; +Cc: Paul Burton

This series fixes a race condition in TLB exceptions for I6400 & I6500
CPUs, where TLB RAMs are shared between threads/VPs within a core, and
implements a few optimisations & cleanups for TLB exception handling.

Applies atop v4.12-rc3.

Paul Burton (6):
  MIPS: Add CPU shared FTLB feature detection
  MIPS: Handle tlbex-tlbp race condition
  MIPS: Allow storing pgd in C0_CONTEXT for MIPSr6
  MIPS: Use current_cpu_type() in m4kc_tlbp_war()
  MIPS: tlbex: Use ErrorEPC as scratch when KScratch isn't available
  MIPS: tlbex: Remove struct work_registers

 arch/mips/Kconfig                    |   2 +-
 arch/mips/include/asm/cpu-features.h |  41 ++++++
 arch/mips/include/asm/cpu.h          |   4 +
 arch/mips/kernel/cpu-probe.c         |  11 ++
 arch/mips/mm/tlbex.c                 | 234 +++++++++++++++++------------------
 5 files changed, 169 insertions(+), 123 deletions(-)

-- 
2.13.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] MIPS: Add CPU shared FTLB feature detection
@ 2017-06-02 22:38   ` Paul Burton
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Burton @ 2017-06-02 22:38 UTC (permalink / raw)
  To: linux-mips; +Cc: Paul Burton, Ralf Baechle

Some systems share FTLB RAMs or entries between sibling CPUs (i.e.
hardware threads, or VP(E)s, within a core). These properties require
kernel handling in various places. As a start, this patch introduces
cpu_has_shared_ftlb_ram & cpu_has_shared_ftlb_entries feature macros
which we set appropriately for the I6400 & I6500 CPUs. Further patches
will make use of these macros as appropriate.
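
As an illustration of the intended usage (a simplified sketch rather
than part of this patch; see patch 2 of this series for the real
consumer), a later change can simply gate its handling on the new
macros:

  static bool cpu_has_tlbex_tlbp_race(void)
  {
  	/*
  	 * A sibling writing to the shared FTLB RAM may evict our entry
  	 * at any time, so a TLBP in a TLB exception handler can miss.
  	 */
  	if (cpu_has_shared_ftlb_ram)
  		return true;

  	return false;
  }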

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org

---
This depends upon my "MIPS: Probe the I6500 CPU" patch being applied
first in order to make use of CPU_I6500.

 arch/mips/include/asm/cpu-features.h | 41 ++++++++++++++++++++++++++++++++++++
 arch/mips/include/asm/cpu.h          |  4 ++++
 arch/mips/kernel/cpu-probe.c         | 11 ++++++++++
 3 files changed, 56 insertions(+)

diff --git a/arch/mips/include/asm/cpu-features.h b/arch/mips/include/asm/cpu-features.h
index 494d38274142..d6ea8e7c5107 100644
--- a/arch/mips/include/asm/cpu-features.h
+++ b/arch/mips/include/asm/cpu-features.h
@@ -487,6 +487,47 @@
 # define cpu_has_perf		(cpu_data[0].options & MIPS_CPU_PERF)
 #endif
 
+#if defined(CONFIG_SMP) && defined(__mips_isa_rev) && (__mips_isa_rev >= 6)
+/*
+ * Some systems share FTLB RAMs between threads within a core (siblings in
+ * kernel parlance). This means that FTLB entries may become invalid at almost
+ * any point when an entry is evicted due to a sibling thread writing an entry
+ * to the shared FTLB RAM.
+ *
+ * This is only relevant to SMP systems, and the only systems that exhibit this
+ * property implement MIPSr6 or higher so we constrain support for this to
+ * kernels that will run on such systems.
+ */
+# ifndef cpu_has_shared_ftlb_ram
+#  define cpu_has_shared_ftlb_ram \
+	(current_cpu_data.options & MIPS_CPU_SHARED_FTLB_RAM)
+# endif
+
+/*
+ * Some systems take this a step further & share FTLB entries between siblings.
+ * This is implemented as TLB writes happening as usual, but if an entry
+ * written by a sibling exists in the shared FTLB for a translation which would
+ * otherwise cause a TLB refill exception then the CPU will use the entry
+ * written by its sibling rather than triggering a refill & writing a matching
+ * TLB entry for itself.
+ *
+ * This is naturally only valid if a TLB entry is known to be suitable for use
+ * on all siblings in a CPU, and so it only takes effect when MMIDs are in use
+ * rather than ASIDs or when a TLB entry is marked global.
+ */
+# ifndef cpu_has_shared_ftlb_entries
+#  define cpu_has_shared_ftlb_entries \
+	(current_cpu_data.options & MIPS_CPU_SHARED_FTLB_ENTRIES)
+# endif
+#endif /* SMP && __mips_isa_rev >= 6 */
+
+#ifndef cpu_has_shared_ftlb_ram
+# define cpu_has_shared_ftlb_ram 0
+#endif
+#ifndef cpu_has_shared_ftlb_entries
+# define cpu_has_shared_ftlb_entries 0
+#endif
+
 /*
  * Guest capabilities
  */
diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
index 3069359b0120..9bc820c4e1ed 100644
--- a/arch/mips/include/asm/cpu.h
+++ b/arch/mips/include/asm/cpu.h
@@ -417,6 +417,10 @@ enum cpu_type_enum {
 #define MIPS_CPU_GUESTID	MBIT_ULL(51)	/* CPU uses VZ ASE GuestID feature */
 #define MIPS_CPU_DRG		MBIT_ULL(52)	/* CPU has VZ Direct Root to Guest (DRG) */
 #define MIPS_CPU_UFR		MBIT_ULL(53)	/* CPU supports User mode FR switching */
+#define MIPS_CPU_SHARED_FTLB_RAM \
+				MBIT_ULL(54)	/* CPU shares FTLB RAM with another */
+#define MIPS_CPU_SHARED_FTLB_ENTRIES \
+				MBIT_ULL(55)	/* CPU shares FTLB entries with another */
 
 /*
  * CPU ASE encodings
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 353ade2c130a..8135002116df 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -1653,6 +1653,17 @@ static inline void cpu_probe_mips(struct cpuinfo_mips *c, unsigned int cpu)
 	decode_configs(c);
 
 	spram_config();
+
+	switch (__get_cpu_type(c->cputype)) {
+	case CPU_I6500:
+		c->options |= MIPS_CPU_SHARED_FTLB_ENTRIES;
+		/* fall-through */
+	case CPU_I6400:
+		c->options |= MIPS_CPU_SHARED_FTLB_RAM;
+		/* fall-through */
+	default:
+		break;
+	}
 }
 
 static inline void cpu_probe_alchemy(struct cpuinfo_mips *c, unsigned int cpu)
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/6] MIPS: Handle tlbex-tlbp race condition
@ 2017-06-02 22:38   ` Paul Burton
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Burton @ 2017-06-02 22:38 UTC (permalink / raw)
  To: linux-mips; +Cc: Paul Burton, Ralf Baechle

In systems where there are multiple actors updating the TLB, the
potential exists for a race condition wherein a CPU hits a TLB exception
but by the time it reaches a TLBP instruction the affected TLB entry may
have been replaced. This can happen if, for example, a CPU shares the
TLB between hardware threads (VPs) within a core and one of them
replaces the entry that another has just taken a TLB exception for.

We already handle this race for the case where the Hardware Table Walker
(HTW) is the other actor, but didn't take into account the potential for
multiple threads racing. Include the code for aborting TLB exception
handling in affected multi-threaded systems, those being the I6400 &
I6500 CPUs which share TLB entries between VPs.

In the case of using RiXi without dedicated exceptions we have never
handled this race, even for HTW. This patch adds WARN()s to those paths;
they ought never to be hit, because all CPUs with either HTW or shared
FTLB RAMs also implement dedicated RiXi exceptions, but the WARN()s will
make it obvious if that ever ceases to be the case.
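
For reference, the abort sequence this adds to the handler head is built
with uasm & boils down to the following (taken from the diff below, with
comments added here to indicate the roughly corresponding generated MIPS
instructions; wr.r3 is the handler's spare work register):

  	if (cpu_has_tlbex_tlbp_race()) {
  		/* race condition happens, leaving */
  		uasm_i_ehb(p);                          /* ehb: wait for the earlier tlbp */
  		uasm_i_mfc0(p, wr.r3, C0_INDEX);        /* mfc0 <tmp>, CP0_Index */
  		uasm_il_bltz(p, r, wr.r3, label_leave); /* bltz <tmp>, leave */
  		uasm_i_nop(p);                          /* branch delay slot */
  	}

A negative Index value means the TLBP found no matching entry, i.e. the
entry we took the exception for has already been replaced, so the handler
bails out to its exit path & the faulting access is simply retried.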

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/mm/tlbex.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index ed1c5297547a..e6499209b81c 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -2015,6 +2015,26 @@ static void build_r3000_tlb_modify_handler(void)
 }
 #endif /* CONFIG_MIPS_PGD_C0_CONTEXT */
 
+static bool cpu_has_tlbex_tlbp_race(void)
+{
+	/*
+	 * When a Hardware Table Walker is running it can replace TLB entries
+	 * at any time, leading to a race between it & the CPU.
+	 */
+	if (cpu_has_htw)
+		return true;
+
+	/*
+	 * If the CPU shares FTLB RAM with its siblings then our entry may be
+	 * replaced at any time by a sibling performing a write to the FTLB.
+	 */
+	if (cpu_has_shared_ftlb_ram)
+		return true;
+
+	/* In all other cases there ought to be no race condition to handle */
+	return false;
+}
+
 /*
  * R4000 style TLB load/store/modify handlers.
  */
@@ -2051,7 +2071,7 @@ build_r4000_tlbchange_handler_head(u32 **p, struct uasm_label **l,
 	iPTE_LW(p, wr.r1, wr.r2); /* get even pte */
 	if (!m4kc_tlbp_war()) {
 		build_tlb_probe_entry(p);
-		if (cpu_has_htw) {
+		if (cpu_has_tlbex_tlbp_race()) {
 			/* race condition happens, leaving */
 			uasm_i_ehb(p);
 			uasm_i_mfc0(p, wr.r3, C0_INDEX);
@@ -2125,6 +2145,14 @@ static void build_r4000_tlb_load_handler(void)
 		}
 		uasm_i_nop(&p);
 
+		/*
+		 * Warn if something may race with us & replace the TLB entry
+		 * before we read it here. Everything with such races should
+		 * also have dedicated RiXi exception handlers, so this
+		 * shouldn't be hit.
+		 */
+		WARN(cpu_has_tlbex_tlbp_race(), "Unhandled race in RiXi path");
+
 		uasm_i_tlbr(&p);
 
 		switch (current_cpu_type()) {
@@ -2192,6 +2220,14 @@ static void build_r4000_tlb_load_handler(void)
 		}
 		uasm_i_nop(&p);
 
+		/*
+		 * Warn if something may race with us & replace the TLB entry
+		 * before we read it here. Everything with such races should
+		 * also have dedicated RiXi exception handlers, so this
+		 * shouldn't be hit.
+		 */
+		WARN(cpu_has_tlbex_tlbp_race(), "Unhandled race in RiXi path");
+
 		uasm_i_tlbr(&p);
 
 		switch (current_cpu_type()) {
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6] MIPS: Allow storing pgd in C0_CONTEXT for MIPSr6
@ 2017-06-02 22:38   ` Paul Burton
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Burton @ 2017-06-02 22:38 UTC (permalink / raw)
  To: linux-mips; +Cc: Paul Burton, Ralf Baechle

CONFIG_MIPS_PGD_C0_CONTEXT, which allows a pointer to the page directory
to be stored in the cop0 Context register when enabled, was previously
only allowed for MIPSr2. MIPSr6 is just as able to make use of it, so
allow it there too.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2828ecde133d..bcfd4c30ea2a 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2061,7 +2061,7 @@ config CPU_SUPPORTS_UNCACHED_ACCELERATED
 	bool
 config MIPS_PGD_C0_CONTEXT
 	bool
-	default y if 64BIT && CPU_MIPSR2 && !CPU_XLP
+	default y if 64BIT && (CPU_MIPSR2 || CPU_MIPSR6) && !CPU_XLP
 
 #
 # Set to y for ptrace access to watch registers.
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/6] MIPS: Use current_cpu_type() in m4kc_tlbp_war()
@ 2017-06-02 22:38   ` Paul Burton
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Burton @ 2017-06-02 22:38 UTC (permalink / raw)
  To: linux-mips; +Cc: Paul Burton, Ralf Baechle

Use current_cpu_type() to check for 4Kc processors instead of checking
the PRID directly. This allows the 4Kc case to be optimised out of
kernels that can't run on 4Kc processors, thanks to __get_cpu_type()
and its unreachable() call.
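
The mechanism relied upon here looks roughly like the following heavily
simplified sketch (illustrative only; the real switch & its config
guards live in asm/cpu-type.h, and the guard shown is just an example):

  static inline int __pure __get_cpu_type(const int cpu_type)
  {
  	switch (cpu_type) {
  #ifdef CONFIG_SYS_HAS_CPU_MIPS32_R1	/* example guard, for illustration */
  	case CPU_4KC:
  #endif
  	/* ... cases for every CPU type the kernel is built to support ... */
  		return cpu_type;
  	}

  	unreachable();
  }

When 4Kc support isn't configured the CPU_4KC case disappears, the
compiler knows via unreachable() that __get_cpu_type() can never return
CPU_4KC, and the current_cpu_type() == CPU_4KC comparison in
m4kc_tlbp_war() folds to false, allowing the workaround code to be
dropped entirely.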

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/mm/tlbex.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index e6499209b81c..5aadc69c8ce3 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -153,8 +153,7 @@ static int scratchpad_offset(int i)
  */
 static int m4kc_tlbp_war(void)
 {
-	return (current_cpu_data.processor_id & 0xffff00) ==
-	       (PRID_COMP_MIPS | PRID_IMP_4KC);
+	return current_cpu_type() == CPU_4KC;
 }
 
 /* Handle labels (which must be positive integers). */
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/6] MIPS: tlbex: Use ErrorEPC as scratch when KScratch isn't available
@ 2017-06-02 22:38   ` Paul Burton
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Burton @ 2017-06-02 22:38 UTC (permalink / raw)
  To: linux-mips; +Cc: Paul Burton, Ralf Baechle

The TLB exception handlers currently attempt to use a KScratch register
if one is available, and otherwise fall back to calculating a
CPU-specific pointer to a memory area & saving 2 register values into
it. This has a few downsides:

  - Performing stores, and later loads, is relatively slow.

  - Keeping the pointer to the save area around means k1 is unavailable
    for use in the body of the exception handler, resulting in the need
    to save & restore $2 as well as $1.

  - The need to use different sets of work registers adds a layer of
    abstraction (struct work_registers) to the code that we would
    otherwise not need.

This patch changes the fallback such that when KScratch registers aren't
implemented we use the coprocessor 0 ErrorEPC register as scratch
instead. The only downside is that we need to ensure that
TLB exceptions don't occur whilst handling error exceptions, or at least
before the handlers for such exceptions have read the ErrorEPC register.
As the kernel always runs unmapped, or using a wired TLB entry for
certain SGI ip27 configurations, this constraint is currently always
satisfied. In the future should the kernel become mapped we will need to
cover exception handling code with a wired entry anyway such that TLB
exception handlers don't themselves trigger TLB exceptions, so the
constraint should be satisfied there too.

If we were ever to handle cache exceptions in a way that allowed us to
continue running (in contrast to our current approach of die()ing) then
it would be possible for a cache exception to be processed during the
handling of a TLB exception which we then return to. If done naively
this would clobber the value we stashed in ErrorEPC, but problems could
be avoided if the cache exception handler took into account that it
interrupted a TLB exception handler & returned to the code at EPC, or
to the start of the TLB exception handler, instead of the address in
ErrorEPC. That would cause the TLB exception handler to re-run & avoid
it seeing a clobbered ErrorEPC value.
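
For clarity, with this change the two helpers reduce to roughly the
following (reconstructed from the diff below):

  static struct work_registers build_get_work_registers(u32 **p)
  {
  	struct work_registers r;

  	/* Save $1 in CPU local C0_KScratch, or in C0_ErrorEPC as a fallback */
  	if (scratch_reg >= 0)
  		UASM_i_MTC0(p, 1, c0_kscratch(), scratch_reg);
  	else
  		UASM_i_MTC0(p, 1, C0_ERROREPC);

  	r.r1 = K0;
  	r.r2 = K1;
  	r.r3 = 1;

  	return r;
  }

  static void build_restore_work_registers(u32 **p)
  {
  	if (scratch_reg >= 0)
  		UASM_i_MFC0(p, 1, c0_kscratch(), scratch_reg);
  	else
  		UASM_i_MFC0(p, 1, C0_ERROREPC);
  }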

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
---

 arch/mips/mm/tlbex.c | 50 +++++++++++---------------------------------------
 1 file changed, 11 insertions(+), 39 deletions(-)

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index 5aadc69c8ce3..22e0281e81cc 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -63,13 +63,6 @@ struct work_registers {
 	int r3;
 };
 
-struct tlb_reg_save {
-	unsigned long a;
-	unsigned long b;
-} ____cacheline_aligned_in_smp;
-
-static struct tlb_reg_save handler_reg_save[NR_CPUS];
-
 static inline int r45k_bvahwbug(void)
 {
 	/* XXX: We should probe for the presence of this bug, but we don't. */
@@ -290,6 +283,7 @@ static inline void dump_handler(const char *symbol, const u32 *handler, int coun
 #define C0_ENTRYHI	10, 0
 #define C0_EPC		14, 0
 #define C0_XCONTEXT	20, 0
+#define C0_ERROREPC	30, 0
 
 #ifdef CONFIG_64BIT
 # define GET_CONTEXT(buf, reg) UASM_i_MFC0(buf, reg, C0_XCONTEXT)
@@ -353,47 +347,25 @@ static struct work_registers build_get_work_registers(u32 **p)
 {
 	struct work_registers r;
 
-	if (scratch_reg >= 0) {
-		/* Save in CPU local C0_KScratch? */
+	/* Save in CPU local C0_KScratch? */
+	if (scratch_reg >= 0)
 		UASM_i_MTC0(p, 1, c0_kscratch(), scratch_reg);
-		r.r1 = K0;
-		r.r2 = K1;
-		r.r3 = 1;
-		return r;
-	}
-
-	if (num_possible_cpus() > 1) {
-		/* Get smp_processor_id */
-		UASM_i_CPUID_MFC0(p, K0, SMP_CPUID_REG);
-		UASM_i_SRL_SAFE(p, K0, K0, SMP_CPUID_REGSHIFT);
+	else
+		UASM_i_MTC0(p, 1, C0_ERROREPC);
 
-		/* handler_reg_save index in K0 */
-		UASM_i_SLL(p, K0, K0, ilog2(sizeof(struct tlb_reg_save)));
+	r.r1 = K0;
+	r.r2 = K1;
+	r.r3 = 1;
 
-		UASM_i_LA(p, K1, (long)&handler_reg_save);
-		UASM_i_ADDU(p, K0, K0, K1);
-	} else {
-		UASM_i_LA(p, K0, (long)&handler_reg_save);
-	}
-	/* K0 now points to save area, save $1 and $2  */
-	UASM_i_SW(p, 1, offsetof(struct tlb_reg_save, a), K0);
-	UASM_i_SW(p, 2, offsetof(struct tlb_reg_save, b), K0);
-
-	r.r1 = K1;
-	r.r2 = 1;
-	r.r3 = 2;
 	return r;
 }
 
 static void build_restore_work_registers(u32 **p)
 {
-	if (scratch_reg >= 0) {
+	if (scratch_reg >= 0)
 		UASM_i_MFC0(p, 1, c0_kscratch(), scratch_reg);
-		return;
-	}
-	/* K0 already points to save area, restore $1 and $2  */
-	UASM_i_LW(p, 1, offsetof(struct tlb_reg_save, a), K0);
-	UASM_i_LW(p, 2, offsetof(struct tlb_reg_save, b), K0);
+	else
+		UASM_i_MFC0(p, 1, C0_ERROREPC);
 }
 
 #ifndef CONFIG_MIPS_PGD_C0_CONTEXT
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] MIPS: tlbex: Remove struct work_registers
@ 2017-06-02 22:38   ` Paul Burton
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Burton @ 2017-06-02 22:38 UTC (permalink / raw)
  To: linux-mips; +Cc: Paul Burton, Ralf Baechle

The registers used for TLB exceptions are now always K0, K1 & AT,
making struct work_registers redundant. Remove it so that register use
is easier to read & reason about throughout the TLB exception handler
generation code.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org

---

 arch/mips/mm/tlbex.c | 153 +++++++++++++++++++++++----------------------------
 1 file changed, 68 insertions(+), 85 deletions(-)

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index 22e0281e81cc..776482a31a51 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -57,12 +57,6 @@ __setup("noxpa", xpa_disable);
 extern void tlb_do_page_fault_0(void);
 extern void tlb_do_page_fault_1(void);
 
-struct work_registers {
-	int r1;
-	int r2;
-	int r3;
-};
-
 static inline int r45k_bvahwbug(void)
 {
 	/* XXX: We should probe for the presence of this bug, but we don't. */
@@ -264,6 +258,7 @@ static inline void dump_handler(const char *symbol, const u32 *handler, int coun
 }
 
 /* The only general purpose registers allowed in TLB handlers. */
+#define AT		1
 #define K0		26
 #define K1		27
 
@@ -343,29 +338,21 @@ int pgd_reg;
 EXPORT_SYMBOL_GPL(pgd_reg);
 enum vmalloc64_mode {not_refill, refill_scratch, refill_noscratch};
 
-static struct work_registers build_get_work_registers(u32 **p)
+static void build_get_work_registers(u32 **p)
 {
-	struct work_registers r;
-
 	/* Save in CPU local C0_KScratch? */
 	if (scratch_reg >= 0)
-		UASM_i_MTC0(p, 1, c0_kscratch(), scratch_reg);
+		UASM_i_MTC0(p, AT, c0_kscratch(), scratch_reg);
 	else
-		UASM_i_MTC0(p, 1, C0_ERROREPC);
-
-	r.r1 = K0;
-	r.r2 = K1;
-	r.r3 = 1;
-
-	return r;
+		UASM_i_MTC0(p, AT, C0_ERROREPC);
 }
 
 static void build_restore_work_registers(u32 **p)
 {
 	if (scratch_reg >= 0)
-		UASM_i_MFC0(p, 1, c0_kscratch(), scratch_reg);
+		UASM_i_MFC0(p, AT, c0_kscratch(), scratch_reg);
 	else
-		UASM_i_MFC0(p, 1, C0_ERROREPC);
+		UASM_i_MFC0(p, AT, C0_ERROREPC);
 }
 
 #ifndef CONFIG_MIPS_PGD_C0_CONTEXT
@@ -2009,16 +1996,16 @@ static bool cpu_has_tlbex_tlbp_race(void)
 /*
  * R4000 style TLB load/store/modify handlers.
  */
-static struct work_registers
+static void
 build_r4000_tlbchange_handler_head(u32 **p, struct uasm_label **l,
 				   struct uasm_reloc **r)
 {
-	struct work_registers wr = build_get_work_registers(p);
+	build_get_work_registers(p);
 
 #ifdef CONFIG_64BIT
-	build_get_pmde64(p, l, r, wr.r1, wr.r2); /* get pmd in ptr */
+	build_get_pmde64(p, l, r, K0, K1); /* get pmd in ptr */
 #else
-	build_get_pgde32(p, wr.r1, wr.r2); /* get pgd in ptr */
+	build_get_pgde32(p, K0, K1); /* get pgd in ptr */
 #endif
 
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
@@ -2027,30 +2014,29 @@ build_r4000_tlbchange_handler_head(u32 **p, struct uasm_label **l,
 	 * instead contains the tlb pte. Check the PAGE_HUGE bit and
 	 * see if we need to jump to huge tlb processing.
 	 */
-	build_is_huge_pte(p, r, wr.r1, wr.r2, label_tlb_huge_update);
+	build_is_huge_pte(p, r, K0, K1, label_tlb_huge_update);
 #endif
 
-	UASM_i_MFC0(p, wr.r1, C0_BADVADDR);
-	UASM_i_LW(p, wr.r2, 0, wr.r2);
-	UASM_i_SRL(p, wr.r1, wr.r1, PAGE_SHIFT + PTE_ORDER - PTE_T_LOG2);
-	uasm_i_andi(p, wr.r1, wr.r1, (PTRS_PER_PTE - 1) << PTE_T_LOG2);
-	UASM_i_ADDU(p, wr.r2, wr.r2, wr.r1);
+	UASM_i_MFC0(p, K0, C0_BADVADDR);
+	UASM_i_LW(p, K1, 0, K1);
+	UASM_i_SRL(p, K0, K0, PAGE_SHIFT + PTE_ORDER - PTE_T_LOG2);
+	uasm_i_andi(p, K0, K0, (PTRS_PER_PTE - 1) << PTE_T_LOG2);
+	UASM_i_ADDU(p, K1, K1, K0);
 
 #ifdef CONFIG_SMP
 	uasm_l_smp_pgtable_change(l, *p);
 #endif
-	iPTE_LW(p, wr.r1, wr.r2); /* get even pte */
+	iPTE_LW(p, K0, K1); /* get even pte */
 	if (!m4kc_tlbp_war()) {
 		build_tlb_probe_entry(p);
 		if (cpu_has_tlbex_tlbp_race()) {
 			/* race condition happens, leaving */
 			uasm_i_ehb(p);
-			uasm_i_mfc0(p, wr.r3, C0_INDEX);
-			uasm_il_bltz(p, r, wr.r3, label_leave);
+			uasm_i_mfc0(p, AT, C0_INDEX);
+			uasm_il_bltz(p, r, AT, label_leave);
 			uasm_i_nop(p);
 		}
 	}
-	return wr;
 }
 
 static void
@@ -2077,7 +2063,6 @@ static void build_r4000_tlb_load_handler(void)
 	const int handle_tlbl_size = handle_tlbl_end - handle_tlbl;
 	struct uasm_label *l = labels;
 	struct uasm_reloc *r = relocs;
-	struct work_registers wr;
 
 	memset(handle_tlbl, 0, handle_tlbl_size * sizeof(handle_tlbl[0]));
 	memset(labels, 0, sizeof(labels));
@@ -2097,8 +2082,8 @@ static void build_r4000_tlb_load_handler(void)
 		/* No need for uasm_i_nop */
 	}
 
-	wr = build_r4000_tlbchange_handler_head(&p, &l, &r);
-	build_pte_present(&p, &r, wr.r1, wr.r2, wr.r3, label_nopage_tlbl);
+	build_r4000_tlbchange_handler_head(&p, &l, &r);
+	build_pte_present(&p, &r, K0, K1, AT, label_nopage_tlbl);
 	if (m4kc_tlbp_war())
 		build_tlb_probe_entry(&p);
 
@@ -2108,11 +2093,11 @@ static void build_r4000_tlb_load_handler(void)
 		 * have triggered it.  Skip the expensive test..
 		 */
 		if (use_bbit_insns()) {
-			uasm_il_bbit0(&p, &r, wr.r1, ilog2(_PAGE_VALID),
+			uasm_il_bbit0(&p, &r, K0, ilog2(_PAGE_VALID),
 				      label_tlbl_goaround1);
 		} else {
-			uasm_i_andi(&p, wr.r3, wr.r1, _PAGE_VALID);
-			uasm_il_beqz(&p, &r, wr.r3, label_tlbl_goaround1);
+			uasm_i_andi(&p, AT, K0, _PAGE_VALID);
+			uasm_il_beqz(&p, &r, AT, label_tlbl_goaround1);
 		}
 		uasm_i_nop(&p);
 
@@ -2140,32 +2125,32 @@ static void build_r4000_tlb_load_handler(void)
 
 		/* Examine  entrylo 0 or 1 based on ptr. */
 		if (use_bbit_insns()) {
-			uasm_i_bbit0(&p, wr.r2, ilog2(sizeof(pte_t)), 8);
+			uasm_i_bbit0(&p, K1, ilog2(sizeof(pte_t)), 8);
 		} else {
-			uasm_i_andi(&p, wr.r3, wr.r2, sizeof(pte_t));
-			uasm_i_beqz(&p, wr.r3, 8);
+			uasm_i_andi(&p, AT, K1, sizeof(pte_t));
+			uasm_i_beqz(&p, AT, 8);
 		}
 		/* load it in the delay slot*/
-		UASM_i_MFC0(&p, wr.r3, C0_ENTRYLO0);
+		UASM_i_MFC0(&p, AT, C0_ENTRYLO0);
 		/* load it if ptr is odd */
-		UASM_i_MFC0(&p, wr.r3, C0_ENTRYLO1);
+		UASM_i_MFC0(&p, AT, C0_ENTRYLO1);
 		/*
-		 * If the entryLo (now in wr.r3) is valid (bit 1), RI or
+		 * If the entryLo (now in AT) is valid (bit 1), RI or
 		 * XI must have triggered it.
 		 */
 		if (use_bbit_insns()) {
-			uasm_il_bbit1(&p, &r, wr.r3, 1, label_nopage_tlbl);
+			uasm_il_bbit1(&p, &r, AT, 1, label_nopage_tlbl);
 			uasm_i_nop(&p);
 			uasm_l_tlbl_goaround1(&l, p);
 		} else {
-			uasm_i_andi(&p, wr.r3, wr.r3, 2);
-			uasm_il_bnez(&p, &r, wr.r3, label_nopage_tlbl);
+			uasm_i_andi(&p, AT, AT, 2);
+			uasm_il_bnez(&p, &r, AT, label_nopage_tlbl);
 			uasm_i_nop(&p);
 		}
 		uasm_l_tlbl_goaround1(&l, p);
 	}
-	build_make_valid(&p, &r, wr.r1, wr.r2, wr.r3);
-	build_r4000_tlbchange_handler_tail(&p, &l, &r, wr.r1, wr.r2);
+	build_make_valid(&p, &r, K0, K1, AT);
+	build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1);
 
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	/*
@@ -2173,8 +2158,8 @@ static void build_r4000_tlb_load_handler(void)
 	 * spots a huge page.
 	 */
 	uasm_l_tlb_huge_update(&l, p);
-	iPTE_LW(&p, wr.r1, wr.r2);
-	build_pte_present(&p, &r, wr.r1, wr.r2, wr.r3, label_nopage_tlbl);
+	iPTE_LW(&p, K0, K1);
+	build_pte_present(&p, &r, K0, K1, AT, label_nopage_tlbl);
 	build_tlb_probe_entry(&p);
 
 	if (cpu_has_rixi && !cpu_has_rixiex) {
@@ -2183,11 +2168,11 @@ static void build_r4000_tlb_load_handler(void)
 		 * have triggered it.  Skip the expensive test..
 		 */
 		if (use_bbit_insns()) {
-			uasm_il_bbit0(&p, &r, wr.r1, ilog2(_PAGE_VALID),
+			uasm_il_bbit0(&p, &r, K0, ilog2(_PAGE_VALID),
 				      label_tlbl_goaround2);
 		} else {
-			uasm_i_andi(&p, wr.r3, wr.r1, _PAGE_VALID);
-			uasm_il_beqz(&p, &r, wr.r3, label_tlbl_goaround2);
+			uasm_i_andi(&p, AT, K0, _PAGE_VALID);
+			uasm_il_beqz(&p, &r, AT, label_tlbl_goaround2);
 		}
 		uasm_i_nop(&p);
 
@@ -2215,24 +2200,24 @@ static void build_r4000_tlb_load_handler(void)
 
 		/* Examine  entrylo 0 or 1 based on ptr. */
 		if (use_bbit_insns()) {
-			uasm_i_bbit0(&p, wr.r2, ilog2(sizeof(pte_t)), 8);
+			uasm_i_bbit0(&p, K1, ilog2(sizeof(pte_t)), 8);
 		} else {
-			uasm_i_andi(&p, wr.r3, wr.r2, sizeof(pte_t));
-			uasm_i_beqz(&p, wr.r3, 8);
+			uasm_i_andi(&p, AT, K1, sizeof(pte_t));
+			uasm_i_beqz(&p, AT, 8);
 		}
 		/* load it in the delay slot*/
-		UASM_i_MFC0(&p, wr.r3, C0_ENTRYLO0);
+		UASM_i_MFC0(&p, AT, C0_ENTRYLO0);
 		/* load it if ptr is odd */
-		UASM_i_MFC0(&p, wr.r3, C0_ENTRYLO1);
+		UASM_i_MFC0(&p, AT, C0_ENTRYLO1);
 		/*
-		 * If the entryLo (now in wr.r3) is valid (bit 1), RI or
+		 * If the entryLo (now in AT) is valid (bit 1), RI or
 		 * XI must have triggered it.
 		 */
 		if (use_bbit_insns()) {
-			uasm_il_bbit0(&p, &r, wr.r3, 1, label_tlbl_goaround2);
+			uasm_il_bbit0(&p, &r, AT, 1, label_tlbl_goaround2);
 		} else {
-			uasm_i_andi(&p, wr.r3, wr.r3, 2);
-			uasm_il_beqz(&p, &r, wr.r3, label_tlbl_goaround2);
+			uasm_i_andi(&p, AT, AT, 2);
+			uasm_il_beqz(&p, &r, AT, label_tlbl_goaround2);
 		}
 		if (PM_DEFAULT_MASK == 0)
 			uasm_i_nop(&p);
@@ -2240,12 +2225,12 @@ static void build_r4000_tlb_load_handler(void)
 		 * We clobbered C0_PAGEMASK, restore it.  On the other branch
 		 * it is restored in build_huge_tlb_write_entry.
 		 */
-		build_restore_pagemask(&p, &r, wr.r3, label_nopage_tlbl, 0);
+		build_restore_pagemask(&p, &r, AT, label_nopage_tlbl, 0);
 
 		uasm_l_tlbl_goaround2(&l, p);
 	}
-	uasm_i_ori(&p, wr.r1, wr.r1, (_PAGE_ACCESSED | _PAGE_VALID));
-	build_huge_handler_tail(&p, &r, &l, wr.r1, wr.r2, 1);
+	uasm_i_ori(&p, K0, K0, (_PAGE_ACCESSED | _PAGE_VALID));
+	build_huge_handler_tail(&p, &r, &l, K0, K1, 1);
 #endif
 
 	uasm_l_nopage_tlbl(&l, p);
@@ -2276,18 +2261,17 @@ static void build_r4000_tlb_store_handler(void)
 	const int handle_tlbs_size = handle_tlbs_end - handle_tlbs;
 	struct uasm_label *l = labels;
 	struct uasm_reloc *r = relocs;
-	struct work_registers wr;
 
 	memset(handle_tlbs, 0, handle_tlbs_size * sizeof(handle_tlbs[0]));
 	memset(labels, 0, sizeof(labels));
 	memset(relocs, 0, sizeof(relocs));
 
-	wr = build_r4000_tlbchange_handler_head(&p, &l, &r);
-	build_pte_writable(&p, &r, wr.r1, wr.r2, wr.r3, label_nopage_tlbs);
+	build_r4000_tlbchange_handler_head(&p, &l, &r);
+	build_pte_writable(&p, &r, K0, K1, AT, label_nopage_tlbs);
 	if (m4kc_tlbp_war())
 		build_tlb_probe_entry(&p);
-	build_make_write(&p, &r, wr.r1, wr.r2, wr.r3);
-	build_r4000_tlbchange_handler_tail(&p, &l, &r, wr.r1, wr.r2);
+	build_make_write(&p, &r, K0, K1, AT);
+	build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1);
 
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	/*
@@ -2295,12 +2279,12 @@ static void build_r4000_tlb_store_handler(void)
 	 * build_r4000_tlbchange_handler_head spots a huge page.
 	 */
 	uasm_l_tlb_huge_update(&l, p);
-	iPTE_LW(&p, wr.r1, wr.r2);
-	build_pte_writable(&p, &r, wr.r1, wr.r2, wr.r3, label_nopage_tlbs);
+	iPTE_LW(&p, K0, K1);
+	build_pte_writable(&p, &r, K0, K1, AT, label_nopage_tlbs);
 	build_tlb_probe_entry(&p);
-	uasm_i_ori(&p, wr.r1, wr.r1,
+	uasm_i_ori(&p, K0, K0,
 		   _PAGE_ACCESSED | _PAGE_MODIFIED | _PAGE_VALID | _PAGE_DIRTY);
-	build_huge_handler_tail(&p, &r, &l, wr.r1, wr.r2, 1);
+	build_huge_handler_tail(&p, &r, &l, K0, K1, 1);
 #endif
 
 	uasm_l_nopage_tlbs(&l, p);
@@ -2331,19 +2315,18 @@ static void build_r4000_tlb_modify_handler(void)
 	const int handle_tlbm_size = handle_tlbm_end - handle_tlbm;
 	struct uasm_label *l = labels;
 	struct uasm_reloc *r = relocs;
-	struct work_registers wr;
 
 	memset(handle_tlbm, 0, handle_tlbm_size * sizeof(handle_tlbm[0]));
 	memset(labels, 0, sizeof(labels));
 	memset(relocs, 0, sizeof(relocs));
 
-	wr = build_r4000_tlbchange_handler_head(&p, &l, &r);
-	build_pte_modifiable(&p, &r, wr.r1, wr.r2, wr.r3, label_nopage_tlbm);
+	build_r4000_tlbchange_handler_head(&p, &l, &r);
+	build_pte_modifiable(&p, &r, K0, K1, AT, label_nopage_tlbm);
 	if (m4kc_tlbp_war())
 		build_tlb_probe_entry(&p);
 	/* Present and writable bits set, set accessed and dirty bits. */
-	build_make_write(&p, &r, wr.r1, wr.r2, wr.r3);
-	build_r4000_tlbchange_handler_tail(&p, &l, &r, wr.r1, wr.r2);
+	build_make_write(&p, &r, K0, K1, AT);
+	build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1);
 
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	/*
@@ -2351,12 +2334,12 @@ static void build_r4000_tlb_modify_handler(void)
 	 * build_r4000_tlbchange_handler_head spots a huge page.
 	 */
 	uasm_l_tlb_huge_update(&l, p);
-	iPTE_LW(&p, wr.r1, wr.r2);
-	build_pte_modifiable(&p, &r, wr.r1, wr.r2,  wr.r3, label_nopage_tlbm);
+	iPTE_LW(&p, K0, K1);
+	build_pte_modifiable(&p, &r, K0, K1,  AT, label_nopage_tlbm);
 	build_tlb_probe_entry(&p);
-	uasm_i_ori(&p, wr.r1, wr.r1,
+	uasm_i_ori(&p, K0, K0,
 		   _PAGE_ACCESSED | _PAGE_MODIFIED | _PAGE_VALID | _PAGE_DIRTY);
-	build_huge_handler_tail(&p, &r, &l, wr.r1, wr.r2, 0);
+	build_huge_handler_tail(&p, &r, &l, K0, K1, 0);
 #endif
 
 	uasm_l_nopage_tlbm(&l, p);
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] MIPS: tlbex: Remove struct work_registers
@ 2017-06-02 22:38   ` Paul Burton
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Burton @ 2017-06-02 22:38 UTC (permalink / raw)
  To: linux-mips; +Cc: Paul Burton, Ralf Baechle

The registers used for TLB exceptions are now always K0, K1 & AT making
struct work_registers redundant. Remove it such that register use is
easier to read & reason about throughout the TLB exception handler
generation code.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org

---

 arch/mips/mm/tlbex.c | 153 +++++++++++++++++++++++----------------------------
 1 file changed, 68 insertions(+), 85 deletions(-)

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index 22e0281e81cc..776482a31a51 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -57,12 +57,6 @@ __setup("noxpa", xpa_disable);
 extern void tlb_do_page_fault_0(void);
 extern void tlb_do_page_fault_1(void);
 
-struct work_registers {
-	int r1;
-	int r2;
-	int r3;
-};
-
 static inline int r45k_bvahwbug(void)
 {
 	/* XXX: We should probe for the presence of this bug, but we don't. */
@@ -264,6 +258,7 @@ static inline void dump_handler(const char *symbol, const u32 *handler, int coun
 }
 
 /* The only general purpose registers allowed in TLB handlers. */
+#define AT		1
 #define K0		26
 #define K1		27
 
@@ -343,29 +338,21 @@ int pgd_reg;
 EXPORT_SYMBOL_GPL(pgd_reg);
 enum vmalloc64_mode {not_refill, refill_scratch, refill_noscratch};
 
-static struct work_registers build_get_work_registers(u32 **p)
+static void build_get_work_registers(u32 **p)
 {
-	struct work_registers r;
-
 	/* Save in CPU local C0_KScratch? */
 	if (scratch_reg >= 0)
-		UASM_i_MTC0(p, 1, c0_kscratch(), scratch_reg);
+		UASM_i_MTC0(p, AT, c0_kscratch(), scratch_reg);
 	else
-		UASM_i_MTC0(p, 1, C0_ERROREPC);
-
-	r.r1 = K0;
-	r.r2 = K1;
-	r.r3 = 1;
-
-	return r;
+		UASM_i_MTC0(p, AT, C0_ERROREPC);
 }
 
 static void build_restore_work_registers(u32 **p)
 {
 	if (scratch_reg >= 0)
-		UASM_i_MFC0(p, 1, c0_kscratch(), scratch_reg);
+		UASM_i_MFC0(p, AT, c0_kscratch(), scratch_reg);
 	else
-		UASM_i_MFC0(p, 1, C0_ERROREPC);
+		UASM_i_MFC0(p, AT, C0_ERROREPC);
 }
 
 #ifndef CONFIG_MIPS_PGD_C0_CONTEXT
@@ -2009,16 +1996,16 @@ static bool cpu_has_tlbex_tlbp_race(void)
 /*
  * R4000 style TLB load/store/modify handlers.
  */
-static struct work_registers
+static void
 build_r4000_tlbchange_handler_head(u32 **p, struct uasm_label **l,
 				   struct uasm_reloc **r)
 {
-	struct work_registers wr = build_get_work_registers(p);
+	build_get_work_registers(p);
 
 #ifdef CONFIG_64BIT
-	build_get_pmde64(p, l, r, wr.r1, wr.r2); /* get pmd in ptr */
+	build_get_pmde64(p, l, r, K0, K1); /* get pmd in ptr */
 #else
-	build_get_pgde32(p, wr.r1, wr.r2); /* get pgd in ptr */
+	build_get_pgde32(p, K0, K1); /* get pgd in ptr */
 #endif
 
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
@@ -2027,30 +2014,29 @@ build_r4000_tlbchange_handler_head(u32 **p, struct uasm_label **l,
 	 * instead contains the tlb pte. Check the PAGE_HUGE bit and
 	 * see if we need to jump to huge tlb processing.
 	 */
-	build_is_huge_pte(p, r, wr.r1, wr.r2, label_tlb_huge_update);
+	build_is_huge_pte(p, r, K0, K1, label_tlb_huge_update);
 #endif
 
-	UASM_i_MFC0(p, wr.r1, C0_BADVADDR);
-	UASM_i_LW(p, wr.r2, 0, wr.r2);
-	UASM_i_SRL(p, wr.r1, wr.r1, PAGE_SHIFT + PTE_ORDER - PTE_T_LOG2);
-	uasm_i_andi(p, wr.r1, wr.r1, (PTRS_PER_PTE - 1) << PTE_T_LOG2);
-	UASM_i_ADDU(p, wr.r2, wr.r2, wr.r1);
+	UASM_i_MFC0(p, K0, C0_BADVADDR);
+	UASM_i_LW(p, K1, 0, K1);
+	UASM_i_SRL(p, K0, K0, PAGE_SHIFT + PTE_ORDER - PTE_T_LOG2);
+	uasm_i_andi(p, K0, K0, (PTRS_PER_PTE - 1) << PTE_T_LOG2);
+	UASM_i_ADDU(p, K1, K1, K0);
 
 #ifdef CONFIG_SMP
 	uasm_l_smp_pgtable_change(l, *p);
 #endif
-	iPTE_LW(p, wr.r1, wr.r2); /* get even pte */
+	iPTE_LW(p, K0, K1); /* get even pte */
 	if (!m4kc_tlbp_war()) {
 		build_tlb_probe_entry(p);
 		if (cpu_has_tlbex_tlbp_race()) {
 			/* race condition happens, leaving */
 			uasm_i_ehb(p);
-			uasm_i_mfc0(p, wr.r3, C0_INDEX);
-			uasm_il_bltz(p, r, wr.r3, label_leave);
+			uasm_i_mfc0(p, AT, C0_INDEX);
+			uasm_il_bltz(p, r, AT, label_leave);
 			uasm_i_nop(p);
 		}
 	}
-	return wr;
 }
 
 static void
@@ -2077,7 +2063,6 @@ static void build_r4000_tlb_load_handler(void)
 	const int handle_tlbl_size = handle_tlbl_end - handle_tlbl;
 	struct uasm_label *l = labels;
 	struct uasm_reloc *r = relocs;
-	struct work_registers wr;
 
 	memset(handle_tlbl, 0, handle_tlbl_size * sizeof(handle_tlbl[0]));
 	memset(labels, 0, sizeof(labels));
@@ -2097,8 +2082,8 @@ static void build_r4000_tlb_load_handler(void)
 		/* No need for uasm_i_nop */
 	}
 
-	wr = build_r4000_tlbchange_handler_head(&p, &l, &r);
-	build_pte_present(&p, &r, wr.r1, wr.r2, wr.r3, label_nopage_tlbl);
+	build_r4000_tlbchange_handler_head(&p, &l, &r);
+	build_pte_present(&p, &r, K0, K1, AT, label_nopage_tlbl);
 	if (m4kc_tlbp_war())
 		build_tlb_probe_entry(&p);
 
@@ -2108,11 +2093,11 @@ static void build_r4000_tlb_load_handler(void)
 		 * have triggered it.  Skip the expensive test..
 		 */
 		if (use_bbit_insns()) {
-			uasm_il_bbit0(&p, &r, wr.r1, ilog2(_PAGE_VALID),
+			uasm_il_bbit0(&p, &r, K0, ilog2(_PAGE_VALID),
 				      label_tlbl_goaround1);
 		} else {
-			uasm_i_andi(&p, wr.r3, wr.r1, _PAGE_VALID);
-			uasm_il_beqz(&p, &r, wr.r3, label_tlbl_goaround1);
+			uasm_i_andi(&p, AT, K0, _PAGE_VALID);
+			uasm_il_beqz(&p, &r, AT, label_tlbl_goaround1);
 		}
 		uasm_i_nop(&p);
 
@@ -2140,32 +2125,32 @@ static void build_r4000_tlb_load_handler(void)
 
 		/* Examine  entrylo 0 or 1 based on ptr. */
 		if (use_bbit_insns()) {
-			uasm_i_bbit0(&p, wr.r2, ilog2(sizeof(pte_t)), 8);
+			uasm_i_bbit0(&p, K1, ilog2(sizeof(pte_t)), 8);
 		} else {
-			uasm_i_andi(&p, wr.r3, wr.r2, sizeof(pte_t));
-			uasm_i_beqz(&p, wr.r3, 8);
+			uasm_i_andi(&p, AT, K1, sizeof(pte_t));
+			uasm_i_beqz(&p, AT, 8);
 		}
 		/* load it in the delay slot*/
-		UASM_i_MFC0(&p, wr.r3, C0_ENTRYLO0);
+		UASM_i_MFC0(&p, AT, C0_ENTRYLO0);
 		/* load it if ptr is odd */
-		UASM_i_MFC0(&p, wr.r3, C0_ENTRYLO1);
+		UASM_i_MFC0(&p, AT, C0_ENTRYLO1);
 		/*
-		 * If the entryLo (now in wr.r3) is valid (bit 1), RI or
+		 * If the entryLo (now in AT) is valid (bit 1), RI or
 		 * XI must have triggered it.
 		 */
 		if (use_bbit_insns()) {
-			uasm_il_bbit1(&p, &r, wr.r3, 1, label_nopage_tlbl);
+			uasm_il_bbit1(&p, &r, AT, 1, label_nopage_tlbl);
 			uasm_i_nop(&p);
 			uasm_l_tlbl_goaround1(&l, p);
 		} else {
-			uasm_i_andi(&p, wr.r3, wr.r3, 2);
-			uasm_il_bnez(&p, &r, wr.r3, label_nopage_tlbl);
+			uasm_i_andi(&p, AT, AT, 2);
+			uasm_il_bnez(&p, &r, AT, label_nopage_tlbl);
 			uasm_i_nop(&p);
 		}
 		uasm_l_tlbl_goaround1(&l, p);
 	}
-	build_make_valid(&p, &r, wr.r1, wr.r2, wr.r3);
-	build_r4000_tlbchange_handler_tail(&p, &l, &r, wr.r1, wr.r2);
+	build_make_valid(&p, &r, K0, K1, AT);
+	build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1);
 
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	/*
@@ -2173,8 +2158,8 @@ static void build_r4000_tlb_load_handler(void)
 	 * spots a huge page.
 	 */
 	uasm_l_tlb_huge_update(&l, p);
-	iPTE_LW(&p, wr.r1, wr.r2);
-	build_pte_present(&p, &r, wr.r1, wr.r2, wr.r3, label_nopage_tlbl);
+	iPTE_LW(&p, K0, K1);
+	build_pte_present(&p, &r, K0, K1, AT, label_nopage_tlbl);
 	build_tlb_probe_entry(&p);
 
 	if (cpu_has_rixi && !cpu_has_rixiex) {
@@ -2183,11 +2168,11 @@ static void build_r4000_tlb_load_handler(void)
 		 * have triggered it.  Skip the expensive test..
 		 */
 		if (use_bbit_insns()) {
-			uasm_il_bbit0(&p, &r, wr.r1, ilog2(_PAGE_VALID),
+			uasm_il_bbit0(&p, &r, K0, ilog2(_PAGE_VALID),
 				      label_tlbl_goaround2);
 		} else {
-			uasm_i_andi(&p, wr.r3, wr.r1, _PAGE_VALID);
-			uasm_il_beqz(&p, &r, wr.r3, label_tlbl_goaround2);
+			uasm_i_andi(&p, AT, K0, _PAGE_VALID);
+			uasm_il_beqz(&p, &r, AT, label_tlbl_goaround2);
 		}
 		uasm_i_nop(&p);
 
@@ -2215,24 +2200,24 @@ static void build_r4000_tlb_load_handler(void)
 
 		/* Examine  entrylo 0 or 1 based on ptr. */
 		if (use_bbit_insns()) {
-			uasm_i_bbit0(&p, wr.r2, ilog2(sizeof(pte_t)), 8);
+			uasm_i_bbit0(&p, K1, ilog2(sizeof(pte_t)), 8);
 		} else {
-			uasm_i_andi(&p, wr.r3, wr.r2, sizeof(pte_t));
-			uasm_i_beqz(&p, wr.r3, 8);
+			uasm_i_andi(&p, AT, K1, sizeof(pte_t));
+			uasm_i_beqz(&p, AT, 8);
 		}
 		/* load it in the delay slot*/
-		UASM_i_MFC0(&p, wr.r3, C0_ENTRYLO0);
+		UASM_i_MFC0(&p, AT, C0_ENTRYLO0);
 		/* load it if ptr is odd */
-		UASM_i_MFC0(&p, wr.r3, C0_ENTRYLO1);
+		UASM_i_MFC0(&p, AT, C0_ENTRYLO1);
 		/*
-		 * If the entryLo (now in wr.r3) is valid (bit 1), RI or
+		 * If the entryLo (now in AT) is valid (bit 1), RI or
 		 * XI must have triggered it.
 		 */
 		if (use_bbit_insns()) {
-			uasm_il_bbit0(&p, &r, wr.r3, 1, label_tlbl_goaround2);
+			uasm_il_bbit0(&p, &r, AT, 1, label_tlbl_goaround2);
 		} else {
-			uasm_i_andi(&p, wr.r3, wr.r3, 2);
-			uasm_il_beqz(&p, &r, wr.r3, label_tlbl_goaround2);
+			uasm_i_andi(&p, AT, AT, 2);
+			uasm_il_beqz(&p, &r, AT, label_tlbl_goaround2);
 		}
 		if (PM_DEFAULT_MASK == 0)
 			uasm_i_nop(&p);
@@ -2240,12 +2225,12 @@ static void build_r4000_tlb_load_handler(void)
 		 * We clobbered C0_PAGEMASK, restore it.  On the other branch
 		 * it is restored in build_huge_tlb_write_entry.
 		 */
-		build_restore_pagemask(&p, &r, wr.r3, label_nopage_tlbl, 0);
+		build_restore_pagemask(&p, &r, AT, label_nopage_tlbl, 0);
 
 		uasm_l_tlbl_goaround2(&l, p);
 	}
-	uasm_i_ori(&p, wr.r1, wr.r1, (_PAGE_ACCESSED | _PAGE_VALID));
-	build_huge_handler_tail(&p, &r, &l, wr.r1, wr.r2, 1);
+	uasm_i_ori(&p, K0, K0, (_PAGE_ACCESSED | _PAGE_VALID));
+	build_huge_handler_tail(&p, &r, &l, K0, K1, 1);
 #endif
 
 	uasm_l_nopage_tlbl(&l, p);
@@ -2276,18 +2261,17 @@ static void build_r4000_tlb_store_handler(void)
 	const int handle_tlbs_size = handle_tlbs_end - handle_tlbs;
 	struct uasm_label *l = labels;
 	struct uasm_reloc *r = relocs;
-	struct work_registers wr;
 
 	memset(handle_tlbs, 0, handle_tlbs_size * sizeof(handle_tlbs[0]));
 	memset(labels, 0, sizeof(labels));
 	memset(relocs, 0, sizeof(relocs));
 
-	wr = build_r4000_tlbchange_handler_head(&p, &l, &r);
-	build_pte_writable(&p, &r, wr.r1, wr.r2, wr.r3, label_nopage_tlbs);
+	build_r4000_tlbchange_handler_head(&p, &l, &r);
+	build_pte_writable(&p, &r, K0, K1, AT, label_nopage_tlbs);
 	if (m4kc_tlbp_war())
 		build_tlb_probe_entry(&p);
-	build_make_write(&p, &r, wr.r1, wr.r2, wr.r3);
-	build_r4000_tlbchange_handler_tail(&p, &l, &r, wr.r1, wr.r2);
+	build_make_write(&p, &r, K0, K1, AT);
+	build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1);
 
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	/*
@@ -2295,12 +2279,12 @@ static void build_r4000_tlb_store_handler(void)
 	 * build_r4000_tlbchange_handler_head spots a huge page.
 	 */
 	uasm_l_tlb_huge_update(&l, p);
-	iPTE_LW(&p, wr.r1, wr.r2);
-	build_pte_writable(&p, &r, wr.r1, wr.r2, wr.r3, label_nopage_tlbs);
+	iPTE_LW(&p, K0, K1);
+	build_pte_writable(&p, &r, K0, K1, AT, label_nopage_tlbs);
 	build_tlb_probe_entry(&p);
-	uasm_i_ori(&p, wr.r1, wr.r1,
+	uasm_i_ori(&p, K0, K0,
 		   _PAGE_ACCESSED | _PAGE_MODIFIED | _PAGE_VALID | _PAGE_DIRTY);
-	build_huge_handler_tail(&p, &r, &l, wr.r1, wr.r2, 1);
+	build_huge_handler_tail(&p, &r, &l, K0, K1, 1);
 #endif
 
 	uasm_l_nopage_tlbs(&l, p);
@@ -2331,19 +2315,18 @@ static void build_r4000_tlb_modify_handler(void)
 	const int handle_tlbm_size = handle_tlbm_end - handle_tlbm;
 	struct uasm_label *l = labels;
 	struct uasm_reloc *r = relocs;
-	struct work_registers wr;
 
 	memset(handle_tlbm, 0, handle_tlbm_size * sizeof(handle_tlbm[0]));
 	memset(labels, 0, sizeof(labels));
 	memset(relocs, 0, sizeof(relocs));
 
-	wr = build_r4000_tlbchange_handler_head(&p, &l, &r);
-	build_pte_modifiable(&p, &r, wr.r1, wr.r2, wr.r3, label_nopage_tlbm);
+	build_r4000_tlbchange_handler_head(&p, &l, &r);
+	build_pte_modifiable(&p, &r, K0, K1, AT, label_nopage_tlbm);
 	if (m4kc_tlbp_war())
 		build_tlb_probe_entry(&p);
 	/* Present and writable bits set, set accessed and dirty bits. */
-	build_make_write(&p, &r, wr.r1, wr.r2, wr.r3);
-	build_r4000_tlbchange_handler_tail(&p, &l, &r, wr.r1, wr.r2);
+	build_make_write(&p, &r, K0, K1, AT);
+	build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1);
 
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	/*
@@ -2351,12 +2334,12 @@ static void build_r4000_tlb_modify_handler(void)
 	 * build_r4000_tlbchange_handler_head spots a huge page.
 	 */
 	uasm_l_tlb_huge_update(&l, p);
-	iPTE_LW(&p, wr.r1, wr.r2);
-	build_pte_modifiable(&p, &r, wr.r1, wr.r2,  wr.r3, label_nopage_tlbm);
+	iPTE_LW(&p, K0, K1);
+	build_pte_modifiable(&p, &r, K0, K1,  AT, label_nopage_tlbm);
 	build_tlb_probe_entry(&p);
-	uasm_i_ori(&p, wr.r1, wr.r1,
+	uasm_i_ori(&p, K0, K0,
 		   _PAGE_ACCESSED | _PAGE_MODIFIED | _PAGE_VALID | _PAGE_DIRTY);
-	build_huge_handler_tail(&p, &r, &l, wr.r1, wr.r2, 0);
+	build_huge_handler_tail(&p, &r, &l, K0, K1, 0);
 #endif
 
 	uasm_l_nopage_tlbm(&l, p);
-- 
2.13.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread
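
For readers skimming the hunks above: the final patch in the series drops
struct work_registers and has the generated handlers use three fixed
general-purpose registers instead.  As a rough orientation aid, the sketch
below spells out what those constants denote under the standard MIPS GPR
numbering; it is an assumption for illustration, not the kernel's own
definitions, which live in arch/mips/mm/tlbex.c.

    /*
     * Sketch only: fixed scratch registers assumed by the generated
     * TLB handlers once struct work_registers is gone.
     */
    #define AT  1    /* $at - assembler temporary, free under .set noat */
    #define K0  26   /* $k0 - reserved for kernel/exception use */
    #define K1  27   /* $k1 - reserved for kernel/exception use */

With fixed registers the generator no longer needs to thread a
work_registers value through every build_* helper, which is what the
mechanical wr.rN -> K0/K1/AT substitution in the hunks above reflects.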

* Re: [PATCH 5/6] MIPS: tlbex: Use ErrorEPC as scratch when KScratch isn't available
@ 2017-06-15 17:27     ` Maciej W. Rozycki
  0 siblings, 0 replies; 19+ messages in thread
From: Maciej W. Rozycki @ 2017-06-15 17:27 UTC (permalink / raw)
  To: Paul Burton; +Cc: linux-mips, Ralf Baechle

On Fri, 2 Jun 2017, Paul Burton wrote:

> This patch changes this such that when KScratch registers aren't
> implemented we use the coprocessor 0 ErrorEPC register as scratch
> instead. The only downside to this is that we will need to ensure that
> TLB exceptions don't occur whilst handling error exceptions, or at least
> before the handlers for such exceptions have read the ErrorEPC register.
> As the kernel always runs unmapped, or using a wired TLB entry for
> certain SGI ip27 configurations, this constraint is currently always
> satisfied. In the future should the kernel become mapped we will need to
> cover exception handling code with a wired entry anyway such that TLB
> exception handlers don't themselves trigger TLB exceptions, so the
> constraint should be satisfied there too.

 All error exception handlers run from (C)KSEG1 and with (X)KUSEG forcibly 
unmapped, so a TLB exception could only ever happen with an access to the 
kernel stack or static data located in (C)KSEG2 or XKSEG.  I think this 
can be easily avoided, and actually should be, to avoid cascading errors.

 Isn't the reverse a problem though, i.e. getting an error exception while 
running a TLB exception handler and consequently getting the value stashed 
in CP0.ErrorEPC clobbered?  Or do we assume all error exceptions are fatal 
and the kernel shall panic without ever getting back?

  Maciej

^ permalink raw reply	[flat|nested] 19+ messages in thread
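
To make the clobbering concern concrete, here is a hedged sketch of the
kind of save/restore sequence being discussed, using the uasm
UASM_i_MTC0 / UASM_i_MFC0 helpers (the latter appears in the hunks above
for other CP0 registers).  The C0_ERROREPC define and both function names
are assumptions for illustration, not the patch's actual code.

    /* CP0 ErrorEPC is register 30, select 0; the define below is an
     * assumption mirroring the C0_* convention used in tlbex.c. */
    #define C0_ERROREPC 30, 0

    /* Hypothetical helpers: park a GPR in ErrorEPC on handler entry
     * and recover it before returning. */
    static void stash_in_errorepc(u32 **p, unsigned int reg)
    {
            UASM_i_MTC0(p, reg, C0_ERROREPC);  /* reg -> CP0 ErrorEPC */
    }

    static void restore_from_errorepc(u32 **p, unsigned int reg)
    {
            UASM_i_MFC0(p, reg, C0_ERROREPC);  /* CP0 ErrorEPC -> reg */
    }

Any error exception taken between those two points overwrites ErrorEPC in
hardware, which is exactly the clobbering scenario raised here.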

* Re: [PATCH 5/6] MIPS: tlbex: Use ErrorEPC as scratch when KScratch isn't available
  2017-06-15 17:27     ` Maciej W. Rozycki
@ 2017-06-28 15:25     ` Ralf Baechle
  2017-06-29 16:39         ` Maciej W. Rozycki
  -1 siblings, 1 reply; 19+ messages in thread
From: Ralf Baechle @ 2017-06-28 15:25 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Paul Burton, linux-mips

On Thu, Jun 15, 2017 at 06:27:34PM +0100, Maciej W. Rozycki wrote:

> > This patch changes this such that when KScratch registers aren't
> > implemented we use the coprocessor 0 ErrorEPC register as scratch
> > instead. The only downside to this is that we will need to ensure that
> > TLB exceptions don't occur whilst handling error exceptions, or at least
> > before the handlers for such exceptions have read the ErrorEPC register.
> > As the kernel always runs unmapped, or using a wired TLB entry for
> > certain SGI ip27 configurations, this constraint is currently always
> > satisfied. In the future should the kernel become mapped we will need to
> > cover exception handling code with a wired entry anyway such that TLB
> > exception handlers don't themselves trigger TLB exceptions, so the
> > constraint should be satisfied there too.
> 
>  All error exception handlers run from (C)KSEG1 and with (X)KUSEG forcibly 
> unmapped, so a TLB exception could only ever happen with an access to the 
> kernel stack or static data located in (C)KSEG2 or XKSEG.  I think this 
> can be easily avoided, and actually should, to avoid cascading errors.
> 
>  Isn't the reverse a problem though, i.e. getting an error exception while 
> running a TLB exception handler and consequently getting the value stashed 
> in CP0.ErrorEPC clobbered?  Or do we assume all error exceptions are fatal 
> and the kernel shall panic without ever getting back?

Think of cache error exceptions for example.  Not all systems are as
bad as Pass 1 BCM1250 parts, which spewed a few of them a day.  Without
going into hardware implementation details - memory parity or ECC errors
are, on many systems, signaled as cache errors, thus clobbering c0_errorepc.

So while it's a nice hack, I think this patch should be reserved for
systems that don't support parity or ECC, or where a tiny bit of
performance is generally more important than reliability.

  Ralf

^ permalink raw reply	[flat|nested] 19+ messages in thread
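
The trade-off described here could be expressed as an explicit gate when
the handlers are built.  The sketch below is purely illustrative: both
predicate helpers are hypothetical names, not existing kernel APIs.

    /* Hypothetical policy check: prefer a real scratch register when
     * one exists, and only fall back to ErrorEPC where cache/ECC
     * error exceptions cannot clobber it mid-handler. */
    static bool should_use_errorepc_scratch(void)
    {
            if (platform_has_kscratch())            /* hypothetical probe */
                    return false;   /* a real scratch register is available */
            return !platform_reports_cache_errors();        /* hypothetical */
    }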

* Re: [PATCH 5/6] MIPS: tlbex: Use ErrorEPC as scratch when KScratch isn't available
@ 2017-06-29 16:39         ` Maciej W. Rozycki
  0 siblings, 0 replies; 19+ messages in thread
From: Maciej W. Rozycki @ 2017-06-29 16:39 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Paul Burton, linux-mips

On Wed, 28 Jun 2017, Ralf Baechle wrote:

> So while it's a nice hack, I think this patch should be reserved for
> systems that don't support parity or ECC, or where a tiny bit of
> performance is generally more important than reliability.

 One problem I can see here is that AFAICT we use this somewhat costly 
scratch setup even in cases where it is not needed (i.e. the scratch 
remains unused throughout the handler), defeating the intent of the TLB 
handler generator, which was created for the very purpose of avoiding any 
wasted cycles that static universal handlers necessarily incurred.

 I think this is what has to be addressed instead, removing the penalty 
from configurations that do not need it, i.e. no RIXI, no HTW, etc.  Then 
chances are the more complex configurations will often have the required 
scratch resources available, such as KScratch or SRS registers, which can 
then be used appropriately (and if some don't, then only they will take 
the penalty they deserve).

  Maciej

^ permalink raw reply	[flat|nested] 19+ messages in thread
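
The suggestion here amounts to making the generator pay for a scratch
register only on paths that actually need one.  A rough sketch of that
shape follows; the emit_* helpers are hypothetical, while cpu_has_rixi and
CONFIG_MIPS_HUGE_TLB_SUPPORT are the real feature tests the handlers above
already depend on.

    /* Decide up front whether the generated body will ever need a
     * third working register (RIXI checks, huge-page paths, etc.). */
    static bool handler_needs_extra_scratch(void)
    {
            return cpu_has_rixi ||
                   IS_ENABLED(CONFIG_MIPS_HUGE_TLB_SUPPORT);
    }

    static void build_handler_body(u32 **p)
    {
            if (handler_needs_extra_scratch())
                    emit_scratch_save(p);           /* hypothetical */

            /* ... emit the common PTE present/valid checks here ... */

            if (handler_needs_extra_scratch())
                    emit_scratch_restore(p);        /* hypothetical */
    }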

end of thread, other threads:[~2017-06-29 16:40 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-02 22:38 [PATCH 0/6] MIPS: TLB exception handler fixes & optimisation Paul Burton
2017-06-02 22:38 ` Paul Burton
2017-06-02 22:38 ` [PATCH 1/6] MIPS: Add CPU shared FTLB feature detection Paul Burton
2017-06-02 22:38   ` Paul Burton
2017-06-02 22:38 ` [PATCH 2/6] MIPS: Handle tlbex-tlbp race condition Paul Burton
2017-06-02 22:38   ` Paul Burton
2017-06-02 22:38 ` [PATCH 3/6] MIPS: Allow storing pgd in C0_CONTEXT for MIPSr6 Paul Burton
2017-06-02 22:38   ` Paul Burton
2017-06-02 22:38 ` [PATCH 4/6] MIPS: Use current_cpu_type() in m4kc_tlbp_war() Paul Burton
2017-06-02 22:38   ` Paul Burton
2017-06-02 22:38 ` [PATCH 5/6] MIPS: tlbex: Use ErrorEPC as scratch when KScratch isn't available Paul Burton
2017-06-02 22:38   ` Paul Burton
2017-06-15 17:27   ` Maciej W. Rozycki
2017-06-15 17:27     ` Maciej W. Rozycki
2017-06-28 15:25     ` Ralf Baechle
2017-06-29 16:39       ` Maciej W. Rozycki
2017-06-29 16:39         ` Maciej W. Rozycki
2017-06-02 22:38 ` [PATCH 6/6] MIPS: tlbex: Remove struct work_registers Paul Burton
2017-06-02 22:38   ` Paul Burton
