* [PATCH] MIPS: Add basic R5900 support
@ 2017-08-27 13:23 Fredrik Noring
  2017-08-28 13:53 ` Ralf Baechle
  2017-08-29 17:24   ` Maciej W. Rozycki
  0 siblings, 2 replies; 117+ messages in thread
From: Fredrik Noring @ 2017-08-27 13:23 UTC (permalink / raw)
  To: linux-mips

Signed-off-by: Fredrik Noring <noring@nocrew.org>
---
 arch/mips/Kconfig                | 13 +++++++++++++
 arch/mips/include/asm/cpu-type.h |  4 ++++
 arch/mips/include/asm/cpu.h      |  6 ++++++
 arch/mips/include/asm/module.h   |  2 ++
 arch/mips/kernel/cpu-probe.c     | 10 ++++++++++
 5 files changed, 35 insertions(+)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2828ecde133d..2a3592032861 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1708,6 +1708,16 @@ config CPU_BMIPS
 	help
 	  Support for BMIPS32/3300/4350/4380 and BMIPS5000 processors.
 
+config CPU_R5900
+	bool "R5900"
+	depends on SYS_HAS_CPU_R5900
+	select CPU_SUPPORTS_32BIT_KERNEL
+	select CPU_SUPPORTS_64BIT_KERNEL
+	select IRQ_MIPS_CPU
+	select CPU_HAS_WB
+	help
+	  MIPS Technologies R5900 processor (Emotion Engine in Sony Playstation 2).
+
 config CPU_XLR
 	bool "Netlogic XLR SoC"
 	depends on SYS_HAS_CPU_XLR
@@ -1938,6 +1948,9 @@ config SYS_HAS_CPU_R5432
 config SYS_HAS_CPU_R5500
 	bool
 
+config SYS_HAS_CPU_R5900
+	bool
+
 config SYS_HAS_CPU_R6000
 	bool
 
diff --git a/arch/mips/include/asm/cpu-type.h b/arch/mips/include/asm/cpu-type.h
index bdd6dc18e65c..5613ae2a0fe0 100644
--- a/arch/mips/include/asm/cpu-type.h
+++ b/arch/mips/include/asm/cpu-type.h
@@ -150,6 +150,10 @@ static inline int __pure __get_cpu_type(const int cpu_type)
 	case CPU_R5500:
 #endif
 
+#ifdef CONFIG_SYS_HAS_CPU_R5900
+	case CPU_R5900:
+#endif
+
 #ifdef CONFIG_SYS_HAS_CPU_R6000
 	case CPU_R6000:
 	case CPU_R6000A:
diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
index 98f59307e6a3..f332aaa9e69b 100644
--- a/arch/mips/include/asm/cpu.h
+++ b/arch/mips/include/asm/cpu.h
@@ -80,6 +80,7 @@
 #define PRID_IMP_R4650		0x2200		/* Same as R4640 */
 #define PRID_IMP_R5000		0x2300
 #define PRID_IMP_TX49		0x2d00
+#define PRID_IMP_R5900		0x2e00		/* Playstation 2 */
 #define PRID_IMP_SONIC		0x2400
 #define PRID_IMP_MAGIC		0x2500
 #define PRID_IMP_RM7000		0x2700
@@ -326,6 +327,11 @@ enum cpu_type_enum {
 
 	CPU_QEMU_GENERIC,
 
+	/*
+	 * Playstation 2 processors
+	 */
+	CPU_R5900,
+
 	CPU_LAST
 };
 
diff --git a/arch/mips/include/asm/module.h b/arch/mips/include/asm/module.h
index 702c273e67a9..5025b321604f 100644
--- a/arch/mips/include/asm/module.h
+++ b/arch/mips/include/asm/module.h
@@ -114,6 +114,8 @@ search_module_dbetables(unsigned long addr)
 #define MODULE_PROC_FAMILY "R5432 "
 #elif defined CONFIG_CPU_R5500
 #define MODULE_PROC_FAMILY "R5500 "
+#elif defined CONFIG_CPU_R5900
+#define MODULE_PROC_FAMILY "R5900 "
 #elif defined CONFIG_CPU_R6000
 #define MODULE_PROC_FAMILY "R6000 "
 #elif defined CONFIG_CPU_NEVADA
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 1aba27786bd5..b8bed9f26f8d 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -1518,6 +1518,16 @@ static inline void cpu_probe_legacy(struct cpuinfo_mips *c, unsigned int cpu)
 		}
 
 		break;
+	case PRID_IMP_R5900:
+		c->cputype = CPU_R5900;
+		__cpu_name[cpu] = "R5900";
+		c->isa_level = MIPS_CPU_ISA_III;
+		c->tlbsize = 48;
+		c->options = MIPS_CPU_TLB | MIPS_CPU_4K_CACHE |
+			     MIPS_CPU_4KEX | MIPS_CPU_DIVEC |
+			     MIPS_CPU_FPU | MIPS_CPU_32FPR |
+			     MIPS_CPU_COUNTER;
+		break;
 	}
 }
 
-- 
2.13.4

^ permalink raw reply related	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
  2017-08-27 13:23 [PATCH] MIPS: Add basic R5900 support Fredrik Noring
@ 2017-08-28 13:53 ` Ralf Baechle
  2017-08-28 17:11   ` Maciej W. Rozycki
  2017-08-29 17:33   ` Fredrik Noring
  2017-08-29 17:24   ` Maciej W. Rozycki
  1 sibling, 2 replies; 117+ messages in thread
From: Ralf Baechle @ 2017-08-28 13:53 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

On Sun, Aug 27, 2017 at 03:23:10PM +0200, Fredrik Noring wrote:

> Signed-off-by: Fredrik Noring <noring@nocrew.org>
> ---
>  arch/mips/Kconfig                | 13 +++++++++++++
>  arch/mips/include/asm/cpu-type.h |  4 ++++
>  arch/mips/include/asm/cpu.h      |  6 ++++++
>  arch/mips/include/asm/module.h   |  2 ++
>  arch/mips/kernel/cpu-probe.c     | 10 ++++++++++
>  5 files changed, 35 insertions(+)

Patch is looking perfect at a glance but without support for an R5900
system, that is the PS2, it's kinda pointless so I'd like to wait and
review and apply everything at once.

  Ralf

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
  2017-08-28 13:53 ` Ralf Baechle
@ 2017-08-28 17:11   ` Maciej W. Rozycki
  2017-08-29 17:33   ` Fredrik Noring
  1 sibling, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-08-28 17:11 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Fredrik Noring, linux-mips

On Mon, 28 Aug 2017, Ralf Baechle wrote:

> > Signed-off-by: Fredrik Noring <noring@nocrew.org>
> > ---
> >  arch/mips/Kconfig                | 13 +++++++++++++
> >  arch/mips/include/asm/cpu-type.h |  4 ++++
> >  arch/mips/include/asm/cpu.h      |  6 ++++++
> >  arch/mips/include/asm/module.h   |  2 ++
> >  arch/mips/kernel/cpu-probe.c     | 10 ++++++++++
> >  5 files changed, 35 insertions(+)
> 
> Patch is looking perfect at a glance but without support for an R5900
> system, that is the PS2, it's kinda pointless so I'd like to wait and
> review and apply everything at once.

 I've had some concerns anyway though, which I'll post tomorrow (as 
today's a UK bank holiday).

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
@ 2017-08-29 17:24   ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-08-29 17:24 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

 Thank you for your contribution.  As Ralf has noted your change looks 
good overall.  I just have a couple of nits to address.  Please take them 
into account with the next version of your change.

> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 2828ecde133d..2a3592032861 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -1708,6 +1708,16 @@ config CPU_BMIPS
>  	help
>  	  Support for BMIPS32/3300/4350/4380 and BMIPS5000 processors.
>  
> +config CPU_R5900
> +	bool "R5900"
> +	depends on SYS_HAS_CPU_R5900
> +	select CPU_SUPPORTS_32BIT_KERNEL
> +	select CPU_SUPPORTS_64BIT_KERNEL
> +	select IRQ_MIPS_CPU
> +	select CPU_HAS_WB

 Is there an external explicitly-driven write-back buffer there with the 
R5900?  That would be odd with a MIPS III ISA processor, however if there 
indeed is, then I think the CPU_HAS_WB setting needs to go along with the 
code that implements `__wbflush' for this platform.

> diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
> index 98f59307e6a3..f332aaa9e69b 100644
> --- a/arch/mips/include/asm/cpu.h
> +++ b/arch/mips/include/asm/cpu.h
> @@ -326,6 +327,11 @@ enum cpu_type_enum {
>  
>  	CPU_QEMU_GENERIC,
>  
> +	/*
> +	 * Playstation 2 processors
> +	 */
> +	CPU_R5900,

 Shouldn't it go along with `R4000 class processors' earlier above?

> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
> index 1aba27786bd5..b8bed9f26f8d 100644
> --- a/arch/mips/kernel/cpu-probe.c
> +++ b/arch/mips/kernel/cpu-probe.c
> @@ -1518,6 +1518,16 @@ static inline void cpu_probe_legacy(struct cpuinfo_mips *c, unsigned int cpu)
>  		}
>  
>  		break;
> +	case PRID_IMP_R5900:
> +		c->cputype = CPU_R5900;
> +		__cpu_name[cpu] = "R5900";
> +		c->isa_level = MIPS_CPU_ISA_III;
> +		c->tlbsize = 48;
> +		c->options = MIPS_CPU_TLB | MIPS_CPU_4K_CACHE |
> +			     MIPS_CPU_4KEX | MIPS_CPU_DIVEC |
> +			     MIPS_CPU_FPU | MIPS_CPU_32FPR |
> +			     MIPS_CPU_COUNTER;
> +		break;

 If this is a MIPS III base ISA implementation, then presumably you need 
to set `c->fpu_msk31' as well, to exclude FPU_CSR_CONDX bits introduced 
with the MIPS IV ISA only.  Double-check with hardware documentation for 
the details.

 If you don't have documentation, but you have the hardware at hand, then 
you'll best check it yourself by writing a small user program that writes 
to CP1.FCSR and checks which bits stick (of course you need to leave the 
exception cause/mask bits alone for this check or you'll get SIGFPE sent 
instead).

 Please let me know if you have any questions.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
  2017-08-28 13:53 ` Ralf Baechle
  2017-08-28 17:11   ` Maciej W. Rozycki
@ 2017-08-29 17:33   ` Fredrik Noring
  1 sibling, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2017-08-29 17:33 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips, Maciej W. Rozycki

On Mon, Aug 28, 2017 at 03:53:05PM +0200, Ralf Baechle wrote:
> Patch is looking perfect at a glance but without support for an R5900
> system, that is the PS2, it's kinda pointless so I'd like to wait and
> review and apply everything at once.

Thanks for your quick feedback! Everything else is here:

https://github.com/frno7/linux/tree/ps2-v4.12-squashed

This is GPL work by Sony, Jürgen Urban, Rick Gaiser, and others. About 40000
lines of code with drivers etc. and unpostable in current form. I've updated
it from 2.6.35 to 4.12 and it's running with workarounds and a few reverts.
I believe the patch can be improved in several areas. Some notes:

- The R5900 has 128 bit registers. In the patch this is implemented by
  replacing __u64 pt_regs::regs[32] with r5900_reg_t { __u64 lo, hi; }, and
  consequently replacing all 300+ register reads and writes with macros such
  as MIPS_READ_REG, MIPS_WRITE_REG, etc. Perhaps a less intrusive way is to
  store the most significant 64 bits separately and only use the least
  significant 64 bits in the rest of the kernel, without modification?
- MFC0/MTC0 and a few other instructions need additional SYNC. The patch uses
  ifdefs but perhaps macros are better? (See p. C-28 in TX79 Core Architecture
  manual by Toshiba, for example.)
- A new SYNC.P instruction is added to arch/mips/mm/uasm-mips.c.
- According to the same manual, the "first two instructions in an exception
  handler are executed as NOP when a bus error occurs (FLX05)" with the
  corrective measure to "place NOP in the first two instruction locations in
  all exception handlers" (p. 1-11).
- The memcpy/strlen/etc. family of functions need short loop NOP padding to
  avoid hardware bugs. Perhaps it's less fragile to rely on the compiler and
  use C implementations instead of assembly for R5900?
- A few other places apparently also need NOP padding.
- I'm unsure about arch/mips/kernel/scall32-n32.S.
- LL, SC, LLD and SCD etc. are not implemented in R5900 and are emulated.
- FPU needs special save/restore.
- arch/mips/include/asm/r4kcache.h is modified with special cache macros.
- arch/mips/mm/tlbex.c is updated with scratch pad memory map.
- The USB driver in the patch broke on 6f65126c7 "USB: OHCI: add SG support"
  and 6894258ed "dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}".
- The patch also broke on a6335fa1 "MIPS: bootmem: Don't use memory holes for
  page bitmap", as well as 084a7cf7 "MIPS: IRQ Stack: Unwind IRQ stack onto
  task stack".
- One should look at the 2.6.35 patch series to make sense of some of the
  changes in this updated patch. These parts need to be reworked, of course.
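
To make the first note above concrete, here is a rough sketch of the "less
intrusive" alternative. The struct and helper names are made up for
illustration and are not taken from the kernel or from the squashed tree; it
only shows the idea of keeping the upper halves in a separate array so that
generic code can keep reading regs[] unchanged.

```c
#include <stdint.h>

/*
 * Hypothetical layout: names are illustrative only and do not match
 * struct pt_regs or anything in the ps2-v4.12-squashed tree.
 */
struct r5900_regs_sketch {
	uint64_t regs[32];	/* least significant 64 bits, as generic code expects */
	uint64_t regs_hi[32];	/* most significant 64 bits, R5900 only */
};

/* Store a full 128-bit GPR value, split into its two halves. */
static inline void sketch_save_gpr(struct r5900_regs_sketch *r, unsigned int n,
				   uint64_t lo, uint64_t hi)
{
	r->regs[n] = lo;
	r->regs_hi[n] = hi;
}

/* Generic code reads regs[] directly and never sees the upper half. */
static inline uint64_t sketch_read_gpr(const struct r5900_regs_sketch *r,
				       unsigned int n)
{
	return r->regs[n];
}
```

With such a layout only the save/restore paths would need to know about
regs_hi[], at the cost of a larger register frame.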

That's a start and some design choices need to be made to continue. What are
your thoughts?

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
  2017-08-29 17:24   ` Maciej W. Rozycki
  (?)
@ 2017-08-30 13:23   ` Fredrik Noring
  2017-08-31 15:11       ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-08-30 13:23 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

Thank you for your review!

>  Is there an external explicitly-driven write-back buffer there with the 
> R5900?  That would be odd with a MIPS III ISA processor, however if there 
> indeed is, then I think the CPU_HAS_WB setting needs to go along with the 
> code that implements `__wbflush' for this platform.

The C790 block diagram contains a WBB connected to the BIU bus (p. 2-2) in the
"TX System RISC TX79 Core Architecture" manual:

    The Writeback Buffer (WBB) is an 8 entry by 16 byte (one quadword) FIFO
    queuing up stores prior to accessing the CPU bus. It increases C790
    performance by decoupling the processor from the latencies of the CPU bus.
    It is also used during the gathering operation of uncached accelerated
    stores; sequential stores less than a quadword in length are gathered in
    the WBB, thereby reducing bus bandwidth usage. (p. 2-4)

__wbflush is implemented in arch/mips/ps2/setup.c:193 in the remaining patch
(link below):

static void ps2_wbflush(void)
{
	__asm__ __volatile__("sync.l":::"memory");

	/* flush write buffer to bus */
	inl(ps2sif_bustophys(0));
}

https://github.com/frno7/linux/blob/ps2-v4.12-squashed/arch/mips/ps2/setup.c#L193

Then ps2sif_bustophys is implemented in arch/mips/ps2/iopheap.c:

phys_addr_t ps2sif_bustophys(dma_addr_t a)
{
	return(a + PS2_IOP_HEAP_BASE);
}

which in turn uses

#define PS2_IOP_HEAP_BASE 0x1c000000

from arch/mips/include/asm/mach-ps2/ps2.h. Would you like to move this code
somewhere else to go along with the declaration of CPU_HAS_WB?

>  Shouldn't it go along with `R4000 class processors' earlier above?

Sure!

>  If this is a MIPS III base ISA implementation, then presumably you need 
> to set `c->fpu_msk31' as well, to exclude FPU_CSR_CONDX bits introduced 
> with the MIPS IV ISA only.  Double-check with hardware documentation for 
> the details.

Good catch, I'm checking it with the "TX System RISC TX79 Core Architecture"
manual (link below). The FPU is IEEE754-1985 compatible MIPS III ISA (p. 1-2),
the same as the TX49HF CPU core (p. 2-18). FCR31 looks like this (p. 10-6):

    31       25 24 23 22  18 17   12 11      7 6     2 1  0
    +----------+--+--+------+-------+---------+-------+----+
    |    0     |FS| C|   0  | Cause | Enables | Flags | RM |
    +----------+--+--+------+-------+---------+-------+----+
         7       1  1    5      6        5        5      2

http://www.lukasz.dk/files/tx79architecture.pdf
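
For reference, the diagram above could be transcribed into C roughly as
follows. The macro names are made up for this sketch and are not the kernel's
FPU_CSR_* definitions; only the bit positions come from the diagram. Note that
bits 31..25, where MIPS IV places its additional condition bits, are among
those shown as zero.

```c
#include <stdint.h>

/* Field positions copied from the FCR31 diagram; names are illustrative. */
#define SK_FCR31_FS		(UINT32_C(1) << 24)
#define SK_FCR31_C		(UINT32_C(1) << 23)
#define SK_FCR31_CAUSE(v)	(((v) >> 12) & 0x3f)	/* bits 17..12 */
#define SK_FCR31_ENABLES(v)	(((v) >> 7) & 0x1f)	/* bits 11..7 */
#define SK_FCR31_FLAGS(v)	(((v) >> 2) & 0x1f)	/* bits 6..2 */
#define SK_FCR31_RM(v)		((v) & 0x3)		/* bits 1..0 */

/* Bits 31..25 and 22..18 are shown as zero in the diagram. */
#define SK_FCR31_ZERO		(UINT32_C(0xfe000000) | UINT32_C(0x007c0000))
```

A value read with CFC1 can then be checked with, for example,
(value & SK_FCR31_ZERO) == 0.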

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
@ 2017-08-31 15:11       ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-08-31 15:11 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> >  Is there an external explicitly-driven write-back buffer there with the 
> > R5900?  That would be odd with a MIPS III ISA processor, however if there 
> > indeed is, then I think the CPU_HAS_WB setting needs to go along with the 
> > code that implements `__wbflush' for this platform.
> 
> The C790 block diagram contains a WBB connected to the BIU bus (p. 2-2) in the
> "TX System RISC TX79 Core Architecture" manual:
> 
>     The Writeback Buffer (WBB) is an 8 entry by 16 byte (one quadword) FIFO
>     queuing up stores prior to accessing the CPU bus. It increases C790
>     performance by decoupling the processor from the latencies of the CPU bus.
>     It is also used during the gathering operation of uncached accelerated
>     stores; sequential stores less than a quadword in length are gathered in
>     the WBB, thereby reducing bus bandwidth usage. (p. 2-4)

 That's no different though from the write-back buffer the R4400 CPU has, 
as do many more modern MIPS architecture implementations, with strong bus 
(and hence memory) ordering enforced by the SYNC instruction.  In that 
case a lone SYNC instruction is sufficient as the implementation of the 
`wmb', `rmb' and `mb' memory ordering barrier operations.
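
 As a minimal sketch of what that means (names are illustrative and these 
are not the kernel's actual barrier definitions), all three operations can 
map to the same instruction.  The non-MIPS fallback is only there so that 
the fragment builds on a development host.

```c
#if defined(__mips__)
/* SYNC is a MIPS II instruction, hence the temporary ISA override. */
#define sketch_sync()						\
	__asm__ __volatile__(".set push\n\t"			\
			     ".set mips2\n\t"			\
			     "sync\n\t"				\
			     ".set pop" : : : "memory")
#else
/* Host fallback: a full compiler and hardware barrier. */
#define sketch_sync()	__sync_synchronize()
#endif

/* With strong ordering enforced by SYNC, one instruction serves all three. */
#define sketch_mb()	sketch_sync()
#define sketch_rmb()	sketch_sync()
#define sketch_wmb()	sketch_sync()
```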

> __wbflush is implemented in arch/mips/ps2/setup.c:193 in the remaining patch
> (link below):
> 
> static void ps2_wbflush(void)
> {
> 	__asm__ __volatile__("sync.l":::"memory");
> 
> 	/* flush write buffer to bus */
> 	inl(ps2sif_bustophys(0));
> }
> 
> https://github.com/frno7/linux/blob/ps2-v4.12-squashed/arch/mips/ps2/setup.c#L193
> 
> Then ps2sif_bustophys is implemented in arch/mips/ps2/iopheap.c:
> 
> phys_addr_t ps2sif_bustophys(dma_addr_t a)
> {
> 	return(a + PS2_IOP_HEAP_BASE);
> }
> 
> which in turn uses
> 
> #define PS2_IOP_HEAP_BASE 0x1c000000
> 
> from arch/mips/include/asm/mach-ps2/ps2.h. Would you like to move this code
> somewhere else to go along with the declaration of CPU_HAS_WB?

 This looks to me like a completion barrier, rather than a bus ordering 
barrier that would require a special `__wbflush' implementation.  Here 
SYNC.L (which is BTW an assembly idiom for plain SYNC) enforces strong 
ordering, acting as `mb' as per the architecture requirement, and the 
following read back causes all the outstanding bus accesses to retire 
beforehand, acting as a completion barrier.

 So I think this `ps2_wbflush' completion barrier should be used as the 
implementation of `mmiowb', or (suspecting that with SYNC already in the 
picture it would be too strong for this platform, unless the chipset can 
do further write merging or reordering) perhaps just `iob', or a more 
general `iobarrier_sync' operation I have outlined here:

<https://marc.info/?l=linux-kernel&m=139868504324701&w=2>

(but never got to implementing).  In that case you don't need to select 
CPU_HAS_WB, as the platform does not have a write-back buffer that would 
require special handling for bus ordering purposes.
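
 To illustrate the distinction, a completion barrier of this kind could be 
sketched as below.  The function name and the `readback' parameter are 
hypothetical; on the PS2 the read would go to an uncached bus location such 
as the one `ps2_wbflush' uses, and the fence is expected to compile to SYNC 
on a MIPS target.

```c
#include <stdint.h>

/*
 * Hypothetical helper: order earlier accesses, then read a location on the
 * target bus so that outstanding writes have to retire before the read
 * returns.  The caller supplies the location to read back from.
 */
static inline uint32_t sketch_completion_barrier(const volatile uint32_t *readback)
{
	__sync_synchronize();	/* ordering barrier */
	return *readback;	/* read back: completion barrier */
}
```

 The ordering half alone is what `mb' provides; only the read back turns it 
into a completion barrier.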

 Let me know if you have any questions or comments.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
  2017-08-29 17:24   ` Maciej W. Rozycki
  (?)
  (?)
@ 2017-09-02 10:28   ` Fredrik Noring
  2017-09-09 10:13       ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-09-02 10:28 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

>  If you don't have documentation, but you have the hardware at hand, then 
> you'll best check it yourself by writing a small user program that writes 
> to CP1.FCSR and checks which bits stick (of course you need to leave the 
> exception cause/mask bits alone for this check or you'll get SIGFPE sent 
> instead).

Did you have something like this in mind? It prints 01000001 so the bits
above FS do not seem to stick.

	uint32_t fcr31;
	asm volatile (" cfc1 $t0,$31\n"
		      " lui  $t1,0xfe00\n"
		      " or   $t0,$t1,$t0\n"
	              " ctc1 $t0,$31\n"
	              " nop\n"
	              " cfc1 $t0,$31\n"
	              " nop\n"
	              " move %0,$t0\n" : "=r" (fcr31));
	printf("fcr31 %08" PRIx32 "\n", fcr31);

The "TX System RISC TX79 Core Architecture" manual says that both data and
instruction caches are 32 kB, but other sources seem to contradict this with
8 kB for data and 16 kB for instructions. So R5900 and C790 seem to be very
similar but not identical which could bring various surprises. Here is
another source:

https://www.linux-mips.org/wiki/PS2

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* [PATCH v2] MIPS: Add basic R5900 support
  2017-08-29 17:24   ` Maciej W. Rozycki
                     ` (2 preceding siblings ...)
  (?)
@ 2017-09-02 14:10   ` Fredrik Noring
  2017-09-11  5:18       ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-09-02 14:10 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Signed-off-by: Fredrik Noring <noring@nocrew.org>
---
Hi Maciej,

Here is the revised patch. I've added arch/mips/mm/c-r4k.c and I'm unsure about

	c->options |= MIPS_CPU_CACHE_CDEX_P | MIPS_CPU_PREFETCH;

but similar architectures have MIPS_CPU_CACHE_CDEX_P and R5900 has a PREF
instruction. As indicated in the comment, it's not entirely clear why

	if (c->dcache.waysize > PAGE_SIZE)
		c->dcache.flags |= MIPS_CACHE_ALIASES;

is necessary. I've also added arch/mips/Makefile.

 arch/mips/Kconfig                | 12 ++++++++++++
 arch/mips/Makefile               |  2 ++
 arch/mips/include/asm/cpu-type.h |  4 ++++
 arch/mips/include/asm/cpu.h      |  3 ++-
 arch/mips/include/asm/module.h   |  2 ++
 arch/mips/kernel/cpu-probe.c     | 11 +++++++++++
 arch/mips/mm/c-r4k.c             | 25 +++++++++++++++++++++++++
 7 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 2828ecde133d..aec56966484b 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1708,6 +1708,15 @@ config CPU_BMIPS
 	help
 	  Support for BMIPS32/3300/4350/4380 and BMIPS5000 processors.
 
+config CPU_R5900
+	bool "R5900"
+	depends on SYS_HAS_CPU_R5900
+	select CPU_SUPPORTS_32BIT_KERNEL
+	select CPU_SUPPORTS_64BIT_KERNEL
+	select IRQ_MIPS_CPU
+	help
+	  MIPS Technologies R5900 processor (Emotion Engine in Sony Playstation 2).
+
 config CPU_XLR
 	bool "Netlogic XLR SoC"
 	depends on SYS_HAS_CPU_XLR
@@ -1938,6 +1947,9 @@ config SYS_HAS_CPU_R5432
 config SYS_HAS_CPU_R5500
 	bool
 
+config SYS_HAS_CPU_R5900
+	bool
+
 config SYS_HAS_CPU_R6000
 	bool
 
diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index 02a1787c888c..e8e2805a05c4 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -171,6 +171,8 @@ cflags-$(CONFIG_CPU_R5432)	+= $(call cc-option,-march=r5400,-march=r5000) \
 			-Wa,--trap
 cflags-$(CONFIG_CPU_R5500)	+= $(call cc-option,-march=r5500,-march=r5000) \
 			-Wa,--trap
+cflags-$(CONFIG_CPU_R5900)	+= -march=r5900 -mtune=r5900 \
+			-Wa,--trap -mno-llsc
 cflags-$(CONFIG_CPU_NEVADA)	+= $(call cc-option,-march=rm5200,-march=r5000) \
 			-Wa,--trap
 cflags-$(CONFIG_CPU_RM7000)	+= $(call cc-option,-march=rm7000,-march=r5000) \
diff --git a/arch/mips/include/asm/cpu-type.h b/arch/mips/include/asm/cpu-type.h
index bdd6dc18e65c..5613ae2a0fe0 100644
--- a/arch/mips/include/asm/cpu-type.h
+++ b/arch/mips/include/asm/cpu-type.h
@@ -150,6 +150,10 @@ static inline int __pure __get_cpu_type(const int cpu_type)
 	case CPU_R5500:
 #endif
 
+#ifdef CONFIG_SYS_HAS_CPU_R5900
+	case CPU_R5900:
+#endif
+
 #ifdef CONFIG_SYS_HAS_CPU_R6000
 	case CPU_R6000:
 	case CPU_R6000A:
diff --git a/arch/mips/include/asm/cpu.h b/arch/mips/include/asm/cpu.h
index 98f59307e6a3..19da9e4be440 100644
--- a/arch/mips/include/asm/cpu.h
+++ b/arch/mips/include/asm/cpu.h
@@ -80,6 +80,7 @@
 #define PRID_IMP_R4650		0x2200		/* Same as R4640 */
 #define PRID_IMP_R5000		0x2300
 #define PRID_IMP_TX49		0x2d00
+#define PRID_IMP_R5900		0x2e00		/* Playstation 2 */
 #define PRID_IMP_SONIC		0x2400
 #define PRID_IMP_MAGIC		0x2500
 #define PRID_IMP_RM7000		0x2700
@@ -296,7 +297,7 @@ enum cpu_type_enum {
 	CPU_R4700, CPU_R5000, CPU_R5500, CPU_NEVADA, CPU_R5432, CPU_R10000,
 	CPU_R12000, CPU_R14000, CPU_R16000, CPU_VR41XX, CPU_VR4111, CPU_VR4121,
 	CPU_VR4122, CPU_VR4131, CPU_VR4133, CPU_VR4181, CPU_VR4181A, CPU_RM7000,
-	CPU_SR71000, CPU_TX49XX,
+	CPU_SR71000, CPU_TX49XX, CPU_R5900,
 
 	/*
 	 * R8000 class processors
diff --git a/arch/mips/include/asm/module.h b/arch/mips/include/asm/module.h
index 702c273e67a9..5025b321604f 100644
--- a/arch/mips/include/asm/module.h
+++ b/arch/mips/include/asm/module.h
@@ -114,6 +114,8 @@ search_module_dbetables(unsigned long addr)
 #define MODULE_PROC_FAMILY "R5432 "
 #elif defined CONFIG_CPU_R5500
 #define MODULE_PROC_FAMILY "R5500 "
+#elif defined CONFIG_CPU_R5900
+#define MODULE_PROC_FAMILY "R5900 "
 #elif defined CONFIG_CPU_R6000
 #define MODULE_PROC_FAMILY "R6000 "
 #elif defined CONFIG_CPU_NEVADA
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 1aba27786bd5..c9431900d11f 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -1383,6 +1383,17 @@ static inline void cpu_probe_legacy(struct cpuinfo_mips *c, unsigned int cpu)
 			     MIPS_CPU_WATCH | MIPS_CPU_LLSC;
 		c->tlbsize = 48;
 		break;
+	case PRID_IMP_R5900:
+		c->cputype = CPU_R5900;
+		__cpu_name[cpu] = "R5900";
+		c->isa_level = MIPS_CPU_ISA_III;
+		c->fpu_msk31 |= FPU_CSR_CONDX;
+		c->options = MIPS_CPU_TLB | MIPS_CPU_4K_CACHE |
+			     MIPS_CPU_4KEX | MIPS_CPU_DIVEC |
+			     MIPS_CPU_FPU | MIPS_CPU_32FPR |
+			     MIPS_CPU_COUNTER;
+		c->tlbsize = 48;
+		break;
 	case PRID_IMP_NEVADA:
 		c->cputype = CPU_NEVADA;
 		__cpu_name[cpu] = "Nevada";
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index 3fe99cb271a9..0420ce8fb086 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -1192,6 +1192,20 @@ static void probe_pcache(void)
 		c->options |= MIPS_CPU_CACHE_CDEX_P | MIPS_CPU_PREFETCH;
 		break;
 
+	case CPU_R5900:
+		icache_size = 1 << (12 + ((config & CONF_IC) >> 9));
+		c->icache.linesz = 64;
+		c->icache.ways = 2;
+		c->icache.waybit = 0;
+
+		dcache_size = 1 << (12 + ((config & CONF_DC) >> 6));
+		c->dcache.linesz = 64;
+		c->dcache.ways = 2;
+		c->dcache.waybit = 0;
+
+		c->options |= MIPS_CPU_CACHE_CDEX_P | MIPS_CPU_PREFETCH;
+		break;
+
 	case CPU_TX49XX:
 		icache_size = 1 << (12 + ((config & CONF_IC) >> 9));
 		c->icache.linesz = 16 << ((config & CONF_IB) >> 5);
@@ -1465,6 +1479,17 @@ static void probe_pcache(void)
 	case CPU_R16000:
 		break;
 
+	case CPU_R5900:
+		if (c->icache.waysize > PAGE_SIZE)
+			c->dcache.flags |= MIPS_CACHE_ALIASES;
+		/*
+		 * There seems to be a missing d-cache flush which is fixed
+		 * with MIPS_CACHE_ALIASES.
+		 */
+		if (c->dcache.waysize > PAGE_SIZE)
+			c->dcache.flags |= MIPS_CACHE_ALIASES;
+		break;
+
 	case CPU_74K:
 	case CPU_1074K:
 		has_74k_erratum = alias_74k_erratum(c);
-- 
2.13.4

^ permalink raw reply related	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
@ 2017-09-09 10:13       ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-09 10:13 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> >  If you don't have documentation, but you have the hardware at hand, then 
> > you'll best check it yourself by writing a small user program that writes 
> > to CP1.FCSR and checks which bits stick (of course you need to leave the 
> > exception cause/mask bits alone for this check or you'll get SIGFPE sent 
> > instead).
> 
> Did you have something like this in mind? It prints 01000001 so the bits
> above FS do not seem to stick.
> 
> 	uint32_t fcr31;
> 	asm volatile (" cfc1 $t0,$31\n"
> 		      " lui  $t1,0xfe00\n"
> 		      " or   $t0,$t1,$t0\n"
> 	              " ctc1 $t0,$31\n"
> 	              " nop\n"
> 	              " cfc1 $t0,$31\n"
> 	              " nop\n"
> 	              " move %0,$t0\n" : "=r" (fcr31));

 NB you're missing clobbers for $t0 and $t1 here, which may cause odd 
results (since you've named these registers explicitly rather than letting 
GCC choose them via constraints).

> 	printf("fcr31 %08" PRIx32 "\n", fcr31);

 Yes, this is roughly what I had in mind, although you could have used an 
upper mask of 0xfffc to double-check the other bits.  Thanks for doing 
this check.

 I find it odd to see the FS bit set though, it shouldn't be as neither 
the kernel nor glibc startup sets it -- is it hardwired by any chance?  
If so, then it has to be reflected both in `->fpu_msk31' and in 
`->fpu_csr31', in particular for the `nofpu' mode to closely follow 
hardware (but also for some obscure corner cases where CTC1 is emulated 
even in the regular FPU operation mode).

 Can you please try flipping the bits instead then, e.g.:

	uint32_t fcsr0, fcsr1;
	asm volatile (" cfc1 %0,$31\n"
		      " lui  %1,0xfffc\n"
		      " xor  %1,%0\n"
	              " ctc1 %1,$31\n"
	              " nop\n"
	              " cfc1 %1,$31\n"
	              " ctc1 %0,$31\n"
		      : "=r" (fcsr0), "=r" (fcsr1));
	printf("FCSR old: %08" PRIx32 ", new: %08" PRIx32 "\n", fcsr0, fcsr1);

[NB there are no pipeline hazards in accessing the FCRs according to 
Section 10.2.4 "Accessing the FP Control and Implementation/Revision 
Registers" of the TX79 manual, however I've left the NOP in place as it 
won't hurt and may be needed by other hardware.]

You then effectively need to set:

	->fpu_csr31 = (old & new) & 0xfffc0000;
	->fpu_msk31 = (old ^ ~new) & 0xfffc0000;

however see examples throughout arch/mips/kernel/cpu-probe.c for how to 
use macros rather than magic numbers to express bits on the RHS.  We avoid 
run-time probing for FCSR bits to avoid unpredictable behaviour some 
hardware can show.
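
 The two assignments above can be sketched as plain C, which may help when 
turning the printed values into constants.  This is a minimal sketch only: 
`fcsr_derive', `struct fcsr_probe' and FCSR_PROBED_BITS are hypothetical 
names made up for illustration, not kernel identifiers.

```c
#include <stdint.h>

/* Upper FCSR bits exercised by the flip test (the 0xfffc mask loaded by LUI). */
#define FCSR_PROBED_BITS 0xfffc0000u

/* Hypothetical container for the two values derived from the test. */
struct fcsr_probe {
	uint32_t csr31;	/* probed bits that read as 1 and did not flip */
	uint32_t msk31;	/* probed bits that did not flip at all (read-only) */
};

/* old_val: FCSR before the flip; new_val: FCSR read back after the flip. */
static struct fcsr_probe fcsr_derive(uint32_t old_val, uint32_t new_val)
{
	struct fcsr_probe p;

	/* A writable bit flips, so it can never be 1 in both readings. */
	p.csr31 = (old_val & new_val) & FCSR_PROBED_BITS;
	/* old ^ ~new has a 1 exactly where the two readings agree. */
	p.msk31 = (old_val ^ ~new_val) & FCSR_PROBED_BITS;
	return p;
}
```

 With made-up readings of old 01000001 and new 01fc0001 (not measured on 
any hardware), this yields csr31 = 01000000 and msk31 = ff000000, i.e. 
bits 31:24 read-only with only bit 24 stuck at 1.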

> The "TX System RISC TX79 Core Architecture" manual says that both data and
> instruction caches are 32 kB, but other sources seem to contradict this with
> 8 kB for data and 16 kB for instructions. So R5900 and C790 seem to be very
> similar but not identical which could bring various surprises. Here is
> another source:
> 
> https://www.linux-mips.org/wiki/PS2

 Cache sizes may well have been an RTL option and the base architecture 
the same.  Of course it would help if we had accurate documentation, but 
as is often the case we need to live with whatever we have available.
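
 For reference, the patch derives both sizes from the CP0 Config register 
as 1 << (12 + field), so the two sets of figures correspond to different 
field values.  A minimal sketch of that decoding follows; the helper names 
are hypothetical and the field values in the note below are chosen only to 
reproduce the sizes quoted above, they have not been read from hardware.

```c
#include <stdint.h>

/* Config.IC occupies bits 11:9 and Config.DC bits 8:6, as used in the patch. */
#define CONF_IC (7u << 9)
#define CONF_DC (7u << 6)

/* Primary instruction cache size in bytes: 4 KiB << Config.IC. */
static unsigned int icache_bytes(uint32_t config)
{
	return 1u << (12 + ((config & CONF_IC) >> 9));
}

/* Primary data cache size in bytes: 4 KiB << Config.DC. */
static unsigned int dcache_bytes(uint32_t config)
{
	return 1u << (12 + ((config & CONF_DC) >> 6));
}
```

 With IC = 2 and DC = 1 this gives 16 kB and 8 kB respectively, and with 
IC = DC = 3 it gives 32 kB for both.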

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-09-11  5:18       ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-11  5:18 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

 Thank you for the updated patch.

On Sat, 2 Sep 2017, Fredrik Noring wrote:

> Signed-off-by: Fredrik Noring <noring@nocrew.org>

 Please add at least a terse description of what the change actually does.

> Here is revised patch. I've added arch/mips/mm/c-r4k.c and I'm unsure about
> 
> 	c->options |= MIPS_CPU_CACHE_CDEX_P | MIPS_CPU_PREFETCH;
> 
> but similar architectures have MIPS_CPU_CACHE_CDEX_P and R5900 has a PREF
> instruction.

 I'm assuming the R5900 is like the TX79 here.

 If adding the MIPS_CPU_PREFETCH flag, you also need to update 
`set_prefetch_parameters' (in arch/mips/mm/page.c) accordingly to have a 
case for the R5900 as it does not support the Pref_LoadStreamed and 
Pref_PrepareForStore operations the default case requires.  As this is an 
optimisation only I think the whole PREF support for the R5900 will best 
be deferred to a separate later patch.

 As to the MIPS_CPU_CACHE_CDEX_P, the manual is clear that the CPU does 
not support the Create Dirty Exclusive (Create_Dirty_Excl_D aka 0x0d) 
operation, and furthermore all the cache ops use encodings different from 
the usual ones.  So you'll have to refactor all the cache handling to 
take this into account.

> As indicated in the comment, it's not entirely clear why
> 
> 	if (c->dcache.waysize > PAGE_SIZE)
> 		c->dcache.flags |= MIPS_CACHE_ALIASES;
> 
> is necessary.

 Again, the manual is clear here:

"C790 Programming Note:

   Overlapping of the cache index bit range and PFN bit range causes the 
   "cache aliasing problem".  C790 does not have any hardware mechanisms 
   to detect the cache aliasing.  It is programmer's responsibility to 
   avoid the cache aliasing.  When a physical page is mapped on the 
   different virtual pages, VPN[13:12] have to be same in both virtual 
   address.  The conservative way to avoid this is that VPN[13:12] == 
   PFN[13:12] whenever a page is mapped."

so you need the flag indeed -- original R4000/R4400 hardware used a 
Virtual Coherency Exception Data/Instruction (VCED/VCEI) mechanism for 
alias resolution and some newer MIPS processor implementations have logic 
in hardware for that.  Obviously the TX79 (and presumably the R5900 as 
well) has neither.
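
 The arithmetic behind the `waysize > PAGE_SIZE' test can be sketched as 
follows.  This is an illustration only, assuming 4 kB pages and a 
virtually indexed cache; the helper names and SKETCH_PAGE_SIZE are 
hypothetical and not taken from the kernel.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 kB pages */

/* Bytes covered by the index of one way of a set-associative cache. */
static unsigned int cache_waysize(unsigned int size, unsigned int ways)
{
	return size / ways;
}

/* Index bits reach above the page offset, so aliases are possible. */
static bool cache_aliases(unsigned int size, unsigned int ways)
{
	return cache_waysize(size, ways) > SKETCH_PAGE_SIZE;
}

/* Number of distinct cache positions ("colours") one physical page can take. */
static unsigned int cache_colours(unsigned int size, unsigned int ways)
{
	unsigned int waysize = cache_waysize(size, ways);

	return waysize > SKETCH_PAGE_SIZE ? waysize / SKETCH_PAGE_SIZE : 1;
}
```

 A 32 kB 2-way cache has a 16 kB way and hence 4 colours, which matches 
the two bits VPN[13:12] named in the programming note.  An 8 kB 2-way 
cache has a 4 kB way and would not alias with 4 kB pages.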

> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 2828ecde133d..aec56966484b 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -1708,6 +1708,15 @@ config CPU_BMIPS
>  	help
>  	  Support for BMIPS32/3300/4350/4380 and BMIPS5000 processors.
>  
> +config CPU_R5900
> +	bool "R5900"
> +	depends on SYS_HAS_CPU_R5900
> +	select CPU_SUPPORTS_32BIT_KERNEL
> +	select CPU_SUPPORTS_64BIT_KERNEL

 I think it will make sense to defer 64-bit support until the oddities 
around it have been sorted out, see below.  So I suggest removing 
CPU_SUPPORTS_64BIT_KERNEL at first, and then you don't need to worry 
about stuff that we know will be broken until a further update.

> +	select IRQ_MIPS_CPU
> +	help
> +	  MIPS Technologies R5900 processor (Emotion Engine in Sony Playstation 2).

 Shouldn't this be Toshiba rather than MIPS Technologies?

> diff --git a/arch/mips/Makefile b/arch/mips/Makefile
> index 02a1787c888c..e8e2805a05c4 100644
> --- a/arch/mips/Makefile
> +++ b/arch/mips/Makefile
> @@ -171,6 +171,8 @@ cflags-$(CONFIG_CPU_R5432)	+= $(call cc-option,-march=r5400,-march=r5000) \
>  			-Wa,--trap
>  cflags-$(CONFIG_CPU_R5500)	+= $(call cc-option,-march=r5500,-march=r5000) \
>  			-Wa,--trap
> +cflags-$(CONFIG_CPU_R5900)	+= -march=r5900 -mtune=r5900 \
> +			-Wa,--trap -mno-llsc

 First, `-mtune=' defaults to whatever has been used with `-march=', so 
please remove it (I think we used to have both in the old days, but that 
stuff is now gone, so please follow the current rules).  If you feel 
pedantic, then to double-check you may compare binaries built with and w/o 
`-mtune=' to make sure they're the same, however TBH I think it would be a 
waste of time (especially if it turns out that GCC stores its command-line 
options somewhere in the binary produced).

 Second, given that the R5900 has no LL/SC instructions, I would expect 
`-march=r5900' to already imply it, so `-mno-llsc' should not be needed.  
Unless you want to support building with an older compiler, in which case:

cflags-$(CONFIG_CPU_R5900)	+= $(call cc-option,-march=r5900,-march=r4600 -mno-llsc) \
			-Wa,--trap

perhaps?  I.e. does the code you are going to introduce use any of the 
unusual processor-specific instructions, such as DI/EI (which have 
encodings and semantics different from their later MIPSr2 counterparts)?

> diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
> index 1aba27786bd5..c9431900d11f 100644
> --- a/arch/mips/kernel/cpu-probe.c
> +++ b/arch/mips/kernel/cpu-probe.c
> @@ -1383,6 +1383,17 @@ static inline void cpu_probe_legacy(struct cpuinfo_mips *c, unsigned int cpu)
>  			     MIPS_CPU_WATCH | MIPS_CPU_LLSC;
>  		c->tlbsize = 48;
>  		break;
> +	case PRID_IMP_R5900:
> +		c->cputype = CPU_R5900;
> +		__cpu_name[cpu] = "R5900";
> +		c->isa_level = MIPS_CPU_ISA_III;
> +		c->fpu_msk31 |= FPU_CSR_CONDX;
> +		c->options = MIPS_CPU_TLB | MIPS_CPU_4K_CACHE |
> +			     MIPS_CPU_4KEX | MIPS_CPU_DIVEC |
> +			     MIPS_CPU_FPU | MIPS_CPU_32FPR |
> +			     MIPS_CPU_COUNTER;

 As (MIPS_CPU_TLB | MIPS_CPU_4K_CACHE | MIPS_CPU_4KEX | MIPS_CPU_COUNTER)
can be shortened to R4K_OPTS, please do so.

 More importantly, I think MIPS_CPU_32FPR is not right, as it's defined 
as: "32 dbl. prec. FP registers" whereas the R5900 AFAIK only supports 
single floats, so as far as the layout of the register file is concerned 
it is similar to a MIPS32r1 FPU.

 I do hope the R5900 implements the FPU in a sane way such as the R4650 
does, that is it still implements CP0.Status.FR, to flip between 16 and 32 
single FGRs, and all the double CP1 operations, i.e. LDC1/SDC1, 
DMTC1/DMFC1, and all the double arithmetics, such as MOV.D, ADD.D, etc. 
cause an Unimplemented Operation FPE exception, which we can then emulate.  

 Of course for the kernel as far as the instruction list is concerned only 
the lack of LDC1/SDC1 will matter in that it requires special attention 
for FP context switching.  All the user stuff will be handled 
automagically by our emulator.

 NB I find it interesting that neither MIPS_CPU_32FPR nor its 
associated `cpu_has_32fpr' predicate is actually used anywhere.  So while 
it serves a CPU feature documentation purpose (for information that can be 
hard to chase sometimes), it actually is not used at run time.

 Also the processor is unusual in that although it's a legacy architecture 
implementation it does use a distinct vector for the Interrupt exception.  
However its use is hardwired and there is no CP0.Cause.IV bit to control 
it, with its CP0.Cause.23 location fixed at 0.  So I think it would make 
sense to arrange for `configure_exception_vector' not to try flipping it, 
as a microoptimisation, but more importantly to have the code express that 
we know what we're doing here.

 So I think an update along the lines of this:

	if (cpu_has_divec) {
		if (cpu_has_mipsmt) {
			/* ... */
		} else if (current_cpu_type() != CPU_R5900) {
			/*
			 * The R5900 has no Cause.IV bit and always uses
			 * the dedicated interrupt exception vector.
			 */
			set_c0_cause(CAUSEF_IV);
		}
	}

would be appropriate as a part of this patch.

 Finally, you'll need an option to indicate this is a processor that only 
does 32-bit addressing.  So o32 and n32 binaries will work, but n64 ones 
will not.  Well, not in the general case, as `-msym32' will, but that's a 
matter for the n64 ELF loader to sort out, by checking the VMA ranges 
requested (there's no MSYM32 ELF annotation yet, even though it's been 
discussed over and over again), which it currently does not.  Also for 
64-bit binaries to work correctly LLD/SCD emulation will be required.
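
 The kind of range check meant here could look roughly like the sketch 
below.  It is hypothetical throughout: `vma_fits_32bit' and USEG32_LIMIT 
are made-up names, no such check exists in the ELF loader according to the 
text above, and the 2 GB user segment limit is an assumption about the 
32-bit addressing mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: user space ends at 2 GB in the 32-bit addressing mode. */
#define USEG32_LIMIT UINT64_C(0x80000000)

/* True if [start, start + len) lies entirely below USEG32_LIMIT. */
static bool vma_fits_32bit(uint64_t start, uint64_t len)
{
	/* Compare against the remaining room so start + len cannot overflow. */
	return start <= USEG32_LIMIT && len <= USEG32_LIMIT - start;
}
```

 A mapping at 0x10000000 would pass such a check, whereas one placed above 
the 4 GB mark, or one that starts below 2 GB but extends past it, would be 
rejected.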

 So as I noted above I think it will make sense if you remove 
CPU_SUPPORTS_64BIT_KERNEL above and only include it with the patch that 
adds support for running 64-bit binaries in the 32-bit addressing mode 
(effectively the same as contemporary MIPS architecture's CP0.Status.PX=1
mode) and the complementing unusual CP0.Status.FR handling, along with a 
flag like MIPS_CPU_FR to denote the presence of the CP0.Status.FR bit, but 
not 64-bit FGRs.

 That leaves us with o32 support only, which still I suspect will not 
handle the FPU correctly, due to the lack of LDC1/SDC1 instructions, 
currently expected by our code to exist for any but MIPS I processors, and 
missing odd-numbered FGRs in the CP0.Status.FR=0 mode in the first place.

 So perhaps again let's defer that feature and start with the MIPS_CPU_FPU 
flag removed, and then add it along with proper FPU support in a later 
patch?  That patch will presumably add FP context switching support using 
LWC1/SWC1 and anything else to handle R5900's FPU peculiarities.

 Until that further patch has been applied the R5900 would then operate in 
the fully-emulated FP mode only, i.e. as if `nofpu' has been 
unconditionally selected.
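
 Taken together, the suggestions for the `->options' assignment amount to 
the sketch below: use R4K_OPTS, keep MIPS_CPU_DIVEC, and drop both 
MIPS_CPU_FPU and MIPS_CPU_32FPR for now.  The bit positions are 
placeholders chosen so that the snippet stands alone; they are not the 
values from the kernel headers, and the two function names are 
hypothetical.

```c
/* Placeholder bit positions for illustration; not the kernel's values. */
#define MIPS_CPU_TLB		(1ull << 0)
#define MIPS_CPU_4KEX		(1ull << 1)
#define MIPS_CPU_4K_CACHE	(1ull << 2)
#define MIPS_CPU_FPU		(1ull << 3)
#define MIPS_CPU_32FPR		(1ull << 4)
#define MIPS_CPU_COUNTER	(1ull << 5)
#define MIPS_CPU_DIVEC		(1ull << 6)

/* The shorthand named in the review for the four common R4000-style flags. */
#define R4K_OPTS (MIPS_CPU_TLB | MIPS_CPU_4KEX | MIPS_CPU_4K_CACHE | \
		  MIPS_CPU_COUNTER)

/* Options as written in the v2 patch. */
static unsigned long long r5900_options_v2(void)
{
	return MIPS_CPU_TLB | MIPS_CPU_4K_CACHE |
	       MIPS_CPU_4KEX | MIPS_CPU_DIVEC |
	       MIPS_CPU_FPU | MIPS_CPU_32FPR |
	       MIPS_CPU_COUNTER;
}

/* Options after the review: no FPU flags until FPU support is added. */
static unsigned long long r5900_options_reviewed(void)
{
	return R4K_OPTS | MIPS_CPU_DIVEC;
}
```

 The reviewed set is simply the v2 set with the two FPU-related flags 
cleared.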

> diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
> index 3fe99cb271a9..0420ce8fb086 100644
> --- a/arch/mips/mm/c-r4k.c
> +++ b/arch/mips/mm/c-r4k.c
> @@ -1192,6 +1192,20 @@ static void probe_pcache(void)
>  		c->options |= MIPS_CPU_CACHE_CDEX_P | MIPS_CPU_PREFETCH;
>  		break;
>  
> +	case CPU_R5900:
> +		icache_size = 1 << (12 + ((config & CONF_IC) >> 9));
> +		c->icache.linesz = 64;
> +		c->icache.ways = 2;
> +		c->icache.waybit = 0;
> +
> +		dcache_size = 1 << (12 + ((config & CONF_DC) >> 6));
> +		c->dcache.linesz = 64;
> +		c->dcache.ways = 2;
> +		c->dcache.waybit = 0;
> +
> +		c->options |= MIPS_CPU_CACHE_CDEX_P | MIPS_CPU_PREFETCH;

 The cache parameters appear correct to me and I have discussed the 
options earlier on.

> @@ -1465,6 +1479,17 @@ static void probe_pcache(void)
>  	case CPU_R16000:
>  		break;
>  
> +	case CPU_R5900:
> +		if (c->icache.waysize > PAGE_SIZE)
> +			c->dcache.flags |= MIPS_CACHE_ALIASES;
> +		/*
> +		 * There seems to be a missing d-cache flush which is fixed
> +		 * with MIPS_CACHE_ALIASES.
> +		 */
> +		if (c->dcache.waysize > PAGE_SIZE)
> +			c->dcache.flags |= MIPS_CACHE_ALIASES;
> +		break;

 Duplicate code here; otherwise OK, as noted above.  You may wish to 
update the comment though.

 Ralf may yet want to chime in, but overall I think that the way to move 
forward with your submission is to:

1. Make the adjustments to this patch I have outlined above; dropping FPU 
   and 64-bit support for the time being in particular.

2. Update cache handlers to use the correct R5900-specific cache op 
   encodings, presumably within the same patch.

3. As Ralf has already requested, add basic board support for the PS2 
   platform, including essential drivers that are required to boot, e.g. 
   serial, network; fancy stuff can be added gradually later on.

4. Post the whole set of changes collected so far, properly split into 
   functionally self-contained changes, i.e. ones that build and can
   successfully run on actual hardware, for a reasonable definition of 
   success, e.g. patch #1 is this one, patch #2 is base board setup
   infrastructure, patch #3 is interrupt support, patch #4 is timer 
   support, patch #5 is the serial driver, patch #6 is the network 
   driver, or suchlike.

We can then integrate these to have basic hardware support already working 
and only then continue with more complicated features, especially ones 
such as the FPU and 64-bit support which will require considerable updates 
to generic architecture code.

 NB if CP1.FCSR.FS indeed turns out hardwired to 1, then having this FPU 
hardware handled will be a more complex task, involving adding AT_FPUCW 
support to our MIPS port of the kernel and adjusting glibc appropriately 
before we are able to proceed.  So if this is indeed the case, then I 
think it'll be the most reasonable if we just ignore the issue of 
CP1.FCSR.FS for the time being, and have the emulated FP work with the bit 
flippable.

 Questions, comments?

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
@ 2017-09-11  5:21         ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-11  5:21 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

On Sat, 9 Sep 2017, Maciej W. Rozycki wrote:

>  Can you please try flipping the bits instead then, e.g.:
> 
> 	uint32_t fcsr0, fcsr1;
> 	asm volatile (" cfc1 %0,$31\n"
> 		      " lui  %1,0xfffc\n"

 Actually can you please substitute:

		      " li   %1,0xfffc0003\n"

here, so that we know how RM behaves?

 Again, it is odd to see it set to 1 (towards zero) by default and if it 
is hardwired, then `->fpu_csr31' and `->fpu_msk31' will have to be 
updated, AT_FPUCW exported and glibc adjusted.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-09-11  5:18       ` Maciej W. Rozycki
  (?)
@ 2017-09-11 15:17       ` Fredrik Noring
  2017-09-14 13:50           ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-09-11 15:17 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

Many thanks for your extensive review comments! I will work through them in
detail during the week. Regarding the submission:

>  Ralf may yet want to chime in, but overall I think that the way to move
> forward with your submission is to:
> 
> 1. Make the adjustments to this patch I have outlined above; dropping FPU
>    and 64-bit support for the time being in particular.
> 
> 2. Update cache handlers to use the correct R5900-specific cache op
>    encodings, presumably within the same patch.
> 
> 3. As Ralf has already requested, add basic board support for the PS2
>    platform, including essential drivers that are required to boot, e.g.
>    serial, network; fancy stuff can be added gradually later on.
> 
> 4. Post the whole set of changes collected so far, properly split into
>    functionally self-contained changes, i.e. ones that build and can run
>    successfully on actual hardware, for a reasonable definition of
>    success, e.g. patch #1 is this one, patch #2 is base board setup
>    infrastructure, patch #3 is interrupt support, patch #4 is timer
>    support, patch #5 is the serial driver, patch #6 is the network
>    driver, or suchlike.
> 
> We can then integrate these to have basic hardware support already working
> and only then continue with more complicated features, especially ones
> such as the FPU and 64-bit support which will require considerable updates
> to generic architecture code.
> 
>  NB if CP1.FCSR.FS indeed turns out hardwired to 1, then having this FPU
> hardware handled will be a more complex task, involving adding AT_FPUCW
> support to our MIPS port of the kernel and adjusting glibc appropriately
> before we are able to proceed.  So if this is indeed the case, then I
> think it'll be the most reasonable if we just ignore the issue of
> CP1.FCSR.FS for the time being, and have the emulated FP work with the bit
> flippable.
> 
>  Questions, comments?

This sounds like a good plan. I did have a few comments and questions on the
rest of the code in

https://www.linux-mips.org/archives/linux-mips/2017-08/msg00570.html

The first item in particular: The R5900 has 128 bit quadword registers which
isn't the native integer type and therefore maps poorly to pt_regs::regs[32]
in arch/mips/include/asm/ptrace.h and arch/mips/include/uapi/asm/ptrace.h.

The current patch does

struct pt_regs {
	...
	/* Saved main processor registers. */
#ifdef CONFIG_R5900_128BIT_SUPPORT
	/* Support for 128 bit. */
	r5900_reg_t regs[32];
#else
	unsigned long regs[32];
#endif

with r5900_reg_t as

typedef struct __attribute__((aligned(16))) {
	unsigned long long lo;
	unsigned long long hi;
} r5900_reg_t;

There are 300+ register reads/writes throughout arch/mips that need to be
adjusted with this change. The patch introduces a set of MIPS_READ_REG and
MIPS_WRITE_REG macros for this purpose:

/* Cast larger R5900 register to smaller 32 bit. */
#define MIPS_READ_REG_L(reg) ((unsigned long)((reg).lo))
#define MIPS_READ_REG(reg) ((reg).lo)
#define MIPS_READ_REG_HIGH(reg) ((reg).hi)
#define MIPS_READ_REG_S(reg) ((long long)(reg).lo)
#define MIPS_WRITE_REG(reg) ((reg).lo)
#define MIPS_REG_T unsigned long long

Typical use cases are

	orig31 = MIPS_READ_REG(regs->regs[31]);

and

	MIPS_WRITE_REG(regs->regs[insn.i_format.rt]) = value;
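
A minimal, self-contained sketch of how these accessors behave, assuming
the r5900_reg_t layout and macros quoted above. The `demo_regs' structure
and the `demo_move_low' helper are hypothetical names used only for this
illustration and are not part of the patch:

```c
#include <assert.h>

/* 128-bit register image, as in the r5900_reg_t quoted above. */
typedef struct __attribute__((aligned(16))) {
	unsigned long long lo;
	unsigned long long hi;
} r5900_reg_t;

#define MIPS_READ_REG(reg)	((reg).lo)
#define MIPS_READ_REG_HIGH(reg)	((reg).hi)
#define MIPS_WRITE_REG(reg)	((reg).lo)

/* Hypothetical stand-in for pt_regs, reduced to the GPR array. */
struct demo_regs {
	r5900_reg_t regs[32];
};

/* Hypothetical helper: copy the low 64 bits of GPR `from' into GPR `to'. */
static void demo_move_low(struct demo_regs *r, unsigned int to,
			  unsigned int from)
{
	assert(to < 32 && from < 32);
	MIPS_WRITE_REG(r->regs[to]) = MIPS_READ_REG(r->regs[from]);
}
```

Note that a write through MIPS_WRITE_REG only replaces the low 64 bits;
whatever was in the upper half of the destination stays there.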

Can this be improved? CONFIG_R5900_128BIT_SUPPORT is configurable but the 32
bit programs I've tested become unstable unless it's set, so something isn't
quite working without it (which may or may not be related to the registers,
since CONFIG_R5900_128BIT_SUPPORT activates some other changes as well).

At what stage in the patch series would it be appropriate to introduce
support for quadword registers, and in what form?

The rest of the notes comment on the new SYNC.P instruction, MFC0/MTC0,
short loop crashes in the memcpy/strlen family of functions, etc. Several of
these changes and workarounds are required for a stable system and would
need to be introduced in some form.

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
  2017-09-11  5:21         ` Maciej W. Rozycki
  (?)
@ 2017-09-12 17:59         ` Fredrik Noring
  2017-09-15 11:12             ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-09-12 17:59 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

> >  Can you please try flipping the bits instead then, e.g.:
> > 
> > 	uint32_t fcsr0, fcsr1;
> > 	asm volatile (" cfc1 %0,$31\n"
> > 		      " lui  %1,0xfffc\n"
> 
>  Actually can you please substitute:
> 
> 		      " li   %1,0xfffc0003\n"
> 
> here, so that we know how RM behaves?

Sure. I get "FCSR old: 01000001, new: 01800001" with the R5900.
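
One way to read such a result is to compute which of the tested bits did
not change. The sketch below is only an illustration run on the host, not
on the R5900; the function name `fcsr_stuck_bits' is hypothetical:

```c
#include <stdint.h>

/*
 * Given the value read before the test, the XOR mask applied and the
 * value read back afterwards, return the mask bits that did not flip.
 */
static uint32_t fcsr_stuck_bits(uint32_t old, uint32_t mask,
				uint32_t new_val)
{
	uint32_t flipped = (old ^ new_val) & mask;

	return mask & ~flipped;
}
```

For old = 0x01000001, mask = 0xfffc0003 and new = 0x01800001 this returns
0xff7c0003, i.e. every tested bit except bit 23 kept its value, including
bit 24 and bits 1:0 (FS and RM in the standard MIPS FCSR layout).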

>  Again, it is odd to see it set to 1 (towards zero) by default and if it 
> is hardwired, then `->fpu_csr31' and `->fpu_msk31' will have to be 
> updated, AT_FPUCW exported and glibc adjusted.

Right. Quite a few details to resolve for the FPU then. Here is the
disassembly to double-check the compiled code:

004001c0 <main>:
  4001c0:	3c1c0043 	lui	gp,0x43
  4001c4:	27bdffe0 	addiu	sp,sp,-32
  4001c8:	279c9470 	addiu	gp,gp,-27536
  4001cc:	afbf001c 	sw	ra,28(sp)
  4001d0:	afbc0010 	sw	gp,16(sp)
  4001d4:	4445f800 	cfc1	a1,$31
  4001d8:	3c06fffc 	lui	a2,0xfffc
  4001dc:	34c60003 	ori	a2,a2,0x3
  4001e0:	00c53026 	xor	a2,a2,a1
  4001e4:	44c6f800 	ctc1	a2,$31
  4001e8:	00000000 	nop
  4001ec:	4446f800 	cfc1	a2,$31
  4001f0:	44c5f800 	ctc1	a1,$31
  4001f4:	8f9980f4 	lw	t9,-32524(gp)
  4001f8:	3c040041 	lui	a0,0x41
  4001fc:	04110094 	bal	400450 <__GI_printf>
  400200:	2484f720 	addiu	a0,a0,-2272
  400204:	8fbf001c 	lw	ra,28(sp)
  400208:	00001021 	move	v0,zero
  40020c:	03e00008 	jr	ra
  400210:	27bd0020 	addiu	sp,sp,32

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-09-14 13:50           ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-14 13:50 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> This sounds like a good plan. I did have a few comments and questions on the
> rest of the code in
> 
> https://www.linux-mips.org/archives/linux-mips/2017-08/msg00570.html

 I'll go through and try to address those questions one by one separately.

> Can this be improved? CONFIG_R5900_128BIT_SUPPORT is configurable but the 32
> bit programs I've tested become unstable unless it's set, so something isn't
> quite working without it (which may or may not be related to the registers,
> since CONFIG_R5900_128BIT_SUPPORT activates some other changes as well).

 For the initial R5900 support I think there are two options here, 
depending on what hardware supports:

1. If (for binary compatibility reasons) 128-bit GPR support can somehow 
   be disabled in hardware, by flipping a CP0 register bit or suchlike, 
   then I suggest doing that in the first stage.

2. Otherwise I think that the context initialisation/switch code has to be 
   adjusted such that the upper GPR halves are set to a known state, 
   either zeroed or sign-extended from bit #63 (or #31 really, given the 
   initial 32-bit port only) according to hardware requirements, so as to
   make execution stable and prevent data from leaking between contexts.
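
A minimal sketch of option 2, assuming that sign-extension from bit #31
through all 128 bits is the state the hardware wants (it may equally
turn out that the upper half should be zeroed). The `gpr128' type and
the helper name are hypothetical:

```c
#include <stdint.h>

/* Hypothetical 128-bit GPR image: two 64-bit halves. */
struct gpr128 {
	uint64_t lo;
	uint64_t hi;
};

/* Sign-extend a 32-bit value from bit #31 through all 128 bits. */
static struct gpr128 gpr128_from_s32(uint32_t value)
{
	struct gpr128 r;

	if (value & 0x80000000u) {
		r.lo = 0xffffffff00000000ull | value;
		r.hi = UINT64_MAX;
	} else {
		r.lo = value;
		r.hi = 0;
	}
	return r;
}
```

Applying something like this to every GPR on context initialisation and
switch would put the upper bits into a known state and keep another
context's data from showing through.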

Later on proper 128-bit support can be added, though for that to make 
sense you need to have compiler support too, which AFAICT is currently 
missing.  Myself I'd rather defer commenting on that further support until 
we get to it, although of course someone else might be willing to sketch 
an idea.

> At what stage in the patch series would it be appropriate to introduce
> support for quadword registers, and in what form?

 Well, I think we need to stabilise base 32-bit and then 64-bit support 
first.  So that'll be a separate patch or patch set to consider at that 
point.

> The rest of the notes comment on the new SYNC.P instruction, MFC0/MTC0,
> short loop crashes in the memcpy/strlen family of functions, etc. Several of
> these changes and workarounds are required for a stable system and would
> need to be introduced in some form.

 The exception handler workarounds should be easy to implement as we 
generate machine code for them at run-time, so inserting a pair of NOPs 
should be straightforward while not affecting any other target.  As a 
solution addressing a grave hardware erratum this obviously has to be 
included with your initial patch set, as a separate change because it's 
self-contained.

 Any other workarounds are handled via <asm/war.h>.  Those which are 
needed for correct initial 32-bit operation need to go with the initial 
patch set as well, one change per issue.

 If SYNC acts as a hazard barrier for MFC0/MTC0, etc., then it can be 
substituted for EHB/JR.HB where applicable.  We have infrastructure for 
that in <asm/hazards.h> already, so you just need to hook in.  You may 
have to expand that header of course if in the R5900 there are hazards not 
already covered by EHB/JR.HB for other processors.  These will need to go 
with the initial patch set too.

 And last but not least I suggest structuring your initial patch set such 
that the commit containing Kconfig changes to enable R5900/PS2 comes last, 
so that once that last patch goes in you can build a kernel that boots and 
correctly works on actual hardware.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
@ 2017-09-15 11:12             ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-15 11:12 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> > >  Can you please try flipping the bits instead then, e.g.:
> > > 
> > > 	uint32_t fcsr0, fcsr1;
> > > 	asm volatile (" cfc1 %0,$31\n"
> > > 		      " lui  %1,0xfffc\n"
> > 
> >  Actually can you please substitute:
> > 
> > 		      " li   %1,0xfffc0003\n"
> > 
> > here, so that we know how RM behaves?
> 
> Sure. I get "FCSR old: 01000001, new: 01800001" with the R5900.

 Thanks, that is as I suspected then.

 I wonder if FS=1 hardwired also means the Underflow exception cannot 
happen.  As the corresponding Cause and Enable bits cannot be set together 
or an FPE exception will happen right away, and the Unimplemented 
Operation exception is unconditional so we need to leave it out, can you 
please also try these masks in turns:

	      " li   %1,0x0001f07c\n"

and:

	      " li   %1,0x00000f80\n"

This will reveal if any of the Cause, Enable or Flag bits are hardwired.

> >  Again, it is odd to see it set to 1 (towards zero) by default and if it 
> > is hardwired, then `->fpu_csr31' and `->fpu_msk31' will have to be 
> > updated, AT_FPUCW exported and glibc adjusted.
> 
> Right. Quite a few details to resolve for the FPU then. Here is the
> disassembly to double-check the compiled code:

 Nothing unusual here.  As you can see GCC has been smart enough to 
schedule temporaries right in argument registers passed to the `printf' 
call. :)

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
  2017-09-15 11:12             ` Maciej W. Rozycki
  (?)
@ 2017-09-15 13:19             ` Fredrik Noring
  2017-09-15 18:28                 ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-09-15 13:19 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

>  I wonder if FS=1 hardwired also means the Underflow exception cannot 
> happen.  As the corresponding Cause and Enable bits cannot be set together 
> or an FPE exception will happen right away, and the Unimplemented 
> Operation exception is unconditional so we need to leave it out, can you 
> please also try these masks in turns:
> 
> 	      " li   %1,0x0001f07c\n"
> 
> and:
> 
> 	      " li   %1,0x00000f80\n"
> 
> This will reveal if any of the Cause, Enable or Flag bits are hardwired.

The result is:

	FCSR 0x0001f07c old: 01000001, new: 0101c079
	FCSR 0x00000f80 old: 01000001, new: 01000001
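
The two readouts can be split by field with a small host-side sketch,
assuming the standard MIPS FCSR layout (Cause in bits 17:12, Enables in
bits 11:7, Flags in bits 6:2). The macro, structure and function names
are hypothetical:

```c
#include <stdint.h>

/* Standard MIPS FCSR field masks (hypothetical names for this sketch). */
#define DEMO_FCSR_CAUSE		0x0003f000u	/* bits 17:12 */
#define DEMO_FCSR_ENABLES	0x00000f80u	/* bits 11:7 */
#define DEMO_FCSR_FLAGS		0x0000007cu	/* bits 6:2 */

/* Bits that changed between two FCSR readouts, grouped by field. */
struct fcsr_flips {
	uint32_t cause;
	uint32_t enables;
	uint32_t flags;
};

static struct fcsr_flips fcsr_split_flips(uint32_t old, uint32_t new_val)
{
	uint32_t diff = old ^ new_val;
	struct fcsr_flips f = {
		.cause = diff & DEMO_FCSR_CAUSE,
		.enables = diff & DEMO_FCSR_ENABLES,
		.flags = diff & DEMO_FCSR_FLAGS,
	};

	return f;
}
```

For the first line (old 01000001, new 0101c079) this gives Cause bits
0x0001c000, Flag bits 0x00000078 and no Enable bits; for the second line
nothing changed at all.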

I was looking for information on GCC for R5900 and found

https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00658.html

where you and Jürgen Urban discuss this topic. Jürgen cites some FPU details
from the Emotion Engine core user's manual that is very helpful, in addition
to mentioning TX79 differences.

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH] MIPS: Add basic R5900 support
@ 2017-09-15 18:28                 ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-15 18:28 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> >  I wonder if FS=1 hardwired also means the Underflow exception cannot 
> > happen.  As the corresponding Cause and Enable bits cannot be set together 
> > or an FPE exception will happen right away, and the Unimplemented 
> > Operation exception is unconditional so we need to leave it out, can you 
> > please also try these masks in turns:
> > 
> > 	      " li   %1,0x0001f07c\n"
> > 
> > and:
> > 
> > 	      " li   %1,0x00000f80\n"
> > 
> > This will reveal if any of the Cause, Enable or Flag bits are hardwired.
> 
> The result is:
> 
> 	FCSR 0x0001f07c old: 01000001, new: 0101c079
> 	FCSR 0x00000f80 old: 01000001, new: 01000001

 This looks unusual and inconsistent in that only the V, Z and O Cause 
bits appear settable, the same Flag bits and also the I Flag bit do, and 
no Enable bits do.  Given Jürgen's observations in the discussion you 
referred to below I would expect the I Flag bit not to be settable 
either; perhaps it's a hardware erratum.

> I was looking for information on GCC for R5900 and found
> 
> https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00658.html
> 
> where you and Jürgen Urban discuss this topic. Jürgen cites some FPU details
> from the Emotion Engine core user's manual that is very helpful, in addition
> to mentioning TX79 differences.

 Thanks for the reference, I did remember I had the discussion, but didn't 
recall the details, although I had a vague recollection about instruction 
encoding differences.

 Given the situation I think we'll have to stick with full FPU emulation 
for regular MIPS/Linux user programs, and then possibly have an ELF ABI 
flag of sorts to mark software requesting running in the R5900 hard-float 
mode (which obviously won't be able to use standard `libm', etc.); we can 
think of doing it in a way to keep binary compatibility with existing PS2 
software, should this be a concern.

 Tasks run in the R5900 hard-float mode would then have our FPU emulator 
strapped for pass-through operation, i.e. the CpU exception and context 
switching would work normally, however any FPE exception, given the 
findings above about FCSR possibly including Unimplemented Operation only, 
would just throw SIGFPE, letting the userland handle it if desired.  
You'd need a new `si_code' of course for Unimplemented Operation; or maybe 
not even that, because as vague as Jürgen's notes are they seem to suggest 
the R5900 may not actually trap with FPE ever.

 This also means you only want FPU_CSR_CONDX in `c->fpu_msk31' (for the 
full FPU emulation) as with an ordinary MIPS III processor.
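
 A rough sketch of the effect, assuming `->fpu_msk31' is treated as a 
mask of FCSR bits that an emulated write must leave unchanged.  The 
names below are hypothetical and the code is not taken from the kernel:

```c
#include <stdint.h>

/* FCC1..FCC7, bits 31:25 of FCSR (hypothetical name for this sketch). */
#define DEMO_FCSR_CONDX	0xfe000000u

/*
 * Emulated CTC1 to FCSR: bits set in `ro_mask' keep their current
 * value, all other bits take the newly written value.
 */
static uint32_t demo_fcsr_write(uint32_t cur, uint32_t value,
				uint32_t ro_mask)
{
	return (value & ~ro_mask) | (cur & ro_mask);
}
```

 With only the extra condition bits in the mask, writing all-ones over 
01000001 yields 01ffffff: bits 31:25 stay clear and everything else is 
writable by the emulated program.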

 NB, I think the issue with RDHWR emulation to access CP0.UserLocal 
mentioned in the discussion referred to will have to be addressed with the 
initial submission as well.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-09-14 13:50           ` Maciej W. Rozycki
  (?)
@ 2017-09-16 13:34           ` Fredrik Noring
  2017-09-18 17:05               ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-09-16 13:34 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

>  For the initial R5900 support I think there are two options here, 
> depending on what hardware supports:
> 
> 1. If (for binary compatibility reasons) 128-bit GPR support can somehow 
>    be disabled in hardware, by flipping a CP0 register bit or suchlike, 
>    then I suggest doing that in the first stage.

Unfortunately I haven't found such a switch. There is also a set of 128-bit
multimedia instructions to consider (GCC is perhaps unlikely to generate
those but assembly code is an option too).

> 2. Otherwise I think that the context initialisation/switch code has to be 
>    adjusted such that the upper GPR halves are set to a known state, 
>    either zeroed or sign-extended from bit #63 (or #31 really, given the 
>    initial 32-bit port only) according to hardware requirements, so as to
>    make execution stable and prevent data from leaking between contexts.
> 
> Later on proper 128-bit support can be added, though for that to make 
> sense you need to have compiler support too, which AFAICT is currently 
> missing.  Myself I'd rather defer commenting on that further support until 
> we get to it, although of course someone else might be willing to sketch 
> an idea.

I have a working 32-bit kernel now, except that BusyBox randomly crashes
unless the kernel saves/restores 64-bit GPRs. The executables and libraries
declare "ELF 32-bit LSB, MIPS, MIPS-III version 1" so in theory, I suppose,
they ought to be 32-bit only. It is possible that the error lies in the
kernel handling of the GPRs but I have double-checked this in several ways.

The error, as it appears, is nasty for at least two reasons: it occurs
randomly (when the kernel arbitrarily resets the upper 96 bits of all GPRs)
and it can easily remain undetected and lead to silent data corruption.

Are there other Linux MIPS implementations that reset GPRs like this?

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-09-18 17:05               ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-18 17:05 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> >  For the initial R5900 support I think there are two options here, 
> > depending on what hardware supports:
> > 
> > 1. If (for binary compatibility reasons) 128-bit GPR support can somehow 
> >    be disabled in hardware, by flipping a CP0 register bit or suchlike, 
> >    then I suggest doing that in the first stage.
> 
> Unfortunately I haven't found such a switch. There is also a set of 128-bit
> multimedia instructions to consider (GCC is perhaps unlikely to generate
> those but assembly code is an option too).

 The usual minimal approach is to have compiler intrinsics implemented.

> > 2. Otherwise I think that the context initialisation/switch code has to be 
> >    adjusted such that the upper GPR halves are set to a known state, 
> >    either zeroed or sign-extended from bit #63 (or #31 really, given the 
> >    initial 32-bit port only) according to hardware requirements, so as to
> >    make execution stable and prevent data from leaking between contexts.
> > 
> > Later on proper 128-bit support can be added, though for that to make 
> > sense you need to have compiler support too, which AFAICT is currently 
> > missing.  Myself I'd rather defer commenting on that further support until 
> > we get to it, although of course someone else might be willing to sketch 
> > an idea.
> 
> I have a working 32-bit kernel now, except that BusyBox randomly crashes
> unless the kernel saves/restores 64-bit GPRs. The executables and libraries
> declare "ELF 32-bit LSB, MIPS, MIPS-III version 1" so in theory, I suppose,
> they ought to be 32-bit only. It is possible that the error lies in the
> kernel handling of the GPRs but I have double-checked this in several ways.

 Virtually all 64-bit MIPS processors have the CP0.Status.UX bit, which 
the Linux kernel keeps clear for o32 processes (CP0.Status.PX is currently 
unsupported and is kept clear as well), which means that an attempt to use 
any instruction that affects register bits beyond bit #31 will cause a 
Reserved Instruction exception, and in turn SIGILL being sent to the 
program.  

 So any crash caused by the lack of handling of the upper bits is a result 
of either a kernel bug or an issue with hardware.

> The error, as it appears, is nasty for at least two reasons: it occurs
> randomly (when the kernel arbitrarily resets the upper 96 bits of all GPRs)
> and it can easily remain undetected and lead to silent data corruption.

 Hmm, can you verify that no LWU instruction is present in the kernel 
somewhere?

 Can you add a diagnostic consistency check to the context restoration 
code, i.e. all the macros called from RESTORE_ALL (in <asm/stackframe.h>), 
such as a `break 12' (BRK_BUG) instruction if a register value is not 
correctly sign-extended?  You can instead use one of the register trap 
instructions (with the same BRK_BUG code), to avoid the need for a branch.  
Make sure you don't clobber registers restored; you may have to use $k0 or 
$k1 in places.  This will cause a kernel oops, which can then be examined 
to track down a possible cause.
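
 The condition to test can be written down in C as follows.  This is 
only a host-side model of the check with a hypothetical function name; 
the actual diagnostic would have to be assembly in the restore macros:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if bits 63:32 of `reg' all equal bit 31. */
static bool gpr_is_sign_extended32(uint64_t reg)
{
	uint64_t upper = reg >> 31;	/* bit 31 and everything above */

	return upper == 0 || upper == 0x1ffffffffull;
}
```

 A register image for which this returns false is what should trigger 
the BRK_BUG trap.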

 GAS will prevent the use of any 64-bit instructions (which LWU is one of) 
when the o32 ABI has been selected for assembly, however it can be 
temporarily overridden by `.set' pseudo-ops, and also I haven't verified 
if there isn't an issue with `-march=r5900' in GAS.

> Are there other Linux MIPS implementations that reset GPRs like this?

 No, because keeping CP0.Status.UX clear guarantees that only instructions 
which sign-extend register results from bit #31 can be used.

 Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-09-18 17:05               ` Maciej W. Rozycki
  (?)
@ 2017-09-18 19:24               ` Fredrik Noring
  2017-09-19 12:44                   ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-09-18 19:24 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

> > Unfortunately I haven't found such a switch. There is also a set of 128-bit
> > multimedia instructions to consider (GCC is perhaps unlikely to generate
> > those but assembly code is an option too).
> 
>  The usual minimal approach is to have compiler intrinsics implemented.

I sometimes make less than minimal programs, or unusual ones, or both. ;)

>  Virtually all 64-bit MIPS processors have the CP0.Status.UX bit, which 
> the Linux kernel keeps clear for o32 processes (CP0.Status.PX is currently 
> unsupported and is kept clear as well), which means that an attempt to use 
> any instruction that affects register bits beyond bit #31 will cause a 
> Reserved Instruction exception, and in turn SIGILL being sent to the 
> program.  

Would UX be bit 5 of CP0.Status? That bit is hardwired to naught according
to the TX79 manual (p. 4-16).

>  So any crash caused by the lack of handling of the upper bits is a result 
> of either a kernel bug or an issue with hardware.

R5900 does not seem to implement UX, SX, KX or PX.

>  Hmm, can you verify that no LWU instruction is present in the kernel 
> somewhere?

Sure, "mipsel-linux-objdump -d vmlinux | grep lwu" yields nil.

>  Can you add a diagnostic consistency check to the context restoration 
> code, i.e. all the macros called from RESTORE_ALL (in <asm/stackframe.h>), 
> such as a `break 12' (BRK_BUG) instruction if a register value is not 
> correctly sign-extended?  You can instead use one of the register trap 
> instructions (with the same BRK_BUG code), to avoid the need for a branch.  
> Make sure you don't clobber registers restored; you may have to use $k0 or 
> $k1 in places.  This will cause a kernel oops, which can then be examined 
> to track down a possible cause.

Given the R5900 patch I believe this can be done somewhat simpler, since
register access macros have been implemented in C (in this way the physical
registers become in some sense separated from the logical registers in the
kernel).

The transition from 128-bit registers to 64-bit registers was easy (in a
32-bit kernel) by changing the LONGD_{L,S} macros in asm.h from quadword
{L,S}Q to doubleword {L,S}D instructions, and changing pt_regs::regs[32]
from r5900_reg_t to unsigned long long. (The patch replaces LONG_* with
three variants: LONGD_*, LONGH_* and LONGI_*. It also forces LD and SD
via a ".set push/.set mips3/.set pop" combination like you outline below.)

The patch has full 64-bit registers accessible in C too, which is why I
propose to do the diagnostic consistency check in C. (Macros truncate to
32 bits everywhere in the kernel except for save/restore.)

>  GAS will prevent the use of any 64-bit instructions (which LWU is one of) 
> when the o32 ABI has been selected for assembly, however it can be 
> temporarily overridden by `.set' pseudo-ops, and also I haven't verified 
> if there isn't an issue with `-march=r5900' in GAS.

I'm using a BusyBox binary from the Debian-based Black Rhino distribution,
so I'm not entirely sure how it was compiled, and it might contain 64-bit
instructions that are not caught by the (unavailable) UX bit.

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-09-19 12:44                   ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-19 12:44 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> >  Virtually all 64-bit MIPS processors have the CP0.Status.UX bit, which 
> > the Linux kernel keeps clear for o32 processes (CP0.Status.PX is currently 
> > unsupported and is kept clear as well), which means that an attempt to use 
> > any instruction that affects register bits beyond bit #31 will cause a 
> > Reserved Instruction exception, and in turn SIGILL being sent to the 
> > program.  
> 
> Would UX be bit 5 of CP0.Status? That bit is hardwired to naught according
> to the TX79 manual (p. 4-16).

 Yes, bit #5.
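
 As a rough illustration of how a hardwired bit shows up to software, 
here is a small user-space model in C.  Everything in it is an assumption 
made for the sketch: the `status_model' structure and helpers are 
hypothetical, and the masks are the Status bit positions as I understand 
them (KX bit #7, SX bit #6, UX bit #5).  On real hardware the same 
write-and-read-back probe would go through the kernel's CP0 Status 
accessors instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status bit masks assumed for this sketch. */
#define ST0_UX 0x00000020u	/* bit #5 */
#define ST0_SX 0x00000040u	/* bit #6 */
#define ST0_KX 0x00000080u	/* bit #7 */

/* Hypothetical model of CP0.Status: bits outside `writable' read as zero. */
struct status_model {
	uint32_t value;
	uint32_t writable;
};

static void status_write(struct status_model *s, uint32_t v)
{
	s->value = v & s->writable;
}

/* Try to set a bit, see whether it sticks, then restore the old value. */
static bool status_bit_implemented(struct status_model *s, uint32_t bit)
{
	uint32_t saved = s->value;
	bool sticks;

	status_write(s, saved | bit);
	sticks = (s->value & bit) != 0;
	status_write(s, saved);
	return sticks;
}
```

 With `writable' lacking UX, SX and KX the probe reports all three as 
unimplemented, which is the behaviour the TX79 manual describes for the 
R5900.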

> >  Can you add a diagnostic consistency check to the context restoration 
> > code, i.e. all the macros called from RESTORE_ALL (in <asm/stackframe.h>), 
> > such as a `break 12' (BRK_BUG) instruction if a register value is not 
> > correctly sign-extended?  You can instead use one of the register trap 
> > instructions (with the same BRK_BUG code), to avoid the need for a branch.  
> > Make sure you don't clobber registers restored; you may have to use $k0 or 
> > $k1 in places.  This will cause a kernel oops, which can then be examined 
> > to track down a possible cause.
> 
> Given the R5900 patch I believe this can be done somewhat simpler, since
> register access macros have been implemented in C (in this way the physical
> registers become in some sense separated from the logical registers in the
> kernel).
> 
> The transition from 128-bit registers to 64-bit registers was easy (in a
> 32-bit kernel) by changing the LONGD_{L,S} macros in asm.h from quadword
> {L,S}Q to doubleword {L,S}D instructions, and changing pt_regs::regs[32]
> from r5900_reg_t to unsigned long long.

 But why did you have to change anything there in the first place?  All 
that's there is generic stuff.

> (The patch replaces LONG_* with
> three variants: LONGD_*, LONGH_* and LONGI_*. It also forces LD and SD
> via a ".set push/.set mips3/.set pop" combination like you outline below.)

 I don't remember suggesting anything like that.

> The patch has full 64-bit registers accessible in C too, which is why I
> propose to do the diagnostic consistency check in C. (Macros truncate to
> 32 bits everywhere in the kernel except for save/restore.)

 You need to figure out the semantics of 128-bit registers and describe them 
in detail (to be provided in the relevant commit's description), in 
particular any interaction 32-bit and 64-bit instructions have with the 
upper 64-bit half, before we can accept any change to support these 
extended registers.
  
 Barring evidence otherwise I think updating macros in <asm/asm.h> is not 
enough, because our syscalls rely on the standard MIPS psABI's calling 
convention and call-saved registers will only be saved and restored on an 
as-needed basis, in the prologue/epilogue of any kernel's C functions that 
actually use them.  And GCC will only save and restore call-saved 
registers using regular 32-bit or 64-bit operations, according to the ABI 
the kernel has been compiled for.  So if there's a need to preserve the 
upper 64-bit halves, then it has to be done explicitly, possibly in an 
extra <asm/stackframe.h> macro.

 But all that is something for a later stage; right now I suggest that you 
figure out what's causing registers to become clobbered and fix it there.

> >  GAS will prevent the use of any 64-bit instructions (which LWU is one of) 
> > when the o32 ABI has been selected for assembly, however it can be 
> > temporarily overridden by `.set' pseudo-ops, and also I haven't verified 
> > if there isn't an issue with `-march=r5900' in GAS.
> 
> I'm using a BusyBox binary from the Debian-based Black Rhino distribution,
> so I'm not entirely sure how it was compiled, and it might contain 64-bit
> instructions that are not caught by the (unavailable) UX bit.

 Use `file' or `readelf -h' on the BusyBox binary to double-check the ABI 
it has been built for.  Although I doubt there will be issues with the 
executable, as it would crash on any of the other MIPS processors which 
implement the 32-bit mode correctly.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-09-18 17:05               ` Maciej W. Rozycki
  (?)
  (?)
@ 2017-09-20 14:07               ` Fredrik Noring
  2017-09-21 21:07                   ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-09-20 14:07 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

>  Can you add a diagnostic consistency check to the context restoration 
> code, i.e. all the macros called from RESTORE_ALL (in <asm/stackframe.h>), 
> such as a `break 12' (BRK_BUG) instruction if a register value is not 
> correctly sign-extended?

Hmm... I think some details need to be sorted out for this. The LW
instruction used to restore registers sign-extends to register length by
definition (p. A-70 in the TX79 manual), so I assume that isn't what we
are going to check unless we suspect a grave hardware error with LW? (Do
we need to check the register values immediately prior to LW?)

Another possibility would be to check that saved registers in SAVE_ALL
will be restored properly. That is, immediately after SW check that LW
(to a temporary register such as k1) will restore to the same value by
64-bit comparison and trap if unequal (TNE). I thought that made sense.
Something like for example

	sw	$17, PT_R17(sp)
	lw	k1, PT_R17(sp)
	tne	k1, $17, 12

as a replacement for

	LONG_S	$17, PT_R17(sp)

in SAVE_STATIC?
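
To spell out what that three-instruction sequence would catch, here is a
small C model. It is only a sketch with made-up helper names, and it models
a register as 64 bits (the width TNE compares), not the full 128-bit GPR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of SW: only the low 32 bits of the register reach memory. */
static uint32_t model_sw(uint64_t gpr)
{
	return (uint32_t)gpr;
}

/* Model of LW: the loaded word is sign-extended to register width. */
static uint64_t model_lw(uint32_t word)
{
	return (uint64_t)(int64_t)(int32_t)word;
}

/* Model of "tne k1, reg, 12": true if the trap would be taken. */
static bool model_save_check_traps(uint64_t gpr)
{
	return model_lw(model_sw(gpr)) != gpr;
}
```

The trap fires exactly when the register held something that a 32-bit
save and restore cannot reproduce, so it would flag the value at the moment
it is saved rather than when it is restored.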

A question is whether registers are clobbered within the kernel itself
(via interrupts or some such) or for user programs.

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-09-19 12:44                   ` Maciej W. Rozycki
  (?)
@ 2017-09-20 14:54                   ` Fredrik Noring
  2017-09-26 11:50                       ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-09-20 14:54 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

> > Given the R5900 patch I believe this can be done somewhat simpler, since
> > register access macros have been implemented in C (in this way the physical
> > registers become in some sense separated from the logical registers in the
> > kernel).
> > 
> > The transition from 128-bit registers to 64-bit registers was easy (in a
> > 32-bit kernel) by changing the LONGD_{L,S} macros in asm.h from quadword
> > {L,S}Q to doubleword {L,S}D instructions, and changing pt_regs::regs[32]
> > from r5900_reg_t to unsigned long long.
> 
>  But why did you have to change anything there in the first place?  All 
> that's there is generic stuff.

The 128-bit register save/restore infrastructure is part of the original
2.6.35 patch for R5900 support, that I ported to 4.12 and we're about to
disentangle. The original patch crashes similarly unless full 128-bit GPRs
are handled in 32-bit or 64-bit kernels, so this particular issue appears
to remain intact. Hopefully we will be able to figure out the cause and fix it.

> > (The patch replaces LONG_* with
> > three variants: LONGD_*, LONGH_* and LONGI_*. It also forces LD and SD
> > via a ".set push/.set mips3/.set pop" combination like you outline below.)
> 
>  I don't remember suggesting anything like that.

Well, I was referring to similar use of the .set pseudo-op.

> > The patch has full 64-bit registers accessible in C too, which is why I
> > propose to do the diagnostic consistency check in C. (Macros truncate to
> > 32 bits everywhere in the kernel except for save/restore.)
> 
>  You need to figure out the semantics of 128-bit registers and describe them 
> in detail (to be provided in the relevant commit's description), in 
> particular any interaction 32-bit and 64-bit instructions have with the 
> upper 64-bit half, before we can accept any change to support these 
> extended registers.
>   
>  Barring evidence otherwise I think updating macros in <asm/asm.h> is not 
> enough, because our syscalls rely on the standard MIPS psABI's calling 
> convention and call-saved registers will only be saved and restored on an 
> as-needed basis, in the prologue/epilogue of any kernel's C functions that 
> actually use them.  And GCC will only save and restore call-saved 
> registers using regular 32-bit or 64-bit operations, according to the ABI 
> the kernel has been compiled for.  So if there's a need to preserve the 
> upper 64-bit halves, then it has to be done explicitly, possibly in an 
> extra <asm/stackframe.h> macro.
> 
>  But all that is something for a later stage; right now I suggest that you 
> figure out what's causing registers to become clobbered and fix it there.

My thinking was simply that we could try to use the (already patched up)
128-bit infrastructure, that seems to work quite well, as a debug tool in
this particular case. I understand that merging it is another matter.

> > I'm using a BusyBox binary from the Debian-based Black Rhino distribution,
> > so I'm not entirely sure how it was compiled, and it might contain 64-bit
> > instructions that are not caught by the (unavailable) UX bit.
> 
>  Use `file' or `readelf -h' on the BusyBox binary to double-check the ABI 
> it has been built for.  Although I doubt there will be issues with the 
> executable, as it would crash on any of the other MIPS processors which 
> implement the 32-bit mode correctly.

As previously mentioned all binaries I've tested so far declare "ELF 32-bit
LSB, MIPS, MIPS-III version 1" with file. mipsel-linux-readelf says

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           MIPS R3000
  Version:                           0x1
  Entry point address:               0x402d90
  Start of program headers:          52 (bytes into file)
  Start of section headers:          276156 (bytes into file)
  Flags:                             0x20920003, noreorder, pic, unknown CPU, mips3
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           40 (bytes)
  Number of section headers:         25
  Section header string table index: 24

but it's unclear to me whether it's generic or somehow tailored for R5900.
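
One way to look closer is to split the Flags word into its fields by hand.
The sketch below does that in C. The constant values are quoted from memory
of the binutils ELF MIPS definitions and should be treated as assumptions to
verify against the headers; the helper names are made up for illustration.

```c
#include <stdint.h>

/* e_flags fields for MIPS, values assumed from binutils' ELF definitions. */
#define EF_MIPS_NOREORDER	0x00000001u
#define EF_MIPS_PIC		0x00000002u
#define EF_MIPS_MACH		0x00ff0000u	/* machine variant field */
#define EF_MIPS_ARCH		0xf0000000u	/* ISA level field */
#define E_MIPS_ARCH_3		0x20000000u	/* MIPS III */
#define E_MIPS_MACH_5900	0x00920000u	/* assumed: R5900 */

/* Hypothetical helper: extract the ISA level field. */
static uint32_t elf_mips_arch(uint32_t e_flags)
{
	return e_flags & EF_MIPS_ARCH;
}

/* Hypothetical helper: extract the machine variant field. */
static uint32_t elf_mips_mach(uint32_t e_flags)
{
	return e_flags & EF_MIPS_MACH;
}
```

Applied to 0x20920003 this gives an ISA field of 0x20000000 (mips3, as
readelf reports) and a machine field of 0x00920000. If the assumed value for
the R5900 is right, the binary was marked for this CPU and the "unknown CPU"
text only means that this readelf does not know the code.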

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-09-21 18:11                 ` Paul Burton
  0 siblings, 0 replies; 117+ messages in thread
From: Paul Burton @ 2017-09-21 18:11 UTC (permalink / raw)
  To: Maciej W. Rozycki, Fredrik Noring; +Cc: linux-mips

Hi Maciej/Fredrik,

On Monday, 18 September 2017 10:05:56 PDT Maciej W. Rozycki wrote:
> Hi Fredrik,
> 
> > >  For the initial R5900 support I think there are two options here,
> > > depending on what hardware supports:
> > > 
> > > 1. If (for binary compatibility reasons) 128-bit GPR support can somehow
> > >    be disabled in hardware, by flipping a CP0 register bit or suchlike,
> > >    then I suggest doing that in the first stage.
> > 
> > Unfortunately I haven't found such a switch. There is also a set of 128-bit
> > multimedia instructions to consider (GCC is perhaps unlikely to generate
> > those but assembly code is an option too).
> 
>  The usual minimal approach is to have compiler intrinsics implemented.
> 
> > > 2. Otherwise I think that the context initialisation/switch code has to be
> > >    adjusted such that the upper GPR halves are set to a known state,
> > >    either zeroed or sign-extended from bit #63 (or #31 really, given the
> > >    initial 32-bit port only) according to hardware requirements, so as to
> > >    make execution stable and prevent data from leaking between contexts.
> > > 
> > > Later on proper 128-bit support can be added, though for that to make
> > > sense you need to have compiler support too, which AFAICT is currently
> > > missing.  Myself I'd rather defer commenting on that further support until
> > > we get to it, although of course someone else might be willing to sketch
> > > an idea.
> > 
> > I have a working 32-bit kernel now, except that BusyBox randomly crashes
> > unless the kernel saves/restores 64-bit GPRs. The executables and libraries
> > declare "ELF 32-bit LSB, MIPS, MIPS-III version 1" so in theory, I suppose,
> > they ought to be 32-bit only. It is possible that the error lies in the
> > kernel handling of the GPRs but I have double-checked this in several ways.
> 
>  Virtually all 64-bit MIPS processors have the CP0.Status.UX bit, which
> the Linux kernel keeps clear for o32 processes (CP0.Status.PX is currently
> unsupported and is kept clear as well), which means that an attempt to use
> any instruction that affects register bits beyond bit #31 will cause a
> Reserved Instruction exception, and in turn SIGILL being sent to the
> program.

This isn't actually true - we currently set ST0_UX unconditionally if the 
kernel is built with CONFIG_64BIT=y. It doesn't matter whether a user program 
is MIPS32 or MIPS64 code, it always runs with UX=1. We also always save all 64 
bits of each GPR - not just the least significant 32 bits when running an o32 
program.

This means 32 bit user code could try using MIPS64 instructions if it wanted 
to, it would just probably be a bad idea.

I think this would be nice to change such that we had UX=PX=0 for o32 
programs, UX=0 PX=1 for n32 programs, UX=1 PX=x for n64 programs, but right 
now we just have UX=1 always for 64 bit kernels.
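
As a sketch of what that per-ABI selection could look like, here is the 
mapping written out in C. This is not existing kernel code: the enum and 
function names are hypothetical, and the bit positions (UX bit 5, PX bit 23) 
are my understanding of the MIPS64 Status register layout.

```c
#include <stdint.h>

/* Status bit masks assumed for this sketch. */
#define ST0_UX 0x00000020u	/* bit 5: 64-bit user addressing and operations */
#define ST0_PX 0x00800000u	/* bit 23: 64-bit operations with 32-bit addressing */

/* Hypothetical ABI tag; the kernel tracks the ABI differently. */
enum mips_abi { ABI_O32, ABI_N32, ABI_N64 };

/* Status bits a task of the given ABI would run with under the proposal. */
static uint32_t status_bits_for_abi(enum mips_abi abi)
{
	switch (abi) {
	case ABI_O32:
		return 0;		/* UX=0 PX=0 */
	case ABI_N32:
		return ST0_PX;		/* UX=0 PX=1 */
	case ABI_N64:
		return ST0_UX;		/* UX=1, PX is a don't-care */
	}
	return 0;
}
```

With a mapping like this an o32 task that executed a 64-bit instruction 
would take a Reserved Instruction exception instead of silently working.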

> > Are there other Linux MIPS implementations that reset GPRs like this?
> 
>  No, because keeping CP0.Status.UX clear guarantees that only instructions
> which sign-extend register results from bit #31 can be used.

I agree that there are no other implementations with this issue - but the 
reason isn't the UX bit, it's that instructions sign extend into wider GPRs 
and the kernel always saves at least as much of a GPR as user code can access.

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-09-21 19:48                   ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-21 19:48 UTC (permalink / raw)
  To: Paul Burton; +Cc: Fredrik Noring, linux-mips

Hi Paul,

> >  Virtually all 64-bit MIPS processors have the CP0.Status.UX bit, which
> > the Linux kernel keeps clear for o32 processes (CP0.Status.PX is currently
> > unsupported and is kept clear as well), which means that an attempt to use
> > any instruction that affects register bits beyond bit #31 will cause a
> > Reserved Instruction exception, and in turn SIGILL being sent to the
> > program.
> 
> This isn't actually true - we currently set ST0_UX unconditionally if the 
> kernel is built with CONFIG_64BIT=y. It doesn't matter whether a user program 
> is MIPS32 or MIPS64 code, it always runs with UX=1. We also always save all 64 
> bits of each GPR - not just the least significant 32 bits when running an o32 
> program.

 I referred to plain 32-bit kernels (which I do acknowledge that I failed 
to communicate clearly, sorry), which is what we currently have under 
consideration (given the inability to support the generic case of an n64 
binary with the address space limitation of the R5900 processor), and 
these do keep CP0.Status.UX clear and thus the rest of your observation is 
irrelevant here (though it will become relevant once we get to 64-bit support).

 One aspect of the limitation is the R5900 does not support the XTLB 
refill handler or the CP0 XContext register, so once we get to supporting 
64-bit operation we'll have to maintain the TLB with the TLB refill 
handler and the CP0 Context register, which we currently don't with 64-bit 
kernels.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-09-21 21:07                   ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-21 21:07 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> >  Can you add a diagnostic consistency check to the context restoration 
> > code, i.e. all the macros called from RESTORE_ALL (in <asm/stackframe.h>), 
> > such as a `break 12' (BRK_BUG) instruction if a register value is not 
> > correctly sign-extended?
> 
> Hmm... I think some details need to be sorted out for this. The LW
> instruction used to restore registers sign-extends to register length by
> definition (p. A-70 in the TX79 manual), so I assume that isn't what we
> are going to check unless we suspect a grave hardware error with LW? (Do
> we need to check the register values immediately prior to LW?)

 The operation is only defined for bits 63:0 AFAICS.  IIUC bits 127:64 
remain unchanged (which is why I think that at the initial stage of R5900 
support they have to be explicitly set to a fixed value on a context 
switch, to prevent leaking information), but I have no means to verify it.

 In the interim to fix the value of bits 127:64 while keeping disruption 
to existing code at the minimum you could AFAICT use a sequence like:

	pcpyld	$1, $0, $1
	pcpyld	$2, $0, $2
#	...
	pcpyld	$31, $0, $31

in RESTORE_SOME, preferably via an auxiliary macro.  Once we have switched 
to saving/restoring full 128-bit registers, possibly with SQ/LQ, we can 
remove this temporary measure.
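
 For illustration, the effect of each `pcpyld $n, $0, $n' above can be 
modelled in C.  This is only a sketch: `struct gpr128' and the helper 
names are made up for the example, and the PCPYLD semantics assumed (low 
doubleword of the first source into bits 127:64, low doubleword of the 
second source into bits 63:0) have not been verified on hardware.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit R5900 GPR as two 64-bit halves. */
struct gpr128 {
	uint64_t hi;	/* bits 127:64 */
	uint64_t lo;	/* bits 63:0 */
};

/* Assumed PCPYLD semantics: rd[127:64] = rs[63:0], rd[63:0] = rt[63:0]. */
struct gpr128 pcpyld(struct gpr128 rs, struct gpr128 rt)
{
	struct gpr128 rd = { .hi = rs.lo, .lo = rt.lo };
	return rd;
}

/* Model of `pcpyld $n, $0, $n': clear bits 127:64, keep bits 63:0. */
struct gpr128 clear_upper_half(struct gpr128 r)
{
	const struct gpr128 zero = { 0, 0 };	/* $0 always reads as zero */
	return pcpyld(zero, r);
}
```

Under that assumption the lower 64 bits pass through untouched, so the 
sequence should not disturb values the existing 32-bit code relies on.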

> Another possibility would be to check that saved registers in SAVE_ALL
> will be restored properly. That is, immediately after SW check that LW
> (to a temporary register such as k1) will restore to the same value by
> 64-bit comparison and trap if unequal (TNE). I thought that made sense.
> Something like for example
> 
> 	sw	$17, PT_R17(sp)
> 	lw	k1, PT_R17(sp)
> 	tne	k1, $17, 12
> 
> as a replacement for
> 
> 	LONG_S	$17, PT_R17(sp)
> 
> in SAVE_STATIC?

 This would verify whether the original contents of $17 were a properly 
sign-extended 32-bit value.  Although for predictable operation I would 
advise to use:

	sll	k1, $17, 0
	sw	k1, PT_R17(sp)
	lw	k1, PT_R17(sp)
	tne	k1, $17, 12

or simply:

	sll	k1, $17, 0
	tne	k1, $17, 12
	sw	$17, PT_R17(sp)

Previously you wrote that the problem is with resetting the upper 96 bits 
(how did you notice that BTW?) rather than bits 63:32 only, so you need a 
different check.  Also I see no reason why LW would set bits 63:32 to 
anything different from what was there before SW as long as the original 
value was 32-bit (hence the second check sequence proposed).
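
 Viewed on bits 63:0 only, the check sequences above can be modelled in C 
as follows.  This is a rough sketch with made-up helper names; it says 
nothing about bits 127:64 and assumes the usual two's complement wrapping 
on the narrowing casts.

```c
#include <stdint.h>

/* Model of `sll rd, rt, 0' on a 64-bit GPR: sign-extend bits 31:0. */
int64_t sll0(int64_t rt)
{
	return (int64_t)(int32_t)(uint32_t)rt;
}

/* Model of `sw' followed by `lw': store bits 31:0, load sign-extended. */
int64_t sw_lw(int64_t reg)
{
	uint32_t slot = (uint32_t)reg;	/* sw keeps only the low word */
	return (int64_t)(int32_t)slot;	/* lw sign-extends from bit #31 */
}

/* Returns 1 where `tne k1, $17, 12' would trap, 0 otherwise. */
int would_trap(int64_t reg)
{
	return sll0(reg) != reg;
}
```

In this model `sw_lw()' and `sll0()' return the same value for any input, 
which is why the shorter SLL/TNE sequence is enough to detect a register 
that was not a properly sign-extended 32-bit value.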

> A question is whether registers are clobbered within the kernel itself
> (via interrupts or some such) or for user programs.

 Well, you do need to verify your patches for such a possibility, right.  
I would advise double-checking exception handling indeed, including 
run-time generated exception handler code in particular.

 Unless there is an unhandled CPU erratum the userland does not clobber 
itself as o32 binaries are only supposed to have instructions that operate 
on 32-bit data.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-09-22 16:37                     ` Fredrik Noring
  0 siblings, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2017-09-22 16:37 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

>  The operation is only defined for bits 63:0 AFAICS.  IIUC bits 127:64 
> remain unchanged (which is why I think that at the initial stage of R5900 
> support they have to be explicitly set to a fixed value on a context 
> switch, to prevent leaking information), but I have no means to verify it.
> 
>  In the interim to fix the value of bits 127:64 while keeping disruption 
> to existing code at the minimum you could AFAICT use a sequence like:
> 
> 	pcpyld	$1, $0, $1
> 	pcpyld	$2, $0, $2
> #	...
> 	pcpyld	$31, $0, $31
> 
> in RESTORE_SOME, preferably via an auxiliary macro.  Once we have switched 
> to saving/restoring full 128-bit registers, possibly with SQ/LQ, we can 
> remove this temporary measure.

Sounds reasonable!

>  This would verify whether the original contents of $17 were a properly 
> sign-extended 32-bit value.  Although for predictable operation I would 
> advise to use:
> 
> 	sll	k1, $17, 0
> 	sw	k1, PT_R17(sp)
> 	lw	k1, PT_R17(sp)
> 	tne	k1, $17, 12
> 
> or simply:
> 
> 	sll	k1, $17, 0
> 	tne	k1, $17, 12
> 	sw	$17, PT_R17(sp)

There is a slight complication: the trap appears to be taken before the
console is ready, hence nothing is displayed. Is there a practical way
to postpone or recover from a trap? The issue becomes somewhat involved
since the trap needs to save/restore registers for itself to recover,
and so might evoke boundless recursion.

From a practical point of view it would be great if backtraces could be
rate limited, recoverable and possible to copy over network (I don't have
e.g. a serial port soldered). I will look into other alternatives to try
to capture this.

> Previously you wrote that the problem is with resetting the upper 96 bits 
> (how did you notice that BTW?) rather than bits 63:32 only, so you need a 
> different check.

I suspect 63:32 are the critical bits of the upper 96 bits since SD/LD
is sufficient. Summary of observations thus far: save/restore works with
SQ/LQ and SD/LD, but not SW/LW, in a 32-bit kernel ceteris paribus.

> Also I see no reason why LW would set bits 63:32 to anything different
> from what was there before SW as long as the original value was 32-bit
> (hence the second check sequence proposed).

Yes, SLL seems sufficient for testing this.

> > A question is whether registers are clobbered within the kernel itself
> > (via interrupts or some such) or for user programs.
> 
>  Well, you do need to verify your patches for such a possibility, right.  
> I would advise double-checking exception handling indeed, including 
> run-time generated exception handler code in particular.

The extremely early trap indicates a kernel issue, or perhaps register
garbage during kernel initialisation, that wouldn't be an error? Is the
run-time code related to genex.S? The R5900 patch sprinkles NOP and
SYNC.P instructions on it, for various workarounds, but not much else
apart from reverting db8466c581c "MIPS: IRQ Stack: Unwind IRQ stack onto
task stack" that otherwise crashes for an unknown reason.

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-09-26 11:50                       ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-26 11:50 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> > > The transition from 128-bit registers to 64-bit registers was easy (in a
> > > 32-bit kernel) by changing the LONGD_{L,S} macros in asm.h from quadword
> > > {L,S}Q to doubleword {L,S}D instructions, and changing pt_regs::regs[32]
> > > from r5900_reg_t to unsigned long long.
> > 
> >  But why did you have to change anything there in the first place?  All 
> > that's there is generic stuff.
> 
> The 128-bit register save/restore infrastructure is part of the original
> 2.6.35 patch for R5900 support, that I ported to 4.12 and we're about to
> disentangle. The original patch crashes similarly unless full 128-bit GPRs
> are handled in 32-bit or 64-bit kernels, so this particular issue appears
> to remain intact. Hopefully we will be able to figure out cause and fix.

 Ack.

 BTW I think that when we get to supporting 128-bit registers we want to 
avoid changing the definition of LONG_{L,S} macros, because these are used 
for purposes beyond context access.

 Instead I think these macros, as well as all the ones in <asm/stackframe.h>, 
should remain unchanged and the save and restoration of the 64-bit upper 
halves done separately, most likely in `switch_to', which is where all the 
user context registers which, like these upper halves, are not touched by 
the kernel (and which are not handled lazily by other means) are switched.

 Having come to this conclusion I think the clearing of upper halves I 
have previously suggested for the initial stage will best be done in 
`switch_to' as well.

> >  But all that is something for a later stage; right now I suggest that you 
> > figure out what's causing registers to become clobbered and fix it there.
> 
> My thinking was simply that we could try to use the (already patched up)
> 128-bit infrastructure, that seems to work quite well, as a debug tool in
> this particular case. I understand that merging it is another matter.

 Sure, use whatever tools you have available at hand to debug.

> >  Use `file' or `readelf -h' on the BusyBox binary to double-check the ABI 
> > it has been built for.  Although I doubt there will be issues with the 
> > executable, as it would crash on any of the other MIPS processors which 
> > implement the 32-bit mode correctly.
> 
> As previously mentioned all binaries I've tested so far declare "ELF 32-bit
> LSB, MIPS, MIPS-III version 1" with file. mipsel-linux-readelf says
> 
> ELF Header:
>   Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
>   Class:                             ELF32
>   Data:                              2's complement, little endian
>   Version:                           1 (current)
>   OS/ABI:                            UNIX - System V
>   ABI Version:                       0
>   Type:                              EXEC (Executable file)
>   Machine:                           MIPS R3000
>   Version:                           0x1
>   Entry point address:               0x402d90
>   Start of program headers:          52 (bytes into file)
>   Start of section headers:          276156 (bytes into file)
>   Flags:                             0x20920003, noreorder, pic, unknown CPU, mips3
>   Size of this header:               52 (bytes)
>   Size of program headers:           32 (bytes)
>   Number of program headers:         7
>   Size of section headers:           40 (bytes)
>   Number of section headers:         25
>   Section header string table index: 24
> 
> but it's unclear to me whether it's generic or somehow tailored for R5900.

 Thanks for this dump.  The binary is indeed for the R5900, according to 
`Flags' having 0x92 in bits 23:16:

#define E_MIPS_MACH_5900	0x00920000

[which shows up as `unknown CPU' because the port submitter forgot to add 
decoding of this value to `readelf'; I've now fixed that upstream].  So it 
could be doing something odd if there was a bug somewhere in the toolchain 
used.
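
 The decoding can be sketched in C as below.  The mask and field values 
are the ones I believe the binutils ELF headers use for MIPS; the two 
helper functions are made up for the example.

```c
#include <stdint.h>

#define EF_MIPS_NOREORDER	0x00000001
#define EF_MIPS_PIC		0x00000002
#define EF_MIPS_MACH		0x00ff0000	/* machine variant field */
#define E_MIPS_MACH_5900	0x00920000
#define EF_MIPS_ARCH		0xf0000000	/* ISA level field */
#define E_MIPS_ARCH_3		0x20000000	/* MIPS III */

/* Hypothetical helper: does e_flags mark the binary as built for R5900? */
int is_r5900_binary(uint32_t e_flags)
{
	return (e_flags & EF_MIPS_MACH) == E_MIPS_MACH_5900;
}

/* Hypothetical helper: does e_flags select the MIPS III ISA level? */
int is_mips3_binary(uint32_t e_flags)
{
	return (e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_3;
}
```

For the `Flags' value 0x20920003 from the dump both helpers return 1, and 
the two low bits account for `noreorder' and `pic'.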

 Can you try a regular 32-bit MIPS Debian distribution instead?

 BTW, I have just noticed that DMULT, DMULTU, DDIV and DDIVU instructions 
are not implemented.  Which means that a 64-bit kernel will only work if 
compiled with `-march=r5900' and emulation is required for 64-bit user 
programs.
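
 As an illustration of what such emulation has to compute, here is a 
minimal C sketch of the DMULTU arithmetic: the full 128-bit unsigned 
product of two 64-bit operands, split across HI and LO.  `struct hilo' 
and `dmultu()' are made-up names, and instruction decoding, the signed 
DMULT variant and the divides are left out.

```c
#include <stdint.h>

/* Hypothetical container for the HI/LO register pair. */
struct hilo {
	uint64_t hi;	/* bits 127:64 of the product */
	uint64_t lo;	/* bits 63:0 of the product */
};

struct hilo dmultu(uint64_t rs, uint64_t rt)
{
	/* Split both operands into 32-bit halves. */
	uint64_t a_lo = (uint32_t)rs, a_hi = rs >> 32;
	uint64_t b_lo = (uint32_t)rt, b_hi = rt >> 32;

	/* Four 32x32->64 partial products. */
	uint64_t p0 = a_lo * b_lo;
	uint64_t p1 = a_lo * b_hi;
	uint64_t p2 = a_hi * b_lo;
	uint64_t p3 = a_hi * b_hi;

	/* Terms landing in bits 63:32; the sum fits in 64 bits. */
	uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

	struct hilo r;
	r.lo = (mid << 32) | (uint32_t)p0;
	r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
	return r;
}
```

Working in 32-bit halves keeps the sketch free of compiler-specific 
128-bit integer types.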

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-09-26 11:50                       ` Maciej W. Rozycki
  (?)
@ 2017-09-27 17:21                       ` Fredrik Noring
  2017-09-28 12:13                           ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-09-27 17:21 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

>  BTW I think that when we get to supporting 128-bit registers we want to 
> avoid changing the definition of LONG_{L,S} macros, because these are used 
> for purposes beyond context access.
> 
>  Instead I think these macros, as well as all the ones in <asm/stackframe.h>, 
> should remain unchanged and the save and restoration of the 64-bit upper 
> halves done separately, most likely in `switch_to', which is where all the 
> user context registers which, like these upper halves, are not touched by 
> the kernel (and which are not handled lazily by other means) are switched.

Hmm... What about a 32-bit kernel and bits 63:32 sign-extended by kernel
instructions? LONG_{L,S} saves/restores 31:0 using LW/SW thus 63:32 will
be lost in exceptions?

>  Can you try a regular 32-bit MIPS Debian distribution instead?

BusyBox at

https://packages.debian.org/stretch/mipsel/busybox-static/download

seemed appropriate but yields "illegal instruction" which I suppose is
interesting in itself. My MIPS toolchain is somewhat limited at the moment
so I will need to get back on this.

>  BTW, I have just noticed that DMULT, DMULTU, DDIV and DDIVU instructions 
> are not implemented.  Which means that a 64-bit kernel will only work if 
> compiled with `-march=r5900' and emulation is required for 64-bit user 
> programs.

Indeed. In the R5900 patch these instructions are emulated (or simulated as
it is called in the source) in

https://github.com/frno7/linux/blob/1c8247e352d1eb7ae9022a76ecf19f74264534f7/arch/mips/kernel/traps.c

along with LLD, SCD, etc.

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-09-28 12:13                           ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-28 12:13 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> >  Instead I think these macros, as well as all the ones in <asm/stackframe.h>, 
> > should remain unchanged and the save and restoration of the 64-bit upper 
> > halves done separately, most likely in `switch_to', which is where all the 
> > user context registers which, like these upper halves, are not touched by 
> > the kernel (and which are not handled lazily by other means) are switched.
> 
> Hmm... What about a 32-bit kernel and bits 63:32 sign-extended by kernel
> instructions? LONG_{L,S} saves/restores 31:0 using LW/SW thus 63:32 will
> be lost in exceptions?

 You mean for use with MMI instructions?  Offhand I think we have two 
options:

1. Declaring the lack of support for MMI instructions in o32 software.

2. Switching to using LD/SD in <asm/stackframe.h> and preserving statics 
   across syscalls with SAVE_STATIC/RESTORE_STATIC at the cost of
   performance loss.

I'm open for a better suggestion though.  I propose that we start with #1, 
as the zero performance cost and zero effort solution.

> >  Can you try a regular 32-bit MIPS Debian distribution instead?
> 
> BusyBox at
> 
> https://packages.debian.org/stretch/mipsel/busybox-static/download
> 
> seemed appropriate but yields "illegal instruction" which I suppose is
> interesting in itself. My MIPS toolchain is somewhat limited at the moment
> so I will need to get back on this.

 Getting a core dump and using it to figure out which specific instruction 
caused the exception would be interesting.  Also make sure you have RDHWR 
instruction emulation in place for CP0 UserLocal register access.
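
 For reference, recognising the instruction in a Reserved Instruction 
handler amounts to a few mask checks.  The C sketch below is not the 
kernel's code; the macro and function names are chosen for the example, 
and it only assumes the standard encoding (SPECIAL3 major opcode, RDHWR 
function code 0x3b, hardware register number 29 for UserLocal).

```c
#include <stdint.h>

#define OPCODE_MASK	0xfc000000u
#define SPEC3		0x7c000000u	/* SPECIAL3 major opcode */
#define FUNC_MASK	0x0000003fu
#define RDHWR		0x0000003bu	/* RDHWR function code */
#define HWR_ULR		29		/* UserLocal hardware register */

/*
 * Returns the destination GPR number if `insn' is `rdhwr rt, $29',
 * or -1 if it is anything else.
 */
int rdhwr_ulr_dest(uint32_t insn)
{
	unsigned int rd = (insn >> 11) & 0x1f;	/* hardware register number */
	unsigned int rt = (insn >> 16) & 0x1f;	/* destination GPR */

	if ((insn & OPCODE_MASK) != SPEC3 || (insn & FUNC_MASK) != RDHWR)
		return -1;
	if (rd != HWR_ULR)
		return -1;
	return (int)rt;
}
```

The common TLS access `rdhwr $3, $29' encodes as 0x7c03e83b, for which the 
function returns 3; the emulation would then write the thread pointer to 
that register and step past the instruction.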

> >  BTW, I have just noticed that DMULT, DMULTU, DDIV and DDIVU instructions 
> > are not implemented.  Which means that a 64-bit kernel will only work if 
> > compiled with `-march=r5900' and emulation is required for 64-bit user 
> > programs.
> 
> Indeed. In the R5900 patch these instructions are emulated (or simulated as
> it is called in the source) in
> 
> https://github.com/frno7/linux/blob/1c8247e352d1eb7ae9022a76ecf19f74264534f7/arch/mips/kernel/traps.c
> 
> along with LLD, SCD, etc.

 Ah, OK then.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-09-29 23:55                       ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-09-29 23:55 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> >  This would verify whether the original contents of $17 were a properly 
> > sign-extended 32-bit value.  Although for predictable operation I would 
> > advise to use:
> > 
> > 	sll	k1, $17, 0
> > 	sw	k1, PT_R17(sp)
> > 	lw	k1, PT_R17(sp)
> > 	tne	k1, $17, 12
> > 
> > or simply:
> > 
> > 	sll	k1, $17, 0
> > 	tne	k1, $17, 12
> > 	sw	$17, PT_R17(sp)
> 
> There is a slight complication: the trap appears to be taken before the
> console is ready, hence nothing is displayed. Is there a practical way
> to postpone or recover from a trap? The issue becomes somewhat involved
> since the trap needs to save/restore registers for itself to recover,
> and so might evoke boundless recursion.

 You can use a static variable to hold a flag preventing the diagnostic 
check from failing more than once, avoiding recursion.  Just check it here 
before doing actual verification and set it at the beginning of the Trap 
exception handler in arch/mips/kernel/genex.S.
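
 The idea can be sketched in C as below, with the understanding that the 
real thing would be a few instructions in the assembly macros.  All the 
names are made up for the example, and the function call stands in for 
actually taking the trap.

```c
#include <stdint.h>

/* Hypothetical one-shot flag, set once the diagnostic has fired. */
int sign_check_fired;

/* Stand-in for the first thing the Trap exception handler does. */
void trap_handler_entry(void)
{
	sign_check_fired = 1;
}

/*
 * Stand-in for the check in the register save path.  Returns 1 if the
 * check trapped, 0 if it passed or was skipped.
 */
int check_sign_extended(int64_t reg)
{
	if (sign_check_fired)
		return 0;		/* already reported once, skip */
	if ((int64_t)(int32_t)(uint32_t)reg == reg)
		return 0;		/* properly sign-extended */
	trap_handler_entry();		/* models taking the trap */
	return 1;
}
```

Since the flag is set before the handler saves any registers, the handler's 
own save sequence finds it set and skips the check.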

> From a practical point of view it would be great if backtraces could be
> rate limited, recoverable and possible to copy over network (I don't have
> e.g. a serial port soldered). I will look into other alternatives to try
> to capture this.

 You can halt mid-way through `show_registers' to limit output if all you 
have is the virtual terminal and you have to copy information by hand.  
Later on in bootstrap you have the netconsole available; see 
Documentation/networking/netconsole.txt for details (I have never used 
that myself though).

> > Previously you wrote that the problem is with resetting the upper 96 bits 
> > (how did you notice that BTW?) rather than bits 63:32 only, so you need a 
> > different check.
> 
> I suspect 63:32 are the critical bits of the upper 96 bits since SD/LD
> is sufficient. Summary of observations thus far: save/restore works with
> SQ/LQ and SD/LD, but not SW/LW, in a 32-bit kernel ceteris paribus.

 This does look intriguing.

> >  Well, you do need to verify your patches for such a possibility, right.  
> > I would advise double-checking exception handling indeed, including 
> > run-time generated exception handler code in particular.
> 
> The extremely early trap indicates a kernel issue, or perhaps register
> garbage during kernel initialisation, that wouldn't be an error? Is the
> run-time code related to genex.S? The R5900 patch sprinkles NOP and
> SYNC.P instructions on it, for various workarounds, but not much else
> apart from reverting db8466c581c "MIPS: IRQ Stack: Unwind IRQ stack onto
> task stack" that otherwise crashes for an unknown reason.

 You cannot assume the firmware leaves properly sign-extended 32-bit 
values in registers upon the kernel entry.  I advise truncating the 
contents of registers (with SLL by 0) at the beginning of `kernel_entry' 
in arch/mips/kernel/head.S for the purpose of avoiding spurious check 
triggers in the course of this debugging effort.
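
For reference, what SLL by 0 and the comparison against the original
register compute can be modelled in C as below.  This is a sketch for
illustration only; the helper names `sll0' and `is_sign_extended32' are
made up here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of "sll rd, rs, 0" on a 64-bit register: keep bits 31:0 and
 * copy bit 31 into bits 63:32.
 */
uint64_t sll0(uint64_t reg)
{
	uint64_t low = reg & 0xffffffffu;

	return (low & 0x80000000u) ? (low | 0xffffffff00000000ULL) : low;
}

/* True if the register already holds a sign-extended 32-bit value. */
bool is_sign_extended32(uint64_t reg)
{
	return sll0(reg) == reg;
}
```

With a hypothetical leftover value such as 0x0000000080001000 the test
fails, and after `sll0' the value becomes 0xffffffff80001000 and passes.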

 HTH,

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-09-28 12:13                           ` Maciej W. Rozycki
  (?)
@ 2017-09-30  6:56                           ` Fredrik Noring
  2017-10-02  9:05                               ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-09-30  6:56 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

> > Hmm... What about a 32-bit kernel and bits 63:32 sign-extended by kernel
> > instructions? LONG_{L,S} saves/restores 31:0 using LW/SW thus 63:32 will
> > be lost in exceptions?
> 
>  You mean for use with MMI instructions?  Offhand I think we have two 
> options:
> 
> 1. Declaring the lack of support for MMI instructions in o32 software.
> 
> 2. Switching to using LD/SD in <asm/stackframe.h> and preserving statics 
>    across syscalls with SAVE_STATIC/RESTORE_STATIC at the cost of
>    performance loss.
> 
> I'm open for a better suggestion though.  I propose that we start with #1, 
> as the zero performance cost and zero effort solution.

Sure. I would eventually like to explore solutions to efficiently support
the full register range in applications with both 32- and 64-bit kernels.

> > BusyBox at
> > 
> > https://packages.debian.org/stretch/mipsel/busybox-static/download
> > 
> > seemed appropriate but yields "illegal instruction" which I suppose is
> > interesting in itself. My MIPS toolchain is somewhat limited at the moment
> > so I will need to get back on this.
> 
>  Getting a core dump and using it to figure out which specific instruction 
> caused the exception would be interesting.

It's 72308802 as in "mul s1,s1,s0" which I believe is the DSP enhancement
multiplication with register write in the MIPS32 architecture. The R5900
doesn't have those DSP instructions, as far as I can tell.
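
The field breakdown of that word can be checked with a small decoder.
This is a sketch only; `struct mips_rtype' and `decode_rtype' are names
invented for the example.  In the MIPS32 encoding, opcode 0x1c is SPECIAL2
and function 2 within it is MUL; registers 17 and 16 are s1 and s0.

```c
#include <stdint.h>

/* Fields of a MIPS R-type instruction word. */
struct mips_rtype {
	unsigned opcode, rs, rt, rd, sa, funct;
};

struct mips_rtype decode_rtype(uint32_t insn)
{
	struct mips_rtype f = {
		.opcode = insn >> 26,		/* bits 31:26 */
		.rs     = (insn >> 21) & 0x1f,	/* bits 25:21 */
		.rt     = (insn >> 16) & 0x1f,	/* bits 20:16 */
		.rd     = (insn >> 11) & 0x1f,	/* bits 15:11 */
		.sa     = (insn >> 6) & 0x1f,	/* bits 10:6  */
		.funct  = insn & 0x3f,		/* bits 5:0   */
	};

	return f;
}
```

For 0x72308802 this gives opcode 0x1c, rs 17, rt 16, rd 17, sa 0 and
funct 2, i.e. "mul s1,s1,s0" when read as MIPS32.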

For this reason the R5900 patch modifies the __{save,restore}_dsp macros,
mips_dsp_state::dspcontrol, DSP_INIT, sigcontext32::sc_dsp, etc. I've seen
the cpu_has_dsp macro too, but haven't looked at the details of this yet.

Considering the lack of DSP instructions, would you know any commonly
compiled mipsel distribution that could be made compatible with the R5900
in a reasonable manner? I suppose Gentoo has an advantage here, given the
ability to supply R5900 compilation flags.

> Also make sure you have RDHWR instruction emulation in place for CP0
> UserLocal register access.

Right. Debian's BusyBox has 857 of those. Jürgen Urban observed in the
conversation with you in

https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00658.html

that RDHWR has the same encoding as "sq v1,-6085(zero)" for the R5900,
which luckily always gives an alignment exception so that the kernel is
able to emulate RDHWR properly. I haven't verified this though.

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-09-29 23:55                       ` Maciej W. Rozycki
  (?)
@ 2017-09-30 18:26                       ` Fredrik Noring
  2017-10-02  9:11                           ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-09-30 18:26 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

> > I suspect 63:32 are the critical bits of the upper 96 bits since SD/LD
> > is sufficient. Summary of observations thus far: save/restore works with
> > SQ/LQ and SD/LD, but not SW/LW, in a 32-bit kernel ceteris paribus.
> 
>  This does look intriguing.

I believe the simple answer to this mystery is that addresses are not
supposed to be sign-extended, given the look of $31 below. What are
your thoughts on this?

To delay and recover from the trap I chose printk_delay_msec as an
(arbitrary) trigger with "echo 1 >/proc/sys/kernel/printk_delay".

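.macro check_reg reg
	.extern	printk_delay_msec
	sll	$27, \reg, 0		# k1 = low 32 bits, sign-extended
	beq	$27, \reg, 1f		# properly sign-extended: skip
	nop
	lw	$27, printk_delay_msec	# check is armed only when nonzero
	beq	$27, $0, 1f
	nop
	.set	at=$27			# let SW use k1 as assembler temporary
	sw	$0, printk_delay_msec	# disarm so the break fires only once
	.set	noat
	break	12
	nop
1:
.endm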

Break instruction in kernel code[#1]:
CPU: 0 PID: 94 Comm: echo Not tainted 4.12.0+ #510
task: 81fb5500 task.stack: 81f70000
$ 0   :
 0000000000000000
 0000000000000020
 ffffffff8100a874
 ffffffff800beeb0
$ 4   :
 ffffffffffffffff
 000000000fb60000
 000000000082cb49
 0000000000000001
$ 8   :
 0000000000000875
 ffffffff80260684
 ffffffff80539f0c
 ffffffffffffff80
$12   :
 000000007f9582e0
 0000000077f1a2d0
 0000000000000000
 0000000000000000
$16   :
 ffffffff81500d80
 ffffffff81020be0
 ffffffff81f71dd8
 000000000fb79000
$20   :
 ffffffff805150c0
 000000000fb61000
 ffffffff814e59f8
 000000000fb60000
$24   :
 0000000000000000
 0000000077e4d67c
$28   :
 ffffffff81f70000
 ffffffff81f71bf8
 ffffffff815010f8
 00000000800bed80
Hi    : 00000000
Lo    : 00000048
epc   : 800beeb0 unmap_page_range+0x3cc/0x664
ra    : 00000000800bed80 unmap_page_range+0x29c/0x664
Status: 30018c03	KERNEL EXL IE 
Cause : 50000424 (ExcCode 09)
PrId  : 00002e42 (R5900)
Modules linked in:
Process echo (pid: 94, threadinfo=81f70000, task=81fb5500, tls=00adc0e0)
Stack : 8053a2e8 00000000 00000001 00000000 00000000 80520000 00000000 30018c01
        00000000 00000000 001fffff 80525f60 8100a874 ffffffff ffffffff ffffffff
        ffffffff ffffffff 0fb60000 00000000 0082cb49 00000000 00000001 00000000
        01000200 8025bfcc 014200ca 00000001 81f71c68 81f71c68 00000001 00000000
        00000002 81f71cb0 81f71d10 8025c010 00000000 30018c01 8053a2e8 00000000
        ...
Call Trace:
[<800beeb0>] unmap_page_range+0x3cc/0x664
[<8025bfcc>] simple_strtoull+0x34/0x68
[<8025c010>] simple_strtoul+0x10/0x1c
Code: 1220ff6b  ae440008  8e220014 <30430001> 2442ffff  0223100a  8c420004  30420001  14400013 

---[ end trace e89f1298cab4fd73 ]---
Fixing recursive fault but reboot is needed!

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-10-02  9:05                               ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-10-02  9:05 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> >  Getting a core dump and using it to figure out which specific instruction 
> > caused the exception would be interesting.
> 
> It's 72308802 as in "mul s1,s1,s0" which I believe is the DSP enhancement
> multiplication with register write in the MIPS32 architecture. The R5900
> doesn't have those DSP instructions, as far as I can tell.

 Umm, has Debian switched to MIPS32 as the base architecture?  That would 
be unfortunate, they used to support MIPS I or at worst MIPS II (ISTR 
voices to switch to the latter).  There's still plenty of MIPS III 
hardware around so for 32-bit support I would consider MIPS II the common 
denominator (the sole difference between MIPS II and MIPS III is 64-bit 
support).

 In any case you'll have to find a MIPS I or MIPS II distribution, like an 
older version of Debian.

 The three-argument MUL is a part of the base MIPS32 architecture BTW, 
originating from the IDT R4650 and the NEC Vr5500 processors.  It has 
nothing to do with the DSP ASE (though it may have been claimed originally 
to be a DSP enhancement).

> For this reason the R5900 patch modifies the __{save,restore}_dsp macros,
> mips_dsp_state::dspcontrol, DSP_INIT, sigcontext32::sc_dsp, etc. I've seen
> the cpu_has_dsp macro too, but haven't looked at the details of this yet.

 Given that the R5900 does not expand DSP support anyhow that sounds 
suspicious to me.

> Considering the lack of DSP instructions, would you know any commonly
> compiled mipsel distribution that could be made compatible with the R5900
> in a reasonable manner? I suppose Gentoo has an advantage here, given the
> ability to supply R5900 compilation flags.

 I don't know.  According to <https://www.debian.org/ports/mips/>:

"Debian GNU/Linux 8.9 supports the following machines:

  * SGI Indy with R4x00 and R5000 CPUs, and Indigo2 with R4400 CPU (IP22).
  * SGI O2 with R5000, R5200 and RM7000 CPU (IP32).
  * Broadcom BCM91250A (SWARM) evaluation board (big and little-endian).
  * MIPS Malta boards (big and little-endian, 32 and 64-bit).
  * Loongson 2e and 2f machines, including the Yeelong laptop (little-endian).
  * Loongson 3 machines (little-endian).
  * Cavium Octeon (big-endian).

In addition to the above machines, it is possible to use Debian on a lot 
more machines provided that a non-Debian kernel is used.  This is for 
example the case of the MIPS Creator Ci20 development board."

so your observation looks like a result of a package compilation bug in 
the program which crashed.  Among the systems listed above there are many 
which only support the MIPS III ISA (R4x00, R4400 and Loongson 2e CPUs) or 
the MIPS IV ISA (R5000, R5200, RM7000 CPUs).  I see the DECstation port 
along with R2000 and R3000 CPU support has been removed, so I gather the 
baseline is indeed supposed to be MIPS II.

 I haven't used Gentoo, but I'm told you can choose your compilation flags 
as you like with it.

> > Also make sure you have RDHWR instruction emulation in place for CP0
> > UserLocal register access.
> 
> Right. Debian's BusyBox has 857 of those. Jürgen Urban observed in the
> conversation with you in
> 
> https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00658.html
> 
> that RDHWR has the same encoding as "sq v1,-6085(zero)" for the R5900,
> which luckily always gives an alignment exception so that the kernel is
> able to emulate RDHWR properly. I haven't verified this though.

 That instruction encoding (actually implemented by some MIPS32r2/MIPS64r2 
and newer hardware) is used under Linux for Thread Local Storage (TLS) 
access.  For hardware that does not have it the instruction is emulated in 
the Reserved Instruction (RI) exception handler, but obviously not the 
Address Error Store (AdES) exception.  So code to handle it as a special 
case with the R5900 has to be provided among the patches (and included 
with the initial series).

 Note that `rdhwr $3,$29' is the usual encoding, handled by a fastpath in 
arch/mips/kernel/genex.S (see `handle_ri_rdhwr'), however all `rt' 
encodings (covered in `simulate_rdhwr' in arch/mips/kernel/traps.c) have 
to be handled for completeness.  Fortunately RDHWR and SQ both use the 
same bits for `rt', and the `-6085(zero)' encoding of the memory reference 
makes no sense, so we can safely rely on the AdES exception.
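
The overlap between the two encodings can be verified with a few lines
of C.  This is a sketch for illustration; the helper names are invented,
and it assumes, as stated above in this thread, that the R5900 decodes
the word as an SQ with the usual I-type base/rt/offset layout.

```c
#include <stdint.h>

/* rdhwr rt, rd: SPECIAL3 (opcode 0x1f), function 0x3b. */
uint32_t encode_rdhwr(unsigned rt, unsigned rd)
{
	return (UINT32_C(0x1f) << 26) | ((uint32_t)(rt & 0x1f) << 16) |
	       ((uint32_t)(rd & 0x1f) << 11) | 0x3b;
}

/* I-type view of the same word: base in 25:21, rt in 20:16. */
unsigned itype_base(uint32_t insn) { return (insn >> 21) & 0x1f; }
unsigned itype_rt(uint32_t insn)   { return (insn >> 16) & 0x1f; }

/* Signed 16-bit offset held in bits 15:0. */
int itype_offset(uint32_t insn)
{
	int v = insn & 0xffff;

	return v >= 0x8000 ? v - 0x10000 : v;
}
```

`rdhwr $3,$29' encodes as 0x7c03e83b, which in the I-type view has base 0
(zero), rt 3 (v1) and offset -6085.  Changing `rt' in the RDHWR changes
only the `rt' field of the I-type view, and the offset stays at -6085.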

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-10-02  9:11                           ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-10-02  9:11 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> > > I suspect 63:32 are the critical bits of the upper 96 bits since SD/LD
> > > is sufficient. Summary of observations thus far: save/restore works with
> > > SQ/LQ and SD/LD, but not SW/LW, in a 32-bit kernel ceteris paribus.
> > 
> >  This does look intriguing.
> 
> I believe the simple answer to this mystery is that addresses are not
> supposed to be sign-extended, given the look of $31 below. What are
> your thoughts on this?
[...]
> $28   :
>  ffffffff81f70000
>  ffffffff81f71bf8
>  ffffffff815010f8
>  00000000800bed80
> Hi    : 00000000
> Lo    : 00000048
> epc   : 800beeb0 unmap_page_range+0x3cc/0x664
> ra    : 00000000800bed80 unmap_page_range+0x29c/0x664

 Hmm, this looks consistent with the TX79 manual:

"6.2.1 Virtual Address Space

The C790 only implements 32 bits of virtual address space.  There is no 
requirement for address sign extension and no checking will be done on the 
upper 32 bits of the address."

and then, for example, in the JAL instruction description:

"I: GPR[31] 63..0 <- zero_extend (PC + 8)"

It does not matter for the user mode where bit #31 is 0 and therefore both 
zero-extension and sign-extension produce the same result, so the typical 
PIC code sequence used to determine its own location, i.e.:

	la	$2, 0f
	bltzal	$0, 0f
0:
	subu	$2, $31, $2

will work correctly, not causing UB with the SUBU instruction.

 However it does cause complications for the kernel in that the value of 
$ra retrieved cannot be readily used for 32-bit calculations and has to be 
treated with SLL by 0 first.  You'll have to audit the arch/mips subtree 
for any such $ra use for calculation; hopefully there's none.
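
The difference between the two extensions for user and kernel addresses
can be shown with a short C model.  It is a sketch only; the function
names are made up, and the user address used in the note is a
hypothetical example.

```c
#include <stdint.h>

/* What the quoted JAL description does with the 32-bit link address. */
uint64_t zero_extend32(uint32_t addr)
{
	return (uint64_t)addr;
}

/* What 32-bit arithmetic on a 64-bit register expects to find. */
uint64_t sign_extend32(uint32_t addr)
{
	return (addr & 0x80000000u) ? (0xffffffff00000000ULL | addr) : addr;
}
```

For the kernel address 0x800bed80 the two results differ
(0x00000000800bed80 against 0xffffffff800bed80), whereas for a user
address such as 0x00400000, with bit #31 clear, they are identical.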

 I wonder why they broke it like this -- was it a silly deliberate choice 
or merely an oversight (erratum) they chose to document rather than fix? 
For a change they do implement MFC0 with sign-extension, so retrieving 
e.g. CP0.EPC will see kernel addresses correctly sign-extended.

 Anyway, as noted above that shouldn't cause a problem with user software 
and I think that any corruption you can see comes from elsewhere.  You'll 
have to paper this $ra non-sign-extension issue over somehow to proceed 
though.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-10-02  9:05                               ` Maciej W. Rozycki
  (?)
@ 2017-10-02 16:33                               ` Fredrik Noring
  -1 siblings, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2017-10-02 16:33 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

>  In any case you'll have to find a MIPS I or MIPS II distribution, like an 
> older version of Debian.

I will see what I can find. I currently have Black Rhino based on Debian
and tailored for the R5900, as you verified.

>  The three-argument MUL is a part of the base MIPS32 architecture BTW, 
> originating from the IDT R4650 and the NEC Vr5500 processors.  It has 
> nothing to do with the DSP ASE (though it may have been claimed originally 
> to be a DSP enhancement).

Ah, I was referring to a press release that appears to be written by
MIPS Technologies (linked from Wikipedia):

    In addition, DSP enhancements such as multiply (MUL) and multiply and
    add (MADD) instructions have been standardized and added to the new
    architectures. While these instructions have been available as
    licensee-specific extensions in the past, this is the first time that
    they have been available as a standard part of the MIPS(R) architecture.

https://www.thefreelibrary.com/MIPS+Technologies,+Inc.+Enhances+Architecture+to+Support+Growing+Need...-a054531136

> > For this reason the R5900 patch modifies the __{save,restore}_dsp macros,
> > mips_dsp_state::dspcontrol, DSP_INIT, sigcontext32::sc_dsp, etc. I've seen
> > the cpu_has_dsp macro too, but haven't looked at the details of this yet.
> 
>  Given that the R5900 does not expand DSP support anyhow that sounds 
> suspicious to me.

Yes, I agree.

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-10-02  9:11                           ` Maciej W. Rozycki
  (?)
@ 2017-10-03 19:49                           ` Fredrik Noring
  2017-10-05 19:04                             ` Fredrik Noring
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-10-03 19:49 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

>  Hmm, this looks consistent with the TX79 manual:
> 
> "6.2.1 Virtual Address Space
> 
> The C790 only implements 32 bits of virtual address space.  There is no 
> requirement for address sign extension and no checking will be done on the 
> upper 32 bits of the address."
> 
> and then, for example, in the JAL instruction description:
> 
> "I: GPR[31] 63..0 <- zero_extend (PC + 8)"
> 
> It does not matter for the user mode where bit #31 is 0 and therefore both 
> zero-extension and sign-extension produce the same result, so the typical 
> PIC code sequence used to determine its own location, i.e.:
> 
> 	la	$2, 0f
> 	bltzal	$0, 0f
> 0:
> 	subu	$2, $31, $2
> 
> will work correctly, not causing UB with the SUBU instruction.
> 
>  However it does cause complications for the kernel in that the value of 
> $ra retrieved cannot be readily used for 32-bit calculations and has to be 
> treated with SLL by 0 first.  You'll have to audit the arch/mips subtree 
> for any such $ra use for calculation; hopefully there's none.
> 
>  I wonder why they broke it like this -- was it a silly deliberate choice 
> or merely an oversight (erratum) they chose to document rather than fix? 
> For a change they do implement MFC0 with sign-extension, so retrieving 
> e.g. CP0.EPC will see kernel addresses correctly sign-extended.

After some further tests, it appears that for $ra, save/restore works with
both SW/LW and SW/LWU. Hence, $ra bits 63:32 do not seem to matter at all
(as intended), and its sign-extension failure can therefore be disregarded.
This is somewhat non-obvious since $ra is the only register that fails to
sign-extend after kernel initialisation (and consequently trigger a trap).

>  Anyway, as noted above that shouldn't cause a problem with user software 
> and I think that any corruption you can see comes from elsewhere.  You'll 
> have to paper this $ra non-sign-extension issue over somehow to proceed 
> though.

During early kernel initialisation at least one other register (besides
$ra) appears to fail the sign-extension test, and the error cannot be
ignored. I will now try to figure this out.

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-10-03 19:49                           ` Fredrik Noring
@ 2017-10-05 19:04                             ` Fredrik Noring
  0 siblings, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2017-10-05 19:04 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

> >  Anyway, as noted above that shouldn't cause a problem with user software
> > and I think that any corruption you can see comes from elsewhere.  You'll
> > have to paper this $ra non-sign-extension issue over somehow to proceed
> > though.
> 
> During early kernel initialisation at least one other register (besides
> $ra) appears to fail the sign-extension test, and the error cannot be
> ignored. I will now try to figure this out.

Trap 12 is causing kernel inconvenience resulting in "kernel bug" crashes
when triggering sign-extension failures for the remaining registers. As a
less intrusive measure I've collected some statistics:

Failed registers are marked by bit position in 0x8000FF74, hence 
registers $v0, $a0-$a2, $t0-$t7 and $ra fail the sign-extension test.
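
The mapping from mask bits to register names can be reproduced with a
small helper.  This is a sketch for illustration; `failed_regs' and the
name table are invented for the example and use the o32 register names.

```c
#include <stdint.h>
#include <stdio.h>

/* o32 names of general-purpose registers $0..$31. */
static const char *const gpr_names[32] = {
	"zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
	"t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
	"s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
	"t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra",
};

/*
 * Write the space-separated names of all registers whose bit is set
 * in mask into buf and return how many were written.
 */
int failed_regs(uint32_t mask, char *buf, size_t len)
{
	size_t used = 0;
	int count = 0;

	if (len)
		buf[0] = '\0';
	for (int i = 0; i < 32; i++) {
		if (!(mask & (UINT32_C(1) << i)))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 count ? " " : "", gpr_names[i]);
		if (n < 0 || (size_t)n >= len - used) {
			if (len > used)
				buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
		count++;
	}
	return count;
}
```

For 0x8000FF74 this yields 13 names: v0, a0 to a2, t0 to t7 and ra.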

Problems are present during kernel initialisation, so either 1) the R5900
is not really 32-bit compatible, 2) the compiler is somehow generating
64-bit code, 3) assembler directives in the R5900 patch produce 64-bit code,
or 4) the sign-extension tests are wrong (or a combination of the above).

In principle, I suppose one could carefully write assembly code to save
all registers and stack traces in a reserved memory area for later
examination, avoiding "kernel bug" crashes when immediately triggering
trap 12 in a sensitive kernel context.

What are your thoughts on how to proceed?

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-10-02  9:11                           ` Maciej W. Rozycki
  (?)
  (?)
@ 2017-10-06 20:28                           ` Fredrik Noring
  2017-10-15 16:39                             ` Fredrik Noring
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-10-06 20:28 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

>  Anyway, as noted above that shouldn't cause a problem with user software 
> and I think that any corruption you can see comes from elsewhere.  You'll 
> have to paper this $ra non-sign-extension issue over somehow to proceed 
> though.

I've extended do_IRQ with a register check under the condition that
user_mode(get_irq_regs()) is true, with the following sample results
where registers $2-$25 are printed if they are not sign-extended
properly (there is a certain amount of randomness to this):

    $10 : 00005f6362696c5f
    epc = 0fb6db00 in ld.so.1[fb60000+19000]

     $8 : ffffff7272655f5f
     $9 : ffffff7272655f5f
    $10 : 7066732e6362696c
    epc = 0fb759a0 in ld.so.1[fb60000+19000]

    $10 : 7274735f65646f6d
    $12 : ffff000000000000
    $13 : 0000ffffffffffff
    $14 : 000000ffffffffff
    epc = 0fb6d03c in ld.so.1[fb60000+19000]

    $10 : ffff732e6362696c
    epc = 0fb6cfe8 in ld.so.1[fb60000+19000]

     $8 : 000000ff00000000
    epc = 77e29fe4 in libc.so.6[77dc0000+12e000]

     $9 : 7fb71f357fb71f40
    epc = 0041cc60 in busybox[400000+3d000]

    $10 : 0000ffffff6f6c63
    epc = 0fb6d060 in ld.so.1[fb60000+19000]

    $10 : 00ffffffff657365
    epc = 0fb6d03c in ld.so.1[fb60000+19000]

    $12 : ffff000000000000
    $13 : 0000ffffffffffff
    epc = 0fb6d03c in ld.so.1[fb60000+19000]

     $8 : 4700302e325f4342
     $9 : 4700302e325f4342
    $10 : 0000ff6e6769735f
    $14 : 00ff000000000000
    epc = 0fb75a6c in ld.so.1[fb60000+19000]

    $10 : 635f6362696c5f5f
    $12 : ffff000000000000
    $13 : 0000ffffffffffff
    epc = 0fb6d608 in ld.so.1[fb60000+19000]

I'm not yet sure this approach is completely correct, because there are
quite a few macros and other things to set this up, and I'm assuming all
these registers are saved for IRQs by SAVE_ALL. The regs variable is a
64-bit unsigned long long and save/restore uses SD/LD in the relevant places.
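
The check itself amounts to testing that bits 63..32 of each saved
register replicate bit 31. A minimal user-space sketch in C, assuming the
saved GPRs are available as an array of 64-bit values (the names
is_sign_extended32 and first_unextended_reg are made up for illustration
and are not taken from the actual do_IRQ change):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if bits 63..32 of v are copies of bit 31, i.e. v is a properly
 * sign-extended 32-bit value as o32 code expects.
 */
bool is_sign_extended32(uint64_t v)
{
	uint64_t upper = v >> 31;	/* bits 63..31, 33 bits in total */

	return upper == 0 || upper == 0x1ffffffffULL;
}

/*
 * Hypothetical helper: return the index of the first register in
 * regs[first..last] that is not sign-extended, or -1 if all of them are.
 */
int first_unextended_reg(const uint64_t regs[32], int first, int last)
{
	assert(0 <= first && first <= last && last < 32);

	for (int i = first; i <= last; i++)
		if (!is_sign_extended32(regs[i]))
			return i;
	return -1;
}
```

With the first sample above, $10 = 00005f6362696c5f fails the check
because its upper 32 bits (00005f63) are not a copy of bit 31.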

It would be interesting to somehow single-step through BusyBox and for
every hardware instruction validate registers to find the first occurrence
where sign-extension breaks.

What about making the R5900 a 64-bit kernel only if it would turn out that
the 32-bit sign-extension logic is not completely reliable?

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-10-06 20:28                           ` Fredrik Noring
@ 2017-10-15 16:39                             ` Fredrik Noring
  2017-10-17 12:23                                 ` Maciej W. Rozycki
  0 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-10-15 16:39 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

> >  Anyway, as noted above that shouldn't cause a problem with user software 
> > and I think that any corruption you can see comes from elsewhere.  You'll 
> > have to paper this $ra non-sign-extension issue over somehow to proceed 
> > though.
> 
> I've extended do_IRQ with a register check under the condition that
> user_mode(get_irq_regs()) is true, with the following sample results
> where registers $2-$25 are printed if they are not sign-extended
> properly (there is a certain amount of randomness to this):
> 
>     $10 : 00005f6362696c5f
>     epc = 0fb6db00 in ld.so.1[fb60000+19000]

Debian-based Black Rhino libc.so.6 declares "ELF 32-bit LSB MIPS-III
version 1" but functions such as strcmp contain both 64-bit and multimedia
instructions (presumably hand coded in assembly for the R5900):

	6005ea90 <strcmp>:
	...
	6005eb50:	78880000 	lq	t0,0(a0)
	6005eb54:	710043a9 	pcpyud	t0,t0,zero
	6005eb58:	1000000c 	b	6005eb8c <strcmp+0xfc>
	6005eb5c:	71204ba9 	pcpyud	t1,t1,zero
	6005eb60:	dc880000 	ld	t0,0(a0)
	6005eb64:	24840008 	addiu	a0,a0,8
	6005eb68:	dca90000 	ld	t1,0(a1)
	6005eb6c:	710072a8 	pceqb	t6,t0,zero
	6005eb70:	71207aa8 	pceqb	t7,t1,zero
	6005eb74:	01cf7025 	or	t6,t6,t7
	6005eb78:	71097aa8 	pceqb	t7,t0,t1
	...

Hence corruption and register sign-extension failures. One can also note
that according to the TX-79 manual, for a 32-bit kernel, several MIPS I
instruction operations are undefined unless registers are sign-extended.

It is unfortunate that these instructions seem untrappable by the R5900,
instead silently causing strange behaviour and invalid results.

Still left to explain is why the kernel stumbles on registers during
initialisation, before user applications are invoked.

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-10-17 12:23                                 ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-10-17 12:23 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> Debian-based Black Rhino libc.so.6 declares "ELF 32-bit LSB MIPS-III
> version 1" but functions such as strcmp contain both 64-bit and multimedia
> instructions (presumably hand coded in assembly for the R5900):
> 
> 	6005ea90 <strcmp>:
> 	...
> 	6005eb50:	78880000 	lq	t0,0(a0)
> 	6005eb54:	710043a9 	pcpyud	t0,t0,zero
> 	6005eb58:	1000000c 	b	6005eb8c <strcmp+0xfc>
> 	6005eb5c:	71204ba9 	pcpyud	t1,t1,zero
> 	6005eb60:	dc880000 	ld	t0,0(a0)
> 	6005eb64:	24840008 	addiu	a0,a0,8
> 	6005eb68:	dca90000 	ld	t1,0(a1)
> 	6005eb6c:	710072a8 	pceqb	t6,t0,zero
> 	6005eb70:	71207aa8 	pceqb	t7,t1,zero
> 	6005eb74:	01cf7025 	or	t6,t6,t7
> 	6005eb78:	71097aa8 	pceqb	t7,t0,t1
> 	...
> 
> Hence corruption and register sign-extension failures. One can also note
> that according to the TX-79 manual, for a 32-bit kernel, several MIPS I
> instruction operations are undefined unless registers are sign-extended.

 That's a standard architectural requirement, not related to the kernel 
being 32-bit or 64-bit.  Overall all 32-bit ALU operations, except for SLL 
and SLLV, mandate that their input operands have been sign-extended from 
bit #31.

 I think you need to find another libc (or the whole userland), as I 
previously suggested.  I don't think we want to enable non-standard 
userland semantics, not as yet at least.  Having code supported like the 
snippet you have quoted above would I believe essentially boil down to 
Linux o64 ABI support, and anyway I think GAS should reject MMI 
instructions in the o32 assembly mode (which is something that was missed 
in R5900 support review).  I'll think on a suitable fix for GAS.

> It is unfortunate that these instructions seem untrappable by the R5900,
> instead silently causing strange behaviour and invalid results.

 Indeed, there's no way to trap with operations that break the 32-bit 
sign-extension requirement.  That's no different though from how the 
64-bit Linux kernel has operated for o32 software since forever; we don't 
clear the CP0.Status.UX bit for o32 tasks, although a discussion has been 
underway about changing it as it breaks indexed addressing (not a concern 
for the R5900, as a MIPS IV+ feature, e.g. LDXC1).  This has complications
though as we'd have to implement a TLB refill handler along with the XTLB 
refill handler, which is already taking space the former requires.

> Still left to explain is why the kernel stumbles on registers during
> initialisation, before user applications are invoked.

 Good luck!

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-10-17 12:23                                 ` Maciej W. Rozycki
  (?)
@ 2017-10-21 18:00                                 ` Fredrik Noring
  2017-10-23 16:10                                     ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-10-21 18:00 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

>  I think you need to find another libc (or the whole userland), as I 
> previously suggested.

Indeed. I've replaced it all now.

> > Still left to explain is why the kernel stumbles on registers during
> > initialisation, before user applications are invoked.
> 
>  Good luck!

The problem was with the inq and outq macros in the Graphics Synthesizer
driver. A 32-bit kernel now works with 32-bit register save/restore and o32
applications, as intended. Many thanks for all your help in finding this!

I've found an unrelated curiosity. With CONFIG_CPU_HAS_MSA undefined,
handle_msa_fpe_int, do_msa_fpe, etc. are still generated with nonsensical
instructions:

	80025128 <handle_msa_fpe_int>:
	80025128:       787e0859        lq      s8,2137(v1) <<<-----
	8002512c:       00202821        move    a1,at
	80025130:       0000040f        sync.p
	80025134:       40086000        mfc0    t0,c0_sr
	...

	80030f08 <do_msa_fpe>:
	80030f08:       27bdffd8        addiu   sp,sp,-40
	80030f0c:       afb0001c        sw      s0,28(sp)
	80030f10:       00808021        move    s0,a0
	...
	80030f70:       02228824        and     s1,s1,v0
	80030f74:       02200821        move    at,s1
	80030f78:       783e0859        lq      s8,2137(at) <<<-----
	80030f7c:       0000040f        sync.p
	80030f80:       40016000        mfc0    at,c0_sr
	...

I disabled both with the patch below (there seem to be more opportunities
for size reductions overall).

Fredrik

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index ae810da4d499..91855c68e2de 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -439,11 +439,13 @@ NESTED(nmi_handler, PT_SIZE, sp)
 	TRACE_IRQS_OFF
 	.endm
 
+#ifdef CONFIG_CPU_HAS_MSA
 	.macro	__build_clear_msa_fpe
 	_cfcmsa	a1, MSA_CSR
 	CLI
 	TRACE_IRQS_OFF
 	.endm
+#endif
 
 	.macro	__build_clear_ade
 	MFC0	t0, CP0_BADVADDR
@@ -503,10 +505,14 @@ NESTED(nmi_handler, PT_SIZE, sp)
 	BUILD_HANDLER cpu cpu sti silent		/* #11 */
 	BUILD_HANDLER ov ov sti silent			/* #12 */
 	BUILD_HANDLER tr tr sti silent			/* #13 */
+#ifdef CONFIG_CPU_HAS_MSA
 	BUILD_HANDLER msa_fpe msa_fpe msa_fpe silent	/* #14 */
+#endif
 	BUILD_HANDLER fpe fpe fpe silent		/* #15 */
 	BUILD_HANDLER ftlb ftlb none silent		/* #16 */
+#ifdef CONFIG_CPU_HAS_MSA
 	BUILD_HANDLER msa msa sti silent		/* #21 */
+#endif
 	BUILD_HANDLER mdmx mdmx sti silent		/* #22 */
 #ifdef	CONFIG_HARDWARE_WATCHPOINTS
 	/*
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 38dfa27730ff..9bd7b4a0b764 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -1457,6 +1457,7 @@ asmlinkage void do_cpu(struct pt_regs *regs)
 	exception_exit(prev_state);
 }
 
+#ifdef CONFIG_CPU_HAS_MSA
 asmlinkage void do_msa_fpe(struct pt_regs *regs, unsigned int msacsr)
 {
 	enum ctx_state prev_state;
@@ -1497,6 +1498,7 @@ asmlinkage void do_msa(struct pt_regs *regs)
 out:
 	exception_exit(prev_state);
 }
+#endif /* CONFIG_CPU_HAS_MSA */
 
 asmlinkage void do_mdmx(struct pt_regs *regs)
 {
@@ -2425,7 +2427,9 @@ void __init trap_init(void)
 	set_except_vector(EXCCODE_CPU, handle_cpu);
 	set_except_vector(EXCCODE_OV, handle_ov);
 	set_except_vector(EXCCODE_TR, handle_tr);
+#ifdef CONFIG_CPU_HAS_MSA
 	set_except_vector(EXCCODE_MSAFPE, handle_msa_fpe);
+#endif
 
 	if (current_cpu_type() == CPU_R6000 ||
 	    current_cpu_type() == CPU_R6000A) {
@@ -2455,7 +2459,9 @@ void __init trap_init(void)
 		set_except_vector(EXCCODE_TLBXI, tlb_do_page_fault_0);
 	}
 
+#ifdef CONFIG_CPU_HAS_MSA
 	set_except_vector(EXCCODE_MSADIS, handle_msa);
+#endif
 	set_except_vector(EXCCODE_MDMX, handle_mdmx);
 
 	if (cpu_has_mcheck)

^ permalink raw reply related	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-10-23 16:10                                     ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-10-23 16:10 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

On Sat, 21 Oct 2017, Fredrik Noring wrote:

> > > Still left to explain is why the kernel stumbles on registers during
> > > initialisation, before user applications are invoked.
> > 
> >  Good luck!
> 
> The problem was with the inq and outq macros in the Graphics Synthesizer
> driver. A 32-bit kernel now works with 32-bit register save/restore and o32
> applications, as intended. Many thanks for all your help in finding this!

 Great, and you are welcome!

> I've found an unrelated curiosity. With CONFIG_CPU_HAS_MSA undefined,
> handle_msa_fpe_int, do_msa_fpe, etc. are still generated with nonsensical
> instructions:
> 
> 	80025128 <handle_msa_fpe_int>:
> 	80025128:       787e0859        lq      s8,2137(v1) <<<-----
> 	8002512c:       00202821        move    a1,at
> 	80025130:       0000040f        sync.p
> 	80025134:       40086000        mfc0    t0,c0_sr
> 	...
> 
> 	80030f08 <do_msa_fpe>:
> 	80030f08:       27bdffd8        addiu   sp,sp,-40
> 	80030f0c:       afb0001c        sw      s0,28(sp)
> 	80030f10:       00808021        move    s0,a0
> 	...
> 	80030f70:       02228824        and     s1,s1,v0
> 	80030f74:       02200821        move    at,s1
> 	80030f78:       783e0859        lq      s8,2137(at) <<<-----
> 	80030f7c:       0000040f        sync.p
> 	80030f80:       40016000        mfc0    at,c0_sr
> 	...
> 
> I disabled both with the patch below (there seems to be more opportunities
> for size reductions overall).

 Perhaps the MSA and the MSAFPE exception handlers could be improved 
somehow for configurations known not to support the MSA ASE, however your 
change as it stands will make the MSA exception handler default to 
`do_reserved', which in turn will cause MSA software to cause a kernel 
panic rather than sending SIGILL if run with CPU_HAS_MSA disabled on MSA 
hardware.  This would be rather nasty IMO.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-10-02  9:05                               ` Maciej W. Rozycki
  (?)
  (?)
@ 2017-10-29 17:20                               ` Fredrik Noring
  2017-11-10 23:34                                   ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-10-29 17:20 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

> > >  Getting a core dump and using it to figure out which specific instruction 
> > > caused the exception would be interesting.
> > 
> > It's 72308802 as in "mul s1,s1,s0" which I believe is the DSP enhancement
> > multiplication with register write in the MIPS32 architecture. The R5900
> > doesn't have those DSP instructions, as far as I can tell.
> 
>  Umm, has Debian switched to MIPS32 as the base architecture?  That would 
> be unfortunate, they used to support MIPS I or at worst MIPS II (ISTR 
> voices to switch to the latter).  There's still plenty of MIPS III 
> hardware around so for 32-bit support I would consider MIPS II the common 
> denominator (the sole difference between MIPS II and MIPS III is 64-bit 
> support).
> 
>  In any case you'll have to find a MIPS I or MIPS II distribution, like an 
> older version of Debian.

Jürgen Urban tried Debian 3.0, 3.1, 4.0, 5.0, 6.0 and 7.0. As far as he
remembers 3.0 and perhaps 5.0 were good. 6.0 and 7.0 required substantial
workarounds.

However, it turns out that the R5900 has a grave hardware error that
appears to rule out most if not all generic MIPS distributions:

The short loop bug under certain conditions causes loops to execute only
once or twice. GCC 2.95 that shipped with Sony PS2 Linux had a patch with
the following note:

    On the R5900, we must ensure that the compiler never generates
    loops that satisfy all of the following conditions:

    - a loop consists of less than equal to six instructions
      (including the branch delay slot).
    - a loop contains only one conditional branch instruction at
      the end of the loop.
    - a loop does not contain any other branch or jump instructions.
    - a branch delay slot of the loop is not NOP (EE 2.9 or later).

    We need to do this because of a bug in the chip.
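
The four conditions in the note can be read as a single predicate. The
sketch below is hypothetical: struct loop_summary and
r5900_short_loop_hazard are illustrative names, not part of the GCC
patch, and the "EE 2.9 or later" qualifier on the last condition is not
modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a loop; none of these names come from GCC. */
struct loop_summary {
	unsigned insn_count;		/* instructions, including the delay slot */
	unsigned cond_branches;		/* conditional branches in the loop */
	bool ends_with_cond_branch;	/* the conditional branch closes the loop */
	bool has_other_branch_or_jump;	/* any further branch or jump */
	bool delay_slot_is_nop;		/* delay slot of the closing branch is NOP */
};

/* True if the loop satisfies all four conditions listed in the note. */
bool r5900_short_loop_hazard(const struct loop_summary *l)
{
	assert(l != NULL);

	return l->insn_count <= 6 &&
	       l->cond_branches == 1 && l->ends_with_cond_branch &&
	       !l->has_other_branch_or_jump &&
	       !l->delay_slot_is_nop;
}
```

A compiler or assembler workaround would then have to break at least one
of the conditions for every loop where the predicate holds, for example
by padding the loop beyond six instructions.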

>  The three-argument MUL is a part of the base MIPS32 architecture BTW, 
> originating from the IDT R4650 and the NEC Vr5500 processors.  It has 
> nothing to do with the DSP ASE (though it may have been claimed originally 
> to be a DSP enhancement).

The R5900 has three-operand multiply and multiply-accumulate instructions
as part of its multimedia set. Sadly, the MULT instruction format

      SPECIAL                          MULT
    +--------+----+----+----+-------+--------+
    | 000000 | rs | rt | rd | 00000 | 011000 |
    +--------+----+----+----+-------+--------+
         6      5    5    5     5        6

is incompatible with the corresponding MIPS32 MUL format

     SPECIAL2                           MUL
    +--------+----+----+----+-------+--------+
    | 011100 | rs | rt | rd | 00000 | 000010 |
    +--------+----+----+----+-------+--------+.
         6      5    5    5     5        6
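
To make the difference concrete, here is a small C sketch that builds
both encodings from the field layouts above. The helper names are made up
for this illustration; the opcode and function values are the ones in the
two diagrams.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_SPECIAL = 0x00, OP_SPECIAL2 = 0x1c };	/* 000000, 011100 */
enum { FUNCT_MULT = 0x18, FUNCT_MUL = 0x02 };	/* 011000, 000010 */

/* Assemble an R-type word: op | rs | rt | rd | 00000 | funct. */
uint32_t encode_rtype(unsigned op, unsigned rs, unsigned rt,
		      unsigned rd, unsigned funct)
{
	assert(op < 64 && rs < 32 && rt < 32 && rd < 32 && funct < 64);

	return ((uint32_t)op << 26) | ((uint32_t)rs << 21) |
	       ((uint32_t)rt << 16) | ((uint32_t)rd << 11) | funct;
}

/* R5900 three-operand MULT rd, rs, rt (SPECIAL opcode). */
uint32_t r5900_mult3(unsigned rd, unsigned rs, unsigned rt)
{
	return encode_rtype(OP_SPECIAL, rs, rt, rd, FUNCT_MULT);
}

/* MIPS32 MUL rd, rs, rt (SPECIAL2 opcode). */
uint32_t mips32_mul(unsigned rd, unsigned rs, unsigned rt)
{
	return encode_rtype(OP_SPECIAL2, rs, rt, rd, FUNCT_MUL);
}
```

For rd = s1 ($17), rs = s1 ($17) and rt = s0 ($16), mips32_mul yields
72308802, the word from the core dump, whereas r5900_mult3 yields
02308818.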

> > > Also make sure you have RDHWR instruction emulation in place for CP0
> > > UserLocal register access.
> > 
> > Right. Debian's BusyBox has 857 of those. Jürgen Urban observed in the
> > conversation with you in
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00658.html
> > 
> > that RDHWR has the same encoding as "sq v1,-6085(zero)" for the R5900,
> > which luckily always gives an alignment exception so that the kernel is
> > able to emulate RDHWR properly. I haven't verified this though.
> 
>  That instruction encoding (actually implemented by some MIPS32r2/MIPS64r2 
> and newer hardware) is used under Linux for Thread Local Storage (TLS) 
> access.  For hardware that does not have it the instruction is emulated in 
> the Reserved Instruction (RI) exception handler, but obviously not the 
> Address Error Store (AdES) exception.  So code to handle it as a special 
> case with the R5900 has to be provided among the patches (and included 
> with the initial series).
> 
>  Note that `rdhwr $3,$29' is the usual encoding, handled by a fastpath in 
> arch/mips/kernel/genex.S (see `handle_ri_rdhwr'), however all `rt' 
> encodings (covered in `simulate_rdhwr' in arch/mips/kernel/traps.c) have 
> to be handled for completeness.  Fortunately RDHWR and SQ both use the 
> same bits for `rt', and the `-6085(zero)' encoding of the memory reference 
> makes no sense, so we can safely rely on the AdES exception.
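
The overlap can be checked by decoding an RDHWR word with the I-type
field layout that SQ uses. A minimal C sketch, with helper names invented
for the illustration:

```c
#include <assert.h>
#include <stdint.h>

/* RDHWR rt, rd: 011111 | 00000 | rt | rd | 00000 | 111011. */
uint32_t encode_rdhwr(unsigned rt, unsigned rd)
{
	assert(rt < 32 && rd < 32);

	return (0x1fu << 26) | ((uint32_t)rt << 16) |
	       ((uint32_t)rd << 11) | 0x3bu;
}

/* I-type fields as SQ interprets them: opcode | base | rt | offset. */
unsigned itype_base(uint32_t insn)
{
	return (insn >> 21) & 0x1f;
}

unsigned itype_rt(uint32_t insn)
{
	return (insn >> 16) & 0x1f;
}

/* Sign-extend the 16-bit immediate to a signed offset. */
int32_t itype_offset(uint32_t insn)
{
	uint32_t imm = insn & 0xffff;

	return (imm & 0x8000) ? (int32_t)imm - 0x10000 : (int32_t)imm;
}
```

encode_rdhwr(3, 29) gives 7c03e83b; read as an I-type word it has base
$0, rt $3 and offset -6085, matching the "sq v1,-6085(zero)" disassembly
mentioned above. The low six bits of the offset are always 111011, so the
offset is odd for every rt and rd.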

This patch traps the RDHWR instruction as an unaligned SQ:

diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index f41cf3ee82a7..d4987e2d9695 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -39,4 +39,6 @@ extern int register_nmi_notifier(struct notifier_block *nb);
 	register_nmi_notifier(&fn##_nb);				\
 })
 
+asmlinkage void do_ri(struct pt_regs *regs);
+
 #endif /* _ASM_TRAPS_H */
diff --git a/arch/mips/kernel/unaligned.c b/arch/mips/kernel/unaligned.c
index f806ee56e639..7303d5d5cac8 100644
--- a/arch/mips/kernel/unaligned.c
+++ b/arch/mips/kernel/unaligned.c
@@ -89,6 +89,7 @@
 #include <asm/fpu.h>
 #include <asm/fpu_emulator.h>
 #include <asm/inst.h>
+#include <asm/traps.h>
 #include <linux/uaccess.h>
 
 #define STR(x)	__STR(x)
@@ -1309,6 +1310,35 @@ static void emulate_load_store_insn(struct pt_regs *regs,
 		cu2_notifier_call_chain(CU2_SDC2_OP, regs);
 		break;
 #endif
+
+#ifdef CONFIG_CPU_R5900
+	case spec3_op:
+		/*
+		 * On the R5900 the RDHWR instruction
+		 *
+		 *     +--------+-------+----+----+-------+--------+
+		 *     | 011111 | 00000 | rt | rd | 00000 | 111011 |
+		 *     +--------+-------+----+----+-------+--------+
+		 *          6       5      5    5     5        6
+		 *
+		 * is interpreted as the R5900 specific SQ instruction
+		 *
+		 *     +--------+-------+----+---------------------+
+		 *     | 011111 |  base | rt |        offset       |
+		 *     +--------+-------+----+---------------------+
+		 *          6       5      5            16
+		 *
+		 * with an odd offset based on $0 that always yields an
+		 * address error exception. Hence RDHWR can be trapped
+		 * and emulated here.
+		 */
+		if (insn.spec3_format.func == rdhwr_op) {
+			do_ri(regs);
+			return;
+		}
+		goto sigill;
+#endif
+
 	default:
 		/*
 		 * Pheeee...  We encountered an yet unknown instruction or

Fredrik

^ permalink raw reply related	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-09-11  5:18       ` Maciej W. Rozycki
  (?)
  (?)
@ 2017-10-29 18:42       ` Fredrik Noring
  -1 siblings, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2017-10-29 18:42 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

>  Please add at least a terse description of what the change actually does.

Please find below a new extended description in the first commit!

>  Ralf may yet want to chime in, but overall I think that the way to move 
> forward with your submission is to:
> 
> 1. Make the adjustments to this patch I have outlined above; dropping FPU 
>    and 64-bit support for the time being in particular.
> 
> 2. Update cache handlers to use the correct R5900-specific cache op 
>    encodings, presumably within the same patch.
> 
> 3. As Ralf has already requested, add basic board support for the PS2 
>    platform, including essential drivers that are required to boot, e.g. 
>    serial, network; fancy stuff can be added gradually later on.
> 
> 4. Post the whole set of changes collected so far, properly split into 
>    functionally self-contained changes, i.e. ones that build and can run
>    successfully on actual hardware, for a reasonable definition of 
>    success, e.g. patch #1 is this one, patch #2 is base board setup
>    infrastructure, patch #3 is interrupt support, patch #4 is timer 
>    support, patch #5 is the serial driver, patch #6 is the network 
>    driver, or suchlike.
> 
> We can then integrate these to have basic hardware support already working 
> and only then continue with more complicated features, especially ones 
> such as the FPU and 64-bit support which will require considerable updates 
> to generic architecture code.

Please find below a chronological log of the first 17 commits for the
initial submission. Those should cover most of the changes required for
the R5900 except memory management. I have a couple of difficulties to
sort out though. For example, the assembly version of memcpy crashes with
the R5900 (I provisionally use the Linux 2.6 version). The use of SYNC.L
is somewhat unclear to me, as are the details of the CACHE changes.

Regarding item (3) on your list: I believe framebuffer and network (and
possibly USB) support are essential for most people. Those require DMA,
the Graphics Synthesizer and firmware support (plus a few other things)
in at least 10000 lines of code (not yet ready to be merged).

Any comments on the progress so far?

Fredrik

commit 2971122d3fe2647db697242f5a2ca8677625da08
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sun Aug 27 12:06:13 2017 +0000

    MIPS: R5900: Initial support for the Emotion Engine in the Playstation 2
    
    The R5900 implements the 64-bit MIPS III instruction set except LL, SC,
    LLD and SCD, with additional PREFETCH and conditional move instructions
    from MIPS IV as well as three-operand multiply and multiply-accumulate
    instructions. A set of 128-bit multimedia instructions specific to the
    R5900 is also implemented. Endianness is configurable but taken to be
    little-endian. The R5900 does not implement CP0.Status.{UX,SX,KX,PX}.
    
    The COP1 FPU implements single-precision floating-point operations but
    is not entirely IEEE 754 compatible. In particular,
    
    - NaN (not a number) and plus/minus infinities are not supported;
    - exception mechanisms are not fully supported;
    - denormalized numbers are not supported;
    - rounding towards nearest and plus/minus infinities are not supported;
    - computed results usually differ in the least significant bit;
    - saturating instructions can differ more than the least significant bit.
    
    Since only rounding towards zero is supported, the two least significant
    bits of FCR31 are hardwired to 01. To support IEEE 754 in applications
    the FPU is emulated in software by the kernel.
    
    The VPU0 is a vector processor of the Emotion Engine. In macro mode, it
    functions as COP2 (coprocessor) and instructions execute simultaneously
    in the main integer pipeline I1 and the COP2 pipeline. In micro mode,
    the VPU0 functions as a stand-alone processor. The VPU1 is an additional
    vector processor that operates independently of both the R5900 and the
    VPU0. It primarily acts as a preprocessor to the Graphics Synthesizer.
    
    The scratch pad RAM (SPRAM) of the Emotion Engine is 16 KiB of very fast
    static RAM organised in 128-bit quadwords. Both the DMA controller and
    the R5900 can access the SPRAM.
    
    The R5900 has several significant hardware bugs. Perhaps the most
    important bug affecting applications is the short loop bug, which under
    certain conditions causes loops to execute only once or twice. GCC 2.95
    that shipped with Sony PS2 Linux had a patch with the following note:
    
        On the R5900, we must ensure that the compiler never generates
        loops that satisfy all of the following conditions:
    
        - a loop consists of less than equal to six instructions
          (including the branch delay slot);
        - a loop contains only one conditional branch instruction at
          the end of the loop;
        - a loop does not contain any other branch or jump instructions;
        - a branch delay slot of the loop is not NOP (EE 2.9 or later).
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/Kconfig                | 13 ++++++++++++-
 arch/mips/Makefile               |  1 +
 arch/mips/include/asm/cpu-type.h |  4 ++++
 arch/mips/include/asm/cpu.h      |  3 ++-
 arch/mips/include/asm/module.h   |  2 ++
 arch/mips/kernel/cpu-probe.c     |  8 ++++++++
 arch/mips/mm/Makefile            |  1 +
 arch/mips/mm/c-r4k.c             | 19 +++++++++++++++++++
 8 files changed, 49 insertions(+), 2 deletions(-)

commit 6c428ebe7e1a287601044b4f8605299d0a114b93
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sat Oct 28 10:32:37 2017 +0000

    MIPS: R5900: Trap the RDHWR instruction as an unaligned SQ
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/traps.h |  2 ++
 arch/mips/kernel/unaligned.c  | 30 ++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

commit 8bb6cf3f082aedaf5c2c06a5eed4de291ec649f6
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sat Oct 28 11:38:03 2017 +0000

    MIPS: R5900: Sign-extend O32 system call registers
    
    The R5900 has 64-bit instructions but does not implement CP0.Status.UX
    so a 32-bit kernel cannot assume O32 registers are sign-extended.
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/kernel/scall32-o32.S | 12 ++++++++++++
 1 file changed, 12 insertions(+)

commit 494c54857165f679c14ee8f79180366d73550d07
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sat Oct 28 12:35:37 2017 +0000

    MIPS: R5900: Add the SYNC.P instruction
    
    The SYNC.P instruction is a pipeline barrier. All instructions prior to
    the barrier are completed before the instructions following the barrier
    are fetched.
    
    However, the barrier operation doesn't wait for any prior instructions
    to retire, for example multiply, divide, multicycle COP1 operations or
    a pending load issued before the barrier operation.
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/uasm.h | 1 +
 arch/mips/mm/uasm-mips.c     | 1 +
 arch/mips/mm/uasm.c          | 2 ++
 3 files changed, 4 insertions(+)

commit 096b8e8c6557ee5d8d1277727a248b0d55e369b6
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sun Oct 29 15:07:40 2017 +0000

    MIPS: R5900: Reset bits 127..64 of GPRs in RESTORE_SOME
    
    Bits 127..64 are not used by the kernel but can be modified by
    applications using the R5900 specific multimedia instructions.
    Clearing them in RESTORE_SOME prevents leaking information between
    processes. This is a provisional measure until full 128-bit registers
    are saved/restored, possibly using SQ/LQ.
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/stackframe.h | 53 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

commit 83acb945af3013a740a3bb7f75888ea38fd51dc4
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sun Oct 29 15:23:30 2017 +0000

    MIPS: R5900: Reset the funnel shift amount (SA) register in RESTORE_SOME
    
    The shift amount (SA) register is a 64-bit special register storing the
    funnel shift amount. It is used by the QFSRV (quadword funnel shift
    right variable) 256-bit multimedia instruction. Resetting it is a
    provisional measure until the SA register is saved/restored properly.
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/stackframe.h | 1 +
 1 file changed, 1 insertion(+)

commit dd12db794c4c870b84a811513984e35ba16d384f
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sat Oct 28 17:07:02 2017 +0000

    MIPS: R5900: Workaround for the short loop bug
    
    The short loop bug under certain conditions causes loops to execute
    only once or twice. GCC 2.95 that shipped with Sony PS2 Linux had a
    patch with the following note:
    
        On the R5900, we must ensure that the compiler never generates
        loops that satisfy all of the following conditions:
    
        - a loop consists of less than equal to six instructions
          (including the branch delay slot).
        - a loop contains only one conditional branch instruction at
          the end of the loop.
        - a loop does not contain any other branch or jump instructions.
        - a branch delay slot of the loop is not NOP (EE 2.9 or later).
    
        We need to do this because of a bug in the chip.
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/r4kcache.h |  7 +++++++
 arch/mips/lib/memset.S           | 12 ++++++++++++
 arch/mips/lib/strlen_user.S      |  6 ++++++
 arch/mips/lib/strncpy_user.S     |  4 ++++
 arch/mips/lib/strnlen_user.S     |  6 ++++++
 5 files changed, 35 insertions(+)

commit 2c7dd9a5741646df771fc6864e8a5388a585ba0b
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sat Oct 28 13:45:55 2017 +0000

    MIPS: R5900: Add implicit SYNC.P to the UASM_i_M[FT]C0 macros
    
    The Toshiba TX79 manual mandates that all MTC0 instructions must be
    followed by a SYNC.P instruction as a barrier to guarantee COP0 register
    updates. There is one exception to this rule:
    
    An MTC0 instruction which loads the EntryHi COP0 register can be followed
    by a TLBWI or a TLBWR instruction without having an intervening SYNC.P.
    This special case is handled by a hardware interlock.
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/uasm.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

commit b0d8587b4338001510173ce6a660d6837589b14e
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sat Oct 28 17:29:21 2017 +0000

    MIPS: R5900: Add mandatory SYNC.P to all M[FT]C0 instructions
    
    The Toshiba TX79 manual mandates that all MTC0 instructions must be
    followed by a SYNC.P instruction as a barrier to guarantee COP0 register
    updates. There is one exception to this rule:
    
    An MTC0 instruction which loads the EntryHi COP0 register can be followed
    by a TLBWI or a TLBWR instruction without having an intervening SYNC.P.
    This special case is handled by a hardware interlock.
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/asmmacro.h   | 12 +++++++
 arch/mips/include/asm/irqflags.h   | 15 +++++++++
 arch/mips/include/asm/mipsregs.h   | 69 ++++++++++++++++++++++++++++++++++++++
 arch/mips/include/asm/stackframe.h | 54 +++++++++++++++++++++++++++++
 arch/mips/kernel/genex.S           | 60 +++++++++++++++++++++++++++++++++
 arch/mips/kernel/head.S            |  9 +++++
 arch/mips/kernel/r4k_switch.S      | 21 ++++++++++++
 arch/mips/mm/cex-gen.S             |  6 ++++
 arch/mips/mm/tlbex-fault.S         |  3 ++
 arch/mips/mm/tlbex.c               | 11 +++---
 10 files changed, 256 insertions(+), 4 deletions(-)

commit b07f10ebc26eb746c22fafa46271c6297fc7929b
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sat Oct 28 18:06:28 2017 +0000

    MIPS: R5900: Workaround exception NOP execution bug (FLX05)
    
    For the R5900, there are cases in which the first two instructions
    in an exception handler are executed as NOP instructions, when
    certain exceptions occur and then a bus error occurs immediately
    before jumping to the exception handler (FLX05).
    
    The corrective measure is to place NOP in the first two instruction
    locations in all exception handlers.
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/kernel/genex.S       | 42 ++++++++++++++++++++++++++++++++++++++++++
 arch/mips/kernel/scall32-o32.S | 12 ++++++++++++
 2 files changed, 54 insertions(+)

commit e2362f3890a4e64dd64e018fc9368f286d08b3b5
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sun Oct 29 08:30:48 2017 +0000

    MIPS: R5900: Add CACHE instruction operation field encodings
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/cacheops.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

commit 0b38596bf90f7968e3ebe21da061832db8980cb7
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sun Oct 29 09:25:50 2017 +0000

    MIPS: R5900: Workaround where MSB must be 0 for the instruction cache
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/r4kcache.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

commit 4170865d4d04d96ebd10cdd226c06c5d10820533
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sun Oct 29 08:43:18 2017 +0000

    MIPS: R5900: Use SYNC.L for data cache and SYNC.P for instruction cache
    
    Toshiba TX79 manual programming notes:
    
        For all CACHE sub-operations which operate on the instruction cache
        the following programming restrictions have to be followed:
    
        1. A sequence of CACHE instructions has to be directly preceded and
           followed by a SYNC.P instruction.
        2. Each individual FILL sub-operation has to be followed by a SYNC.L
           instruction.
    
        For all CACHE sub-operations which operate on the data cache the
        following programming restrictions have to be followed:
    
        1. A sequence of CACHE instructions have to be directly preceded and
           followed by a SYNC.L instruction.
        2. Each of the three WRITEBACK sub-operations have to be
           individually followed by a SYNC.L instruction.
    
        For all CACHE sub-operations which operate on the BTAC the following
        programming restrictions have to be followed:
    
        1. A sequence of CACHE instructions have to be directly preceded and
           followed by a SYNC.P instruction.
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/r4kcache.h | 239 ++++++++++++++++++++++++++++++---------
 arch/mips/mm/c-r4k.c             |   8 +-
 2 files changed, 189 insertions(+), 58 deletions(-)

commit 62acf546c937e68071c6dc97202b7404c9766854
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sun Oct 29 15:57:26 2017 +0000

    MIPS: R5900: Use mandatory SYNC.L in exception handlers (FIXME: Why?)
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/ftrace.h  |  52 ++++++++++
 arch/mips/include/asm/uaccess.h | 216 ++++++++++++++++++++++++++++++++++++++++
 arch/mips/kernel/unaligned.c    |  32 ++++++
 arch/mips/lib/csum_partial.S    |   3 +
 arch/mips/lib/memset.S          |  19 ++++
 arch/mips/lib/strlen_user.S     |   2 +
 arch/mips/lib/strncpy_user.S    |   2 +
 arch/mips/lib/strnlen_user.S    |   2 +
 8 files changed, 328 insertions(+)

commit 6cdfcde1a1a939b0114988a31eb8210505fdb042
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sun Oct 29 10:13:33 2017 +0000

    MIPS: R5900: Add COP0 config register fields
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/mipsregs.h | 8 ++++++++
 1 file changed, 8 insertions(+)

commit 412a6f4f67e395d3f048278692a8b083d1b27f03
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sun Oct 29 14:20:24 2017 +0000

    MIPS: R5900: Workaround for CACHE instruction near branch delay slot
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/kernel/genex.S | 16 ++++++++++++++++
 arch/mips/kernel/traps.c | 24 ++++++++++++++++++++++++
 arch/mips/mm/tlbex.c     | 18 ++++++++++++++++++
 3 files changed, 58 insertions(+)

commit d44ba94c588f991c4ade2ac73fa8334e2e80f311
Author: Fredrik Noring <noring@nocrew.org>
Date:   Sat Oct 28 14:07:39 2017 +0000

    MIPS: R5900: Support 64-bit inq and outq macros in 32-bit kernels
    
    PlayStation 2 hardware such as the Graphics Synthesizer requires 64-bit
    register reads and writes.
    
    Signed-off-by: Fredrik Noring <noring@nocrew.org>

 arch/mips/include/asm/io.h | 49 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 43 insertions(+), 6 deletions(-)

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-10-02  9:05                               ` Maciej W. Rozycki
                                                 ` (2 preceding siblings ...)
  (?)
@ 2017-10-30 17:55                               ` Fredrik Noring
  2017-11-24 10:26                                   ` Maciej W. Rozycki
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-10-30 17:55 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Hi Maciej,

> > For this reason the R5900 patch modifies the __{save,restore}_dsp macros,
> > mips_dsp_state::dspcontrol, DSP_INIT, sigcontext32::sc_dsp, etc. I've seen
> > the cpu_has_dsp macro too, but haven't looked at the details of this yet.
> 
>  Given that the R5900 does not expand DSP support anyhow that sounds 
> suspicious to me.

I've taken a closer look at the R5900 changes to the DSP kernel code now:

The R5900 has four three-operand instructions: MADD, MADDU, MULT and MULTU.
In addition, it has ten instructions for pipeline 1: MULT1, MULTU1, DIV1,
DIVU1, MADD1, MADDU1, MFHI1, MFLO1, MTHI1 and MTLO1. Those are the reason
(parts of) the cpu_has_dsp infrastructure is used, as shown in the patch
below. What are your thoughts on this?
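
As a rough model of what the R5900 variants of __save_dsp and
__restore_dsp in the patch amount to, here is a host-side sketch. It is
not kernel code: `struct cpu_model', `struct dsp_state_model' and the
`*_model' helpers are hypothetical stand-ins for the real mfhi1(),
mflo1(), mthi1() and mtlo1() accessors and for mips_dsp_state.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DSP_REGS 2		/* R5900 case in the patch: HI1 and LO1 only */

typedef uint32_t dspreg_t;

/* Hypothetical stand-in for the pipeline 1 HI1/LO1 hardware registers. */
struct cpu_model {
	dspreg_t hi1;
	dspreg_t lo1;
};

/* Per-task state, shaped like mips_dsp_state without dspcontrol. */
struct dsp_state_model {
	dspreg_t dspr[NUM_DSP_REGS];
};

/* Mirrors the R5900 __save_dsp: copy HI1/LO1 into the task state. */
static void save_dsp_model(struct dsp_state_model *s,
			   const struct cpu_model *cpu)
{
	s->dspr[0] = cpu->hi1;
	s->dspr[1] = cpu->lo1;
}

/* Mirrors the R5900 __restore_dsp: write the saved pair back. */
static void restore_dsp_model(const struct dsp_state_model *s,
			      struct cpu_model *cpu)
{
	cpu->hi1 = s->dspr[0];
	cpu->lo1 = s->dspr[1];
}

/* Simulated context switch: save task A, let task B clobber, restore A. */
static void context_switch_example(void)
{
	struct cpu_model cpu = { .hi1 = 0x11111111, .lo1 = 0x22222222 };
	struct dsp_state_model task_a;

	save_dsp_model(&task_a, &cpu);
	cpu.hi1 = 0xdeadbeef;
	cpu.lo1 = 0xcafef00d;
	restore_dsp_model(&task_a, &cpu);

	assert(cpu.hi1 == 0x11111111 && cpu.lo1 == 0x22222222);
}
```

The model only covers the two registers that the patch keeps for the
R5900; the DSP ASE accumulators 2 and 3 and DSPControl are left out, as
in the #ifdef'ed code.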

The instructions are specific to the R5900, and notably incompatible with
similar ones in the base MIPS32 architecture. They are also distinct from
the (also R5900 specific) 128-bit multimedia instructions.

By the way, "machine" is set to "Unknown" and "ASEs implemented" is empty
in /proc/cpuinfo. What would be the proper values for the R5900?

Fredrik

diff --git a/arch/mips/include/asm/dsp.h b/arch/mips/include/asm/dsp.h
index 7bfad0520e25..1bf4da622795 100644
--- a/arch/mips/include/asm/dsp.h
+++ b/arch/mips/include/asm/dsp.h
@@ -27,11 +27,13 @@ static inline void __init_dsp(void)
 {
 	mthi1(0);
 	mtlo1(0);
+#ifndef CONFIG_CPU_R5900
 	mthi2(0);
 	mtlo2(0);
 	mthi3(0);
 	mtlo3(0);
 	wrdsp(DSP_DEFAULT, DSP_MASK);
+#endif
 }
 
 static inline void init_dsp(void)
@@ -40,6 +42,13 @@ static inline void init_dsp(void)
 		__init_dsp();
 }
 
+#ifdef CONFIG_CPU_R5900
+#define __save_dsp(tsk)							\
+do {									\
+	tsk->thread.dsp.dspr[0] = mfhi1();				\
+	tsk->thread.dsp.dspr[1] = mflo1();				\
+} while (0)
+#else
 #define __save_dsp(tsk)							\
 do {									\
 	tsk->thread.dsp.dspr[0] = mfhi1();				\
@@ -50,6 +59,7 @@ do {									\
 	tsk->thread.dsp.dspr[5] = mflo3();				\
 	tsk->thread.dsp.dspcontrol = rddsp(DSP_MASK);			\
 } while (0)
+#endif
 
 #define save_dsp(tsk)							\
 do {									\
@@ -57,6 +67,13 @@ do {									\
 		__save_dsp(tsk);					\
 } while (0)
 
+#ifdef CONFIG_CPU_R5900
+#define __restore_dsp(tsk)						\
+do {									\
+	mthi1(tsk->thread.dsp.dspr[0]);					\
+	mtlo1(tsk->thread.dsp.dspr[1]);					\
+} while (0)
+#else
 #define __restore_dsp(tsk)						\
 do {									\
 	mthi1(tsk->thread.dsp.dspr[0]);					\
@@ -67,6 +84,7 @@ do {									\
 	mtlo3(tsk->thread.dsp.dspr[5]);					\
 	wrdsp(tsk->thread.dsp.dspcontrol, DSP_MASK);			\
 } while (0)
+#endif
 
 #define restore_dsp(tsk)						\
 do {									\
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 95b8c471f572..7330530f31b0 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -139,13 +139,19 @@ struct mips_fpu_struct {
 	unsigned int	msacsr;
 };
 
+#ifdef CONFIG_CPU_R5900
+#define NUM_DSP_REGS   2
+#else
 #define NUM_DSP_REGS   6
+#endif
 
 typedef __u32 dspreg_t;
 
 struct mips_dsp_state {
 	dspreg_t	dspr[NUM_DSP_REGS];
+#ifndef CONFIG_CPU_R5900
 	unsigned int	dspcontrol;
+#endif
 };
 
 #define INIT_CPUMASK { \
@@ -304,10 +310,20 @@ struct thread_struct {
 #define FPAFF_INIT
 #endif /* CONFIG_MIPS_MT_FPAFF */
 
-#define INIT_THREAD  {						\
-	/*							\
-	 * Saved main processor registers			\
-	 */							\
+#ifdef CONFIG_CPU_R5900
+#define DSP_INIT \
+	.dsp			= {				\
+		.dspr		= {0, },			\
+	},
+#else
+#define DSP_INIT \
+	.dsp			= {				\
+		.dspr		= {0, },			\
+		.dspcontrol	= 0,				\
+	},
+#endif
+
+#define REGS_INIT \
 	.reg16			= 0,				\
 	.reg17			= 0,				\
 	.reg18			= 0,				\
@@ -318,7 +334,13 @@ struct thread_struct {
 	.reg23			= 0,				\
 	.reg29			= 0,				\
 	.reg30			= 0,				\
-	.reg31			= 0,				\
+	.reg31			= 0,
+
+#define INIT_THREAD  {						\
+	/*							\
+	 * Saved main processor registers			\
+	 */							\
+	REGS_INIT						\
 	/*							\
 	 * Saved cp0 stuff					\
 	 */							\
@@ -342,10 +364,7 @@ struct thread_struct {
 	/*							\
 	 * Saved DSP stuff					\
 	 */							\
-	.dsp			= {				\
-		.dspr		= {0, },			\
-		.dspcontrol	= 0,				\
-	},							\
+	DSP_INIT						\
 	/*							\
 	 * saved watch register stuff				\
 	 */							\
diff --git a/arch/mips/include/asm/sigcontext.h b/arch/mips/include/asm/sigcontext.h
index eeeb0f48c767..4e975a3291f6 100644
--- a/arch/mips/include/asm/sigcontext.h
+++ b/arch/mips/include/asm/sigcontext.h
@@ -23,15 +23,19 @@ struct sigcontext32 {
 	__u32		sc_fpc_csr;
 	__u32		sc_fpc_eir;	/* Unused */
 	__u32		sc_used_math;
+#ifndef CONFIG_CPU_R5900
 	__u32		sc_dsp;		/* dsp status, was sc_ssflags */
+#endif
 	__u64		sc_mdhi;
 	__u64		sc_mdlo;
 	__u32		sc_hi1;		/* Was sc_cause */
 	__u32		sc_lo1;		/* Was sc_badvaddr */
+#ifndef CONFIG_CPU_R5900
 	__u32		sc_hi2;		/* Was sc_sigset[4] */
 	__u32		sc_lo2;
 	__u32		sc_hi3;
 	__u32		sc_lo3;
+#endif
 };
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 || _MIPS_SIM == _MIPS_SIM_NABI32 */
 #endif /* _ASM_SIGCONTEXT_H */
diff --git a/arch/mips/include/uapi/asm/sigcontext.h b/arch/mips/include/uapi/asm/sigcontext.h
index 5cbd9ae6421f..7564ba82425a 100644
--- a/arch/mips/include/uapi/asm/sigcontext.h
+++ b/arch/mips/include/uapi/asm/sigcontext.h
@@ -40,15 +40,26 @@ struct sigcontext {
 	unsigned int		sc_fpc_csr;
 	unsigned int		sc_fpc_eir;	/* Unused */
 	unsigned int		sc_used_math;
+#ifdef CONFIG_CPU_R5900
+	unsigned int		pad0;
+#else
 	unsigned int		sc_dsp;		/* dsp status, was sc_ssflags */
+#endif
 	unsigned long long	sc_mdhi;
 	unsigned long long	sc_mdlo;
 	unsigned long		sc_hi1;		/* Was sc_cause */
 	unsigned long		sc_lo1;		/* Was sc_badvaddr */
+#ifdef CONFIG_CPU_R5900
+	unsigned long		pad1;
+	unsigned long		pad2;
+	unsigned long		pad3;
+	unsigned long		pad4;
+#else
 	unsigned long		sc_hi2;		/* Was sc_sigset[4] */
 	unsigned long		sc_lo2;
 	unsigned long		sc_hi3;
 	unsigned long		sc_lo3;
+#endif
 };
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
@@ -71,16 +82,22 @@ struct sigcontext {
 	__u64	sc_fpregs[32];
 	__u64	sc_mdhi;
 	__u64	sc_hi1;
+#ifndef CONFIG_CPU_R5900
 	__u64	sc_hi2;
 	__u64	sc_hi3;
+#endif
 	__u64	sc_mdlo;
 	__u64	sc_lo1;
+#ifndef CONFIG_CPU_R5900
 	__u64	sc_lo2;
 	__u64	sc_lo3;
+#endif
 	__u64	sc_pc;
 	__u32	sc_fpc_csr;
 	__u32	sc_used_math;
+#ifndef CONFIG_CPU_R5900
 	__u32	sc_dsp;
+#endif
 	__u32	sc_reserved;
 };
 
diff --git a/arch/mips/kernel/asm-offsets.c b/arch/mips/kernel/asm-offsets.c
index a670c0c11875..041ed07e7910 100644
--- a/arch/mips/kernel/asm-offsets.c
+++ b/arch/mips/kernel/asm-offsets.c
@@ -226,10 +226,12 @@ void output_sc_defines(void)
 	OFFSET(SC_FPC_EIR, sigcontext, sc_fpc_eir);
 	OFFSET(SC_HI1, sigcontext, sc_hi1);
 	OFFSET(SC_LO1, sigcontext, sc_lo1);
+#ifndef CONFIG_CPU_R5900
 	OFFSET(SC_HI2, sigcontext, sc_hi2);
 	OFFSET(SC_LO2, sigcontext, sc_lo2);
 	OFFSET(SC_HI3, sigcontext, sc_hi3);
 	OFFSET(SC_LO3, sigcontext, sc_lo3);
+#endif
 	BLANK();
 }
 #endif
diff --git a/arch/mips/kernel/branch.c b/arch/mips/kernel/branch.c
index f702a459a830..b675c112aac3 100644
--- a/arch/mips/kernel/branch.c
+++ b/arch/mips/kernel/branch.c
@@ -416,9 +416,12 @@ int __MIPS16e_compute_return_epc(struct pt_regs *regs)
 int __compute_return_epc_for_insn(struct pt_regs *regs,
 				   union mips_instruction insn)
 {
-	unsigned int bit, fcr31, dspcontrol, reg;
+	unsigned int bit, fcr31, reg;
 	long epc = regs->cp0_epc;
 	int ret = 0;
+#ifndef CONFIG_CPU_R5900
+	unsigned int dspcontrol;
+#endif
 
 	switch (insn.i_format.opcode) {
 	/*
@@ -539,9 +542,12 @@ int __compute_return_epc_for_insn(struct pt_regs *regs,
 			break;
 
 		case bposge32_op:
+#ifndef CONFIG_CPU_R5900
 			if (!cpu_has_dsp)
+#endif
 				goto sigill_dsp;
 
+#ifndef CONFIG_CPU_R5900
 			dspcontrol = rddsp(0x01);
 
 			if (dspcontrol >= 32) {
@@ -549,6 +555,7 @@ int __compute_return_epc_for_insn(struct pt_regs *regs,
 			} else
 				epc += 8;
 			regs->cp0_epc = epc;
+#endif
 			break;
 		}
 		break;
diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
index 6931fe722a0b..c1b854542561 100644
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -710,7 +710,7 @@ long arch_ptrace(struct task_struct *child, long request,
 			/* implementation / version register */
 			tmp = boot_cpu_data.fpu_id;
 			break;
-		case DSP_BASE ... DSP_BASE + 5: {
+		case DSP_BASE ... DSP_BASE + NUM_DSP_REGS - 1: {
 			dspreg_t *dregs;
 
 			if (!cpu_has_dsp) {
@@ -722,6 +722,7 @@ long arch_ptrace(struct task_struct *child, long request,
 			tmp = (unsigned long) (dregs[addr - DSP_BASE]);
 			break;
 		}
+#ifndef CONFIG_CPU_R5900
 		case DSP_CONTROL:
 			if (!cpu_has_dsp) {
 				tmp = 0;
@@ -730,6 +731,7 @@ long arch_ptrace(struct task_struct *child, long request,
 			}
 			tmp = child->thread.dsp.dspcontrol;
 			break;
+#endif
 		default:
 			tmp = 0;
 			ret = -EIO;
@@ -791,7 +793,7 @@ long arch_ptrace(struct task_struct *child, long request,
 			init_fp_ctx(child);
 			ptrace_setfcr31(child, data);
 			break;
-		case DSP_BASE ... DSP_BASE + 5: {
+		case DSP_BASE ... DSP_BASE + NUM_DSP_REGS - 1: {
 			dspreg_t *dregs;
 
 			if (!cpu_has_dsp) {
@@ -803,6 +805,7 @@ long arch_ptrace(struct task_struct *child, long request,
 			dregs[addr - DSP_BASE] = data;
 			break;
 		}
+#ifndef CONFIG_CPU_R5900
 		case DSP_CONTROL:
 			if (!cpu_has_dsp) {
 				ret = -EIO;
@@ -810,6 +813,7 @@ long arch_ptrace(struct task_struct *child, long request,
 			}
 			child->thread.dsp.dspcontrol = data;
 			break;
+#endif
 		default:
 			/* The rest are not allowed. */
 			ret = -EIO;
diff --git a/arch/mips/kernel/ptrace32.c b/arch/mips/kernel/ptrace32.c
index 40e212d6b26b..232a28c94cce 100644
--- a/arch/mips/kernel/ptrace32.c
+++ b/arch/mips/kernel/ptrace32.c
@@ -132,7 +132,7 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 			/* implementation / version register */
 			tmp = boot_cpu_data.fpu_id;
 			break;
-		case DSP_BASE ... DSP_BASE + 5: {
+		case DSP_BASE ... DSP_BASE + NUM_DSP_REGS - 1: {
 			dspreg_t *dregs;
 
 			if (!cpu_has_dsp) {
@@ -144,6 +144,7 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 			tmp = (unsigned long) (dregs[addr - DSP_BASE]);
 			break;
 		}
+#ifndef CONFIG_CPU_R5900
 		case DSP_CONTROL:
 			if (!cpu_has_dsp) {
 				tmp = 0;
@@ -152,6 +153,7 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 			}
 			tmp = child->thread.dsp.dspcontrol;
 			break;
+#endif
 		default:
 			tmp = 0;
 			ret = -EIO;
@@ -230,7 +232,7 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 		case FPC_CSR:
 			child->thread.fpu.fcr31 = data;
 			break;
-		case DSP_BASE ... DSP_BASE + 5: {
+		case DSP_BASE ... DSP_BASE + NUM_DSP_REGS - 1: {
 			dspreg_t *dregs;
 
 			if (!cpu_has_dsp) {
@@ -242,6 +244,7 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 			dregs[addr - DSP_BASE] = data;
 			break;
 		}
+#ifndef CONFIG_CPU_R5900
 		case DSP_CONTROL:
 			if (!cpu_has_dsp) {
 				ret = -EIO;
@@ -249,6 +252,7 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 			}
 			child->thread.dsp.dspcontrol = data;
 			break;
+#endif
 		default:
 			/* The rest are not allowed. */
 			ret = -EIO;
diff --git a/arch/mips/kernel/signal.c b/arch/mips/kernel/signal.c
index 9e224469c788..3ca0f424c78b 100644
--- a/arch/mips/kernel/signal.c
+++ b/arch/mips/kernel/signal.c
@@ -426,11 +426,13 @@ int setup_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc)
 	if (cpu_has_dsp) {
 		err |= __put_user(mfhi1(), &sc->sc_hi1);
 		err |= __put_user(mflo1(), &sc->sc_lo1);
+#ifndef CONFIG_CPU_R5900
 		err |= __put_user(mfhi2(), &sc->sc_hi2);
 		err |= __put_user(mflo2(), &sc->sc_lo2);
 		err |= __put_user(mfhi3(), &sc->sc_hi3);
 		err |= __put_user(mflo3(), &sc->sc_lo3);
 		err |= __put_user(rddsp(DSP_MASK), &sc->sc_dsp);
+#endif
 	}
 
 
@@ -503,11 +505,13 @@ int restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc)
 	if (cpu_has_dsp) {
 		err |= __get_user(treg, &sc->sc_hi1); mthi1(treg);
 		err |= __get_user(treg, &sc->sc_lo1); mtlo1(treg);
+#ifndef CONFIG_CPU_R5900
 		err |= __get_user(treg, &sc->sc_hi2); mthi2(treg);
 		err |= __get_user(treg, &sc->sc_lo2); mtlo2(treg);
 		err |= __get_user(treg, &sc->sc_hi3); mthi3(treg);
 		err |= __get_user(treg, &sc->sc_lo3); mtlo3(treg);
 		err |= __get_user(treg, &sc->sc_dsp); wrdsp(treg, DSP_MASK);
+#endif
 	}
 
 	for (i = 1; i < 32; i++)
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 9979eb78c592..5b63fcc11733 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -1607,8 +1607,10 @@ asmlinkage void do_mt(struct pt_regs *regs)
 
 asmlinkage void do_dsp(struct pt_regs *regs)
 {
+#ifndef CONFIG_CPU_R5900
 	if (cpu_has_dsp)
 		panic("Unexpected DSP exception");
+#endif
 
 	force_sig(SIGILL, current);
 }

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-11-10 23:34                                   ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-11-10 23:34 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> However, it turns out that the R5900 has a grave hardware error that
> appears to rule out most if not all generic MIPS distributions:
> 
> The short loop bug under certain conditions causes loops to execute only
> once or twice. GCC 2.95 that shipped with Sony PS2 Linux had a patch with
> the following note:
> 
>     On the R5900, we must ensure that the compiler never generates
>     loops that satisfy all of the following conditions:
> 
>     - a loop consists of less than equal to six instructions
>       (including the branch delay slot).
>     - a loop contains only one conditional branch instruction at
>       the end of the loop.
>     - a loop does not contain any other branch or jump instructions.
>     - a branch delay slot of the loop is not NOP (EE 2.9 or later).
> 
>     We need to do this because of a bug in the chip.

 You'll need a `-mfix-r5900' workaround in the compiler then.  One for GAS,
covering hand-coded assembly, might be doable as well: it would fix up code
in the `reorder' mode only and possibly bail out if the conditions are met
in the `noreorder' mode.
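
 To make the quoted conditions concrete, here is a minimal sketch of the
check such a workaround would have to make.  It is an illustration only,
not code from GCC or GAS: `struct loop_summary' and
`r5900_short_loop_at_risk' are hypothetical names, and the summary is
assumed to have been collected by some earlier pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one loop, as a compiler or assembler pass
 * might collect it. */
struct loop_summary {
	int  insn_count;	/* instructions in the loop, delay slot included */
	int  cond_branches;	/* conditional branches inside the loop */
	bool branch_ends_loop;	/* the conditional branch closes the loop */
	int  other_jumps;	/* any other branch or jump instructions */
	bool delay_slot_is_nop;	/* delay slot of the closing branch is a NOP */
};

/* True only if all four conditions from the quoted note hold. */
static bool r5900_short_loop_at_risk(const struct loop_summary *l)
{
	return l->insn_count <= 6 &&
	       l->cond_branches == 1 &&
	       l->branch_ends_loop &&
	       l->other_jumps == 0 &&
	       !l->delay_slot_is_nop;
}

/* A four-instruction loop with a useful delay slot is flagged. */
static void short_loop_example(void)
{
	struct loop_summary l = {
		.insn_count = 4,
		.cond_branches = 1,
		.branch_ends_loop = true,
		.other_jumps = 0,
		.delay_slot_is_nop = false,
	};

	assert(r5900_short_loop_at_risk(&l));
}
```

 The note qualifies the last condition with "EE 2.9 or later"; how earlier
revisions behave is not stated here, so a conservative implementation
might choose to ignore `delay_slot_is_nop' altogether.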

> > originating from the IDT R4650 and the NEC Vr5500 processors.  It has 
> > nothing to do with the DSP ASE (though it may have been claimed originally 
> > to be a DSP enhancement).
> 
> The R5900 has three-operand multiply and multiply-accumulate instructions
> as part of its multimedia set. Sadly, the MULT instruction format
> 
>       SPECIAL                          MULT
>     +--------+----+----+----+-------+--------+
>     | 000000 | rs | rt | rd | 00000 | 011000 |
>     +--------+----+----+----+-------+--------+
>          6      5    5    5     5        6
> 
> is incompatible with the corresponding MIPS32 MUL format
> 
>      SPECIAL2                           MUL
>     +--------+----+----+----+-------+--------+
>     | 011100 | rs | rt | rd | 00000 | 000010 |
>     +--------+----+----+----+-------+--------+.
>          6      5    5    5     5        6

 Still, R5900-specific code may use it.
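
 For reference, the two encodings quoted above can be assembled by hand as
follows.  This is a sketch with made-up helper names (`encode_rtype',
`encode_r5900_mult', `encode_mips32_mul'); the opcode and function values
are the ones from the two diagrams.

```c
#include <assert.h>
#include <stdint.h>

/* Build an R-type word: opcode | rs | rt | rd | 00000 | function. */
static uint32_t encode_rtype(uint32_t opcode, uint32_t rs, uint32_t rt,
			     uint32_t rd, uint32_t func)
{
	return ((opcode & 0x3f) << 26) | ((rs & 0x1f) << 21) |
	       ((rt & 0x1f) << 16) | ((rd & 0x1f) << 11) | (func & 0x3f);
}

/* R5900 three-operand MULT: SPECIAL (000000), function 011000. */
static uint32_t encode_r5900_mult(uint32_t rd, uint32_t rs, uint32_t rt)
{
	return encode_rtype(0x00, rs, rt, rd, 0x18);
}

/* MIPS32 MUL: SPECIAL2 (011100), function 000010. */
static uint32_t encode_mips32_mul(uint32_t rd, uint32_t rs, uint32_t rt)
{
	return encode_rtype(0x1c, rs, rt, rd, 0x02);
}

/* Same operands ($2 = $4 * $5), two different instruction words. */
static void encoding_example(void)
{
	assert(encode_r5900_mult(2, 4, 5) == 0x00851018);
	assert(encode_mips32_mul(2, 4, 5) == 0x70851002);
}
```

 The operand fields sit in the same bit positions in both; only the major
opcode and the function field differ.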

> > > > Also make sure you have RDHWR instruction emulation in place for CP0
> > > > UserLocal register access.
> > > 
> > > Right. Debian's BusyBox has 857 of those. Jürgen Urban observed in the
> > > conversation with you in
> > > 
> > > https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00658.html
> > > 
> > > that RDHWR has the same encoding as "sq v1,-6085(zero)" for the R5900,
> > > which luckily always gives an alignment exception so that the kernel is
> > > able to emulate RDHWR properly. I haven't verified this though.
> > 
> >  That instruction encoding (actually implemented by some MIPS32r2/MIPS64r2 
> > and newer hardware) is used under Linux for Thread Local Storage (TLS) 
> > access.  For hardware that does not have it the instruction is emulated in 
> > the Reserved Instruction (RI) exception handler, but obviously not the 
> > Address Error Store (AdES) exception.  So code to handle it as a special 
> > case with the R5900 has to be provided among the patches (and included 
> > with the initial series).
> > 
> >  Note that `rdhwr $3,$29' is the usual encoding, handled by a fastpath in 
> > arch/mips/kernel/genex.S (see `handle_ri_rdhwr'), however all `rt' 
> > encodings (covered in `simulate_rdhwr' in arch/mips/kernel/traps.c) have 
> > to be handled for completeness.  Fortunately RDHWR and SQ both use the 
> > same bits for `rt', and the `-6085(zero)' encoding of the memory reference 
> > makes no sense, so we can safely rely on the AdES exception.
> 
> This patch traps the RDHWR instruction as an unaligned SQ:
> 
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index f41cf3ee82a7..d4987e2d9695 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -39,4 +39,6 @@ extern int register_nmi_notifier(struct notifier_block *nb);
>  	register_nmi_notifier(&fn##_nb);				\
>  })
>  
> +asmlinkage void do_ri(struct pt_regs *regs);
> +
>  #endif /* _ASM_TRAPS_H */
> diff --git a/arch/mips/kernel/unaligned.c b/arch/mips/kernel/unaligned.c
> index f806ee56e639..7303d5d5cac8 100644
> --- a/arch/mips/kernel/unaligned.c
> +++ b/arch/mips/kernel/unaligned.c
> @@ -89,6 +89,7 @@
>  #include <asm/fpu.h>
>  #include <asm/fpu_emulator.h>
>  #include <asm/inst.h>
> +#include <asm/traps.h>
>  #include <linux/uaccess.h>
>  
>  #define STR(x)	__STR(x)
> @@ -1309,6 +1310,35 @@ static void emulate_load_store_insn(struct pt_regs *regs,
>  		cu2_notifier_call_chain(CU2_SDC2_OP, regs);
>  		break;
>  #endif
> +
> +#ifdef CONFIG_CPU_R5900

 It might be preferable to use:

	if (IS_ENABLED(CONFIG_CPU_R5900))

instead.

> +	case spec3_op:

 There is already a `spec3_op' case in this `switch' statement, so you 
need to fold your code into it (have you actually successfully built this 
piece before posting?).

> +		/*
> +		 * On the R5900 the RDHWR instruction
> +		 *
> +		 *     +--------+-------+----+----+-------+--------+
> +		 *     | 011111 | 00000 | rt | rd | 00000 | 111011 |
> +		 *     +--------+-------+----+----+-------+--------+
> +		 *          6       5      5    5     5        6
> +		 *
> +		 * is interpreted as the R5900 specific SQ instruction
> +		 *
> +		 *     +--------+-------+----+---------------------+
> +		 *     | 011111 |  base | rt |        offset       |
> +		 *     +--------+-------+----+---------------------+
> +		 *          6       5      5            16
> +		 *
> +		 * with an odd offset based on $0 that always yields an
> +		 * address error exception. Hence RDHWR can be trapped
> +		 * and emulated here.
> +		 */
> +		if (insn.spec3_format.func == rdhwr_op) {

 I think `r_format' is more appropriate for RDHWR (`spec3_format' really
matches EVA instructions only; we might invent a distinct new format for
the BSHFL, DBSHFL and RDHWR minor opcodes, but I think this would be
overkill), and you need to qualify the other instruction fields, i.e. `rs'
and `re', because of the overlap with SQ.  We only want to give the
special exception for what looks like a real RDHWR instruction and not
just any faulting SQ whose least significant bits of the offset happen to
match the RDHWR minor opcode.
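
 A host-side sketch of the qualification I have in mind follows.  It
works on the raw instruction word rather than on `union mips_instruction',
and the names `looks_like_rdhwr' and `sq_offset' are made up for the
illustration; the field layout is the one from the comment in your patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OP_SPEC3	0x1f	/* 011111: SPECIAL3, decoded as SQ on the R5900 */
#define FUNC_RDHWR	0x3b	/* 111011: RDHWR minor opcode */

/* Accept only words whose `rs' and `re' fields are zero, as in RDHWR. */
static bool looks_like_rdhwr(uint32_t insn)
{
	uint32_t opcode = insn >> 26;
	uint32_t rs     = (insn >> 21) & 0x1f;
	uint32_t re     = (insn >> 6) & 0x1f;
	uint32_t func   = insn & 0x3f;

	return opcode == OP_SPEC3 && rs == 0 && re == 0 && func == FUNC_RDHWR;
}

/* The same word read as SQ: the low 16 bits as a signed offset. */
static int32_t sq_offset(uint32_t insn)
{
	int32_t off = (int32_t)(insn & 0xffff);

	if (off & 0x8000)
		off -= 0x10000;
	return off;
}

/* `rdhwr $3,$29' and `sq v1,-6085(zero)' are the same word. */
static void rdhwr_example(void)
{
	uint32_t insn = 0x7c03e83b;

	assert(looks_like_rdhwr(insn));
	assert(sq_offset(insn) == -6085);
}
```

 An SQ with a non-zero base register, e.g. the word 0x7c83e83b, has the
same low six bits but fails the `rs' check, so it would take the ordinary
unaligned access path rather than the RDHWR emulation.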

> +			do_ri(regs);

 Or rather `simulate_rdhwr(regs, insn.r_format.rd, insn.r_format.rt)' as 
we've already qualified it.

> +			return;
> +		}
> +		goto sigill;

 This I think should be `sigbus' as the SQ opcode is valid on the R5900.

  Maciej

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-11-10 23:34                                   ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-11-10 23:34 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: linux-mips

Hi Fredrik,

> However, it turns out that the R5900 has a grave hardware error that
> appears to rule out most if not all generic MIPS distributions:
> 
> The short loop bug under certain conditions causes loops to execute only
> once or twice. GCC 2.95 that shipped with Sony PS2 Linux had a patch with
> the following note:
> 
>     On the R5900, we must ensure that the compiler never generates
>     loops that satisfy all of the following conditions:
> 
>     - a loop consists of less than equal to six instructions
>       (including the branch delay slot).
>     - a loop contains only one conditional branch instruction at
>       the end of the loop.
>     - a loop does not contain any other branch or jump instructions.
>     - a branch delay slot of the loop is not NOP (EE 2.9 or later).
> 
>     We need to do this because of a bug in the chip.

 You'll need a `-mfix-r5900' workaround in the compiler then.  One for GAS 
for handcoded assembly might be doable as well, fixing the `reorder' mode 
only and possibly bailing out if the conditions are met in the `noreorder' 
mode.

> > originating from the IDT R4650 and the NEC Vr5500 processors.  It has 
> > nothing to do with the DSP ASE (though it may have been claimed originally 
> > to be a DSP enhancement).
> 
> The R5900 has three-operand multiply and multiply-accumulate instructions
> as part of its multimedia set. Sadly, the MULT instruction format
> 
>       SPECIAL                          MULT
>     +--------+----+----+----+-------+--------+
>     | 000000 | rs | rt | rd | 00000 | 011000 |
>     +--------+----+----+----+-------+--------+
>          6      5    5    5     5        6
> 
> is incompatible with the corresponding MIPS32 MUL format
> 
>      SPECIAL2                           MUL
>     +--------+----+----+----+-------+--------+
>     | 011100 | rs | rt | rd | 00000 | 000010 |
>     +--------+----+----+----+-------+--------+.
>          6      5    5    5     5        6

 Still R5900-specific code may use it.

> > > > Also make sure you have RDHWR instruction emulation in place for CP0
> > > > UserLocal register access.
> > > 
> > > Right. Debian's BusyBox has 857 of those. Jürgen Urban observed in the
> > > conversation with you in
> > > 
> > > https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00658.html
> > > 
> > > that RDHWR has the same encoding as "sq v1,-6085(zero)" for the R5900,
> > > which luckily always gives an alignment exception so that the kernel is
> > > able to emulate RDHWR properly. I haven't verified this though.
> > 
> >  That instruction encoding (actually implemented by some MIPS32r2/MIPS64r2 
> > and newer hardware) is used under Linux for Thread Local Storage (TLS) 
> > access.  For hardware that does not have it the instruction is emulated in 
> > the Reserved Instruction (RI) exception handler, but obviously not the 
> > Address Error Store (AdES) exception.  So code to handle it as a special 
> > case with the R5900 has to be provided among the patches (and included 
> > with the initial series).
> > 
> >  Note that `rdhwr $3,$29' is the usual encoding, handled by a fastpath in 
> > arch/mips/kernel/genex.S (see `handle_ri_rdhwr'), however all `rt' 
> > encodings (covered in `simulate_rdhwr' in arch/mips/kernel/traps.c) have 
> > to be handled for completeness.  Fortunately RDHWR and SQ both use the 
> > same bits for `rt', and the `-6085(zero)' encoding of the memory reference 
> > makes no sense, so we can safely rely on the AdES exception.
> 
> This patch traps the RDHWR instruction as an unaligned SQ:
> 
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index f41cf3ee82a7..d4987e2d9695 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -39,4 +39,6 @@ extern int register_nmi_notifier(struct notifier_block *nb);
>  	register_nmi_notifier(&fn##_nb);				\
>  })
>  
> +asmlinkage void do_ri(struct pt_regs *regs);
> +
>  #endif /* _ASM_TRAPS_H */
> diff --git a/arch/mips/kernel/unaligned.c b/arch/mips/kernel/unaligned.c
> index f806ee56e639..7303d5d5cac8 100644
> --- a/arch/mips/kernel/unaligned.c
> +++ b/arch/mips/kernel/unaligned.c
> @@ -89,6 +89,7 @@
>  #include <asm/fpu.h>
>  #include <asm/fpu_emulator.h>
>  #include <asm/inst.h>
> +#include <asm/traps.h>
>  #include <linux/uaccess.h>
>  
>  #define STR(x)	__STR(x)
> @@ -1309,6 +1310,35 @@ static void emulate_load_store_insn(struct pt_regs *regs,
>  		cu2_notifier_call_chain(CU2_SDC2_OP, regs);
>  		break;
>  #endif
> +
> +#ifdef CONFIG_CPU_R5900

 It might be preferable to use:

	if (IS_ENABLED(CONFIG_CPU_R5900))

instead.
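
 For reference, a userspace sketch of how such a test behaves.  The macro 
chain is a simplified stand-in for the one in include/linux/kconfig.h (it 
ignores the module case), the macro helper names are made up here, and 
`r5900_rdhwr_trap_enabled' is a hypothetical function.

```c
#include <assert.h>

/* Stand-in for the generated autoconf.h; CONFIG_EVA is deliberately unset. */
#define CONFIG_CPU_R5900 1

/*
 * Expand to 1 when the option is defined as 1 and to 0 when it is not
 * defined at all, so the result can be used in an ordinary C expression.
 */
#define ARG_PLACEHOLDER_1			0,
#define TAKE_SECOND_ARG(ignored, val, ...)	val
#define IS_DEFINED_3(arg1_or_junk)	TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val)		IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_ENABLED(option)		IS_DEFINED_2(option)

/* Both branches are parsed and type-checked whichever way the option is set. */
int r5900_rdhwr_trap_enabled(void)
{
	/* An option that was never defined must read as 0, not break the build. */
	assert(IS_ENABLED(CONFIG_EVA) == 0);

	if (IS_ENABLED(CONFIG_CPU_R5900))
		return 1;
	return 0;
}
```

 Inside a `switch' statement the test would guard the body of the case 
rather than the label itself, since a `case' label cannot be made 
conditional with `if'.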

> +	case spec3_op:

 There is already a `spec3_op' case in this `switch' statement, so you 
need to fold your code into it (have you actually successfully built this 
piece before posting?).

> +		/*
> +		 * On the R5900 the RDHWR instruction
> +		 *
> +		 *     +--------+-------+----+----+-------+--------+
> +		 *     | 011111 | 00000 | rt | rd | 00000 | 111011 |
> +		 *     +--------+-------+----+----+-------+--------+
> +		 *          6       5      5    5     5        6
> +		 *
> +		 * is interpreted as the R5900 specific SQ instruction
> +		 *
> +		 *     +--------+-------+----+---------------------+
> +		 *     | 011111 |  base | rt |        offset       |
> +		 *     +--------+-------+----+---------------------+
> +		 *          6       5      5            16
> +		 *
> +		 * with an odd offset based on $0 that always yields an
> +		 * address error exception. Hence RDHWR can be trapped
> +		 * and emulated here.
> +		 */
> +		if (insn.spec3_format.func == rdhwr_op) {

 I think `r_format' is more appropriate for RDHWR (`spec3_format' really 
matches EVA instructions only; we might invent a distinct new format for 
the BSHFL, DBSHFL and RDHWR minor opcodes, but I think this would be an 
overkill) and you need to qualify the other instruction fields, i.e. `rs' 
and `re', because of the overlap with SQ.  We only want to give the 
special exception for what looks like a real RDHWR instruction and not 
just any faulting SQ whose least significant bits of the offset happen to 
match the RDHWR minor opcode.
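
 A minimal sketch of that qualification on a raw instruction word, 
assuming the field layout from the comment in the patch.  The function 
names are hypothetical and the code deliberately avoids the kernel's 
`union mips_instruction' so that it stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPEC3_OP	0x1f	/* 011111: SPECIAL3, or SQ on the R5900 */
#define RDHWR_FUNC	0x3b	/* 111011: RDHWR minor opcode */

/* True only when every fixed field of RDHWR matches, not just the low bits. */
bool looks_like_rdhwr(uint32_t insn)
{
	return (insn >> 26) == SPEC3_OP &&
	       ((insn >> 21) & 0x1f) == 0 &&	/* rs, the SQ base register */
	       ((insn >> 6) & 0x1f) == 0 &&	/* re */
	       (insn & 0x3f) == RDHWR_FUNC;
}

/* The same word read as SQ: sign-extended 16-bit offset. */
int32_t sq_offset(uint32_t insn)
{
	int32_t off = (int32_t)(insn & 0xffff);

	return (off & 0x8000) ? off - 0x10000 : off;
}

/* Extract the operands that would be handed to the RDHWR emulation. */
void rdhwr_operands(uint32_t insn, unsigned int *rd, unsigned int *rt)
{
	assert(looks_like_rdhwr(insn));
	*rd = (insn >> 11) & 0x1f;
	*rt = (insn >> 16) & 0x1f;
}
```

 With `rdhwr $3,$29' (0x7c03e83b) the word qualifies and reads as an SQ 
with offset -6085, while an SQ with a non-zero base register and the same 
offset does not qualify.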

> +			do_ri(regs);

 Or rather `simulate_rdhwr(regs, insn.r_format.rd, insn.r_format.rt)' as 
we've already qualified it.

> +			return;
> +		}
> +		goto sigill;

 This I think should be `sigbus' as the SQ opcode is valid on the R5900.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-11-10 23:34                                   ` Maciej W. Rozycki
  (?)
@ 2017-11-11 16:04                                   ` Fredrik Noring
  2018-01-29 20:27                                     ` Fredrik Noring
  -1 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2017-11-11 16:04 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

Many thanks for your review, Maciej,

>  You'll need a `-mfix-r5900' workaround in the compiler then.  One for GAS 
> for handcoded assembly might be doable as well, fixing the `reorder' mode 
> only and possibly bailing out if the conditions are met in the `noreorder' 
> mode.

-march=r5900 currently handles this for C code, but the assembler does not
attempt to fix anything, as far as I understand. As shown in

https://www.linux-mips.org/archives/linux-mips/2017-10/msg00372.html

a separate commit updates

 arch/mips/include/asm/r4kcache.h |  7 +++++++
 arch/mips/lib/memset.S           | 12 ++++++++++++
 arch/mips/lib/strlen_user.S      |  6 ++++++
 arch/mips/lib/strncpy_user.S     |  4 ++++
 arch/mips/lib/strnlen_user.S     |  6 ++++++
 5 files changed, 35 insertions(+)

to avoid this bug. Taking care of this in the assembler sounds interesting.

> > +#ifdef CONFIG_CPU_R5900
> 
>  It might be preferable to use:
> 
> 	if (IS_ENABLED(CONFIG_CPU_R5900))
> 
> instead.

Yes, that makes sense once the patch set is rebased on the latest version
after v4.12 (which was the latest version when I started out; I'm hoping
to sort out all major issues before proceeding to the latest version).

> > +	case spec3_op:
> 
>  There is already a `spec3_op' case in this `switch' statement, so you 
> need to fold your code into it (have you actually successfully built this 
> piece before posting?).

Yes, it successfully builds and runs because the patch is based on v4.12
where

	% git show v4.12:arch/mips/kernel/unaligned.c | sed -n 942,943p
	#ifdef CONFIG_EVA
		case spec3_op:

and CONFIG_EVA is disabled.

>  I think `r_format' is more appropriate for RDHWR (`spec3_format' really 
> matches EVA instructions only; we might invent a distinct new format for 
> the BSHFL, DBSHFL and RDHWR minor opcodes, but I think this would be an 
> overkill) and you need to qualify the other instruction fields, i.e. `rs' 
> and `re', because of the overlap with SQ.  We only want to give the 
> special exception for what looks like a real RDHWR instruction and not 
> just any faulting SQ whose least significant bits of the offset happen to 
> match the RDHWR minor opcode.

Sure, please find updated patch below.

> > +			do_ri(regs);
> 
>  Or rather `simulate_rdhwr(regs, insn.r_format.rd, insn.r_format.rt)' as 
> we've already qualified it.

compute_return_epc(regs) seems to be required to avoid a boundless loop.

> > +			return;
> > +		}
> > +		goto sigill;
> 
>  This I think should be `sigbus' as the SQ opcode is valid on the R5900.

Sure, please find updated patch below.

Fredrik

diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index f41cf3ee82a7..256998085d5e 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -39,4 +39,6 @@ extern int register_nmi_notifier(struct notifier_block *nb);
 	register_nmi_notifier(&fn##_nb);				\
 })
 
+int simulate_rdhwr(struct pt_regs *regs, int rd, int rt);
+
 #endif /* _ASM_TRAPS_H */
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 38dfa27730ff..2341c3d4b1c3 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -623,7 +623,7 @@ static int simulate_llsc(struct pt_regs *regs, unsigned int opcode)
  * Simulate trapping 'rdhwr' instructions to provide user accessible
  * registers not implemented in hardware.
  */
-static int simulate_rdhwr(struct pt_regs *regs, int rd, int rt)
+int simulate_rdhwr(struct pt_regs *regs, int rd, int rt)
 {
 	struct thread_info *ti = task_thread_info(current);
 
diff --git a/arch/mips/kernel/unaligned.c b/arch/mips/kernel/unaligned.c
index f806ee56e639..4f645ae3fde9 100644
--- a/arch/mips/kernel/unaligned.c
+++ b/arch/mips/kernel/unaligned.c
@@ -89,6 +89,7 @@
 #include <asm/fpu.h>
 #include <asm/fpu_emulator.h>
 #include <asm/inst.h>
+#include <asm/traps.h>
 #include <linux/uaccess.h>
 
 #define STR(x)	__STR(x)
@@ -1309,6 +1310,40 @@ static void emulate_load_store_insn(struct pt_regs *regs,
 		cu2_notifier_call_chain(CU2_SDC2_OP, regs);
 		break;
 #endif
+
+#ifdef CONFIG_CPU_R5900
+	case spec3_op:
+		/*
+		 * On the R5900 the RDHWR instruction
+		 *
+		 *     +--------+-------+----+----+-------+--------+
+		 *     | 011111 | 00000 | rt | rd | 00000 | 111011 |
+		 *     +--------+-------+----+----+-------+--------+
+		 *          6       5      5    5     5        6
+		 *
+		 * is interpreted as the R5900 specific SQ instruction
+		 *
+		 *     +--------+-------+----+---------------------+
+		 *     | 011111 |  base | rt |        offset       |
+		 *     +--------+-------+----+---------------------+
+		 *          6       5      5            16
+		 *
+		 * with an odd offset based on $0 that always yields an
+		 * address error exception. Hence RDHWR can be trapped
+		 * and emulated here.
+		 */
+		if (insn.r_format.func == rdhwr_op &&
+		    insn.r_format.rs == 0 &&
+		    insn.r_format.re == 0) {
+			if (compute_return_epc(regs) < 0 ||
+			    simulate_rdhwr(regs, insn.r_format.rd,
+				           insn.r_format.rt) < 0)
+				goto sigill;
+			return;
+		}
+		goto sigbus;
+#endif
+
 	default:
 		/*
 		 * Pheeee...  We encountered an yet unknown instruction or

^ permalink raw reply related	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-11-24 10:26                                   ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-11-24 10:26 UTC (permalink / raw)
  To: Fredrik Noring, John Crispin; +Cc: linux-mips

Fredrik, John --

 John: can you please see the question below on the machine type you 
previously fiddled with?

> >  Given that the R5900 does not expand DSP support anyhow that sounds 
> > suspicious to me.
> 
> I've taken a closer look at the R5900 changes to the DSP kernel code now:
> 
> The R5900 has four three-operand instructions: MADD, MADDU, MULT and MULTU.
> In addition, it has ten instructions for pipeline 1: MULT1, MULTU1, DIV1,
> DIVU1, MADD1, MADDU1, MFHI1, MFLO1, MTHI1 and MTLO1. Those are the reason
> (parts of) the cpu_has_dsp infrastructure is used, as shown in the patch
> below. What are your thoughts on this?
> 
> The instructions are specific to the R5900, and notably incompatible with
> similar ones in the base MIPS32 architecture. They are also distinct from
> the (also R5900 specific) 128-bit multimedia instructions.

 They're still upper halves of the architectural HI/LO accumulator and 
also used by the 128-bit multiply and divide instructions.  I think they 
should be handled analogously to the 128-bit GPRs, rather than pretending 
they're a crippled version of the DSP ASE.

> By the way, "machine" is set to "Unknown" and "ASEs implemented" is empty
> in /proc/cpuinfo. What would be the proper values for the R5900?

 I have no idea what the machine type is supposed to be set to and why it 
is not omitted by default, given this piece:

		if (mips_get_machine_name())
			seq_printf(m, "machine\t\t\t: %s\n",
				mips_get_machine_name());

I think commit 9169a5d01114 ("MIPS: move mips_{set,get}_machine_name() to 
a more generic place") broke things.  Cc-ing the author for possible 
input.

 ASEs OTOH are specific to the MIPS32 and MIPS64 architectures, as per the 
respective architecture specification volumes (the MIPS16 ASE might be a 
prominent exception, having been defined as the first ASE ever, midway 
between the MIPS IV and MIPS32/MIPS64 ISAs).  As the R5900 is not 
a MIPS32 or MIPS64 processor (and has no MIPS16 support) it does not have 
any ASEs implemented.  Vendor-specific architecture extensions do not 
count as ASEs.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
@ 2017-11-24 10:39                                     ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2017-11-24 10:39 UTC (permalink / raw)
  To: Fredrik Noring, John Crispin; +Cc: linux-mips

Fredrik, John --

[Sending again, with John's address corrected.]

 John: can you please see the question below on the machine type you 
previously fiddled with?

> >  Given that the R5900 does not expand DSP support anyhow that sounds 
> > suspicious to me.
> 
> I've taken a closer look at the R5900 changes to the DSP kernel code now:
> 
> The R5900 has four three-operand instructions: MADD, MADDU, MULT and MULTU.
> In addition, it has ten instructions for pipeline 1: MULT1, MULTU1, DIV1,
> DIVU1, MADD1, MADDU1, MFHI1, MFLO1, MTHI1 and MTLO1. Those are the reason
> (parts of) the cpu_has_dsp infrastructure is used, as shown in the patch
> below. What are your thoughts on this?
> 
> The instructions are specific to the R5900, and notably incompatible with
> similar ones in the base MIPS32 architecture. They are also distinct from
> the (also R5900 specific) 128-bit multimedia instructions.

 They're still upper halves of the architectural HI/LO accumulator and 
also used by the 128-bit multiply and divide instructions.  I think they 
should be handled analogously to the 128-bit GPRs, rather than pretending 
they're a crippled version of the DSP ASE.

> By the way, "machine" is set to "Unknown" and "ASEs implemented" is empty
> in /proc/cpuinfo. What would be the proper values for the R5900?

 I have no idea what the machine type is supposed to be set to and why it 
is not omitted by default, given this piece:

		if (mips_get_machine_name())
			seq_printf(m, "machine\t\t\t: %s\n",
				mips_get_machine_name());

I think commit 9169a5d01114 ("MIPS: move mips_{set,get}_machine_name() to 
a more generic place") broke things.  Cc-ing the author for possible 
input.

 ASEs OTOH are specific to the MIPS32 and MIPS64 architectures, as per the 
respective architecture specification volumes (the MIPS16 ASE might be a 
prominent exception, having been defined as the first ASE ever, midway 
between the MIPS IV and MIPS32/MIPS64 ISAs).  As the R5900 is not 
a MIPS32 or MIPS64 processor (and has no MIPS16 support) it does not have 
any ASEs implemented.  Vendor-specific architecture extensions do not 
count as ASEs.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2017-11-11 16:04                                   ` Fredrik Noring
@ 2018-01-29 20:27                                     ` Fredrik Noring
  2018-01-31 23:01                                       ` Maciej W. Rozycki
  0 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-01-29 20:27 UTC (permalink / raw)
  To: Maciej W. Rozycki, Jürgen Urban; +Cc: linux-mips

Hi Maciej & Jürgen,

I have updated the PS2 patchset to v4.15 now. For the initial submission,
I'm hoping to include device drivers for USB and serial support. The first
20 or so patches are ready for review, with 5-10 additional patches needing
clean-ups.

USB maintainer Alan Stern has previewed the PS2 OHCI driver:

https://marc.info/?l=linux-usb&m=151198476018400

Simple devices such as a USB keyboard work. Jürgen Urban has reported
issues with USB mass storage devices, possibly due to lost interrupts.

I tried a wireless AR9271 USB device. It had at least two problems. The
first error was "ath9k_htc: Unable to allocate URBs", due to the (very)
limited amount of reserved IOP memory (256 kB). I then adjusted a few
hardcoded ath9k_htc buffer limits. The next error was "ath9k_htc: Target is
unresponsive", which remains to be investigated.

Jürgen: In ps2_uart.c for v4.15, the init_timer call needs to be replaced
with timer_setup.

Work in progress:

https://github.com/frno7/linux/tree/ps2-v4.15-n0

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [PATCH v2] MIPS: Add basic R5900 support
  2018-01-29 20:27                                     ` Fredrik Noring
@ 2018-01-31 23:01                                       ` Maciej W. Rozycki
  2018-02-11  7:29                                         ` [RFC] MIPS: R5900: Workaround for the short loop bug Fredrik Noring
                                                           ` (9 more replies)
  0 siblings, 10 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2018-01-31 23:01 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Jürgen Urban, linux-mips

Hi Fredrik,

> I have updated the PS2 patchset to v4.15 now. For the initial submission,
> I'm hoping to include device drivers for USB and serial support. The first
> 20 or so patches are ready for review, with 5-10 additional patches needing
> clean-ups.

 Thank you for the status update.  Let me know if I can be of any help.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* [RFC] MIPS: R5900: Workaround for the short loop bug
  2018-01-31 23:01                                       ` Maciej W. Rozycki
@ 2018-02-11  7:29                                         ` Fredrik Noring
  2018-02-12  9:25                                           ` Maciej W. Rozycki
  2018-02-11  7:46                                         ` [RFC] MIPS: R5900: Use SYNC.L for data cache and SYNC.P for instruction cache Fredrik Noring
                                                           ` (8 subsequent siblings)
  9 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-02-11  7:29 UTC (permalink / raw)
  To: Maciej W. Rozycki, Jürgen Urban; +Cc: linux-mips

The short loop bug under certain conditions causes loops to execute
only once or twice. GCC 2.95 that shipped with Sony PS2 Linux had a
patch with the following note:

    On the R5900, we must ensure that the compiler never generates
    loops that satisfy all of the following conditions:

    - a loop consists of less than equal to six instructions
      (including the branch delay slot);
    - a loop contains only one conditional branch instruction at
      the end of the loop;
    - a loop does not contain any other branch or jump instructions;
    - a branch delay slot of the loop is not NOP (EE 2.9 or later).

    We need to do this because of a bug in the chip.

Signed-off-by: Fredrik Noring <noring@nocrew.org>
---
The exact NOP placements in this patch are provisional. Comments on which
method to use are welcome. I believe there are at least three alternatives:

1. Add #ifdefs or macros in the source code (similar to this patch).
2. Modify the assembler to automatically insert NOPs as required.
3. Avoid assembly and use C versions of memcpy etc. instead.

This change has been ported from v2.6 patches.

diff --git a/arch/mips/include/asm/r4kcache.h b/arch/mips/include/asm/r4kcache.h
index 7f12d7e27c94..9659fb55abd2 100644
--- a/arch/mips/include/asm/r4kcache.h
+++ b/arch/mips/include/asm/r4kcache.h
@@ -37,6 +37,12 @@ extern void (*r4k_blast_icache)(void);
  *    without ifdefs we let the compiler do it by a type cast.
  */
 #define INDEX_BASE	CKSEG0
+#ifdef CONFIG_CPU_R5900
+/* Workaround for short loops on R5900. */
+#define R5900_LOOP_WAR() do { __asm__ __volatile__("nop;nop;\n"); } while(0)
+#else
+#define R5900_LOOP_WAR() do { } while(0)
+#endif
 
 #define cache_op(op,addr)						\
 	__asm__ __volatile__(						\
@@ -689,6 +695,7 @@ static inline void prot##extra##blast_##pfx##cache##_range(unsigned long start,
 									\
 	while (1) {							\
 		prot##cache_op(hitop, addr);				\
+		R5900_LOOP_WAR();  /* FIXME: Is this needed in C? */	\
 		if (addr == aend)					\
 			break;						\
 		addr += lsize;						\
diff --git a/arch/mips/lib/memcpy.S b/arch/mips/lib/memcpy.S
index 03e3304d6ae5..713015f6faa2 100644
--- a/arch/mips/lib/memcpy.S
+++ b/arch/mips/lib/memcpy.S
@@ -339,6 +339,12 @@
 	STORE(t1, UNIT(-1)(dst), .Ls_exc_p1u\@)
 	PREFS(	0, 8*32(src) )
 	PREFD(	1, 8*32(dst) )
+#ifdef CONFIG_CPU_R5900
+	/* No short loops. */
+	nop
+	nop
+	nop
+#endif
 	bne	len, rem, 1b
 	 nop
 
@@ -382,6 +388,12 @@
 	STORE(t0, 0(dst), .Ls_exc_p1u\@)
 	.set	reorder				/* DADDI_WAR */
 	ADD	dst, dst, NBYTES
+#ifdef CONFIG_CPU_R5900
+	/* No short loops. */
+	nop
+	nop
+	nop
+#endif
 	bne	rem, len, 1b
 	.set	noreorder
 
@@ -467,6 +479,12 @@
 	PREFD(	1, 9*32(dst) )		# 1 is PREF_STORE (not streamed)
 	.set	reorder				/* DADDI_WAR */
 	ADD	dst, dst, 4*NBYTES
+#ifdef CONFIG_CPU_R5900
+	/* No short loops. */
+	nop
+	nop
+	nop
+#endif
 	bne	len, rem, 1b
 	.set	noreorder
 
@@ -484,6 +502,12 @@
 	STORE(t0, 0(dst), .Ls_exc_p1u\@)
 	.set	reorder				/* DADDI_WAR */
 	ADD	dst, dst, NBYTES
+#ifdef CONFIG_CPU_R5900
+	/* No short loops. */
+	nop
+	nop
+	nop
+#endif
 	bne	len, rem, 1b
 	.set	noreorder
 
@@ -528,6 +552,12 @@
 	COPY_BYTE(6)
 	COPY_BYTE(7)
 	ADD	src, src, 8
+#ifdef CONFIG_CPU_R5900
+	/* No short loops. */
+	nop
+	nop
+	nop
+#endif
 	b	1b
 	 ADD	dst, dst, 8
 #endif /* CONFIG_CPU_MIPSR6 */
@@ -557,6 +587,12 @@
 	sb	t1, 0(dst)	# can't fault -- we're copy_from_user
 	.set	reorder				/* DADDI_WAR */
 	ADD	dst, dst, 1
+#ifdef CONFIG_CPU_R5900
+	/* No short loops. */
+	nop
+	nop
+	nop
+#endif
 	bne	src, t0, 1b
 	.set	noreorder
 .Ll_exc\@:
@@ -623,6 +659,12 @@ LEAF(__rmemcpy)					/* a0=dst a1=src a2=len */
 	SUB	a1, a1, 0x1
 	.set	reorder				/* DADDI_WAR */
 	SUB	a0, a0, 0x1
+#ifdef CONFIG_CPU_R5900
+	/* No short loops. */
+	nop
+	nop
+	nop
+#endif
 	bnez	a2, .Lr_end_bytes
 	.set	noreorder
 
@@ -638,6 +680,12 @@ LEAF(__rmemcpy)					/* a0=dst a1=src a2=len */
 	ADD	a1, a1, 0x1
 	.set	reorder				/* DADDI_WAR */
 	ADD	a0, a0, 0x1
+#ifdef CONFIG_CPU_R5900
+	/* No short loops. */
+	nop
+	nop
+	nop
+#endif
 	bnez	a2, .Lr_end_bytes_up
 	.set	noreorder
 
diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index a1456664d6c2..489bc9cffcbd 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -156,6 +156,12 @@
 1:	PTR_ADDIU	a0, 64
 	R10KCBARRIER(0(ra))
 	f_fill64 a0, -64, FILL64RG, .Lfwd_fixup\@, \mode
+#ifdef CONFIG_CPU_R5900
+	/* No short loops. */
+	nop
+	nop
+	nop
+#endif
 	bne		t1, a0, 1b
 	.set		noreorder
 
@@ -218,6 +224,12 @@
 
 1:	PTR_ADDIU	a0, 1			/* fill bytewise */
 	R10KCBARRIER(0(ra))
+#ifdef CONFIG_CPU_R5900
+	/* No short loops. */
+	nop
+	nop
+	nop
+#endif
 	bne		t1, a0, 1b
 	sb		a1, -1(a0)
 
diff --git a/arch/mips/lib/strncpy_user.S b/arch/mips/lib/strncpy_user.S
index acdff66bd5d2..44cc346fd400 100644
--- a/arch/mips/lib/strncpy_user.S
+++ b/arch/mips/lib/strncpy_user.S
@@ -48,6 +48,10 @@ LEAF(__strncpy_from_\func\()_asm)
 	beqz		v0, 2f
 	PTR_ADDIU	t0, 1
 	PTR_ADDIU	a0, 1
+#ifdef CONFIG_CPU_R5900
+	/* No short loops. */
+	nop
+#endif
 	bne		t0, a2, 1b
 2:	PTR_ADDU	v0, a1, t0
 	xor		v0, a1
diff --git a/arch/mips/lib/strnlen_user.S b/arch/mips/lib/strnlen_user.S
index e1bacf5a3abe..474979641a8d 100644
--- a/arch/mips/lib/strnlen_user.S
+++ b/arch/mips/lib/strnlen_user.S
@@ -46,6 +46,12 @@ LEAF(__strnlen_\func\()_asm)
 	EX(lbe, t0, (v0), .Lfault\@)
 .endif
 	.set		noreorder
+#ifdef CONFIG_CPU_R5900
+	/* No short loops. */
+	nop
+	nop
+	nop
+#endif
 	bnez		t0, 1b
 1:
 #ifndef CONFIG_CPU_DADDI_WORKAROUNDS

^ permalink raw reply related	[flat|nested] 117+ messages in thread

* [RFC] MIPS: R5900: Use SYNC.L for data cache and SYNC.P for instruction cache
  2018-01-31 23:01                                       ` Maciej W. Rozycki
  2018-02-11  7:29                                         ` [RFC] MIPS: R5900: Workaround for the short loop bug Fredrik Noring
@ 2018-02-11  7:46                                         ` Fredrik Noring
  2018-02-11  7:56                                         ` [RFC] MIPS: R5900: Workaround exception NOP execution bug (FLX05) Fredrik Noring
                                                           ` (7 subsequent siblings)
  9 siblings, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2018-02-11  7:46 UTC (permalink / raw)
  To: Maciej W. Rozycki, Jürgen Urban; +Cc: linux-mips

Toshiba TX79 manual programming notes:

    For all CACHE sub-operations which operate on the instruction cache
    the following programming restrictions have to be followed:

    1. A sequence of CACHE instructions has to be directly preceded and
       followed by a SYNC.P instruction.
    2. Each individual FILL sub-operation has to be followed by a SYNC.L
       instruction.

    For all CACHE sub-operations which operate on the data cache the
    following programming restrictions have to be followed:

    1. A sequence of CACHE instructions have to be directly preceded and
       followed by a SYNC.L instruction.
    2. Each of the three WRITEBACK sub-operations have to be
       individually followed by a SYNC.L instruction.

    For all CACHE sub-operations which operate on the BTAC the following
    programming restrictions have to be followed:

    1. A sequence of CACHE instructions have to be directly preceded and
       followed by a SYNC.P instruction.

Signed-off-by: Fredrik Noring <noring@nocrew.org>
---
This change has been ported from v2.6 patches.
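
As a compact restatement of the first rule in each list above, the barrier
that has to bracket a sequence of CACHE operations depends only on the
target. A minimal sketch; the enum and function names are hypothetical and
do not appear in the patch, and the per-operation FILL and WRITEBACK rules
are not modelled.

```c
#include <assert.h>

enum r5900_cache { R5900_ICACHE, R5900_DCACHE, R5900_BTAC };
enum r5900_sync { R5900_SYNC_L, R5900_SYNC_P };

/* Barrier required directly before and after a sequence of CACHE operations. */
enum r5900_sync r5900_cache_barrier(enum r5900_cache target)
{
	assert(target == R5900_ICACHE || target == R5900_DCACHE ||
	       target == R5900_BTAC);

	/* Data cache sequences use SYNC.L; instruction cache and BTAC use SYNC.P. */
	return target == R5900_DCACHE ? R5900_SYNC_L : R5900_SYNC_P;
}
```

This matches the split into cache_op_d (SYNC.L) and cache_op_i (SYNC.P) in
the diff below.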

diff --git a/arch/mips/include/asm/r4kcache.h b/arch/mips/include/asm/r4kcache.h
index 1b69bb602ffd..c1834e2ed92d 100644
--- a/arch/mips/include/asm/r4kcache.h
+++ b/arch/mips/include/asm/r4kcache.h
@@ -46,7 +46,7 @@ extern void (*r4k_blast_icache)(void);
 #define R5900_LOOP_WAR() do { } while(0)
 #endif
 
-#define cache_op(op,addr)						\
+#define cache_op_s(op,addr)						\
 	__asm__ __volatile__(						\
 	"	.set	push					\n"	\
 	"	.set	noreorder				\n"	\
@@ -55,6 +55,35 @@ extern void (*r4k_blast_icache)(void);
 	"	.set	pop					\n"	\
 	:								\
 	: "i" (op), "R" (*(unsigned char *)(addr)))
+#ifdef CONFIG_CPU_R5900
+#define cache_op_d(op,addr)						\
+	__asm__ __volatile__(						\
+	"	.set	push					\n"	\
+	"	.set	noreorder				\n"	\
+	"	.set	mips3\n\t				\n"	\
+	"	sync.l						\n"	\
+	"	cache	%0, %1					\n"	\
+	"	sync.l						\n"	\
+	"	.set	pop					\n"	\
+	:								\
+	: "i" (op), "R" (*(unsigned char *)(addr)))
+#define cache_op_i(op,addr)						\
+	__asm__ __volatile__(						\
+	"	.set	push					\n"	\
+	"	.set	noreorder				\n"	\
+	"	.set	mips3\n\t				\n"	\
+	"	sync.p						\n"	\
+	"	cache	%0, %1					\n"	\
+	"	sync.p						\n"	\
+	"	.set	pop					\n"	\
+	:								\
+	: "i" (op), "R" (*(unsigned char *)(addr)))
+#else
+#define cache_op_d cache_op_s
+#define cache_op_i cache_op_s
+#define cache_op cache_op_s
+#endif
+#define cache_op_t cache_op_s
 
 #ifdef CONFIG_MIPS_MT
 
@@ -99,20 +128,20 @@ extern void (*r4k_blast_icache)(void);
 static inline void flush_icache_line_indexed(unsigned long addr)
 {
 	__iflush_prologue
-	cache_op(Index_Invalidate_I, addr);
+	cache_op_i(Index_Invalidate_I, addr);
 	__iflush_epilogue
 }
 
 static inline void flush_dcache_line_indexed(unsigned long addr)
 {
 	__dflush_prologue
-	cache_op(Index_Writeback_Inv_D, addr);
+	cache_op_d(Index_Writeback_Inv_D, addr);
 	__dflush_epilogue
 }
 
 static inline void flush_scache_line_indexed(unsigned long addr)
 {
-	cache_op(Index_Writeback_Inv_SD, addr);
+	cache_op_s(Index_Writeback_Inv_SD, addr);
 }
 
 static inline void flush_icache_line(unsigned long addr)
@@ -120,11 +149,11 @@ static inline void flush_icache_line(unsigned long addr)
 	__iflush_prologue
 	switch (boot_cpu_type()) {
 	case CPU_LOONGSON2:
-		cache_op(Hit_Invalidate_I_Loongson2, addr);
+		cache_op_i(Hit_Invalidate_I_Loongson2, addr);
 		break;
 
 	default:
-		cache_op(Hit_Invalidate_I, addr);
+		cache_op_i(Hit_Invalidate_I, addr);
 		break;
 	}
 	__iflush_epilogue
@@ -133,28 +162,28 @@ static inline void flush_icache_line(unsigned long addr)
 static inline void flush_dcache_line(unsigned long addr)
 {
 	__dflush_prologue
-	cache_op(Hit_Writeback_Inv_D, addr);
+	cache_op_d(Hit_Writeback_Inv_D, addr);
 	__dflush_epilogue
 }
 
 static inline void invalidate_dcache_line(unsigned long addr)
 {
 	__dflush_prologue
-	cache_op(Hit_Invalidate_D, addr);
+	cache_op_d(Hit_Invalidate_D, addr);
 	__dflush_epilogue
 }
 
 static inline void invalidate_scache_line(unsigned long addr)
 {
-	cache_op(Hit_Invalidate_SD, addr);
+	cache_op_s(Hit_Invalidate_SD, addr);
 }
 
 static inline void flush_scache_line(unsigned long addr)
 {
-	cache_op(Hit_Writeback_Inv_SD, addr);
+	cache_op_s(Hit_Writeback_Inv_SD, addr);
 }
 
-#define protected_cache_op(op,addr)				\
+#define protected_cache_op_s(op,addr)				\
 ({								\
 	int __err = 0;						\
 	__asm__ __volatile__(					\
@@ -176,6 +205,49 @@ static inline void flush_scache_line(unsigned long addr)
 	__err;							\
 })
 
+#ifdef CONFIG_CPU_R5900
+#define protected_cache_op_d(op,addr)				\
+({								\
+	int __err = 0;						\
+	__asm__ __volatile__(					\
+	"	.set	push			\n"		\
+	"	.set	noreorder		\n"		\
+	"	.set	mips3			\n"		\
+	"	sync.l				\n"		\
+	"1:	cache	%0, (%1)		\n"		\
+	"	sync.l				\n"		\
+	"2:	.set	pop			\n"		\
+	"	.section __ex_table,\"a\"	\n"		\
+	"	"STR(PTR)" 1b, 2b		\n"		\
+	"	.previous"					\
+	:							\
+	: "i" (op), "r" (addr));				\
+	__err;							\
+})
+
+#define protected_cache_op_i(op,addr)				\
+({								\
+	int __err = 0;						\
+	__asm__ __volatile__(					\
+	"	.set	push			\n"		\
+	"	.set	noreorder		\n"		\
+	"	.set	mips3			\n"		\
+	"	sync.p				\n"		\
+	"1:	cache	%0, (%1)		\n"		\
+	"	sync.p				\n"		\
+	"2:	.set	pop			\n"		\
+	"	.section __ex_table,\"a\"	\n"		\
+	"	"STR(PTR)" 1b, 2b		\n"		\
+	"	.previous"					\
+	:							\
+	: "i" (op), "r" (addr));				\
+	__err;							\
+})
+#else
+#define protected_cache_op_i protected_cache_op_s
+#define protected_cache_op_d protected_cache_op_s
+#define protected_cache_op protected_cache_op_s
+#endif
 
 #define protected_cachee_op(op,addr)				\
 ({								\
@@ -207,13 +279,13 @@ static inline int protected_flush_icache_line(unsigned long addr)
 {
 	switch (boot_cpu_type()) {
 	case CPU_LOONGSON2:
-		return protected_cache_op(Hit_Invalidate_I_Loongson2, addr);
+		return protected_cache_op_i(Hit_Invalidate_I_Loongson2, addr);
 
 	default:
 #ifdef CONFIG_EVA
 		return protected_cachee_op(Hit_Invalidate_I, addr);
 #else
-		return protected_cache_op(Hit_Invalidate_I, addr);
+		return protected_cache_op_i(Hit_Invalidate_I, addr);
 #endif
 	}
 }
@@ -227,18 +299,18 @@ static inline int protected_flush_icache_line(unsigned long addr)
 static inline int protected_writeback_dcache_line(unsigned long addr)
 {
 #ifdef CONFIG_EVA
 	return protected_cachee_op(Hit_Writeback_Inv_D, addr);
 #else
-	return protected_cache_op(Hit_Writeback_Inv_D, addr);
+	return protected_cache_op_d(Hit_Writeback_Inv_D, addr);
 #endif
 }
 
 static inline int protected_writeback_scache_line(unsigned long addr)
 {
 #ifdef CONFIG_EVA
 	return protected_cachee_op(Hit_Writeback_Inv_SD, addr);
 #else
-	return protected_cache_op(Hit_Writeback_Inv_SD, addr);
+	return protected_cache_op_s(Hit_Writeback_Inv_SD, addr);
 #endif
 }
 
@@ -247,7 +319,7 @@ static inline int protected_writeback_scache_line(unsigned long addr)
  */
 static inline void invalidate_tcache_page(unsigned long addr)
 {
-	cache_op(Page_Invalidate_T, addr);
+	cache_op_t(Page_Invalidate_T, addr);
 }
 
 #ifndef CONFIG_CPU_MIPSR6
@@ -328,6 +400,65 @@ static inline void invalidate_tcache_page(unsigned long addr)
 		:							\
 		: "r" (base),						\
 		  "i" (op));
+#ifdef CONFIG_CPU_R5900
+#define cache64_unroll32_d(base,op)					\
+	__asm__ __volatile__(						\
+	"	.set push					\n"	\
+	"	.set noreorder					\n"	\
+	"	.set mips3					\n"	\
+	"	sync.l						\n"	\
+	"	cache %1, 0x000(%0); sync.l; cache %1, 0x040(%0); sync.l	\n"	\
+	"	cache %1, 0x080(%0); sync.l; cache %1, 0x0c0(%0); sync.l	\n"	\
+	"	cache %1, 0x100(%0); sync.l; cache %1, 0x140(%0); sync.l	\n"	\
+	"	cache %1, 0x180(%0); sync.l; cache %1, 0x1c0(%0); sync.l	\n"	\
+	"	cache %1, 0x200(%0); sync.l; cache %1, 0x240(%0); sync.l	\n"	\
+	"	cache %1, 0x280(%0); sync.l; cache %1, 0x2c0(%0); sync.l	\n"	\
+	"	cache %1, 0x300(%0); sync.l; cache %1, 0x340(%0); sync.l	\n"	\
+	"	cache %1, 0x380(%0); sync.l; cache %1, 0x3c0(%0); sync.l	\n"	\
+	"	cache %1, 0x400(%0); sync.l; cache %1, 0x440(%0); sync.l	\n"	\
+	"	cache %1, 0x480(%0); sync.l; cache %1, 0x4c0(%0); sync.l	\n"	\
+	"	cache %1, 0x500(%0); sync.l; cache %1, 0x540(%0); sync.l	\n"	\
+	"	cache %1, 0x580(%0); sync.l; cache %1, 0x5c0(%0); sync.l	\n"	\
+	"	cache %1, 0x600(%0); sync.l; cache %1, 0x640(%0); sync.l	\n"	\
+	"	cache %1, 0x680(%0); sync.l; cache %1, 0x6c0(%0); sync.l	\n"	\
+	"	cache %1, 0x700(%0); sync.l; cache %1, 0x740(%0); sync.l	\n"	\
+	"	cache %1, 0x780(%0); sync.l; cache %1, 0x7c0(%0); sync.l	\n"	\
+	"	.set pop					\n"	\
+		:							\
+		: "r" (base),						\
+		  "i" (op));
+
+#define cache64_unroll32_i(base,op)					\
+	__asm__ __volatile__(						\
+	"	.set push					\n"	\
+	"	.set noreorder					\n"	\
+	"	.set mips3					\n"	\
+	"	sync.p						\n"	\
+	"	cache %1, 0x000(%0); cache %1, 0x040(%0)	\n"	\
+	"	cache %1, 0x080(%0); cache %1, 0x0c0(%0)	\n"	\
+	"	cache %1, 0x100(%0); cache %1, 0x140(%0)	\n"	\
+	"	cache %1, 0x180(%0); cache %1, 0x1c0(%0)	\n"	\
+	"	cache %1, 0x200(%0); cache %1, 0x240(%0)	\n"	\
+	"	cache %1, 0x280(%0); cache %1, 0x2c0(%0)	\n"	\
+	"	cache %1, 0x300(%0); cache %1, 0x340(%0)	\n"	\
+	"	cache %1, 0x380(%0); cache %1, 0x3c0(%0)	\n"	\
+	"	cache %1, 0x400(%0); cache %1, 0x440(%0)	\n"	\
+	"	cache %1, 0x480(%0); cache %1, 0x4c0(%0)	\n"	\
+	"	cache %1, 0x500(%0); cache %1, 0x540(%0)	\n"	\
+	"	cache %1, 0x580(%0); cache %1, 0x5c0(%0)	\n"	\
+	"	cache %1, 0x600(%0); cache %1, 0x640(%0)	\n"	\
+	"	cache %1, 0x680(%0); cache %1, 0x6c0(%0)	\n"	\
+	"	cache %1, 0x700(%0); cache %1, 0x740(%0)	\n"	\
+	"	cache %1, 0x780(%0); cache %1, 0x7c0(%0)	\n"	\
+	"	sync.p						\n"	\
+	"	.set pop					\n"	\
+		:							\
+		: "r" (base),						\
+		  "i" (op));
+#else
+#define cache64_unroll32_i cache64_unroll32
+#define cache64_unroll32_d cache64_unroll32
+#endif
 
 #define cache128_unroll32(base,op)					\
 	__asm__ __volatile__(						\
@@ -584,7 +715,7 @@ static inline void invalidate_tcache_page(unsigned long addr)
 		  "i" (op));
 
 /* build blast_xxx, blast_xxx_page, blast_xxx_page_indexed */
-#define __BUILD_BLAST_CACHE(pfx, desc, indexop, hitop, lsize, extra)	\
+#define __BUILD_BLAST_CACHE(fn_pfx, pfx, desc, indexop, hitop, lsize, extra)	\
 static inline void extra##blast_##pfx##cache##lsize(void)		\
 {									\
 	unsigned long start = INDEX_BASE;				\
@@ -598,7 +729,7 @@ static inline void extra##blast_##pfx##cache##lsize(void)		\
 									\
 	for (ws = 0; ws < ws_end; ws += ws_inc)				\
 		for (addr = start; addr < end; addr += lsize * 32)	\
-			cache##lsize##_unroll32(addr|ws, indexop);	\
+			cache##lsize##_unroll32##fn_pfx(addr|ws, indexop);	\
 									\
 	__##pfx##flush_epilogue						\
 }									\
@@ -611,7 +742,7 @@ static inline void extra##blast_##pfx##cache##lsize##_page(unsigned long page) \
 	__##pfx##flush_prologue						\
 									\
 	do {								\
-		cache##lsize##_unroll32(start, hitop);			\
+		cache##lsize##_unroll32##fn_pfx(start, hitop);			\
 		start += lsize * 32;					\
 	} while (start < end);						\
 									\
@@ -632,31 +763,31 @@ static inline void extra##blast_##pfx##cache##lsize##_page_indexed(unsigned long
 									\
 	for (ws = 0; ws < ws_end; ws += ws_inc)				\
 		for (addr = start; addr < end; addr += lsize * 32)	\
-			cache##lsize##_unroll32(addr|ws, indexop);	\
+			cache##lsize##_unroll32##fn_pfx(addr|ws, indexop);	\
 									\
 	__##pfx##flush_epilogue						\
 }
 
-__BUILD_BLAST_CACHE(d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 16, )
-__BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I, 16, )
-__BUILD_BLAST_CACHE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 16, )
-__BUILD_BLAST_CACHE(d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 32, )
-__BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I, 32, )
-__BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I_Loongson2, 32, loongson2_)
-__BUILD_BLAST_CACHE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 32, )
-__BUILD_BLAST_CACHE(d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 64, )
-__BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I, 64, )
-__BUILD_BLAST_CACHE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 64, )
-__BUILD_BLAST_CACHE(d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 128, )
-__BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I, 128, )
-__BUILD_BLAST_CACHE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 128, )
+__BUILD_BLAST_CACHE(, d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 16, )
+__BUILD_BLAST_CACHE(, i, icache, Index_Invalidate_I, Hit_Invalidate_I, 16, )
+__BUILD_BLAST_CACHE(, s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 16, )
+__BUILD_BLAST_CACHE(, d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 32, )
+__BUILD_BLAST_CACHE(, i, icache, Index_Invalidate_I, Hit_Invalidate_I, 32, )
+__BUILD_BLAST_CACHE(, i, icache, Index_Invalidate_I, Hit_Invalidate_I_Loongson2, 32, loongson2_)
+__BUILD_BLAST_CACHE(, s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 32, )
+__BUILD_BLAST_CACHE(_d, d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 64, )
+__BUILD_BLAST_CACHE(_i, i, icache, Index_Invalidate_I, Hit_Invalidate_I, 64, )
+__BUILD_BLAST_CACHE(, s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 64, )
+__BUILD_BLAST_CACHE(, d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 128, )
+__BUILD_BLAST_CACHE(, i, icache, Index_Invalidate_I, Hit_Invalidate_I, 128, )
+__BUILD_BLAST_CACHE(, s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 128, )
 
-__BUILD_BLAST_CACHE(inv_d, dcache, Index_Writeback_Inv_D, Hit_Invalidate_D, 16, )
-__BUILD_BLAST_CACHE(inv_d, dcache, Index_Writeback_Inv_D, Hit_Invalidate_D, 32, )
-__BUILD_BLAST_CACHE(inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 16, )
-__BUILD_BLAST_CACHE(inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 32, )
-__BUILD_BLAST_CACHE(inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 64, )
-__BUILD_BLAST_CACHE(inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 128, )
+__BUILD_BLAST_CACHE(, inv_d, dcache, Index_Writeback_Inv_D, Hit_Invalidate_D, 16, )
+__BUILD_BLAST_CACHE(, inv_d, dcache, Index_Writeback_Inv_D, Hit_Invalidate_D, 32, )
+__BUILD_BLAST_CACHE(, inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 16, )
+__BUILD_BLAST_CACHE(, inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 32, )
+__BUILD_BLAST_CACHE(, inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 64, )
+__BUILD_BLAST_CACHE(, inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 128, )
 
 #define __BUILD_BLAST_USER_CACHE(pfx, desc, indexop, hitop, lsize) \
 static inline void blast_##pfx##cache##lsize##_user_page(unsigned long page) \
@@ -685,7 +816,7 @@ __BUILD_BLAST_USER_CACHE(d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D,
 __BUILD_BLAST_USER_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I, 64)
 
 /* build blast_xxx_range, protected_blast_xxx_range */
-#define __BUILD_BLAST_CACHE_RANGE(pfx, desc, hitop, prot, extra)	\
+#define __BUILD_BLAST_CACHE_RANGE(fn_pfx, pfx, desc, hitop, prot, extra)	\
 static inline void prot##extra##blast_##pfx##cache##_range(unsigned long start, \
 						    unsigned long end)	\
 {									\
@@ -696,7 +827,7 @@ static inline void prot##extra##blast_##pfx##cache##_range(unsigned long start,
 	__##pfx##flush_prologue						\
 									\
 	while (1) {							\
-		prot##cache_op(hitop, addr);				\
+		prot##cache_op##fn_pfx(hitop, addr);			\
 		R5900_LOOP_WAR();  /* FIXME: Is this needed in C? */	\
 		if (addr == aend)					\
 			break;						\
@@ -708,8 +839,8 @@ static inline void prot##extra##blast_##pfx##cache##_range(unsigned long start,
 
 #ifndef CONFIG_EVA
 
-__BUILD_BLAST_CACHE_RANGE(d, dcache, Hit_Writeback_Inv_D, protected_, )
-__BUILD_BLAST_CACHE_RANGE(i, icache, Hit_Invalidate_I, protected_, )
+__BUILD_BLAST_CACHE_RANGE(_d, d, dcache, Hit_Writeback_Inv_D, protected_, )
+__BUILD_BLAST_CACHE_RANGE(_i, i, icache, Hit_Invalidate_I, protected_, )
 
 #else
 
@@ -746,14 +877,14 @@ __BUILD_PROT_BLAST_CACHE_RANGE(d, dcache, Hit_Writeback_Inv_D)
 __BUILD_PROT_BLAST_CACHE_RANGE(i, icache, Hit_Invalidate_I)
 
 #endif
-__BUILD_BLAST_CACHE_RANGE(s, scache, Hit_Writeback_Inv_SD, protected_, )
-__BUILD_BLAST_CACHE_RANGE(i, icache, Hit_Invalidate_I_Loongson2, \
+__BUILD_BLAST_CACHE_RANGE(_s, s, scache, Hit_Writeback_Inv_SD, protected_, )
+__BUILD_BLAST_CACHE_RANGE(_i, i, icache, Hit_Invalidate_I_Loongson2, \
 	protected_, loongson2_)
-__BUILD_BLAST_CACHE_RANGE(d, dcache, Hit_Writeback_Inv_D, , )
-__BUILD_BLAST_CACHE_RANGE(i, icache, Hit_Invalidate_I, , )
-__BUILD_BLAST_CACHE_RANGE(s, scache, Hit_Writeback_Inv_SD, , )
+__BUILD_BLAST_CACHE_RANGE(_d, d, dcache, Hit_Writeback_Inv_D, , )
+__BUILD_BLAST_CACHE_RANGE(_i, i, icache, Hit_Invalidate_I, , )
+__BUILD_BLAST_CACHE_RANGE(_s, s, scache, Hit_Writeback_Inv_SD, , )
 /* blast_inv_dcache_range */
-__BUILD_BLAST_CACHE_RANGE(inv_d, dcache, Hit_Invalidate_D, , )
-__BUILD_BLAST_CACHE_RANGE(inv_s, scache, Hit_Invalidate_SD, , )
+__BUILD_BLAST_CACHE_RANGE(_d, inv_d, dcache, Hit_Invalidate_D, , )
+__BUILD_BLAST_CACHE_RANGE(_s, inv_s, scache, Hit_Invalidate_SD, , )
 
 #endif /* _ASM_R4KCACHE_H */
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index 3cd5bb465354..432ebd18cb7c 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -1626,14 +1626,14 @@ static int probe_scache(void)
 	write_c0_taglo(0);
 	write_c0_taghi(0);
 	__asm__ __volatile__("nop; nop; nop; nop;"); /* avoid the hazard */
-	cache_op(Index_Store_Tag_I, begin);
-	cache_op(Index_Store_Tag_D, begin);
-	cache_op(Index_Store_Tag_SD, begin);
+	cache_op_i(Index_Store_Tag_I, begin);
+	cache_op_d(Index_Store_Tag_D, begin);
+	cache_op_s(Index_Store_Tag_SD, begin);
 
 	/* Now search for the wrap around point. */
 	pow2 = (128 * 1024);
 	for (addr = begin + (128 * 1024); addr < end; addr = begin + pow2) {
-		cache_op(Index_Load_Tag_SD, addr);
+		cache_op_s(Index_Load_Tag_SD, addr);
 		__asm__ __volatile__("nop; nop; nop; nop;"); /* hazard... */
 		if (!read_c0_taglo())
 			break;

* [RFC] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-01-31 23:01                                       ` Maciej W. Rozycki
  2018-02-11  7:29                                         ` [RFC] MIPS: R5900: Workaround for the short loop bug Fredrik Noring
  2018-02-11  7:46                                         ` [RFC] MIPS: R5900: Use SYNC.L for data cache and SYNC.P for instruction cache Fredrik Noring
@ 2018-02-11  7:56                                         ` Fredrik Noring
  2018-02-12  9:28                                           ` Maciej W. Rozycki
  2018-02-11  8:01                                         ` [RFC] MIPS: R5900: Workaround for CACHE instruction near branch delay slot Fredrik Noring
                                                           ` (6 subsequent siblings)
  9 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-02-11  7:56 UTC (permalink / raw)
  To: Maciej W. Rozycki, Jürgen Urban; +Cc: linux-mips

For the R5900, there are cases in which the first two instructions
in an exception handler are executed as NOP instructions, when
certain exceptions occur and then a bus error occurs immediately
before jumping to the exception handler (FLX05).

The corrective measure is to place NOP in the first two instruction
locations in all exception handlers.

Signed-off-by: Fredrik Noring <noring@nocrew.org>
---
This change has been ported from v2.6 patches.
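
A minimal sketch of the corrective measure, written as a host-side
emitter rather than kernel code. The function name, its parameters and
the r5900 flag are assumptions made for the example; the only fact
relied on is that the MIPS NOP encodes as an all-zero word:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIPS_NOP	0x00000000u	/* sll $0, $0, 0 */

/*
 * Copy a handler body into buf, prefixed by two NOPs when the
 * (hypothetical) r5900 flag is set. Returns the number of words
 * written, or 0 if the result would not fit in cap words.
 */
size_t emit_handler(uint32_t *buf, size_t cap,
		    const uint32_t *body, size_t body_len, int r5900)
{
	size_t pad = r5900 ? 2 : 0;
	size_t i;

	if (body_len > cap || pad > cap - body_len)
		return 0;
	for (i = 0; i < pad; i++)
		buf[i] = MIPS_NOP;
	for (i = 0; i < body_len; i++)
		buf[pad + i] = body[i];
	return pad + body_len;
}
```

With the flag set, the first real instruction of the handler ends up in
the third word, so losing the first two words to FLX05 costs nothing.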

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index c7b64f4a8ad3..4008298c1880 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -32,6 +32,10 @@
 NESTED(except_vec3_generic, 0, sp)
 	.set	push
 	.set	noat
+#ifdef CONFIG_CPU_R5900
+	nop
+	nop
+#endif
 #if R5432_CP0_INTERRUPT_WAR
 #ifdef CONFIG_CPU_R5900
 	sync.p
@@ -62,6 +66,8 @@ NESTED(except_vec3_r4000, 0, sp)
 	.set	arch=r4000
 	.set	noat
 #ifdef CONFIG_CPU_R5900
+	nop
+	nop
 	sync.p
 #endif
 	mfc0	k1, CP0_CAUSE
@@ -174,6 +180,10 @@ LEAF(__r4k_wait)
 	.align	5
 BUILD_ROLLBACK_PROLOGUE handle_int
 NESTED(handle_int, PT_SIZE, sp)
+#ifdef CONFIG_CPU_R5900
+	nop
+	nop
+#endif
 	.cfi_signal_frame
 #ifdef CONFIG_TRACE_IRQFLAGS
 	/*
@@ -275,6 +285,10 @@ NESTED(handle_int, PT_SIZE, sp)
  * to fit into space reserved for the exception handler.
  */
 NESTED(except_vec4, 0, sp)
+#ifdef CONFIG_CPU_R5900
+	nop
+	nop
+#endif
 1:	j	1b			/* Dummy, will be replaced */
 	END(except_vec4)
 
@@ -285,6 +299,10 @@ NESTED(except_vec4, 0, sp)
  * unconditional jump to this vector.
  */
 NESTED(except_vec_ejtag_debug, 0, sp)
+#ifdef CONFIG_CPU_R5900
+	nop
+	nop
+#endif
 	j	ejtag_debug_handler
 #ifdef CONFIG_CPU_MICROMIPS
 	 nop
@@ -300,6 +318,10 @@ NESTED(except_vec_ejtag_debug, 0, sp)
  */
 BUILD_ROLLBACK_PROLOGUE except_vec_vi
 NESTED(except_vec_vi, 0, sp)
+#ifdef CONFIG_CPU_R5900
+	nop
+	nop
+#endif
 	SAVE_SOME docfi=1
 	SAVE_AT docfi=1
 	.set	push
@@ -319,6 +341,10 @@ EXPORT(except_vec_vi_end)
  * Complete the register saves and invoke the handler which is passed in $v0
  */
 NESTED(except_vec_vi_handler, 0, sp)
+#ifdef CONFIG_CPU_R5900
+	nop
+	nop
+#endif
 	SAVE_TEMP
 	SAVE_STATIC
 	CLI
@@ -378,6 +404,10 @@ NESTED(except_vec_vi_handler, 0, sp)
 NESTED(ejtag_debug_handler, PT_SIZE, sp)
 	.set	push
 	.set	noat
+#ifdef CONFIG_CPU_R5900
+	nop
+	nop
+#endif
 	MTC0	k0, CP0_DESAVE
 #ifdef CONFIG_CPU_R5900
 	sync.p
@@ -424,6 +454,10 @@ EXPORT(ejtag_debug_buffer)
  * unconditional jump to this vector.
  */
 NESTED(except_vec_nmi, 0, sp)
+#ifdef CONFIG_CPU_R5900
+	nop
+	nop
+#endif
 	j	nmi_handler
 #ifdef CONFIG_CPU_MICROMIPS
 	 nop
@@ -436,6 +470,10 @@ NESTED(nmi_handler, PT_SIZE, sp)
 	.cfi_signal_frame
 	.set	push
 	.set	noat
+#ifdef CONFIG_CPU_R5900
+	nop
+	nop
+#endif
 	/*
 	 * Clear ERL - restore segment mapping
 	 * Clear BEV - required for page fault exception handler to work
@@ -521,6 +559,10 @@ NESTED(nmi_handler, PT_SIZE, sp)
 	NESTED(handle_\exception, PT_SIZE, sp)
 	.cfi_signal_frame
 	.set	noat
+#ifdef CONFIG_CPU_R5900
+	nop
+	nop
+#endif
 	SAVE_ALL
 	FEXPORT(handle_\exception\ext)
 	__build_clear_\clear
diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index 89b425646647..e56f988b5c20 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -30,6 +30,18 @@ NESTED(handle_sys, PT_SIZE, sp)
 	.set	noat
 #ifdef CONFIG_CPU_R5900
 	/*
+	 * For the R5900, there are cases in which the first two instructions
+	 * in an exception handler are executed as NOP instructions, when
+	 * certain exceptions occur and then a bus error occurs immediately
+	 * before jumping to the exception handler (FLX05).
+	 *
+	 * The corrective measure is to place NOP in the first two instruction
+	 * locations in all exception handlers.
+	 */
+	nop
+	nop
+
+	/*
 	 * We don't want to stumble over broken sign extensions from
 	 * userland. O32 does never use the upper half, but since the
 	 * R5900 does not implement CP0.Status.UX it cannot enforce it.
diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index a18b013fd887..fc7ec8f9eed8 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -1308,6 +1308,11 @@ static void build_r4000_tlb_refill_handler(void)
 	memset(relocs, 0, sizeof(relocs));
 	memset(final_handler, 0, sizeof(final_handler));
 
+#ifdef CONFIG_CPU_R5900
+	uasm_i_nop(&p);
+	uasm_i_nop(&p);
+#endif
+
 	if (IS_ENABLED(CONFIG_64BIT) && (scratch_reg >= 0 || scratchpad_available()) && use_bbit_insns()) {
 		htlb_info = build_fast_tlb_refill_handler(&p, &l, &r, K0, K1,
 							  scratch_reg);
@@ -2049,6 +2054,11 @@ build_r4000_tlbchange_handler_head(u32 **p, struct uasm_label **l,
 {
 	struct work_registers wr = build_get_work_registers(p);
 
+#ifdef CONFIG_CPU_R5900
+	uasm_i_nop(p);
+	uasm_i_nop(p);
+#endif
+
 #ifdef CONFIG_64BIT
 	build_get_pmde64(p, l, r, wr.r1, wr.r2); /* get pmd in ptr */
 #else

* [RFC] MIPS: R5900: Workaround for CACHE instruction near branch delay slot
  2018-01-31 23:01                                       ` Maciej W. Rozycki
                                                           ` (2 preceding siblings ...)
  2018-02-11  7:56                                         ` [RFC] MIPS: R5900: Workaround exception NOP execution bug (FLX05) Fredrik Noring
@ 2018-02-11  8:01                                         ` Fredrik Noring
  2018-02-11 11:16                                           ` Aw: " "Jürgen Urban"
  2018-02-11  8:09                                         ` [RFC] MIPS: R5900: The ERET instruction has issues with delay slot and CACHE Fredrik Noring
                                                           ` (5 subsequent siblings)
  9 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-02-11  8:01 UTC (permalink / raw)
  To: Maciej W. Rozycki, Jürgen Urban; +Cc: linux-mips

Signed-off-by: Fredrik Noring <noring@nocrew.org>
---
This change has been ported from v2.6 patches. I have not found any note
describing this in the TX79 manual.
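
To make "looks like a cache instruction" concrete, here is a small
host-side sketch. It assumes that the test is on the major opcode field
(bits 31..26, 0x2f for CACHE) and that five words of padding are enough
because that is what the patch adds; neither assumption is confirmed by
the manual. The helper names are invented for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIPS_NOP	0x00000000u	/* sll $0, $0, 0 */
#define MIPS_OP_CACHE	0x2fu		/* major opcode of CACHE, bits 31..26 */
#define R5900_PAD_WORDS	5		/* number of NOPs the patch adds */

/* True if the word would decode as a CACHE instruction. */
bool looks_like_cache(uint32_t word)
{
	return (word >> 26) == MIPS_OP_CACHE;
}

/* Encode "j target" for a target in the same 256 MiB region. */
uint32_t encode_j(uint32_t target)
{
	return (0x02u << 26) | ((target >> 2) & 0x03ffffffu);
}

/*
 * Emit "j target; nop" followed by the padding NOPs. buf must have room
 * for 2 + R5900_PAD_WORDS words. Returns the number of words written.
 */
size_t emit_padded_jump(uint32_t *buf, uint32_t target)
{
	size_t n = 0;

	buf[n++] = encode_j(target);
	buf[n++] = MIPS_NOP;		/* branch delay slot */
	for (int i = 0; i < R5900_PAD_WORDS; i++)
		buf[n++] = MIPS_NOP;
	return n;
}
```

The padding guarantees that whatever data or code follows the vector in
memory is at least five words away from the delay slot.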

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index 4008298c1880..a0b0fbedad8c 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -52,6 +52,14 @@ NESTED(except_vec3_generic, 0, sp)
 #endif
 	PTR_L	k0, exception_handlers(k1)
 	jr	k0
+#ifdef CONFIG_CPU_R5900
+	/* There should be nothing which looks like a cache instruction. */
+	nop
+	nop
+	nop
+	nop
+	nop
+#endif
 	.set	pop
 	END(except_vec3_generic)
 
@@ -709,6 +717,14 @@ isrdhwr:
 	.set	arch=r4000
 	eret
 	.set	mips0
+#ifdef CONFIG_CPU_R5900
+	/* There should be nothing which looks like a cache instruction. */
+	nop
+	nop
+	nop
+	nop
+	nop
+#endif
 #endif
 	.set	pop
 	END(handle_ri_rdhwr)
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 761b6c369321..795c490a429f 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -1950,12 +1950,36 @@ void __init *set_except_vector(int n, void *addr)
 		u32 *buf = (u32 *)(ebase + 0x200);
 		unsigned int k0 = 26;
 		if ((handler & jump_mask) == ((ebase + 0x200) & jump_mask)) {
+#ifdef CONFIG_CPU_R5900
+			uasm_i_nop(&buf);
+			uasm_i_nop(&buf);
+#endif
 			uasm_i_j(&buf, handler & ~jump_mask);
 			uasm_i_nop(&buf);
+#ifdef CONFIG_CPU_R5900
+			/* No data is allowed here that could be interpreted as a cache instruction. */
+			uasm_i_nop(&buf);
+			uasm_i_nop(&buf);
+			uasm_i_nop(&buf);
+			uasm_i_nop(&buf);
+			uasm_i_nop(&buf);
+#endif
 		} else {
+#ifdef CONFIG_CPU_R5900
+			uasm_i_nop(&buf);
+			uasm_i_nop(&buf);
+#endif
 			UASM_i_LA(&buf, k0, handler);
 			uasm_i_jr(&buf, k0);
 			uasm_i_nop(&buf);
+#ifdef CONFIG_CPU_R5900
+			/* No data is allowed here that could be interpreted as a cache instruction. */
+			uasm_i_nop(&buf);
+			uasm_i_nop(&buf);
+			uasm_i_nop(&buf);
+			uasm_i_nop(&buf);
+			uasm_i_nop(&buf);
+#endif
 		}
 		local_flush_icache_range(ebase + 0x200, (unsigned long)buf);
 	}

* [RFC] MIPS: R5900: The ERET instruction has issues with delay slot and CACHE
  2018-01-31 23:01                                       ` Maciej W. Rozycki
                                                           ` (3 preceding siblings ...)
  2018-02-11  8:01                                         ` [RFC] MIPS: R5900: Workaround for CACHE instruction near branch delay slot Fredrik Noring
@ 2018-02-11  8:09                                         ` Fredrik Noring
  2018-02-11 11:07                                           ` Aw: " "Jürgen Urban"
  2018-02-11  8:29                                         ` [RFC] MIPS: R5900: Use mandatory SYNC.L in exception handlers Fredrik Noring
                                                           ` (4 subsequent siblings)
  9 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-02-11  8:09 UTC (permalink / raw)
  To: Maciej W. Rozycki, Jürgen Urban; +Cc: linux-mips

Signed-off-by: Fredrik Noring <noring@nocrew.org>
---
This change has been ported from v2.6 patches. I have not found any note
describing this in the TX79 manual.
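
One way to sanity-check generated handlers against this rule is a
host-side scan of the emitted words. The sketch below is only a model:
the function name is invented, and the requirement of five NOPs after
each ERET is taken from the patch, not from documentation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIPS_NOP	0x00000000u	/* sll $0, $0, 0 */
#define MIPS_ERET	0x42000018u	/* COP0 ERET */
#define R5900_PAD_WORDS	5u		/* number of NOPs the patch adds */

/*
 * Return true if every ERET in code[0..n) is followed by
 * R5900_PAD_WORDS NOPs that are still inside the buffer.
 */
bool eret_padding_ok(const uint32_t *code, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (code[i] != MIPS_ERET)
			continue;
		if (n - i - 1 < R5900_PAD_WORDS)
			return false;
		for (size_t k = 1; k <= R5900_PAD_WORDS; k++)
			if (code[i + k] != MIPS_NOP)
				return false;
	}
	return true;
}
```

A check like this could run over the buffer after the TLB handlers have
been built, before they are copied to the exception vectors.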

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index e23765312e25..b67f31d04716 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -1378,6 +1378,16 @@ static void build_r4000_tlb_refill_handler(void)
 		uasm_l_leave(&l, p);
 		uasm_i_eret(&p); /* return from trap */
 	}
+
+#ifdef CONFIG_CPU_R5900
+	/* There should be nothing which can be interpreted as a cache instruction. */
+	uasm_i_nop(&p);
+	uasm_i_nop(&p);
+	uasm_i_nop(&p);
+	uasm_i_nop(&p);
+	uasm_i_nop(&p);
+#endif
+
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
 	uasm_l_tlb_huge_update(&l, p);
 	if (htlb_info.need_reload_pte)
@@ -2132,6 +2142,14 @@ build_r4000_tlbchange_handler_tail(u32 **p, struct uasm_label **l,
 	uasm_l_leave(l, *p);
 	build_restore_work_registers(p);
 	uasm_i_eret(p); /* return from trap */
+#ifdef CONFIG_CPU_R5900
+	/* There should be nothing which can be interpreted as a cache instruction. */
+	uasm_i_nop(p);
+	uasm_i_nop(p);
+	uasm_i_nop(p);
+	uasm_i_nop(p);
+	uasm_i_nop(p);
+#endif
 
 #ifdef CONFIG_64BIT
 	build_get_pgd_vmalloc64(p, l, r, tmp, ptr, not_refill);

* [RFC] MIPS: R5900: Use mandatory SYNC.L in exception handlers
  2018-01-31 23:01                                       ` Maciej W. Rozycki
                                                           ` (4 preceding siblings ...)
  2018-02-11  8:09                                         ` [RFC] MIPS: R5900: The ERET instruction has issues with delay slot and CACHE Fredrik Noring
@ 2018-02-11  8:29                                         ` Fredrik Noring
  2018-02-11 10:33                                           ` Aw: " "Jürgen Urban"
  2018-02-17 14:43                                         ` [RFC] MIPS: R5900: Workaround for saving and restoring FPU registers Fredrik Noring
                                                           ` (3 subsequent siblings)
  9 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-02-11  8:29 UTC (permalink / raw)
  To: Maciej W. Rozycki, Jürgen Urban; +Cc: linux-mips

Hi Jürgen,

Would you be able to explain the notes

	/* In an error exception handler the user space could be uncached. */

in the patch ported from v2.6 below?

Fredrik

diff --git a/arch/mips/include/asm/ftrace.h b/arch/mips/include/asm/ftrace.h
index b463f2aa5a61..79390b194e6d 100644
--- a/arch/mips/include/asm/ftrace.h
+++ b/arch/mips/include/asm/ftrace.h
@@ -19,9 +19,12 @@
 extern void _mcount(void);
 #define mcount _mcount
 
+#ifdef CONFIG_CPU_R5900
 #define safe_load(load, src, dst, error)		\
 do {							\
 	asm volatile (					\
+		/* In an error exception handler the user space could be uncached. */ \
+		"sync.l							\n"	\
 		"1: " load " %[tmp_dst], 0(%[tmp_src])\n"	\
 		"   li %[tmp_err], 0\n"			\
 		"2: .insn\n"				\
@@ -40,7 +43,55 @@ do {							\
 		: "memory"				\
 	);						\
 } while (0)
+#else
+#define safe_load(load, src, dst, error)		\
+do {							\
+	asm volatile (					\
+		"1: " load " %[" STR(dst) "], 0(%[" STR(src) "])\n"\
+		"   li %[" STR(error) "], 0\n"		\
+		"2:\n"					\
+							\
+		".section .fixup, \"ax\"\n"		\
+		"3: li %[" STR(error) "], 1\n"		\
+		"   j 2b\n"				\
+		".previous\n"				\
+							\
+		".section\t__ex_table,\"a\"\n\t"	\
+		STR(PTR) "\t1b, 3b\n\t"			\
+		".previous\n"				\
+							\
+		: [dst] "=&r" (dst), [error] "=r" (error)\
+		: [src] "r" (src)			\
+		: "memory"				\
+	);						\
+} while (0)
+#endif
 
+#ifdef CONFIG_CPU_R5900
+#define safe_store(store, src, dst, error)	\
+do {						\
+	asm volatile (				\
+		/* In an error exception handler the user space could be uncached. */ \
+		"sync.l							\n"	\
+		"1: " store " %[" STR(src) "], 0(%[" STR(dst) "])\n"\
+		"   li %[" STR(error) "], 0\n"	\
+		"2:\n"				\
+						\
+		".section .fixup, \"ax\"\n"	\
+		"3: li %[" STR(error) "], 1\n"	\
+		"   j 2b\n"			\
+		".previous\n"			\
+						\
+		".section\t__ex_table,\"a\"\n\t"\
+		STR(PTR) "\t1b, 3b\n\t"		\
+		".previous\n"			\
+						\
+		: [error] "=r" (error)		\
+		: [dst] "r" (dst), [src] "r" (src)\
+		: "memory"			\
+	);					\
+} while (0)
+#else
 #define safe_store(store, src, dst, error)	\
 do {						\
 	asm volatile (				\
@@ -62,6 +113,7 @@ do {						\
 		: "memory"			\
 	);					\
 } while (0)
+#endif
 
 #define safe_load_code(dst, src, error) \
 	safe_load(STR(lw), src, dst, error)
diff --git a/arch/mips/include/asm/uaccess.h b/arch/mips/include/asm/uaccess.h
index b71306947290..a4eecafc524b 100644
--- a/arch/mips/include/asm/uaccess.h
+++ b/arch/mips/include/asm/uaccess.h
@@ -315,11 +315,14 @@ do {									\
 	__gu_err;							\
 })
 
+#ifdef CONFIG_CPU_R5900
 #define __get_data_asm(val, insn, addr)					\
 {									\
 	long __gu_tmp;							\
 									\
 	__asm__ __volatile__(						\
+	/* In an error exception handler the user space could be uncached. */ \
+	"sync.l							\n"	\
 	"1:	"insn("%1", "%3")"				\n"	\
 	"2:							\n"	\
 	"	.insn						\n"	\
@@ -336,10 +339,32 @@ do {									\
 									\
 	(val) = (__typeof__(*(addr))) __gu_tmp;				\
 }
+#else
+#define __get_data_asm(val, insn, addr)					\
+{									\
+	long __gu_tmp;							\
+									\
+	__asm__ __volatile__(						\
+	"1:	"insn("%1", "%3")"				\n"	\
+	"2:							\n"	\
+	"	.section .fixup,\"ax\"				\n"	\
+	"3:	li	%0, %4					\n"	\
+	"	j	2b					\n"	\
+	"	.previous					\n"	\
+	"	.section __ex_table,\"a\"			\n"	\
+	"	"__UA_ADDR "\t1b, 3b				\n"	\
+	"	.previous					\n"	\
+	: "=r" (__gu_err), "=r" (__gu_tmp)				\
+	: "0" (0), "o" (__m(addr)), "i" (-EFAULT));			\
+									\
+	(val) = (__typeof__(*(addr))) __gu_tmp;				\
+}
+#endif
 
 /*
  * Get a long long 64 using 32 bit registers.
  */
+#ifdef CONFIG_CPU_R5900
 #define __get_data_asm_ll32(val, insn, addr)				\
 {									\
 	union {								\
@@ -348,7 +373,11 @@ do {									\
 	} __gu_tmp;							\
 									\
 	__asm__ __volatile__(						\
+	/* In an error exception handler the user space could be uncached. */ \
+	"sync.l							\n"	\
 	"1:	" insn("%1", "(%3)")"				\n"	\
+	/* In an error exception handler the user space could be uncached. */ \
+	"sync.l							\n"	\
 	"2:	" insn("%D1", "4(%3)")"				\n"	\
 	"3:							\n"	\
 	"	.insn						\n"	\
@@ -367,6 +396,33 @@ do {									\
 									\
 	(val) = __gu_tmp.t;						\
 }
+#else
+#define __get_data_asm_ll32(val, insn, addr)				\
+{									\
+	union {								\
+		unsigned long long	l;				\
+		__typeof__(*(addr))	t;				\
+	} __gu_tmp;							\
+									\
+	__asm__ __volatile__(						\
+	"1:	" insn("%1", "(%3)")"				\n"	\
+	"2:	" insn("%D1", "4(%3)")"				\n"	\
+	"3:	.section	.fixup,\"ax\"			\n"	\
+	"4:	li	%0, %4					\n"	\
+	"	move	%1, $0					\n"	\
+	"	move	%D1, $0					\n"	\
+	"	j	3b					\n"	\
+	"	.previous					\n"	\
+	"	.section	__ex_table,\"a\"		\n"	\
+	"	" __UA_ADDR "	1b, 4b				\n"	\
+	"	" __UA_ADDR "	2b, 4b				\n"	\
+	"	.previous					\n"	\
+	: "=r" (__gu_err), "=&r" (__gu_tmp.l)				\
+	: "0" (0), "r" (addr), "i" (-EFAULT));				\
+									\
+	(val) = __gu_tmp.t;						\
+}
+#endif
 
 #ifndef CONFIG_EVA
 #define __put_kernel_common(ptr, size) __put_user_common(ptr, size)
@@ -456,6 +512,38 @@ do {									\
 	__pu_err;							\
 })
 
+#define __put_user_check_atomic(x, ptr, size)				\
+({									\
+	__typeof__(*(ptr)) __user *__pu_addr = (ptr);			\
+	__typeof__(*(ptr)) __pu_val = (x);				\
+	int __pu_err = -EFAULT;						\
+									\
+	if (likely(access_ok(VERIFY_WRITE,  __pu_addr, size))) {	\
+		__put_kernel_common(ptr, size);				\
+	}								\
+	__pu_err;							\
+})
+
+#ifdef CONFIG_CPU_R5900
+#define __put_data_asm(insn, ptr)					\
+{									\
+	__asm__ __volatile__(						\
+	/* In an error exception handler the user space could be uncached. */ \
+	"sync.l							\n"	\
+	"1:	"insn("%z2", "%3")"	# __put_data_asm	\n"	\
+	"2:							\n"	\
+	"	.section	.fixup,\"ax\"			\n"	\
+	"3:	li	%0, %4					\n"	\
+	"	j	2b					\n"	\
+	"	.previous					\n"	\
+	"	.section	__ex_table,\"a\"		\n"	\
+	"	" __UA_ADDR "	1b, 3b				\n"	\
+	"	.previous					\n"	\
+	: "=r" (__pu_err)						\
+	: "0" (0), "Jr" (__pu_val), "o" (__m(ptr)),			\
+	  "i" (-EFAULT));						\
+}
+#else
 #define __put_data_asm(insn, ptr)					\
 {									\
 	__asm__ __volatile__(						\
@@ -473,7 +561,32 @@ do {									\
 	: "0" (0), "Jr" (__pu_val), "o" (__m(ptr)),			\
 	  "i" (-EFAULT));						\
 }
+#endif
 
+#ifdef CONFIG_CPU_R5900
+#define __put_data_asm_ll32(insn, ptr)					\
+{									\
+	__asm__ __volatile__(						\
+	/* In an error exception handler the user space could be uncached. */ \
+	"sync.l							\n"	\
+	"1:	"insn("%2", "(%3)")"	# __put_data_asm_ll32	\n"	\
+	/* In an error exception handler the user space could be uncached. */ \
+	"sync.l							\n"	\
+	"2:	"insn("%D2", "4(%3)")"				\n"	\
+	"3:							\n"	\
+	"	.section	.fixup,\"ax\"			\n"	\
+	"4:	li	%0, %4					\n"	\
+	"	j	3b					\n"	\
+	"	.previous					\n"	\
+	"	.section	__ex_table,\"a\"		\n"	\
+	"	" __UA_ADDR "	1b, 4b				\n"	\
+	"	" __UA_ADDR "	2b, 4b				\n"	\
+	"	.previous"						\
+	: "=r" (__pu_err)						\
+	: "0" (0), "r" (__pu_val), "r" (ptr),				\
+	  "i" (-EFAULT));						\
+}
+#else
 #define __put_data_asm_ll32(insn, ptr)					\
 {									\
 	__asm__ __volatile__(						\
@@ -493,6 +606,7 @@ do {									\
 	: "0" (0), "r" (__pu_val), "r" (ptr),				\
 	  "i" (-EFAULT));						\
 }
+#endif
 
 extern void __put_user_unknown(void);
 
diff --git a/arch/mips/kernel/unaligned.c b/arch/mips/kernel/unaligned.c
index b280a3d775a1..625b74de1ce4 100644
--- a/arch/mips/kernel/unaligned.c
+++ b/arch/mips/kernel/unaligned.c
@@ -488,10 +488,15 @@ do {                                                        \
 
 #else /* __BIG_ENDIAN */
 
+	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
+	/* FIXME: Is ".set push\n" etc. needed? */
+	/* In an error exception handler the user space could be uncached. */
 #define     _LoadHW(addr, value, res, type)  \
 do {                                                        \
 		__asm__ __volatile__ (".set\tnoat\n"        \
+			"sync.l\n\t"                        \
 			"1:\t"type##_lb("%0", "1(%2)")"\n"  \
+			"sync.l\n\t"                        \
 			"2:\t"type##_lbu("$1", "0(%2)")"\n\t"\
 			"sll\t%0, 0x8\n\t"                  \
 			"or\t%0, $1\n\t"                    \
@@ -511,10 +516,14 @@ do {                                                        \
 } while(0)
 
 #ifndef CONFIG_CPU_MIPSR6
+	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
+	/* In an error exception handler the user space could be uncached. */
 #define     _LoadW(addr, value, res, type)   \
 do {                                                        \
 		__asm__ __volatile__ (                      \
+			"sync.l\n\t"                        \
 			"1:\t"type##_lwl("%0", "3(%2)")"\n" \
+			"sync.l\n\t"                        \
 			"2:\t"type##_lwr("%0", "(%2)")"\n\t"\
 			"li\t%1, 0\n"                       \
 			"3:\n\t"                            \
@@ -569,11 +578,15 @@ do {                                                        \
 #endif /* CONFIG_CPU_MIPSR6 */
 
 
+	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
+	/* In an error exception handler the user space could be uncached. */
 #define     _LoadHWU(addr, value, res, type) \
 do {                                                        \
 		__asm__ __volatile__ (                      \
 			".set\tnoat\n"                      \
+			"sync.l\n\t"                        \
 			"1:\t"type##_lbu("%0", "1(%2)")"\n" \
+			"sync.l\n\t"                        \
 			"2:\t"type##_lbu("$1", "0(%2)")"\n\t"\
 			"sll\t%0, 0x8\n\t"                  \
 			"or\t%0, $1\n\t"                    \
@@ -594,10 +607,14 @@ do {                                                        \
 } while(0)
 
 #ifndef CONFIG_CPU_MIPSR6
+	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
+	/* In an error exception handler the user space could be uncached. */
 #define     _LoadWU(addr, value, res, type)  \
 do {                                                        \
 		__asm__ __volatile__ (                      \
+			"sync.l\n\t"                        \
 			"1:\t"type##_lwl("%0", "3(%2)")"\n" \
+			"sync.l\n\t"                        \
 			"2:\t"type##_lwr("%0", "(%2)")"\n\t"\
 			"dsll\t%0, %0, 32\n\t"              \
 			"dsrl\t%0, %0, 32\n\t"              \
@@ -616,10 +633,14 @@ do {                                                        \
 			: "r" (addr), "i" (-EFAULT));       \
 } while(0)
 
+	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
+	/* In an error exception handler the user space could be uncached. */
 #define     _LoadDW(addr, value, res)  \
 do {                                                        \
 		__asm__ __volatile__ (                      \
+			"sync.l\n\t"                        \
 			"1:\tldl\t%0, 7(%2)\n"              \
+			"sync.l\n\t"                        \
 			"2:\tldr\t%0, (%2)\n\t"             \
 			"li\t%1, 0\n"                       \
 			"3:\n\t"                            \
@@ -721,11 +742,15 @@ do {                                                        \
 } while(0)
 #endif /* CONFIG_CPU_MIPSR6 */
 
+	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
+	/* In an error exception handler the user space could be uncached. */
 #define     _StoreHW(addr, value, res, type) \
 do {                                                        \
 		__asm__ __volatile__ (                      \
 			".set\tnoat\n"                      \
+			"sync.l\n\t"                        \
 			"1:\t"type##_sb("%1", "0(%2)")"\n"  \
+			"sync.l\n\t"                        \
 			"srl\t$1,%1, 0x8\n"                 \
 			"2:\t"type##_sb("$1", "1(%2)")"\n"  \
 			".set\tat\n\t"                      \
@@ -745,10 +770,13 @@ do {                                                        \
 } while(0)
 
 #ifndef CONFIG_CPU_MIPSR6
+	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
 #define     _StoreW(addr, value, res, type)  \
 do {                                                        \
 		__asm__ __volatile__ (                      \
+			"sync.l\n\t"                        \
 			"1:\t"type##_swl("%1", "3(%2)")"\n" \
+			"sync.l\n\t"                        \
 			"2:\t"type##_swr("%1", "(%2)")"\n\t"\
 			"li\t%0, 0\n"                       \
 			"3:\n\t"                            \
@@ -765,10 +793,14 @@ do {                                                        \
 		: "r" (value), "r" (addr), "i" (-EFAULT));  \
 } while(0)
 
+	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
+	/* In an error exception handler the user space could be uncached. */
 #define     _StoreDW(addr, value, res) \
 do {                                                        \
 		__asm__ __volatile__ (                      \
+			"sync.l\n\t"                        \
 			"1:\tsdl\t%1, 7(%2)\n"              \
+			"sync.l\n\t"                        \
 			"2:\tsdr\t%1, (%2)\n\t"             \
 			"li\t%0, 0\n"                       \
 			"3:\n\t"                            \
diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
index 2ff84f4b1717..36e48682e1e1 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -357,7 +357,10 @@ EXPORT_SYMBOL(csum_partial)
  * addr    : Address
  * handler : Exception handler
  */
+/* FIXME: #ifdef CONFIG_CPU_R5900 */
 #define EXC(insn, type, reg, addr, handler)	\
+	/* In an error exception handler the user space could be uncached. */ \
+	sync.l;						\
 	.if \mode == LEGACY_MODE;		\
 9:		insn reg, addr;			\
 		.section __ex_table,"a";	\
diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
index 489bc9cffcbd..b37731f53f46 100644
--- a/arch/mips/lib/memset.S
+++ b/arch/mips/lib/memset.S
@@ -46,6 +46,11 @@
 #define ___BUILD_EVA_INSN(insn, reg, addr) __EVAFY(insn, reg, addr)
 
 #define EX(insn,reg,addr,handler)			\
+	/* In an error exception handler the user space could be uncached. */ \
+	.set push;					\
+	.set noreorder;					\
+	sync.l;						\
+	.set pop;					\
 	.if \mode == LEGACY_MODE;			\
 9:		insn	reg, addr;			\
 	.else;						\
@@ -171,6 +176,19 @@
 #ifdef CONFIG_CPU_MICROMIPS
 	LONG_SRL	t7, t0, 1
 #endif
+#ifdef CONFIG_CPU_R5900
+	/* Each instruction has a leading sync.l */
+#if LONGSIZE == 4
+	.set		noat
+	/* 2 instructions for 4 Byte. */
+	LONG_SLL	AT, t0, 1
+	PTR_SUBU	t1, AT
+	.set		at
+#else
+	/* Verify memset for R5900 with 64 bit. 2 instructions for 8 Byte. */
+	PTR_SUBU	t1, t0
+#endif
+#else
 #if LONGSIZE == 4
 	PTR_SUBU	t1, FILLPTRG
 #else
@@ -179,6 +197,7 @@
 	PTR_SUBU	t1, AT
 	.set		at
 #endif
+#endif /* CONFIG_CPU_R5900 */
 	jr		t1
 	PTR_ADDU	a0, t0			/* dest ptr */
 
diff --git a/arch/mips/lib/strncpy_user.S b/arch/mips/lib/strncpy_user.S
index 44cc346fd400..0cf9a6660130 100644
--- a/arch/mips/lib/strncpy_user.S
+++ b/arch/mips/lib/strncpy_user.S
@@ -13,6 +13,8 @@
 #include <asm/regdef.h>
 
 #define EX(insn,reg,addr,handler)			\
+	/* In an error exception handler the user space could be uncached. */ \
+	sync.l;							\
 9:	insn	reg, addr;				\
 	.section __ex_table,"a";			\
 	PTR	9b, handler;				\
diff --git a/arch/mips/lib/strnlen_user.S b/arch/mips/lib/strnlen_user.S
index 474979641a8d..55f7e069a960 100644
--- a/arch/mips/lib/strnlen_user.S
+++ b/arch/mips/lib/strnlen_user.S
@@ -12,6 +12,8 @@
 #include <asm/regdef.h>
 
 #define EX(insn,reg,addr,handler)			\
+	/* In an error exception handler the user space could be uncached. */ \
+	sync.l;							\
 9:	insn	reg, addr;				\
 	.section __ex_table,"a";			\
 	PTR	9b, handler;				\

^ permalink raw reply related	[flat|nested] 117+ messages in thread

* Re: [RFC] MIPS: R5900: Use mandatory SYNC.L in exception handlers
  2018-02-11  8:29                                         ` [RFC] MIPS: R5900: Use mandatory SYNC.L in exception handlers Fredrik Noring
@ 2018-02-11 10:33                                           ` "Jürgen Urban"
  2018-02-12  9:22                                               ` Maciej W. Rozycki
  0 siblings, 1 reply; 117+ messages in thread
From: "Jürgen Urban" @ 2018-02-11 10:33 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Maciej W. Rozycki, linux-mips

Hello Fredrik,

> Sent: Sunday, 11 February 2018 at 09:29
> From: "Fredrik Noring" <noring@nocrew.org>
> To: "Maciej W. Rozycki" <macro@mips.com>, "Jürgen Urban" <JuergenUrban@gmx.de>
> Cc: linux-mips@linux-mips.org
> Subject: [RFC] MIPS: R5900: Use mandatory SYNC.L in exception handlers
>
> Hi Jürgen,
> 
> Would you be able to explain the notes
> 
> 	/* In an error exception handler the user space could be uncached. */
> 
> in the patch ported from v2.6 below?

The tx79architecture.pdf says:

2.4 kuseg becomes an uncached area when an error exception (Status.ERL = 1) occurs (FLX04)

2.4.1 Phenomenon
There are cases in which kuseg (0x0000_0000 – 0x7FFF_FFFF) becomes uncached in an error exception handler (Status.ERL==1) and data consistency with cached area (kseg, ksseg, kseg0) is lost.

2.4.2 Corrective measures
In an error exception handler (Status.ERL==1), when accessing kuseg (0x0000_0000 – 0x7FFF_FFFF), access it after guarding using SYNC.L as follows:

	SYNC.L
	SW kuseg
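
One way to address the "FIXME: Use #ifdef CONFIG_CPU_R5900" notes in the patch might be to emit the guard through a single conditional macro, so that other CPUs are unaffected. The following is only a minimal sketch under that assumption: R5900_KUSEG_SYNC, GUARDED_SW_TEMPLATE and the helper functions are hypothetical names that do not appear in the patch, and the code merely builds the asm template string so that it compiles on any host.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not in the posted patch): expands to the SYNC.L
 * guard on R5900 builds and to an empty string on every other CPU.
 */
#ifdef CONFIG_CPU_R5900
#define R5900_KUSEG_SYNC	"sync.l\n\t"
#else
#define R5900_KUSEG_SYNC	""
#endif

/*
 * Illustrative asm template for a guarded word store to kuseg. It is only
 * built as a string here; nothing is assembled or executed.
 */
#define GUARDED_SW_TEMPLATE	R5900_KUSEG_SYNC "1:\tsw\t%1, 0(%2)\n"

/* Return the template so the effect of the config option can be inspected. */
static const char *guarded_sw_template(void)
{
	return GUARDED_SW_TEMPLATE;
}

/* Return 1 if the template starts with the SYNC.L guard, 0 otherwise. */
static int guarded_sw_has_sync(void)
{
	return strncmp(guarded_sw_template(), "sync.l", 6) == 0;
}

/* The guard must be present exactly when CONFIG_CPU_R5900 is defined. */
static void guarded_sw_selftest(void)
{
#ifdef CONFIG_CPU_R5900
	assert(guarded_sw_has_sync());
#else
	assert(!guarded_sw_has_sync());
#endif
}
```

With such a macro, each load or store template in the patch would start with R5900_KUSEG_SYNC instead of an open-coded "sync.l", and the duplicated #ifdef/#else macro bodies would not be needed.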

Best regards
Jürgen

> diff --git a/arch/mips/include/asm/ftrace.h b/arch/mips/include/asm/ftrace.h
> index b463f2aa5a61..79390b194e6d 100644
> --- a/arch/mips/include/asm/ftrace.h
> +++ b/arch/mips/include/asm/ftrace.h
> @@ -19,9 +19,12 @@
>  extern void _mcount(void);
>  #define mcount _mcount
>  
> +#ifdef CONFIG_CPU_R5900
>  #define safe_load(load, src, dst, error)		\
>  do {							\
>  	asm volatile (					\
> +		/* In an error exception handler the user space could be uncached. */ \
> +		"sync.l							\n"	\
>  		"1: " load " %[tmp_dst], 0(%[tmp_src])\n"	\
>  		"   li %[tmp_err], 0\n"			\
>  		"2: .insn\n"				\
> @@ -40,7 +43,55 @@ do {							\
>  		: "memory"				\
>  	);						\
>  } while (0)
> +#else
> +#define safe_load(load, src, dst, error)		\
> +do {							\
> +	asm volatile (					\
> +		"1: " load " %[" STR(dst) "], 0(%[" STR(src) "])\n"\
> +		"   li %[" STR(error) "], 0\n"		\
> +		"2:\n"					\
> +							\
> +		".section .fixup, \"ax\"\n"		\
> +		"3: li %[" STR(error) "], 1\n"		\
> +		"   j 2b\n"				\
> +		".previous\n"				\
> +							\
> +		".section\t__ex_table,\"a\"\n\t"	\
> +		STR(PTR) "\t1b, 3b\n\t"			\
> +		".previous\n"				\
> +							\
> +		: [dst] "=&r" (dst), [error] "=r" (error)\
> +		: [src] "r" (src)			\
> +		: "memory"				\
> +	);						\
> +} while (0)
> +#endif
>  
> +#ifdef CONFIG_CPU_R5900
> +#define safe_store(store, src, dst, error)	\
> +do {						\
> +	asm volatile (				\
> +		/* In an error exception handler the user space could be uncached. */ \
> +		"sync.l							\n"	\
> +		"1: " store " %[" STR(src) "], 0(%[" STR(dst) "])\n"\
> +		"   li %[" STR(error) "], 0\n"	\
> +		"2:\n"				\
> +						\
> +		".section .fixup, \"ax\"\n"	\
> +		"3: li %[" STR(error) "], 1\n"	\
> +		"   j 2b\n"			\
> +		".previous\n"			\
> +						\
> +		".section\t__ex_table,\"a\"\n\t"\
> +		STR(PTR) "\t1b, 3b\n\t"		\
> +		".previous\n"			\
> +						\
> +		: [error] "=r" (error)		\
> +		: [dst] "r" (dst), [src] "r" (src)\
> +		: "memory"			\
> +	);					\
> +} while (0)
> +#else
>  #define safe_store(store, src, dst, error)	\
>  do {						\
>  	asm volatile (				\
> @@ -62,6 +113,7 @@ do {						\
>  		: "memory"			\
>  	);					\
>  } while (0)
> +#endif
>  
>  #define safe_load_code(dst, src, error) \
>  	safe_load(STR(lw), src, dst, error)
> diff --git a/arch/mips/include/asm/uaccess.h b/arch/mips/include/asm/uaccess.h
> index b71306947290..a4eecafc524b 100644
> --- a/arch/mips/include/asm/uaccess.h
> +++ b/arch/mips/include/asm/uaccess.h
> @@ -315,11 +315,14 @@ do {									\
>  	__gu_err;							\
>  })
>  
> +#ifdef CONFIG_CPU_R5900
>  #define __get_data_asm(val, insn, addr)					\
>  {									\
>  	long __gu_tmp;							\
>  									\
>  	__asm__ __volatile__(						\
> +	/* In an error exception handler the user space could be uncached. */ \
> +	"sync.l							\n"	\
>  	"1:	"insn("%1", "%3")"				\n"	\
>  	"2:							\n"	\
>  	"	.insn						\n"	\
> @@ -336,10 +339,32 @@ do {									\
>  									\
>  	(val) = (__typeof__(*(addr))) __gu_tmp;				\
>  }
> +#else
> +#define __get_data_asm(val, insn, addr)					\
> +{									\
> +	long __gu_tmp;							\
> +									\
> +	__asm__ __volatile__(						\
> +	"1:	"insn("%1", "%3")"				\n"	\
> +	"2:							\n"	\
> +	"	.section .fixup,\"ax\"				\n"	\
> +	"3:	li	%0, %4					\n"	\
> +	"	j	2b					\n"	\
> +	"	.previous					\n"	\
> +	"	.section __ex_table,\"a\"			\n"	\
> +	"	"__UA_ADDR "\t1b, 3b				\n"	\
> +	"	.previous					\n"	\
> +	: "=r" (__gu_err), "=r" (__gu_tmp)				\
> +	: "0" (0), "o" (__m(addr)), "i" (-EFAULT));			\
> +									\
> +	(val) = (__typeof__(*(addr))) __gu_tmp;				\
> +}
> +#endif
>  
>  /*
>   * Get a long long 64 using 32 bit registers.
>   */
> +#ifdef CONFIG_CPU_R5900
>  #define __get_data_asm_ll32(val, insn, addr)				\
>  {									\
>  	union {								\
> @@ -348,7 +373,11 @@ do {									\
>  	} __gu_tmp;							\
>  									\
>  	__asm__ __volatile__(						\
> +	/* In an error exception handler the user space could be uncached. */ \
> +	"sync.l							\n"	\
>  	"1:	" insn("%1", "(%3)")"				\n"	\
> +	/* In an error exception handler the user space could be uncached. */ \
> +	"sync.l							\n"	\
>  	"2:	" insn("%D1", "4(%3)")"				\n"	\
>  	"3:							\n"	\
>  	"	.insn						\n"	\
> @@ -367,6 +396,33 @@ do {									\
>  									\
>  	(val) = __gu_tmp.t;						\
>  }
> +#else
> +#define __get_data_asm_ll32(val, insn, addr)				\
> +{									\
> +	union {								\
> +		unsigned long long	l;				\
> +		__typeof__(*(addr))	t;				\
> +	} __gu_tmp;							\
> +									\
> +	__asm__ __volatile__(						\
> +	"1:	" insn("%1", "(%3)")"				\n"	\
> +	"2:	" insn("%D1", "4(%3)")"				\n"	\
> +	"3:	.section	.fixup,\"ax\"			\n"	\
> +	"4:	li	%0, %4					\n"	\
> +	"	move	%1, $0					\n"	\
> +	"	move	%D1, $0					\n"	\
> +	"	j	3b					\n"	\
> +	"	.previous					\n"	\
> +	"	.section	__ex_table,\"a\"		\n"	\
> +	"	" __UA_ADDR "	1b, 4b				\n"	\
> +	"	" __UA_ADDR "	2b, 4b				\n"	\
> +	"	.previous					\n"	\
> +	: "=r" (__gu_err), "=&r" (__gu_tmp.l)				\
> +	: "0" (0), "r" (addr), "i" (-EFAULT));				\
> +									\
> +	(val) = __gu_tmp.t;						\
> +}
> +#endif
>  
>  #ifndef CONFIG_EVA
>  #define __put_kernel_common(ptr, size) __put_user_common(ptr, size)
> @@ -456,6 +512,38 @@ do {									\
>  	__pu_err;							\
>  })
>  
> +#define __put_user_check_atomic(x, ptr, size)				\
> +({									\
> +	__typeof__(*(ptr)) __user *__pu_addr = (ptr);			\
> +	__typeof__(*(ptr)) __pu_val = (x);				\
> +	int __pu_err = -EFAULT;						\
> +									\
> +	if (likely(access_ok(VERIFY_WRITE,  __pu_addr, size))) {	\
> +		__put_kernel_common(ptr, size);				\
> +	}								\
> +	__pu_err;							\
> +})
> +
> +#ifdef CONFIG_CPU_R5900
> +#define __put_data_asm(insn, ptr)					\
> +{									\
> +	__asm__ __volatile__(						\
> +	/* In an error exception handler the user space could be uncached. */ \
> +	"sync.l							\n"	\
> +	"1:	"insn("%z2", "%3")"	# __put_data_asm	\n"	\
> +	"2:							\n"	\
> +	"	.section	.fixup,\"ax\"			\n"	\
> +	"3:	li	%0, %4					\n"	\
> +	"	j	2b					\n"	\
> +	"	.previous					\n"	\
> +	"	.section	__ex_table,\"a\"		\n"	\
> +	"	" __UA_ADDR "	1b, 3b				\n"	\
> +	"	.previous					\n"	\
> +	: "=r" (__pu_err)						\
> +	: "0" (0), "Jr" (__pu_val), "o" (__m(ptr)),			\
> +	  "i" (-EFAULT));						\
> +}
> +#else
>  #define __put_data_asm(insn, ptr)					\
>  {									\
>  	__asm__ __volatile__(						\
> @@ -473,7 +561,32 @@ do {									\
>  	: "0" (0), "Jr" (__pu_val), "o" (__m(ptr)),			\
>  	  "i" (-EFAULT));						\
>  }
> +#endif
>  
> +#ifdef CONFIG_CPU_R5900
> +#define __put_data_asm_ll32(insn, ptr)					\
> +{									\
> +	__asm__ __volatile__(						\
> +	/* In an error exception handler the user space could be uncached. */ \
> +	"sync.l							\n"	\
> +	"1:	"insn("%2", "(%3)")"	# __put_data_asm_ll32	\n"	\
> +	/* In an error exception handler the user space could be uncached. */ \
> +	"sync.l							\n"	\
> +	"2:	"insn("%D2", "4(%3)")"				\n"	\
> +	"3:							\n"	\
> +	"	.section	.fixup,\"ax\"			\n"	\
> +	"4:	li	%0, %4					\n"	\
> +	"	j	3b					\n"	\
> +	"	.previous					\n"	\
> +	"	.section	__ex_table,\"a\"		\n"	\
> +	"	" __UA_ADDR "	1b, 4b				\n"	\
> +	"	" __UA_ADDR "	2b, 4b				\n"	\
> +	"	.previous"						\
> +	: "=r" (__pu_err)						\
> +	: "0" (0), "r" (__pu_val), "r" (ptr),				\
> +	  "i" (-EFAULT));						\
> +}
> +#else
>  #define __put_data_asm_ll32(insn, ptr)					\
>  {									\
>  	__asm__ __volatile__(						\
> @@ -493,6 +606,7 @@ do {									\
>  	: "0" (0), "r" (__pu_val), "r" (ptr),				\
>  	  "i" (-EFAULT));						\
>  }
> +#endif
>  
>  extern void __put_user_unknown(void);
>  
> diff --git a/arch/mips/kernel/unaligned.c b/arch/mips/kernel/unaligned.c
> index b280a3d775a1..625b74de1ce4 100644
> --- a/arch/mips/kernel/unaligned.c
> +++ b/arch/mips/kernel/unaligned.c
> @@ -488,10 +488,15 @@ do {                                                        \
>  
>  #else /* __BIG_ENDIAN */
>  
> +	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
> +	/* FIXME: Is ".set push\n" etc. needed? */
> +	/* In an error exception handler the user space could be uncached. */
>  #define     _LoadHW(addr, value, res, type)  \
>  do {                                                        \
>  		__asm__ __volatile__ (".set\tnoat\n"        \
> +			"sync.l\n\t"                        \
>  			"1:\t"type##_lb("%0", "1(%2)")"\n"  \
> +			"sync.l\n\t"                        \
>  			"2:\t"type##_lbu("$1", "0(%2)")"\n\t"\
>  			"sll\t%0, 0x8\n\t"                  \
>  			"or\t%0, $1\n\t"                    \
> @@ -511,10 +516,14 @@ do {                                                        \
>  } while(0)
>  
>  #ifndef CONFIG_CPU_MIPSR6
> +	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
> +	/* In an error exception handler the user space could be uncached. */
>  #define     _LoadW(addr, value, res, type)   \
>  do {                                                        \
>  		__asm__ __volatile__ (                      \
> +			"sync.l\n\t"                        \
>  			"1:\t"type##_lwl("%0", "3(%2)")"\n" \
> +			"sync.l\n\t"                        \
>  			"2:\t"type##_lwr("%0", "(%2)")"\n\t"\
>  			"li\t%1, 0\n"                       \
>  			"3:\n\t"                            \
> @@ -569,11 +578,15 @@ do {                                                        \
>  #endif /* CONFIG_CPU_MIPSR6 */
>  
>  
> +	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
> +	/* In an error exception handler the user space could be uncached. */
>  #define     _LoadHWU(addr, value, res, type) \
>  do {                                                        \
>  		__asm__ __volatile__ (                      \
>  			".set\tnoat\n"                      \
> +			"sync.l\n\t"                        \
>  			"1:\t"type##_lbu("%0", "1(%2)")"\n" \
> +			"sync.l\n\t"                        \
>  			"2:\t"type##_lbu("$1", "0(%2)")"\n\t"\
>  			"sll\t%0, 0x8\n\t"                  \
>  			"or\t%0, $1\n\t"                    \
> @@ -594,10 +607,14 @@ do {                                                        \
>  } while(0)
>  
>  #ifndef CONFIG_CPU_MIPSR6
> +	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
> +	/* In an error exception handler the user space could be uncached. */
>  #define     _LoadWU(addr, value, res, type)  \
>  do {                                                        \
>  		__asm__ __volatile__ (                      \
> +			"sync.l\n\t"                        \
>  			"1:\t"type##_lwl("%0", "3(%2)")"\n" \
> +			"sync.l\n\t"                        \
>  			"2:\t"type##_lwr("%0", "(%2)")"\n\t"\
>  			"dsll\t%0, %0, 32\n\t"              \
>  			"dsrl\t%0, %0, 32\n\t"              \
> @@ -616,10 +633,14 @@ do {                                                        \
>  			: "r" (addr), "i" (-EFAULT));       \
>  } while(0)
>  
> +	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
> +	/* In an error exception handler the user space could be uncached. */
>  #define     _LoadDW(addr, value, res)  \
>  do {                                                        \
>  		__asm__ __volatile__ (                      \
> +			"sync.l\n\t"                        \
>  			"1:\tldl\t%0, 7(%2)\n"              \
> +			"sync.l\n\t"                        \
>  			"2:\tldr\t%0, (%2)\n\t"             \
>  			"li\t%1, 0\n"                       \
>  			"3:\n\t"                            \
> @@ -721,11 +742,15 @@ do {                                                        \
>  } while(0)
>  #endif /* CONFIG_CPU_MIPSR6 */
>  
> +	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
> +	/* In an error exception handler the user space could be uncached. */
>  #define     _StoreHW(addr, value, res, type) \
>  do {                                                        \
>  		__asm__ __volatile__ (                      \
>  			".set\tnoat\n"                      \
> +			"sync.l\n\t"                        \
>  			"1:\t"type##_sb("%1", "0(%2)")"\n"  \
> +			"sync.l\n\t"                        \
>  			"srl\t$1,%1, 0x8\n"                 \
>  			"2:\t"type##_sb("$1", "1(%2)")"\n"  \
>  			".set\tat\n\t"                      \
> @@ -745,10 +770,13 @@ do {                                                        \
>  } while(0)
>  
>  #ifndef CONFIG_CPU_MIPSR6
> +	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
>  #define     _StoreW(addr, value, res, type)  \
>  do {                                                        \
>  		__asm__ __volatile__ (                      \
> +			"sync.l\n\t"                        \
>  			"1:\t"type##_swl("%1", "3(%2)")"\n" \
> +			"sync.l\n\t"                        \
>  			"2:\t"type##_swr("%1", "(%2)")"\n\t"\
>  			"li\t%0, 0\n"                       \
>  			"3:\n\t"                            \
> @@ -765,10 +793,14 @@ do {                                                        \
>  		: "r" (value), "r" (addr), "i" (-EFAULT));  \
>  } while(0)
>  
> +	/* FIXME: Use #ifdef CONFIG_CPU_R5900 */
> +	/* In an error exception handler the user space could be uncached. */
>  #define     _StoreDW(addr, value, res) \
>  do {                                                        \
>  		__asm__ __volatile__ (                      \
> +			"sync.l\n\t"                        \
>  			"1:\tsdl\t%1, 7(%2)\n"              \
> +			"sync.l\n\t"                        \
>  			"2:\tsdr\t%1, (%2)\n\t"             \
>  			"li\t%0, 0\n"                       \
>  			"3:\n\t"                            \
> diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
> index 2ff84f4b1717..36e48682e1e1 100644
> --- a/arch/mips/lib/csum_partial.S
> +++ b/arch/mips/lib/csum_partial.S
> @@ -357,7 +357,10 @@ EXPORT_SYMBOL(csum_partial)
>   * addr    : Address
>   * handler : Exception handler
>   */
> +/* FIXME: #ifdef CONFIG_CPU_R5900 */
>  #define EXC(insn, type, reg, addr, handler)	\
> +	/* In an error exception handler the user space could be uncached. */ \
> +	sync.l;						\
>  	.if \mode == LEGACY_MODE;		\
>  9:		insn reg, addr;			\
>  		.section __ex_table,"a";	\
> diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
> index 489bc9cffcbd..b37731f53f46 100644
> --- a/arch/mips/lib/memset.S
> +++ b/arch/mips/lib/memset.S
> @@ -46,6 +46,11 @@
>  #define ___BUILD_EVA_INSN(insn, reg, addr) __EVAFY(insn, reg, addr)
>  
>  #define EX(insn,reg,addr,handler)			\
> +	/* In an error exception handler the user space could be uncached. */ \
> +	.set push;					\
> +	.set noreorder;					\
> +	sync.l;						\
> +	.set pop;					\
>  	.if \mode == LEGACY_MODE;			\
>  9:		insn	reg, addr;			\
>  	.else;						\
> @@ -171,6 +176,19 @@
>  #ifdef CONFIG_CPU_MICROMIPS
>  	LONG_SRL	t7, t0, 1
>  #endif
> +#ifdef CONFIG_CPU_R5900
> +	/* Each instruction has a leading sync.l */
> +#if LONGSIZE == 4
> +	.set		noat
> +	/* 2 instructions for 4 Byte. */
> +	LONG_SLL	AT, t0, 1
> +	PTR_SUBU	t1, AT
> +	.set		at
> +#else
> +	/* Verify memset for R5900 with 64 bit. 2 instructions for 8 Byte. */
> +	PTR_SUBU	t1, t0
> +#endif
> +#else
>  #if LONGSIZE == 4
>  	PTR_SUBU	t1, FILLPTRG
>  #else
> @@ -179,6 +197,7 @@
>  	PTR_SUBU	t1, AT
>  	.set		at
>  #endif
> +#endif /* CONFIG_CPU_R5900 */
>  	jr		t1
>  	PTR_ADDU	a0, t0			/* dest ptr */
>  
> diff --git a/arch/mips/lib/strncpy_user.S b/arch/mips/lib/strncpy_user.S
> index 44cc346fd400..0cf9a6660130 100644
> --- a/arch/mips/lib/strncpy_user.S
> +++ b/arch/mips/lib/strncpy_user.S
> @@ -13,6 +13,8 @@
>  #include <asm/regdef.h>
>  
>  #define EX(insn,reg,addr,handler)			\
> +	/* In an error exception handler the user space could be uncached. */ \
> +	sync.l;							\
>  9:	insn	reg, addr;				\
>  	.section __ex_table,"a";			\
>  	PTR	9b, handler;				\
> diff --git a/arch/mips/lib/strnlen_user.S b/arch/mips/lib/strnlen_user.S
> index 474979641a8d..55f7e069a960 100644
> --- a/arch/mips/lib/strnlen_user.S
> +++ b/arch/mips/lib/strnlen_user.S
> @@ -12,6 +12,8 @@
>  #include <asm/regdef.h>
>  
>  #define EX(insn,reg,addr,handler)			\
> +	/* In an error exception handler the user space could be uncached. */ \
> +	sync.l;							\
>  9:	insn	reg, addr;				\
>  	.section __ex_table,"a";			\
>  	PTR	9b, handler;				\
>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [RFC] MIPS: R5900: The ERET instruction has issues with delay slot and CACHE
  2018-02-11  8:09                                         ` [RFC] MIPS: R5900: The ERET instruction has issues with delay slot and CACHE Fredrik Noring
@ 2018-02-11 11:07                                           ` "Jürgen Urban"
  0 siblings, 0 replies; 117+ messages in thread
From: "Jürgen Urban" @ 2018-02-11 11:07 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Maciej W. Rozycki, linux-mips

Hello Fredrik,

> Sent: Sunday, 11 February 2018 at 09:09
> From: "Fredrik Noring" <noring@nocrew.org>
> To: "Maciej W. Rozycki" <macro@mips.com>, "Jürgen Urban" <JuergenUrban@gmx.de>
> Cc: linux-mips@linux-mips.org
> Subject: [RFC] MIPS: R5900: The ERET instruction has issues with delay slot and CACHE
>
> Signed-off-by: Fredrik Noring <noring@nocrew.org>
> ---
> This change has been ported from v2.6 patches. I have not found any note
> describing this in the TX79 manual.

The Restriction manual restri_e.pdf from Sony's Linux Toolkit says:

(2) Arrangement of Program Code and Data
When arranging program code and data in adjoining addresses, put 5 or more NOP instructions, or a combination of SYNC.P and NOP instructions between them. When the data arranged next to the program code has a specific bit pattern, it is regarded as a CACHE instruction, and may fetch a wrong instruction, destroy the data cache, or affect floating point divide of COP1.

(17) Undefined Instruction (2) 
Do not execute the following undefined instructions with specific bit pattern, since they interfere with the operation. 

a) Undefined instructions which interfere with floating-point calculations
Inst[31:26]== 010001 &&
Inst[25:23]== 1*0 &&
(Inst[ 5: 0]== 010**1 || Inst[ 5: 0]==*1*011)
Floating-point calculation results may cling to a certain value. This problem also occurs when this bit pattern exists in the data area next to the program code. Therefore, it is necessary to put 5 or more NOP instructions or a combination of SYNC.P and NOP instructions on the boundary between the program code and data.

b) Undefined instructions which affect the data cache 
Inst[31:26]== 101111 && (Inst[20:16]== 10101 || 10111 || 11001 || 11011 || 11101 || 11110 || 11111)
The data cache may be destroyed. An undefined instruction exception does not occur.

c) Undefined instructions which affect TLB entries 
Inst[31:26]==010000 && 
Inst[25:21]== 1**** && 
(Inst[ 5: 0]==000** || 0****1 || *01*** || ****1*) 
TLB entries may be destroyed.
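
The pattern in (17) b) can be checked mechanically, for instance when scanning data that ends up next to program code. The following is a minimal sketch, assuming the last Inst[20:16] value in that list reads 11111; the function names are made up for illustration and are not part of any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the 32-bit word matches restriction (17) b):
 * Inst[31:26] == 101111 and Inst[20:16] is one of the listed values.
 */
static bool r5900_word_may_corrupt_dcache(uint32_t inst)
{
	uint32_t opcode = inst >> 26;		/* Inst[31:26] */
	uint32_t op = (inst >> 16) & 0x1f;	/* Inst[20:16] */

	if (opcode != 0x2f)			/* 101111 */
		return false;

	switch (op) {
	case 0x15:	/* 10101 */
	case 0x17:	/* 10111 */
	case 0x19:	/* 11001 */
	case 0x1b:	/* 11011 */
	case 0x1d:	/* 11101 */
	case 0x1e:	/* 11110 */
	case 0x1f:	/* 11111 (assumed reading of the last entry) */
		return true;
	default:
		return false;
	}
}

/* Quick self-check on a few hand-encoded words. */
static void r5900_dcache_pattern_selftest(void)
{
	assert(r5900_word_may_corrupt_dcache(0xbc150000u));	/* op 10101 */
	assert(r5900_word_may_corrupt_dcache(0xbc1f0000u));	/* op 11111 */
	assert(!r5900_word_may_corrupt_dcache(0xbc140000u));	/* op 10100 */
	assert(!r5900_word_may_corrupt_dcache(0x00000000u));	/* NOP */
}
```

The patterns in a) and c) contain wildcard bits and would need mask/value pairs instead of a switch; they are left out here.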

Best regards
Jürgen

> diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
> index e23765312e25..b67f31d04716 100644
> --- a/arch/mips/mm/tlbex.c
> +++ b/arch/mips/mm/tlbex.c
> @@ -1378,6 +1378,16 @@ static void build_r4000_tlb_refill_handler(void)
>  		uasm_l_leave(&l, p);
>  		uasm_i_eret(&p); /* return from trap */
>  	}
> +
> +#ifdef CONFIG_CPU_R5900
> +	/* There should be nothing which can be interpreted as cache instruction. */
> +	uasm_i_nop(&p);
> +	uasm_i_nop(&p);
> +	uasm_i_nop(&p);
> +	uasm_i_nop(&p);
> +	uasm_i_nop(&p);
> +#endif
> +
>  #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
>  	uasm_l_tlb_huge_update(&l, p);
>  	if (htlb_info.need_reload_pte)
> @@ -2132,6 +2142,14 @@ build_r4000_tlbchange_handler_tail(u32 **p, struct uasm_label **l,
>  	uasm_l_leave(l, *p);
>  	build_restore_work_registers(p);
>  	uasm_i_eret(p); /* return from trap */
> +#ifdef CONFIG_CPU_R5900
> +	/* There should be nothing which can be interpreted as cache instruction. */
> +	uasm_i_nop(p);
> +	uasm_i_nop(p);
> +	uasm_i_nop(p);
> +	uasm_i_nop(p);
> +	uasm_i_nop(p);
> +#endif
>  
>  #ifdef CONFIG_64BIT
>  	build_get_pgd_vmalloc64(p, l, r, tmp, ptr, not_refill);
>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Aw: [RFC] MIPS: R5900: Workaround for CACHE instruction near branch delay slot
  2018-02-11  8:01                                         ` [RFC] MIPS: R5900: Workaround for CACHE instruction near branch delay slot Fredrik Noring
@ 2018-02-11 11:16                                           ` "Jürgen Urban"
  0 siblings, 0 replies; 117+ messages in thread
From: "Jürgen Urban" @ 2018-02-11 11:16 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Maciej W. Rozycki, linux-mips

Hello Fredrik,

> Sent: Sunday, 11 February 2018 at 09:01
> From: "Fredrik Noring" <noring@nocrew.org>
> To: "Maciej W. Rozycki" <macro@mips.com>, "Jürgen Urban" <JuergenUrban@gmx.de>
> Cc: linux-mips@linux-mips.org
> Subject: [RFC] MIPS: R5900: Workaround for CACHE instruction near branch delay slot
>
> Signed-off-by: Fredrik Noring <noring@nocrew.org>
> ---
> This change has been ported from v2.6 patches. I have not found any note
> describing this in the TX79 manual.

The 5 NOPs are required by restri_e.pdf, item (2) "Arrangement of Program Code and Data".

> diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
> index 4008298c1880..a0b0fbedad8c 100644
> --- a/arch/mips/kernel/genex.S
> +++ b/arch/mips/kernel/genex.S
> @@ -52,6 +52,14 @@ NESTED(except_vec3_generic, 0, sp)
>  #endif
>  	PTR_L	k0, exception_handlers(k1)
>  	jr	k0
> +#ifdef CONFIG_CPU_R5900
> +	/* There should be nothing which looks like a cache instruction. */
> +	nop
> +	nop
> +	nop
> +	nop
> +	nop
> +#endif
>  	.set	pop
>  	END(except_vec3_generic)
>  
> @@ -709,6 +717,14 @@ isrdhwr:
>  	.set	arch=r4000
>  	eret
>  	.set	mips0
> +#ifdef CONFIG_CPU_R5900
> +	/* There should be nothing which looks like cache instruction. */
> +	nop
> +	nop
> +	nop
> +	nop
> +	nop
> +#endif
>  #endif
>  	.set	pop
>  	END(handle_ri_rdhwr)
> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index 761b6c369321..795c490a429f 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -1950,12 +1950,36 @@ void __init *set_except_vector(int n, void *addr)
>  		u32 *buf = (u32 *)(ebase + 0x200);
>  		unsigned int k0 = 26;
>  		if ((handler & jump_mask) == ((ebase + 0x200) & jump_mask)) {
> +#ifdef CONFIG_CPU_R5900
> +			uasm_i_nop(&buf);
> +			uasm_i_nop(&buf);
> +#endif

The 2 NOPs are required by erratum FLX05 in tx79architecture.pdf.

>  			uasm_i_j(&buf, handler & ~jump_mask);
>  			uasm_i_nop(&buf);
> +#ifdef CONFIG_CPU_R5900
> +			/* There are no data allowed which could be interpreted as cache instruction. */
> +			uasm_i_nop(&buf);
> +			uasm_i_nop(&buf);
> +			uasm_i_nop(&buf);
> +			uasm_i_nop(&buf);
> +			uasm_i_nop(&buf);
> +#endif
>  		} else {
> +#ifdef CONFIG_CPU_R5900
> +			uasm_i_nop(&buf);
> +			uasm_i_nop(&buf);
> +#endif

Same here.

Best regards
Jürgen

>  			UASM_i_LA(&buf, k0, handler);
>  			uasm_i_jr(&buf, k0);
>  			uasm_i_nop(&buf);
> +#ifdef CONFIG_CPU_R5900
> +			/* There are no data allowed which could be interpreted as cache instruction. */
> +			uasm_i_nop(&buf);
> +			uasm_i_nop(&buf);
> +			uasm_i_nop(&buf);
> +			uasm_i_nop(&buf);
> +			uasm_i_nop(&buf);
> +#endif
>  		}
>  		local_flush_icache_range(ebase + 0x200, (unsigned long)buf);
>  	}
>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Aw: [RFC] MIPS: R5900: Use mandatory SYNC.L in exception handlers
@ 2018-02-12  9:22                                               ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2018-02-12  9:22 UTC (permalink / raw)
  To: Jürgen Urban, Fredrik Noring; +Cc: linux-mips

Jürgen, Fredrik --

> > Would you be able to explain the notes
> > 
> > 	/* In an error exception handler the user space could be uncached. */
> > 
> > in the patch ported from v2.6 below?
> 
> The tx79architecture.pdf says:
> 2.4 kuseg becomes an uncached area when an error exception (Status.ERL = 1) occurs (FLX04)
> 2.4.1 Phenomenon
> There are cases in which kuseg (0x0000_0000 – 0x7FFF_FFFF) becomes uncached in an error exception handler (Status.ERL==1) and data consistency with cached area (kseg, ksseg, kseg0) is lost.
> 2.4.2 Corrective measures
> In an error exception handler (Status.ERL==1), when accessing kuseg (0x0000_0000 – 0x7FFF_FFFF), access it after guarding using SYNC.L as follows:
> SYNC.L
> SW kuseg

 This change makes no sense to me anyway I am afraid.

 At the error level (Status.ERL=1) the user segment becomes unmapped and 
therefore all KUSEG addresses become physical addresses.  Which means that 
if any of this code you have patched is called to access user pages, then 
you have a bigger problem than just the cache going out of sync.
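
 To make the point concrete, here is a minimal sketch in C of the condition described above.  The helper names are invented for illustration; only the Status.ERL bit position (bit 2) and the KUSEG range are taken as given.

```c
#include <stdbool.h>
#include <stdint.h>

#define ST0_ERL		0x00000004u	/* Status.ERL, bit 2 */
#define KUSEG_END	0x7fffffffu	/* last KUSEG address (32-bit view) */

/* Hypothetical helper: true if a 32-bit virtual address lies in KUSEG. */
static bool addr_in_kuseg(uint32_t vaddr)
{
	return vaddr <= KUSEG_END;
}

/*
 * Hypothetical helper: true if an access to vaddr is unmapped because the
 * processor is at the error level, so that vaddr is used directly as a
 * physical address rather than being translated through the TLB.
 */
static bool kuseg_access_is_unmapped(uint32_t status, uint32_t vaddr)
{
	return (status & ST0_ERL) && addr_in_kuseg(vaddr);
}
```

 With ERL set, any code that believes it is dereferencing a user pointer is in fact touching low physical memory, which is the bigger problem referred to above.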

 The only reason to access KUSEG at the error level is to save/restore 
register state to/from a dedicated RAM area offset from $zero so that 
execution is restartable.  Unlike at the exception level you cannot use 
$k0 and $k1 as temporaries, because an error exception can happen any time 
including in particular while $k0 and $k1 are in active use at the 
exception level, so clobbering them would make the system non-restartable 
(of course receiving an error exception may mean that anyway).

 Code to write/read that dedicated area should be purpose-crafted and the 
area won't be accessed at any other time, so the issue of being cache 
coherent or not does not apply as the area will never be accessed with 
caching operations.

 I can see the R5900 has additional classes of error exceptions defined, 
such as debug and performance counter exceptions, which are not related to 
hardware faults and can happen in regular execution in response to certain 
conditions requested.  If you want to handle these implementation specific 
extensions and consequently serve these exceptions, then please take care 
of all the requirements as code to support them is added.

 Though as I wrote above it does not look to me like anything specific 
will be needed -- the handler at entry will save the state necessary for 
restartability to a dedicated RAM area first and then to the kernel stack, 
switch the error level off, do the necessary processing, and then reverse 
the steps before resuming the interrupted execution.

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [RFC] MIPS: R5900: Workaround for the short loop bug
  2018-02-11  7:29                                         ` [RFC] MIPS: R5900: Workaround for the short loop bug Fredrik Noring
@ 2018-02-12  9:25                                           ` Maciej W. Rozycki
  2018-02-12 15:22                                             ` Fredrik Noring
  0 siblings, 1 reply; 117+ messages in thread
From: Maciej W. Rozycki @ 2018-02-12  9:25 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Jürgen Urban, linux-mips

On Sat, 10 Feb 2018, Fredrik Noring wrote:

> The short loop bug under certain conditions causes loops to execute
> only once or twice. GCC 2.95 that shipped with Sony PS2 Linux had a
> patch with the following note:
> 
>     On the R5900, we must ensure that the compiler never generates
>     loops that satisfy all of the following conditions:
> 
>     - a loop consists of less than equal to six instructions
>       (including the branch delay slot);
>     - a loop contains only one conditional branch instruction at
>       the end of the loop;
>     - a loop does not contain any other branch or jump instructions;
>     - a branch delay slot of the loop is not NOP (EE 2.9 or later).
> 
>     We need to do this because of a bug in the chip.
> 
> Signed-off-by: Fredrik Noring <noring@nocrew.org>
> ---
> The exact NOP placements in this patch are provisional. Request for comment
> on the method to use. I believe there are at least three alternatives:
> 
> 1. Add #ifdefs or macros in the source code (similar to this patch).
> 2. Modify the assembler to automatically insert NOPs as required.
> 3. Avoid assembly and use C versions of memcpy etc. instead.
> 
> This change has been ported from v2.6 patches.

 I can't tell if this is a porting artefact or whether the reason is 
different, but many of these loops contain more than 6 instructions 
already, or need fewer than 3 NOPs.  Please review accordingly.

 Also can't this be handled automagically by GAS instead?  We have similar 
workarounds already implemented, see e.g. `-mfix-vr4130'.  Otherwise this 
is looking to me like a candidate for a maintenance nightmare (which the 
problem with getting loop instruction counts wrong in your patch is a sign 
of).
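
 For reference, the four conditions quoted from the GCC 2.95 note can be written down as a simple predicate.  This is only a sketch in C with invented names; it takes the conditions literally, ignores the EE revision remark, and assumes that padding the loop body to seven instructions is an acceptable way to break the first condition, which the note does not state.

```c
#include <stdbool.h>

#define R5900_SHORT_LOOP_MAX	6	/* loops of at most 6 instructions are affected */

/* Hypothetical description of a loop; all field names are illustrative. */
struct loop_info {
	unsigned int insns;		/* instruction count, including the delay slot */
	unsigned int branches;		/* branch and jump instructions in the loop */
	bool ends_in_cond_branch;	/* the single branch is conditional and comes last */
	bool delay_slot_is_nop;		/* the branch delay slot holds a NOP */
};

/* True if the loop satisfies all four conditions of the note. */
static bool r5900_short_loop_affected(const struct loop_info *l)
{
	return l->insns <= R5900_SHORT_LOOP_MAX &&
	       l->branches == 1 &&
	       l->ends_in_cond_branch &&
	       !l->delay_slot_is_nop;
}

/* NOPs to add to the loop body so that it exceeds the size limit (assumption). */
static unsigned int r5900_short_loop_padding(const struct loop_info *l)
{
	if (!r5900_short_loop_affected(l))
		return 0;
	return R5900_SHORT_LOOP_MAX + 1 - l->insns;
}
```

 A loop of seven or more instructions yields zero padding here, which matches the observation that many of the patched loops need no NOPs at all.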

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [RFC] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-11  7:56                                         ` [RFC] MIPS: R5900: Workaround exception NOP execution bug (FLX05) Fredrik Noring
@ 2018-02-12  9:28                                           ` Maciej W. Rozycki
  2018-02-15 19:15                                             ` [RFC v2] " Fredrik Noring
  0 siblings, 1 reply; 117+ messages in thread
From: Maciej W. Rozycki @ 2018-02-12  9:28 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Jürgen Urban, linux-mips

On Sat, 10 Feb 2018, Fredrik Noring wrote:

> For the R5900, there are cases in which the first two instructions
> in an exception handler are executed as NOP instructions, when
> certain exceptions occur and then a bus error occurs immediately
> before jumping to the exception handler (FLX05).
> 
> The corrective measure is to place NOP in the first two instruction
> locations in all exception handlers.

 Well, but it would help if you only patched the handlers which are 
actually used by the R5900 (and only the handlers and not other code).

> diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
> index c7b64f4a8ad3..4008298c1880 100644
> --- a/arch/mips/kernel/genex.S
> +++ b/arch/mips/kernel/genex.S
> @@ -62,6 +66,8 @@ NESTED(except_vec3_r4000, 0, sp)
>  	.set	arch=r4000
>  	.set	noat
>  #ifdef CONFIG_CPU_R5900
> +	nop
> +	nop
>  	sync.p
>  #endif
>  	mfc0	k1, CP0_CAUSE

 This hunk makes no sense, the R5900 does not have virtual coherency 
exceptions and therefore makes no use of this handler.

> @@ -174,6 +180,10 @@ LEAF(__r4k_wait)
>  	.align	5
>  BUILD_ROLLBACK_PROLOGUE handle_int
>  NESTED(handle_int, PT_SIZE, sp)
> +#ifdef CONFIG_CPU_R5900
> +	nop
> +	nop
> +#endif

 This is not an exception handler entry, this is jumped to from 
`except_vec3_generic' via the `exception_handlers' dispatcher.

> @@ -275,6 +285,10 @@ NESTED(handle_int, PT_SIZE, sp)
>   * to fit into space reserved for the exception handler.
>   */
>  NESTED(except_vec4, 0, sp)
> +#ifdef CONFIG_CPU_R5900
> +	nop
> +	nop
> +#endif
>  1:	j	1b			/* Dummy, will be replaced */
>  	END(except_vec4)

 This is not going to work as per the comment.  See `set_except_vector'.

> @@ -285,6 +299,10 @@ NESTED(except_vec4, 0, sp)
>   * unconditional jump to this vector.
>   */
>  NESTED(except_vec_ejtag_debug, 0, sp)
> +#ifdef CONFIG_CPU_R5900
> +	nop
> +	nop
> +#endif

 This is not an exception handler entry and can only be jumped to from the 
firmware, redirected from the 0xffffffffbfc00480 hardwired EJTAG exception 
entry point (not supported by the R5900 anyway).

> @@ -300,6 +318,10 @@ NESTED(except_vec_ejtag_debug, 0, sp)
>   */
>  BUILD_ROLLBACK_PROLOGUE except_vec_vi
>  NESTED(except_vec_vi, 0, sp)
> +#ifdef CONFIG_CPU_R5900
> +	nop
> +	nop
> +#endif

 This is an exception handler entry template for vectored interrupts, 
which are not supported by the R5900.

> @@ -319,6 +341,10 @@ EXPORT(except_vec_vi_end)
>   * Complete the register saves and invoke the handler which is passed in $v0
>   */
>  NESTED(except_vec_vi_handler, 0, sp)
> +#ifdef CONFIG_CPU_R5900
> +	nop
> +	nop
> +#endif

 This is not an exception handler entry and is called from vectored 
interrupt handlers.

> @@ -378,6 +404,10 @@ NESTED(except_vec_vi_handler, 0, sp)
>  NESTED(ejtag_debug_handler, PT_SIZE, sp)
>  	.set	push
>  	.set	noat
> +#ifdef CONFIG_CPU_R5900
> +	nop
> +	nop
> +#endif

 This is not an exception handler entry, this can only be reached from one 
of the dispatchers scattered throughout the arch/mips/ tree.

> @@ -424,6 +454,10 @@ EXPORT(ejtag_debug_buffer)
>   * unconditional jump to this vector.
>   */
>  NESTED(except_vec_nmi, 0, sp)
> +#ifdef CONFIG_CPU_R5900
> +	nop
> +	nop
> +#endif

 This is not an exception handler entry and can only be jumped to from the 
firmware, redirected from the 0xffffffffbfc00000 hardwired NMI exception 
entry point.

> @@ -436,6 +470,10 @@ NESTED(nmi_handler, PT_SIZE, sp)
>  	.cfi_signal_frame
>  	.set	push
>  	.set	noat
> +#ifdef CONFIG_CPU_R5900
> +	nop
> +	nop
> +#endif

 This is not an exception handler entry, this can only be reached from one 
of the dispatchers scattered throughout the arch/mips/ tree.

> @@ -521,6 +559,10 @@ NESTED(nmi_handler, PT_SIZE, sp)
>  	NESTED(handle_\exception, PT_SIZE, sp)
>  	.cfi_signal_frame
>  	.set	noat
> +#ifdef CONFIG_CPU_R5900
> +	nop
> +	nop
> +#endif

 This is not an exception handler entry, this is jumped to from 
`except_vec3_generic' via the `exception_handlers' dispatcher.

> diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
> index 89b425646647..e56f988b5c20 100644
> --- a/arch/mips/kernel/scall32-o32.S
> +++ b/arch/mips/kernel/scall32-o32.S
> @@ -30,6 +30,18 @@ NESTED(handle_sys, PT_SIZE, sp)
>  	.set	noat
>  #ifdef CONFIG_CPU_R5900
>  	/*
> +	 * For the R5900, there are cases in which the first two instructions
> +	 * in an exception handler are executed as NOP instructions, when
> +	 * certain exceptions occur and then a bus error occurs immediately
> +	 * before jumping to the exception handler (FLX05).
> +	 *
> +	 * The corrective measure is to place NOP in the first two instruction
> +	 * locations in all exception handlers.
> +	 */
> +	nop
> +	nop
> +
> +	/*

 Likewise.

> diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
> index a18b013fd887..fc7ec8f9eed8 100644
> --- a/arch/mips/mm/tlbex.c
> +++ b/arch/mips/mm/tlbex.c
> @@ -2049,6 +2054,11 @@ build_r4000_tlbchange_handler_head(u32 **p, struct uasm_label **l,
>  {
>  	struct work_registers wr = build_get_work_registers(p);
>  
> +#ifdef CONFIG_CPU_R5900
> +	uasm_i_nop(p);
> +	uasm_i_nop(p);
> +#endif
> +

 Likewise.

 IOW the only places that look relevant to me are: `except_vec3_generic', 
`build_r4000_tlb_refill_handler' and `set_except_vector'.  Please update 
your change accordingly.
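
 As an illustration of what such a vector entry looks like with the FLX05 workaround applied, here is a minimal sketch in C.  The emit() helper is invented and only loosely modelled on the uasm_i_*() calls; it is not the kernel API, and the stub layout (two NOPs, a jump, a NOP in the delay slot) is an assumption based on the description above.

```c
#include <stdint.h>

#define MIPS_NOP	0x00000000u	/* sll $0, $0, 0 */

/* Hypothetical emitter: store one instruction word and advance the pointer. */
static void emit(uint32_t **p, uint32_t insn)
{
	*(*p)++ = insn;
}

/* Encode "j target": opcode 000010 plus the 26-bit word index of the target. */
static uint32_t mips_j(uint32_t target)
{
	return (2u << 26) | ((target >> 2) & 0x03ffffffu);
}

/*
 * Build a vector-entry stub: two leading NOPs for FLX05, then a jump to
 * the real handler with a NOP in the branch delay slot.  Returns the
 * number of instruction words written.
 */
static unsigned int build_vector_stub(uint32_t *buf, uint32_t handler)
{
	uint32_t *p = buf;

	emit(&p, MIPS_NOP);
	emit(&p, MIPS_NOP);
	emit(&p, mips_j(handler));
	emit(&p, MIPS_NOP);

	return (unsigned int)(p - buf);
}
```

 Only code placed at a hardware vector address needs the two leading NOPs; handlers reached through a dispatcher do not.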

  Maciej

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [RFC] MIPS: R5900: Workaround for the short loop bug
  2018-02-12  9:25                                           ` Maciej W. Rozycki
@ 2018-02-12 15:22                                             ` Fredrik Noring
  0 siblings, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2018-02-12 15:22 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jürgen Urban, linux-mips

Many thanks for your prompt and detailed reviews, Maciej,

> > The exact NOP placements in this patch are provisional. Request for comment
> > on the method to use. I believe there are at least three alternatives:
> > 
> > 1. Add #ifdefs or macros in the source code (similar to this patch).
> > 2. Modify the assembler to automatically insert NOPs as required.
> > 3. Avoid assembly and use C versions of memcpy etc. instead.
> > 
> > This change has been ported from v2.6 patches.
> 
>  I can't tell if this is a porting artefact or whether the reason is 
> different, but many of these loops contain more than 6 instructions 
> already, or need fewer than 3 NOPs.  Please review accordingly.
> 
>  Also can't this be handled automagically by GAS instead?  We have similar 
> workarounds already implemented, see e.g. `-mfix-vr4130'.  Otherwise this 
> is looking to me like a candidate for a maintenance nightmare (which the 
> problem with getting loop instruction counts wrong in your patch is a sign 
> of).

As noted above, please ignore the NOP details which just barely survived
from v2.6 (according to the principle that too many NOPs still work, whereas
too few crash badly), especially since I very much agree with you that it is
unreasonable to maintain such NOPs by hand and would rather proceed with
alternative (2) [from the list above] that is to modify the assembler instead.

Meanwhile, is it possible to run with alternative (3) that is to use C
fallbacks for the R5900, provided the performance penalty is reasonable?

Fredrik

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: [RFC v2] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-12  9:28                                           ` Maciej W. Rozycki
@ 2018-02-15 19:15                                             ` Fredrik Noring
  2018-02-15 20:49                                               ` Maciej W. Rozycki
  0 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-02-15 19:15 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jürgen Urban, linux-mips

Hi Maciej,

>  Well, but it would help if you only patched the handlers which are 
> actually used by the R5900 (and only the handlers and not other code).

Indeed, thanks. :) I'm glad this is cleared up, and greatly simplified too.
I tried to go through the details. According to 5-7 of the TX79 manual the
R5900 has six exception vector addresses:

- 0x80000000	TLB Refill EXL=0		build_r4000_tlb_refill_handler
- 0x80000080	Performance Counter
- 0x80000100	Debug, SIO
- 0x80000180	TLB Refill EXL=1, Others	except_vec3_generic
- 0x80000200	Interrupt			set_except_vector
- 0xbfc00000	Reset, NMI

Given that build_r4000_tlb_refill_handler copies 0x100 bytes with

	memcpy((void *)ebase, final_handler, 0x100);

it seems to overwrite the Performance Counter handler (ebase offset 0x80),
which isn't installed at all as I understand it (neither seems Debug, SIO).
A further complication: it seems to actually make use of up to 252 bytes:

/* The worst case length of the handler is around 18 instructions for           
 * R3000-style TLBs and up to 63 instructions for R4000-style TLBs.             
 * Maximum space available is 32 instructions for R3000 and 64                  
 * instructions for R4000.                                                      
 *                                                                              
 * We deliberately chose a buffer size of 128, so we won't scribble             
 * over anything important on overflow before we panic.                         
 */                                                                             
static u32 tlb_handler[128];                                                    

The R5900 wants two additional NOPs (8 bytes) for FLX05 and then another
five NOPs (20 bytes) for ERET (potentially up to 280 bytes):

https://www.linux-mips.org/archives/linux-mips/2018-02/msg00106.html

Fortunately, in practice, final_len ends on 31 all in all, just 4 bytes
below the 0x80 offset for the Performance Counter handler. Does the
following change make sense to at least partially address the overwrite?

--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -1507,8 +1507,8 @@ static void build_r4000_tlb_refill_handler(void)
 	pr_debug("Wrote TLB refill handler (%u instructions).\n",
 		 final_len);
 
-	memcpy((void *)ebase, final_handler, 0x100);
-	local_flush_icache_range(ebase, ebase + 0x100);
+	memcpy((void *)ebase, final_handler, 4 * final_len);
+	local_flush_icache_range(ebase, ebase + 4 * final_len);
 
 	dump_handler("r4000_tlb_refill", (u32 *)ebase, 64);
 }
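
The size arithmetic above can be double-checked with a small sketch in C. The names are made up for this example; only the instruction counts and the 0x80 offset come from the discussion.

```c
#include <stdbool.h>
#include <stddef.h>

#define MIPS_INSN_BYTES		4	/* every instruction is one 32-bit word */
#define PERF_COUNTER_OFFSET	0x80	/* Performance Counter vector, offset from ebase */

/* Bytes occupied by a handler of the given length in instructions. */
static size_t handler_bytes(size_t insns)
{
	return insns * MIPS_INSN_BYTES;
}

/* True if the handler ends at or before the next vector offset. */
static bool handler_fits_below(size_t insns, size_t next_offset)
{
	return handler_bytes(insns) <= next_offset;
}
```

A worst case of 63 instructions gives 252 bytes, 63 + 2 + 5 gives 280 bytes, and the observed 31 instructions give 124 bytes, which is 4 bytes short of offset 0x80.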

By the way, I tried to inspect the exception handlers via /dev/mem but this
fails with "bad address". Is it expected to work at all? A web search turned
up

https://www.linux-mips.org/archives/linux-mips/2000-12/msg00051.html

which gave some hope. :) Here is a memory layout that I think would be
interesting to access via /dev/mem:

http://www.psdevwiki.com/ps3/PS2_Emulation#PS2_Memory_and_Hardware_Mapped_Registers_Layout

>  IOW the only places that look relevant to me are: `except_vec3_generic', 
> `build_r4000_tlb_refill_handler' and `set_except_vector'.  Please update 
> your change accordingly.

Please find updated patch below. I've compiled and tested it. However, it
seems appropriate to also fix the issues with build_r4000_tlb_refill_handler
described above, and perhaps even install default handlers for the
Performance Counter, Debug and SIO?

Fredrik

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index c7b64f4a8ad3..a2bee29debe9 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -32,6 +32,10 @@
 NESTED(except_vec3_generic, 0, sp)
 	.set	push
 	.set	noat
+#ifdef CONFIG_CPU_R5900
+	nop
+	nop
+#endif
 #if R5432_CP0_INTERRUPT_WAR
 #ifdef CONFIG_CPU_R5900
 	sync.p
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 761b6c369321..b881b93f0418 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -1949,6 +1949,11 @@ void __init *set_except_vector(int n, void *addr)
 #endif
 		u32 *buf = (u32 *)(ebase + 0x200);
 		unsigned int k0 = 26;
+
+#ifdef CONFIG_CPU_R5900
+		uasm_i_nop(&buf);
+		uasm_i_nop(&buf);
+#endif
 		if ((handler & jump_mask) == ((ebase + 0x200) & jump_mask)) {
 			uasm_i_j(&buf, handler & ~jump_mask);
 			uasm_i_nop(&buf);
diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index a18b013fd887..f4e0e748ed8a 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -1308,6 +1308,11 @@ static void build_r4000_tlb_refill_handler(void)
 	memset(relocs, 0, sizeof(relocs));
 	memset(final_handler, 0, sizeof(final_handler));
 
+#ifdef CONFIG_CPU_R5900
+	uasm_i_nop(&p);
+	uasm_i_nop(&p);
+#endif
+
 	if (IS_ENABLED(CONFIG_64BIT) && (scratch_reg >= 0 || scratchpad_available()) && use_bbit_insns()) {
 		htlb_info = build_fast_tlb_refill_handler(&p, &l, &r, K0, K1,
 							  scratch_reg);

^ permalink raw reply related	[flat|nested] 117+ messages in thread

* Re: [RFC v2] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-15 19:15                                             ` [RFC v2] " Fredrik Noring
@ 2018-02-15 20:49                                               ` Maciej W. Rozycki
  2018-02-17 11:16                                                 ` Fredrik Noring
  2018-02-18  8:47                                                 ` Fredrik Noring
  0 siblings, 2 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2018-02-15 20:49 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Jürgen Urban, linux-mips

Hi Fredrik,

> Indeed, thanks. :) I'm glad this is cleared up, and greatly simplified too.
> I tried to go through the details. According to 5-7 of the TX79 manual the
> R5900 has six exception vector addresses:
> 
> - 0x80000000	TLB Refill EXL=0		build_r4000_tlb_refill_handler
> - 0x80000080	Performance Counter
> - 0x80000100	Debug, SIO
> - 0x80000180	TLB Refill EXL=1, Others	except_vec3_generic
> - 0x80000200	Interrupt			set_except_vector
> - 0xbfc00000	Reset, NMI
> 
> Given that build_r4000_tlb_refill_handler copies 0x100 bytes with
> 
> 	memcpy((void *)ebase, final_handler, 0x100);
> 
> it seems to overwrite the Performance Counter handler (ebase offset 0x80),
> which isn't installed at all as I understand it (neither seems Debug, SIO).

 Regular MIPS processors have the XTLB Refill handler at offset 0x80 
(which we use with 64-bit kernels in place of TLB Refill, by setting 
Status KX/SX/UX bits) and the Cache Error handler at offset 0x100 (which 
uses C/KSEG1 rather than C/KSEG0 as the base).  Both Refill slots are made 
available for a single TLB/XTLB handler as we never use both at a time 
(this may eventually change, as per recent discussions about making the 
user address space wrap where possible, on a per-task basis according to 
the ABI selected).

 If R5900 support wants to use these handler slots in a different manner, 
then it is of course free to.

> A further complication: it seems to actually make use of up to 252 bytes:
> 
> /* The worst case length of the handler is around 18 instructions for           
>  * R3000-style TLBs and up to 63 instructions for R4000-style TLBs.             
>  * Maximum space available is 32 instructions for R3000 and 64                  
>  * instructions for R4000.                                                      
>  *                                                                              
>  * We deliberately chose a buffer size of 128, so we won't scribble             
>  * over anything important on overflow before we panic.                         
>  */                                                                             
> static u32 tlb_handler[128];                                                    
> 
> The R5900 wants two additional NOPs (8 bytes) for FLX05 and then another
> five NOPs (20 bytes) for ERET (potentially up to 280 bytes):
> 
> https://www.linux-mips.org/archives/linux-mips/2018-02/msg00106.html
> 
> Fortunately, in practice, final_len ends on 31 all in all, just 4 bytes
> below the 0x80 offset for the Performance Counter handler. Does the
> following change make sense to at least partially address the overwrite?
> 
> --- a/arch/mips/mm/tlbex.c
> +++ b/arch/mips/mm/tlbex.c
> @@ -1507,8 +1507,8 @@ static void build_r4000_tlb_refill_handler(void)
>  	pr_debug("Wrote TLB refill handler (%u instructions).\n",
>  		 final_len);
>  
> -	memcpy((void *)ebase, final_handler, 0x100);
> -	local_flush_icache_range(ebase, ebase + 0x100);
> +	memcpy((void *)ebase, final_handler, 4 * final_len);
> +	local_flush_icache_range(ebase, ebase + 4 * final_len);
>  
>  	dump_handler("r4000_tlb_refill", (u32 *)ebase, 64);
>  }

 I didn't comment on the erratum workaround addressing speculative 
execution beyond ERET, because I haven't reached final conclusions as to 
what exactly the code will have to look like.

 However, please note that in reality the 5 NOPs are not required in these 
generated handlers (except perhaps for the interrupt handler, which will 
have to be double-checked, as it is set up differently), because the 
only reason for inserting them is to prevent data from being interpreted 
as ill-formed code and speculatively executed.  But any handler that 
follows does not contain ill-formed code, and the `tlb_handler' buffer is 
cleared before any generated machine code is built within it, so any 
trailing padding uses the encoding of NOP.  This means you can exclude 
these 5 NOPs from the calculation.
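
 The argument can be illustrated with a short sketch in C.  The function names are hypothetical stand-ins, not the tlbex.c code; the only facts relied upon are that the canonical MIPS NOP encodes as an all-zero word and that the buffer is cleared before the handler is written.

```c
#include <stdint.h>
#include <string.h>

#define MIPS_NOP	0x00000000u	/* sll $0, $0, 0 */

/*
 * Hypothetical stand-in for installing a generated handler: clear the
 * destination buffer, then copy `len' instruction words into it.
 */
static void install_handler(uint32_t *dst, size_t dst_words,
			    const uint32_t *code, size_t len)
{
	memset(dst, 0, dst_words * sizeof(*dst));
	memcpy(dst, code, len * sizeof(*code));
}

/* Count how many consecutive words right after the handler decode as NOP. */
static size_t trailing_nops(const uint32_t *buf, size_t buf_words, size_t len)
{
	size_t n = 0;

	while (len + n < buf_words && buf[len + n] == MIPS_NOP)
		n++;

	return n;
}
```

 Because the cleared words already decode as NOP, everything between the final ERET and the end of the buffer acts as padding without emitting any extra instructions.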

> By the way, I tried to inspect the exception handlers via /dev/mem but this
> fails with "bad address". Is it expected to work at all? A web search turned
> up
> 
> https://www.linux-mips.org/archives/linux-mips/2000-12/msg00051.html
> 
> which gave some hope. :) Here is a memory layout that I think would be
> interesting to access via /dev/mem:
> 
> http://www.psdevwiki.com/ps3/PS2_Emulation#PS2_Memory_and_Hardware_Mapped_Registers_Layout

 You could use /dev/mem to inspect exception handlers I suppose, but that 
would be awkward.  It's mostly useful to access MMIO as I described in the 
message you were kind enough to dig out from the depths of history.

 For exception handler examination I suggest using /proc/kcore instead, 
which gives you access to kernel memory via an artificial ELF image, 
making this a piece of cake.  Like this for example:

$ gdb -c /proc/kcore
[...]
#0  0x00000000 in ?? ()
(gdb) set architecture mips:isa32r2
The target architecture is assumed to be mips:isa32r2
(gdb) x /32i 0x80000000
0x80000000:	lui	k1,0x8483
0x80000004:	mfc0	k0,c0_badvaddr
0x80000008:	lw	k1,-30560(k1)
0x8000000c:	srl	k0,k0,0x1a
0x80000010:	sll	k0,k0,0x2
0x80000014:	addu	k1,k1,k0
0x80000018:	mfc0	k0,c0_context
0x8000001c:	lw	k1,0(k1)
0x80000020:	srl	k0,k0,0x3
0x80000024:	andi	k0,k0,0x3ff8
0x80000028:	addu	k1,k1,k0
0x8000002c:	lw	k0,0(k1)
0x80000030:	lw	k1,4(k1)
0x80000034:	srl	k0,k0,0x6
0x80000038:	mtc0	k0,c0_entrylo0
0x8000003c:	srl	k1,k1,0x6
0x80000040:	mtc0	k1,c0_entrylo1
0x80000044:	tlbwr
0x80000048:	eret
0x8000004c:	nop
0x80000050:	nop
0x80000054:	nop
0x80000058:	nop
0x8000005c:	nop
0x80000060:	nop
0x80000064:	nop
0x80000068:	nop
0x8000006c:	nop
0x80000070:	nop
0x80000074:	nop
0x80000078:	nop
0x8000007c:	nop
(gdb) 

Substitute `mips:5900' for `mips:isa32r2' to get R5900 disassembly.  If 
you want to see raw machine code too, use `disassemble -r', but watch out 
for the syntax, which is different.  As you can see the trailing NOPs 
required are already there. :)  You can supply `vmlinux' as the executable 
to debug too for symbolic access.

 You can also ask the kernel to dump generated handlers to the kernel log 
(and the console, if `debug' has been specified as a kernel parameter) at 
bootstrap by building tlbex.c and/or page.c with -DDEBUG, e.g.:

$ make CFLAGS_tlbex.o=-DDEBUG vmlinux

It can help if a bug in a generated handler prevents the kernel from 
starting userland.

> >  IOW the only places that look relevant to me are: `except_vec3_generic', 
> > `build_r4000_tlb_refill_handler' and `set_except_vector'.  Please update 
> > your change accordingly.
> 
> Please find updated patch below. I've compiled and tested it.

 It seems fine to me.

> However, it
> seems appropriate to also fix the issues with build_r4000_tlb_refill_handler
> described above, and perhaps even install default handlers for the
> Performance Counter, Debug and SIO?

 A handler for SIO is needed if SIOInt can be asserted without kernel 
control by PS/2 hardware.  Otherwise handlers will only be needed once the 
kernel has means to enable the respective exceptions.

  Maciej

* Re: [RFC v2] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-15 20:49                                               ` Maciej W. Rozycki
@ 2018-02-17 11:16                                                 ` Fredrik Noring
  2018-02-17 11:57                                                   ` Maciej W. Rozycki
  2018-02-18  8:47                                                 ` Fredrik Noring
  1 sibling, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-02-17 11:16 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jürgen Urban, linux-mips

Hi Maciej,

>  You could use /dev/mem to inspect exception handlers I suppose, but that 
> would be awkward.  It's mostly useful to access MMIO as I described in the 
> message you were kind enough to dig out from the depths of history.
> 
>  For exception handler examination I suggest using /proc/kcore instead, 
> which gives you access to kernel memory via an artificial ELF image, 
> making this a piece of cake.  Like this for example:
> 
> $ gdb -c /proc/kcore
> [...]
> #0  0x00000000 in ?? ()
> (gdb) set architecture mips:isa32r2
> The target architecture is assumed to be mips:isa32r2
> (gdb) x /32i 0x80000000
> 0x80000000:	lui	k1,0x8483
> 0x80000004:	mfc0	k0,c0_badvaddr
> 0x80000008:	lw	k1,-30560(k1)
> 0x8000000c:	srl	k0,k0,0x1a

This was an interesting exercise. I suspect GDB runs out of memory since

	# gdb -q -c /proc/kcore
	[New process 1]
	Segmentation fault

with

	# dmesg | tail -n3
	do_page_fault(): sending SIGSEGV to gdb for invalid read access from 000000a8
	epc = 00953910 in gdb[400000+6d1000]
	ra  = 009538b8 in gdb[400000+6d1000]

to me looks like GDB does a NULL pointer dereference (the PS2 has 32 MiB of
RAM, of which 16 MiB is used for a ramdisk in my setup). GDB once could
handle core files remotely, but this capability is apparently now lost:

https://www.redhat.com/archives/crash-utility/2011-December/msg00019.html

One can get a little further by sharing /proc using v9fs to obtain:

	# mipsel-linux-gdb -q -c /mnt/kcore
	[New process 1]
	Core was generated by `ramdisk_size=16384 crtmode=pal1 video=ps2fb:pal,640x480-32 rd_start=0x8063c000'.
	#0  0x00000000 in ?? ()
	(gdb) set architecture mips:5900
	The target architecture is assumed to be mips:5900
	(gdb) x /32i 0x80000000
	   0x80000000:	Cannot access memory at address 0x80000000

In this case I'm wondering whether kcore contains proper ELF headers. What
is the output of readelf for your kcore? I have this:

	ELF Header:
	  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
	  Class:                             ELF32
	  Data:                              2's complement, little endian
	  Version:                           1 (current)
	  OS/ABI:                            UNIX - System V
	  ABI Version:                       0
	  Type:                              CORE (Core file)
	  Machine:                           MIPS R3000
	  Version:                           0x1
	  Entry point address:               0x0
	  Start of program headers:          52 (bytes into file)
	  Start of section headers:          0 (bytes into file)
	  Flags:                             0x0
	  Size of this header:               52 (bytes)
	  Size of program headers:           32 (bytes)
	  Number of program headers:         3
	  Size of section headers:           0 (bytes)
	  Number of section headers:         0
	  Section header string table index: 0
	
	There are no sections in this file.
	
	There are no sections to group in this file.
	
	Program Headers:
	  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
	  NOTE           0x000094 0x00000000 0x00000000 0x0074c 0x00000     0
	  LOAD           0x40001000 0xc0000000 0xffffffff 0x3f7fe000 0x3f7fe000 RWE 0x1000
	  LOAD           0x001000 0x80000000 0x00000000 0x2000000 0x2000000 RWE 0x1000
	
	There is no dynamic section in this file.
	
	There are no relocations in this file.
	
	The decoding of unwind sections for machine type MIPS R3000 is not currently supported.
	
	No version information found in this file.
	
	Displaying notes found at file offset 0x00000094 with length 0x0000074c:
	  Owner                 Data size	Description
	  CORE                 0x00000100	NT_PRSTATUS (prstatus structure)
	  CORE                 0x00000080	NT_PRPSINFO (prpsinfo structure)
	  CORE                 0x00000590	NT_TASKSTRUCT (task structure)

Returning to the more awkward /dev/mem device, the "bad address" error with
for example

	# xxd -s $(( 0x80000000 )) -l 256 /dev/mem
	xxd: /dev/mem: Bad address

is due to drivers/char/mem.c:valid_phys_addr_range which fails on

	return addr + count <= __pa(high_memory);

since

	0x80000000 + 16 <= 0x2000000

is false, with 0x2000000 being CPHYSADDR(0x82000000) as computed by
arch/mips/include/asm/page.h:___pa:

	if (!IS_ENABLED(CONFIG_EVA)) {
		/*
		 * We're using the standard MIPS32 legacy memory map, ie.
		 * the address x is going to be in kseg0 or kseg1. We can
		 * handle either case by masking out the desired bits using
		 * CPHYSADDR.
		 */
		return CPHYSADDR(x);
	}
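
As a rough illustration, the comparison can be replayed with shell
arithmetic.  This is only a sketch: it assumes a shell with 64-bit
integers, takes high_memory to be 0x82000000 as implied above, and uses
function names that merely mirror the kernel ones.

```shell
#!/bin/sh
# Sketch of the /dev/mem range check on a 32-bit MIPS kernel without EVA.
# HIGH_MEMORY is an assumption: 32 MiB of RAM mapped from the start of kseg0.
HIGH_MEMORY=0x82000000

# CPHYSADDR() masks off the kseg0/kseg1 segment bits.
cphysaddr() {
    echo $(( $1 & 0x1fffffff ))
}

# Mirrors "addr + count <= __pa(high_memory)"; prints "ok" or "EFAULT".
valid_phys_addr_range() {
    if [ $(( $1 + $2 )) -le "$(cphysaddr "$HIGH_MEMORY")" ]; then
        echo ok
    else
        echo EFAULT
    fi
}

valid_phys_addr_range 0x80000000 16     # kseg0 virtual address as offset
valid_phys_addr_range 0x00000000 256    # physical address as offset
```

The first call prints EFAULT, matching the "Bad address" from xxd, and
the second prints ok, since physical offset 0 lies below
__pa(high_memory).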

I noticed that /dev/mem is an exception to this comment just above ___pa:

	/*
	 * __pa()/__va() should be used only during mem init.
	 */

Finally, trying to mmap /dev/mem also fails, because

	/* Does it even fit in phys_addr_t? */                                  
	if (offset >> PAGE_SHIFT != vma->vm_pgoff) {                            

in drivers/char/mem.c:mmap_mem computes

	0x00080000 != 0xfff80000

resulting in -EINVAL. Is this the expected behaviour?
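
The round trip behind that comparison can be sketched with shell
arithmetic.  It assumes a 32-bit phys_addr_t and PAGE_SHIFT = 12; neither
is stated above, but both agree with the numbers shown.  check_pgoff is a
made-up helper name, not a kernel function.

```shell
#!/bin/sh
# Sketch of the "does it fit in phys_addr_t" test in mmap_mem().
# Assumptions: 32-bit phys_addr_t, 4 KiB pages.
PAGE_SHIFT=12
MASK32=0xffffffff

# Prints "ok" if vm_pgoff survives the conversion to a 32-bit byte offset
# and back, "EINVAL" otherwise.
check_pgoff() {
    # Shift to a byte offset and truncate it like a 32-bit phys_addr_t.
    offset=$(( ($1 << $PAGE_SHIFT) & $MASK32 ))
    if [ $(( $offset >> $PAGE_SHIFT )) -eq $(( $1 )) ]; then
        echo ok
    else
        echo EINVAL
    fi
}

check_pgoff 0xfff80000    # the vm_pgoff value reported above
check_pgoff 0x00000000    # page offset 0
```

With vm_pgoff = 0xfff80000 the truncated offset is 0x80000000, which
shifts back to 0x00080000, so the first call prints EINVAL; the second
prints ok.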

Fredrik

* Re: [RFC v2] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-17 11:16                                                 ` Fredrik Noring
@ 2018-02-17 11:57                                                   ` Maciej W. Rozycki
  2018-02-17 13:38                                                     ` Fredrik Noring
  0 siblings, 1 reply; 117+ messages in thread
From: Maciej W. Rozycki @ 2018-02-17 11:57 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Jürgen Urban, linux-mips

Hi Fredrik,

> This was an interesting exercise. I suspect GDB runs out of memory since
> 
> 	# gdb -q -c /proc/kcore
> 	[New process 1]
> 	Segmentation fault
> 
> with
> 
> 	# dmesg | tail -n3
> 	do_page_fault(): sending SIGSEGV to gdb for invalid read access from 000000a8
> 	epc = 00953910 in gdb[400000+6d1000]
> 	ra  = 009538b8 in gdb[400000+6d1000]
> 
> to me looks like GDB does a NULL pointer dereference (the PS2 has 32 MiB of
> RAM, of which 16 MiB is used for a ramdisk in my setup). GDB once could
> handle core files remotely, but this capability is apparently now lost:
> 
> https://www.redhat.com/archives/crash-utility/2011-December/msg00019.html

 If you can't access /proc/kcore with GDB locally, for whatever reason, 
then `dd' it (or a part of it) to a regular file and copy it to another 
machine.  Use cross-GDB if necessary.  With only 16 MiB of RAM available 
it can get really tight; the kernel itself takes half of it too I 
suppose.
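
A sketch of how a partial copy could be sized, using the kseg0 LOAD
program header from the readelf output earlier in the thread (file offset
0x1000, virtual address 0x80000000).  Copying from the start of the file
keeps the ELF headers in place.  The output file name is hypothetical,
the command is only printed, and whether a given GDB accepts a truncated
core is not something this establishes.

```shell
#!/bin/sh
# Values taken from the kseg0 LOAD program header quoted earlier; check
# them against your own /proc/kcore before relying on them.
LOAD_OFFSET=0x001000
LOAD_VADDR=0x80000000
BS=4096

# Number of BS-sized blocks, counted from the start of kcore, needed to
# cover kseg0 up to (but not including) the given virtual address.
kcore_blocks() {
    echo $(( ($LOAD_OFFSET + $1 - $LOAD_VADDR + $BS - 1) / $BS ))
}

# kcore.part is a hypothetical output name; the command is only printed.
echo "dd if=/proc/kcore of=kcore.part bs=$BS count=$(kcore_blocks 0x80000200)"
```

For an end address of 0x80000200 this works out to two blocks: one for
the headers and one for the first page of kseg0.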

> In this case I'm wondering whether kcore contains proper ELF headers. What
> is the output of readelf for your kcore? I have this:

 Looks reasonable to me.

> Returning to the more awkward /dev/mem device, the "bad address" error with
> for example
> 
> 	# xxd -s $(( 0x80000000 )) -l 256 /dev/mem
> 	xxd: /dev/mem: Bad address

 You need to use bus (physical) rather than virtual addresses with 
/dev/mem, so:

# xxd -s 0 -l 256 /dev/mem

or suchlike.

 HTH,

  Maciej

* Re: [RFC v2] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-17 11:57                                                   ` Maciej W. Rozycki
@ 2018-02-17 13:38                                                     ` Fredrik Noring
  2018-02-17 15:03                                                       ` Maciej W. Rozycki
  0 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-02-17 13:38 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jürgen Urban, linux-mips

Hi Maciej,

>  If you can't access /proc/kcore with GDB locally, for whatever reason, 
> then `dd' it (or a part of it) to a regular file and copy it to another 
> machine.  Use cross-GDB if necessary.  With only 16 MiB of RAM available 
> it can get really tight; the kernel itself takes half of it too I 
> suppose.

Both a (complete) remote copy of kcore, and one shared via v9fs, yield
"Cannot access memory at address 0x80000000" with a cross-GDB, unfortunately:

> > One can get a little further by sharing /proc using v9fs to obtain:
> > 
> > 	# mipsel-linux-gdb -q -c /mnt/kcore
> > 	[New process 1]
> > 	Core was generated by `ramdisk_size=16384 crtmode=pal1 video=ps2fb:pal,640x480-32 rd_start=0x8063c000'.
> > 	#0  0x00000000 in ?? ()
> > 	(gdb) set architecture mips:5900
> > 	The target architecture is assumed to be mips:5900
> > 	(gdb) x /32i 0x80000000
> > 	   0x80000000:	Cannot access memory at address 0x80000000

By examining the read operations for /proc/kcore, it seems GDB reaches this
"cannot access" conclusion from the ELF headers.

>  You need to use bus (physical) rather than virtual addresses with 
> /dev/mem, so:
> 
> # xxd -s 0 -l 256 /dev/mem
> 
> or suchlike.

Ah, the value of the physical address was a misunderstanding on my part. The
convoluted combination of mipsel-linux-objcopy and mipsel-linux-objdump gets
the disassembly done without GDB, as shown below. :D

It looks very similar to yours, with additional NOPs and SYNCs required for
the R5900:

	# ssh ps2 head -c 128 /dev/mem >kcore &&
	    mipsel-linux-objcopy -I binary -O elf32-little kcore kcore.elf &&
	    mipsel-linux-objdump -D -m mips:5900 kcore.elf
	kcore.elf:     file format elf32-little
	Disassembly of section .data:
	00000000 <_binary_kcore_start>:
		...
	   8:	3c1b8061 	lui	k1,0x8061
	   c:	0000040f 	sync.p
	  10:	401a4000 	mfc0	k0,c0_badvaddr
	  14:	8f7b2c60 	lw	k1,11360(k1)
	  18:	001ad582 	srl	k0,k0,0x16
	  1c:	001ad080 	sll	k0,k0,0x2
	  20:	037ad821 	addu	k1,k1,k0
	  24:	0000040f 	sync.p
	  28:	401a2000 	mfc0	k0,c0_context
	  2c:	8f7b0000 	lw	k1,0(k1)
	  30:	001ad042 	srl	k0,k0,0x1
	  34:	335a0ff8 	andi	k0,k0,0xff8
	  38:	037ad821 	addu	k1,k1,k0
	  3c:	8f7a0000 	lw	k0,0(k1)
	  40:	8f7b0004 	lw	k1,4(k1)
	  44:	001ad142 	srl	k0,k0,0x5
	  48:	409a1000 	mtc0	k0,c0_entrylo0
	  4c:	0000040f 	sync.p
	  50:	001bd942 	srl	k1,k1,0x5
	  54:	409b1800 	mtc0	k1,c0_entrylo1
	  58:	0000040f 	sync.p
	  5c:	42000006 	tlbwr
	  60:	0000040f 	sync.p
	  64:	42000018 	eret
		...
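
The index arithmetic in the listing can be followed with a small sketch
for a hypothetical faulting address.  It assumes the usual MIPS
c0_context layout, with BadVPN2 in bits 22:4 holding VA[31:13]; the
function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical faulting address, chosen only for illustration.
BADVADDR=0x00403004

# "srl k0,k0,0x16; sll k0,k0,0x2": byte offset of the pgd entry.
pgd_offset() {
    echo $(( ($1 >> 22) << 2 ))
}

# "srl k0,k0,0x1; andi k0,k0,0xff8": byte offset of the even/odd PTE pair.
pte_pair_offset() {
    # Model only the BadVPN2 field of c0_context (assumed layout).
    context=$(( (($1 >> 13) << 4) & 0x7ffff0 ))
    echo $(( ($context >> 1) & 0xff8 ))
}

echo "pgd offset: $(pgd_offset $BADVADDR)"
echo "pte pair offset: $(pte_pair_offset $BADVADDR)"
```

For 0x00403004 this gives a pgd offset of 4 and a PTE pair offset of 8,
i.e. the second pgd entry and the second pair of 4-byte PTEs.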

Fredrik

* [RFC] MIPS: R5900: Workaround for saving and restoring FPU registers
  2018-01-31 23:01                                       ` Maciej W. Rozycki
                                                           ` (5 preceding siblings ...)
  2018-02-11  8:29                                         ` [RFC] MIPS: R5900: Use mandatory SYNC.L in exception handlers Fredrik Noring
@ 2018-02-17 14:43                                         ` Fredrik Noring
  2018-02-17 15:18                                           ` Maciej W. Rozycki
  2018-02-18  9:26                                         ` [RFC] MIPS: R5900: Workaround where MSB must be 0 for the instruction cache Fredrik Noring
                                                           ` (2 subsequent siblings)
  9 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-02-17 14:43 UTC (permalink / raw)
  To: Jürgen Urban; +Cc: Maciej W. Rozycki, linux-mips

Hi Jürgen,

Would you be able to elaborate on the following change with a workaround for
saving and restoring R5900 FPU registers? Is this problem documented in your
copy of Sony's Linux Toolkit Restriction manual?
    
    Fixed saving and restoring of FPU registers. Odd FPU registers were
    lost on exceptions and when simulating 64 bit FPU. Debian 5.0 mipsel
    uses MOV.D to move FPU registers. This is not supported by R5900 and
    failed in the simulation because of the bug above.

Fredrik

diff --git a/arch/mips/include/asm/asmmacro.h b/arch/mips/include/asm/asmmacro.h
index 8d1e30b94c2d..a67ef7964bc1 100644
--- a/arch/mips/include/asm/asmmacro.h
+++ b/arch/mips/include/asm/asmmacro.h
@@ -141,6 +141,52 @@
 	.set	pop
 	.endm
 
+#ifdef CONFIG_CPU_R5900
+	/*
+	 * Kernel expects that floating point registers are saved as 64-bit
+	 * with the sdc1 instruction, but this is not working with R5900.
+	 * The 64-bit write is simulated as two 32-bit writes.
+	 */
+	.macro fpu_save_double thread status tmp1=t0
+	.set push
+	SET_HARDFLOAT
+	cfc1	\tmp1,  fcr31
+	swc1	$f0,  THREAD_FPR0(\thread)
+	swc1	$f1,  (THREAD_FPR0 + 4)(\thread)
+	swc1	$f2,  THREAD_FPR2(\thread)
+	swc1	$f3,  (THREAD_FPR2 + 4)(\thread)
+	swc1	$f4,  THREAD_FPR4(\thread)
+	swc1	$f5,  (THREAD_FPR4 + 4)(\thread)
+	swc1	$f6,  THREAD_FPR6(\thread)
+	swc1	$f7,  (THREAD_FPR6 + 4)(\thread)
+	swc1	$f8,  THREAD_FPR8(\thread)
+	swc1	$f9,  (THREAD_FPR8 + 4)(\thread)
+	swc1	$f10, THREAD_FPR10(\thread)
+	swc1	$f11, (THREAD_FPR10 + 4)(\thread)
+	swc1	$f12, THREAD_FPR12(\thread)
+	swc1	$f13, (THREAD_FPR12 + 4)(\thread)
+	swc1	$f14, THREAD_FPR14(\thread)
+	swc1	$f15, (THREAD_FPR14 + 4)(\thread)
+	swc1	$f16, THREAD_FPR16(\thread)
+	swc1	$f17, (THREAD_FPR16 + 4)(\thread)
+	swc1	$f18, THREAD_FPR18(\thread)
+	swc1	$f19, (THREAD_FPR18 + 4)(\thread)
+	swc1	$f20, THREAD_FPR20(\thread)
+	swc1	$f21, (THREAD_FPR20 + 4)(\thread)
+	swc1	$f22, THREAD_FPR22(\thread)
+	swc1	$f23, (THREAD_FPR22 + 4)(\thread)
+	swc1	$f24, THREAD_FPR24(\thread)
+	swc1	$f25, (THREAD_FPR24 + 4)(\thread)
+	swc1	$f26, THREAD_FPR26(\thread)
+	swc1	$f27, (THREAD_FPR26 + 4)(\thread)
+	swc1	$f28, THREAD_FPR28(\thread)
+	swc1	$f29, (THREAD_FPR28 + 4)(\thread)
+	swc1	$f30, THREAD_FPR30(\thread)
+	swc1	$f31, (THREAD_FPR30 + 4)(\thread)
+	sw	\tmp1, THREAD_FCR31(\thread)
+	.set pop
+	.endm
+#else
 	.macro	fpu_save_double thread status tmp
 #if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
 		defined(CONFIG_CPU_MIPSR6)
@@ -151,6 +197,7 @@
 #endif
 	fpu_save_16even \thread \tmp
 	.endm
+#endif
 
 	.macro	fpu_restore_16even thread tmp=t0
 	.set	push
@@ -200,6 +247,52 @@
 	.set	pop
 	.endm
 
+#ifdef CONFIG_CPU_R5900
+	/*
+	 * Kernel expects that floating point registers are read as 64-bit
+	 * with the ldc1 instruction, but this is not working with R5900.
+	 * The 64-bit read is simulated as two 32-bit reads.
+	 */
+	.macro	fpu_restore_double thread status tmp=t0
+	.set push
+	SET_HARDFLOAT
+	lw	\tmp, THREAD_FCR31(\thread)
+	lwc1	$f0,  THREAD_FPR0(\thread)
+	lwc1	$f1,  (THREAD_FPR0 + 4)(\thread)
+	lwc1	$f2,  THREAD_FPR2(\thread)
+	lwc1	$f3,  (THREAD_FPR2 + 4)(\thread)
+	lwc1	$f4,  THREAD_FPR4(\thread)
+	lwc1	$f5,  (THREAD_FPR4 + 4)(\thread)
+	lwc1	$f6,  THREAD_FPR6(\thread)
+	lwc1	$f7,  (THREAD_FPR6 + 4)(\thread)
+	lwc1	$f8,  THREAD_FPR8(\thread)
+	lwc1	$f9,  (THREAD_FPR8 + 4)(\thread)
+	lwc1	$f10, THREAD_FPR10(\thread)
+	lwc1	$f11, (THREAD_FPR10 + 4)(\thread)
+	lwc1	$f12, THREAD_FPR12(\thread)
+	lwc1	$f13, (THREAD_FPR12 + 4)(\thread)
+	lwc1	$f14, THREAD_FPR14(\thread)
+	lwc1	$f15, (THREAD_FPR14 + 4)(\thread)
+	lwc1	$f16, THREAD_FPR16(\thread)
+	lwc1	$f17, (THREAD_FPR16 + 4)(\thread)
+	lwc1	$f18, THREAD_FPR18(\thread)
+	lwc1	$f19, (THREAD_FPR18 + 4)(\thread)
+	lwc1	$f20, THREAD_FPR20(\thread)
+	lwc1	$f21, (THREAD_FPR20 + 4)(\thread)
+	lwc1	$f22, THREAD_FPR22(\thread)
+	lwc1	$f23, (THREAD_FPR22 + 4)(\thread)
+	lwc1	$f24, THREAD_FPR24(\thread)
+	lwc1	$f25, (THREAD_FPR24 + 4)(\thread)
+	lwc1	$f26, THREAD_FPR26(\thread)
+	lwc1	$f27, (THREAD_FPR26 + 4)(\thread)
+	lwc1	$f28, THREAD_FPR28(\thread)
+	lwc1	$f29, (THREAD_FPR28 + 4)(\thread)
+	lwc1	$f30, THREAD_FPR30(\thread)
+	lwc1	$f31, (THREAD_FPR30 + 4)(\thread)
+	ctc1	\tmp, fcr31
+	.set	pop
+	.endm
+#else
 	.macro	fpu_restore_double thread status tmp
 #if defined(CONFIG_64BIT) || defined(CONFIG_CPU_MIPSR2) || \
 		defined(CONFIG_CPU_MIPSR6)
@@ -211,6 +304,7 @@
 #endif
 	fpu_restore_16even \thread \tmp
 	.endm
+#endif
 
 #if defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
 	.macro	_EXT	rd, rs, p, s
diff --git a/arch/mips/kernel/Makefile b/arch/mips/kernel/Makefile
index f10e1e15e1c6..bf192fc9957a 100644
--- a/arch/mips/kernel/Makefile
+++ b/arch/mips/kernel/Makefile
@@ -44,6 +44,7 @@ obj-y				+= $(sw-y)
 
 obj-$(CONFIG_CPU_R4K_FPU)	+= r4k_fpu.o
 obj-$(CONFIG_CPU_R3000)		+= r2300_fpu.o
+obj-$(CONFIG_CPU_R5900)		+= r5900_fpu.o
 obj-$(CONFIG_CPU_TX39XX)	+= r2300_fpu.o
 
 obj-$(CONFIG_SMP)		+= smp.o
diff --git a/arch/mips/kernel/r5900_fpu.S b/arch/mips/kernel/r5900_fpu.S
new file mode 100644
index 000000000000..d4fdc823444d
--- /dev/null
+++ b/arch/mips/kernel/r5900_fpu.S
@@ -0,0 +1,389 @@
+/*
+ * FPU handling on MIPS r5900. Copied from r4k_fpu.S.
+ *
+ * Copyright (C) 2010-2013 Jürgen Urban
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include <asm/asm.h>
+#include <asm/asmmacro.h>
+#include <asm/errno.h>
+#include <asm/fpregdef.h>
+#include <asm/mipsregs.h>
+#include <asm/asm-offsets.h>
+#include <asm/regdef.h>
+
+	.macro	EX insn, reg, src
+	.set	push
+	SET_HARDFLOAT
+	.set	nomacro
+	/* In an error exception handler the user space could be uncached. */
+	sync.l
+.ex\@:	\insn	\reg, \src
+	.set	pop
+	.section __ex_table,"a"
+	PTR	.ex\@, fault
+	.previous
+	.endm
+
+	.set	noreorder
+	.set	arch=r5900
+
+/*
+ * Save a thread's fp context.
+ */
+LEAF(_save_fp)
+	fpu_save_double a0 t0 t1		# clobbers t1
+	jr	ra
+	END(_save_fp)
+
+/*
+ * Restore a thread's fp context.
+ */
+LEAF(_restore_fp)
+	fpu_restore_double a0 t0 t1		# clobbers t1
+	jr	ra
+	END(_restore_fp)
+
+LEAF(_save_fp_context)
+	.set	push
+	SET_HARDFLOAT
+	cfc1	t1, fcr31
+	.set	pop
+
+	/* Store the 32 32-bit registers */
+	EX	swc1 $f0, SC_FPREGS+0(a0)
+	EX	swc1 $f1, SC_FPREGS+4(a0)
+	EX	swc1 $f2, SC_FPREGS+16(a0)
+	EX	swc1 $f3, SC_FPREGS+20(a0)
+	EX	swc1 $f4, SC_FPREGS+32(a0)
+	EX	swc1 $f5, SC_FPREGS+36(a0)
+	EX	swc1 $f6, SC_FPREGS+48(a0)
+	EX	swc1 $f7, SC_FPREGS+52(a0)
+	EX	swc1 $f8, SC_FPREGS+64(a0)
+	EX	swc1 $f9, SC_FPREGS+68(a0)
+	EX	swc1 $f10, SC_FPREGS+80(a0)
+	EX	swc1 $f11, SC_FPREGS+84(a0)
+	EX	swc1 $f12, SC_FPREGS+96(a0)
+	EX	swc1 $f13, SC_FPREGS+100(a0)
+	EX	swc1 $f14, SC_FPREGS+112(a0)
+	EX	swc1 $f15, SC_FPREGS+116(a0)
+	EX	swc1 $f16, SC_FPREGS+128(a0)
+	EX	swc1 $f17, SC_FPREGS+132(a0)
+	EX	swc1 $f18, SC_FPREGS+144(a0)
+	EX	swc1 $f19, SC_FPREGS+148(a0)
+	EX	swc1 $f20, SC_FPREGS+160(a0)
+	EX	swc1 $f21, SC_FPREGS+164(a0)
+	EX	swc1 $f22, SC_FPREGS+176(a0)
+	EX	swc1 $f23, SC_FPREGS+180(a0)
+	EX	swc1 $f24, SC_FPREGS+192(a0)
+	EX	swc1 $f25, SC_FPREGS+196(a0)
+	EX	swc1 $f26, SC_FPREGS+208(a0)
+	EX	swc1 $f27, SC_FPREGS+212(a0)
+	EX	swc1 $f28, SC_FPREGS+224(a0)
+	EX	swc1 $f29, SC_FPREGS+228(a0)
+	EX	swc1 $f30, SC_FPREGS+240(a0)
+	EX	swc1 $f31, SC_FPREGS+244(a0)
+	EX	sw t1, SC_FPC_CSR(a0)
+	jr	ra
+	 li	v0, 0					# success
+	END(_save_fp_context)
+
+#ifdef CONFIG_MIPS32_COMPAT
+	/* Save 32-bit process floating point context */
+LEAF(_save_fp_context32)
+	.set	push
+	SET_HARDFLOAT
+	cfc1	t1, fcr31
+	.set	pop
+
+	EX	swc1 $f0, SC32_FPREGS+0(a0)
+	EX	swc1 $f1, SC32_FPREGS+4(a0)
+	EX	swc1 $f2, SC32_FPREGS+16(a0)
+	EX	swc1 $f3, SC32_FPREGS+20(a0)
+	EX	swc1 $f4, SC32_FPREGS+32(a0)
+	EX	swc1 $f5, SC32_FPREGS+36(a0)
+	EX	swc1 $f6, SC32_FPREGS+48(a0)
+	EX	swc1 $f7, SC32_FPREGS+52(a0)
+	EX	swc1 $f8, SC32_FPREGS+64(a0)
+	EX	swc1 $f9, SC32_FPREGS+68(a0)
+	EX	swc1 $f10, SC32_FPREGS+80(a0)
+	EX	swc1 $f11, SC32_FPREGS+84(a0)
+	EX	swc1 $f12, SC32_FPREGS+96(a0)
+	EX	swc1 $f13, SC32_FPREGS+100(a0)
+	EX	swc1 $f14, SC32_FPREGS+112(a0)
+	EX	swc1 $f15, SC32_FPREGS+116(a0)
+	EX	swc1 $f16, SC32_FPREGS+128(a0)
+	EX	swc1 $f17, SC32_FPREGS+132(a0)
+	EX	swc1 $f18, SC32_FPREGS+144(a0)
+	EX	swc1 $f19, SC32_FPREGS+148(a0)
+	EX	swc1 $f20, SC32_FPREGS+160(a0)
+	EX	swc1 $f21, SC32_FPREGS+164(a0)
+	EX	swc1 $f22, SC32_FPREGS+176(a0)
+	EX	swc1 $f23, SC32_FPREGS+180(a0)
+	EX	swc1 $f24, SC32_FPREGS+192(a0)
+	EX	swc1 $f25, SC32_FPREGS+196(a0)
+	EX	swc1 $f26, SC32_FPREGS+208(a0)
+	EX	swc1 $f27, SC32_FPREGS+212(a0)
+	EX	swc1 $f28, SC32_FPREGS+224(a0)
+	EX	swc1 $f29, SC32_FPREGS+228(a0)
+	EX	swc1 $f30, SC32_FPREGS+240(a0)
+	EX	swc1 $f31, SC32_FPREGS+244(a0)
+	EX	sw t1, SC32_FPC_CSR(a0)
+	cfc1	t0, $0				# implementation/version
+	EX	sw t0, SC32_FPC_EIR(a0)
+
+	jr	ra
+	 li	v0, 0					# success
+	END(_save_fp_context32)
+#endif
+
+/*
+ * Restore FPU state:
+ *  - fp gp registers
+ *  - cp1 status/control register
+ */
+LEAF(_restore_fp_context)
+	EX	lw t0, SC_FPC_CSR(a0)
+	EX	lwc1 $f0, SC_FPREGS+0(a0)
+	EX	lwc1 $f1, SC_FPREGS+4(a0)
+	EX	lwc1 $f2, SC_FPREGS+16(a0)
+	EX	lwc1 $f3, SC_FPREGS+20(a0)
+	EX	lwc1 $f4, SC_FPREGS+32(a0)
+	EX	lwc1 $f5, SC_FPREGS+36(a0)
+	EX	lwc1 $f6, SC_FPREGS+48(a0)
+	EX	lwc1 $f7, SC_FPREGS+52(a0)
+	EX	lwc1 $f8, SC_FPREGS+64(a0)
+	EX	lwc1 $f9, SC_FPREGS+68(a0)
+	EX	lwc1 $f10, SC_FPREGS+80(a0)
+	EX	lwc1 $f11, SC_FPREGS+84(a0)
+	EX	lwc1 $f12, SC_FPREGS+96(a0)
+	EX	lwc1 $f13, SC_FPREGS+100(a0)
+	EX	lwc1 $f14, SC_FPREGS+112(a0)
+	EX	lwc1 $f15, SC_FPREGS+116(a0)
+	EX	lwc1 $f16, SC_FPREGS+128(a0)
+	EX	lwc1 $f17, SC_FPREGS+132(a0)
+	EX	lwc1 $f18, SC_FPREGS+144(a0)
+	EX	lwc1 $f19, SC_FPREGS+148(a0)
+	EX	lwc1 $f20, SC_FPREGS+160(a0)
+	EX	lwc1 $f21, SC_FPREGS+164(a0)
+	EX	lwc1 $f22, SC_FPREGS+176(a0)
+	EX	lwc1 $f23, SC_FPREGS+180(a0)
+	EX	lwc1 $f24, SC_FPREGS+192(a0)
+	EX	lwc1 $f25, SC_FPREGS+196(a0)
+	EX	lwc1 $f26, SC_FPREGS+208(a0)
+	EX	lwc1 $f27, SC_FPREGS+212(a0)
+	EX	lwc1 $f28, SC_FPREGS+224(a0)
+	EX	lwc1 $f29, SC_FPREGS+228(a0)
+	EX	lwc1 $f30, SC_FPREGS+240(a0)
+	EX	lwc1 $f31, SC_FPREGS+244(a0)
+	.set	push
+	SET_HARDFLOAT
+	ctc1	t0, fcr31
+	.set	pop
+	jr	ra
+	 li	v0, 0					# success
+	END(_restore_fp_context)
+
+#ifdef CONFIG_MIPS32_COMPAT
+LEAF(_restore_fp_context32)
+	/* Restore an o32 sigcontext.  */
+	EX	lw t0, SC32_FPC_CSR(a0)
+	EX	lwc1 $f0, SC32_FPREGS+0(a0)
+	EX	lwc1 $f1, SC32_FPREGS+4(a0)
+	EX	lwc1 $f2, SC32_FPREGS+16(a0)
+	EX	lwc1 $f3, SC32_FPREGS+20(a0)
+	EX	lwc1 $f4, SC32_FPREGS+32(a0)
+	EX	lwc1 $f5, SC32_FPREGS+36(a0)
+	EX	lwc1 $f6, SC32_FPREGS+48(a0)
+	EX	lwc1 $f7, SC32_FPREGS+52(a0)
+	EX	lwc1 $f8, SC32_FPREGS+64(a0)
+	EX	lwc1 $f9, SC32_FPREGS+68(a0)
+	EX	lwc1 $f10, SC32_FPREGS+80(a0)
+	EX	lwc1 $f11, SC32_FPREGS+84(a0)
+	EX	lwc1 $f12, SC32_FPREGS+96(a0)
+	EX	lwc1 $f13, SC32_FPREGS+100(a0)
+	EX	lwc1 $f14, SC32_FPREGS+112(a0)
+	EX	lwc1 $f15, SC32_FPREGS+116(a0)
+	EX	lwc1 $f16, SC32_FPREGS+128(a0)
+	EX	lwc1 $f17, SC32_FPREGS+132(a0)
+	EX	lwc1 $f18, SC32_FPREGS+144(a0)
+	EX	lwc1 $f19, SC32_FPREGS+148(a0)
+	EX	lwc1 $f20, SC32_FPREGS+160(a0)
+	EX	lwc1 $f21, SC32_FPREGS+164(a0)
+	EX	lwc1 $f22, SC32_FPREGS+176(a0)
+	EX	lwc1 $f23, SC32_FPREGS+180(a0)
+	EX	lwc1 $f24, SC32_FPREGS+192(a0)
+	EX	lwc1 $f25, SC32_FPREGS+196(a0)
+	EX	lwc1 $f26, SC32_FPREGS+208(a0)
+	EX	lwc1 $f27, SC32_FPREGS+212(a0)
+	EX	lwc1 $f28, SC32_FPREGS+224(a0)
+	EX	lwc1 $f29, SC32_FPREGS+228(a0)
+	EX	lwc1 $f30, SC32_FPREGS+240(a0)
+	EX	lwc1 $f31, SC32_FPREGS+244(a0)
+	.set	push
+	SET_HARDFLOAT
+	ctc1	t0, fcr31
+	.set	pop
+	jr	ra
+	 li	v0, 0					# success
+	END(_restore_fp_context32)
+#endif
+
+/*
+ * Load the FPU with signalling NANS.  This bit pattern we're using has
+ * the property that no matter whether considered as single or as double
+ * precision represents signaling NANS.
+ *
+ * The value to initialize fcr31 to comes in $a0.
+ */
+
+	.set push
+	SET_HARDFLOAT
+
+LEAF(_init_fpu)
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
+	mfc0	t0, CP0_STATUS
+	li	t1, ST0_CU1
+	or	t0, t1
+	mtc0	t0, CP0_STATUS
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
+	enable_fpu_hazard
+
+	ctc1	a0, fcr31
+
+	li	t1, -1				# SNaN
+
+#ifdef CONFIG_64BIT
+	sll	t0, t0, 5
+	bgez	t0, 1f				# 16 / 32 register mode?
+
+	dmtc1	t1, $f1
+	dmtc1	t1, $f3
+	dmtc1	t1, $f5
+	dmtc1	t1, $f7
+	dmtc1	t1, $f9
+	dmtc1	t1, $f11
+	dmtc1	t1, $f13
+	dmtc1	t1, $f15
+	dmtc1	t1, $f17
+	dmtc1	t1, $f19
+	dmtc1	t1, $f21
+	dmtc1	t1, $f23
+	dmtc1	t1, $f25
+	dmtc1	t1, $f27
+	dmtc1	t1, $f29
+	dmtc1	t1, $f31
+1:
+#endif
+
+#if defined(CONFIG_CPU_MIPS32) || defined(CONFIG_CPU_R5900)
+	mtc1	t1, $f0
+	mtc1	t1, $f1
+	mtc1	t1, $f2
+	mtc1	t1, $f3
+	mtc1	t1, $f4
+	mtc1	t1, $f5
+	mtc1	t1, $f6
+	mtc1	t1, $f7
+	mtc1	t1, $f8
+	mtc1	t1, $f9
+	mtc1	t1, $f10
+	mtc1	t1, $f11
+	mtc1	t1, $f12
+	mtc1	t1, $f13
+	mtc1	t1, $f14
+	mtc1	t1, $f15
+	mtc1	t1, $f16
+	mtc1	t1, $f17
+	mtc1	t1, $f18
+	mtc1	t1, $f19
+	mtc1	t1, $f20
+	mtc1	t1, $f21
+	mtc1	t1, $f22
+	mtc1	t1, $f23
+	mtc1	t1, $f24
+	mtc1	t1, $f25
+	mtc1	t1, $f26
+	mtc1	t1, $f27
+	mtc1	t1, $f28
+	mtc1	t1, $f29
+	mtc1	t1, $f30
+	mtc1	t1, $f31
+
+#if defined(CONFIG_CPU_MIPS32_R2) || defined(CONFIG_CPU_MIPS32_R6)
+	.set    push
+	.set    MIPS_ISA_LEVEL_RAW
+	.set	fp=64
+	sll     t0, t0, 5			# is Status.FR set?
+	bgez    t0, 1f				# no: skip setting upper 32b
+
+	mthc1   t1, $f0
+	mthc1   t1, $f1
+	mthc1   t1, $f2
+	mthc1   t1, $f3
+	mthc1   t1, $f4
+	mthc1   t1, $f5
+	mthc1   t1, $f6
+	mthc1   t1, $f7
+	mthc1   t1, $f8
+	mthc1   t1, $f9
+	mthc1   t1, $f10
+	mthc1   t1, $f11
+	mthc1   t1, $f12
+	mthc1   t1, $f13
+	mthc1   t1, $f14
+	mthc1   t1, $f15
+	mthc1   t1, $f16
+	mthc1   t1, $f17
+	mthc1   t1, $f18
+	mthc1   t1, $f19
+	mthc1   t1, $f20
+	mthc1   t1, $f21
+	mthc1   t1, $f22
+	mthc1   t1, $f23
+	mthc1   t1, $f24
+	mthc1   t1, $f25
+	mthc1   t1, $f26
+	mthc1   t1, $f27
+	mthc1   t1, $f28
+	mthc1   t1, $f29
+	mthc1   t1, $f30
+	mthc1   t1, $f31
+1:	.set    pop
+#endif /* CONFIG_CPU_MIPS32_R2 || CONFIG_CPU_MIPS32_R6 */
+#else
+	.set	MIPS_ISA_ARCH_LEVEL_RAW
+	dmtc1	t1, $f0
+	dmtc1	t1, $f2
+	dmtc1	t1, $f4
+	dmtc1	t1, $f6
+	dmtc1	t1, $f8
+	dmtc1	t1, $f10
+	dmtc1	t1, $f12
+	dmtc1	t1, $f14
+	dmtc1	t1, $f16
+	dmtc1	t1, $f18
+	dmtc1	t1, $f20
+	dmtc1	t1, $f22
+	dmtc1	t1, $f24
+	dmtc1	t1, $f26
+	dmtc1	t1, $f28
+	dmtc1	t1, $f30
+#endif
+	jr	ra
+	END(_init_fpu)
+
+	.set pop	/* SET_HARDFLOAT */
+	.set	reorder
+
+	.type	fault@function
+	.ent	fault
+fault:	li	v0, -EFAULT				# failure
+	jr	ra
+	.end	fault
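
The SC_FPREGS and SC32_FPREGS offsets in the patch above follow a regular
pattern: each even/odd register pair starts 16 bytes after the previous
one, and the odd register sits 4 bytes into the pair.  A sketch that
regenerates the table, with fpreg_offset being a made-up helper name:

```shell
#!/bin/sh
# Byte offset, relative to SC_FPREGS, at which the patch stores $f<n>.
fpreg_offset() {
    echo $(( ($1 / 2) * 16 + ($1 % 2) * 4 ))
}

# Print the offset of every register, $f0 through $f31.
n=0
while [ "$n" -lt 32 ]; do
    echo "\$f$n -> SC_FPREGS+$(fpreg_offset "$n")"
    n=$(( $n + 1 ))
done
```

This only reproduces the numbers in the listing; it makes no claim about
why the layout was chosen.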

* Re: [RFC v2] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-17 13:38                                                     ` Fredrik Noring
@ 2018-02-17 15:03                                                       ` Maciej W. Rozycki
  2018-02-17 20:04                                                         ` Fredrik Noring
  0 siblings, 1 reply; 117+ messages in thread
From: Maciej W. Rozycki @ 2018-02-17 15:03 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Jürgen Urban, linux-mips

Hi Fredrik,

> Both a (complete) remote copy of kcore, and one shared via v9fs, yield
> "Cannot access memory at address 0x80000000" with a cross-GDB, unfortunately:
> 
> > > One can get a little further by sharing /proc using v9fs to obtain:
> > > 
> > > 	# mipsel-linux-gdb -q -c /mnt/kcore
> > > 	[New process 1]
> > > 	Core was generated by `ramdisk_size=16384 crtmode=pal1 video=ps2fb:pal,640x480-32 rd_start=0x8063c000'.
> > > 	#0  0x00000000 in ?? ()
> > > 	(gdb) set architecture mips:5900
> > > 	The target architecture is assumed to be mips:5900
> > > 	(gdb) x /32i 0x80000000
> > > 	   0x80000000:	Cannot access memory at address 0x80000000
> 
> By examining the read operations for /proc/kcore, it seems GDB reaches this
> "cannot access" conclusion from the ELF headers.

 Hmm, whether it works or not seems to depend on GDB version.  It looks to 
me like we have a regression here.  Working GDB has:

(gdb) info files
Local core dump file:
        `/proc/kcore', file type elf32-tradlittlemips.
        0xffffffffc0000000 - 0xfffffffffff94000 is load1
        0xffffffff80000000 - 0xffffffff90000000 is load2
(gdb)

Broken GDB has:

(gdb) info files
Local core dump file:
        `/proc/kcore', file type elf32-tradlittlemips-freebsd.
        0xffffffffc0000000 - 0xfffffffffff94000 is load1
        0xffffffff80000000 - 0xffffffff90000000 is load2
(gdb)

Notice the different BFD target, `elf32-tradlittlemips-freebsd'.  You're 
supposed to be able to override it with `set gnutarget', but that doesn't 
seem to impress GDB, e.g.:

(gdb) show gnutarget
The current BFD target is "auto".
(gdb) set gnutarget elf32-tradlittlemips
(gdb) show gnutarget
The current BFD target is "elf32-tradlittlemips".
(gdb) info files
Local core dump file:
        `/home/mjr/src/kcore', file type elf32-tradlittlemips-freebsd.
        0xffffffffc0000000 - 0xfffffffffff94000 is load1
        0xffffffff80000000 - 0xffffffff90000000 is load2
(gdb)

I'll see if I can track down what is going on here.

> >  You need to use bus (physical) rather than virtual addresses with 
> > /dev/mem, so:
> > 
> > # xxd -s 0 -l 256 /dev/mem
> > 
> > or suchlike.
> 
> Ah, the value of the physical address was a misunderstanding on my part. The
> convoluted combination of mipsel-linux-objcopy and mipsel-linux-objdump gets
> the disassembly done without GDB, as shown below. :D
> 
> It looks very similar to yours, with additional NOPs and SYNCs required for
> the R5900:
> 
> 	# ssh ps2 head -c 128 /dev/mem >kcore &&
> 	    mipsel-linux-objcopy -I binary -O elf32-little kcore kcore.elf &&
> 	    mipsel-linux-objdump -D -m mips:5900 kcore.elf
> 	kcore.elf:     file format elf32-little
> 	Disassembly of section .data:
> 	00000000 <_binary_kcore_start>:
> 		...
> 	   8:	3c1b8061 	lui	k1,0x8061
> 	   c:	0000040f 	sync.p
> 	  10:	401a4000 	mfc0	k0,c0_badvaddr
> 	  14:	8f7b2c60 	lw	k1,11360(k1)
> 	  18:	001ad582 	srl	k0,k0,0x16
> 	  1c:	001ad080 	sll	k0,k0,0x2
> 	  20:	037ad821 	addu	k1,k1,k0
> 	  24:	0000040f 	sync.p
> 	  28:	401a2000 	mfc0	k0,c0_context
> 	  2c:	8f7b0000 	lw	k1,0(k1)
> 	  30:	001ad042 	srl	k0,k0,0x1
> 	  34:	335a0ff8 	andi	k0,k0,0xff8
> 	  38:	037ad821 	addu	k1,k1,k0
> 	  3c:	8f7a0000 	lw	k0,0(k1)
> 	  40:	8f7b0004 	lw	k1,4(k1)
> 	  44:	001ad142 	srl	k0,k0,0x5
> 	  48:	409a1000 	mtc0	k0,c0_entrylo0
> 	  4c:	0000040f 	sync.p
> 	  50:	001bd942 	srl	k1,k1,0x5
> 	  54:	409b1800 	mtc0	k1,c0_entrylo1
> 	  58:	0000040f 	sync.p
> 	  5c:	42000006 	tlbwr
> 	  60:	0000040f 	sync.p
> 	  64:	42000018 	eret
> 		...

 Good.  You probably want to add `--adjust-vma=0x80000000' to `objdump', 
so that addresses are right.  You can use `-b binary' with `objdump' too, 
to avoid the extra `objcopy' step.
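
Put together, the two suggestions might look like the sketch below.  The
dump file name is hypothetical and the command is only printed, so that
it can be copied to a machine with the cross toolchain installed.

```shell
#!/bin/sh
# handler.bin is a hypothetical raw dump, e.g. the first bytes of /dev/mem.
DUMP=handler.bin
# kseg0 address that offset 0 of the dump corresponds to.
VMA=0x80000000

# -b binary reads the raw file directly, so no objcopy step is needed;
# --adjust-vma shifts the displayed addresses to the kseg0 range.
OBJDUMP_CMD="mipsel-linux-objdump -D -b binary -m mips:5900 --adjust-vma=$VMA $DUMP"

# Only printed here; run it where mipsel-linux-objdump is available.
echo "$OBJDUMP_CMD"
```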

  Maciej

* Re: [RFC] MIPS: R5900: Workaround for saving and restoring FPU registers
  2018-02-17 14:43                                         ` [RFC] MIPS: R5900: Workaround for saving and restoring FPU registers Fredrik Noring
@ 2018-02-17 15:18                                           ` Maciej W. Rozycki
  2018-02-17 17:47                                             ` Fredrik Noring
  0 siblings, 1 reply; 117+ messages in thread
From: Maciej W. Rozycki @ 2018-02-17 15:18 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Jürgen Urban, linux-mips

Hi Fredrik,

> Would you be able to elaborate on the following change with a workaround for
> saving and restoring R5900 FPU registers? Is this problem documented in your
> copy of Sony's Linux Toolkit Restriction manual?
>     
>     Fixed saving and restoring of FPU registers. Odd FPU registers were
>     lost on exceptions and when simulating 64 bit FPU. Debian 5.0 mipsel
>     uses MOV.D to move FPU registers. This is not supported by R5900 and
>     failed in the simulation because of the bug above.

 I thought we agreed the R5900 FPU is unusable for regular Linux software 
and decided to go for full FPU emulation unconditionally.

 We could add a special R5900 mode, denoted with a dedicated 
Tag_GNU_MIPS_ABI_FP attribute and MIPS ABI Flags FP ABI setting, which 
would then enable hardware FPU for the selected task, but I suggest we 
defer any actual code proposals until we have all the design details 
settled.

> diff --git a/arch/mips/include/asm/asmmacro.h b/arch/mips/include/asm/asmmacro.h
> index 8d1e30b94c2d..a67ef7964bc1 100644
> --- a/arch/mips/include/asm/asmmacro.h
> +++ b/arch/mips/include/asm/asmmacro.h
> @@ -141,6 +141,52 @@
>  	.set	pop
>  	.endm
>  
> +#ifdef CONFIG_CPU_R5900
> +	/*
> +	 * Kernel expects that floating point registers are saved as 64-bit
> +	 * with the sdc1 instruction, but this is not working with R5900.
> +	 * The 64-bit write is simulated as two 32-bit writes.
> +	 */
> +	.macro fpu_save_double thread status tmp1=t0
> +	.set push
> +	SET_HARDFLOAT
> +	cfc1	\tmp1,  fcr31
> +	swc1	$f0,  THREAD_FPR0(\thread)
> +	swc1	$f1,  (THREAD_FPR0 + 4)(\thread)
> +	swc1	$f2,  THREAD_FPR2(\thread)
> +	swc1	$f3,  (THREAD_FPR2 + 4)(\thread)

 Etc. -- can you reuse MIPS I code here, i.e. use S.D?  GAS should be 
doing the right thing with `-march=r5900' (if not, then it has a bug).

> @@ -200,6 +247,52 @@
>  	.set	pop
>  	.endm
>  
> +#ifdef CONFIG_CPU_R5900
> +	/*
> +	 * Kernel expects that floating point registers are read as 64-bit
> +	 * with the ldc1 instruction, but this is not working with R5900.
> +	 * The 64-bit read is simulated as two 32-bit reads.
> +	 */
> +	.macro	fpu_restore_double thread status tmp=t0
> +	.set push
> +	SET_HARDFLOAT
> +	lw	\tmp, THREAD_FCR31(\thread)
> +	lwc1	$f0,  THREAD_FPR0(\thread)
> +	lwc1	$f1,  (THREAD_FPR0 + 4)(\thread)
> +	lwc1	$f2,  THREAD_FPR2(\thread)
> +	lwc1	$f3,  (THREAD_FPR2 + 4)(\thread)

 Likewise L.D.

> diff --git a/arch/mips/kernel/r5900_fpu.S b/arch/mips/kernel/r5900_fpu.S
> new file mode 100644
> index 000000000000..d4fdc823444d
> --- /dev/null
> +++ b/arch/mips/kernel/r5900_fpu.S
> @@ -0,0 +1,389 @@
> +/*
> + * FPU handling on MIPS r5900. Copied from r4k_fpu.S.
> + *
> + * Copyright (C) 2010-2013 Jürgen Urban
> + *
> + * SPDX-License-Identifier: GPL-2.0
> + */
> +
> +#include <asm/asm.h>
> +#include <asm/asmmacro.h>
> +#include <asm/errno.h>
> +#include <asm/fpregdef.h>
> +#include <asm/mipsregs.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/regdef.h>
> +
> +	.macro	EX insn, reg, src
> +	.set	push
> +	SET_HARDFLOAT
> +	.set	nomacro
> +	/* In an error exception handler the user space could be uncached. */
> +	sync.l
> +.ex\@:	\insn	\reg, \src
> +	.set	pop
> +	.section __ex_table,"a"
> +	PTR	.ex\@, fault
> +	.previous
> +	.endm
> +
> +	.set	noreorder
> +	.set	arch=r5900
> +
> +/*
> + * Save a thread's fp context.
> + */
> +LEAF(_save_fp)
> +	fpu_save_double a0 t0 t1		# clobbers t1
> +	jr	ra
> +	END(_save_fp)
> +
> +/*
> + * Restore a thread's fp context.
> + */
> +LEAF(_restore_fp)
> +	fpu_restore_double a0 t0 t1		# clobbers t1
> +	jr	ra
> +	END(_restore_fp)
> +
> +LEAF(_save_fp_context)
> +	.set	push
> +	SET_HARDFLOAT
> +	cfc1	t1, fcr31
> +	.set	pop
> +
> +	/* Store the 32 32-bit registers */
> +	EX	swc1 $f0, SC_FPREGS+0(a0)
> +	EX	swc1 $f1, SC_FPREGS+4(a0)
> +	EX	swc1 $f2, SC_FPREGS+16(a0)
> +	EX	swc1 $f3, SC_FPREGS+20(a0)

 Likewise.

> +/*
> + * Restore FPU state:
> + *  - fp gp registers
> + *  - cp1 status/control register
> + */
> +LEAF(_restore_fp_context)
> +	EX	lw t0, SC_FPC_CSR(a0)
> +	EX	lwc1 $f0, SC_FPREGS+0(a0)
> +	EX	lwc1 $f1, SC_FPREGS+4(a0)
> +	EX	lwc1 $f2, SC_FPREGS+16(a0)
> +	EX	lwc1 $f3, SC_FPREGS+20(a0)

 Likewise.

> +#if defined(CONFIG_CPU_MIPS32_R2) || defined(CONFIG_CPU_MIPS32_R6)
> +	.set    push
> +	.set    MIPS_ISA_LEVEL_RAW
> +	.set	fp=64
> +	sll     t0, t0, 5			# is Status.FR set?
> +	bgez    t0, 1f				# no: skip setting upper 32b
> +
> +	mthc1   t1, $f0
> +	mthc1   t1, $f1
> +	mthc1   t1, $f2
> +	mthc1   t1, $f3

 You surely do not want all this MIPS32r2 stuff, or do you?

  Maciej

* Re: [RFC] MIPS: R5900: Workaround for saving and restoring FPU registers
  2018-02-17 15:18                                           ` Maciej W. Rozycki
@ 2018-02-17 17:47                                             ` Fredrik Noring
  2018-02-17 19:33                                               ` Maciej W. Rozycki
  0 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-02-17 17:47 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jürgen Urban, linux-mips

Hi Maciej,

>  I thought we agreed the R5900 FPU is unusable for regular Linux software 
> and decided to go for full FPU emulation unconditionally.

Yes, that's true, we are in agreement. I was unaware that the FPU emulation
was complete enough to cover all registers (not only a set of instructions).
Sorry about that. I will simply remove this patch then.

>  We could add a special R5900 mode, denoted with a dedicated 
> Tag_GNU_MIPS_ABI_FP attribute and MIPS ABI Flags FP ABI setting, which 
> would then enable hardware FPU for the selected task, but I suggest we 
> defer any actual code proposals until we have all the design details 
> settled.

Agreed. :)

>  Etc. -- can you reuse MIPS I code here, i.e. use S.D?  GAS should be 
> doing the right thing with `-march=r5900' (if not, then it has a bug).

Possibly, I am somewhat unfamiliar with this area. So let's revisit this FPU
issue after the initial submission.

>  You surely do not want all this MIPS32r2 stuff, or do you?

No, not that I know of.

Fredrik

* Re: [RFC] MIPS: R5900: Workaround for saving and restoring FPU registers
  2018-02-17 17:47                                             ` Fredrik Noring
@ 2018-02-17 19:33                                               ` Maciej W. Rozycki
  0 siblings, 0 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2018-02-17 19:33 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Jürgen Urban, linux-mips

Hi Fredrik,

> >  I thought we agreed the R5900 FPU is unusable for regular Linux software 
> > and decided to go for full FPU emulation unconditionally.
> 
> Yes, that's true, we are in agreement. I was unaware that the FPU emulation
> was complete enough to cover all registers (not only a set of instructions).
> Sorry about that. I will simply remove this patch then.

 The MIPS/Linux user ABI specifies a full architectural FPU, so not only 
do we have to handle cases missing from hardware that cause an 
Unimplemented Operation exception, such as, commonly, operations on 
denormals, but (having chosen, many years ago, not to have the emulator in 
userland) we also have to emulate the whole FPU for processors that do not 
have the unit at all.  We are accurate enough to even throw SIGILL for FP 
instructions that we could otherwise handle, but which are supposed to be 
missing at the ISA level implemented by the CPU we are currently running 
on.

 We emulate a double unit, so operations on both double and single 
floating-point data types as well as corresponding fixed-point data types 
are supported.  We do not emulate extra stuff though, such as operations 
on the paired single data type or MIPS-3D instructions.  You have to 
access real FPU hardware to use them (and then you're in trouble if they 
cause an Unimplemented Operation exception).  It would be nice if someone 
contributed the missing bits.

> >  Etc. -- can you reuse MIPS I code here, i.e. use S.D?  GAS should be 
> > doing the right thing with `-march=r5900' (if not, then it has a bug).
> 
> Possibly, I am somewhat unfamiliar with this area. So let's revisit this FPU
> issue after the initial submission.

 Have a look at arch/mips/kernel/r2300_fpu.S; L.D and S.D are GAS macros 
which, depending on the architecture selected, expand to either LDC1 and 
SDC1 instructions or LWC1 and SWC1 instruction pairs, correctly ordered 
according to the endianness selected.  You can probably use that file as 
it stands for the R5900 FPU, when you get to it.
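
 To illustrate the expansion, here is a small C model of how one doubleword 
access splits into two word accesses.  It is a sketch only: the structure 
and function names are made up for illustration, and it assumes the usual 
MIPS I convention that the even register of a pair holds the low-order word 
of the double.

```c
#include <stdbool.h>

/* One LWC1/SWC1 access: which FPU register, at which byte offset. */
struct fp_word_access {
	int freg;
	int offset;
};

/*
 * Hypothetical model of an L.D/S.D expansion for the even/odd register
 * pair starting at `freg`.  out[0] describes the even register (low-order
 * word), out[1] the odd register (high-order word).
 */
static void split_double_access(int freg, int offset, bool big_endian,
				struct fp_word_access out[2])
{
	/* Big endian stores the high-order word at the lower address. */
	int lo = big_endian ? offset + 4 : offset;
	int hi = big_endian ? offset : offset + 4;

	out[0].freg = freg;
	out[0].offset = lo;
	out[1].freg = freg + 1;
	out[1].offset = hi;
}
```

 For a little-endian target this gives $f0 at offset 0 and $f1 at offset 4, 
which matches the open-coded SWC1/LWC1 pairs in the patch under discussion.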

  Maciej

* Re: [RFC v2] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-17 15:03                                                       ` Maciej W. Rozycki
@ 2018-02-17 20:04                                                         ` Fredrik Noring
  2018-02-20 14:09                                                           ` Maciej W. Rozycki
  0 siblings, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-02-17 20:04 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jürgen Urban, linux-mips

Hi Maciej,

>  Hmm, whether it works or not seems to depend on GDB version.  It looks to 
> me like we have a regression here.  Working GDB has:
> 
> (gdb) info files
> Local core dump file:
>         `/proc/kcore', file type elf32-tradlittlemips.
>         0xffffffffc0000000 - 0xfffffffffff94000 is load1
>         0xffffffff80000000 - 0xffffffff90000000 is load2
> (gdb)
> 
> Broken GDB has:
> 
> (gdb) info files
> Local core dump file:
>         `/proc/kcore', file type elf32-tradlittlemips-freebsd.
>         0xffffffffc0000000 - 0xfffffffffff94000 is load1
>         0xffffffff80000000 - 0xffffffff90000000 is load2
> (gdb)
> 
> Notice the different BFD target, `elf32-tradlittlemips-freebsd'.  You're 
> supposed to be able to override it with `set gnutarget', but that doesn't 
> seem to impress GDB, e.g.:
> 
> (gdb) show gnutarget
> The current BFD target is "auto".
> (gdb) set gnutarget elf32-tradlittlemips
> (gdb) show gnutarget
> The current BFD target is "elf32-tradlittlemips".
> (gdb) info files
> Local core dump file:
>         `/home/mjr/src/kcore', file type elf32-tradlittlemips-freebsd.
>         0xffffffffc0000000 - 0xfffffffffff94000 is load1
>         0xffffffff80000000 - 0xffffffff90000000 is load2
> (gdb)
> 
> I'll see if I can track down what is going on here.

Thank you for taking a closer look at GDB! However, I don't observe the
"freebsd" BFD target with a cross-GDB version 8.1 (via v9fs in this case):

	# mipsel-linux-gdb --version | head -n1
	GNU gdb (GDB) 8.1
	# mipsel-linux-gdb -q -c /mnt/kcore
	[New process 1]
	Core was generated by `ramdisk_size=16384 crtmode=pal1 video=ps2fb:pal,640x480-32 rd_start=0x8062c000'.
	#0  0x00000000 in ?? ()
	(gdb) show gnutarget
	The current BFD target is "auto".
	(gdb) info files
	Local core dump file:
		`/mnt/kcore', file type elf32-tradlittlemips.
		0xffffffffc0000000 - 0xfffffffffffcd000 is load1
		0xffffffff80000000 - 0xffffffff80001000 is load2
		0xffffffff80010000 - 0xffffffff82000000 is load3
	(gdb) x /32i 0xffffffff80000000
	   0x80000000:	Cannot access memory at address 0x80000000
	(gdb) x /32i 0x80000000
	   0x80000000:	Cannot access memory at address 0x80000000
	(gdb)

>  Good.  You probably want to add `--adjust-vma=0x80000000' to `objdump', 
> so that addresses are right.  You can use `-b binary' with `objdump' too, 
> to avoid the extra `objcopy' step.

Perfect, thanks!

By the way, what about presenting misaligned SQ instructions like

	# mipsel-linux-gdb -q busybox
	Reading symbols from busybox...(no debugging symbols found)...done.
	(gdb) set architecture mips:5900
	The target architecture is assumed to be mips:5900
	(gdb) x /i 0x4036b0
	   0x4036b0:	sq	v1,-6085(zero)

as RDHWR, which is the interpretation with Linux?
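
The overlap can be seen by decoding the word field by field. The sketch
below uses hypothetical helper names; the RDHWR encoding 0x7c03e83b is the
one quoted in the genex.S comment for `rdhwr v1,$29', and the fields are
the usual I-type opcode/rs/rt/immediate layout:

```c
#include <stdint.h>

#define RDHWR_V1_29	0x7c03e83bu	/* rdhwr v1,$29 */

/* Hypothetical field extractors for the I-type instruction layout. */
static unsigned int insn_opcode(uint32_t insn) { return insn >> 26; }
static unsigned int insn_rs(uint32_t insn) { return (insn >> 21) & 0x1f; }
static unsigned int insn_rt(uint32_t insn) { return (insn >> 16) & 0x1f; }

/* Sign-extend the low 16 bits without relying on narrowing casts. */
static int32_t insn_simm16(uint32_t insn)
{
	int32_t v = (int32_t)(insn & 0xffff);

	return (v ^ 0x8000) - 0x8000;
}
```

Read this way the word has opcode 0x1f, base register 0 (zero), target
register 3 (v1) and offset -6085, which is exactly the SQ that GDB prints.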

Fredrik

* Re: [RFC v2] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-15 20:49                                               ` Maciej W. Rozycki
  2018-02-17 11:16                                                 ` Fredrik Noring
@ 2018-02-18  8:47                                                 ` Fredrik Noring
  2018-02-20 14:41                                                   ` Maciej W. Rozycki
  1 sibling, 1 reply; 117+ messages in thread
From: Fredrik Noring @ 2018-02-18  8:47 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jürgen Urban, linux-mips

Hi Maciej,

>  I didn't comment on the erratum workaround addressing speculative 
> execution beyond ERET, because I haven't reached final conclusions as to 
> what the code will have to look like exactly.

OK.

>  However, please note that in reality 5 NOPs are not required in these 
> generated handlers (except perhaps for the interrupt handler, which will 
> have to be double-checked, due to being set up differently), because the 
> only reason for inserting them is to prevent data interpreted as 
> ill-formed code from being speculatively executed.  But any handler that 
> follows does not contain ill-formed code, and the `tlb_handler' buffer is 
> cleared before any generated machine code is built within, so any trailing 
> padding uses the encoding of NOP.  This means you can exclude these 5 
> NOPs from the calculation.

Sure, makes sense.

> Substitute `mips:5900' for `mips:isa32r2' to get R5900 disassembly.  If 
> you want to see raw machine code too, use `disassemble -r', but watch out 
> for the syntax, which is different.  As you can see the trailing NOPs 
> required are already there. :)

Due to trailing zeroes, I suppose. :)
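
A tiny C sketch of that observation, with a made-up helper name and buffer
size: a word of all zeroes is the NOP encoding (sll zero,zero,0), so a
cleared buffer is padded with NOPs after whatever code is written into it.

```c
#include <stddef.h>
#include <stdint.h>

#define MIPS_NOP	0x00000000u	/* sll zero,zero,0 */

/*
 * Hypothetical helper: count the NOP words that directly follow the first
 * `used` words of a handler buffer holding `total` words.
 */
static size_t trailing_nops(const uint32_t *buf, size_t used, size_t total)
{
	size_t n = 0;

	while (used + n < total && buf[used + n] == MIPS_NOP)
		n++;
	return n;
}
```

With a zeroed 16-word buffer that ends in TLBWR (0x42000006) and ERET
(0x42000018) as its first two words, the helper reports 14 NOPs after ERET.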

> You can supply `vmlinux' as the executable to debug too for symbolic
> access.
> 
>  You can also ask the kernel to dump generated handlers to the kernel log 
> (and the console, if `debug' has been specified as a kernel parameter) at 
> bootstrap by building tlbex.c and/or page.c with -DDEBUG, e.g.:
> 
> $ make CFLAGS_tlbex.o=-DDEBUG vmlinux
> 
> It can help if a bug in a generated handler prevents the kernel from 
> starting userland.

Thank you for these tips. Eventually I'd like to make use of kernel tracing
features, BPF (MIPS JIT seems to require a 64 bit kernel though), dynamic
debug, etc.

>  A handler for SIO is needed if SIOInt can be asserted without kernel 
> control by PS/2 hardware.  Otherwise handlers will only be needed once the 
> kernel has means to enable the respective exceptions.

Serial I/O requires soldering for the PS2. Jürgen Urban, Rick Gaiser, and
others have it and they can more easily debug the early boot stages. The
proposed PS2 serial driver uses a 20 ms timer and polling instead of SIOInt:

https://github.com/frno7/linux/blob/ps2-v4.15-n7/drivers/tty/serial/ps2-uart.c

I don't have a serial port. My setup consists of ssh over a wireless RT3070*
USB device. Obviously a great number of things could potentially fail in
that chain but it is surprisingly reliable. :)

* A few hardcoded DMA buffer sizes in the RT3070 driver have to be made
  smaller since PS2 IOP DMA memory is limited to 256 KiB. It would be nice
  if USB drivers could adjust themselves to the amount of available memory,
  or make it configurable.

Fredrik

* [RFC] MIPS: R5900: Workaround where MSB must be 0 for the instruction cache
  2018-01-31 23:01                                       ` Maciej W. Rozycki
                                                           ` (6 preceding siblings ...)
  2018-02-17 14:43                                         ` [RFC] MIPS: R5900: Workaround for saving and restoring FPU registers Fredrik Noring
@ 2018-02-18  9:26                                         ` Fredrik Noring
  2018-02-18 11:08                                         ` [RFC] MIPS: R5900: Add mandatory SYNC.P to all M[FT]C0 instructions Fredrik Noring
  2018-03-03 12:26                                         ` [RFC] MIPS: PS2: Interrupt request (IRQ) support Fredrik Noring
  9 siblings, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2018-02-18  9:26 UTC (permalink / raw)
  To: Jürgen Urban; +Cc: Maciej W. Rozycki, linux-mips

Hi Jürgen,

Would you know if this R5900 bug is documented in Sony's Linux Toolkit
Restriction manual or in the TX79 manual?
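
For context, the effect of the INDEX_BASE change in the patch below can be
sketched in C. The helper names are hypothetical; CKSEG0 is taken as
0x80000000, its value for a 32-bit kernel:

```c
#include <stdint.h>

#define CKSEG0	0x80000000u	/* 32-bit kernel value */

/* Hypothetical helper: address handed to an index-type cache operation. */
static uint32_t index_op_addr(uint32_t index_base, uint32_t offset)
{
	return index_base + offset;
}

/* Most significant bit of a 32-bit address. */
static int addr_msb(uint32_t addr)
{
	return (int)(addr >> 31);
}
```

With INDEX_BASE set to CKSEG0 every index address has the MSB set, whereas
with INDEX_BASE set to 0 the MSB stays clear for any offset below 2 GiB.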

Fredrik

diff --git a/arch/mips/include/asm/r4kcache.h b/arch/mips/include/asm/r4kcache.h
index 2c905dbe6464..4a4f552f6885 100644
--- a/arch/mips/include/asm/r4kcache.h
+++ b/arch/mips/include/asm/r4kcache.h
@@ -28,7 +28,12 @@
  *  - We need a properly sign extended address for 64-bit code.  To get away
  *    without ifdefs we let the compiler do it by a type cast.
  */
+#ifdef CONFIG_CPU_R5900
+/* CPU bug: the MSB must be 0 for the instruction cache. */
+#define INDEX_BASE	0
+#else
 #define INDEX_BASE	CKSEG0
+#endif
 
 #define cache_op_s(op,addr)						\
 	__asm__ __volatile__(						\

* Re: Aw: [RFC] MIPS: R5900: Use mandatory SYNC.L in exception handlers
  2018-02-12  9:22                                               ` Maciej W. Rozycki
  (?)
@ 2018-02-18 10:30                                               ` Fredrik Noring
  -1 siblings, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2018-02-18 10:30 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jürgen Urban, linux-mips

Hi Maciej,

>  This change makes no sense to me anyway, I am afraid.
> 
>  At the error level (Status.ERL=1) the user segment becomes unmapped and 
> therefore all KUSEG addresses become physical addresses.  Which means that 
> if any of this code you have patched is called to access user pages, then 
> you have a bigger problem than just the cache going out of sync.
> 
>  The only reason to access KUSEG at the error level is to save/restore 
> register state to/from a dedicated RAM area offset from $zero so that 
> execution is restartable.  Unlike at the exception level you cannot use 
> $k0 and $k1 as temporaries, because an error exception can happen any time 
> including in particular while $k0 and $k1 are in active use at the 
> exception level, so clobbering them would make the system non-restartable 
> (of course receiving an error exception may mean that anyway).
> 
>  Code to write/read that dedicated area should be purpose-crafted and the 
> area won't be accessed at any other time, so the issue of being cache 
> coherent or not does not apply as the area will never be accessed with 
> caching operations.

Thanks for your detailed description!

>  I can see the R5900 has additional classes of error exceptions defined, 
> such as debug and performance counter exceptions, which are not related to 
> hardware faults and can happen in regular execution in response to certain 
> conditions requested.  If you want to handle these implementation specific 
> extensions and consequently serve these exceptions, then please take care 
> of all the requirements as code to support them is added.

Yes, hardware breakpoints and hardware performance counters are very
interesting features to develop after the initial submission.

>  Though as I wrote above it does not look to me like anything specific 
> will be needed -- the handler at entry will save the state necessary for 
> restartability to a dedicated RAM area first and then to the kernel stack, 
> switch the error level off, do the necessary processing, and then reverse 
> the steps before resuming the interrupted execution.

Excellent. I will just drop this patch then.

Fredrik

* [RFC] MIPS: R5900: Add mandatory SYNC.P to all M[FT]C0 instructions
  2018-01-31 23:01                                       ` Maciej W. Rozycki
                                                           ` (7 preceding siblings ...)
  2018-02-18  9:26                                         ` [RFC] MIPS: R5900: Workaround where MSB must be 0 for the instruction cache Fredrik Noring
@ 2018-02-18 11:08                                         ` Fredrik Noring
  2018-03-03 12:26                                         ` [RFC] MIPS: PS2: Interrupt request (IRQ) support Fredrik Noring
  9 siblings, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2018-02-18 11:08 UTC (permalink / raw)
  To: Jürgen Urban; +Cc: Maciej W. Rozycki, linux-mips

Hi Jürgen,

The Toshiba TX79 manual mandates that all MTC0 instructions must be followed
by a SYNC.P instruction as a barrier to guarantee COP0 register updates.
There is one exception to this rule:
    
An MTC0 instruction which loads the EntryHi COP0 register can be followed by
a TLBWI or a TLBWR instruction without having an intervening SYNC.P. This
special case is handled by a hardware interlock.

This is documented on page C-28. However, MFC0 does not have a similar note.
How come it has preceding SYNC.P instructions in this change?
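
To make the MTC0 rule concrete, here is a small C sketch of a checker over
raw instruction words. It is illustrative only: the function names are made
up, it reads "followed by" strictly as "immediately followed by", and it
assumes the standard encodings (SYNC.P 0x0000040f, TLBWI 0x42000002, TLBWR
0x42000006, EntryHi as COP0 register 10).

```c
#include <stddef.h>
#include <stdint.h>

#define SYNC_P		0x0000040fu
#define TLBWI		0x42000002u
#define TLBWR		0x42000006u
#define C0_ENTRYHI	10

/* MTC0 rt,rd: COP0 opcode, rs field 4, low 11 bits zero. */
static int is_mtc0(uint32_t insn)
{
	return (insn & 0xffe007ffu) == 0x40800000u;
}

/* COP0 register number (rd field) of an MTC0/MFC0. */
static unsigned int c0_reg(uint32_t insn)
{
	return (insn >> 11) & 0x1f;
}

/* Index of the first MTC0 that lacks its barrier, or -1 if there is none. */
static long first_unsynced_mtc0(const uint32_t *code, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t next = (i + 1 < n) ? code[i + 1] : 0;

		if (!is_mtc0(code[i]) || next == SYNC_P)
			continue;
		/* EntryHi followed by TLBWI/TLBWR is interlocked. */
		if (c0_reg(code[i]) == C0_ENTRYHI &&
		    (next == TLBWI || next == TLBWR))
			continue;
		return (long)i;
	}
	return -1;
}
```

Run over the tail of the generated TLB refill handler discussed earlier in
the thread (the two EntryLo writes, each followed by SYNC.P), the checker
finds nothing to complain about.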
    
Fredrik

diff --git a/arch/mips/include/asm/asmmacro.h b/arch/mips/include/asm/asmmacro.h
index feb069cbf44e..8d1e30b94c2d 100644
--- a/arch/mips/include/asm/asmmacro.h
+++ b/arch/mips/include/asm/asmmacro.h
@@ -56,9 +56,15 @@
 	.endm
 #else
 	.macro	local_irq_enable reg=t0
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	\reg, CP0_STATUS
 	ori	\reg, \reg, 1
 	mtc0	\reg, CP0_STATUS
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	irq_enable_hazard
 	.endm
 
@@ -68,10 +74,16 @@
 	addi    \reg, \reg, 1
 	sw      \reg, TI_PRE_COUNT($28)
 #endif
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	\reg, CP0_STATUS
 	ori	\reg, \reg, 1
 	xori	\reg, \reg, 1
 	mtc0	\reg, CP0_STATUS
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	irq_disable_hazard
 #ifdef CONFIG_PREEMPT
 	lw      \reg, TI_PRE_COUNT($28)
diff --git a/arch/mips/include/asm/irqflags.h b/arch/mips/include/asm/irqflags.h
index 9d3610be2323..a2b653d42151 100644
--- a/arch/mips/include/asm/irqflags.h
+++ b/arch/mips/include/asm/irqflags.h
@@ -78,9 +78,15 @@ static inline void arch_local_irq_restore(unsigned long flags)
 	/*
 	 * Fast, dangerous.  Life is fun, life is good.
 	 */
+#ifdef CONFIG_CPU_R5900
+	"	sync.p							\n"
+#endif
 	"	mfc0	$1, $12						\n"
 	"	ins	$1, %[flags], 0, 1				\n"
 	"	mtc0	$1, $12						\n"
+#ifdef CONFIG_CPU_R5900
+	"	sync.p							\n"
+#endif
 #endif
 	"	" __stringify(__irq_disable_hazard) "			\n"
 	"	.set	pop						\n"
@@ -105,10 +111,16 @@ static inline void arch_local_irq_enable(void)
 #if   defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPSR6)
 	"	ei							\n"
 #else
+#ifdef CONFIG_CPU_R5900
+	"	sync.p							\n"
+#endif
 	"	mfc0	$1,$12						\n"
 	"	ori	$1,0x1f						\n"
 	"	xori	$1,0x1e						\n"
 	"	mtc0	$1,$12						\n"
+#ifdef CONFIG_CPU_R5900
+	"	sync.p							\n"
+#endif
 #endif
 	"	" __stringify(__irq_enable_hazard) "			\n"
 	"	.set	pop						\n"
@@ -124,6 +136,9 @@ static inline unsigned long arch_local_save_flags(void)
 	asm __volatile__(
 	"	.set	push						\n"
 	"	.set	reorder						\n"
+#ifdef CONFIG_CPU_R5900
+	"	sync.p							\n"
+#endif
 	"	mfc0	%[flags], $12					\n"
 	"	.set	pop						\n"
 	: [flags] "=r" (flags));
diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h
index 6b1f1ad0542c..2e6dad7c02b6 100644
--- a/arch/mips/include/asm/mipsregs.h
+++ b/arch/mips/include/asm/mipsregs.h
@@ -1245,6 +1245,29 @@ do {								\
  * Macros to access the system control coprocessor
  */
 
+#ifdef CONFIG_CPU_R5900
+#define __read_32bit_c0_register(source, sel)				\
+({ int __res;								\
+	if (sel == 0)							\
+		__asm__ __volatile__(					\
+			".set push\n\t"					\
+			".set noreorder\n\t"				\
+			"sync.p\n\t"					\
+			"mfc0\t%0, " #source "\n\t"			\
+			".set pop\n\t"					\
+			: "=r" (__res));				\
+	else								\
+		__asm__ __volatile__(					\
+			".set push\n\t"					\
+			".set noreorder\n\t"				\
+			".set\tmips32\n\t"				\
+			"sync.p\n\t"					\
+			"mfc0\t%0, " #source ", " #sel "\n\t"		\
+			".set pop\n\t"					\
+			: "=r" (__res));				\
+	__res;								\
+})
+#else
 #define __read_32bit_c0_register(source, sel)				\
 ({ unsigned int __res;							\
 	if (sel == 0)							\
@@ -1259,6 +1282,7 @@ do {								\
 			: "=r" (__res));				\
 	__res;								\
 })
+#endif
 
 #define __read_64bit_c0_register(source, sel)				\
 ({ unsigned long long __res;						\
@@ -1279,6 +1303,28 @@ do {								\
 	__res;								\
 })
 
+#ifdef CONFIG_CPU_R5900
+#define __write_32bit_c0_register(register, sel, value)			\
+do {									\
+	if (sel == 0)							\
+		__asm__ __volatile__(					\
+			".set push\n\t"					\
+			".set noreorder\n\t"				\
+			"mtc0\t%z0, " #register "\n\t"			\
+			"sync.p\n\t"					\
+			".set pop\n\t"					\
+			: : "Jr" ((unsigned int)(value)));		\
+	else								\
+		__asm__ __volatile__(					\
+			".set push\n\t"					\
+			".set noreorder\n\t"				\
+			".set\tmips32\n\t"				\
+			"mtc0\t%z0, " #register ", " #sel "\n\t"	\
+			"sync.p\n\t"					\
+			".set pop\n\t"					\
+			: : "Jr" ((unsigned int)(value)));		\
+} while (0)
+#else
 #define __write_32bit_c0_register(register, sel, value)			\
 do {									\
 	if (sel == 0)							\
@@ -1292,6 +1338,7 @@ do {									\
 			".set\tmips0"					\
 			: : "Jr" ((unsigned int)(value)));		\
 } while (0)
+#endif
 
 #define __write_64bit_c0_register(register, sel, value)			\
 do {									\
@@ -2525,6 +2572,14 @@ static inline void tlb_probe(void)
 	__asm__ __volatile__(
 		".set noreorder\n\t"
 		"tlbp\n\t"
+#ifdef CONFIG_CPU_R5900
+		/* No memory access behind the tlbp instruction. */
+		"sync.p\n\t"
+		"nop\n\t"
+		"nop\n\t"
+		"nop\n\t"
+		"nop\n\t"
+#endif
 		".set reorder");
 }
 
@@ -2550,6 +2605,14 @@ static inline void tlb_read(void)
 	__asm__ __volatile__(
 		".set noreorder\n\t"
 		"tlbr\n\t"
+#ifdef CONFIG_CPU_R5900
+		"sync.p\n\t"
+		/* No branch behind tlbr. */
+		"nop\n\t"
+		"nop\n\t"
+		"nop\n\t"
+		"nop\n\t"
+#endif
 		".set reorder");
 
 #if MIPS34K_MISSED_ITLB_WAR
@@ -2570,6 +2633,9 @@ static inline void tlb_write_indexed(void)
 	__asm__ __volatile__(
 		".set noreorder\n\t"
 		"tlbwi\n\t"
+#ifdef CONFIG_CPU_R5900
+		"sync.p\n\t"
+#endif
 		".set reorder");
 }
 
@@ -2578,6 +2644,9 @@ static inline void tlb_write_random(void)
 	__asm__ __volatile__(
 		".set noreorder\n\t"
 		"tlbwr\n\t"
+#ifdef CONFIG_CPU_R5900
+		"sync.p\n\t"
+#endif
 		".set reorder");
 }
 
diff --git a/arch/mips/include/asm/stackframe.h b/arch/mips/include/asm/stackframe.h
index cc4fa6f01e0e..14657fa6993b 100644
--- a/arch/mips/include/asm/stackframe.h
+++ b/arch/mips/include/asm/stackframe.h
@@ -116,6 +116,9 @@
 
 		/* SMP variation */
 		.macro	get_saved_sp docfi=0 tosp=0
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		ASM_CPUID_MFC0	k0, ASM_SMP_CPUID_REG
 #if defined(CONFIG_32BIT) || defined(KBUILD_64BIT_SYM32)
 		lui	k1, %hi(kernelsp)
@@ -140,6 +143,9 @@
 		.endm
 
 		.macro	set_saved_sp stackp temp temp2
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		ASM_CPUID_MFC0	\temp, ASM_SMP_CPUID_REG
 		LONG_SRL	\temp, SMP_CPUID_PTRSHIFT
 		LONG_S	\stackp, kernelsp(\temp)
@@ -165,6 +171,9 @@
 1:		move	ra, k0
 		li	k0, 3
 		mtc0	k0, $22
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 #endif /* CONFIG_CPU_JUMP_WORKAROUNDS */
 #if defined(CONFIG_32BIT) || defined(KBUILD_64BIT_SYM32)
 		lui	k1, %hi(kernelsp)
@@ -195,6 +204,9 @@
 		.set	push
 		.set	noat
 		.set	reorder
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		mfc0	k0, CP0_STATUS
 		sll	k0, 3		/* extract cu0 bit */
 		.set	noreorder
@@ -251,15 +263,24 @@
 		 * need it to operate correctly
 		 */
 		LONG_S	$0, PT_R0(sp)
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		mfc0	v1, CP0_STATUS
 		cfi_st	v0, PT_R2, \docfi
 		LONG_S	v1, PT_STATUS(sp)
 		cfi_st	$4, PT_R4, \docfi
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		mfc0	v1, CP0_CAUSE
 		cfi_st	$5, PT_R5, \docfi
 		LONG_S	v1, PT_CAUSE(sp)
 		cfi_st	$6, PT_R6, \docfi
 		cfi_st	ra, PT_R31, \docfi
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		MFC0	ra, CP0_EPC
 		cfi_st	$7, PT_R7, \docfi
 #ifdef CONFIG_64BIT
@@ -273,6 +294,9 @@
 		cfi_st	$25, PT_R25, \docfi
 		cfi_st	$28, PT_R28, \docfi
 
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		/* Set thread_info if we're coming from user mode */
 		mfc0	k0, CP0_STATUS
 		sll	k0, 3		/* extract cu0 bit */
@@ -447,10 +471,16 @@
 		.set	reorder
 		.set	noat
 		RESET_MMR
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		mfc0	a0, CP0_STATUS
 		ori	a0, STATMASK
 		xori	a0, STATMASK
 		mtc0	a0, CP0_STATUS
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		li	v1, ST0_CU1 | ST0_FR | ST0_IM
 		and	a0, v1
 		LONG_L	v0, PT_STATUS(sp)
@@ -458,8 +488,14 @@
 		and	v0, v1
 		or	v0, a0
 		mtc0	v0, CP0_STATUS
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		LONG_L	v1, PT_EPC(sp)
 		MTC0	v1, CP0_EPC
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		cfi_ld	$31, PT_R31, \docfi
 		cfi_ld	$28, PT_R28, \docfi
 		cfi_ld	$25, PT_R25, \docfi
@@ -502,11 +538,17 @@
  * Set cp0 enable bit as sign that we're running on the kernel stack
  */
 		.macro	CLI
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		mfc0	t0, CP0_STATUS
 		li	t1, ST0_CU0 | STATMASK
 		or	t0, t1
 		xori	t0, STATMASK
 		mtc0	t0, CP0_STATUS
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		irq_disable_hazard
 		.endm
 
@@ -515,11 +557,17 @@
  * Set cp0 enable bit as sign that we're running on the kernel stack
  */
 		.macro	STI
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		mfc0	t0, CP0_STATUS
 		li	t1, ST0_CU0 | STATMASK
 		or	t0, t1
 		xori	t0, STATMASK & ~1
 		mtc0	t0, CP0_STATUS
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		irq_enable_hazard
 		.endm
 
@@ -529,6 +577,9 @@
  * Set cp0 enable bit as sign that we're running on the kernel stack
  */
 		.macro	KMODE
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		mfc0	t0, CP0_STATUS
 		li	t1, ST0_CU0 | (STATMASK & ~1)
 #if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
@@ -539,6 +590,9 @@
 		or	t0, t1
 		xori	t0, STATMASK & ~1
 		mtc0	t0, CP0_STATUS
+#ifdef CONFIG_CPU_R5900
+		sync.p
+#endif
 		irq_disable_hazard
 		.endm
 
diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index 37b9383eacd3..c7b64f4a8ad3 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -33,8 +33,14 @@ NESTED(except_vec3_generic, 0, sp)
 	.set	push
 	.set	noat
 #if R5432_CP0_INTERRUPT_WAR
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	k0, CP0_INDEX
 #endif
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	k1, CP0_CAUSE
 	andi	k1, k1, 0x7c
 #ifdef CONFIG_64BIT
@@ -55,6 +61,9 @@ NESTED(except_vec3_r4000, 0, sp)
 	.set	push
 	.set	arch=r4000
 	.set	noat
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	k1, CP0_CAUSE
 	li	k0, 31<<2
 	andi	k1, k1, 0x7c
@@ -78,10 +87,16 @@ NESTED(except_vec3_r4000, 0, sp)
 	 * load / store will be re-executed.
 	 */
 handle_vced:
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	MFC0	k0, CP0_BADVADDR
 	li	k1, -4					# Is this ...
 	and	k0, k1					# ... really needed?
 	mtc0	zero, CP0_TAGLO
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	cache	Index_Store_Tag_D, (k0)
 	cache	Hit_Writeback_Inv_SD, (k0)
 #ifdef CONFIG_PROC_FS
@@ -93,6 +108,9 @@ handle_vced:
 	eret
 
 handle_vcei:
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	MFC0	k0, CP0_BADVADDR
 	cache	Hit_Writeback_Inv_SD, (k0)		# also cleans pi
 #ifdef CONFIG_PROC_FS
@@ -138,12 +156,18 @@ LEAF(__r4k_wait)
 	FEXPORT(rollback_\handler)
 	.set	push
 	.set	noat
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	MFC0	k0, CP0_EPC
 	PTR_LA	k1, __r4k_wait
 	ori	k0, 0x1f	/* 32 byte rollback region */
 	xori	k0, 0x1f
 	bne	k0, k1, \handler
 	MTC0	k0, CP0_EPC
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	.set pop
 	.endm
 
@@ -164,11 +188,17 @@ NESTED(handle_int, PT_SIZE, sp)
 	 */
 	.set	push
 	.set	noat
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	k0, CP0_STATUS
 #if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
 	and	k0, ST0_IEP
 	bnez	k0, 1f
 
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	k0, CP0_EPC
 	.set	noreorder
 	j	k0
@@ -349,6 +379,9 @@ NESTED(ejtag_debug_handler, PT_SIZE, sp)
 	.set	push
 	.set	noat
 	MTC0	k0, CP0_DESAVE
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	k0, CP0_DEBUG
 
 	sll	k0, k0, 30	# Check for SDBBP.
@@ -364,6 +397,9 @@ NESTED(ejtag_debug_handler, PT_SIZE, sp)
 	LONG_L	k1, 0(k0)
 
 ejtag_return:
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	MFC0	k0, CP0_DESAVE
 	.set	mips32
 	deret
@@ -448,6 +484,9 @@ NESTED(nmi_handler, PT_SIZE, sp)
 	.endm
 
 	.macro	__build_clear_ade
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	MFC0	t0, CP0_BADVADDR
 	PTR_S	t0, PT_BVADDR(sp)
 	KMODE
@@ -531,16 +570,28 @@ NESTED(nmi_handler, PT_SIZE, sp)
 	.set	noat
 	.set	noreorder
 	/* check if TLB contains a entry for EPC */
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	MFC0	k1, CP0_ENTRYHI
 	andi	k1, MIPS_ENTRYHI_ASID | MIPS_ENTRYHI_ASIDX
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	MFC0	k0, CP0_EPC
 	PTR_SRL	k0, _PAGE_SHIFT + 1
 	PTR_SLL	k0, _PAGE_SHIFT + 1
 	or	k1, k0
 	MTC0	k1, CP0_ENTRYHI
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mtc0_tlbw_hazard
 	tlbp
 	tlb_probe_hazard
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	k1, CP0_INDEX
 	.set	pop
 	bltz	k1, handle_ri	/* slow path */
@@ -553,6 +604,9 @@ NESTED(nmi_handler, PT_SIZE, sp)
 	.set	noreorder
 	/* MIPS32:    0x7c03e83b: rdhwr v1,$29 */
 	/* microMIPS: 0x007d6b3c: rdhwr v1,$29 */
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	MFC0	k1, CP0_EPC
 #if defined(CONFIG_CPU_MICROMIPS) || defined(CONFIG_CPU_MIPS32_R2) || defined(CONFIG_CPU_MIPS64_R2)
 	and	k0, k1, 1
@@ -583,6 +637,9 @@ isrdhwr:
 	/* The insn is rdhwr.  No need to check CAUSE.BD here. */
 	get_saved_sp	/* k1 := current_thread_info */
 	.set	noreorder
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	MFC0	k0, CP0_EPC
 #if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
 	ori	k1, _THREAD_MASK
@@ -600,6 +657,9 @@ isrdhwr:
 	.set	noat
 #endif
 	MTC0	k0, CP0_EPC
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	/* I hope three instructions between MTC0 and ERET are enough... */
 	ori	k1, _THREAD_MASK
 	xori	k1, _THREAD_MASK
diff --git a/arch/mips/kernel/head.S b/arch/mips/kernel/head.S
index d1bb506adc10..e4df507d63a6 100644
--- a/arch/mips/kernel/head.S
+++ b/arch/mips/kernel/head.S
@@ -34,10 +34,16 @@
 	 */
 	.macro	setup_c0_status set clr
 	.set	push
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	t0, CP0_STATUS
 	or	t0, ST0_CU0|\set|0x1f|\clr
 	xor	t0, 0x1f|\clr
 	mtc0	t0, CP0_STATUS
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	.set	noreorder
 	sll	zero,3				# ehb
 	.set	pop
@@ -130,6 +136,9 @@ dtb_found:
 #endif
 
 	MTC0		zero, CP0_CONTEXT	# clear context register
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	PTR_LA		$28, init_thread_union
 	/* Set the SP after an empty pt_regs.  */
 	PTR_LI		sp, _THREAD_SIZE - 32 - PT_SIZE
diff --git a/arch/mips/kernel/r4k_fpu.S b/arch/mips/kernel/r4k_fpu.S
index 8e3a6020c613..3959ae0af7a5 100644
--- a/arch/mips/kernel/r4k_fpu.S
+++ b/arch/mips/kernel/r4k_fpu.S
@@ -98,10 +98,16 @@ LEAF(_init_msa_upper)
 	SET_HARDFLOAT
 
 LEAF(_init_fpu)
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	t0, CP0_STATUS
 	li	t1, ST0_CU1
 	or	t0, t1
 	mtc0	t0, CP0_STATUS
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	enable_fpu_hazard
 
 	ctc1	a0, fcr31
diff --git a/arch/mips/kernel/r4k_switch.S b/arch/mips/kernel/r4k_switch.S
index 17cf9341c1cf..6a5838977986 100644
--- a/arch/mips/kernel/r4k_switch.S
+++ b/arch/mips/kernel/r4k_switch.S
@@ -26,6 +26,9 @@
  */
 	.align	5
 	LEAF(resume)
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	t1, CP0_STATUS
 	LONG_S	t1, THREAD_STATUS(a0)
 	cpu_save_nonscratch a0
@@ -46,6 +49,9 @@
 
 	PTR_ADDU	t0, $28, _THREAD_SIZE - 32
 	set_saved_sp	t0, t1, t2
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	t1, CP0_STATUS		/* Do we really need this? */
 	li	a3, 0xff01
 	and	t1, a3
@@ -54,6 +60,9 @@
 	and	a2, a3
 	or	a2, t1
 	mtc0	a2, CP0_STATUS
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	move	v0, a0
 	jr	ra
 	END(resume)
diff --git a/arch/mips/mm/cex-gen.S b/arch/mips/mm/cex-gen.S
index 45dff5cd4b8e..c5075651229c 100644
--- a/arch/mips/mm/cex-gen.S
+++ b/arch/mips/mm/cex-gen.S
@@ -27,11 +27,17 @@
 	 * in the cache, we may not be able to recover.	 As a
 	 * first-order desperate measure, turn off KSEG0 cacheing.
 	 */
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	mfc0	k0,CP0_CONFIG
 	li	k1,~CONF_CM_CMASK
 	and	k0,k0,k1
 	ori	k0,k0,CONF_CM_UNCACHED
 	mtc0	k0,CP0_CONFIG
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	/* Give it a few cycles to sink in... */
 	nop
 	nop
diff --git a/arch/mips/mm/tlbex-fault.S b/arch/mips/mm/tlbex-fault.S
index 77db401fc620..fe2b2c61cca7 100644
--- a/arch/mips/mm/tlbex-fault.S
+++ b/arch/mips/mm/tlbex-fault.S
@@ -14,6 +14,9 @@
 	NESTED(tlb_do_page_fault_\write, PT_SIZE, sp)
 	.cfi_signal_frame
 	SAVE_ALL docfi=1
+#ifdef CONFIG_CPU_R5900
+	sync.p
+#endif
 	MFC0	a2, CP0_BADVADDR
 	KMODE
 	move	a0, sp
diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index 79b9f2ad3ff5..a18b013fd887 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -691,6 +691,9 @@ static void build_restore_pagemask(u32 **p, struct uasm_reloc **r,
 			uasm_i_mtc0(p, 0, C0_PAGEMASK);
 		}
 	}
+#ifdef CONFIG_CPU_R5900
+	uasm_i_syncp(p);
+#endif
 }
 
 static void build_huge_tlb_write_entry(u32 **p, struct uasm_label **l,
@@ -703,6 +706,9 @@ static void build_huge_tlb_write_entry(u32 **p, struct uasm_label **l,
 	uasm_i_lui(p, tmp, PM_HUGE_MASK >> 16);
 	uasm_i_ori(p, tmp, tmp, PM_HUGE_MASK & 0xffff);
 	uasm_i_mtc0(p, tmp, C0_PAGEMASK);
+#ifdef CONFIG_CPU_R5900
+	uasm_i_syncp(p);
+#endif
 
 	build_tlb_write_entry(p, l, r, wmode);
 
@@ -959,21 +965,21 @@ void build_get_pgde32(u32 **p, unsigned int tmp, unsigned int ptr)
 {
 	if (pgd_reg != -1) {
 		/* pgd is in pgd_reg */
-		uasm_i_mfc0(p, ptr, c0_kscratch(), pgd_reg);
-		uasm_i_mfc0(p, tmp, C0_BADVADDR); /* get faulting address */
+		UASM_i_MFC0(p, ptr, c0_kscratch(), pgd_reg);
+		UASM_i_MFC0(p, tmp, C0_BADVADDR); /* get faulting address */
 	} else {
 		long pgdc = (long)pgd_current;
 
 		/* 32 bit SMP has smp_processor_id() stored in CONTEXT. */
 #ifdef CONFIG_SMP
-		uasm_i_mfc0(p, ptr, SMP_CPUID_REG);
+		UASM_i_MFC0(p, ptr, SMP_CPUID_REG);
 		UASM_i_LA_mostly(p, tmp, pgdc);
 		uasm_i_srl(p, ptr, ptr, SMP_CPUID_PTRSHIFT);
 		uasm_i_addu(p, ptr, tmp, ptr);
 #else
 		UASM_i_LA_mostly(p, ptr, pgdc);
 #endif
-		uasm_i_mfc0(p, tmp, C0_BADVADDR); /* get faulting address */
+		UASM_i_MFC0(p, tmp, C0_BADVADDR); /* get faulting address */
 		uasm_i_lw(p, ptr, uasm_rel_lo(pgdc), ptr);
 	}
 	uasm_i_srl(p, tmp, tmp, PGDIR_SHIFT); /* get pgd only bits */

* Re: [RFC v2] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-17 20:04                                                         ` Fredrik Noring
@ 2018-02-20 14:09                                                           ` Maciej W. Rozycki
  2018-02-22 17:04                                                             ` Fredrik Noring
  0 siblings, 1 reply; 117+ messages in thread
From: Maciej W. Rozycki @ 2018-02-20 14:09 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Jürgen Urban, linux-mips

Hi Fredrik,

> > Notice the different BFD target, `elf32-tradlittlemips-freebsd'.  You're 
> > supposed to be able to override it with `set gnutarget', but that doesn't 
> > seem to impress GDB, e.g.:
> > 
> > (gdb) show gnutarget
> > The current BFD target is "auto".
> > (gdb) set gnutarget elf32-tradlittlemips
> > (gdb) show gnutarget
> > The current BFD target is "elf32-tradlittlemips".
> > (gdb) info files
> > Local core dump file:
> >         `/home/mjr/src/kcore', file type elf32-tradlittlemips-freebsd.
> >         0xffffffffc0000000 - 0xfffffffffff94000 is load1
> >         0xffffffff80000000 - 0xffffffff90000000 is load2
> > (gdb)
> > 
> > I'll see if I can track down what is going on here.
> 
> Thank you for taking a closer look at GDB!

 You are welcome; however, it's my duty as a MIPS/GDB port maintainer.

> However, I don't observe the
> "freebsd" BFD target with a cross-GDB version 8.1 (via v9fs in this case):

 The likely cause is that my development GDB builds use 
`--enable-targets=all' while your GDB configuration probably does not 
include secondary targets.  Still, I find it wrong that an incorrect BFD 
target is chosen, and then that an override does not work either (and 
especially that accessing memory in a core file seems completely broken 
in recent GDB versions).

> By the way, what about presenting misaligned SQ instructions like
> 
> 	# mipsel-linux-gdb -q busybox
> 	Reading symbols from busybox...(no debugging symbols found)...done.
> 	(gdb) set architecture mips:5900
> 	The target architecture is assumed to be mips:5900
> 	(gdb) x /i 0x4036b0
> 	   0x4036b0:	sq	v1,-6085(zero)
> 
> as RDHWR, which is the interpretation with Linux?

 Hmm, I have mixed feelings about it as RDHWR is not an R5900 instruction.  
Likewise we don't disassemble it, nor LL, SC, SYNC, etc., say with 
`mips:3000', even though Linux will emulate them.  I feel this kind of 
stuff does not belong to instruction aliases either (compare `objdump' 
disassembly of any serious program with and without `-M no-aliases').
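
To make the two readings concrete, here is a minimal user-space sketch 
(not kernel or binutils code; `struct mips_fields' and `mips_decode' are 
names made up for this illustration).  It splits the word 0x7c03e83b, 
which the handler comments elsewhere in this thread give as the MIPS32 
encoding of `rdhwr v1,$29', into the standard MIPS bit fields.  Major 
opcode 0x1f is SPECIAL3 in MIPS32r2 and SQ on the R5900, which is why the 
same word disassembles both ways:

```c
#include <stdint.h>

/* Hypothetical container for the fields of a 32-bit MIPS instruction. */
struct mips_fields {
	unsigned int opcode;	/* bits 31..26 */
	unsigned int rs;	/* bits 25..21 */
	unsigned int rt;	/* bits 20..16 */
	unsigned int rd;	/* bits 15..11 */
	unsigned int funct;	/* bits  5..0  */
	int imm;		/* bits 15..0, sign-extended */
};

/* Split an instruction word into its fields. */
static struct mips_fields mips_decode(uint32_t insn)
{
	struct mips_fields f;

	f.opcode = insn >> 26;
	f.rs = (insn >> 21) & 0x1f;
	f.rt = (insn >> 16) & 0x1f;
	f.rd = (insn >> 11) & 0x1f;
	f.funct = insn & 0x3f;
	/* Portable sign extension of the low 16 bits. */
	f.imm = (int)((insn & 0xffff) ^ 0x8000) - 0x8000;

	return f;
}
```

Read as an I-type instruction, the fields are opcode 0x1f, rs 0 (zero), 
rt 3 (v1) and offset -6085, i.e. `sq v1,-6085(zero)'.  Read as an R-type 
instruction, rd is 29 and the function code is 0x3b, i.e. RDHWR.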

 I do recognize value for users here though, so perhaps an extra `-M' 
option would be due here, such as `-M linux-emulation' or suchlike, to 
cover all emulated instructions.  I'll think about it.  Please file an 
enhancement request in sourceware.org Bugzilla if it's something you are 
keen on having (feel free to cc me; I'm <macro@linux-mips.org> there).

  Maciej

* Re: [RFC v2] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-18  8:47                                                 ` Fredrik Noring
@ 2018-02-20 14:41                                                   ` Maciej W. Rozycki
  2018-02-22 17:27                                                     ` Fredrik Noring
  0 siblings, 1 reply; 117+ messages in thread
From: Maciej W. Rozycki @ 2018-02-20 14:41 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Jürgen Urban, linux-mips

On Sun, 18 Feb 2018, Fredrik Noring wrote:

> > Substitute `mips:5900' for `mips:isa32r2' to get R5900 disassembly.  If 
> > you want to see raw machine code too, use `disassemble -r', but watch out 
> > for the syntax, which is different.  As you can see the trailing NOPs 
> > required are already there. :)
> 
> Due to trailing zeroes, I suppose. :)

 It's no coincidence and we use it to our advantage that an all-zeros 
pattern is the canonical NOP instruction encoding.
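
As a small illustration, assuming the standard R-type layout: major 
opcode 0 (SPECIAL) with function code 0 is SLL, so an all-zeros word 
reads as `sll zero,zero,0', which changes no state.  The sketch below is 
a user-space example only, and `mips_special' and `mips_sll' are names 
made up here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: assemble an R-type word with major opcode 0. */
static uint32_t mips_special(unsigned int rs, unsigned int rt,
			     unsigned int rd, unsigned int sa,
			     unsigned int funct)
{
	assert(rs < 32 && rt < 32 && rd < 32 && sa < 32 && funct < 64);

	return (uint32_t)rs << 21 | (uint32_t)rt << 16 |
	       (uint32_t)rd << 11 | (uint32_t)sa << 6 | funct;
}

/* SLL rd, rt, sa uses function code 0 and leaves the rs field zero. */
static uint32_t mips_sll(unsigned int rd, unsigned int rt, unsigned int sa)
{
	return mips_special(0, rt, rd, sa, 0);
}
```

Here `mips_sll(0, 0, 0)' yields 0x00000000, and `mips_sll(0, 0, 3)' 
yields 0x000000c0, the `sll zero,3' form that head.S annotates as ehb.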

> >  A handler for SIO is needed if SIOInt can be asserted without kernel 
> > control by PS/2 hardware.  Otherwise handlers will only be needed once the 
> > kernel has means to enable the respective exceptions.
> 
> Serial I/O requires soldering for the PS2. Jürgen Urban, Rick Gaiser, and
> others have it and they can more easily debug the early boot stages. The
> proposed PS2 serial driver uses a 20 ms timer and polling instead of SIOInt:
> 
> https://github.com/frno7/linux/blob/ps2-v4.15-n7/drivers/tty/serial/ps2-uart.c

 So it looks like a random SIOInt is not supposed to happen and therefore 
I think a handler is not needed for the initial submission of the port if 
at all.

> I don't have a serial port. My setup consists of ssh over a wireless RT3070*
> USB device. Obviously a great number of things could potentially fail in
> that chain but it is surprisingly reliable. :)

 This has prompted me to look at what PS2 hardware provides and it indeed 
seems lacking in basic I/O connectivity.  What could one expect from a 
game console anyway?

 Do you use the netconsole then too?  Using a USB serial port adapter 
would be an alternative, although not a very powerful one either, because 
you need to have many parts of the OS initialised before you can get to 
such a port.

  Maciej

* Re: [RFC v2] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-20 14:09                                                           ` Maciej W. Rozycki
@ 2018-02-22 17:04                                                             ` Fredrik Noring
  0 siblings, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2018-02-22 17:04 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jürgen Urban, linux-mips

Hi Maciej,

>  The likely cause is that my development GDB builds use 
> `--enable-targets=all' while your GDB configuration probably does not 
> include secondary targets.  Still, I find it wrong that an incorrect BFD 
> target is chosen, and then that an override does not work either (and 
> especially that accessing memory in a core file seems completely broken 
> in recent GDB versions).

Please let me know if anything more is needed to reproduce the problem.

>  Hmm, I have mixed feelings about it as RDHWR is not an R5900 instruction.  
> Likewise we don't disassemble it, nor LL, SC, SYNC, etc., say with 
> `mips:3000', even though Linux will emulate them.  I feel this kind of 
> stuff does not belong to instruction aliases either (compare `objdump' 
> disassembly of any serious program with and without `-M no-aliases').
> 
>  I do recognize value for users here though, so perhaps an extra `-M' 
> option would be due here, such as `-M linux-emulation' or suchlike, to 
> cover all emulated instructions.  I'll think about it.  Please file an 
> enhancement request in sourceware.org Bugzilla if it's something you are 
> keen on having (feel free to cc me; I'm <macro@linux-mips.org> there).

Sure, I think it would be a helpful feature:

https://sourceware.org/bugzilla/show_bug.cgi?id=22877

Fredrik

* Re: [RFC v2] MIPS: R5900: Workaround exception NOP execution bug (FLX05)
  2018-02-20 14:41                                                   ` Maciej W. Rozycki
@ 2018-02-22 17:27                                                     ` Fredrik Noring
  0 siblings, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2018-02-22 17:27 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jürgen Urban, linux-mips

Hi Maciej,

>  Do you use the netconsole then too?  Using a USB serial port adapter 
> would be an alternative, although not a very powerful one either, because 
> you need to have many parts of the OS initialised before you can get to 
> such a port.

No, I haven't used netconsole, but I do have a frame buffer. There have been
a few very early kernel crashes during the porting from v2.6 to v4.15.
Fortunately, each of them could be traced to a single commit by bisection,
and from there it was easy to find a fix and proceed.

Eventually I'd like to get kexec working. I suspect it could be somewhat
tricky to debug though.

Fredrik

* [RFC] MIPS: PS2: Interrupt request (IRQ) support
  2018-01-31 23:01                                       ` Maciej W. Rozycki
                                                           ` (8 preceding siblings ...)
  2018-02-18 11:08                                         ` [RFC] MIPS: R5900: Add mandatory SYNC.P to all M[FT]C0 instructions Fredrik Noring
@ 2018-03-03 12:26                                         ` Fredrik Noring
  2018-03-03 13:09                                           ` Maciej W. Rozycki
  2018-03-18 10:45                                           ` Fredrik Noring
  9 siblings, 2 replies; 117+ messages in thread
From: Fredrik Noring @ 2018-03-03 12:26 UTC (permalink / raw)
  To: Maciej W. Rozycki, Jürgen Urban; +Cc: linux-mips

Hi Maciej & Jürgen,

This patch for IRQ support does a bit more than is strictly required for the
initial submission: it also supports, for example, the Graphics Synthesizer.
Please let me know if I should split it into several parts.

A few comments and questions:

1. The patch contains four volatile variables to handle masks:

	static volatile unsigned long intc_mask = 0;
	static volatile unsigned long dmac_mask = 0;
	static volatile unsigned long gs_mask = 0;
	static volatile unsigned long sbus_mask = 0;

   It seems the functions irq_set_chip_data and irq_data_get_irq_chip_data
   are used for similar purposes in other implementations. Perhaps it makes
   sense to use those instead?

2. Is there a reason to handle (or not handle) USB interrupts here? Is USB
   a special case as opposed to for example the Graphics Synthesizer, etc.?

3. What kind of name strings are recommended for struct irq_chip and struct
   irqaction? Heavily abbreviated such as "EE DMAC" or more spelled out such
   as "Emotion Engine DMAC"?
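
To illustrate question 1, here is a rough user-space model of how a
shadow mask could live in a per-controller structure (the sort of object
that chip data might point to) instead of a file-scope volatile. Nothing
here is kernel API: `struct ps2_intc', `intc_mask_write', `intc_enable'
and `intc_disable' are hypothetical names, and the register is modelled
as inverting every bit written as 1, which is an assumption drawn from the
way the enable and disable paths in the patch both write `1 << irq' to
INTC_MASK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-controller state, standing in for chip data. */
struct ps2_intc {
	uint32_t hw_mask;	/* models the INTC_MASK register */
	uint32_t shadow;	/* software copy of the enabled bits */
};

/* Assumed register behaviour: each bit written as 1 is inverted. */
static void intc_mask_write(struct ps2_intc *c, uint32_t val)
{
	c->hw_mask ^= val;
}

/* Enable an interrupt line; only touch the register on a real change. */
static void intc_enable(struct ps2_intc *c, unsigned int irq)
{
	assert(irq < 32);

	if (!(c->shadow & (UINT32_C(1) << irq))) {
		c->shadow |= UINT32_C(1) << irq;
		intc_mask_write(c, UINT32_C(1) << irq);
	}
}

/* Disable an interrupt line; only touch the register on a real change. */
static void intc_disable(struct ps2_intc *c, unsigned int irq)
{
	assert(irq < 32);

	if (c->shadow & (UINT32_C(1) << irq)) {
		c->shadow &= ~(UINT32_C(1) << irq);
		intc_mask_write(c, UINT32_C(1) << irq);
	}
}
```

Under that toggle assumption the shadow test is what keeps a repeated
enable from switching the line off again, so wherever the mask ends up
being stored, it has to stay in step with the register.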

Fredrik

diff --git a/arch/mips/include/asm/mach-ps2/irq.h b/arch/mips/include/asm/mach-ps2/irq.h
new file mode 100644
index 000000000000..b5a9727607cf
--- /dev/null
+++ b/arch/mips/include/asm/mach-ps2/irq.h
@@ -0,0 +1,92 @@
+/*
+ * PlayStation 2 IRQs
+ *
+ * Copyright (C) 2000-2002 Sony Computer Entertainment Inc.
+ * Copyright (C) 2010-2013 Jürgen Urban
+ * Copyright (C) 2017-2018 Fredrik Noring
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#ifndef __ASM_PS2_IRQ_H
+#define __ASM_PS2_IRQ_H
+
+#define NR_IRQS		56
+
+/*
+ * The interrupt controller (INTC) arbitrates interrupts from peripheral
+ * devices, except for the DMAC.
+ */
+#define IRQ_INTC	0
+#define IRQ_INTC_GS	0	/* Graphics Synthesizer */
+#define IRQ_INTC_SBUS	1	/* Bus connecting the Emotion Engine to the
+				   I/O processor (IOP) via the sub-system
+				   interface (SIF) */
+#define IRQ_INTC_VB_ON	2	/* Vertical blank start */
+#define IRQ_INTC_VB_OFF	3	/* Vertical blank end */
+#define IRQ_INTC_VIF0	4	/* VPU0 Interface packet expansion engine */
+#define IRQ_INTC_VIF1	5	/* VPU1 Interface packet expansion engine */
+#define IRQ_INTC_VU0	6	/* Vector Core Operation Unit 0 */
+#define IRQ_INTC_VU1	7	/* Vector Core Operation Unit 1 */
+#define IRQ_INTC_IPU	8	/* Image processor unit (MPEG 2 video etc.) */
+#define IRQ_INTC_TIMER0	9	/* Independent screen timer 0 */
+#define IRQ_INTC_TIMER1	10	/* Independent screen timer 1 */
+#define IRQ_INTC_TIMER2	11	/* Independent screen timer 2 */
+#define IRQ_INTC_TIMER3	12	/* Independent screen timer 3 */
+#define IRQ_INTC_SFIFO	13	/* Error detected during SFIFO transfers */
+#define IRQ_INTC_VU0WD	14	/* VU0 watch dog for RUN (sends force break) */
+#define IRQ_INTC_PGPU	15
+
+/*
+ * The DMA controller (DMAC) handles transfers between main memory and
+ * peripheral devices or the scratch pad RAM (SPRAM).
+ *
+ * The DMAC arbitrates the main bus at the same time, and supports chain
+ * mode which switches transfer addresses according to DMA tags attached to
+ * the transfer. The stall control synchronises two-channel transfers with
+ * priority control.
+ *
+ * Data is transferred in 128-bit words which must be aligned. Bus snooping
+ * is not performed.
+ */
+#define IRQ_DMAC	16
+#define IRQ_DMAC_0	16
+#define IRQ_DMAC_1	17
+#define IRQ_DMAC_2	18
+#define IRQ_DMAC_3	19
+#define IRQ_DMAC_4	20
+#define IRQ_DMAC_5	21
+#define IRQ_DMAC_6	22
+#define IRQ_DMAC_7	23
+#define IRQ_DMAC_8	24
+#define IRQ_DMAC_9	25
+#define IRQ_DMAC_S	29
+#define IRQ_DMAC_ME	30
+#define IRQ_DMAC_BE	31
+
+/* Graphics Synthesizer */
+#define IRQ_GS		32
+#define IRQ_GS_SIGNAL	32
+#define IRQ_GS_FINISH	33
+#define IRQ_GS_HSYNC	34
+#define IRQ_GS_VSYNC	35
+#define IRQ_GS_EDW	36
+#define IRQ_GS_EXHSYNC	37
+#define IRQ_GS_EXVSYNC	38
+
+/*
+ * Bus connecting the Emotion Engine to the I/O processor (IOP)
+ * via the sub-system interface (SIF)
+ */
+#define IRQ_SBUS	40
+#define IRQ_SBUS_AIF	40
+#define IRQ_SBUS_PCIC	41
+#define IRQ_SBUS_USB	42
+
+/* MIPS IRQs */
+#define MIPS_CPU_IRQ_BASE 48
+#define IRQ_C0_INTC	50
+#define IRQ_C0_DMAC	51
+#define IRQ_C0_IRQ7	55
+
+#endif /* __ASM_PS2_IRQ_H */
diff --git a/arch/mips/include/asm/mach-ps2/speed.h b/arch/mips/include/asm/mach-ps2/speed.h
new file mode 100644
index 000000000000..3aedcb27afe9
--- /dev/null
+++ b/arch/mips/include/asm/mach-ps2/speed.h
@@ -0,0 +1,25 @@
+/*
+ * PlayStation 2 Ethernet
+ *
+ * Copyright (C) 2001      Sony Computer Entertainment Inc.
+ * Copyright (C) 2010-2013 Jürgen Urban
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#ifndef __ASM_PS2_SPEED_H
+#define __ASM_PS2_SPEED_H
+
+#define DEV9M_BASE		0x14000000
+
+#define SPD_R_REV		(DEV9M_BASE + 0x00)
+#define SPD_R_REV_1		(DEV9M_BASE + 0x00)
+#define SPD_R_REV_3		(DEV9M_BASE + 0x04)
+
+#define SPD_R_INTR_STAT		(DEV9M_BASE + 0x28)
+#define SPD_R_INTR_ENA		(DEV9M_BASE + 0x2a)
+#define SPD_R_XFR_CTRL		(DEV9M_BASE + 0x32)
+#define SPD_R_IF_CTRL		(DEV9M_BASE + 0x64)
+
+#endif /* __ASM_PS2_SPEED_H */
+
diff --git a/arch/mips/ps2/irq.c b/arch/mips/ps2/irq.c
new file mode 100644
index 000000000000..9e4392837b9f
--- /dev/null
+++ b/arch/mips/ps2/irq.c
@@ -0,0 +1,491 @@
+/*
+ * PlayStation 2 IRQs
+ *
+ * Copyright (C) 2000-2002 Sony Computer Entertainment Inc.
+ * Copyright (C) 2010-2013 Jürgen Urban
+ * Copyright (C) 2017-2018 Fredrik Noring
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+#include <linux/interrupt.h>
+#include <linux/ioport.h>
+
+#include <asm/bootinfo.h>
+#include <asm/io.h>
+#include <asm/mipsregs.h>
+#include <asm/irq_cpu.h>
+
+#include <asm/mach-ps2/irq.h>
+#include <asm/mach-ps2/speed.h>
+#include <asm/mach-ps2/ps2.h>
+
+#define INTC_STAT	0x1000f000
+#define INTC_MASK	0x1000f010
+#define DMAC_STAT	0x1000e010
+#define DMAC_MASK	0x1000e010
+#define GS_CSR		0x12001000
+#define GS_IMR		0x12001010
+
+#define SBUS_SMFLG	0x1000f230
+#define SBUS_AIF_INTSR	0x18000004
+#define SBUS_AIF_INTEN	0x18000006
+#define SBUS_PCIC_EXC1	0x1f801476
+#define SBUS_PCIC_CSC1	0x1f801464
+#define SBUS_PCIC_IMR1	0x1f801468
+#define SBUS_PCIC_TIMR	0x1f80147e
+#define SBUS_PCIC3_TIMR	0x1f801466
+
+/* INTC */
+
+static volatile unsigned long intc_mask = 0;	/* FIXME: Why volatile? */
+
+static inline void intc_enable_irq(struct irq_data *data)
+{
+	if (!(intc_mask & (1 << data->irq))) {
+		intc_mask |= (1 << data->irq);
+		outl(1 << data->irq, INTC_MASK);
+	}
+}
+
+static inline void intc_disable_irq(struct irq_data *data)
+{
+	if ((intc_mask & (1 << data->irq))) {
+		intc_mask &= ~(1 << data->irq);
+		outl(1 << data->irq, INTC_MASK);
+	}
+}
+
+static unsigned int intc_startup_irq(struct irq_data *data)
+{
+	intc_enable_irq(data);
+
+	return 0;
+}
+
+static void intc_shutdown_irq(struct irq_data *data)
+{
+	intc_disable_irq(data);
+}
+
+static void intc_ack_irq(struct irq_data *data)
+{
+	intc_disable_irq(data);
+	outl(1 << data->irq, INTC_STAT);
+}
+
+static void intc_end_irq(struct irq_data *data)
+{
+	intc_enable_irq(data);
+}
+
+static struct irq_chip intc_irq_type = {
+	.name		= "Emotion Engine INTC",
+	.irq_startup	= intc_startup_irq,
+	.irq_shutdown	= intc_shutdown_irq,
+	.irq_unmask	= intc_enable_irq,
+	.irq_mask	= intc_disable_irq,
+	.irq_mask_ack	= intc_ack_irq,
+	.irq_eoi	= intc_end_irq,
+};
+
+/* DMAC */
+
+static volatile unsigned long dmac_mask = 0;	/* FIXME: Why volatile? */
+
+static inline void dmac_enable_irq(struct irq_data *data)
+{
+	const unsigned int dmac_irq_nr = data->irq - IRQ_DMAC;
+
+	if (!(dmac_mask & (1 << dmac_irq_nr))) {
+		dmac_mask |= (1 << dmac_irq_nr);
+		outl(1 << (dmac_irq_nr + 16), DMAC_MASK);
+	}
+}
+
+static inline void dmac_disable_irq(struct irq_data *data)
+{
+	const unsigned int dmac_irq_nr = data->irq - IRQ_DMAC;
+
+	if ((dmac_mask & (1 << dmac_irq_nr))) {
+		dmac_mask &= ~(1 << dmac_irq_nr);
+		outl(1 << (dmac_irq_nr + 16), DMAC_MASK);
+	}
+}
+
+static unsigned int dmac_startup_irq(struct irq_data *data)
+{
+	dmac_enable_irq(data);
+
+	return 0;
+}
+
+static void dmac_shutdown_irq(struct irq_data *data)
+{
+	dmac_disable_irq(data);
+}
+
+static void dmac_ack_irq(struct irq_data *data)
+{
+	const unsigned int dmac_irq_nr = data->irq - IRQ_DMAC;
+
+	dmac_disable_irq(data);
+	outl(1 << dmac_irq_nr, DMAC_STAT);
+}
+
+static void dmac_end_irq(struct irq_data *data)
+{
+	dmac_enable_irq(data);
+}
+
+static struct irq_chip dmac_irq_type = {
+	.name		= "Emotion Engine DMAC",
+	.irq_startup	= dmac_startup_irq,
+	.irq_shutdown	= dmac_shutdown_irq,
+	.irq_unmask	= dmac_enable_irq,
+	.irq_mask	= dmac_disable_irq,
+	.irq_mask_ack	= dmac_ack_irq,
+	.irq_eoi	= dmac_end_irq,
+};
+
+/* Graphics Synthesizer */
+
+static volatile unsigned long gs_mask = 0;	/* FIXME: Why volatile? */
+
+void ps2_setup_gs_imr(void)
+{
+	outl(0xff00, GS_IMR);
+	outl((~gs_mask & 0x7f) << 8, GS_IMR);
+}
+
+static inline void gs_enable_irq(struct irq_data *data)
+{
+	unsigned int gs_irq_nr = data->irq - IRQ_GS;
+
+	gs_mask |= (1 << gs_irq_nr);
+	ps2_setup_gs_imr();
+}
+
+static inline void gs_disable_irq(struct irq_data *data)
+{
+	unsigned int gs_irq_nr = data->irq - IRQ_GS;
+
+	gs_mask &= ~(1 << gs_irq_nr);
+	ps2_setup_gs_imr();
+}
+
+static unsigned int gs_startup_irq(struct irq_data *data)
+{
+	gs_enable_irq(data);
+	return 0;
+}
+
+static void gs_shutdown_irq(struct irq_data *data)
+{
+	gs_disable_irq(data);
+}
+
+static void gs_ack_irq(struct irq_data *data)
+{
+	unsigned int gs_irq_nr = data->irq - IRQ_GS;
+
+	outl(0xff00, GS_IMR);
+	outl(1 << gs_irq_nr, GS_CSR);
+}
+
+static void gs_end_irq(struct irq_data *data)
+{
+	outl((~gs_mask & 0x7f) << 8, GS_IMR);
+	gs_enable_irq(data);
+}
+
+static struct irq_chip gs_irq_type = {
+	.name		= "Graphics Synthesizer",
+	.irq_startup	= gs_startup_irq,
+	.irq_shutdown	= gs_shutdown_irq,
+	.irq_unmask	= gs_enable_irq,
+	.irq_mask	= gs_disable_irq,
+	.irq_mask_ack	= gs_ack_irq,
+	.irq_eoi	= gs_end_irq,
+};
+
+/* SBUS */
+
+static volatile unsigned long sbus_mask = 0;	/* FIXME: Why volatile? */
+
+static inline unsigned long sbus_enter_irq(void)
+{
+	unsigned long istat = 0;
+
+	if (inl(SBUS_SMFLG) & (1 << 8)) {
+		outl(1 << 8, SBUS_SMFLG);
+		switch (ps2_pcic_type) {
+		case 1:
+			if (inw(SBUS_PCIC_CSC1) & 0x0080) {
+				outw(0xffff, SBUS_PCIC_CSC1);
+				istat |= 1 << (IRQ_SBUS_PCIC - IRQ_SBUS);
+			}
+			break;
+		case 2:
+			if (inw(SBUS_PCIC_CSC1) & 0x0080) {
+				outw(0xffff, SBUS_PCIC_CSC1);
+				istat |= 1 << (IRQ_SBUS_PCIC - IRQ_SBUS);
+			}
+			break;
+		case 3:
+			istat |= 1 << (IRQ_SBUS_PCIC - IRQ_SBUS);
+			break;
+		}
+	}
+
+	if (inl(SBUS_SMFLG) & (1 << 10)) {
+		outl(1 << 10, SBUS_SMFLG);
+		istat |= 1 << (IRQ_SBUS_USB - IRQ_SBUS);
+	}
+
+	return istat;
+}
+
+static inline void sbus_leave_irq(void)
+{
+	unsigned short mask;
+
+	if (ps2_pccard_present == 0x0100) {
+		mask = inw(SPD_R_INTR_ENA);
+		outw(0, SPD_R_INTR_ENA);
+		outw(mask, SPD_R_INTR_ENA);
+	}
+
+	switch (ps2_pcic_type) {
+	case 1:	/* Fall-through */
+	case 2:
+		mask = inw(SBUS_PCIC_TIMR);
+		outw(1, SBUS_PCIC_TIMR);
+		outw(mask, SBUS_PCIC_TIMR);
+		break;
+	case 3:
+		mask = inw(SBUS_PCIC3_TIMR);
+		outw(1, SBUS_PCIC3_TIMR);
+		outw(mask, SBUS_PCIC3_TIMR);
+		break;
+	}
+}
+
+static inline void sbus_enable_irq(struct irq_data *data)
+{
+	unsigned int sbus_irq_nr = data->irq - IRQ_SBUS;
+
+	sbus_mask |= (1 << sbus_irq_nr);
+
+	switch (data->irq) {
+	case IRQ_SBUS_PCIC:
+		switch (ps2_pcic_type) {
+		case 1:
+			outw(0xff7f, SBUS_PCIC_IMR1);
+			break;
+		case 2:
+			outw(0, SBUS_PCIC_TIMR);
+			break;
+		case 3:
+			outw(0, SBUS_PCIC3_TIMR);
+			break;
+		}
+		break;
+	case IRQ_SBUS_USB:
+		break;
+	}
+}
+
+static inline void sbus_disable_irq(struct irq_data *data)
+{
+	unsigned int sbus_irq_nr = data->irq - IRQ_SBUS;
+
+	sbus_mask &= ~(1 << sbus_irq_nr);
+
+	switch (data->irq) {
+	case IRQ_SBUS_PCIC:
+		switch (ps2_pcic_type) {
+		case 1:
+			outw(0xffff, SBUS_PCIC_IMR1);
+			break;
+		case 2:
+			outw(1, SBUS_PCIC_TIMR);
+			break;
+		case 3:
+			outw(1, SBUS_PCIC3_TIMR);
+			break;
+		}
+		break;
+	case IRQ_SBUS_USB:
+		break;
+	}
+}
+
+static unsigned int sbus_startup_irq(struct irq_data *data)
+{
+	sbus_enable_irq(data);
+
+	return 0;
+}
+
+static void sbus_shutdown_irq(struct irq_data *data)
+{
+	sbus_disable_irq(data);
+}
+
+static void sbus_ack_irq(struct irq_data *data)
+{
+}
+
+static void sbus_end_irq(struct irq_data *data)
+{
+}
+
+static struct irq_chip sbus_irq_type = {
+	.name		= "I/O processor",
+	.irq_startup	= sbus_startup_irq,
+	.irq_shutdown	= sbus_shutdown_irq,
+	.irq_unmask	= sbus_enable_irq,
+	.irq_mask	= sbus_disable_irq,
+	.irq_mask_ack	= sbus_ack_irq,
+	.irq_eoi	= sbus_end_irq,
+};
+
+static irqreturn_t gs_cascade(int irq, void *data)
+{
+	const u32 irq_reg = inl(GS_CSR) & gs_mask;
+
+	if (irq_reg)
+		generic_handle_irq(__fls(irq_reg) + IRQ_GS);
+
+	return IRQ_HANDLED;
+}
+
+static struct irqaction cascade_gs_irqaction = {
+	.name = "Graphics Synthesizer cascade",
+	.handler = gs_cascade,
+};
+
+static irqreturn_t sbus_cascade(int irq, void *data)
+{
+	u32 irq_reg;
+
+	preempt_disable();
+
+	irq_reg = sbus_enter_irq() & sbus_mask;
+	if (irq_reg)
+		generic_handle_irq(__fls(irq_reg) + IRQ_SBUS);
+	sbus_leave_irq();
+
+	preempt_enable_no_resched();
+
+	return IRQ_HANDLED;
+}
+
+static struct irqaction cascade_sbus_irqaction = {
+	.name = "SBUS cascade",
+	.handler = sbus_cascade,
+};
+
+static irqreturn_t intc_cascade(int irq, void *data)
+{
+	const u32 irq_reg = inl(INTC_STAT) & intc_mask;
+
+	if (irq_reg)
+		generic_handle_irq(__fls(irq_reg) + IRQ_INTC);
+
+	return IRQ_HANDLED;
+}
+
+static struct irqaction cascade_intc_irqaction = {
+	.handler = intc_cascade,
+	.name = "INTC cascade",
+};
+
+static irqreturn_t dmac_cascade(int irq, void *data)
+{
+	const u32 irq_reg = inl(DMAC_STAT) & dmac_mask;
+
+	if (irq_reg)
+		generic_handle_irq(__fls(irq_reg) + IRQ_DMAC);
+
+	return IRQ_HANDLED;
+}
+
+static struct irqaction cascade_dmac_irqaction = {
+	.name = "DMAC cascade",
+	.handler = dmac_cascade,
+};
+
+void __init arch_init_irq(void)
+{
+	int err;
+	int i;
+
+	mips_cpu_irq_init();	/* Initialise CPU IRQs. */
+
+	for (i = 0; i < MIPS_CPU_IRQ_BASE; i++) {
+		struct irq_chip *handler =
+			i < IRQ_DMAC ? &intc_irq_type :
+			i < IRQ_GS   ? &dmac_irq_type :
+			i < IRQ_SBUS ?   &gs_irq_type :
+				       &sbus_irq_type ;
+
+		irq_set_chip_and_handler(i, handler, handle_level_irq);
+	}
+
+	/* Initialise interrupt mask. */
+
+	intc_mask = 0;
+	outl(inl(INTC_MASK), INTC_MASK);
+	outl(inl(INTC_STAT), INTC_STAT);
+
+	dmac_mask = 0;
+	outl(inl(DMAC_MASK), DMAC_MASK);
+
+	gs_mask = 0;
+	outl(0xff00, GS_IMR);
+	outl(0x00ff, GS_CSR);
+
+	sbus_mask = 0;
+	outl((1 << 8) | (1 << 10), SBUS_SMFLG);
+
+	/* Enable cascaded GS IRQ. */
+	err = setup_irq(IRQ_INTC_GS, &cascade_gs_irqaction);
+	if (err)
+		printk(KERN_ERR "irq: Failed to setup GS IRQ (err = %d).\n", err);
+
+	/* Enable cascaded SBUS IRQ. */
+	err = setup_irq(IRQ_INTC_SBUS, &cascade_sbus_irqaction);
+	if (err)
+		printk(KERN_ERR "irq: Failed to setup SBUS IRQ (err = %d).\n", err);
+
+	/* Enable INTC interrupt. */
+	err = setup_irq(IRQ_C0_INTC, &cascade_intc_irqaction);
+	if (err)
+		printk(KERN_ERR "irq: Failed to setup INTC (err = %d).\n", err);
+
+	/* Enable DMAC interrupt. */
+	err = setup_irq(IRQ_C0_DMAC, &cascade_dmac_irqaction);
+	if (err)
+		printk(KERN_ERR "irq: Failed to setup DMAC (err = %d).\n", err);
+}
+
+asmlinkage void plat_irq_dispatch(void)
+{
+	const int pending = read_c0_status() & read_c0_cause();
+
+	/* Check the INTC first, then the DMAC, then the timer. */
+	if (pending & CAUSEF_IP2)
+		do_IRQ(IRQ_C0_INTC);	/* INTC interrupt. */
+	else if (pending & CAUSEF_IP3)
+		do_IRQ(IRQ_C0_DMAC);	/* DMAC interrupt. */
+	else if (pending & CAUSEF_IP7)
+		do_IRQ(IRQ_C0_IRQ7);	/* Timer interrupt. */
+	else
+		spurious_interrupt();
+}
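
The cascade handlers above all follow the same selection step: mask the
status word with the shadow mask, take the most significant set bit, and
add the base IRQ number of the controller. A minimal user-space sketch of
that step follows; `fls_index' and `cascade_pick' are names made up for
this illustration, with `fls_index' standing in for the kernel's __fls.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the most significant set bit; word must be nonzero. */
static unsigned int fls_index(uint32_t word)
{
	unsigned int n = 0;

	assert(word != 0);
	while (word >>= 1)
		n++;

	return n;
}

/*
 * Return the IRQ number a cascade handler would dispatch for the
 * given status and mask words, or -1 if no enabled line is pending.
 */
static int cascade_pick(uint32_t stat, uint32_t mask, unsigned int base)
{
	const uint32_t pending = stat & mask;

	return pending ? (int)(fls_index(pending) + base) : -1;
}
```

For example, with status 0x5, mask 0x7 and a base of 16 (IRQ_DMAC), the
result is 18: bit 2 wins over bit 0. Only one line is dispatched per
invocation in this model, so lower-numbered pending lines wait for a later
pass.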

* Re: [RFC] MIPS: PS2: Interrupt request (IRQ) support
  2018-03-03 12:26                                         ` [RFC] MIPS: PS2: Interrupt request (IRQ) support Fredrik Noring
@ 2018-03-03 13:09                                           ` Maciej W. Rozycki
  2018-03-03 14:14                                             ` Fredrik Noring
  2018-04-09 15:51                                             ` Fredrik Noring
  2018-03-18 10:45                                           ` Fredrik Noring
  1 sibling, 2 replies; 117+ messages in thread
From: Maciej W. Rozycki @ 2018-03-03 13:09 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Jürgen Urban, linux-mips

Hi Fredrik,

> This patch for IRQ support does a bit more than strictly required for the
> initial submission by supporting for example the Graphics Synthesizer.
> Please let me know if I should split it into several parts.

 I'm on holiday starting today and lasting two weeks.  I'll have a look at 
your patch when I am back.

  Maciej

* Re: [RFC] MIPS: PS2: Interrupt request (IRQ) support
  2018-03-03 13:09                                           ` Maciej W. Rozycki
@ 2018-03-03 14:14                                             ` Fredrik Noring
  2018-04-09 15:51                                             ` Fredrik Noring
  1 sibling, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2018-03-03 14:14 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jürgen Urban, linux-mips

Hi Maciej,

>  I'm on holiday starting today and lasting two weeks.  I'll have a look at 
> your patch when I am back.

Nice! I believe all patches for the initial submission are ready for
comments and reviews now, but I'm happy to have two additional weeks to
improve and polish them. :)

Fredrik

* Re: [RFC] MIPS: PS2: Interrupt request (IRQ) support
  2018-03-03 12:26                                         ` [RFC] MIPS: PS2: Interrupt request (IRQ) support Fredrik Noring
  2018-03-03 13:09                                           ` Maciej W. Rozycki
@ 2018-03-18 10:45                                           ` Fredrik Noring
  2018-03-19 19:15                                             ` Thomas Gleixner
  2018-06-18 18:52                                             ` [RFC v2] " Fredrik Noring
  1 sibling, 2 replies; 117+ messages in thread
From: Fredrik Noring @ 2018-03-18 10:45 UTC (permalink / raw)
  To: Maciej W. Rozycki, Thomas Gleixner; +Cc: linux-mips, Jürgen Urban

Hi Maciej and Thomas,

Thomas: Please have a look at the first questions below, regarding
irq_data->mask and irq_chip->irq_calc_mask. Are they supposed to be usable?

> +static volatile unsigned long intc_mask = 0;	/* FIXME: Why volatile? */
> +
> +static inline void intc_enable_irq(struct irq_data *data)
> +{
> +	if (!(intc_mask & (1 << data->irq))) {
> +		intc_mask |= (1 << data->irq);
> +		outl(1 << data->irq, INTC_MASK);
> +	}
> +}

The intc_mask variable can be removed, since INTC_MASK is readable, although
perhaps there are performance reasons to not read the register directly?

I also noticed that struct irq_data contains a mask field, which allows
simplifications to

static inline void intc_enable_irq(struct irq_data *data)
{
	if (!(inl(INTC_MASK) & data->mask))
		outl(data->mask, INTC_MASK);
}

provided the following patch is applied to kernel/irq/irqdesc.c:

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -109,6 +109,7 @@ static void desc_set_defaults(unsigned int irq, struct irq_desc *desc, int node,
 	desc->irq_common_data.msi_desc = NULL;
 
 	desc->irq_data.common = &desc->irq_common_data;
+	desc->irq_data.mask = 1 << irq; /* FIXME: What about irq_calc_mask? */
 	desc->irq_data.irq = irq;
 	desc->irq_data.chip = &no_irq_chip;
 	desc->irq_data.chip_data = NULL;

Perhaps the mask field ought to be assigned "1 << irq_data->hwirq" instead,
unless irq_calc_mask is provided. The mask documentation is not entirely
clear on the use and any restrictions, and it does not seem to be used all
that much.

The mask field and irq_calc_mask were introduced by Thomas Gleixner in
commits 966dc736b819 "genirq: Generic chip: Cache per irq bit mask" and
d0051816e619 "genirq: irqchip: Add a mask calculation function" in 2013.

> +static inline void dmac_enable_irq(struct irq_data *data)
> +{
> +	const unsigned int dmac_irq_nr = data->irq - IRQ_DMAC;

This is perhaps a case where the difference between data->irq and
data->hwirq would be relevant to compute data->mask?

> +/* Graphics Synthesizer */
> +
> +static volatile unsigned long gs_mask = 0;	/* FIXME: Why volatile? */

The interrupt mask for the Graphics Synthesizer is only writable, and does
not toggle bits, so it appears a register copy somehow must be maintained by
the kernel.
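
A minimal user-space sketch of such a register copy follows. It assumes the
bit layout of the quoted ps2_setup_gs_imr() (seven mask bits at positions
8 to 14, where a set bit masks the interrupt). The names gs_imr_reg,
gs_enabled, gs_imr_write, gs_imr_sync, gs_enable and gs_disable are
hypothetical, and a plain variable stands in for the write-only register:

```c
#include <stdint.h>

static uint32_t gs_imr_reg;	/* Stand-in for the write-only GS_IMR. */
static uint32_t gs_enabled;	/* Shadow copy: bit n set = IRQ n enabled. */

/* Stand-in for outl(value, GS_IMR); the real register cannot be read back. */
static void gs_imr_write(uint32_t value)
{
	gs_imr_reg = value;
}

/* Rebuild the whole mask from the shadow copy and write it out. */
static void gs_imr_sync(void)
{
	gs_imr_write((~gs_enabled & 0x7f) << 8);
}

/* Enable interrupt n (0 to 6) by updating the shadow copy first. */
static void gs_enable(unsigned int n)
{
	gs_enabled |= UINT32_C(1) << n;
	gs_imr_sync();
}

/* Disable interrupt n (0 to 6) by updating the shadow copy first. */
static void gs_disable(unsigned int n)
{
	gs_enabled &= ~(UINT32_C(1) << n);
	gs_imr_sync();
}
```

Since every update goes through the shadow copy, the register never has to
be read; real kernel code would also need locking around the update.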

> +void ps2_setup_gs_imr(void)
> +{
> +	outl(0xff00, GS_IMR);
> +	outl((~gs_mask & 0x7f) << 8, GS_IMR);
> +}

It is not entirely clear why GS_IMR needs to be fully masked (interrupts
disabled) before being set to its proper value.

The GS User's Manual (p. 95) mentions that SIGMSK must be toggled when data
is written to the SIGNAL register, but does that apply here? And why not
only the SIGNAL bit zero then?

> +static inline unsigned long sbus_enter_irq(void)
> +{
> +	unsigned long istat = 0;
> +
> +	if (inl(SBUS_SMFLG) & (1 << 8)) {
> +		outl(1 << 8, SBUS_SMFLG);
> +		switch (ps2_pcic_type) {
> +		case 1:
> +			if (inw(SBUS_PCIC_CSC1) & 0x0080) {
> +				outw(0xffff, SBUS_PCIC_CSC1);
> +				istat |= 1 << (IRQ_SBUS_PCIC - IRQ_SBUS);
> +			}
> +			break;
> +		case 2:
> +			if (inw(SBUS_PCIC_CSC1) & 0x0080) {
> +				outw(0xffff, SBUS_PCIC_CSC1);
> +				istat |= 1 << (IRQ_SBUS_PCIC - IRQ_SBUS);
> +			}
> +			break;
> +		case 3:
> +			istat |= 1 << (IRQ_SBUS_PCIC - IRQ_SBUS);
> +			break;
> +		}
> +	}

It's unclear what these registers actually do.

> +	if (inl(SBUS_SMFLG) & (1 << 10)) {
> +		outl(1 << 10, SBUS_SMFLG);
> +		istat |= 1 << (IRQ_SBUS_USB - IRQ_SBUS);
> +	}

This is needed to support USB in the initial patch submission.

> +static inline void sbus_enable_irq(struct irq_data *data)
> +{
> +	unsigned int sbus_irq_nr = data->irq - IRQ_SBUS;
> +
> +	sbus_mask |= (1 << sbus_irq_nr);
> +
> +	switch (data->irq) {
> +	case IRQ_SBUS_PCIC:
> +		switch (ps2_pcic_type) {
> +		case 1:
> +			outw(0xff7f, SBUS_PCIC_IMR1);
> +			break;
> +		case 2:
> +			outw(0, SBUS_PCIC_TIMR);
> +			break;
> +		case 3:
> +			outw(0, SBUS_PCIC3_TIMR);
> +			break;
> +		}
> +		break;
> +	case IRQ_SBUS_USB:
> +		break;

Something needs to be done to mask and unmask USB interrupts, but it's not
entirely clear in what way. As Alan Stern notes in

https://marc.info/?l=linux-usb&m=152106073613807&w=2

disabling interrupts by setting OHCI_INTR_MIE in the OHCI registers isn't
the recommended method.

> +static struct irq_chip sbus_irq_type = {
> +	.name		= "I/O processor",

Are solidus and space allowed in names?

> +static struct irqaction cascade_intc_irqaction = {
> +	.handler = intc_cascade,
> +	.name = "INTC cascade",
> +};

I'm not sure how a cascade is supposed to work here.

> +	for (i = 0; i < MIPS_CPU_IRQ_BASE; i++) {
> +		struct irq_chip *handler =
> +			i < IRQ_DMAC ? &intc_irq_type :
> +			i < IRQ_GS   ? &dmac_irq_type :
> +			i < IRQ_SBUS ?   &gs_irq_type :
> +				       &sbus_irq_type ;
> +
> +		irq_set_chip_and_handler(i, handler, handle_level_irq);
> +	}

I'm considering unrolling this loop into four separate loops.

Fredrik

* Re: [RFC] MIPS: PS2: Interrupt request (IRQ) support
  2018-03-18 10:45                                           ` Fredrik Noring
@ 2018-03-19 19:15                                             ` Thomas Gleixner
  2018-06-18 18:52                                             ` [RFC v2] " Fredrik Noring
  1 sibling, 0 replies; 117+ messages in thread
From: Thomas Gleixner @ 2018-03-19 19:15 UTC (permalink / raw)
  To: Fredrik Noring; +Cc: Maciej W. Rozycki, linux-mips, Jürgen Urban

On Sun, 18 Mar 2018, Fredrik Noring wrote:

> Hi Maciej and Thomas,
> 
> Thomas: Please have a look at the first questions below, regarding
> irq_data->mask and irq_chip->irq_calc_mask. Are they supposed to be usable?
> 
> > +static volatile unsigned long intc_mask = 0;	/* FIXME: Why volatile? */
> > +
> > +static inline void intc_enable_irq(struct irq_data *data)
> > +{
> > +	if (!(intc_mask & (1 << data->irq))) {
> > +		intc_mask |= (1 << data->irq);
> > +		outl(1 << data->irq, INTC_MASK);
> > +	}
> > +}
> 
> The intc_mask variable can be removed, since INTC_MASK is readable, although
> perhaps there are performance reasons to not read the register directly?
> 
> I also noticed that struct irq_data contains a mask field, which allows
> simplifications to
> 
> static inline void intc_enable_irq(struct irq_data *data)
> {
> 	if (!(inl(INTC_MASK) & data->mask))
> 		outl(data->mask, INTC_MASK);

That's a pointless exercise. The core code already knows whether an
interrupt is masked or not and makes the calls conditionally.

> }
> 
> provided the following patch is applied to kernel/irq/irqdesc.c:
> 
> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
> --- a/kernel/irq/irqdesc.c
> +++ b/kernel/irq/irqdesc.c
> @@ -109,6 +109,7 @@ static void desc_set_defaults(unsigned int irq, struct irq_desc *desc, int node,
>  	desc->irq_common_data.msi_desc = NULL;
>  
>  	desc->irq_data.common = &desc->irq_common_data;
> +	desc->irq_data.mask = 1 << irq; /* FIXME: What about irq_calc_mask? */
> 
> Perhaps the mask field ought to be assigned "1 << irq_data->hwirq" instead,
> unless irq_calc_mask is provided. The mask documentation is not entirely
> clear on the use and any restrictions, and it does not seem to be used all
> that much.

Neither works. @irq is the virtual Linux interrupt number and there is no
guarantee that it maps 1:1 to a hardware irq number. Also this falls apart
when @irq >= 32 because the mask field is only 32 bits wide.
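
The two constraints can be illustrated with a small user-space sketch. It
is not kernel code: hwirq_to_mask is a hypothetical helper, and it assumes
the caller has already translated the Linux interrupt number into a
hardware line number relative to the controller:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: derive a 32-bit register mask
 * from a hardware IRQ line number. Returns false when the line does not
 * fit in a 32-bit mask, leaving *mask untouched.
 */
static bool hwirq_to_mask(unsigned int hwirq, uint32_t *mask)
{
	if (hwirq >= 32)
		return false;	/* 1 << hwirq would not fit in 32 bits. */

	*mask = UINT32_C(1) << hwirq;
	return true;
}
```

Lines 0 to 31 yield a single-bit mask; anything larger is rejected rather
than silently shifted out of range.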

> The mask field and irq_calc_mask were introduced by Thomas Gleixner in
> commits 966dc736b819 "genirq: Generic chip: Cache per irq bit mask" and
> d0051816e619 "genirq: irqchip: Add a mask calculation function" in 2013.

Yes, the generic irq chip uses this. I haven't seen the full irqchip patch
so I can't tell whether this driver could/should use the generic chip.

Thanks,

	tglx

* Re: [RFC] MIPS: PS2: Interrupt request (IRQ) support
  2018-03-03 13:09                                           ` Maciej W. Rozycki
  2018-03-03 14:14                                             ` Fredrik Noring
@ 2018-04-09 15:51                                             ` Fredrik Noring
  1 sibling, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2018-04-09 15:51 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Jürgen Urban, linux-mips

Hi Maciej,

>  I'm on holiday starting today and lasting two weeks.  I'll have a look at 
> your patch when I am back.

How are the reviews going? I think some of the most important and for me
least understood parts of the initial submission are:

https://www.linux-mips.org/archives/linux-mips/2018-02/msg00100.html
https://www.linux-mips.org/archives/linux-mips/2018-02/msg00102.html
https://www.linux-mips.org/archives/linux-mips/2018-02/msg00103.html
https://www.linux-mips.org/archives/linux-mips/2018-02/msg00117.html
https://www.linux-mips.org/archives/linux-mips/2018-02/msg00219.html
https://www.linux-mips.org/archives/linux-mips/2018-02/msg00221.html
https://www.linux-mips.org/archives/linux-mips/2018-03/msg00035.html

I'm currently rewriting the Graphics Synthesizer and frame buffer drivers
from scratch, with the following changes:

- modules;
- proper handling of video modes;
- several new video modes including 1920x1080p;
- compatibility with PS2 HDMI adapters;
- extended sysfs with register files;
- hardware support for panning, etc.;
- vertical blank synchronisation;
- performance improvements;
- bug fixes.

One critical issue with the OHCI driver was resolved in commit d6c931ea32dc0
"USB: OHCI: Fix NULL dereference in HCDs using HCD_LOCAL_MEM". Robin Murphy
and Christoph Hellwig proposed fixes to DMA handling and Robin will submit a
patch as discussed here:

https://marc.info/?t=152010179900001&r=1&w=2

The OHCI driver still has an issue (and workaround) with interrupts, related
to the IRQ handling.

Fredrik

* Re: [RFC v2] MIPS: PS2: Interrupt request (IRQ) support
  2018-03-18 10:45                                           ` Fredrik Noring
  2018-03-19 19:15                                             ` Thomas Gleixner
@ 2018-06-18 18:52                                             ` Fredrik Noring
  1 sibling, 0 replies; 117+ messages in thread
From: Fredrik Noring @ 2018-06-18 18:52 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips, Jürgen Urban

Hi Maciej,

I have completely reworked the handling of IRQs: it's now modular and
simplified (INTC patch attached below). The cascading interrupts for DMA
and the Graphics Synthesizer are set up in separate modules. The SBUS
interrupts are shared instead of demultiplexed and RPC is no longer used
as an alternative to interrupt forwarding.

Unrelated to IRQs: I have also replaced all previous BIOS calls. The kernel
no longer needs a BIOS and reclaims its memory space. In particular, the
I/O processor (IOP) is reset by the kernel and a boot loader is no longer
needed to perform this task. I also have a collection of patches to get
kexec working with a compressed (vmlinuz) kernel, so it can reboot itself.

I have implemented a graphical putc to render boot prints to the screen,
both when decompressing the kernel and for early boot stages (prom_putchar).
A UART requires extra hardware and soldering and is significantly more
difficult to install.

The frame buffer module is essentially complete. It can operate in two
distinct modes: console xor virtual mode. In console mode text is rendered
as textured tiles from local Graphics Synthesizer memory. This is very fast
and efficient in both memory and bandwidth. In virtual mode a memory buffer
is allocated in main memory and copied via DMA to the Graphics Synthesizer.
This enables mmap, but it is fairly inefficient and mostly useful for
compatibility.
YWRAP and other acceleration techniques are implemented too.

I have a separate device driver for the Graphics Synthesizer. Its interface,
the GIF, is serial and takes a streaming format of graphical hardware
primitives (points, triangles, sprites, etc.). A character device (such as
/dev/gs) would be ideal and the most efficient way to render graphics for
the R5900, especially if scatter-gather DMA is implemented as well.

I have many more ideas yet to explore and implement. :)

Fredrik

--- /dev/null
+++ b/arch/mips/ps2/irq.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PlayStation 2 Interrupt controller (INTC) IRQs
+ *
+ * Copyright (C) 2018 Fredrik Noring
+ */
+
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/ioport.h>
+#include <linux/module.h>
+#include <linux/types.h>
+
+#include <asm/bootinfo.h>
+#include <asm/io.h>
+#include <asm/irq_cpu.h>
+#include <asm/mipsregs.h>
+
+#include <asm/mach-ps2/irq.h>
+#include <asm/mach-ps2/ps2.h>
+
+static inline void intc_reverse_mask(struct irq_data *data)
+{
+	outl(BIT(data->irq - IRQ_INTC), INTC_MASK);
+}
+
+static void intc_mask_ack(struct irq_data *data)
+{
+	const unsigned int bit = BIT(data->irq - IRQ_INTC);
+
+	outl(bit, INTC_MASK);
+	outl(bit, INTC_STAT);
+}
+
+#define INTC_IRQ_TYPE(irq_, name_)				\
+	{							\
+		.irq = irq_,					\
+		.irq_chip = {					\
+			.name = name_,				\
+			.irq_unmask = intc_reverse_mask,	\
+			.irq_mask = intc_reverse_mask,		\
+			.irq_mask_ack = intc_mask_ack,		\
+		}						\
+	}
+
+static struct {
+	unsigned int irq;
+	struct irq_chip irq_chip;
+} intc_irqs[] = {
+	INTC_IRQ_TYPE(IRQ_INTC_GS,     "INTC GS"),
+	INTC_IRQ_TYPE(IRQ_INTC_SBUS,   "INTC SBUS"),
+	INTC_IRQ_TYPE(IRQ_INTC_VB_ON,  "INTC VB on"),
+	INTC_IRQ_TYPE(IRQ_INTC_VB_OFF, "INTC VB off"),
+	INTC_IRQ_TYPE(IRQ_INTC_VIF0,   "INTC VIF0"),
+	INTC_IRQ_TYPE(IRQ_INTC_VIF1,   "INTC VIF1"),
+	INTC_IRQ_TYPE(IRQ_INTC_VU0,    "INTC VU0"),
+	INTC_IRQ_TYPE(IRQ_INTC_VU1,    "INTC VU1"),
+	INTC_IRQ_TYPE(IRQ_INTC_IPU,    "INTC IPU"),
+	INTC_IRQ_TYPE(IRQ_INTC_TIMER0, "INTC timer0"),
+	INTC_IRQ_TYPE(IRQ_INTC_TIMER1, "INTC timer1"),
+	INTC_IRQ_TYPE(IRQ_INTC_TIMER2, "INTC timer2"),
+	INTC_IRQ_TYPE(IRQ_INTC_TIMER3, "INTC timer3"),
+	INTC_IRQ_TYPE(IRQ_INTC_SFIFO,  "INTC SFIFO"),
+	INTC_IRQ_TYPE(IRQ_INTC_VU0WD,  "INTC VU0WD"),
+	INTC_IRQ_TYPE(IRQ_INTC_PGPU,   "INTC PGPU"),
+};
+
+static irqreturn_t intc_cascade(int irq, void *data)
+{
+	unsigned int pending, irq_intc;
+	irqreturn_t status = IRQ_NONE;
+
+	for (pending = inl(INTC_STAT); pending; pending &= ~BIT(irq_intc)) {
+		irq_intc = __fls(pending);
+
+		if (generic_handle_irq(irq_intc + IRQ_INTC) < 0)
+			spurious_interrupt();
+		else
+			status = IRQ_HANDLED;
+	}
+
+	return status;
+}
+
+static struct irqaction cascade_intc_irqaction = {
+	.name = "INTC cascade",
+	.handler = intc_cascade,
+};
+
+void __init arch_init_irq(void)
+{
+	int err;
+	int i;
+
+	mips_cpu_irq_init();
+
+	for (i = 0; i < ARRAY_SIZE(intc_irqs); i++)
+		irq_set_chip_and_handler(intc_irqs[i].irq,
+			&intc_irqs[i].irq_chip, handle_level_irq);
+
+	/* FIXME: Is HARDIRQS_SW_RESEND needed? Are these edge types needed? */
+	irq_set_irq_type(IRQ_INTC_GS, IRQ_TYPE_EDGE_FALLING);
+	irq_set_irq_type(IRQ_INTC_SBUS, IRQ_TYPE_EDGE_FALLING);
+	irq_set_irq_type(IRQ_INTC_VB_ON, IRQ_TYPE_EDGE_RISING);
+	irq_set_irq_type(IRQ_INTC_VB_OFF, IRQ_TYPE_EDGE_FALLING);
+
+	outl(inl(INTC_MASK), INTC_MASK);
+	outl(inl(INTC_STAT), INTC_STAT);
+
+	err = setup_irq(IRQ_C0_INTC, &cascade_intc_irqaction);
+	if (err)
+		printk(KERN_ERR "irq: Failed to setup INTC (err = %d).\n", err);
+}
+
+asmlinkage void plat_irq_dispatch(void)
+{
+	const unsigned int pending = read_c0_status() & read_c0_cause();
+
+	if (!(pending & (CAUSEF_IP2 | CAUSEF_IP3 | CAUSEF_IP7)))
+		return spurious_interrupt();
+
+	if (pending & CAUSEF_IP2)
+		do_IRQ(IRQ_C0_INTC);	/* INTC interrupt */
+	if (pending & CAUSEF_IP3)
+		do_IRQ(IRQ_C0_DMAC);	/* DMAC interrupt */
+	if (pending & CAUSEF_IP7)
+		do_IRQ(IRQ_C0_IRQ7);	/* Timer interrupt */
+}
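
The pending-bit loop in intc_cascade() above can be tried outside the kernel
with the following sketch. It is an approximation under stated assumptions:
highest_bit is a portable stand-in for __fls(), the irq_fn callback replaces
generic_handle_irq(), and record_irq, seen and seen_count exist only to make
the dispatch order observable. Error handling and spurious interrupts are
left out.

```c
#include <stdint.h>

/* Index of the most significant set bit; the caller guarantees word != 0. */
static unsigned int highest_bit(uint32_t word)
{
	unsigned int n = 0;

	while ((word >>= 1) != 0)
		n++;

	return n;
}

/* Hypothetical callback type standing in for generic_handle_irq(). */
typedef void (*irq_fn)(unsigned int irq);

/* Dispatch each pending line once, highest bit first; returns the count. */
static unsigned int dispatch_pending(uint32_t pending, unsigned int base,
				     irq_fn handle)
{
	unsigned int handled = 0;

	while (pending) {
		const unsigned int bit = highest_bit(pending);

		handle(base + bit);
		pending &= ~(UINT32_C(1) << bit);	/* Done with this line. */
		handled++;
	}

	return handled;
}

/* Example handler that records the order in which lines are dispatched. */
static unsigned int seen[32];
static unsigned int seen_count;

static void record_irq(unsigned int irq)
{
	seen[seen_count++] = irq;
}
```

Calling dispatch_pending(0x5, 16, record_irq) visits line 18 and then
line 16, since the most significant pending bit is served first.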

end of thread, 2018-06-18 18:52 UTC

Thread overview: 117+ messages
2017-08-27 13:23 [PATCH] MIPS: Add basic R5900 support Fredrik Noring
2017-08-28 13:53 ` Ralf Baechle
2017-08-28 17:11   ` Maciej W. Rozycki
2017-08-29 17:33   ` Fredrik Noring
2017-08-29 17:24 ` Maciej W. Rozycki
2017-08-29 17:24   ` Maciej W. Rozycki
2017-08-30 13:23   ` Fredrik Noring
2017-08-31 15:11     ` Maciej W. Rozycki
2017-08-31 15:11       ` Maciej W. Rozycki
2017-09-02 10:28   ` Fredrik Noring
2017-09-09 10:13     ` Maciej W. Rozycki
2017-09-09 10:13       ` Maciej W. Rozycki
2017-09-11  5:21       ` Maciej W. Rozycki
2017-09-11  5:21         ` Maciej W. Rozycki
2017-09-12 17:59         ` Fredrik Noring
2017-09-15 11:12           ` Maciej W. Rozycki
2017-09-15 11:12             ` Maciej W. Rozycki
2017-09-15 13:19             ` Fredrik Noring
2017-09-15 18:28               ` Maciej W. Rozycki
2017-09-15 18:28                 ` Maciej W. Rozycki
2017-09-02 14:10   ` [PATCH v2] " Fredrik Noring
2017-09-11  5:18     ` Maciej W. Rozycki
2017-09-11  5:18       ` Maciej W. Rozycki
2017-09-11 15:17       ` Fredrik Noring
2017-09-14 13:50         ` Maciej W. Rozycki
2017-09-14 13:50           ` Maciej W. Rozycki
2017-09-16 13:34           ` Fredrik Noring
2017-09-18 17:05             ` Maciej W. Rozycki
2017-09-18 17:05               ` Maciej W. Rozycki
2017-09-18 19:24               ` Fredrik Noring
2017-09-19 12:44                 ` Maciej W. Rozycki
2017-09-19 12:44                   ` Maciej W. Rozycki
2017-09-20 14:54                   ` Fredrik Noring
2017-09-26 11:50                     ` Maciej W. Rozycki
2017-09-26 11:50                       ` Maciej W. Rozycki
2017-09-27 17:21                       ` Fredrik Noring
2017-09-28 12:13                         ` Maciej W. Rozycki
2017-09-28 12:13                           ` Maciej W. Rozycki
2017-09-30  6:56                           ` Fredrik Noring
2017-10-02  9:05                             ` Maciej W. Rozycki
2017-10-02  9:05                               ` Maciej W. Rozycki
2017-10-02 16:33                               ` Fredrik Noring
2017-10-29 17:20                               ` Fredrik Noring
2017-11-10 23:34                                 ` Maciej W. Rozycki
2017-11-10 23:34                                   ` Maciej W. Rozycki
2017-11-11 16:04                                   ` Fredrik Noring
2018-01-29 20:27                                     ` Fredrik Noring
2018-01-31 23:01                                       ` Maciej W. Rozycki
2018-02-11  7:29                                         ` [RFC] MIPS: R5900: Workaround for the short loop bug Fredrik Noring
2018-02-12  9:25                                           ` Maciej W. Rozycki
2018-02-12 15:22                                             ` Fredrik Noring
2018-02-11  7:46                                         ` [RFC] MIPS: R5900: Use SYNC.L for data cache and SYNC.P for instruction cache Fredrik Noring
2018-02-11  7:56                                         ` [RFC] MIPS: R5900: Workaround exception NOP execution bug (FLX05) Fredrik Noring
2018-02-12  9:28                                           ` Maciej W. Rozycki
2018-02-15 19:15                                             ` [RFC v2] " Fredrik Noring
2018-02-15 20:49                                               ` Maciej W. Rozycki
2018-02-17 11:16                                                 ` Fredrik Noring
2018-02-17 11:57                                                   ` Maciej W. Rozycki
2018-02-17 13:38                                                     ` Fredrik Noring
2018-02-17 15:03                                                       ` Maciej W. Rozycki
2018-02-17 20:04                                                         ` Fredrik Noring
2018-02-20 14:09                                                           ` Maciej W. Rozycki
2018-02-22 17:04                                                             ` Fredrik Noring
2018-02-18  8:47                                                 ` Fredrik Noring
2018-02-20 14:41                                                   ` Maciej W. Rozycki
2018-02-22 17:27                                                     ` Fredrik Noring
2018-02-11  8:01                                         ` [RFC] MIPS: R5900: Workaround for CACHE instruction near branch delay slot Fredrik Noring
2018-02-11 11:16                                           ` Aw: " "Jürgen Urban"
2018-02-11  8:09                                         ` [RFC] MIPS: R5900: The ERET instruction has issues with delay slot and CACHE Fredrik Noring
2018-02-11 11:07                                           ` Aw: " "Jürgen Urban"
2018-02-11  8:29                                         ` [RFC] MIPS: R5900: Use mandatory SYNC.L in exception handlers Fredrik Noring
2018-02-11 10:33                                           ` Aw: " "Jürgen Urban"
2018-02-12  9:22                                             ` Maciej W. Rozycki
2018-02-12  9:22                                               ` Maciej W. Rozycki
2018-02-18 10:30                                               ` Fredrik Noring
2018-02-17 14:43                                         ` [RFC] MIPS: R5900: Workaround for saving and restoring FPU registers Fredrik Noring
2018-02-17 15:18                                           ` Maciej W. Rozycki
2018-02-17 17:47                                             ` Fredrik Noring
2018-02-17 19:33                                               ` Maciej W. Rozycki
2018-02-18  9:26                                         ` [RFC] MIPS: R5900: Workaround where MSB must be 0 for the instruction cache Fredrik Noring
2018-02-18 11:08                                         ` [RFC] MIPS: R5900: Add mandatory SYNC.P to all M[FT]C0 instructions Fredrik Noring
2018-03-03 12:26                                         ` [RFC] MIPS: PS2: Interrupt request (IRQ) support Fredrik Noring
2018-03-03 13:09                                           ` Maciej W. Rozycki
2018-03-03 14:14                                             ` Fredrik Noring
2018-04-09 15:51                                             ` Fredrik Noring
2018-03-18 10:45                                           ` Fredrik Noring
2018-03-19 19:15                                             ` Thomas Gleixner
2018-06-18 18:52                                             ` [RFC v2] " Fredrik Noring
2017-10-30 17:55                               ` [PATCH v2] MIPS: Add basic R5900 support Fredrik Noring
2017-11-24 10:26                                 ` Maciej W. Rozycki
2017-11-24 10:26                                   ` Maciej W. Rozycki
2017-11-24 10:39                                   ` Maciej W. Rozycki
2017-11-24 10:39                                     ` Maciej W. Rozycki
2017-09-20 14:07               ` Fredrik Noring
2017-09-21 21:07                 ` Maciej W. Rozycki
2017-09-21 21:07                   ` Maciej W. Rozycki
2017-09-22 16:37                   ` Fredrik Noring
2017-09-22 16:37                     ` Fredrik Noring
2017-09-29 23:55                     ` Maciej W. Rozycki
2017-09-29 23:55                       ` Maciej W. Rozycki
2017-09-30 18:26                       ` Fredrik Noring
2017-10-02  9:11                         ` Maciej W. Rozycki
2017-10-02  9:11                           ` Maciej W. Rozycki
2017-10-03 19:49                           ` Fredrik Noring
2017-10-05 19:04                             ` Fredrik Noring
2017-10-06 20:28                           ` Fredrik Noring
2017-10-15 16:39                             ` Fredrik Noring
2017-10-17 12:23                               ` Maciej W. Rozycki
2017-10-17 12:23                                 ` Maciej W. Rozycki
2017-10-21 18:00                                 ` Fredrik Noring
2017-10-23 16:10                                   ` Maciej W. Rozycki
2017-10-23 16:10                                     ` Maciej W. Rozycki
2017-09-21 18:11               ` Paul Burton
2017-09-21 18:11                 ` Paul Burton
2017-09-21 19:48                 ` Maciej W. Rozycki
2017-09-21 19:48                   ` Maciej W. Rozycki
2017-10-29 18:42       ` Fredrik Noring
