All of lore.kernel.org
 help / color / mirror / Atom feed
* mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
@ 2002-07-10 14:16 Jon Burgess
  2002-07-11  0:15 ` Ralf Baechle
  2002-07-11  7:34 ` Carsten Langgaard
  0 siblings, 2 replies; 40+ messages in thread
From: Jon Burgess @ 2002-07-10 14:16 UTC (permalink / raw)
  To: linux-mips



Symptom:
====
The linux mips 2.4.17 kernel compiled with gcc-2.96-110 (from H.J.Lu) hangs
before reaching the 'Calibrating delay loop'. When the same kernel is compiled
with gcc-3.0.4 or egcs-1.1.2 it works OK. I have included what I think is the
cause, some patches to test the theory and some possible fixes.

Cause:
====
I tracked the problem back to the CP0_STATUS being corrupted by the
mips32_flush_cache_all_pc routine, leading to a lockup once interrupts are
enabled. Looking at a disassembly of the code suggests the broken code changes
the value of the AT register while the working code leaves it alone. The
compiler is allowed to do this, but it exposes the real problem which appears to
be a problem between the 'cache' instruction of blast_icache() and the 'mfc0' of
the __restore_flags(). The 'mfc0 at, $12' seems to be ignored. This isn't a
problem with the gcc-3.0.4 code since AT still contains the value of CP0_STATUS
from the __save_and_cli at the start of the routine.

This may be caused by the cache routines running from the a cached kseg0, it
looks like it can be fixed by making sure that the are always called via
KSEG1ADDR(fn) which looks like it could be done with a bit of fiddling of the
setup_cache_funcs code. I have included a patch below which starts this, but I
haven't caught all combinations of how the routines are called.

Alternatively it could be a CP0 pipeline interaction of the cache instruction
and mfc0 but I can't find anything detailed about it. I thought this was the
problem initially and have included a patch below which adds an extra nop.

I believe the root of the problem is that the routines are running in kseg0, but
If anyone has any other ideas as to what could causing the problem then i'd be
glad to know.

You can test this by inserting some extra code to change AT between the save &
restore and see if it causes a problem (see included patches below)

Current source:
static inline void mips32_flush_cache_all_pc(void)
{
     unsigned long flags;

     __save_and_cli(flags);
     blast_dcache(); blast_icache();
     __restore_flags(flags);
}

Disassembly of  mips32_flush_cache_all_pc() for broken code gcc-2.96:
00000c30 <mips32_flush_cache_all_pc>:
 c30:   40066000        mfc0    a2,$12
 c34:   00000000        nop
 c38:   34c10001        ori     at,a2,0x1
 c3c:   38210001        xori    at,at,0x1
 c40:   40816000        mtc0    at,$12
 c44:   00000040        ssnop
 c48:   00000040        ssnop
 c4c:   00000040        ssnop
 c50:   3c030000        lui     v1,0x0
 c54:   8c630000        lw      v1,0(v1)
 c58:   3c048000        lui     a0,0x8000
 c5c:   3c018000        lui     at,0x8000
*** See here how the compiler has changed AT here
 c60:   00231821        addu    v1,at,v1
 c64:   0083102b        sltu    v0,a0,v1
 c68:   10400008        beqz    v0,c8c <mips32_flush_cache_all_pc+0x5c>
 c6c:   00000000        nop
 c70:   3c050000        lui     a1,0x0
 c74:   8ca50000        lw      a1,0(a1)
 c78:   bc810000        cache   0x1,0(a0)
 c7c:   00852021        addu    a0,a0,a1
 c80:   0083102b        sltu    v0,a0,v1
 c84:   1440fffc        bnez    v0,c78 <mips32_flush_cache_all_pc+0x48>
 c88:   00000000        nop
 c8c:   3c030000        lui     v1,0x0
 c90:   8c630000        lw      v1,0(v1)
 c94:   3c048000        lui     a0,0x8000
 c98:   3c018000        lui     at,0x8000
 c9c:   00231821        addu    v1,at,v1
 ca0:   0083102b        sltu    v0,a0,v1
 ca4:   10400008        beqz    v0,cc8 <mips32_flush_cache_all_pc+0x98>
 ca8:   00000000        nop
 cac:   3c050000        lui     a1,0x0
 cb0:   8ca50000        lw      a1,0(a1)
 cb4:   bc800000        cache   0x0,0(a0)
 cb8:   00852021        addu    a0,a0,a1
 cbc:   0083102b        sltu    v0,a0,v1
 cc0:   1440fffc        bnez    v0,cb4 <mips32_flush_cache_all_pc+0x84>
 cc4:   00000000        nop
 cc8:   40016000        mfc0    at,$12
*** The instruction above is the one which seems to be skipped.
 ccc:   30c60001        andi    a2,a2,0x1
 cd0:   34210001        ori     at,at,0x1
 cd4:   38210001        xori    at,at,0x1
 cd8:   00c13025        or      a2,a2,at
 cdc:   40866000        mtc0    a2,$12
        ...
 cec:   03e00008        jr      ra
 cf0:   00000000        nop


Patches to demonstrate the problem:
====
Here is a patch to change AT in the cache_flush routine to show that this
corrupts the CP0_STATUS value (for a mips32 processor with no secondary cache).
When this is applied in conjunction with the patch above you should see the
CP0_STATUS being corrupted and the kernel will probably hang. I have
demonstrated that this change is enough to break a working kernel/compiler
combination.

--- linux/arch/mips/mm/c-mips32.c-orig   Wed Jul 10 14:12:09 2002
+++ linux/arch/mips/mm/c-mips32.c  Wed Jul 10 14:18:17 2002
@@ -74,7 +74,9 @@
     unsigned long flags;

     __save_and_cli(flags);
-    blast_dcache(); blast_icache();
+    blast_dcache();
+    __asm__("lui\t$at, 0x8000\n\t");
+    blast_icache();
     __restore_flags(flags);
 }

Here is the patch that I used to catch the problem when CP0_STATUS is being
corrupted by the cache flush routine. The CP0_STATUS should not be changed by
the call to the cache flush routine.

--- linux/arch/mips/kernel/traps.c Thu May 23 15:19:35 2002
+++ ../traps.c Wed Jul 10 13:46:54 2002
@@ -889,7 +889,10 @@
     memcpy((void *)(KSEG0 + 0x80), &except_vec1_generic, 0x80);
     memcpy((void *)(KSEG0 + 0x100), &except_vec2_generic, 0x80);
     memcpy((void *)(KSEG0 + 0x180), &except_vec3_generic, 0x80);
+
+    printk("CP0_STATUS before flush = 0x%x\n",
read_32bit_cp0_register(CP0_STATUS));
     flush_icache_range(KSEG0 + 0x80, KSEG0 + 0x200);
+    printk("CP0_STATUS after flush  = 0x%x\n",
read_32bit_cp0_register(CP0_STATUS));
     /*
      * Setup default vectors
      */


Fix (to call cache routines via uncached Kseg1)
====
Assuming the root of the problem is that the cache flush routines are running
from cached kseg0. This patch needs a bit more work to make sure that all the
other routines are called similarly.

--- linux/arch/mips/mm/c-mips32.c-orig   Wed Jul 10 14:12:09 2002
+++ linux/arch/mips/mm/c-mips32.c  Wed Jul 10 14:45:03 2002
@@ -606,8 +608,13 @@
 {
     _clear_page = (void *)mips32_clear_page_dc;
     _copy_page = (void *)mips32_copy_page_dc;
+#if 1
+    _flush_cache_all =   (void (*)(u32,u32)) KSEG1ADDR((unsigned
long)mips32_flush_cache_all_pc);
+    ___flush_cache_all = (void (*)(u32,u32)) KSEG1ADDR((unsigned
long)mips32_flush_cache_all_pc);
+#else
     _flush_cache_all = mips32_flush_cache_all_pc;
     ___flush_cache_all = mips32_flush_cache_all_pc;
+#endif
     _flush_cache_mm = mips32_flush_cache_mm_pc;
     _flush_cache_range = mips32_flush_cache_range_pc;
     _flush_cache_page = mips32_flush_cache_page_pc;



Fix (If the root of the problem is a pipeline hazard)
====
The patch below fix is to insert an extra 'nop' at the end of the various
blast_[id]cache routines to clear the hazard condition before the code returns.
The 'sc' routines may need a similar fix. A different  workaround places a 'nop'
at the start of the __restore_flags routine, but I believe it is better to fix
the problem at the source of the hazard.

--- linux/include/asm-mips/mips32_cache.h     Wed Apr 10 22:53:12 2002
+++ ../mips32_cache.h    Wed Jul 10 13:10:40 2002
@@ -189,73 +189,85 @@
 static inline void blast_dcache(void)
 {
     unsigned long start = KSEG0;
     unsigned long end = (start + dcache_size);

     while(start < end) {
          cache_unroll(start,Index_Writeback_Inv_D);
          start += dc_lsize;
     }
+    /* Prevent hazard with following mfc0 */
+    __asm__("nop\n\t");
 }

 static inline void blast_dcache_page(unsigned long page)
 {
     unsigned long start = page;
     unsigned long end = (start + PAGE_SIZE);

     while(start < end) {
          cache_unroll(start,Hit_Writeback_Inv_D);
          start += dc_lsize;
     }
+    /* Prevent hazard with following mfc0 */
+        __asm__("nop\n\t");
 }

 static inline void blast_dcache_page_indexed(unsigned long page)
 {
     unsigned long start = page;
     unsigned long end = (start + PAGE_SIZE);

     while(start < end) {
          cache_unroll(start,Index_Writeback_Inv_D);
          start += dc_lsize;
     }
+    /* Prevent hazard with following mfc0 */
+        __asm__("nop\n\t");
 }

 static inline void blast_icache(void)
 {
     unsigned long start = KSEG0;
     unsigned long end = (start + icache_size);

     while(start < end) {
          cache_unroll(start,Index_Invalidate_I);
          start += ic_lsize;
     }
+    /* Prevent hazard with following mfc0 */
+        __asm__("nop\n\t");
 }

 static inline void blast_icache_page(unsigned long page)
 {
     unsigned long start = page;
     unsigned long end = (start + PAGE_SIZE);

     while(start < end) {
          cache_unroll(start,Hit_Invalidate_I);
          start += ic_lsize;
     }
+    /* Prevent hazard with following mfc0 */
+        __asm__("nop\n\t");
 }

 static inline void blast_icache_page_indexed(unsigned long page)
 {
     unsigned long start = page;
     unsigned long end = (start + PAGE_SIZE);

     while(start < end) {
          cache_unroll(start,Index_Invalidate_I);
          start += ic_lsize;
     }
+    /* Prevent hazard with following mfc0 */
+        __asm__("nop\n\t");
 }

 static inline void blast_scache(void)
 {
     unsigned long start = KSEG0;
     unsigned long end = KSEG0 + scache_size;

     while(start < end) {
          cache_unroll(start,Index_Writeback_Inv_SD);



     Jon Burgess

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-10 14:16 mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96 Jon Burgess
@ 2002-07-11  0:15 ` Ralf Baechle
  2002-07-11  8:48   ` Gleb O. Raiko
  2002-07-11  7:34 ` Carsten Langgaard
  1 sibling, 1 reply; 40+ messages in thread
From: Ralf Baechle @ 2002-07-11  0:15 UTC (permalink / raw)
  To: Jon Burgess; +Cc: linux-mips

On Wed, Jul 10, 2002 at 03:16:21PM +0100, Jon Burgess wrote:

> This may be caused by the cache routines running from the a cached kseg0, it
> looks like it can be fixed by making sure that the are always called via
> KSEG1ADDR(fn) which looks like it could be done with a bit of fiddling of the
> setup_cache_funcs code. I have included a patch below which starts this, but I
> haven't caught all combinations of how the routines are called.

While that could be done it's not a good idea; running code in KSEG1 is
very slow.

> Alternatively it could be a CP0 pipeline interaction of the cache instruction
> and mfc0 but I can't find anything detailed about it. I thought this was the
> problem initially and have included a patch below which adds an extra nop.

Running uncached is so slow that the pipeline will slip and stall basically
every cycle which should get you around the hazard.  Anyway, there's no
hazard for mfc0 documented in the MIPS32 spec so this smells like a silicon
bug.

Which particular CPU and revision are you using?

  Ralf

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-10 14:16 mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96 Jon Burgess
  2002-07-11  0:15 ` Ralf Baechle
@ 2002-07-11  7:34 ` Carsten Langgaard
  1 sibling, 0 replies; 40+ messages in thread
From: Carsten Langgaard @ 2002-07-11  7:34 UTC (permalink / raw)
  To: Jon Burgess; +Cc: linux-mips

This sound more like a hardware bug to me.
What CPU are you running on ?

/Carsten

Jon Burgess wrote:

> Symptom:
> ====
> The linux mips 2.4.17 kernel compiled with gcc-2.96-110 (from H.J.Lu) hangs
> before reaching the 'Calibrating delay loop'. When the same kernel is compiled
> with gcc-3.0.4 or egcs-1.1.2 it works OK. I have included what I think is the
> cause, some patches to test the theory and some possible fixes.
>
> Cause:
> ====
> I tracked the problem back to the CP0_STATUS being corrupted by the
> mips32_flush_cache_all_pc routine, leading to a lockup once interrupts are
> enabled. Looking at a disassembly of the code suggests the broken code changes
> the value of the AT register while the working code leaves it alone. The
> compiler is allowed to do this, but it exposes the real problem which appears to
> be a problem between the 'cache' instruction of blast_icache() and the 'mfc0' of
> the __restore_flags(). The 'mfc0 at, $12' seems to be ignored. This isn't a
> problem with the gcc-3.0.4 code since AT still contains the value of CP0_STATUS
> from the __save_and_cli at the start of the routine.
>
> This may be caused by the cache routines running from the a cached kseg0, it
> looks like it can be fixed by making sure that the are always called via
> KSEG1ADDR(fn) which looks like it could be done with a bit of fiddling of the
> setup_cache_funcs code. I have included a patch below which starts this, but I
> haven't caught all combinations of how the routines are called.
>
> Alternatively it could be a CP0 pipeline interaction of the cache instruction
> and mfc0 but I can't find anything detailed about it. I thought this was the
> problem initially and have included a patch below which adds an extra nop.
>
> I believe the root of the problem is that the routines are running in kseg0, but
> If anyone has any other ideas as to what could causing the problem then i'd be
> glad to know.
>
> You can test this by inserting some extra code to change AT between the save &
> restore and see if it causes a problem (see included patches below)
>
> Current source:
> static inline void mips32_flush_cache_all_pc(void)
> {
>      unsigned long flags;
>
>      __save_and_cli(flags);
>      blast_dcache(); blast_icache();
>      __restore_flags(flags);
> }
>
> Disassembly of  mips32_flush_cache_all_pc() for broken code gcc-2.96:
> 00000c30 <mips32_flush_cache_all_pc>:
>  c30:   40066000        mfc0    a2,$12
>  c34:   00000000        nop
>  c38:   34c10001        ori     at,a2,0x1
>  c3c:   38210001        xori    at,at,0x1
>  c40:   40816000        mtc0    at,$12
>  c44:   00000040        ssnop
>  c48:   00000040        ssnop
>  c4c:   00000040        ssnop
>  c50:   3c030000        lui     v1,0x0
>  c54:   8c630000        lw      v1,0(v1)
>  c58:   3c048000        lui     a0,0x8000
>  c5c:   3c018000        lui     at,0x8000
> *** See here how the compiler has changed AT here
>  c60:   00231821        addu    v1,at,v1
>  c64:   0083102b        sltu    v0,a0,v1
>  c68:   10400008        beqz    v0,c8c <mips32_flush_cache_all_pc+0x5c>
>  c6c:   00000000        nop
>  c70:   3c050000        lui     a1,0x0
>  c74:   8ca50000        lw      a1,0(a1)
>  c78:   bc810000        cache   0x1,0(a0)
>  c7c:   00852021        addu    a0,a0,a1
>  c80:   0083102b        sltu    v0,a0,v1
>  c84:   1440fffc        bnez    v0,c78 <mips32_flush_cache_all_pc+0x48>
>  c88:   00000000        nop
>  c8c:   3c030000        lui     v1,0x0
>  c90:   8c630000        lw      v1,0(v1)
>  c94:   3c048000        lui     a0,0x8000
>  c98:   3c018000        lui     at,0x8000
>  c9c:   00231821        addu    v1,at,v1
>  ca0:   0083102b        sltu    v0,a0,v1
>  ca4:   10400008        beqz    v0,cc8 <mips32_flush_cache_all_pc+0x98>
>  ca8:   00000000        nop
>  cac:   3c050000        lui     a1,0x0
>  cb0:   8ca50000        lw      a1,0(a1)
>  cb4:   bc800000        cache   0x0,0(a0)
>  cb8:   00852021        addu    a0,a0,a1
>  cbc:   0083102b        sltu    v0,a0,v1
>  cc0:   1440fffc        bnez    v0,cb4 <mips32_flush_cache_all_pc+0x84>
>  cc4:   00000000        nop
>  cc8:   40016000        mfc0    at,$12
> *** The instruction above is the one which seems to be skipped.
>  ccc:   30c60001        andi    a2,a2,0x1
>  cd0:   34210001        ori     at,at,0x1
>  cd4:   38210001        xori    at,at,0x1
>  cd8:   00c13025        or      a2,a2,at
>  cdc:   40866000        mtc0    a2,$12
>         ...
>  cec:   03e00008        jr      ra
>  cf0:   00000000        nop
>
> Patches to demonstrate the problem:
> ====
> Here is a patch to change AT in the cache_flush routine to show that this
> corrupts the CP0_STATUS value (for a mips32 processor with no secondary cache).
> When this is applied in conjunction with the patch above you should see the
> CP0_STATUS being corrupted and the kernel will probably hang. I have
> demonstrated that this change is enough to break a working kernel/compiler
> combination.
>
> --- linux/arch/mips/mm/c-mips32.c-orig   Wed Jul 10 14:12:09 2002
> +++ linux/arch/mips/mm/c-mips32.c  Wed Jul 10 14:18:17 2002
> @@ -74,7 +74,9 @@
>      unsigned long flags;
>
>      __save_and_cli(flags);
> -    blast_dcache(); blast_icache();
> +    blast_dcache();
> +    __asm__("lui\t$at, 0x8000\n\t");
> +    blast_icache();
>      __restore_flags(flags);
>  }
>
> Here is the patch that I used to catch the problem when CP0_STATUS is being
> corrupted by the cache flush routine. The CP0_STATUS should not be changed by
> the call to the cache flush routine.
>
> --- linux/arch/mips/kernel/traps.c Thu May 23 15:19:35 2002
> +++ ../traps.c Wed Jul 10 13:46:54 2002
> @@ -889,7 +889,10 @@
>      memcpy((void *)(KSEG0 + 0x80), &except_vec1_generic, 0x80);
>      memcpy((void *)(KSEG0 + 0x100), &except_vec2_generic, 0x80);
>      memcpy((void *)(KSEG0 + 0x180), &except_vec3_generic, 0x80);
> +
> +    printk("CP0_STATUS before flush = 0x%x\n",
> read_32bit_cp0_register(CP0_STATUS));
>      flush_icache_range(KSEG0 + 0x80, KSEG0 + 0x200);
> +    printk("CP0_STATUS after flush  = 0x%x\n",
> read_32bit_cp0_register(CP0_STATUS));
>      /*
>       * Setup default vectors
>       */
>
> Fix (to call cache routines via uncached Kseg1)
> ====
> Assuming the root of the problem is that the cache flush routines are running
> from cached kseg0. This patch needs a bit more work to make sure that all the
> other routines are called similarly.
>
> --- linux/arch/mips/mm/c-mips32.c-orig   Wed Jul 10 14:12:09 2002
> +++ linux/arch/mips/mm/c-mips32.c  Wed Jul 10 14:45:03 2002
> @@ -606,8 +608,13 @@
>  {
>      _clear_page = (void *)mips32_clear_page_dc;
>      _copy_page = (void *)mips32_copy_page_dc;
> +#if 1
> +    _flush_cache_all =   (void (*)(u32,u32)) KSEG1ADDR((unsigned
> long)mips32_flush_cache_all_pc);
> +    ___flush_cache_all = (void (*)(u32,u32)) KSEG1ADDR((unsigned
> long)mips32_flush_cache_all_pc);
> +#else
>      _flush_cache_all = mips32_flush_cache_all_pc;
>      ___flush_cache_all = mips32_flush_cache_all_pc;
> +#endif
>      _flush_cache_mm = mips32_flush_cache_mm_pc;
>      _flush_cache_range = mips32_flush_cache_range_pc;
>      _flush_cache_page = mips32_flush_cache_page_pc;
>
> Fix (If the root of the problem is a pipeline hazard)
> ====
> The patch below fix is to insert an extra 'nop' at the end of the various
> blast_[id]cache routines to clear the hazard condition before the code returns.
> The 'sc' routines may need a similar fix. A different  workaround places a 'nop'
> at the start of the __restore_flags routine, but I believe it is better to fix
> the problem at the source of the hazard.
>
> --- linux/include/asm-mips/mips32_cache.h     Wed Apr 10 22:53:12 2002
> +++ ../mips32_cache.h    Wed Jul 10 13:10:40 2002
> @@ -189,73 +189,85 @@
>  static inline void blast_dcache(void)
>  {
>      unsigned long start = KSEG0;
>      unsigned long end = (start + dcache_size);
>
>      while(start < end) {
>           cache_unroll(start,Index_Writeback_Inv_D);
>           start += dc_lsize;
>      }
> +    /* Prevent hazard with following mfc0 */
> +    __asm__("nop\n\t");
>  }
>
>  static inline void blast_dcache_page(unsigned long page)
>  {
>      unsigned long start = page;
>      unsigned long end = (start + PAGE_SIZE);
>
>      while(start < end) {
>           cache_unroll(start,Hit_Writeback_Inv_D);
>           start += dc_lsize;
>      }
> +    /* Prevent hazard with following mfc0 */
> +        __asm__("nop\n\t");
>  }
>
>  static inline void blast_dcache_page_indexed(unsigned long page)
>  {
>      unsigned long start = page;
>      unsigned long end = (start + PAGE_SIZE);
>
>      while(start < end) {
>           cache_unroll(start,Index_Writeback_Inv_D);
>           start += dc_lsize;
>      }
> +    /* Prevent hazard with following mfc0 */
> +        __asm__("nop\n\t");
>  }
>
>  static inline void blast_icache(void)
>  {
>      unsigned long start = KSEG0;
>      unsigned long end = (start + icache_size);
>
>      while(start < end) {
>           cache_unroll(start,Index_Invalidate_I);
>           start += ic_lsize;
>      }
> +    /* Prevent hazard with following mfc0 */
> +        __asm__("nop\n\t");
>  }
>
>  static inline void blast_icache_page(unsigned long page)
>  {
>      unsigned long start = page;
>      unsigned long end = (start + PAGE_SIZE);
>
>      while(start < end) {
>           cache_unroll(start,Hit_Invalidate_I);
>           start += ic_lsize;
>      }
> +    /* Prevent hazard with following mfc0 */
> +        __asm__("nop\n\t");
>  }
>
>  static inline void blast_icache_page_indexed(unsigned long page)
>  {
>      unsigned long start = page;
>      unsigned long end = (start + PAGE_SIZE);
>
>      while(start < end) {
>           cache_unroll(start,Index_Invalidate_I);
>           start += ic_lsize;
>      }
> +    /* Prevent hazard with following mfc0 */
> +        __asm__("nop\n\t");
>  }
>
>  static inline void blast_scache(void)
>  {
>      unsigned long start = KSEG0;
>      unsigned long end = KSEG0 + scache_size;
>
>      while(start < end) {
>           cache_unroll(start,Index_Writeback_Inv_SD);
>
>      Jon Burgess

--
_    _ ____  ___   Carsten Langgaard   Mailto:carstenl@mips.com
|\  /|||___)(___   MIPS Denmark        Direct: +45 4486 5527
| \/ |||    ____)  Lautrupvang 4B      Switch: +45 4486 5555
  TECHNOLOGIES     2750 Ballerup       Fax...: +45 4486 5556
                   Denmark             http://www.mips.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11  0:15 ` Ralf Baechle
@ 2002-07-11  8:48   ` Gleb O. Raiko
  2002-07-11  9:18     ` Carsten Langgaard
  0 siblings, 1 reply; 40+ messages in thread
From: Gleb O. Raiko @ 2002-07-11  8:48 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Jon Burgess, linux-mips

Ralf Baechle wrote:
> 
> On Wed, Jul 10, 2002 at 03:16:21PM +0100, Jon Burgess wrote:
> 
> > This may be caused by the cache routines running from the a cached kseg0, it
> > looks like it can be fixed by making sure that the are always called via
> > KSEG1ADDR(fn) which looks like it could be done with a bit of fiddling of the
> > setup_cache_funcs code. I have included a patch below which starts this, but I
> > haven't caught all combinations of how the routines are called.
> 
> While that could be done it's not a good idea; running code in KSEG1 is
> very slow.
> 

Unfortunately, it's required by manuals for some processors. As I know,
IDT HW manuals clearly state cache flush routines must operate from
uncached code and must access uncached data only. Examples are R30x1 CPU
series.

Regards,
Gleb.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11  8:48   ` Gleb O. Raiko
@ 2002-07-11  9:18     ` Carsten Langgaard
  2002-07-11 10:06       ` Gleb O. Raiko
  0 siblings, 1 reply; 40+ messages in thread
From: Carsten Langgaard @ 2002-07-11  9:18 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: Ralf Baechle, Jon Burgess, linux-mips

"Gleb O. Raiko" wrote:

> Ralf Baechle wrote:
> >
> > On Wed, Jul 10, 2002 at 03:16:21PM +0100, Jon Burgess wrote:
> >
> > > This may be caused by the cache routines running from the a cached kseg0, it
> > > looks like it can be fixed by making sure that the are always called via
> > > KSEG1ADDR(fn) which looks like it could be done with a bit of fiddling of the
> > > setup_cache_funcs code. I have included a patch below which starts this, but I
> > > haven't caught all combinations of how the routines are called.
> >
> > While that could be done it's not a good idea; running code in KSEG1 is
> > very slow.
> >
>
> Unfortunately, it's required by manuals for some processors. As I know,
> IDT HW manuals clearly state cache flush routines must operate from
> uncached code and must access uncached data only. Examples are R30x1 CPU
> series.

Yes, that's true.
But that code belongs in the R30xx specific cache routines, not in the MIPS32 cache
routines.

>
> Regards,
> Gleb.

--
_    _ ____  ___   Carsten Langgaard   Mailto:carstenl@mips.com
|\  /|||___)(___   MIPS Denmark        Direct: +45 4486 5527
| \/ |||    ____)  Lautrupvang 4B      Switch: +45 4486 5555
  TECHNOLOGIES     2750 Ballerup       Fax...: +45 4486 5556
                   Denmark             http://www.mips.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11  9:18     ` Carsten Langgaard
@ 2002-07-11 10:06       ` Gleb O. Raiko
  2002-07-11 10:15         ` Carsten Langgaard
                           ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Gleb O. Raiko @ 2002-07-11 10:06 UTC (permalink / raw)
  To: Carsten Langgaard; +Cc: Ralf Baechle, Jon Burgess, linux-mips

Carsten Langgaard wrote:
> > Unfortunately, it's required by manuals for some processors. As I know,
> > IDT HW manuals clearly state cache flush routines must operate from
> > uncached code and must access uncached data only. Examples are R30x1 CPU
> > series.
> 
> Yes, that's true.
> But that code belongs in the R30xx specific cache routines, not in the MIPS32 cache
> routines.

I don't wonder if other IDT CPUs also require this, including those that
conform MIPS32.
Basically, requirement of uncached run makes hadrware logic much simpler
and allows  to save silicon a bit.

Regards,
Gleb.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 10:06       ` Gleb O. Raiko
@ 2002-07-11 10:15         ` Carsten Langgaard
  2002-07-11 10:36           ` Gleb O. Raiko
  2002-07-11 11:12         ` Ralf Baechle
  2002-07-11 11:23         ` Maciej W. Rozycki
  2 siblings, 1 reply; 40+ messages in thread
From: Carsten Langgaard @ 2002-07-11 10:15 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: Ralf Baechle, Jon Burgess, linux-mips

"Gleb O. Raiko" wrote:

> Carsten Langgaard wrote:
> > > Unfortunately, it's required by manuals for some processors. As I know,
> > > IDT HW manuals clearly state cache flush routines must operate from
> > > uncached code and must access uncached data only. Examples are R30x1 CPU
> > > series.
> >
> > Yes, that's true.
> > But that code belongs in the R30xx specific cache routines, not in the MIPS32 cache
> > routines.
>
> I don't wonder if other IDT CPUs also require this, including those that
> conform MIPS32.
> Basically, requirement of uncached run makes hadrware logic much simpler
> and allows  to save silicon a bit.

That could be true, but then again I suggest making specific cache routines for those
CPUs.
It would be a real performance hit for the rest of us, if we have to operate from
uncached space.


>
> Regards,
> Gleb.

--
_    _ ____  ___   Carsten Langgaard   Mailto:carstenl@mips.com
|\  /|||___)(___   MIPS Denmark        Direct: +45 4486 5527
| \/ |||    ____)  Lautrupvang 4B      Switch: +45 4486 5555
  TECHNOLOGIES     2750 Ballerup       Fax...: +45 4486 5556
                   Denmark             http://www.mips.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 10:15         ` Carsten Langgaard
@ 2002-07-11 10:36           ` Gleb O. Raiko
  2002-07-11 10:46             ` Carsten Langgaard
  0 siblings, 1 reply; 40+ messages in thread
From: Gleb O. Raiko @ 2002-07-11 10:36 UTC (permalink / raw)
  To: Carsten Langgaard; +Cc: Ralf Baechle, Jon Burgess, linux-mips

Carsten Langgaard wrote:
> 
> "Gleb O. Raiko" wrote:
> > Basically, requirement of uncached run makes hadrware logic much simpler
> > and allows  to save silicon a bit.
> 
> That could be true, but then again I suggest making specific cache routines for those
> CPUs.
> It would be a real performance hit for the rest of us, if we have to operate from
> uncached space.
> 

In theory, yes, there is a performance penalty. In practice, I doubt
this penalty is significant. Sure, Linux likes to flush cahces, not to
say more. But, did somebody measure the penalty of uncached runs? Even
with microbencnmarks like lmbench.

Regards,
Gleb.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 10:36           ` Gleb O. Raiko
@ 2002-07-11 10:46             ` Carsten Langgaard
  0 siblings, 0 replies; 40+ messages in thread
From: Carsten Langgaard @ 2002-07-11 10:46 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: Ralf Baechle, Jon Burgess, linux-mips

"Gleb O. Raiko" wrote:

> Carsten Langgaard wrote:
> >
> > "Gleb O. Raiko" wrote:
> > > Basically, requirement of uncached run makes hadrware logic much simpler
> > > and allows  to save silicon a bit.
> >
> > That could be true, but then again I suggest making specific cache routines for those
> > CPUs.
> > It would be a real performance hit for the rest of us, if we have to operate from
> > uncached space.
> >
>
> In theory, yes, there is a performance penalty. In practice, I doubt
> this penalty is significant. Sure, Linux likes to flush cahces, not to
> say more. But, did somebody measure the penalty of uncached runs? Even
> with microbencnmarks like lmbench.

Yes, I have tried running linux this way, because I wanted to eliminate the reason I sow
cache problems on one of our tests chip, was due to execute the cache operating
instruction from cached space.
I didn't thought it was that big a penalty, because you are flushing the cache anyway, but
I didn't had to run any benchmarks, so obviously was it when I booted my system.


>
> Regards,
> Gleb.

--
_    _ ____  ___   Carsten Langgaard   Mailto:carstenl@mips.com
|\  /|||___)(___   MIPS Denmark        Direct: +45 4486 5527
| \/ |||    ____)  Lautrupvang 4B      Switch: +45 4486 5555
  TECHNOLOGIES     2750 Ballerup       Fax...: +45 4486 5556
                   Denmark             http://www.mips.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 10:06       ` Gleb O. Raiko
  2002-07-11 10:15         ` Carsten Langgaard
@ 2002-07-11 11:12         ` Ralf Baechle
  2002-07-11 17:01           ` Maciej W. Rozycki
  2002-07-11 11:23         ` Maciej W. Rozycki
  2 siblings, 1 reply; 40+ messages in thread
From: Ralf Baechle @ 2002-07-11 11:12 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: Carsten Langgaard, Jon Burgess, linux-mips

On Thu, Jul 11, 2002 at 02:06:30PM +0400, Gleb O. Raiko wrote:

> > > Unfortunately, it's required by manuals for some processors. As I know,
> > > IDT HW manuals clearly state cache flush routines must operate from
> > > uncached code and must access uncached data only. Examples are R30x1 CPU
> > > series.
> > 
> > Yes, that's true.
> > But that code belongs in the R30xx specific cache routines, not in the
> > MIPS32 cache routines.
> 
> I don't wonder if other IDT CPUs also require this, including those that
> conform MIPS32.
> Basically, requirement of uncached run makes hadrware logic much simpler
> and allows  to save silicon a bit.

The R3000 cache manipulation mechanism is implemented by giving magic
meaning to store instruction while the isolate cache and swap cache bits
are in use.  By their very implementation they're both incompatible with
normal operation of caches and therefore can only be used from uncached
space.

For most part of it the situation for R4000 class CPUs is easier; a cache
instruction either hits or fails.  The only problem I could imagine an
access conflict in the i-cache when both normal instruction fetch and the
cache instruction are going to operate on the i-cache and that sounds like
a less likely problem to me.

Of course not having read the RTL of all CPUs there is a bit of speculation
here :)

  Ralf

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 10:06       ` Gleb O. Raiko
  2002-07-11 10:15         ` Carsten Langgaard
  2002-07-11 11:12         ` Ralf Baechle
@ 2002-07-11 11:23         ` Maciej W. Rozycki
  2002-07-11 13:11           ` Gleb O. Raiko
  2 siblings, 1 reply; 40+ messages in thread
From: Maciej W. Rozycki @ 2002-07-11 11:23 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: Carsten Langgaard, Ralf Baechle, Jon Burgess, linux-mips

On Thu, 11 Jul 2002, Gleb O. Raiko wrote:

> > But that code belongs in the R30xx specific cache routines, not in the MIPS32 cache
> > routines.
> 
> I don't wonder if other IDT CPUs also require this, including those that
> conform MIPS32.

 Well, for r3k it may seem somewhat justified as cache flushing requires
cache isolation.  But the IDT manual for their whole family of processors
claims the D-cache can function as an I-cache (when swapped; doesn't
apply when not, obviously) and cache flushing can run from KSEG0.

 See "IDT MIPS Microprocessor Family Software Reference Manual", chapter 5
"Cache Management", section "Invalidation":

 "To invalidate the cache in the R30xx:
[...]
 The invalidate routine is normally executed with its instructions
cacheable.  This sounds like a lot of trouble; but in fact shouldnt
require any extra steps to run cached. An invalidation routine in uncached
space will run 4-10 times slower."

That's right as the IsC bit only isolates the D-cache (either the real one
or the I-cache, when swapped) and not the I-cache, so no need to waste
cycles running uncached as the I-cache works normally. 

> Basically, requirement of uncached run makes hadrware logic much simpler
> and allows  to save silicon a bit.

 Why?  I see no dependency.  What's the problem with interleaving cache
fills and invalidations?

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 11:23         ` Maciej W. Rozycki
@ 2002-07-11 13:11           ` Gleb O. Raiko
  2002-07-11 13:41             ` Maciej W. Rozycki
  0 siblings, 1 reply; 40+ messages in thread
From: Gleb O. Raiko @ 2002-07-11 13:11 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

"Maciej W. Rozycki" wrote:
> 
> On Thu, 11 Jul 2002, Gleb O. Raiko wrote:
> > I don't wonder if other IDT CPUs also require this, including those that
> > conform MIPS32.
> 
>  Well, for r3k it may seem somewhat justified as cache flushing requires
> cache isolation.  But the IDT manual for their whole family of processors
> claims the D-cache can function as an I-cache (when swapped; doesn't
> apply when not, obviously) and cache flushing can run from KSEG0.
> 
>  See "IDT MIPS Microprocessor Family Software Reference Manual", chapter 5
> "Cache Management", section "Invalidation":
> 
>  "To invalidate the cache in the R30xx:
> [...]
>  The invalidate routine is normally executed with its instructions
> cacheable.  This sounds like a lot of trouble; but in fact shouldnt
> require any extra steps to run cached. An invalidation routine in uncached
> space will run 4-10 times slower."
> 

Aha, you also stepped on this rake. :-) The problem with IDT manuals
that they frequently contradict itself. You're right, SW manual allows
cached flushes, but hardware manuals for the family prohibit this and
state that flashes must be uncahed.
(a hw manual on family, the same chapter, the same section :-) )

It's not only the place where IDT manuals are wrong. For example, their
wbflush example suggests *(int*)KSEG0 instead *(int*)KSEG1.

> > Basically, requirement of uncached run makes hadrware logic much simpler
> > and allows  to save silicon a bit.
> 
>  Why?  I see no dependency.  What's the problem with interleaving cache
> fills and invalidations?

There're two possible optimization:
1. (Requires only the instruction that swaps caches must run uncached)
	CPU may skip implementation of double check of cache hit on loads.
	Scenario: mtc0 with cache swapping with ensuring next instructions are
in cache
	(pipelining here!); swap occurs; must check again the instructions are
in 
	the cache because the same cacheline in the data cache may have valid
bit set
	and CPU will get data instead of code.
2. (Requires the whole routine must run uncached)
	CPU may skip check of cache hit on loads from an isolated cache. 

i don't know what optimization IDT made, perhaps, number 3. But, 1. is
really worth to implement.

Regards,
Gleb.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 13:11           ` Gleb O. Raiko
@ 2002-07-11 13:41             ` Maciej W. Rozycki
  2002-07-11 15:27               ` Gleb O. Raiko
  0 siblings, 1 reply; 40+ messages in thread
From: Maciej W. Rozycki @ 2002-07-11 13:41 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: linux-mips

On Thu, 11 Jul 2002, Gleb O. Raiko wrote:

> Aha, you also stepped on this rake. :-) The problem with IDT manuals
> that they frequently contradict itself. You're right, SW manual allows
> cached flushes, but hardware manuals for the family prohibit this and
> state that flashes must be uncahed.
> (a hw manual on family, the same chapter, the same section :-) )

 Wonderful...  Add their non-existent support to that.  I'm afraid I'll
have to ignore problem reports which involve their processors. :-(

> >  Why?  I see no dependency.  What's the problem with interleaving cache
> > fills and invalidations?
> 
> There're two possible optimization:
> 1. (Requires only the instruction that swaps caches must run uncached)
> 	CPU may skip implementation of double check of cache hit on loads.
> 	Scenario: mtc0 with cache swapping with ensuring next instructions are
> in cache
> 	(pipelining here!); swap occurs; must check again the instructions are
> in 
> 	the cache because the same cacheline in the data cache may have valid
> bit set
> 	and CPU will get data instead of code.

 I can't really see a problem here for proper implementations.  The CPU
may have fetched a few instructions beyond the mtc0 doing a cache swap.
It's OK since we didn't modify the code.  As long as the swap doesn't
complete, the CPU is using the real I-cache.  Once it's completed, it uses
the D-cache.  Since the new cache is used in the normal mode of operation,
now tag matches and line replacements occur here as if it was the real
I-cache.  No need to do any extra checks at any stage. 

> 2. (Requires the whole routine must run uncached)
> 	CPU may skip check of cache hit on loads from an isolated cache. 

 But the other cache isn't isolated -- IsC only works on the cache that
plays the role of the D-cache. 

> i don't know what optimization IDT made, perhaps, number 3. But, 1. is
> really worth to implement.

 It's possible they broke something, simply. 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 13:41             ` Maciej W. Rozycki
@ 2002-07-11 15:27               ` Gleb O. Raiko
  2002-07-11 15:59                 ` Maciej W. Rozycki
  0 siblings, 1 reply; 40+ messages in thread
From: Gleb O. Raiko @ 2002-07-11 15:27 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

> >
> > There're two possible optimization:
> > 1. (Requires only the instruction that swaps caches must run uncached)
> >       CPU may skip implementation of double check of cache hit on loads.
> >       Scenario: mtc0 with cache swapping with ensuring next instructions are
> > in cache
> >       (pipelining here!); swap occurs; must check again the instructions are
> > in
> >       the cache because the same cacheline in the data cache may have valid
> > bit set
> >       and CPU will get data instead of code.
> 
>  I can't really see a problem here for proper implementations.  The CPU
> may have fetched a few instructions beyond the mtc0 doing a cache swap.

Load from memory into I-cache, setting the valid bit.

> It's OK since we didn't modify the code.  As long as the swap doesn't
> complete, the CPU is using the real I-cache.  Once it's completed, it uses
> the D-cache.  Since the new cache is used in the normal mode of operation,
> now tag matches and line replacements occur here as if it was the real
> I-cache.  No need to do any extra checks at any stage.

Have to check the cacheline at given address again. D-cache may have the
valid bit set for the cacheline at the same address. Address means
location in a cache, not memory. Check at address requires one extra
tick as opposed to checking the bit.

Please, note that CPU isn't a monolitic program, but rather a set of
functional blocks, so "proper implementation" may require additional
signals on wires and delays.

> 
>  It's possible they broke something, simply.

My guess they implemented No. 1. more or less.

Anybody from IDT here with strong willing to broke NDA ? :-)

Regards,
Gleb.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 15:27               ` Gleb O. Raiko
@ 2002-07-11 15:59                 ` Maciej W. Rozycki
  2002-07-12 10:26                   ` Gleb O. Raiko
  0 siblings, 1 reply; 40+ messages in thread
From: Maciej W. Rozycki @ 2002-07-11 15:59 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: linux-mips

On Thu, 11 Jul 2002, Gleb O. Raiko wrote:

> Have to check the cacheline at given address again. D-cache may have the
> valid bit set for the cacheline at the same address. Address means
> location in a cache, not memory. Check at address requires one extra
> tick as opposed to checking the bit.

 Well, you issue an instruction word read from the cache.  The answer is
either a hit, providing a word at the data bus at the same time (so you
can't get a hit from one cache and data from the other) or a miss with no
valid data -- you have to stall in this case, waiting for a refill.  Then
when data from the main memory arrives, it is latched in the cache (it
doesn't really matter, which one now -- if it's the wrong one, then
another refill will happen next time the memory address is dereferenced)
and provided to the CPU at the same time. 

> Please, note that CPU isn't a monolitic program, but rather a set of
> functional blocks, so "proper implementation" may require additional
> signals on wires and delays.

 Some kind of synchronization is needed as everywhere in the CPU, as it's
mostly a synchronous circuit.  As long as the state at clock edges is
consistent, there is no problem with cache swapping.  That's what I mean
by a "proper implementation".

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 11:12         ` Ralf Baechle
@ 2002-07-11 17:01           ` Maciej W. Rozycki
  2002-07-11 23:02             ` Dominic Sweetman
  0 siblings, 1 reply; 40+ messages in thread
From: Maciej W. Rozycki @ 2002-07-11 17:01 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Gleb O. Raiko, Carsten Langgaard, Jon Burgess, linux-mips

On Thu, 11 Jul 2002, Ralf Baechle wrote:

> The R3000 cache manipulation mechanism is implemented by giving magic
> meaning to store instruction while the isolate cache and swap cache bits
> are in use.  By their very implementation they're both incompatible with
> normal operation of caches and therefore can only be used from uncached
> space.

 Well, docs state only the cache that acts as the D-cache gets isolated
and the one that acts as the I-cache always functions normally (and the
real D-cache has all the logic needed to pretend it's an I-cache
successfully).  Thus running from an uncached space is not needed.  I
haven't checked it explicitly, but the flushing functions would fail
(hang) quite soon otherwise and they don't.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 17:01           ` Maciej W. Rozycki
@ 2002-07-11 23:02             ` Dominic Sweetman
  2002-07-12  0:24               ` Ralf Baechle
                                 ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Dominic Sweetman @ 2002-07-11 23:02 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Ralf Baechle, Gleb O. Raiko, Carsten Langgaard, Jon Burgess, linux-mips


>  Well, docs state only the cache that acts as the D-cache gets isolated
> and the one that acts as the I-cache always functions normally (and the
> real D-cache has all the logic needed to pretend it's an I-cache
> successfully).  Thus running from an uncached space is not needed.  I
> haven't checked it explicitly, but the flushing functions would fail
> (hang) quite soon otherwise and they don't.

I wrote the IDT software manual (I later used bits of it as the basis
of parts of "See MIPS Run".) Of course, every word of it is true.  I
can't comment on IDT's hardware manuals, though in my experience they
were somewhat better than the average.

Algorithmics produced cached routines to manipulate R3000-style caches
and they work reliably; what's more, the hardware specs say they
should.

Yes, it seems a bit strange to run with 'isolated'; but 'isolated'
really only stops loads and stores from reading/writing at the bus
interface.  It's use in cache functions was an accident; the original
R2000 caches did not support partial-word writes and could be
invalidated just by writing a byte to them.  When this was discovered
to be unacceptable, the 'isolated' bit (intended for diagnostics) was
pressed into service.

It is extraordinarily inefficient to run invalidate or writeback
routines uncached.  It will add perhaps 4-10 instruction fetches
(external memory reads) for each cache line invalidated; that's
probably double the memory overhead of the DMA operation which
necessitated the invalidation. That's something production-quality
code shouldn't do.

PS: my standard appeal.  When you say you 'flush' a cache do you mean
invalidate, write-back, or both?  If (as I suspect) not all of you
mean the same thing, should you not instead speak of 'invalidate' and
'writeback'... sloppy language surely leads to sloppy programming?

-- 
Dominic Sweetman
Algorithmics Ltd
The Fruit Farm, Ely Road, Chittering, CAMBS CB5 9PH, ENGLAND
phone +44 1223 706200/fax +44 1223 706250/direct +44 1223 706205
http://www.algor.co.uk

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 23:02             ` Dominic Sweetman
@ 2002-07-12  0:24               ` Ralf Baechle
  2002-07-12 10:37               ` Gleb O. Raiko
  2002-07-12 18:40               ` Maciej W. Rozycki
  2 siblings, 0 replies; 40+ messages in thread
From: Ralf Baechle @ 2002-07-12  0:24 UTC (permalink / raw)
  To: Dominic Sweetman
  Cc: Maciej W. Rozycki, Gleb O. Raiko, Carsten Langgaard, Jon Burgess,
	linux-mips

On Fri, Jul 12, 2002 at 12:02:27AM +0100, Dominic Sweetman wrote:

> PS: my standard appeal.  When you say you 'flush' a cache do you mean
> invalidate, write-back, or both?  If (as I suspect) not all of you
> mean the same thing, should you not instead speak of 'invalidate' and
> 'writeback'... sloppy language surely leads to sloppy programming?

I already had discussions with 68k people about this problem of
terminology.  It seems there is no unambigous terms for the whole
``cachology'' in the industry.

  Ralf

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 15:59                 ` Maciej W. Rozycki
@ 2002-07-12 10:26                   ` Gleb O. Raiko
  2002-07-12 19:02                     ` Maciej W. Rozycki
  0 siblings, 1 reply; 40+ messages in thread
From: Gleb O. Raiko @ 2002-07-12 10:26 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

"Maciej W. Rozycki" wrote:
> 
> On Thu, 11 Jul 2002, Gleb O. Raiko wrote:
> 
> > Have to check the cacheline at given address again. D-cache may have the
> > valid bit set for the cacheline at the same address. Address means
> > location in a cache, not memory. Check at address requires one extra
> > tick as opposed to checking the bit.
> 
>  Well, you issue an instruction word read from the cache.  The answer is
> either a hit, providing a word at the data bus at the same time (so you
> can't get a hit from one cache and data from the other) or a miss with no
> valid data -- you have to stall in this case, waiting for a refill.  

Let it be miss and stall.

>Then
> when data from the main memory arrives, it is latched in the cache (it
> doesn't really matter, which one now -- if it's the wrong one, then
> another refill will happen next time the memory address is dereferenced)
> and provided to the CPU at the same time.

At this time, CPU continues the execution of previous stalled
instruction. CPU knows the stalled instruction is in I-cache, but,
unfortunately, caches have been swapped already. The same cacheline in
the D-cache was valid bit set. CPU get data instead of code.

Regards,
Gleb.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 23:02             ` Dominic Sweetman
  2002-07-12  0:24               ` Ralf Baechle
@ 2002-07-12 10:37               ` Gleb O. Raiko
  2002-07-12 18:40               ` Maciej W. Rozycki
  2 siblings, 0 replies; 40+ messages in thread
From: Gleb O. Raiko @ 2002-07-12 10:37 UTC (permalink / raw)
  To: Dominic Sweetman
  Cc: Maciej W. Rozycki, Ralf Baechle, Carsten Langgaard, Jon Burgess,
	linux-mips

Dominic Sweetman wrote:
> PS: my standard appeal.  When you say you 'flush' a cache do you mean
> invalidate, write-back, or both?

I personally mean the routine has 'flush' in its name. So, 'to flush a
cache' just mens 'to call this routine'.

Considering the name belongs to the common code (as opposed to the arch
specific code), I doubt the name shall be changed.

Regards,
Gleb.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 23:02             ` Dominic Sweetman
  2002-07-12  0:24               ` Ralf Baechle
  2002-07-12 10:37               ` Gleb O. Raiko
@ 2002-07-12 18:40               ` Maciej W. Rozycki
  2 siblings, 0 replies; 40+ messages in thread
From: Maciej W. Rozycki @ 2002-07-12 18:40 UTC (permalink / raw)
  To: Dominic Sweetman
  Cc: Ralf Baechle, Gleb O. Raiko, Carsten Langgaard, Jon Burgess, linux-mips

On Fri, 12 Jul 2002, Dominic Sweetman wrote:

> PS: my standard appeal.  When you say you 'flush' a cache do you mean
> invalidate, write-back, or both?  If (as I suspect) not all of you
> mean the same thing, should you not instead speak of 'invalidate' and
> 'writeback'... sloppy language surely leads to sloppy programming?

 For me, a "flush" is both (i.e., as Gleb noticed, that's what functions
with the word in names do).  For a lone write back or invalidation, I
would use these terms respectively.

 Well, for the R3k the term is unambiguous anyway...

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-12 10:26                   ` Gleb O. Raiko
@ 2002-07-12 19:02                     ` Maciej W. Rozycki
  2002-07-15  9:16                       ` Gleb O. Raiko
  0 siblings, 1 reply; 40+ messages in thread
From: Maciej W. Rozycki @ 2002-07-12 19:02 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: linux-mips

On Fri, 12 Jul 2002, Gleb O. Raiko wrote:

> >  Well, you issue an instruction word read from the cache.  The answer is
> > either a hit, providing a word at the data bus at the same time (so you
> > can't get a hit from one cache and data from the other) or a miss with no
> > valid data -- you have to stall in this case, waiting for a refill.  
> 
> Let it be miss and stall.
> 
> >Then
> > when data from the main memory arrives, it is latched in the cache (it
> > doesn't really matter, which one now -- if it's the wrong one, then
> > another refill will happen next time the memory address is dereferenced)
> > and provided to the CPU at the same time.
> 
> At this time, CPU continues the execution of previous stalled

 We don't care of previous instructions.  The pipeline is stalled at the
intruction word fetch stage.  Previously fetched instructions continue
being processed until they leave the pipeline. 

> instruction. CPU knows the stalled instruction is in I-cache, but,
> unfortunately, caches have been swapped already. The same cacheline in
> the D-cache was valid bit set. CPU get data instead of code.

 Well, I certainly understand what you mean, from the beginning, actually,
but I still can't see why this would happen for a real implementation. 
When a cache miss happens an instruction word is read directly from the
main memory to the pipeline and a cache fill happens "accidentally".

 What you describe, would require a CPU to query a cache status somehow
during a fill (what if another fill is in progress? -- a cache controller
may perform a fill of additional lines itself as it happens in certain
implementations) and then issue a second read when the fill completes. 
That looks weird to me -- why would you design it this way? 

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-12 19:02                     ` Maciej W. Rozycki
@ 2002-07-15  9:16                       ` Gleb O. Raiko
  2002-07-16  9:00                         ` Maciej W. Rozycki
  0 siblings, 1 reply; 40+ messages in thread
From: Gleb O. Raiko @ 2002-07-15  9:16 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

"Maciej W. Rozycki" wrote:
>  Well, I certainly understand what you mean, from the beginning, actually,
> but I still can't see why this would happen for a real implementation.
> When a cache miss happens an instruction word is read directly from the
> main memory to the pipeline and a cache fill happens "accidentally".
> 
>  What you describe, would require a CPU to query a cache status somehow
> during a fill (what if another fill is in progress? -- a cache controller
> may perform a fill of additional lines itself as it happens in certain
> implementations) and then issue a second read when the fill completes.
> That looks weird to me -- why would you design it this way?
> 

A cache is filled in cachline units. There is a possibility to fill only
part of cacheline in one cache and to store the rest of data in another
cache on the switch. Both caches will set the valid bit on this
cacheline, but only part of cacheline is valid. In the worst cache, this
operation may load instructions that will be reused later, for example,
part of the main loop of the cache invalidation (sic!) routine.

Unfortunately, the behaviour depends on whether miss occurs, what
instructions are loaded, how they are aligned, and so on. It means, if
you get crash on this kernel version, you won't get a crash on another.
If you add debug routines, everything is OK. Other black magic tricks
are also here. (As you may guess, I explain my real experience here.
:-). Analyzer doesn't help, bus transactions look good.)

In order to avoid this, CPU shall either perform the check again or
freeze everything on the cache swap operation. The latter doesn't look
real. Anyway, it's a lot of additional unnatural logic. So, the
requirement to run swapping operation uncached looks reasonable.

Regards,
Gleb.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-15  9:16                       ` Gleb O. Raiko
@ 2002-07-16  9:00                         ` Maciej W. Rozycki
  2002-07-16 10:20                           ` Gleb O. Raiko
  0 siblings, 1 reply; 40+ messages in thread
From: Maciej W. Rozycki @ 2002-07-16  9:00 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: linux-mips

On Mon, 15 Jul 2002, Gleb O. Raiko wrote:

> A cache is filled in cachline units. There is a possibility to fill only
> part of cacheline in one cache and to store the rest of data in another
> cache on the switch. Both caches will set the valid bit on this
> cacheline, but only part of cacheline is valid. In the worst cache, this
> operation may load instructions that will be reused later, for example,
> part of the main loop of the cache invalidation (sic!) routine.

 Well, it looks possible for a CPU with cache lines wider than 32-bits
(are there any such R3k-class CPUs?), indeed.

> Unfortunately, the behaviour depends on whether miss occurs, what
> instructions are loaded, how they are aligned, and so on. It means, if
> you get crash on this kernel version, you won't get a crash on another.
> If you add debug routines, everything is OK. Other black magic tricks
> are also here. (As you may guess, I explain my real experience here.
> :-). Analyzer doesn't help, bus transactions look good.)

 How true -- I've seen such nastinesses, too. :-/  Except that I don't
have an analyzer.

> In order to avoid this, CPU shall either perform the check again or
> freeze everything on the cache swap operation. The latter doesn't look
> real. Anyway, it's a lot of additional unnatural logic. So, the
> requirement to run swapping operation uncached looks reasonable.

 Well, the simplest effective approach would be a third alternative, i.e. 
to make swapping happen only when no fill is in progress.  Trivial logic
with a single D latch on the swap signal should suffice -- I don't think
the save on omitting it is worth breaking architecture specs and
performance.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-16  9:00                         ` Maciej W. Rozycki
@ 2002-07-16 10:20                           ` Gleb O. Raiko
  2002-07-16 10:36                             ` Maciej W. Rozycki
  0 siblings, 1 reply; 40+ messages in thread
From: Gleb O. Raiko @ 2002-07-16 10:20 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: linux-mips

"Maciej W. Rozycki" wrote:
>  Well, it looks possible for a CPU with cache lines wider than 32-bits
> (are there any such R3k-class CPUs?), indeed.

Yes, IDT R3081 has 16-byte I-cacheline. It also may have 16-byte
D-cacheline, it depends on DB Refill set, may be set by wiring and in
software too. DB Refill is here to support burst reads modern DRAMs
have. 

> 
> > Unfortunately, the behaviour depends on whether miss occurs, what
> > instructions are loaded, how they are aligned, and so on. It means, if
> > you get crash on this kernel version, you won't get a crash on another.
> > If you add debug routines, everything is OK. Other black magic tricks
> > are also here. (As you may guess, I explain my real experience here.
> > :-). Analyzer doesn't help, bus transactions look good.)
> 
>  How true -- I've seen such nastinesses, too. :-/  Except that I don't
> have an analyzer.

Don't care, it doesn't help in such situations.

> 
> > In order to avoid this, CPU shall either perform the check again or
> > freeze everything on the cache swap operation. The latter doesn't look
> > real. Anyway, it's a lot of additional unnatural logic. So, the
> > requirement to run swapping operation uncached looks reasonable.
> 
>  Well, the simplest effective approach would be a third alternative, i.e.
> to make swapping happen only when no fill is in progress.  Trivial logic
> with a single D latch on the swap signal should suffice -- I don't think
> the save on omitting it is worth breaking architecture specs and
> performance.
> 

In two words, it's unclear when there are no fills. Too much situations,
additional stall condition (which may break spec anyway). I can't
present full explanation, sorry. You have to believe. :-)

BTW, I reread my R3081 HW Manual and found two intresting places about
cache operation:

"These mechanisms [cache sizing, cache flushing] are enabled through the
use of the ⌠IsC■ (Isolate Cache) and SwC (Swap Cache) bits of the status
register, which resides in the on-chip System Control Co-Processor
(CP0). Instructions which immediately precede and succeed these
operations must not be cacheable, so that the actual swapping/isolation
of the cache does not disrupt operation."

Note precede instructions.
Then, on cache sizeing:

"Cache Sizing

[Famous algorithm that we implement]

Note that this software should operate as uncached. Once this algorithm
is done, software should return the caches to their normal state by
performing either a complete cache flush or an invalidate of those cache
lines modified by the sizing algorithm."

Regards,
Gleb.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-16 10:20                           ` Gleb O. Raiko
@ 2002-07-16 10:36                             ` Maciej W. Rozycki
  0 siblings, 0 replies; 40+ messages in thread
From: Maciej W. Rozycki @ 2002-07-16 10:36 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: linux-mips

On Tue, 16 Jul 2002, Gleb O. Raiko wrote:

> In two words, it's unclear when there are no fills. Too much situations,
> additional stall condition (which may break spec anyway). I can't

 The cache logic somehow must know a fill is in progress.  Or at least
that it starts or finishes.  Another latch may store the state.

> present full explanation, sorry. You have to believe. :-)

 You don't have to convince me broken hardware is out there.  I am simply
trying to emphasize there is still some demand on good hardware, even if
marginally more expensive.

> BTW, I reread my R3081 HW Manual and found two intresting places about
> cache operation:
> 
> "These mechanisms [cache sizing, cache flushing] are enabled through the
> use of the “IsC” (Isolate Cache) and SwC (Swap Cache) bits of the status
> register, which resides in the on-chip System Control Co-Processor
> (CP0). Instructions which immediately precede and succeed these
> operations must not be cacheable, so that the actual swapping/isolation
> of the cache does not disrupt operation."

 So someone will need to code a set of cache handling functions with a
workaround for this deficiency if we are to support a system with this CPU
as it appears Linux-capable.

 Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
@ 2002-07-22  8:18 ` Sedjai, Mohamed
  0 siblings, 0 replies; 40+ messages in thread
From: Sedjai, Mohamed @ 2002-07-22  8:18 UTC (permalink / raw)
  To: Linux/MIPS Development; +Cc: Ralf Baechle, Gleb O. Raiko, carstenl

Thanks for your answer. However I still do not get the whole picture.
Here is my understanding:

Let say I have copied some code, call it CODE-1, from network into memory. 
Before I can execute CODE-1 , I need to flush the instruction cache, 
which obviously does not contain CODE-1. By the way, CODE-1 is likely 
to be present in D-Cache but this does not help so much. 

When Instruction cache flush is performed, all the I-Cache lines are
invalidated to force the core to fetch from main memory instead of I-cache.

Let's call the routine performing this operation CODE-INV. If CODE-INV is
running cached, then it is contained in some cache lines that we will call 
CODE-INV-LINES. CODE-INV is a loop that goes through all the cache lines and 
mark them as invalid.

At some point of this process, CODE-INV-LINES are invalidated but as CODE-INV 
goes on to the next lines, it is re-inserted into CODE-INV-LINES.

So when CODE-INV returns, all the I-Cache lines are marked Invalid except 
CODE-INV-LINES.

Is this correct ?

If it is why is this not causing problems ? Since there is a chance that 
CODE-1 contains code whose cache location is also CODE-INV-LINES 
and thus gets wrong instructions.

Regards,

Mohamed.


-----Original Message-----
From: Geert Uytterhoeven [mailto:geert@linux-m68k.org]
Sent: Freitag, 12. Juli 2002 11:27
To: Sedjai, Mohamed
Cc: Jon Burgess; Ralf Baechle; Gleb O. Raiko; Linux/MIPS Development;
carstenl@mips.com
Subject: RE: mips32_flush_cache routine corrupts CP0_STATUS with
gcc-2.96


On Fri, 12 Jul 2002, Sedjai, Mohamed wrote:
> If you run instruction cache flushing cached, then the cache will be dirty
> when the routine returns. At least the line(s) containing the routine itself ?
> Or am I missing something ?

Since the contents of the instruction cache are never changed (except by a
cache load), an instruction cache line can never become dirty.

Dirty cache lines and cache line write back are an exclusive privilege of write
back data caches.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
@ 2002-07-22  8:18 ` Sedjai, Mohamed
  0 siblings, 0 replies; 40+ messages in thread
From: Sedjai, Mohamed @ 2002-07-22  8:18 UTC (permalink / raw)
  To: Linux/MIPS Development; +Cc: Ralf Baechle, Gleb O. Raiko, carstenl

Thanks for your answer. However I still do not get the whole picture.
Here is my understanding:

Let say I have copied some code, call it CODE-1, from network into memory. 
Before I can execute CODE-1 , I need to flush the instruction cache, 
which obviously does not contain CODE-1. By the way, CODE-1 is likely 
to be present in D-Cache but this does not help so much. 

When Instruction cache flush is performed, all the I-Cache lines are
invalidated to force the core to fetch from main memory instead of I-cache.

Let's call the routine performing this operation CODE-INV. If CODE-INV is
running cached, then it is contained in some cache lines that we will call 
CODE-INV-LINES. CODE-INV is a loop that goes through all the cache lines and 
mark them as invalid.

At some point of this process, CODE-INV-LINES are invalidated but as CODE-INV 
goes on to the next lines, it is re-inserted into CODE-INV-LINES.

So when CODE-INV returns, all the I-Cache lines are marked Invalid except 
CODE-INV-LINES.

Is this correct ?

If it is why is this not causing problems ? Since there is a chance that 
CODE-1 contains code whose cache location is also CODE-INV-LINES 
and thus gets wrong instructions.

Regards,

Mohamed.


-----Original Message-----
From: Geert Uytterhoeven [mailto:geert@linux-m68k.org]
Sent: Freitag, 12. Juli 2002 11:27
To: Sedjai, Mohamed
Cc: Jon Burgess; Ralf Baechle; Gleb O. Raiko; Linux/MIPS Development;
carstenl@mips.com
Subject: RE: mips32_flush_cache routine corrupts CP0_STATUS with
gcc-2.96


On Fri, 12 Jul 2002, Sedjai, Mohamed wrote:
> If you run instruction cache flushing cached, then the cache will be dirty
> when the routine returns. At least the line(s) containing the routine itself ?
> Or am I missing something ?

Since the contents of the instruction cache are never changed (except by a
cache load), an instruction cache line can never become dirty.

Dirty cache lines and cache line write back are an exclusive privilege of write
back data caches.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
@ 2002-07-15  9:42 Jon Burgess
  0 siblings, 0 replies; 40+ messages in thread
From: Jon Burgess @ 2002-07-15  9:42 UTC (permalink / raw)
  To: Maciej W. Rozycki, Gleb O. Raiko; +Cc: linux-mips



On Fri, 12 Jul 2002, Gleb O. Raiko wrote:
> instruction. CPU knows the stalled instruction is in I-cache, but,
> unfortunately, caches have been swapped already. The same cacheline in
> the D-cache was valid bit set. CPU get data instead of code.

I'm glad this cache swapping mess is not necessary for mips32 chips. I imagine
that when the caches are swapped the instruction fetch will examine the D-cache
so it will not 'know' the instruction is in the I-cache. If the address is
present in both caches then the content should be the same. If the cache
routines have some self-modifying code then this really is asking for trouble.

     Jon

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
@ 2002-07-12 15:24 Jon Burgess
  0 siblings, 0 replies; 40+ messages in thread
From: Jon Burgess @ 2002-07-12 15:24 UTC (permalink / raw)
  To: Carsten Langgaard; +Cc: Gleb O. Raiko, Ralf Baechle, linux-mips



>One of the things it fix, is a typo, which will hit you because you have
different
>size of the I- and D-caches.
>But it also fix, that in a lot of the cache routine, we only flush one of the
ways.
>
>Please see the attached patch, I think Ralf didn't liked it, because it wasn't
>optimized for speed, but at least it's better than what we got.
>
>/Carsten

The typo was fixed in the code which Broadcom supplied to us. I noticed this was
not fixed in the mainstream code and was meaning to submit a patch for it. I
will check out the other changes next week. Thanks for the patch.

     Jon

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-12  9:08 ` Sedjai, Mohamed
  (?)
  (?)
@ 2002-07-12  9:40 ` Carsten Langgaard
  -1 siblings, 0 replies; 40+ messages in thread
From: Carsten Langgaard @ 2002-07-12  9:40 UTC (permalink / raw)
  To: Sedjai, Mohamed; +Cc: Jon Burgess, Ralf Baechle, Gleb O. Raiko, linux-mips

"Sedjai, Mohamed" wrote:

> Hello everybody,
>
> I've been following this thread with much attention and interest,
> and I would like to give my small contribution, even though my expertise
> is far lower than yours.
>
> Our TX49 (R4000 based) manual also states that "if the CACHE instruction
> is issued for the line in which this instruction exists the operation
> is not guaranted". As you can see from the arch/mips/mm/r4xx0.c file,
> TX49 routines always disable caches before operating.
>
> Recently, one of our customer, raised the question since he was comparing
> performance between TX49 and another comparable MIPS architecture.
> He noticed a huge difference in favor of the other vendor.
> For information it was a multimedia application. Investigations
> showed that the other vendor was running cache flushing operations cached.
> He tried to also run TX49 cached and, miracle, TX49 performed much better
> than the other chip. And the application could run for hours without
> crashing.
>
> I have contacted our designers, and the answer I got so far is that a problem can
> occur depending on the alignement of the CACHE instructions and on the set
> in which they are located (TX49 cache is 4 way set). This confirms Jon's
> investigation. Carsten, can you comment this, as a MIPS insider ? Which
> CPUs are concerned ?

I can only speak for our own products (4Kc serie, 5Kc serie and 20Kc).
They have no problem with running the cache routine from cached space.

"Index load/store Tag" cache operation instructions, however should be executed from
uncached space, but they are not used in linux.

>
> Further investigation are now ongoing to find a proper workaround and thus suggestions
> are highly apreciated.
>
> >From my side I have a very simple question:
>
> If you run instruction cache flushing cached, then the cache will be dirty
> when the routine returns. At least the line(s) containing the routine itself ?
> Or am I missing something ?
>
> Best Regards,
>
> Mohamed SEDJAI
> TOSHIBA Electronics Europe
>
> PS: Sorry Dominic for a possible misusage of the terminology. BTW I found your book
> wonderfully well written and consider it as a reference to anyone who wants
> to write a technical book.
>
> -----Original Message-----
> From: Jon Burgess [mailto:Jon_Burgess@eur.3com.com]
> Sent: Donnerstag, 11. Juli 2002 18:34
> To: Ralf Baechle
> Cc: Gleb O. Raiko; linux-mips@oss.sgi.com; carstenl@mips.com
> Subject: Re: mips32_flush_cache routine corrupts CP0_STATUS with
> gcc-2.96
>
> > Ralf wrote:
> >Have you tried to insert a large number of nops instead?
>
> My investigation suggests that a single extra nop is sufficient. I have also
> tried inserting extra nops before the cache routine to see if the relative
> alignment of the instructions with respect to the cacheline has an influence,
> but it has no effect. I am suspicious that if this occurs with the instruction
> following the loop then something odd might be occuring on every loop iteration
> as well. I might try adjusting the instructions in the loop to see if that has
> any effect.
>
> >  Or preferably,
> >how about replacing the __restore_flags() in your example with the
> >following piece of inline assembler:
> >
> >  __asm__ __volatile__("mtc0\t%0, $12" ::"r" (flags) : "memory");
>
> I am happy that the current assembler code looks correct, but this change would
> make it simpler.
>
>      Jon

--
_    _ ____  ___   Carsten Langgaard   Mailto:carstenl@mips.com
|\  /|||___)(___   MIPS Denmark        Direct: +45 4486 5527
| \/ |||    ____)  Lautrupvang 4B      Switch: +45 4486 5555
  TECHNOLOGIES     2750 Ballerup       Fax...: +45 4486 5556
                   Denmark             http://www.mips.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-12  9:08 ` Sedjai, Mohamed
  (?)
@ 2002-07-12  9:26 ` Geert Uytterhoeven
  -1 siblings, 0 replies; 40+ messages in thread
From: Geert Uytterhoeven @ 2002-07-12  9:26 UTC (permalink / raw)
  To: Sedjai, Mohamed
  Cc: Jon Burgess, Ralf Baechle, Gleb O. Raiko, Linux/MIPS Development,
	carstenl

On Fri, 12 Jul 2002, Sedjai, Mohamed wrote:
> If you run instruction cache flushing cached, then the cache will be dirty
> when the routine returns. At least the line(s) containing the routine itself ?
> Or am I missing something ?

Since the contents of the instruction cache are never changed (except by a
cache load), an instruction cache line can never become dirty.

Dirty cache lines and cache line write back are an exclusive privilege of write
back data caches.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11  9:49 Jon Burgess
  2002-07-11 12:13 ` Ralf Baechle
@ 2002-07-12  9:18 ` Carsten Langgaard
  1 sibling, 0 replies; 40+ messages in thread
From: Carsten Langgaard @ 2002-07-12  9:18 UTC (permalink / raw)
  To: Jon Burgess; +Cc: Gleb O. Raiko, Ralf Baechle, linux-mips

[-- Attachment #1: Type: text/plain, Size: 5294 bytes --]

One thing that just stroke me, I've made some changes to the MIPS32 cache routine
last month.
I just realized it hadn't made it into the CVS tree. It doesn't explain what you
are seeing, but it fixes some problem that you might get.

One of the things it fix, is a typo, which will hit you because you have different
size of the I- and D-caches.
But it also fix, that in a lot of the cache routine, we only flush one of the ways.

Please see the attached patch, I think Ralf didn't liked it, because it wasn't
optimized for speed, but at least it's better than what we got.

/Carsten


Jon Burgess wrote:

> >This sound more like a hardware bug to me.
> >What CPU are you running on ?
> >
> >/Carsten
>
> The CPU is a Broadcom BCM6352 which contains a 225Mhz Mips32 core with cache as
> follows:
>      16Kb Instruction cache, linesize 16 bytes (2 ways)
>      4Kb Data cache, linesize 16 byes (2 ways)
> I don't know the exact variant of the core but the chip has only been recently
> released. I will try asking my Broadcom contacts to see if they can shed any
> further light on this.
>
> >Ralf Baechle wrote:
> >>
> >> While that could be done it's not a good idea; running code in KSEG1 is
> >> very slow.
> >>
> >
> >Unfortunately, it's required by manuals for some processors. As I know,
> >IDT HW manuals clearly state cache flush routines must operate from
> >uncached code and must access uncached data only. Examples are R30x1 CPU
> >series.
> >
> >Regards,
> >Gleb.
>
> I'm fairly sure i've seen comments that say that cache manipulation code should
> be run uncached. My current thought is that it is probably safe to manipulate
> the d-cache with cached code, but I-cache manipulation which could invalidate
> the cacheline containing the currently executing code really should be run
> uncached. I think the CPU probably then skips instructions until it gets to the
> next cacheline, what effect this has depends on how the instructions are aligned
> relative to the cacheline. Given how hit-and-miss this is I am suspicous that
> this problem could simply be lurking undiscovered.
>
> The patch below makes the I-Cache routines run via kseg1, it is a bit ugly but
> seems to work. I have not measured the performance impact of this patch.
>
> diff -Nruw --exclude=CVS include/asm-mips-old/mips32_cache.h
> include/asm-mips/mips32_cache.h
> --- include/asm-mips-old/mips32_cache.h  Wed Apr 10 22:53:12 2002
> +++ include/asm-mips/mips32_cache.h      Thu Jul 11 10:25:06 2002
> @@ -219,37 +219,24 @@
>      }
>  }
>
> +/* Call I-cache manipulation routines via uncached kseg1 */
> +extern void mips32_blast_icache(void);
> +extern void mips32_blast_icache_page(unsigned long page);
> +extern void mips32_blast_icache_page_indexed(unsigned long page);
> +
>  static inline void blast_icache(void)
>  {
> -    unsigned long start = KSEG0;
> -    unsigned long end = (start + icache_size);
> -
> -    while(start < end) {
> -         cache_unroll(start,Index_Invalidate_I);
> -         start += ic_lsize;
> -    }
> +    ((void (*)(void))KSEG1ADDR((unsigned long)mips32_blast_icache))();
>  }
>
>  static inline void blast_icache_page(unsigned long page)
>  {
> -    unsigned long start = page;
> -    unsigned long end = (start + PAGE_SIZE);
> -
> -    while(start < end) {
> -         cache_unroll(start,Hit_Invalidate_I);
> -         start += ic_lsize;
> -    }
> +    ((void (*)(unsigned long))KSEG1ADDR((unsigned
> long)mips32_blast_icache_page))(page);
>  }
>
>  static inline void blast_icache_page_indexed(unsigned long page)
>  {
> -    unsigned long start = page;
> -    unsigned long end = (start + PAGE_SIZE);
> -
> -    while(start < end) {
> -         cache_unroll(start,Index_Invalidate_I);
> -         start += ic_lsize;
> -    }
> +    ((void (*)(unsigned long))KSEG1ADDR((unsigned
> long)mips32_blast_icache_page_indexed))(page);
>  }
>
>  static inline void blast_scache(void)
> diff -Nurw arch/mips/mm-old/c-mips32.c arch/mips/mm/c-mips32.c
> --- arch/mips/mm-old/c-mips32.c    Thu May 23 15:19:36 2002
> +++ arch/mips/mm/c-mips32.c   Thu Jul 11 10:21:02 2002
> @@ -708,3 +708,36 @@
>
>      __flush_cache_all();
>  }
> +
> +void mips32_blast_icache(void)
> +{
> +    unsigned long start = KSEG0;
> +    unsigned long end = (start + icache_size);
> +
> +    while(start < end) {
> +         cache_unroll(start,Index_Invalidate_I);
> +         start += ic_lsize;
> +    }
> +}
> +
> +void mips32_blast_icache_page(unsigned long page)
> +{
> +    unsigned long start = page;
> +    unsigned long end = (start + PAGE_SIZE);
> +
> +    while(start < end) {
> +         cache_unroll(start,Hit_Invalidate_I);
> +         start += ic_lsize;
> +    }
> +}
> +
> +void mips32_blast_icache_page_indexed(unsigned long page)
> +{
> +    unsigned long start = page;
> +    unsigned long end = (start + PAGE_SIZE);
> +
> +    while(start < end) {
> +         cache_unroll(start,Index_Invalidate_I);
> +         start += ic_lsize;
> +    }
> +}
>
>      Jon Burgess

--
_    _ ____  ___   Carsten Langgaard   Mailto:carstenl@mips.com
|\  /|||___)(___   MIPS Denmark        Direct: +45 4486 5527
| \/ |||    ____)  Lautrupvang 4B      Switch: +45 4486 5555
  TECHNOLOGIES     2750 Ballerup       Fax...: +45 4486 5556
                   Denmark             http://www.mips.com



[-- Attachment #2: cache.patch --]
[-- Type: text/plain, Size: 5752 bytes --]

--- arch/mips/mm/c-mips32.c	Wed May 29 05:03:17 2002
+++ ../../linux-2.4.18/sw/linux-2.4.18/arch/mips/mm/c-mips32.c	Thu Jul 11 09:55:08 2002
@@ -17,7 +17,6 @@
  *
  * MIPS32 CPU variant specific MMU/Cache routines.
  */
-#include <linux/config.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/sched.h>
@@ -75,7 +74,8 @@
 	unsigned long flags;
 
 	__save_and_cli(flags);
-	blast_dcache(); blast_icache();
+	blast_dcache(); 
+	blast_icache();
 	__restore_flags(flags);
 }
 
@@ -303,7 +303,7 @@
 	if (!(vma->vm_flags & VM_EXEC))
 		return;
 
-	address = KSEG0 + ((unsigned long)page_address(page) & PAGE_MASK & (dcache_size - 1));
+	address = KSEG0 + ((unsigned long)page_address(page) & PAGE_MASK & (icache_size - 1));
 	blast_icache_page_indexed(address);
 }
 
@@ -317,7 +317,7 @@
 	unsigned int flags;
 
 	if (size >= dcache_size) {
-		flush_cache_all();
+		blast_dcache();
 	} else {
 	        __save_and_cli(flags);
 		a = addr & ~(dc_lsize - 1);
@@ -338,7 +338,7 @@
 	unsigned long end, a;
 
 	if (size >= scache_size) {
-		flush_cache_all();
+		blast_scache();
 		return;
 	}
 
@@ -358,13 +358,13 @@
 	unsigned int flags;
 
 	if (size >= dcache_size) {
-		flush_cache_all();
+		blast_dcache();
 	} else {
 	        __save_and_cli(flags);
 		a = addr & ~(dc_lsize - 1);
 		end = (addr + size) & ~(dc_lsize - 1);
 		while (1) {
-			flush_dcache_line(a); /* Hit_Writeback_Inv_D */
+			invalidate_dcache_line(a); /* Hit_Inv_D */
 			if (a == end) break;
 			a += dc_lsize;
 		}
@@ -380,14 +380,14 @@
 	unsigned long end, a;
 
 	if (size >= scache_size) {
-		flush_cache_all();
+		blast_scache();
 		return;
 	}
 
 	a = addr & ~(sc_lsize - 1);
 	end = (addr + size) & ~(sc_lsize - 1);
 	while (1) {
-		flush_scache_line(a); /* Hit_Writeback_Inv_SD */
+		invalidate_scache_line(a); /* Hit_Writeback_Inv_SD */
 		if (a == end) break;
 		a += sc_lsize;
 	}
--- include/asm-mips/mips32_cache.h	Wed Jul  3 08:30:08 2002
+++ ../../linux-2.4.18/sw/linux-2.4.18/include/asm-mips/mips32_cache.h	Mon Jun 24 13:17:36 2002
@@ -37,41 +37,65 @@
 
 static inline void flush_icache_line_indexed(unsigned long addr)
 {
-	__asm__ __volatile__(
-		".set noreorder\n\t"
-		".set mips3\n\t"
-		"cache %1, (%0)\n\t"
-		".set mips0\n\t"
-		".set reorder"
-		:
-		: "r" (addr),
-		  "i" (Index_Invalidate_I));
+	unsigned long waystep = icache_size/mips_cpu.icache.ways;
+	unsigned int way;
+
+	for (way = 0; way < mips_cpu.icache.ways; way++)
+	{
+		__asm__ __volatile__(
+			".set noreorder\n\t"
+			".set mips3\n\t"
+			"cache %1, (%0)\n\t"
+			".set mips0\n\t"
+			".set reorder"
+			:
+			: "r" (addr),
+			"i" (Index_Invalidate_I));
+		
+		addr += waystep;
+	}
 }
 
 static inline void flush_dcache_line_indexed(unsigned long addr)
 {
-	__asm__ __volatile__(
-		".set noreorder\n\t"
-		".set mips3\n\t"
-		"cache %1, (%0)\n\t"
-		".set mips0\n\t"
-		".set reorder"
-		:
-		: "r" (addr),
-		  "i" (Index_Writeback_Inv_D));
+	unsigned long waystep = dcache_size/mips_cpu.dcache.ways;
+	unsigned int way;
+
+	for (way = 0; way < mips_cpu.dcache.ways; way++)
+	{
+		__asm__ __volatile__(
+			".set noreorder\n\t"
+			".set mips3\n\t"
+			"cache %1, (%0)\n\t"
+			".set mips0\n\t"
+			".set reorder"
+			:
+			: "r" (addr),
+			"i" (Index_Writeback_Inv_D));
+
+		addr += waystep;
+	}
 }
 
 static inline void flush_scache_line_indexed(unsigned long addr)
 {
-	__asm__ __volatile__(
-		".set noreorder\n\t"
-		".set mips3\n\t"
-		"cache %1, (%0)\n\t"
-		".set mips0\n\t"
-		".set reorder"
-		:
-		: "r" (addr),
-		  "i" (Index_Writeback_Inv_SD));
+	unsigned long waystep = scache_size/mips_cpu.scache.ways;
+	unsigned int way;
+
+	for (way = 0; way < mips_cpu.scache.ways; way++)
+	{
+		__asm__ __volatile__(
+			".set noreorder\n\t"
+			".set mips3\n\t"
+			"cache %1, (%0)\n\t"
+			".set mips0\n\t"
+			".set reorder"
+			:
+			: "r" (addr),
+			"i" (Index_Writeback_Inv_SD));
+
+		addr += waystep;
+	}
 }
 
 static inline void flush_icache_line(unsigned long addr)
@@ -210,12 +234,17 @@
 
 static inline void blast_dcache_page_indexed(unsigned long page)
 {
-	unsigned long start = page;
-	unsigned long end = (start + PAGE_SIZE);
-
-	while(start < end) {
-		cache_unroll(start,Index_Writeback_Inv_D);
-		start += dc_lsize;
+	unsigned long start;
+	unsigned long end = (page + PAGE_SIZE);
+	unsigned long waystep = dcache_size/mips_cpu.dcache.ways;
+	unsigned int way;
+
+	for (way = 0; way < mips_cpu.dcache.ways; way++) {
+		start = page + way*waystep;
+		while(start < end) {
+			cache_unroll(start,Index_Writeback_Inv_D);
+			start += dc_lsize;
+		}
 	}
 }
 
@@ -243,12 +272,17 @@
 
 static inline void blast_icache_page_indexed(unsigned long page)
 {
-	unsigned long start = page;
-	unsigned long end = (start + PAGE_SIZE);
-
-	while(start < end) {
-		cache_unroll(start,Index_Invalidate_I);
-		start += ic_lsize;
+	unsigned long start;
+	unsigned long end = (page + PAGE_SIZE);
+	unsigned long waystep = icache_size/mips_cpu.icache.ways;
+	unsigned int way;
+
+	for (way = 0; way < mips_cpu.icache.ways; way++) {
+		start = page + way*waystep;
+		while(start < end) {
+			cache_unroll(start,Index_Invalidate_I);
+			start += ic_lsize;
+		}
 	}
 }
 
@@ -276,12 +310,17 @@
 
 static inline void blast_scache_page_indexed(unsigned long page)
 {
-	unsigned long start = page;
-	unsigned long end = page + PAGE_SIZE;
-
-	while(start < end) {
-		cache_unroll(start,Index_Writeback_Inv_SD);
-		start += sc_lsize;
+	unsigned long start;
+	unsigned long end = (page + PAGE_SIZE);
+	unsigned long waystep = scache_size/mips_cpu.scache.ways;
+	unsigned int way;
+
+	for (way = 0; way < mips_cpu.scache.ways; way++) {
+		start = page + way*waystep;
+		while(start < end) {
+			cache_unroll(start,Index_Writeback_Inv_SD);
+			start += sc_lsize;
+		}
 	}
 }
 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
@ 2002-07-12  9:08 ` Sedjai, Mohamed
  0 siblings, 0 replies; 40+ messages in thread
From: Sedjai, Mohamed @ 2002-07-12  9:08 UTC (permalink / raw)
  To: Jon Burgess, Ralf Baechle; +Cc: Gleb O. Raiko, linux-mips, carstenl

Hello everybody,

I've been following this thread with much attention and interest,
and I would like to give my small contribution, even though my expertise
is far lower than yours.

Our TX49 (R4000 based) manual also states that "if the CACHE instruction
is issued for the line in which this instruction exists the operation 
is not guaranted". As you can see from the arch/mips/mm/r4xx0.c file, 
TX49 routines always disable caches before operating. 

Recently, one of our customer, raised the question since he was comparing
performance between TX49 and another comparable MIPS architecture. 
He noticed a huge difference in favor of the other vendor.
For information it was a multimedia application. Investigations
showed that the other vendor was running cache flushing operations cached. 
He tried to also run TX49 cached and, miracle, TX49 performed much better 
than the other chip. And the application could run for hours without
crashing.

I have contacted our designers, and the answer I got so far is that a problem can
occur depending on the alignement of the CACHE instructions and on the set
in which they are located (TX49 cache is 4 way set). This confirms Jon's 
investigation. Carsten, can you comment this, as a MIPS insider ? Which
CPUs are concerned ?

Further investigation are now ongoing to find a proper workaround and thus suggestions 
are highly apreciated.

>From my side I have a very simple question:

If you run instruction cache flushing cached, then the cache will be dirty
when the routine returns. At least the line(s) containing the routine itself ?
Or am I missing something ?

Best Regards,

Mohamed SEDJAI
TOSHIBA Electronics Europe 

PS: Sorry Dominic for a possible misusage of the terminology. BTW I found your book 
wonderfully well written and consider it as a reference to anyone who wants 
to write a technical book. 










-----Original Message-----
From: Jon Burgess [mailto:Jon_Burgess@eur.3com.com]
Sent: Donnerstag, 11. Juli 2002 18:34
To: Ralf Baechle
Cc: Gleb O. Raiko; linux-mips@oss.sgi.com; carstenl@mips.com
Subject: Re: mips32_flush_cache routine corrupts CP0_STATUS with
gcc-2.96




> Ralf wrote:
>Have you tried to insert a large number of nops instead?

My investigation suggests that a single extra nop is sufficient. I have also
tried inserting extra nops before the cache routine to see if the relative
alignment of the instructions with respect to the cacheline has an influence,
but it has no effect. I am suspicious that if this occurs with the instruction
following the loop then something odd might be occuring on every loop iteration
as well. I might try adjusting the instructions in the loop to see if that has
any effect.

>  Or preferably,
>how about replacing the __restore_flags() in your example with the
>following piece of inline assembler:
>
>  __asm__ __volatile__("mtc0\t%0, $12" ::"r" (flags) : "memory");

I am happy that the current assembler code looks correct, but this change would
make it simpler.

     Jon

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
@ 2002-07-12  9:08 ` Sedjai, Mohamed
  0 siblings, 0 replies; 40+ messages in thread
From: Sedjai, Mohamed @ 2002-07-12  9:08 UTC (permalink / raw)
  To: Jon Burgess, Ralf Baechle; +Cc: Gleb O. Raiko, linux-mips, carstenl

Hello everybody,

I've been following this thread with much attention and interest,
and I would like to give my small contribution, even though my expertise
is far lower than yours.

Our TX49 (R4000 based) manual also states that "if the CACHE instruction
is issued for the line in which this instruction exists the operation 
is not guaranted". As you can see from the arch/mips/mm/r4xx0.c file, 
TX49 routines always disable caches before operating. 

Recently, one of our customer, raised the question since he was comparing
performance between TX49 and another comparable MIPS architecture. 
He noticed a huge difference in favor of the other vendor.
For information it was a multimedia application. Investigations
showed that the other vendor was running cache flushing operations cached. 
He tried to also run TX49 cached and, miracle, TX49 performed much better 
than the other chip. And the application could run for hours without
crashing.

I have contacted our designers, and the answer I got so far is that a problem can
occur depending on the alignement of the CACHE instructions and on the set
in which they are located (TX49 cache is 4 way set). This confirms Jon's 
investigation. Carsten, can you comment this, as a MIPS insider ? Which
CPUs are concerned ?

Further investigation are now ongoing to find a proper workaround and thus suggestions 
are highly apreciated.

From my side I have a very simple question:

If you run instruction cache flushing cached, then the cache will be dirty
when the routine returns. At least the line(s) containing the routine itself ?
Or am I missing something ?

Best Regards,

Mohamed SEDJAI
TOSHIBA Electronics Europe 

PS: Sorry Dominic for a possible misusage of the terminology. BTW I found your book 
wonderfully well written and consider it as a reference to anyone who wants 
to write a technical book. 










-----Original Message-----
From: Jon Burgess [mailto:Jon_Burgess@eur.3com.com]
Sent: Donnerstag, 11. Juli 2002 18:34
To: Ralf Baechle
Cc: Gleb O. Raiko; linux-mips@oss.sgi.com; carstenl@mips.com
Subject: Re: mips32_flush_cache routine corrupts CP0_STATUS with
gcc-2.96




> Ralf wrote:
>Have you tried to insert a large number of nops instead?

My investigation suggests that a single extra nop is sufficient. I have also
tried inserting extra nops before the cache routine to see if the relative
alignment of the instructions with respect to the cacheline has an influence,
but it has no effect. I am suspicious that if this occurs with the instruction
following the loop then something odd might be occuring on every loop iteration
as well. I might try adjusting the instructions in the loop to see if that has
any effect.

>  Or preferably,
>how about replacing the __restore_flags() in your example with the
>following piece of inline assembler:
>
>  __asm__ __volatile__("mtc0\t%0, $12" ::"r" (flags) : "memory");

I am happy that the current assembler code looks correct, but this change would
make it simpler.

     Jon

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11 12:11 Jon Burgess
@ 2002-07-11 16:53 ` Jun Sun
  0 siblings, 0 replies; 40+ messages in thread
From: Jun Sun @ 2002-07-11 16:53 UTC (permalink / raw)
  To: Jon Burgess; +Cc: Carsten Langgaard, Gleb O. Raiko, Ralf Baechle, linux-mips

Jon Burgess wrote:

> 
>>>I don't wonder if other IDT CPUs also require this, including those that
>>>conform MIPS32.
>>>Basically, requirement of uncached run makes hadrware logic much simpler
>>>and allows  to save silicon a bit.
>>>
>>That could be true, but then again I suggest making specific cache routines for
>>
> those
> 
>>CPUs.
>>It would be a real performance hit for the rest of us, if we have to operate
>>
> from
> 
>>uncached space.
>>
> 
> I pulled together the relevant code to generate a module to test this problem
> and it looks like the CPU always misses 1 instruction following the end of the
> cache loop. If I add some nop's to change the alignment of the code it doesn't
> seem to make any difference. The same thing seems to happen even if I change the
> cache flush to a 'Hit_invalidate' of some completely different memory region.
> One thing I thought might happen is the CPU ending the loop early as soon as it
> invalidates the cacheline containing the current instructions, but this doesn't
> seem to be the case, the 'end' address is always correct. Perhaps this really is
> a hardware problem.
> 
> The test module below does a blast_icache then a few well known instructions and
> signifies if anything has been missed. I typically get the following on our
> board.
>      Cacheop skipped 1 instructions, end = 0x80004000



Here is the test results from Malta 4kc

Cacheop skipped 0 instructions, end = 0x80004000

root@10.0.18.6:~# cat /proc/cpuinfo
processor               : 0
cpu model               : MIPS 4Kc V0.1
BogoMIPS                : 79.66
wait instruction        : no
microsecond timers      : yes
extra interrupt vector  : yes
hardware watchpoint     : yes
VCED exceptions         : not available
VCEI exceptions         : not available


Jun

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
@ 2002-07-11 16:33 Jon Burgess
  0 siblings, 0 replies; 40+ messages in thread
From: Jon Burgess @ 2002-07-11 16:33 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: Gleb O. Raiko, linux-mips, carstenl



> Ralf wrote:
>Have you tried to insert a large number of nops instead?

My investigation suggests that a single extra nop is sufficient. I have also
tried inserting extra nops before the cache routine to see if the relative
alignment of the instructions with respect to the cacheline has an influence,
but it has no effect. I am suspicious that if this occurs with the instruction
following the loop then something odd might be occuring on every loop iteration
as well. I might try adjusting the instructions in the loop to see if that has
any effect.

>  Or preferably,
>how about replacing the __restore_flags() in your example with the
>following piece of inline assembler:
>
>  __asm__ __volatile__("mtc0\t%0, $12" ::"r" (flags) : "memory");

I am happy that the current assembler code looks correct, but this change would
make it simpler.

     Jon

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
  2002-07-11  9:49 Jon Burgess
@ 2002-07-11 12:13 ` Ralf Baechle
  2002-07-12  9:18 ` Carsten Langgaard
  1 sibling, 0 replies; 40+ messages in thread
From: Ralf Baechle @ 2002-07-11 12:13 UTC (permalink / raw)
  To: Jon Burgess; +Cc: Gleb O. Raiko, linux-mips, carstenl

On Thu, Jul 11, 2002 at 10:49:55AM +0100, Jon Burgess wrote:

> I'm fairly sure i've seen comments that say that cache manipulation code
> should be run uncached. My current thought is that it is probably safe to
> manipulate the d-cache with cached code, but I-cache manipulation which
> could invalidate the cacheline containing the currently executing code
> really should be run uncached. I think the CPU probably then skips
> instructions until it gets to the next cacheline, what effect this has
> depends on how the instructions are aligned relative to the cacheline.
> Given how hit-and-miss this is I am suspicous that this problem could
> simply be lurking undiscovered.
> 
> The patch below makes the I-Cache routines run via kseg1, it is a bit
> ugly but seems to work. I have not measured the performance impact of
> this patch.

Have you tried to insert a large number of nops instead?  Or preferably,
how about replacing the __restore_flags() in your example with the
following piece of inline assembler:

  __asm__ __volatile__("mtc0\t%0, $12" ::"r" (flags) : "memory");

  Ralf

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
@ 2002-07-11 12:11 Jon Burgess
  2002-07-11 16:53 ` Jun Sun
  0 siblings, 1 reply; 40+ messages in thread
From: Jon Burgess @ 2002-07-11 12:11 UTC (permalink / raw)
  To: Carsten Langgaard; +Cc: Gleb O. Raiko, Ralf Baechle, linux-mips



>> I don't wonder if other IDT CPUs also require this, including those that
>> conform MIPS32.
>> Basically, requirement of uncached run makes hadrware logic much simpler
>> and allows  to save silicon a bit.
>
>That could be true, but then again I suggest making specific cache routines for
those
>CPUs.
>It would be a real performance hit for the rest of us, if we have to operate
from
>uncached space.

I pulled together the relevant code to generate a module to test this problem
and it looks like the CPU always misses 1 instruction following the end of the
cache loop. If I add some nop's to change the alignment of the code it doesn't
seem to make any difference. The same thing seems to happen even if I change the
cache flush to a 'Hit_invalidate' of some completely different memory region.
One thing I thought might happen is the CPU ending the loop early as soon as it
invalidates the cacheline containing the current instructions, but this doesn't
seem to be the case, the 'end' address is always correct. Perhaps this really is
a hardware problem.

The test module below does a blast_icache then a few well known instructions and
signifies if anything has been missed. I typically get the following on our
board.
     Cacheop skipped 1 instructions, end = 0x80004000

The end address is correct, so the cache flush completes, but 1 instruction is
missed.

I would be interested to know if someone can test this on another mips32
processor since I don't have any others available.

Simply adding an extra nop after the cache loop might be a good workaround for
this board.

Module compiled with:
/tmp/crossdev/mips/bin/mips-linux-gcc -G 0 -mips2 -mno-abicalls -fno-pic
-mlong-calls -fno-common -O2 -fno-strict-aliasing  -I/usr/src/linux/include
-Wall -DMODULE -D__KERNEL__ -fno-common -c -o  test_tmp.o test.c
/tmp/crossdev/mips/bin/mips-linux-ld -r -G0 -o test.o test_tmp.o


#include <linux/module.h>
#include <linux/init.h>
#include <linux/sysctl.h>
#include <asm/cacheops.h>

#include <asm/bootinfo.h>
#include <asm/cpu.h>
#include <asm/bcache.h>
#include <asm/page.h>
#include <asm/system.h>
#include <asm/addrspace.h>

#define icache_size (16 * 1024)
#define ic_lsize (16)

#define cache_unroll(base,op)                   \
        __asm__ __volatile__("                  \
                .set noreorder;                 \
                .set mips3;                     \
                cache %1, (%0);                 \
                .set mips0;                     \
                .set reorder"                   \
                :                               \
                : "r" (base),                   \
                  "i" (op));

static inline unsigned test_blast_icache(void)
{
     unsigned long start = KSEG0;
     unsigned long end = (start + icache_size);

     while(start < end) {
          cache_unroll(start,Index_Invalidate_I);
          start += ic_lsize;
     }
     return start;
}

static int __init init(void)
{
     int i = 4;
     unsigned int end;

     end = test_blast_icache();

     __asm__(
          ".set push        \n"
          ".set noreorder   \n"
          "   addu %0,-1   \n"
          "   addu %0,-1   \n"
          "   addu %0,-1   \n"
          "   addu %0,-1   \n"
          ".set pop         \n"
          : "=r" (i)
          : "r" (i));

     printk("Cacheop skipped %u instructions, end = 0x%x\n", i, end);
     return 0;
}

static void __exit fini(void)
{
}

module_init(init);
module_exit(fini);

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96
@ 2002-07-11  9:49 Jon Burgess
  2002-07-11 12:13 ` Ralf Baechle
  2002-07-12  9:18 ` Carsten Langgaard
  0 siblings, 2 replies; 40+ messages in thread
From: Jon Burgess @ 2002-07-11  9:49 UTC (permalink / raw)
  To: Gleb O. Raiko; +Cc: Ralf Baechle, linux-mips, carstenl



>This sound more like a hardware bug to me.
>What CPU are you running on ?
>
>/Carsten

The CPU is a Broadcom BCM6352 which contains a 225Mhz Mips32 core with cache as
follows:
     16Kb Instruction cache, linesize 16 bytes (2 ways)
     4Kb Data cache, linesize 16 byes (2 ways)
I don't know the exact variant of the core but the chip has only been recently
released. I will try asking my Broadcom contacts to see if they can shed any
further light on this.

>Ralf Baechle wrote:
>>
>> While that could be done it's not a good idea; running code in KSEG1 is
>> very slow.
>>
>
>Unfortunately, it's required by manuals for some processors. As I know,
>IDT HW manuals clearly state cache flush routines must operate from
>uncached code and must access uncached data only. Examples are R30x1 CPU
>series.
>
>Regards,
>Gleb.

I'm fairly sure i've seen comments that say that cache manipulation code should
be run uncached. My current thought is that it is probably safe to manipulate
the d-cache with cached code, but I-cache manipulation which could invalidate
the cacheline containing the currently executing code really should be run
uncached. I think the CPU probably then skips instructions until it gets to the
next cacheline, what effect this has depends on how the instructions are aligned
relative to the cacheline. Given how hit-and-miss this is I am suspicous that
this problem could simply be lurking undiscovered.

The patch below makes the I-Cache routines run via kseg1, it is a bit ugly but
seems to work. I have not measured the performance impact of this patch.

diff -Nruw --exclude=CVS include/asm-mips-old/mips32_cache.h
include/asm-mips/mips32_cache.h
--- include/asm-mips-old/mips32_cache.h  Wed Apr 10 22:53:12 2002
+++ include/asm-mips/mips32_cache.h      Thu Jul 11 10:25:06 2002
@@ -219,37 +219,24 @@
     }
 }

+/* Call I-cache manipulation routines via uncached kseg1 */
+extern void mips32_blast_icache(void);
+extern void mips32_blast_icache_page(unsigned long page);
+extern void mips32_blast_icache_page_indexed(unsigned long page);
+
 static inline void blast_icache(void)
 {
-    unsigned long start = KSEG0;
-    unsigned long end = (start + icache_size);
-
-    while(start < end) {
-         cache_unroll(start,Index_Invalidate_I);
-         start += ic_lsize;
-    }
+    ((void (*)(void))KSEG1ADDR((unsigned long)mips32_blast_icache))();
 }

 static inline void blast_icache_page(unsigned long page)
 {
-    unsigned long start = page;
-    unsigned long end = (start + PAGE_SIZE);
-
-    while(start < end) {
-         cache_unroll(start,Hit_Invalidate_I);
-         start += ic_lsize;
-    }
+    ((void (*)(unsigned long))KSEG1ADDR((unsigned
long)mips32_blast_icache_page))(page);
 }

 static inline void blast_icache_page_indexed(unsigned long page)
 {
-    unsigned long start = page;
-    unsigned long end = (start + PAGE_SIZE);
-
-    while(start < end) {
-         cache_unroll(start,Index_Invalidate_I);
-         start += ic_lsize;
-    }
+    ((void (*)(unsigned long))KSEG1ADDR((unsigned
long)mips32_blast_icache_page_indexed))(page);
 }

 static inline void blast_scache(void)
diff -Nurw arch/mips/mm-old/c-mips32.c arch/mips/mm/c-mips32.c
--- arch/mips/mm-old/c-mips32.c    Thu May 23 15:19:36 2002
+++ arch/mips/mm/c-mips32.c   Thu Jul 11 10:21:02 2002
@@ -708,3 +708,36 @@

     __flush_cache_all();
 }
+
+void mips32_blast_icache(void)
+{
+    unsigned long start = KSEG0;
+    unsigned long end = (start + icache_size);
+
+    while(start < end) {
+         cache_unroll(start,Index_Invalidate_I);
+         start += ic_lsize;
+    }
+}
+
+void mips32_blast_icache_page(unsigned long page)
+{
+    unsigned long start = page;
+    unsigned long end = (start + PAGE_SIZE);
+
+    while(start < end) {
+         cache_unroll(start,Hit_Invalidate_I);
+         start += ic_lsize;
+    }
+}
+
+void mips32_blast_icache_page_indexed(unsigned long page)
+{
+    unsigned long start = page;
+    unsigned long end = (start + PAGE_SIZE);
+
+    while(start < end) {
+         cache_unroll(start,Index_Invalidate_I);
+         start += ic_lsize;
+    }
+}

     Jon Burgess

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2002-07-22  8:19 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-07-10 14:16 mips32_flush_cache routine corrupts CP0_STATUS with gcc-2.96 Jon Burgess
2002-07-11  0:15 ` Ralf Baechle
2002-07-11  8:48   ` Gleb O. Raiko
2002-07-11  9:18     ` Carsten Langgaard
2002-07-11 10:06       ` Gleb O. Raiko
2002-07-11 10:15         ` Carsten Langgaard
2002-07-11 10:36           ` Gleb O. Raiko
2002-07-11 10:46             ` Carsten Langgaard
2002-07-11 11:12         ` Ralf Baechle
2002-07-11 17:01           ` Maciej W. Rozycki
2002-07-11 23:02             ` Dominic Sweetman
2002-07-12  0:24               ` Ralf Baechle
2002-07-12 10:37               ` Gleb O. Raiko
2002-07-12 18:40               ` Maciej W. Rozycki
2002-07-11 11:23         ` Maciej W. Rozycki
2002-07-11 13:11           ` Gleb O. Raiko
2002-07-11 13:41             ` Maciej W. Rozycki
2002-07-11 15:27               ` Gleb O. Raiko
2002-07-11 15:59                 ` Maciej W. Rozycki
2002-07-12 10:26                   ` Gleb O. Raiko
2002-07-12 19:02                     ` Maciej W. Rozycki
2002-07-15  9:16                       ` Gleb O. Raiko
2002-07-16  9:00                         ` Maciej W. Rozycki
2002-07-16 10:20                           ` Gleb O. Raiko
2002-07-16 10:36                             ` Maciej W. Rozycki
2002-07-11  7:34 ` Carsten Langgaard
2002-07-11  9:49 Jon Burgess
2002-07-11 12:13 ` Ralf Baechle
2002-07-12  9:18 ` Carsten Langgaard
2002-07-11 12:11 Jon Burgess
2002-07-11 16:53 ` Jun Sun
2002-07-11 16:33 Jon Burgess
2002-07-12  9:08 Sedjai, Mohamed
2002-07-12  9:08 ` Sedjai, Mohamed
2002-07-12  9:26 ` Geert Uytterhoeven
2002-07-12  9:40 ` Carsten Langgaard
2002-07-12 15:24 Jon Burgess
2002-07-15  9:42 Jon Burgess
2002-07-22  8:18 Sedjai, Mohamed
2002-07-22  8:18 ` Sedjai, Mohamed

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.