* [PATCH] fast path for rdhwr emulation for TLS
@ 2006-07-07 15:00 Atsushi Nemoto
  2006-07-07 15:22 ` Maciej W. Rozycki
  0 siblings, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-07-07 15:00 UTC (permalink / raw)
  To: linux-mips; +Cc: ralf

Add a special short path for emulating RDHWR, which is used to
support TLS.
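
For context, here is a minimal user-space sketch of the access this fast
path serves (my illustration, not part of the patch): NPTL on MIPS reads
the thread pointer with exactly the instruction word checked below,
0x7c03e83b, i.e. "rdhwr v1,$29".  On CPUs without a hardware RDHWR that
word raises a Reserved Instruction exception and the kernel emulates it.

	/* Hypothetical sketch (not from the patch): read the TLS pointer
	 * the way NPTL does on MIPS.  On CPUs lacking RDHWR this traps
	 * and the handler returns thread_info->tp_value in v1. */
	static inline void *mips_read_tls(void)
	{
		register void *tp __asm__("$3");	/* $3 == v1 */

		/* 0x7c03e83b == rdhwr v1,$29; emitted as .word so old
		 * assemblers without RDHWR support still accept it. */
		__asm__ __volatile__(".word	0x7c03e83b" : "=r" (tp));
		return tp;
	}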

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index b563811..545bcb1 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -357,7 +357,7 @@ #endif
 	BUILD_HANDLER ibe be cli silent			/* #6  */
 	BUILD_HANDLER dbe be cli silent			/* #7  */
 	BUILD_HANDLER bp bp sti silent			/* #9  */
-	BUILD_HANDLER ri ri sti silent			/* #10 */
+	BUILD_HANDLER ri_slow ri sti silent		/* #10 */
 	BUILD_HANDLER cpu cpu sti silent		/* #11 */
 	BUILD_HANDLER ov ov sti silent			/* #12 */
 	BUILD_HANDLER tr tr sti silent			/* #13 */
@@ -369,6 +369,39 @@ #endif
 	BUILD_HANDLER dsp dsp sti silent		/* #26 */
 	BUILD_HANDLER reserved reserved sti verbose	/* others */
 
+	.align	5
+	LEAF(handle_ri)
+	.set	push
+	.set	noat
+	mfc0	k0, CP0_CAUSE
+	MFC0	k1, CP0_EPC
+	bltz	k0, handle_ri_slow	/* if delay slot */
+	lw	k0, (k1)
+	li	k1, 0x7c03e83b	/* rdhwr v1,$29 */
+	bne	k0, k1, handle_ri_slow	/* if not ours */
+	get_saved_sp	/* k1 := current_thread_info */
+	MFC0	k0, CP0_EPC
+	LONG_ADDIU	k0, 4
+	.set	noreorder
+#if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	jr	k0
+	 rfe
+#else
+	/* I hope three instructions between MTC0 and ERET are enough... */
+	MTC0	k0, CP0_EPC
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	.set	mips3
+	eret
+	.set	mips0
+#endif
+	.set	pop
+	END(handle_ri)
+
 #ifdef CONFIG_64BIT
 /* A temporary overflow handler used by check_daddi(). */
 


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-07 15:00 [PATCH] fast path for rdhwr emulation for TLS Atsushi Nemoto
@ 2006-07-07 15:22 ` Maciej W. Rozycki
  2006-07-07 16:12   ` Atsushi Nemoto
  0 siblings, 1 reply; 26+ messages in thread
From: Maciej W. Rozycki @ 2006-07-07 15:22 UTC (permalink / raw)
  To: Atsushi Nemoto; +Cc: linux-mips, ralf

On Sat, 8 Jul 2006, Atsushi Nemoto wrote:

> Add a special short path for emulating RDHWR, which is used to
> support TLS.

 You need to take care of VIVT I-caches.

> @@ -369,6 +369,39 @@ #endif
>  	BUILD_HANDLER dsp dsp sti silent		/* #26 */
>  	BUILD_HANDLER reserved reserved sti verbose	/* others */
>  
> +	.align	5
> +	LEAF(handle_ri)
> +	.set	push
> +	.set	noat
> +	mfc0	k0, CP0_CAUSE
> +	MFC0	k1, CP0_EPC
> +	bltz	k0, handle_ri_slow	/* if delay slot */
> +	lw	k0, (k1)

 For a VIVT I-cache this can result in a TLB exception.  TLB handlers are 
not currently prepared for being called at the exception level.

 Also I am fairly sure gas won't fill the branch delay slot above -- a
trivial rearrangement of code would save a cycle here (and this is a fast
path, so we do not want to waste time).

> +	li	k1, 0x7c03e83b	/* rdhwr v1,$29 */
> +	bne	k0, k1, handle_ri_slow	/* if not ours */
> +	get_saved_sp	/* k1 := current_thread_info */
> +	MFC0	k0, CP0_EPC
> +	LONG_ADDIU	k0, 4

 I suggest moving the MFC0 ahead of get_saved_sp to avoid a stall.  It would
fit in the branch delay slot nicely.

  Maciej


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-07 15:22 ` Maciej W. Rozycki
@ 2006-07-07 16:12   ` Atsushi Nemoto
  2006-07-07 16:43     ` Atsushi Nemoto
  2006-07-07 16:58     ` Maciej W. Rozycki
  0 siblings, 2 replies; 26+ messages in thread
From: Atsushi Nemoto @ 2006-07-07 16:12 UTC (permalink / raw)
  To: macro; +Cc: linux-mips, ralf

On Fri, 7 Jul 2006 16:22:46 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > +	.align	5
> > +	LEAF(handle_ri)
> > +	.set	push
> > +	.set	noat
> > +	mfc0	k0, CP0_CAUSE
> > +	MFC0	k1, CP0_EPC
> > +	bltz	k0, handle_ri_slow	/* if delay slot */
> > +	lw	k0, (k1)
> 
>  For a VIVT I-cache this can result in a TLB exception.  TLB handlers are 
> not currently prepared for being called at the exception level.

Thanks, now I understand the problem.  Are there any good solutions?
The only one I can think of now is using handle_ri_slow for such CPUs.

>  Also I am fairly sure gas won't fill the branch delay slot above -- a
> trivial rearrangement of code would save a cycle here (and this is a fast
> path, so we do not want to waste time).

Well, here is the code compiled by binutils 2.17.  This version of gas
can put the MFC0 in the delay slot.  But it might be better to use
noreorder myself.

80012a80 <handle_ri>:
80012a80:	401a6800 	mfc0	k0,c0_cause
80012a84:	0740fd2e 	bltz	k0,80011f40 <handle_ri_slow>
80012a88:	401b7000 	mfc0	k1,c0_epc
80012a8c:	8f7a0000 	lw	k0,0(k1)
80012a90:	3c1b7c03 	lui	k1,0x7c03
80012a94:	377be83b 	ori	k1,k1,0xe83b
80012a98:	175bfd29 	bne	k0,k1,80011f40 <handle_ri_slow>
80012a9c:	00000000 	nop
80012aa0:	3c1b801b 	lui	k1,0x801b
80012aa4:	8f7b4008 	lw	k1,16392(k1)
80012aa8:	401a7000 	mfc0	k0,c0_epc
80012aac:	275a0004 	addiu	k0,k0,4
80012ab0:	409a7000 	mtc0	k0,c0_epc
80012ab4:	377b1fff 	ori	k1,k1,0x1fff
80012ab8:	3b7b1fff 	xori	k1,k1,0x1fff
80012abc:	8f63000c 	lw	v1,12(k1)
80012ac0:	42000018 	eret

> > +	li	k1, 0x7c03e83b	/* rdhwr v1,$29 */
> > +	bne	k0, k1, handle_ri_slow	/* if not ours */
> > +	get_saved_sp	/* k1 := current_thread_info */
> > +	MFC0	k0, CP0_EPC
> > +	LONG_ADDIU	k0, 4
> 
>  I suggest moving the MFC0 ahead of get_saved_sp to avoid a stall.  It would
> fit in the branch delay slot nicely.

The MFC0 cannot be moved: the SMP version of get_saved_sp uses k0 and
k1.  Of course I could use #ifdef CONFIG_SMP, but such assumptions
make the code a bit fragile.  Another performance vs. maintenance
cost issue...

---
Atsushi Nemoto


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-07 16:12   ` Atsushi Nemoto
@ 2006-07-07 16:43     ` Atsushi Nemoto
  2006-07-07 17:04       ` Maciej W. Rozycki
  2006-07-07 18:22       ` Ralf Baechle
  2006-07-07 16:58     ` Maciej W. Rozycki
  1 sibling, 2 replies; 26+ messages in thread
From: Atsushi Nemoto @ 2006-07-07 16:43 UTC (permalink / raw)
  To: macro; +Cc: linux-mips, ralf

On Sat, 08 Jul 2006 01:12:45 +0900 (JST), Atsushi Nemoto <anemo@mba.ocn.ne.jp> wrote:
> >  For a VIVT I-cache this can result in a TLB exception.  TLB handlers are 
> > not currently prepared for being called at the exception level.
> 
> Thanks, now I understand the problem.  Are there any good solutions?
> The only one I can think of now is using handle_ri_slow for such CPUs.

Can we use Index_Load_Data_I to load the instruction code from icache?
Just an idea...

---
Atsushi Nemoto


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-07 16:12   ` Atsushi Nemoto
  2006-07-07 16:43     ` Atsushi Nemoto
@ 2006-07-07 16:58     ` Maciej W. Rozycki
  2006-07-08 16:12       ` Atsushi Nemoto
  2006-07-10 14:55       ` Atsushi Nemoto
  1 sibling, 2 replies; 26+ messages in thread
From: Maciej W. Rozycki @ 2006-07-07 16:58 UTC (permalink / raw)
  To: Atsushi Nemoto; +Cc: linux-mips, ralf

On Sat, 8 Jul 2006, Atsushi Nemoto wrote:

> >  For a VIVT I-cache this can result in a TLB exception.  TLB handlers are 
> > not currently prepared for being called at the exception level.
> 
> Thanks, now I understand the problem.  Are there any good solutions?
> The only one I can think of now is using handle_ri_slow for such CPUs.

 I have implemented an appropriate update to the TLB handlers (or actually 
it's enough to care for this case for the TLBL exception), but it predates 
the current synthesized ones.  There is a small impact resulting from 
this change and the synthesized handlers have the advantage of making it 
only necessary for these chips that do need such handling.

 There are two possible ways of handling TLB exceptions from the exception 
level, both requiring checking cp0.index.p (which we do not do at the 
moment under the assumption a TLB refill exception has already been taken 
and handled) and if a failure is indicated either:

1. jumping to the TLB refill handler,

or:

2. executing "tlbwr" rather than "tlbwi".

Both are good, but I have not benchmarked them -- note that a failure is 
expected to be an extremely rare event, so it's the performance for the 
probe success that matters.
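
 To illustrate option 2 in the style of the tlbex.c synthesizer (a sketch
only, using the i_*/il_* primitive names that already exist there;
CPU-specific hazard handling omitted):

	/* After tlbp, a negative Index means the probe missed, so fall
	 * back to a random write instead of an indexed one. */
	i_mfc0(p, tmp, C0_INDEX);		/* tmp := probe result   */
	il_bgezl(p, r, tmp, label_tlbw_hazard);	/* hit: branch taken...  */
	i_tlbwi(p);				/* ...tlbwi in the slot  */
	i_tlbwr(p);				/* miss: fall through    */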

> >  Also I am fairly sure gas won't fill the branch delay slot above -- a
> > trivial rearrangement of code would save a cycle here (and this is a fast
> > path, so we do not want to waste time).
> 
> Well, here is the code compiled by binutils 2.17.  This version of gas
> can put the MFC0 in the delay slot.  But it might be better to use
> noreorder myself.
> 
> 80012a80 <handle_ri>:
> 80012a80:	401a6800 	mfc0	k0,c0_cause
> 80012a84:	0740fd2e 	bltz	k0,80011f40 <handle_ri_slow>
> 80012a88:	401b7000 	mfc0	k1,c0_epc
> 80012a8c:	8f7a0000 	lw	k0,0(k1)

 Still bad -- you have a stall on $k1 here.  And on $k0 two instructions 
earlier.

> 80012a90:	3c1b7c03 	lui	k1,0x7c03
> 80012a94:	377be83b 	ori	k1,k1,0xe83b
> 80012a98:	175bfd29 	bne	k0,k1,80011f40 <handle_ri_slow>
> 80012a9c:	00000000 	nop

 And this "nop" is a waste of time.

> 80012aa0:	3c1b801b 	lui	k1,0x801b
> 80012aa4:	8f7b4008 	lw	k1,16392(k1)
> 80012aa8:	401a7000 	mfc0	k0,c0_epc
> 80012aac:	275a0004 	addiu	k0,k0,4
> 80012ab0:	409a7000 	mtc0	k0,c0_epc
> 80012ab4:	377b1fff 	ori	k1,k1,0x1fff
> 80012ab8:	3b7b1fff 	xori	k1,k1,0x1fff
> 80012abc:	8f63000c 	lw	v1,12(k1)
> 80012ac0:	42000018 	eret

 I'd restructure the code more or less like this, taking care of (almost)
all stalls resulting from interlocks on coprocessor moves and memory loads 
and likewise avoiding the need for "nop" fillers there for MIPS I 
processors:

	.set	push
	.set	noat
	.set	noreorder
	mfc0	k0, CP0_CAUSE
	MFC0	k1, CP0_EPC
	bltz	k0, handle_ri_slow	/* if delay slot */
	 lui	k0, 0x7c03
	lw	k1, (k1)
	ori	k0, 0xe83b		/* k0 := rdhwr v1,$29 */
	bne	k0, k1, handle_ri_slow	/* if not ours */
	 get_saved_sp			/* k1 := current_thread_info */
	MFC0	k0, CP0_EPC
#if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
	ori	k1, _THREAD_MASK
	xori	k1, _THREAD_MASK
	LONG_L	v1, TI_FLAGS(k1)
	PTR_ADDIU k0, 4
	jr	k0
	 rfe
#else
	PTR_ADDIU k0, 4			/* stall on $k0 */
	MTC0	k0, CP0_EPC
	ori	k1, _THREAD_MASK
	xori	k1, _THREAD_MASK
	LONG_L	v1, TI_FLAGS(k1)
	eret
#endif
	.set	pop

I hope I got this right. ;-)

  Maciej


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-07 16:43     ` Atsushi Nemoto
@ 2006-07-07 17:04       ` Maciej W. Rozycki
  2006-07-07 18:22       ` Ralf Baechle
  1 sibling, 0 replies; 26+ messages in thread
From: Maciej W. Rozycki @ 2006-07-07 17:04 UTC (permalink / raw)
  To: Atsushi Nemoto; +Cc: linux-mips, ralf

On Sat, 8 Jul 2006, Atsushi Nemoto wrote:

> Can we use Index_Load_Data_I to load the instruction code from icache?

 No need to go through such a hassle when we have a proper architectural 
way of handling it.  Remember MIPS TLB-based MMUs (the two variations I 
know well, anyway) were designed to support a paged kernel.

  Maciej


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-07 16:43     ` Atsushi Nemoto
  2006-07-07 17:04       ` Maciej W. Rozycki
@ 2006-07-07 18:22       ` Ralf Baechle
  1 sibling, 0 replies; 26+ messages in thread
From: Ralf Baechle @ 2006-07-07 18:22 UTC (permalink / raw)
  To: Atsushi Nemoto; +Cc: macro, linux-mips

On Sat, Jul 08, 2006 at 01:43:39AM +0900, Atsushi Nemoto wrote:

> > >  For a VIVT I-cache this can result in a TLB exception.  TLB handlers are 
> > > not currently prepared for being called at the exception level.
> > 
> > Thanks, now I understand the problem.  Are there any good solutions?
> > The only one I can think of now is using handle_ri_slow for such CPUs.
> 
> Can we use Index_Load_Data_I to load the instruction code from icache?
> Just an idea...

In addition to what Maciej said - the format of instructions in the I-cache
is not necessarily the same as in memory.  Many processors store pre-decoded
instructions in the I-cache.

  Ralf


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-07 16:58     ` Maciej W. Rozycki
@ 2006-07-08 16:12       ` Atsushi Nemoto
  2006-07-10 14:40         ` Atsushi Nemoto
  2006-07-10 14:55       ` Atsushi Nemoto
  1 sibling, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-07-08 16:12 UTC (permalink / raw)
  To: macro; +Cc: linux-mips, ralf

On Fri, 7 Jul 2006 17:58:44 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > Thanks, now I understand the problem.  Are there any good solutions?
> > The only one I can think of now is using handle_ri_slow for such CPUs.
> 
>  I have implemented an appropriate update to the TLB handlers (or actually 
> it's enough to care for this case for the TLBL exception), but it predates 
> the current synthesized ones.  There is a small impact resulting from 
> this change and the synthesized handlers have the advantage of making it 
> only necessary for these chips that do need such handling.

Do you still have the code?  Could you post it for reference?

>  I'd restructure the code more or less like this, taking care for (almost) 
> all stalls resulting from interlocks on coprocessor moves and memory loads 
> and likewise avoiding the need for "nop" fillers there for MIPS I 
> processors:

Thanks.  I'll look into it deeply.

> 	bne	k0, k1, handle_ri_slow	/* if not ours */
> 	 get_saved_sp			/* k1 := current_thread_info */

Unfortunately, get_saved_sp is not a single instruction...

---
Atsushi Nemoto


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-08 16:12       ` Atsushi Nemoto
@ 2006-07-10 14:40         ` Atsushi Nemoto
  2006-09-14 17:28           ` Ralf Baechle
  0 siblings, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-07-10 14:40 UTC (permalink / raw)
  To: linux-mips; +Cc: ralf, macro

Take 2.  Comments (especially from pipeline wizards) are welcome.

Add a special short path for emulating RDHWR, which is used to support
TLS.  The handle_tlbl synthesizer takes care of
cpu_has_vtag_icache.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index 37fda3d..dfceea9 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -375,6 +375,43 @@ #endif
 	BUILD_HANDLER dsp dsp sti silent		/* #26 */
 	BUILD_HANDLER reserved reserved sti verbose	/* others */
 
+	.align	5
+	LEAF(handle_ri_rdhwr)
+	.set	push
+	.set	noat
+	.set	noreorder
+	/* 0x7c03e83b: rdhwr v1,$29 */
+	MFC0	k1, CP0_EPC
+	lui	k0, 0x7c03
+	lw	k1, (k1)
+	ori	k0, 0xe83b
+	.set	reorder
+	bne	k0, k1, handle_ri	/* if not ours */
+	/* The insn is rdhwr.  No need to check CAUSE.BD here. */
+	get_saved_sp	/* k1 := current_thread_info */
+	.set	noreorder
+	MFC0	k0, CP0_EPC
+#if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	LONG_ADDIU	k0, 4
+	jr	k0
+	 rfe
+#else
+	LONG_ADDIU	k0, 4		/* stall on $k0 */
+	MTC0	k0, CP0_EPC
+	/* I hope three instructions between MTC0 and ERET are enough... */
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	.set	mips3
+	eret
+	.set	mips0
+#endif
+	.set	pop
+	END(handle_ri_rdhwr)
+
 #ifdef CONFIG_64BIT
 /* A temporary overflow handler used by check_daddi(). */
 
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 954a198..46eba9f 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -52,6 +52,7 @@ extern asmlinkage void handle_dbe(void);
 extern asmlinkage void handle_sys(void);
 extern asmlinkage void handle_bp(void);
 extern asmlinkage void handle_ri(void);
+extern asmlinkage void handle_ri_rdhwr(void);
 extern asmlinkage void handle_cpu(void);
 extern asmlinkage void handle_ov(void);
 extern asmlinkage void handle_tr(void);
@@ -1381,6 +1382,15 @@ #endif
 	memcpy((void *)(uncached_ebase + offset), addr, size);
 }
 
+int __initdata rdhwr_noopt;
+static int __init set_rdhwr_noopt(char *str)
+{
+	rdhwr_noopt = 1;
+	return 1;
+}
+
+__setup("rdhwr_noopt", set_rdhwr_noopt);
+
 void __init trap_init(void)
 {
 	extern char except_vec3_generic, except_vec3_r4000;
@@ -1460,7 +1470,7 @@ void __init trap_init(void)
 
 	set_except_vector(8, handle_sys);
 	set_except_vector(9, handle_bp);
-	set_except_vector(10, handle_ri);
+	set_except_vector(10, rdhwr_noopt ? handle_ri : handle_ri_rdhwr);
 	set_except_vector(11, handle_cpu);
 	set_except_vector(12, handle_ov);
 	set_except_vector(13, handle_tr);
diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index 375e099..3f53fa7 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -817,9 +817,10 @@ static __init void __attribute__((unused
  * Write random or indexed TLB entry, and care about the hazards from
  * the preceeding mtc0 and for the following eret.
  */
-enum tlb_write_entry { tlb_random, tlb_indexed };
+enum tlb_write_entry { tlb_random, tlb_indexed, tlb_arbitrary };
 
-static __init void build_tlb_write_entry(u32 **p, struct label **l,
+static __init void build_tlb_write_entry(u32 **p, unsigned int tmp,
+					 struct label **l,
 					 struct reloc **r,
 					 enum tlb_write_entry wmode)
 {
@@ -828,6 +829,11 @@ static __init void build_tlb_write_entry
 	switch (wmode) {
 	case tlb_random: tlbw = i_tlbwr; break;
 	case tlb_indexed: tlbw = i_tlbwi; break;
+	case tlb_arbitrary:
+		/* tmp contains CP0_INDEX.  see build_update_entries(). */
+		/* if tmp <= 0, use tlbwr instead of tlbwi */
+		tlbw = i_tlbwr;
+		break;
 	}
 
 	switch (current_cpu_data.cputype) {
@@ -841,6 +847,10 @@ static __init void build_tlb_write_entry
 		 * This branch uses up a mtc0 hazard nop slot and saves
 		 * two nops after the tlbw instruction.
 		 */
+		if (wmode == tlb_arbitrary) {
+			il_bgezl(p, r, tmp, label_tlbw_hazard);
+			i_tlbwi(p);
+		}
 		il_bgezl(p, r, 0, label_tlbw_hazard);
 		tlbw(p);
 		l_tlbw_hazard(l, *p);
@@ -851,8 +861,13 @@ static __init void build_tlb_write_entry
 	case CPU_R4700:
 	case CPU_R5000:
 	case CPU_R5000A:
-		i_nop(p);
+		if (wmode == tlb_arbitrary) {
+			il_bgezl(p, r, tmp, label_tlbw_hazard);
+			i_tlbwi(p);
+		} else
+			i_nop(p);
 		tlbw(p);
+		l_tlbw_hazard(l, *p);
 		i_nop(p);
 		break;
 
@@ -865,8 +880,13 @@ static __init void build_tlb_write_entry
 	case CPU_AU1550:
 	case CPU_AU1200:
 	case CPU_PR4450:
-		i_nop(p);
+		if (wmode == tlb_arbitrary) {
+			il_bgezl(p, r, tmp, label_tlbw_hazard);
+			i_tlbwi(p);
+		} else
+			i_nop(p);
 		tlbw(p);
+		l_tlbw_hazard(l, *p);
 		break;
 
 	case CPU_R10000:
@@ -878,15 +898,24 @@ static __init void build_tlb_write_entry
 	case CPU_4KSC:
 	case CPU_20KC:
 	case CPU_25KF:
+		if (wmode == tlb_arbitrary) {
+			il_bgezl(p, r, tmp, label_tlbw_hazard);
+			i_tlbwi(p);
+		}
 		tlbw(p);
+		l_tlbw_hazard(l, *p);
 		break;
 
 	case CPU_NEVADA:
-		i_nop(p); /* QED specifies 2 nops hazard */
 		/*
 		 * This branch uses up a mtc0 hazard nop slot and saves
 		 * a nop after the tlbw instruction.
 		 */
+		if (wmode == tlb_arbitrary) {
+			il_bgezl(p, r, tmp, label_tlbw_hazard);
+			i_tlbwi(p);
+		} else
+			i_nop(p); /* QED specifies 2 nops hazard */
 		il_bgezl(p, r, 0, label_tlbw_hazard);
 		tlbw(p);
 		l_tlbw_hazard(l, *p);
@@ -896,8 +925,13 @@ static __init void build_tlb_write_entry
 		i_nop(p);
 		i_nop(p);
 		i_nop(p);
-		i_nop(p);
+		if (wmode == tlb_arbitrary) {
+			il_bgezl(p, r, tmp, label_tlbw_hazard);
+			i_tlbwi(p);
+		} else
+			i_nop(p);
 		tlbw(p);
+		l_tlbw_hazard(l, *p);
 		break;
 
 	case CPU_4KEC:
@@ -905,7 +939,12 @@ static __init void build_tlb_write_entry
 	case CPU_34K:
 	case CPU_74K:
 		i_ehb(p);
+		if (wmode == tlb_arbitrary) {
+			il_bgezl(p, r, tmp, label_tlbw_hazard);
+			i_tlbwi(p);
+		}
 		tlbw(p);
+		l_tlbw_hazard(l, *p);
 		break;
 
 	case CPU_RM9000:
@@ -918,8 +957,13 @@ static __init void build_tlb_write_entry
 		i_ssnop(p);
 		i_ssnop(p);
 		i_ssnop(p);
-		i_ssnop(p);
+		if (wmode == tlb_arbitrary) {
+			il_bgezl(p, r, tmp, label_tlbw_hazard);
+			i_tlbwi(p);
+		} else
+			i_ssnop(p);
 		tlbw(p);
+		l_tlbw_hazard(l, *p);
 		i_ssnop(p);
 		i_ssnop(p);
 		i_ssnop(p);
@@ -932,8 +976,13 @@ static __init void build_tlb_write_entry
 	case CPU_VR4181:
 	case CPU_VR4181A:
 		i_nop(p);
-		i_nop(p);
+		if (wmode == tlb_arbitrary) {
+			il_bgezl(p, r, tmp, label_tlbw_hazard);
+			i_tlbwi(p);
+		} else
+			i_nop(p);
 		tlbw(p);
+		l_tlbw_hazard(l, *p);
 		i_nop(p);
 		i_nop(p);
 		break;
@@ -942,8 +991,13 @@ static __init void build_tlb_write_entry
 	case CPU_VR4133:
 	case CPU_R5432:
 		i_nop(p);
-		i_nop(p);
+		if (wmode == tlb_arbitrary) {
+			il_bgezl(p, r, tmp, label_tlbw_hazard);
+			i_tlbwi(p);
+		} else
+			i_nop(p);
 		tlbw(p);
+		l_tlbw_hazard(l, *p);
 		break;
 
 	default:
@@ -1123,7 +1177,7 @@ static __init void build_get_ptep(u32 **
 }
 
 static __init void build_update_entries(u32 **p, unsigned int tmp,
-					unsigned int ptep)
+					unsigned int ptep, int loadindex)
 {
 	/*
 	 * 64bit address support (36bit on a 32bit CPU) in a 32bit
@@ -1136,6 +1190,8 @@ #ifdef CONFIG_64BIT_PHYS_ADDR
 		i_dsrl(p, tmp, tmp, 6); /* convert to entrylo0 */
 		i_mtc0(p, tmp, C0_ENTRYLO0); /* load it */
 		i_dsrl(p, ptep, ptep, 6); /* convert to entrylo1 */
+		if (loadindex)
+			i_mfc0(p, tmp, C0_INDEX); /* used by tlb_arbitrary */
 		i_mtc0(p, ptep, C0_ENTRYLO1); /* load it */
 	} else {
 		int pte_off_even = sizeof(pte_t) / 2;
@@ -1145,6 +1201,8 @@ #ifdef CONFIG_64BIT_PHYS_ADDR
 		i_lw(p, tmp, pte_off_even, ptep); /* get even pte */
 		i_mtc0(p, tmp, C0_ENTRYLO0); /* load it */
 		i_lw(p, ptep, pte_off_odd, ptep); /* get odd pte */
+		if (loadindex)
+			i_mfc0(p, tmp, C0_INDEX); /* used by tlb_arbitrary */
 		i_mtc0(p, ptep, C0_ENTRYLO1); /* load it */
 	}
 #else
@@ -1157,8 +1215,8 @@ #else
 		i_mtc0(p, 0, C0_ENTRYLO0);
 	i_mtc0(p, tmp, C0_ENTRYLO0); /* load it */
 	i_SRL(p, ptep, ptep, 6); /* convert to entrylo1 */
-	if (r45k_bvahwbug())
-		i_mfc0(p, tmp, C0_INDEX);
+	if (r45k_bvahwbug() || loadindex)
+		i_mfc0(p, tmp, C0_INDEX); /* used by tlb_arbitrary */
 	if (r4k_250MHZhwbug())
 		i_mtc0(p, 0, C0_ENTRYLO1);
 	i_mtc0(p, ptep, C0_ENTRYLO1); /* load it */
@@ -1198,8 +1256,8 @@ #else
 #endif
 
 	build_get_ptep(&p, K0, K1);
-	build_update_entries(&p, K0, K1);
-	build_tlb_write_entry(&p, &l, &r, tlb_random);
+	build_update_entries(&p, K0, K1, 0);
+	build_tlb_write_entry(&p, K0, &l, &r, tlb_random);
 	l_leave(&l, p);
 	i_eret(&p); /* return from trap */
 
@@ -1647,12 +1705,13 @@ # endif
 static void __init
 build_r4000_tlbchange_handler_tail(u32 **p, struct label **l,
 				   struct reloc **r, unsigned int tmp,
-				   unsigned int ptr)
+				   unsigned int ptr,
+				   enum tlb_write_entry wmode)
 {
 	i_ori(p, ptr, ptr, sizeof(pte_t));
 	i_xori(p, ptr, ptr, sizeof(pte_t));
-	build_update_entries(p, tmp, ptr);
-	build_tlb_write_entry(p, l, r, tlb_indexed);
+	build_update_entries(p, tmp, ptr, wmode == tlb_arbitrary);
+	build_tlb_write_entry(p, tmp, l, r, wmode);
 	l_leave(l, *p);
 	i_eret(p); /* return from trap */
 
@@ -1667,6 +1726,9 @@ static void __init build_r4000_tlb_load_
 	struct label *l = labels;
 	struct reloc *r = relocs;
 	int i;
+ 	extern int rdhwr_noopt;
+ 	enum tlb_write_entry wmode = (!rdhwr_noopt && cpu_has_vtag_icache) ?
+		tlb_arbitrary : tlb_indexed;
 
 	memset(handle_tlbl, 0, sizeof(handle_tlbl));
 	memset(labels, 0, sizeof(labels));
@@ -1684,7 +1746,7 @@ static void __init build_r4000_tlb_load_
 	build_r4000_tlbchange_handler_head(&p, &l, &r, K0, K1);
 	build_pte_present(&p, &l, &r, K0, K1, label_nopage_tlbl);
 	build_make_valid(&p, &r, K0, K1);
-	build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1);
+	build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1, wmode);
 
 	l_nopage_tlbl(&l, p);
 	i_j(&p, (unsigned long)tlb_do_page_fault_0 & 0x0fffffff);
@@ -1718,7 +1780,7 @@ static void __init build_r4000_tlb_store
 	build_r4000_tlbchange_handler_head(&p, &l, &r, K0, K1);
 	build_pte_writable(&p, &l, &r, K0, K1, label_nopage_tlbs);
 	build_make_write(&p, &r, K0, K1);
-	build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1);
+	build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1, tlb_indexed);
 
 	l_nopage_tlbs(&l, p);
 	i_j(&p, (unsigned long)tlb_do_page_fault_1 & 0x0fffffff);
@@ -1753,7 +1815,7 @@ static void __init build_r4000_tlb_modif
 	build_pte_modifiable(&p, &l, &r, K0, K1, label_nopage_tlbm);
 	/* Present and writable bits set, set accessed and dirty bits. */
 	build_make_write(&p, &r, K0, K1);
-	build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1);
+	build_r4000_tlbchange_handler_tail(&p, &l, &r, K0, K1, tlb_indexed);
 
 	l_nopage_tlbm(&l, p);
 	i_j(&p, (unsigned long)tlb_do_page_fault_1 & 0x0fffffff);


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-07 16:58     ` Maciej W. Rozycki
  2006-07-08 16:12       ` Atsushi Nemoto
@ 2006-07-10 14:55       ` Atsushi Nemoto
  2006-07-11  2:53         ` Daniel Jacobowitz
  1 sibling, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-07-10 14:55 UTC (permalink / raw)
  To: macro; +Cc: linux-mips, ralf

On Fri, 7 Jul 2006 17:58:44 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> 	mfc0	k0, CP0_CAUSE
> 	MFC0	k1, CP0_EPC
> 	bltz	k0, handle_ri_slow	/* if delay slot */
> 	 lui	k0, 0x7c03

I noticed that checking for CP0_CAUSE.BD is unneeded, since we are
checking the instruction code anyway and "rdhwr" does not have a delay
slot.  I removed the check in the "take 2" patch I just sent.

---
Atsushi Nemoto


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-10 14:55       ` Atsushi Nemoto
@ 2006-07-11  2:53         ` Daniel Jacobowitz
  2006-07-11  3:20           ` Atsushi Nemoto
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Jacobowitz @ 2006-07-11  2:53 UTC (permalink / raw)
  To: Atsushi Nemoto; +Cc: macro, linux-mips, ralf

On Mon, Jul 10, 2006 at 11:55:53PM +0900, Atsushi Nemoto wrote:
> On Fri, 7 Jul 2006 17:58:44 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > 	mfc0	k0, CP0_CAUSE
> > 	MFC0	k1, CP0_EPC
> > 	bltz	k0, handle_ri_slow	/* if delay slot */
> > 	 lui	k0, 0x7c03
> 
> I noticed that checking for CP0_CAUSE.BD is unneeded, since we are
> checking the instruction code anyway and "rdhwr" does not have a delay
> slot.  I removed the check in the "take 2" patch I just sent.

Isn't BD "this instruction is in a delay slot", not "this instruction
has a delay slot"?  It affects where we go when we return.

BTW, if the fast emulation can't handle rdhwr in a delay slot, please
report a bug on GCC asking it not to put rdhwr in delay slots by
default.  It's probably worthwhile.

-- 
Daniel Jacobowitz
CodeSourcery


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-11  2:53         ` Daniel Jacobowitz
@ 2006-07-11  3:20           ` Atsushi Nemoto
  2006-09-08 17:39             ` Nigel Stephens
  0 siblings, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-07-11  3:20 UTC (permalink / raw)
  To: dan; +Cc: macro, linux-mips, ralf

On Mon, 10 Jul 2006 22:53:42 -0400, Daniel Jacobowitz <dan@debian.org> wrote:
> > I noticed that checking for CP0_CAUSE.BD is unneeded, since we are
> > checking the instruction code anyway and "rdhwr" does not have a delay
> > slot.  I removed the check in the "take 2" patch I just sent.
> 
> Isn't BD "this instruction is in a delay slot", not "this instruction
> has a delay slot"?  It affects where we go when we return.
 
Well, BD means "the exception occurred in the delay slot of the
instruction EPC points to".  If rdhwr were in a delay slot, EPC would
point to the preceding jump/branch instruction.  This fast path reads
the instruction at EPC (regardless of BD), so the word fetched cannot be
"rdhwr" and we fall back to the slow path.

> BTW, if the fast emulation can't handle rdhwr in a delay slot, please
> report a bug on GCC asking it not to put rdhwr in delay slots by
> default.  It's probably worthwhile.

If rdhwr were in a delay slot, the slow emulation would be even slower.
So I think rdhwr should not be put in a delay slot anyway, regardless of
the fast emulation.

I asked on GCC bugzilla a few days ago but have not got any feedback yet.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126

---
Atsushi Nemoto


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-11  3:20           ` Atsushi Nemoto
@ 2006-09-08 17:39             ` Nigel Stephens
  2006-09-09 13:56               ` Atsushi Nemoto
  0 siblings, 1 reply; 26+ messages in thread
From: Nigel Stephens @ 2006-09-08 17:39 UTC (permalink / raw)
  To: Atsushi Nemoto, ralf; +Cc: dan, macro, linux-mips

Atsushi Nemoto wrote:
>   
>> BTW, if the fast emulation can't handle rdhwr in a delay slot, please
>> report a bug on GCC asking it not to put rdhwr in delay slots by
>> default.  It's probably worthwhile.
>>     
>
> If rdhwr were in a delay slot, the slow emulation would be even slower.
> So I think rdhwr should not be put in a delay slot anyway, regardless of
> the fast emulation.
>
> I asked on GCC bugzilla a few days ago but have not got any feedback yet.
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126
>   

In spite of the GCC issue, is this patch now at the point where it could
be applied, or at least queued?

Nigel


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-08 17:39             ` Nigel Stephens
@ 2006-09-09 13:56               ` Atsushi Nemoto
  2006-09-10 22:30                 ` Nigel Stephens
  2006-09-11 13:09                 ` Maciej W. Rozycki
  0 siblings, 2 replies; 26+ messages in thread
From: Atsushi Nemoto @ 2006-09-09 13:56 UTC (permalink / raw)
  To: nigel; +Cc: ralf, dan, macro, linux-mips

On Fri, 08 Sep 2006 18:39:08 +0100, Nigel Stephens <nigel@mips.com> wrote:
> > I asked on GCC bugzilla a few days ago but have not got any feedback yet.
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126
> >   
> 
> In spite of the GCC issue, is this patch now at the point where it could
> be applied, or at least queued?

GCC 4.2 does not put RDHWR in a delay slot now.  Also, there is a
"hackish fix" to prevent gcc from moving a RDHWR outside of a conditional
(from Richard Sandiford).
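
As a contrived illustration of why that matters (my example, not from the
GCC discussion): if the compiler hoisted the rdhwr backing a __thread
access out of the guarding condition, the emulation trap would be taken
on every call even when the TLS value is never needed.

	extern __thread int counter;

	void maybe_count(int use_tls)
	{
		if (use_tls)
			counter++;	/* TLS access -> rdhwr on MIPS */
	}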

On the kernel side, my patch can still be applied to the current git
tree as is.

But I'm still looking for better solution (silver bullet?) for
cpu_has_vtag_icache case.

How about something like this (and do not touch tlbex.c)?

	LEAF(handle_ri_rdhwr_vivt)
	.set	push
	.set	noat
	.set	noreorder
	/* check if TLB contains a entry for EPC */
	MFC0	K1, CP0_ENTRYHI
	andi	k1, ASID_MASK
	MFC0	k0, CP0_EPC
	andi	k0, PAGE_MASK << 1
	or	k1, k0
	MTC0	k1, CP0_ENTRYHI
	tlbp
	mfc0	k1, CP0_INDEX
	bltz	k1, handle_ri	/* slow path */
	 nop
	/* fall thru */
	LEAF(handle_ri_rdhwr)

I'm wondering if this could work in the CONFIG_MIPS_MT_SMTC case...

---
Atsushi Nemoto


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-09 13:56               ` Atsushi Nemoto
@ 2006-09-10 22:30                 ` Nigel Stephens
  2006-09-11  5:04                   ` Atsushi Nemoto
  2006-09-11 13:09                 ` Maciej W. Rozycki
  1 sibling, 1 reply; 26+ messages in thread
From: Nigel Stephens @ 2006-09-10 22:30 UTC (permalink / raw)
  To: Atsushi Nemoto; +Cc: ralf, dan, macro, linux-mips

Atsushi Nemoto wrote:
> On Fri, 08 Sep 2006 18:39:08 +0100, Nigel Stephens <nigel@mips.com> wrote:
>   
>>> I asked on GCC bugzilla a few days ago but have not got any feedback yet.
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126
>>>   
>>>       
>> In spite of the GCC issue, is this patch now at the point where it could
>> be applied, or at least queued?
>>     
>
> GCC 4.2 does not put RDHWR in a delay slot now.  Also, there is a
> "hackish fix" to prevent gcc from moving a RDHWR outside of a conditional
> (from Richard Sandiford).
>
> On the kernel side, my patch can still be applied to the current git
> tree as is.
>
> But I'm still looking for better solution (silver bullet?) for
> cpu_has_vtag_icache case.
>
> How about something like this (and do not touch tlbex.c)?
>
> 	LEAF(handle_ri_rdhwr_vivt)
> 	.set	push
> 	.set	noat
> 	.set	noreorder
> 	/* check if TLB contains a entry for EPC */
> 	MFC0	K1, CP0_ENTRYHI
> 	andi	k1, ASID_MASK
> 	MFC0	k0, CP0_EPC
> 	andi	k0, PAGE_MASK << 1
> 	or	k1, k0
> 	MTC0	k1, CP0_ENTRYHI
> 	tlbp
> 	mfc0	k1, CP0_INDEX
> 	bltz	k1, handle_ri	/* slow path */
> 	 nop
> 	/* fall thru */
> 	LEAF(handle_ri_rdhwr)
>
> I'm wondering if this could work on CONFIG_MIPS_MT_SMTC case...
>
>   

No, that wouldn't be reliable for CONFIG_MIPS_MT_SMTC, but then again 
the only CPU which currently runs SMTC has VIPT caches

Nigel


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-10 22:30                 ` Nigel Stephens
@ 2006-09-11  5:04                   ` Atsushi Nemoto
  2006-09-11  8:50                     ` Atsushi Nemoto
  0 siblings, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-09-11  5:04 UTC (permalink / raw)
  To: nigel; +Cc: ralf, dan, macro, linux-mips

On Sun, 10 Sep 2006 23:30:18 +0100, Nigel Stephens <nigel@mips.com> wrote:
> > 	LEAF(handle_ri_rdhwr_vivt)
...
> >
> > I'm wondering if this could work on CONFIG_MIPS_MT_SMTC case...
> 
> No, that wouldn't be reliable for CONFIG_MIPS_MT_SMTC, but then again 
> the only CPU which currently runs SMTC has VIPT caches

Then would this be better than the "take 2" patch?  This adds some overhead
to the fast RDHWR emulation path but no overhead to the TLB refill path.

The tlb_probe_hazard macro does not exist in the main branch for now but
already exists in the queue branch.


Take 3.  Comments (especially from pipeline wizards) are welcome.

Add a special short path for emulating RDHWR, which is used to support
TLS.  Add an extra prologue for the cpu_has_vtag_icache case.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index 37fda3d..55e090e 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -19,6 +19,7 @@ #include <asm/fpregdef.h>
 #include <asm/mipsregs.h>
 #include <asm/stackframe.h>
 #include <asm/war.h>
+#include <asm/page.h>
 
 #define PANIC_PIC(msg)					\
 		.set push;				\
@@ -375,6 +376,72 @@ #endif
 	BUILD_HANDLER dsp dsp sti silent		/* #26 */
 	BUILD_HANDLER reserved reserved sti verbose	/* others */
 
+	.align	5
+	LEAF(handle_ri_rdhwr_vivt)
+#ifdef CONFIG_MIPS_MT_SMTC
+	PANIC_PIC("handle_ri_rdhwr_vivt called")
+#else
+	.set	push
+	.set	noat
+	.set	noreorder
+	/* check if TLB contains a entry for EPC */
+	MFC0	k1, CP0_ENTRYHI
+	andi	k1, 0xff	/* ASID_MASK */
+	MFC0	k0, CP0_EPC
+	PTR_SRL	k0, PAGE_SHIFT + 1
+	PTR_SLL	k0, PAGE_SHIFT + 1
+	or	k1, k0
+	MTC0	k1, CP0_ENTRYHI
+	mtc0_tlbw_hazard
+	tlbp
+#ifdef CONFIG_CPU_MIPSR2
+	_ehb			/* tlb_probe_hazard */
+#else
+	nop; nop; nop; nop; nop; nop	/* tlb_probe_hazard */
+#endif
+	mfc0	k1, CP0_INDEX
+	.set	pop
+	bltz	k1, handle_ri	/* slow path */
+	/* fall thru */
+#endif
+	END(handle_ri_rdhwr_vivt)
+
+	LEAF(handle_ri_rdhwr)
+	.set	push
+	.set	noat
+	.set	noreorder
+	/* 0x7c03e83b: rdhwr v1,$29 */
+	MFC0	k1, CP0_EPC
+	lui	k0, 0x7c03
+	lw	k1, (k1)
+	ori	k0, 0xe83b
+	.set	reorder
+	bne	k0, k1, handle_ri	/* if not ours */
+	/* The insn is rdhwr.  No need to check CAUSE.BD here. */
+	get_saved_sp	/* k1 := current_thread_info */
+	.set	noreorder
+	MFC0	k0, CP0_EPC
+#if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	LONG_ADDIU	k0, 4
+	jr	k0
+	 rfe
+#else
+	LONG_ADDIU	k0, 4		/* stall on $k0 */
+	MTC0	k0, CP0_EPC
+	/* I hope three instructions between MTC0 and ERET are enough... */
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	.set	mips3
+	eret
+	.set	mips0
+#endif
+	.set	pop
+	END(handle_ri_rdhwr)
+
 #ifdef CONFIG_64BIT
 /* A temporary overflow handler used by check_daddi(). */
 
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e51d8fd..7ae454a 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -53,6 +53,8 @@ extern asmlinkage void handle_dbe(void);
 extern asmlinkage void handle_sys(void);
 extern asmlinkage void handle_bp(void);
 extern asmlinkage void handle_ri(void);
+extern asmlinkage void handle_ri_rdhwr_vivt(void);
+extern asmlinkage void handle_ri_rdhwr(void);
 extern asmlinkage void handle_cpu(void);
 extern asmlinkage void handle_ov(void);
 extern asmlinkage void handle_tr(void);
@@ -1453,6 +1455,15 @@ #endif
 	memcpy((void *)(uncached_ebase + offset), addr, size);
 }
 
+int __initdata rdhwr_noopt;
+static int __init set_rdhwr_noopt(char *str)
+{
+	rdhwr_noopt = 1;
+	return 1;
+}
+
+__setup("rdhwr_noopt", set_rdhwr_noopt);
+
 void __init trap_init(void)
 {
 	extern char except_vec3_generic, except_vec3_r4000;
@@ -1532,7 +1543,9 @@ void __init trap_init(void)
 
 	set_except_vector(8, handle_sys);
 	set_except_vector(9, handle_bp);
-	set_except_vector(10, handle_ri);
+	set_except_vector(10, rdhwr_noopt ? handle_ri :
+			  (cpu_has_vtag_icache ?
+			   handle_ri_rdhwr_vivt : handle_ri_rdhwr));
 	set_except_vector(11, handle_cpu);
 	set_except_vector(12, handle_ov);
 	set_except_vector(13, handle_tr);


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11  5:04                   ` Atsushi Nemoto
@ 2006-09-11  8:50                     ` Atsushi Nemoto
  2006-09-11  9:49                       ` Thiemo Seufer
  0 siblings, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-09-11  8:50 UTC (permalink / raw)
  To: nigel; +Cc: ralf, dan, macro, linux-mips

On Mon, 11 Sep 2006 14:04:03 +0900 (JST), Atsushi Nemoto <anemo@mba.ocn.ne.jp> wrote:
> Then would this be better than the "take 2" patch?  This adds some overhead
> to the fast RDHWR emulation path but no overhead to the TLB refill path.
> 
> The tlb_probe_hazard macro does not exist in the main branch for now but
> already exists in the queue branch.
> 
> 
> Take 3.  Comments (especially from pipeline wizards) are welcome.

Oops, "rdhwr_noopt" should be static in this take.  Revised.


Take 3(revised).

Add a special short path for emulating RDHWR, which is used to support
TLS.  Add an extra prologue for the cpu_has_vtag_icache case.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

diff --git a/arch/mips/kernel/genex.S b/arch/mips/kernel/genex.S
index 37fda3d..55e090e 100644
--- a/arch/mips/kernel/genex.S
+++ b/arch/mips/kernel/genex.S
@@ -19,6 +19,7 @@ #include <asm/fpregdef.h>
 #include <asm/mipsregs.h>
 #include <asm/stackframe.h>
 #include <asm/war.h>
+#include <asm/page.h>
 
 #define PANIC_PIC(msg)					\
 		.set push;				\
@@ -375,6 +376,72 @@ #endif
 	BUILD_HANDLER dsp dsp sti silent		/* #26 */
 	BUILD_HANDLER reserved reserved sti verbose	/* others */
 
+	.align	5
+	LEAF(handle_ri_rdhwr_vivt)
+#ifdef CONFIG_MIPS_MT_SMTC
+	PANIC_PIC("handle_ri_rdhwr_vivt called")
+#else
+	.set	push
+	.set	noat
+	.set	noreorder
+	/* check if TLB contains a entry for EPC */
+	MFC0	k1, CP0_ENTRYHI
+	andi	k1, 0xff	/* ASID_MASK */
+	MFC0	k0, CP0_EPC
+	PTR_SRL	k0, PAGE_SHIFT + 1
+	PTR_SLL	k0, PAGE_SHIFT + 1
+	or	k1, k0
+	MTC0	k1, CP0_ENTRYHI
+	mtc0_tlbw_hazard
+	tlbp
+#ifdef CONFIG_CPU_MIPSR2
+	_ehb			/* tlb_probe_hazard */
+#else
+	nop; nop; nop; nop; nop; nop	/* tlb_probe_hazard */
+#endif
+	mfc0	k1, CP0_INDEX
+	.set	pop
+	bltz	k1, handle_ri	/* slow path */
+	/* fall thru */
+#endif
+	END(handle_ri_rdhwr_vivt)
+
+	LEAF(handle_ri_rdhwr)
+	.set	push
+	.set	noat
+	.set	noreorder
+	/* 0x7c03e83b: rdhwr v1,$29 */
+	MFC0	k1, CP0_EPC
+	lui	k0, 0x7c03
+	lw	k1, (k1)
+	ori	k0, 0xe83b
+	.set	reorder
+	bne	k0, k1, handle_ri	/* if not ours */
+	/* The insn is rdhwr.  No need to check CAUSE.BD here. */
+	get_saved_sp	/* k1 := current_thread_info */
+	.set	noreorder
+	MFC0	k0, CP0_EPC
+#if defined(CONFIG_CPU_R3000) || defined(CONFIG_CPU_TX39XX)
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	LONG_ADDIU	k0, 4
+	jr	k0
+	 rfe
+#else
+	LONG_ADDIU	k0, 4		/* stall on $k0 */
+	MTC0	k0, CP0_EPC
+	/* I hope three instructions between MTC0 and ERET are enough... */
+	ori	k1, _THREAD_MASK
+	xori	k1, _THREAD_MASK
+	LONG_L	v1, TI_TP_VALUE(k1)
+	.set	mips3
+	eret
+	.set	mips0
+#endif
+	.set	pop
+	END(handle_ri_rdhwr)
+
 #ifdef CONFIG_64BIT
 /* A temporary overflow handler used by check_daddi(). */
 
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e51d8fd..e56b02f 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -53,6 +53,8 @@ extern asmlinkage void handle_dbe(void);
 extern asmlinkage void handle_sys(void);
 extern asmlinkage void handle_bp(void);
 extern asmlinkage void handle_ri(void);
+extern asmlinkage void handle_ri_rdhwr_vivt(void);
+extern asmlinkage void handle_ri_rdhwr(void);
 extern asmlinkage void handle_cpu(void);
 extern asmlinkage void handle_ov(void);
 extern asmlinkage void handle_tr(void);
@@ -1453,6 +1455,15 @@ #endif
 	memcpy((void *)(uncached_ebase + offset), addr, size);
 }
 
+static int __initdata rdhwr_noopt;
+static int __init set_rdhwr_noopt(char *str)
+{
+	rdhwr_noopt = 1;
+	return 1;
+}
+
+__setup("rdhwr_noopt", set_rdhwr_noopt);
+
 void __init trap_init(void)
 {
 	extern char except_vec3_generic, except_vec3_r4000;
@@ -1532,7 +1543,9 @@ void __init trap_init(void)
 
 	set_except_vector(8, handle_sys);
 	set_except_vector(9, handle_bp);
-	set_except_vector(10, handle_ri);
+	set_except_vector(10, rdhwr_noopt ? handle_ri :
+			  (cpu_has_vtag_icache ?
+			   handle_ri_rdhwr_vivt : handle_ri_rdhwr));
 	set_except_vector(11, handle_cpu);
 	set_except_vector(12, handle_ov);
 	set_except_vector(13, handle_tr);


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11  8:50                     ` Atsushi Nemoto
@ 2006-09-11  9:49                       ` Thiemo Seufer
  2006-09-11 14:13                         ` Atsushi Nemoto
  0 siblings, 1 reply; 26+ messages in thread
From: Thiemo Seufer @ 2006-09-11  9:49 UTC (permalink / raw)
  To: Atsushi Nemoto; +Cc: nigel, ralf, dan, macro, linux-mips

Atsushi Nemoto wrote:
[snip]
> @@ -375,6 +376,72 @@ #endif
>  	BUILD_HANDLER dsp dsp sti silent		/* #26 */
>  	BUILD_HANDLER reserved reserved sti verbose	/* others */
>  
> +	.align	5
> +	LEAF(handle_ri_rdhwr_vivt)
> +#ifdef CONFIG_MIPS_MT_SMTC
> +	PANIC_PIC("handle_ri_rdhwr_vivt called")
> +#else
> +	.set	push
> +	.set	noat
> +	.set	noreorder
> +	/* check if TLB contains a entry for EPC */
> +	MFC0	k1, CP0_ENTRYHI
> +	andi	k1, 0xff	/* ASID_MASK */
> +	MFC0	k0, CP0_EPC
> +	PTR_SRL	k0, PAGE_SHIFT + 1
> +	PTR_SLL	k0, PAGE_SHIFT + 1
> +	or	k1, k0
> +	MTC0	k1, CP0_ENTRYHI
> +	mtc0_tlbw_hazard
> +	tlbp

This needs a .set mips3/.set mips0 pair.

> +#ifdef CONFIG_CPU_MIPSR2
> +	_ehb			/* tlb_probe_hazard */
> +#else
> +	nop; nop; nop; nop; nop; nop	/* tlb_probe_hazard */
> +#endif

What about a mtc0_tlbp_hazard macro here?


Thiemo


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-09 13:56               ` Atsushi Nemoto
  2006-09-10 22:30                 ` Nigel Stephens
@ 2006-09-11 13:09                 ` Maciej W. Rozycki
  2006-09-11 14:30                   ` Atsushi Nemoto
  1 sibling, 1 reply; 26+ messages in thread
From: Maciej W. Rozycki @ 2006-09-11 13:09 UTC (permalink / raw)
  To: Atsushi Nemoto; +Cc: nigel, ralf, dan, linux-mips

On Sat, 9 Sep 2006, Atsushi Nemoto wrote:

> But I'm still looking for better solution (silver bullet?) for
> cpu_has_vtag_icache case.

 What's wrong with just letting a TLB fault happen?

  Maciej


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11  9:49                       ` Thiemo Seufer
@ 2006-09-11 14:13                         ` Atsushi Nemoto
  2006-09-11 15:17                           ` Thiemo Seufer
  0 siblings, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-09-11 14:13 UTC (permalink / raw)
  To: ths; +Cc: nigel, ralf, dan, macro, linux-mips

On Mon, 11 Sep 2006 10:49:05 +0100, Thiemo Seufer <ths@networkno.de> wrote:
> > +	tlbp
> 
> This needs a .set mips3/.set mips0 pair.

TLBP belongs to the MIPS I ISA, doesn't it?

> > +#ifdef CONFIG_CPU_MIPSR2
> > +	_ehb			/* tlb_probe_hazard */
> > +#else
> > +	nop; nop; nop; nop; nop; nop	/* tlb_probe_hazard */
> > +#endif
> 
> What about a mtc0_tlbp_hazard macro here?

You mean mtc0_tlbw_hazard?  I took them from the tlb_probe_hazard macro
in the queue branch.

And it looks like the current mtc0_tlbw_hazard asm macro does not match
its C equivalent ...

	.macro	mtc0_tlbw_hazard
	b	. + 8
	.endm

#define mtc0_tlbw_hazard()						\
	__asm__ __volatile__(						\
	"	.set	noreorder				\n"	\
	"	nop						\n"	\
	"	nop						\n"	\
	"	nop						\n"	\
	"	nop						\n"	\
	"	nop						\n"	\
	"	nop						\n"	\
	"	.set	reorder					\n")

---
Atsushi Nemoto


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11 13:09                 ` Maciej W. Rozycki
@ 2006-09-11 14:30                   ` Atsushi Nemoto
  2006-09-11 17:53                     ` Maciej W. Rozycki
  0 siblings, 1 reply; 26+ messages in thread
From: Atsushi Nemoto @ 2006-09-11 14:30 UTC (permalink / raw)
  To: macro; +Cc: nigel, ralf, dan, linux-mips

On Mon, 11 Sep 2006 14:09:20 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > But I'm still looking for better solution (silver bullet?) for
> > cpu_has_vtag_icache case.
> 
>  What's wrong with just letting a TLB fault happen?

It might add a little overhead to the usual TLB refill handling.  The
overhead might be negligible, but I'm not sure.

---
Atsushi Nemoto


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11 14:13                         ` Atsushi Nemoto
@ 2006-09-11 15:17                           ` Thiemo Seufer
  0 siblings, 0 replies; 26+ messages in thread
From: Thiemo Seufer @ 2006-09-11 15:17 UTC (permalink / raw)
  To: Atsushi Nemoto; +Cc: nigel, ralf, dan, macro, linux-mips

Atsushi Nemoto wrote:
> On Mon, 11 Sep 2006 10:49:05 +0100, Thiemo Seufer <ths@networkno.de> wrote:
> > > +	tlbp
> > 
> > This needs a .set mips3/.set mips0 pair.
> 
> TLBP belongs to the MIPS I ISA, doesn't it?

Uh, right. I wasn't awake when I wrote that mail. :-)

> > > +#ifdef CONFIG_CPU_MIPSR2
> > > +	_ehb			/* tlb_probe_hazard */
> > > +#else
> > > +	nop; nop; nop; nop; nop; nop	/* tlb_probe_hazard */
> > > +#endif
> > 
> > What about a mtc0_tlbp_hazard macro here?
> 
> You mean mtc0_tlbw_hazard?  I took them from the tlb_probe_hazard macro
> in the queue branch.

Actually, I meant an equivalent to the build_tlb_probe_entry in tlbex.c,
plus a tlb_use_hazard.

> And it looks like the current mtc0_tlbw_hazard asm macro does not match
> its C equivalent ...
> 
> 	.macro	mtc0_tlbw_hazard
> 	b	. + 8
> 	.endm
> 
> #define mtc0_tlbw_hazard()						\
> 	__asm__ __volatile__(						\
> 	"	.set	noreorder				\n"	\
> 	"	nop						\n"	\
> 	"	nop						\n"	\
> 	"	nop						\n"	\
> 	"	nop						\n"	\
> 	"	nop						\n"	\
> 	"	nop						\n"	\
> 	"	.set	reorder					\n")

It also lacks a case for R2 CPUs, where IIRC _ehb is the way
approved by the spec.


Thiemo


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11 14:30                   ` Atsushi Nemoto
@ 2006-09-11 17:53                     ` Maciej W. Rozycki
  2006-09-12  1:55                       ` Atsushi Nemoto
  0 siblings, 1 reply; 26+ messages in thread
From: Maciej W. Rozycki @ 2006-09-11 17:53 UTC (permalink / raw)
  To: Atsushi Nemoto; +Cc: nigel, ralf, dan, linux-mips

On Mon, 11 Sep 2006, Atsushi Nemoto wrote:

> >  What's wrong with just letting a TLB fault happen?
> 
> It might add a little overhead to the usual TLB refill handling.  The
> overhead might be negligible, but I'm not sure.

 There is no need to change the refill handler -- only the general TLBL 
exception has to be modified.  And this one may not be too critical -- the
change required is in the path to mark pages accessed.  Is the path 
frequent enough to seek a complex solution while a simple one would just 
work?

  Maciej


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-11 17:53                     ` Maciej W. Rozycki
@ 2006-09-12  1:55                       ` Atsushi Nemoto
  0 siblings, 0 replies; 26+ messages in thread
From: Atsushi Nemoto @ 2006-09-12  1:55 UTC (permalink / raw)
  To: macro; +Cc: nigel, ralf, dan, linux-mips

On Mon, 11 Sep 2006 18:53:29 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > It might add a little overhead to the usual TLB refill handling.  The
> > overhead might be negligible, but I'm not sure.
> 
>  There is no need to change the refill handler -- only the general TLBL 
> exception has to be modified.  And this one may not be too critical -- the
> change required is in the path to mark pages accessed.  Is the path 
> frequent enough to seek a complex solution while a simple one would just 
> work?

Yes, my description was wrong: general TLBL handling, not TLB refill
handling.

Hmm, it seems not so critical indeed.  Then the "take 2" patch would be
exactly what you preferred.

http://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=20060710.234010.07457279.anemo%40mba.ocn.ne.jp

Any comments about that?

---
Atsushi Nemoto


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-07-10 14:40         ` Atsushi Nemoto
@ 2006-09-14 17:28           ` Ralf Baechle
  2006-09-15  3:09             ` Atsushi Nemoto
  0 siblings, 1 reply; 26+ messages in thread
From: Ralf Baechle @ 2006-09-14 17:28 UTC (permalink / raw)
  To: Atsushi Nemoto; +Cc: linux-mips, macro

On Mon, Jul 10, 2006 at 11:40:10PM +0900, Atsushi Nemoto wrote:

> Add a special short path for emulating RDHWR, which is used to support
> TLS.  The handle_tlbl synthesizer takes care of
> cpu_has_vtag_icache.

I'm just wondering if we actually need such optimizations.  Have you run
any application benchmarks?

  Ralf


* Re: [PATCH] fast path for rdhwr emulation for TLS
  2006-09-14 17:28           ` Ralf Baechle
@ 2006-09-15  3:09             ` Atsushi Nemoto
  0 siblings, 0 replies; 26+ messages in thread
From: Atsushi Nemoto @ 2006-09-15  3:09 UTC (permalink / raw)
  To: ralf; +Cc: linux-mips, macro

On Thu, 14 Sep 2006 18:28:05 +0100, Ralf Baechle <ralf@linux-mips.org> wrote:
> > Add a special short path for emulating RDHWR, which is used to support
> > TLS.  The handle_tlbl synthesizer takes care of
> > cpu_has_vtag_icache.
> 
> I'm just wondering if we actually need such optimizations.  Have you run
> any application benchmarks?

I've measured the time of an NPTL pthread_mutex_lock/pthread_mutex_unlock loop.

	pthread_mutex_init(&m, NULL);
	gettimeofday(&start, NULL);
	for (i = 0; i < 1000000; i++) {
		pthread_mutex_lock(&m);
		pthread_mutex_unlock(&m);
	}
	gettimeofday(&end, NULL);
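
For reference, a self-contained version of that micro-benchmark might look
like this (the loop and gettimeofday calls are from the snippet above; the
includes, declarations and elapsed-time printout are my additions).  Build
with something like "gcc -O2 -o mutexbench mutexbench.c -lpthread" on an
NPTL system.

	#include <stdio.h>
	#include <pthread.h>
	#include <sys/time.h>

	int main(void)
	{
		pthread_mutex_t m;
		struct timeval start, end;
		int i;

		pthread_mutex_init(&m, NULL);
		gettimeofday(&start, NULL);
		for (i = 0; i < 1000000; i++) {
			pthread_mutex_lock(&m);
			pthread_mutex_unlock(&m);
		}
		gettimeofday(&end, NULL);

		/* seconds per 1000000 lock/unlock pairs */
		printf("%f sec / 1000000 loop\n",
		       (end.tv_sec - start.tv_sec) +
		       (end.tv_usec - start.tv_usec) / 1000000.0);
		return 0;
	}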


Without optimization:
	0.826407 sec / 1000000 loop

With optimization:
	0.415667 sec / 1000000 loop

It would be worth doing.
---
Atsushi Nemoto

