linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 0/4] 8xx: Optimize TLB Miss code.
@ 2010-02-26  8:29 Joakim Tjernlund
  2010-02-26  8:29 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Joakim Tjernlund
  0 siblings, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-26  8:29 UTC (permalink / raw)
  To: linuxppc-dev

This set of patches tries to optimize the TLB code on 8xx even
more. If they work, there should be a noticeable performance
boost.

I would be very happy if you could test them for me.

Joakim Tjernlund (4):
  8xx: Optimze TLB Miss handlers
  8xx: Avoid testing for kernel space in ITLB Miss.
  8xx: Don't touch ACCESSED when no SWAP.
  8xx: Use SPRG2 and DAR registers to stash r11 and cr.

 arch/powerpc/kernel/head_8xx.S |   70 +++++++++++++++++++++++++++-------------
 1 files changed, 47 insertions(+), 23 deletions(-)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/4] 8xx: Optimze TLB Miss handlers
  2010-02-26  8:29 [PATCH 0/4] 8xx: Optimize TLB Miss code Joakim Tjernlund
@ 2010-02-26  8:29 ` Joakim Tjernlund
  2010-02-26  8:29   ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
                     ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-26  8:29 UTC (permalink / raw)
  To: linuxppc-dev

This removes a couple of instructions from the TLB Miss
handlers without changing functionality.
---
 arch/powerpc/kernel/head_8xx.S |   11 +++--------
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 3ef743f..ecc4a02 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -343,17 +343,14 @@ InstructionTLBMiss:
 	cmpwi	cr0, r11, _PAGE_ACCESSED | _PAGE_PRESENT
 	bne-	cr0, 2f
 
-	/* Clear PP lsb, 0x400 */
-	rlwinm 	r10, r10, 0, 22, 20
-
 	/* The Linux PTE won't go exactly into the MMU TLB.
-	 * Software indicator bits 22 and 28 must be clear.
+	 * Software indicator bits 21 and 28 must be clear.
 	 * Software indicator bits 24, 25, 26, and 27 must be
 	 * set.  All other Linux PTE bits control the behavior
 	 * of the MMU.
 	 */
 	li	r11, 0x00f0
-	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
+	rlwimi	r10, r11, 0, 0x07f8	/* Set 24-27, clear 21-23,28 */
 	DO_8xx_CPU6(0x2d80, r3)
 	mtspr	SPRN_MI_RPN, r10	/* Update TLB entry */
 
@@ -444,9 +441,7 @@ DataStoreTLBMiss:
 
 	/* Honour kernel RO, User NA */
 	/* 0x200 == Extended encoding, bit 22 */
-	/* r11 =  (r10 & _PAGE_USER) >> 2 */
-	rlwinm	r11, r10, 32-2, 0x200
-	or	r10, r11, r10
+	rlwimi	r10, r10, 32-2, 0x200 /* Copy USER to bit 22, 0x200 */
 	/* r11 =  (r10 & _PAGE_RW) >> 1 */
 	rlwinm	r11, r10, 32-1, 0x200
 	or	r10, r11, r10
-- 
1.6.4.4


* [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss.
  2010-02-26  8:29 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Joakim Tjernlund
@ 2010-02-26  8:29   ` Joakim Tjernlund
  2010-02-26  8:29     ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Joakim Tjernlund
  2010-03-16 21:19     ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Benjamin Herrenschmidt
  2010-02-26 19:50   ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Scott Wood
  2010-02-26 20:10   ` Kumar Gala
  2 siblings, 2 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-26  8:29 UTC (permalink / raw)
  To: linuxppc-dev

Only modules will cause ITLB Misses as we always pin
the first 8MB of kernel memory.
---
 arch/powerpc/kernel/head_8xx.S |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index ecc4a02..84ca1d9 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -318,12 +318,16 @@ InstructionTLBMiss:
 	/* If we are faulting a kernel address, we have to use the
 	 * kernel page tables.
 	 */
+#ifdef CONFIG_MODULES
+	/* Only modules will cause ITLB Misses as we always
+	 * pin the first 8MB of kernel memory */
 	andi.	r11, r10, 0x0800	/* Address >= 0x80000000 */
 	beq	3f
 	lis	r11, swapper_pg_dir@h
 	ori	r11, r11, swapper_pg_dir@l
 	rlwimi	r10, r11, 0, 2, 19
 3:
+#endif
 	lwz	r11, 0(r10)	/* Get the level 1 entry */
 	rlwinm.	r10, r11,0,0,19	/* Extract page descriptor page address */
 	beq	2f		/* If zero, don't try to find a pte */
-- 
1.6.4.4


* [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP.
  2010-02-26  8:29   ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
@ 2010-02-26  8:29     ` Joakim Tjernlund
  2010-02-26  8:29       ` [PATCH 4/4] 8xx: Use SPRG2 and DAR registers to stash r11 and cr Joakim Tjernlund
  2010-03-16 21:20       ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Benjamin Herrenschmidt
  2010-03-16 21:19     ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Benjamin Herrenschmidt
  1 sibling, 2 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-26  8:29 UTC (permalink / raw)
  To: linuxppc-dev

Only the swap function cares about the ACCESSED bit in
the pte. Do not waste cycles updating ACCESSED when swap
is not compiled into the kernel.
---
 arch/powerpc/kernel/head_8xx.S |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 84ca1d9..6478a96 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -343,10 +343,11 @@ InstructionTLBMiss:
 	mfspr	r11, SPRN_MD_TWC	/* ....and get the pte address */
 	lwz	r10, 0(r11)	/* Get the pte */
 
+#ifdef CONFIG_SWAP
 	andi.	r11, r10, _PAGE_ACCESSED | _PAGE_PRESENT
 	cmpwi	cr0, r11, _PAGE_ACCESSED | _PAGE_PRESENT
 	bne-	cr0, 2f
-
+#endif
 	/* The Linux PTE won't go exactly into the MMU TLB.
 	 * Software indicator bits 21 and 28 must be clear.
 	 * Software indicator bits 24, 25, 26, and 27 must be
@@ -439,10 +440,11 @@ DataStoreTLBMiss:
 	 * r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
 	 * r10 = (r10 & ~PRESENT) | r11;
 	 */
+#ifdef CONFIG_SWAP
 	rlwinm	r11, r10, 32-5, _PAGE_PRESENT
 	and	r11, r11, r10
 	rlwimi	r10, r11, 0, _PAGE_PRESENT
-
+#endif
 	/* Honour kernel RO, User NA */
 	/* 0x200 == Extended encoding, bit 22 */
 	rlwimi	r10, r10, 32-2, 0x200 /* Copy USER to bit 22, 0x200 */
-- 
1.6.4.4


* [PATCH 4/4] 8xx: Use SPRG2 and DAR registers to stash r11 and cr.
  2010-02-26  8:29     ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Joakim Tjernlund
@ 2010-02-26  8:29       ` Joakim Tjernlund
  2010-03-16 21:20       ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Benjamin Herrenschmidt
  1 sibling, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-26  8:29 UTC (permalink / raw)
  To: linuxppc-dev

This avoids storing these registers in memory.
CPU6 errata will still use the old way.
Remove some G2 leftover accesses from 2.4
---
 arch/powerpc/kernel/head_8xx.S |   49 +++++++++++++++++++++++++++++----------
 1 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 6478a96..1f1a04b 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -71,9 +71,6 @@ _ENTRY(_start);
  * in the first level table, but that would require many changes to the
  * Linux page directory/table functions that I don't want to do right now.
  *
- * I used to use SPRG2 for a temporary register in the TLB handler, but it
- * has since been put to other uses.  I now use a hack to save a register
- * and the CCR at memory location 0.....Someday I'll fix this.....
  *	-- Dan
  */
 	.globl	__start
@@ -302,8 +299,13 @@ InstructionTLBMiss:
 	DO_8xx_CPU6(0x3f80, r3)
 	mtspr	SPRN_M_TW, r10	/* Save a couple of working registers */
 	mfcr	r10
+#ifdef CONFIG_8xx_CPU6
 	stw	r10, 0(r0)
 	stw	r11, 4(r0)
+#else
+	mtspr	SPRN_DAR, r10
+	mtspr	SPRN_SPRG2, r11
+#endif
 	mfspr	r10, SPRN_SRR0	/* Get effective address of fault */
 #ifdef CONFIG_8xx_CPU15
 	addi	r11, r10, 0x1000
@@ -359,13 +361,19 @@ InstructionTLBMiss:
 	DO_8xx_CPU6(0x2d80, r3)
 	mtspr	SPRN_MI_RPN, r10	/* Update TLB entry */
 
-	mfspr	r10, SPRN_M_TW	/* Restore registers */
+	/* Restore registers */
+#ifndef CONFIG_8xx_CPU6
+	mfspr	r10, SPRN_DAR
+	mtcr	r10
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	mfspr	r11, SPRN_SPRG2
+#else
 	lwz	r11, 0(r0)
 	mtcr	r11
 	lwz	r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
 	lwz	r3, 8(r0)
 #endif
+	mfspr	r10, SPRN_M_TW
 	rfi
 2:
 	mfspr	r11, SPRN_SRR1
@@ -375,13 +383,20 @@ InstructionTLBMiss:
 	rlwinm	r11, r11, 0, 0xffff
 	mtspr	SPRN_SRR1, r11
 
-	mfspr	r10, SPRN_M_TW	/* Restore registers */
+	/* Restore registers */
+#ifndef CONFIG_8xx_CPU6
+	mfspr	r10, SPRN_DAR
+	mtcr	r10
+	li	r11, 0x00f0
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	mfspr	r11, SPRN_SPRG2
+#else
 	lwz	r11, 0(r0)
 	mtcr	r11
 	lwz	r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
 	lwz	r3, 8(r0)
 #endif
+	mfspr	r10, SPRN_M_TW
 	b	InstructionAccess
 
 	. = 0x1200
@@ -392,8 +407,13 @@ DataStoreTLBMiss:
 	DO_8xx_CPU6(0x3f80, r3)
 	mtspr	SPRN_M_TW, r10	/* Save a couple of working registers */
 	mfcr	r10
+#ifdef CONFIG_8xx_CPU6
 	stw	r10, 0(r0)
 	stw	r11, 4(r0)
+#else
+	mtspr	SPRN_DAR, r10
+	mtspr	SPRN_SPRG2, r11
+#endif
 	mfspr	r10, SPRN_M_TWB	/* Get level 1 table entry address */
 
 	/* If we are faulting a kernel address, we have to use the
@@ -461,18 +481,24 @@ DataStoreTLBMiss:
 	 * of the MMU.
 	 */
 2:	li	r11, 0x00f0
-	mtspr	SPRN_DAR,r11	/* Tag DAR */
 	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
 	DO_8xx_CPU6(0x3d80, r3)
 	mtspr	SPRN_MD_RPN, r10	/* Update TLB entry */
 
-	mfspr	r10, SPRN_M_TW	/* Restore registers */
+	/* Restore registers */
+#ifndef CONFIG_8xx_CPU6
+	mfspr	r10, SPRN_DAR
+	mtcr	r10
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	mfspr	r11, SPRN_SPRG2
+#else
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
 	lwz	r11, 0(r0)
 	mtcr	r11
 	lwz	r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
 	lwz	r3, 8(r0)
 #endif
+	mfspr	r10, SPRN_M_TW
 	rfi
 
 /* This is an instruction TLB error on the MPC8xx.  This could be due
@@ -684,9 +710,6 @@ start_here:
 	tophys(r4,r2)
 	addi	r4,r4,THREAD	/* init task's THREAD */
 	mtspr	SPRN_SPRG_THREAD,r4
-	li	r3,0
-	/* XXX What is that for ? SPRG2 appears otherwise unused on 8xx */
-	mtspr	SPRN_SPRG2,r3	/* 0 => r1 has kernel sp */
 
 	/* stack */
 	lis	r1,init_thread_union@ha
-- 
1.6.4.4


* Re: [PATCH 1/4] 8xx: Optimze TLB Miss handlers
  2010-02-26  8:29 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Joakim Tjernlund
  2010-02-26  8:29   ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
@ 2010-02-26 19:50   ` Scott Wood
  2010-02-27 15:23     ` Joakim Tjernlund
  2010-02-26 20:10   ` Kumar Gala
  2 siblings, 1 reply; 30+ messages in thread
From: Scott Wood @ 2010-02-26 19:50 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linuxppc-dev

On Fri, Feb 26, 2010 at 09:29:40AM +0100, Joakim Tjernlund wrote:
> This removes a couple of instructions from the TLB Miss
> handlers without changing functionality.
> ---

Did a quick test of the patchset, seems to work OK (without CONFIG_SWAP or
CONFIG_MODULES).  Didn't try with CONFIG_8xx_CPU6.

-Scott


* Re: [PATCH 1/4] 8xx: Optimze TLB Miss handlers
  2010-02-26  8:29 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Joakim Tjernlund
  2010-02-26  8:29   ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
  2010-02-26 19:50   ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Scott Wood
@ 2010-02-26 20:10   ` Kumar Gala
  2010-02-27 15:25     ` Joakim Tjernlund
  2 siblings, 1 reply; 30+ messages in thread
From: Kumar Gala @ 2010-02-26 20:10 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linuxppc-dev


On Feb 26, 2010, at 2:29 AM, Joakim Tjernlund wrote:

> 	li	r11, 0x00f0
> -	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
> +	rlwimi	r10, r11, 0, 0x07f8	/* Set 24-27, clear 21-23,28 */
> 	DO_8xx_CPU6(0x2d80, r3)
> 	mtspr	SPRN_MI_RPN, r10	/* Update TLB entry */

Cool, didn't know 'as' supported this notation.

- k
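
The mask operand and the MB, ME operand pair name the same set of bits. As a rough illustration only (Python, not part of the patch; the helper names are hypothetical), the sketch below converts between the two forms using PowerPC bit numbering, where bit 0 is the most significant bit of the 32-bit word:

```python
def mask_from_mb_me(mb, me):
    """Build the 32-bit mask selected by MB..ME (PowerPC numbering, bit 0 = MSB)."""
    head = 0xFFFFFFFF >> mb                          # ones from bit mb through bit 31
    tail = (0xFFFFFFFF << (31 - me)) & 0xFFFFFFFF    # ones from bit 0 through bit me
    if mb <= me:
        return head & tail                           # ordinary contiguous run
    return head | tail                               # wrap-around run, e.g. 22, 20


def mb_me_from_mask(mask):
    """Return (MB, ME) for a contiguous, non-wrapping mask."""
    if mask == 0:
        raise ValueError("empty mask")
    mb = 32 - mask.bit_length()                      # position of the highest set bit
    me = 31 - ((mask & -mask).bit_length() - 1)      # position of the lowest set bit
    if mask_from_mb_me(mb, me) != mask:
        raise ValueError("mask is not one contiguous run of bits")
    return mb, me


print(mb_me_from_mask(0x07F8))          # the mask used in the new rlwimi
print(hex(mask_from_mb_me(24, 28)))     # the MB, ME pair used in the old rlwimi
```

With these helpers, 0x07f8 corresponds to bits 21 through 28 and the old 24, 28 pair to 0x00f8, which matches the comments in the hunk.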


* Re: [PATCH 1/4] 8xx: Optimze TLB Miss handlers
  2010-02-26 19:50   ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Scott Wood
@ 2010-02-27 15:23     ` Joakim Tjernlund
  0 siblings, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-27 15:23 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev


Scott Wood <scottwood@freescale.com> wrote on 2010/02/26 20:50:18:
>
> On Fri, Feb 26, 2010 at 09:29:40AM +0100, Joakim Tjernlund wrote:
> > This removes a couple of instructions from the TLB Miss
> > handlers without changing functionality.
> > ---
>
> Did a quick test of the patchset, seems to work OK (without CONFIG_SWAP or
> CONFIG_MODULES).  Didn't try with CONFIG_8xx_CPU6.

Cool, thanks a lot!

Not sure anyone is using 2.6 with CPU6 errata. Seems it was fixed years ago.

Should I resend the whole series with SOB line or just include it here?

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>


* Re: [PATCH 1/4] 8xx: Optimze TLB Miss handlers
  2010-02-26 20:10   ` Kumar Gala
@ 2010-02-27 15:25     ` Joakim Tjernlund
  0 siblings, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-27 15:25 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev

Kumar Gala <galak@kernel.crashing.org> wrote on 2010/02/26 21:10:31:
>
>
> On Feb 26, 2010, at 2:29 AM, Joakim Tjernlund wrote:
>
> >    li   r11, 0x00f0
> > -   rlwimi   r10, r11, 0, 24, 28   /* Set 24-27, clear 28 */
> > +   rlwimi   r10, r11, 0, 0x07f8   /* Set 24-27, clear 21-23,28 */
> >    DO_8xx_CPU6(0x2d80, r3)
> >    mtspr   SPRN_MI_RPN, r10   /* Update TLB entry */
>
> Cool, didn't know 'as' supported this notation.

Yeah, it was Scott who gave me the clue and from what I can tell it
is an official syntax form. I find it much easier to understand.

 Jocke
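
To make the effect of the two hunks in patch 1 concrete, here is a small Python model of rlwinm and rlwimi (an illustrative sketch, not kernel code; the function names are hypothetical). It compares the old and new instruction sequences. Under this model the results agree whenever the bits the new form additionally overwrites are already clear in the PTE; whether that always holds in the kernel is an assumption the thread does not spell out.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rlwinm(rs, sh, mask):
    """Rotate left, then AND with mask."""
    return rotl32(rs, sh) & mask


def rlwimi(ra, rs, sh, mask):
    """Rotate left, then insert under mask into ra."""
    return (rotl32(rs, sh) & mask) | (ra & ~mask & MASK32)


def itlb_old(pte):
    r10 = rlwinm(pte, 0, 0xFFFFFBFF)        # rlwinm r10, r10, 0, 22, 20 (clear 0x400)
    return rlwimi(r10, 0x00F0, 0, 0x00F8)   # set 24-27, clear 28


def itlb_new(pte):
    return rlwimi(pte, 0x00F0, 0, 0x07F8)   # set 24-27, clear 21-23 and 28


def dtlb_user_old(pte):
    r11 = rlwinm(pte, 32 - 2, 0x200)        # (pte & 0x800) >> 2
    return r11 | pte


def dtlb_user_new(pte):
    return rlwimi(pte, pte, 32 - 2, 0x200)  # copy the 0x800 bit into bit 22 (0x200)


for pte in (0x00000001, 0x12345801, 0xFFFFFFFF):
    print(hex(pte), hex(itlb_old(pte) ^ itlb_new(pte)),
          hex(dtlb_user_old(pte) ^ dtlb_user_new(pte)))
```

In this model the ITLB sequences can differ only in bits 22 and 23 (0x300), and the DTLB sequences only in bit 22 (0x200) when it was set beforehand while 0x800 was clear.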


* Re: [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss.
  2010-02-26  8:29   ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
  2010-02-26  8:29     ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Joakim Tjernlund
@ 2010-03-16 21:19     ` Benjamin Herrenschmidt
  2010-03-17  7:35       ` Joakim Tjernlund
  1 sibling, 1 reply; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-16 21:19 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linuxppc-dev

On Fri, 2010-02-26 at 09:29 +0100, Joakim Tjernlund wrote:
> +#ifdef CONFIG_MODULES
> +       /* Only modules will cause ITLB Misses as we always
> +        * pin the first 8MB of kernel memory */
>         andi.   r11, r10, 0x0800        /* Address >= 0x80000000 */
>         beq     3f
>         lis     r11, swapper_pg_dir@h
>         ori     r11, r11, swapper_pg_dir@l
>         rlwimi  r10, r11, 0, 2, 19
>  3:
> +#endif

You can optimize that further I think...

You can probably just remove the code above, and add something to
do_page_fault() that lazily copies the kernel PGD entries from
swapper_pg_dir to the app pgdir. (You can even pre-fill that when
creating a new mm).

Cheers,
Ben.
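
The lazy-copy idea can be pictured with a toy model. The Python below is only a sketch of the concept, not kernel code: page directories are plain dicts, the function name is hypothetical, and PGDIR_SHIFT = 22 is an assumption (4 KB pages with a 1024-entry first-level table). The 0x80000000 boundary is taken from the comment in the quoted hunk.

```python
KERNEL_START = 0x80000000   # from the "Address >= 0x80000000" comment in the patch
PGDIR_SHIFT = 22            # assumption: each first-level entry covers 4 MB


def pgd_index(addr):
    """Index of the first-level entry that covers addr."""
    return (addr & 0xFFFFFFFF) >> PGDIR_SHIFT


def lazy_copy_kernel_pgd(app_pgdir, swapper_pg_dir, addr):
    """Copy one kernel first-level entry into app_pgdir on demand.

    Returns True if an entry was copied (the access can simply be retried),
    False if the fault needs the normal handling.
    """
    if addr < KERNEL_START:
        return False                       # user address: nothing to copy
    idx = pgd_index(addr)
    entry = swapper_pg_dir.get(idx)
    if entry is None or app_pgdir.get(idx) == entry:
        return False                       # no kernel mapping, or already in sync
    app_pgdir[idx] = entry
    return True


swapper = {pgd_index(0xC0001000): 0x12345000}   # made-up kernel entry
app = {}
print(lazy_copy_kernel_pgd(app, swapper, 0xC0001000))   # first fault copies the entry
print(lazy_copy_kernel_pgd(app, swapper, 0xC0001000))   # a retry finds it in place
```

With the kernel entries present in every process page directory, the miss handler would no longer need to switch to swapper_pg_dir for kernel addresses, which is the branch the quoted hunk wraps in CONFIG_MODULES.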


* Re: [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP.
  2010-02-26  8:29     ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Joakim Tjernlund
  2010-02-26  8:29       ` [PATCH 4/4] 8xx: Use SPRG2 and DAR registers to stash r11 and cr Joakim Tjernlund
@ 2010-03-16 21:20       ` Benjamin Herrenschmidt
  2010-03-17  7:40         ` Joakim Tjernlund
  1 sibling, 1 reply; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-16 21:20 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linuxppc-dev

On Fri, 2010-02-26 at 09:29 +0100, Joakim Tjernlund wrote:
> Only the swap function cares about the ACCESSED bit in
> the pte. Do not waste cycles updating ACCESSED when swap
> is not compiled into the kernel.
> ---

Your changeset comment is a bit misleading since the code isn't actually
updating ACCESSED... it's testing if ACCESSED is set and goes to the
higher level fault if not (which might then update ACCESSED).

Cheers,
Ben.
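
The distinction can be shown with a short Python model of the two guarded code paths (a sketch only; the function names are hypothetical). The bit values are assumptions: _PAGE_PRESENT = 0x0001, and _PAGE_ACCESSED = 0x0020 chosen so that it sits five bits above PRESENT, matching the ">> 5" in the handler comment.

```python
_PAGE_PRESENT = 0x0001    # assumed value
_PAGE_ACCESSED = 0x0020   # assumed value: PRESENT << 5, per the ">> 5" in the comment


def itlb_takes_slow_path(pte):
    """Model of the andi./cmpwi/bne- test: branch to 2f unless both bits are set."""
    both = _PAGE_ACCESSED | _PAGE_PRESENT
    return (pte & both) != both


def dtlb_effective_pte(pte):
    """Model of the rlwinm/and/rlwimi sequence in DataStoreTLBMiss.

    PRESENT survives in the value loaded into the TLB only if ACCESSED is
    also set. The PTE in memory is not written by this sequence.
    """
    r11 = (pte & _PAGE_PRESENT) & ((pte & _PAGE_ACCESSED) >> 5)
    return (pte & ~_PAGE_PRESENT) | r11


for pte in (0x00, 0x01, 0x20, 0x21):
    print(hex(pte), itlb_takes_slow_path(pte), hex(dtlb_effective_pte(pte)))
```

Neither function stores to the PTE. A page that is present but not yet accessed is pushed to the higher-level fault path, which is where ACCESSED may get set.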

>  arch/powerpc/kernel/head_8xx.S |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> index 84ca1d9..6478a96 100644
> --- a/arch/powerpc/kernel/head_8xx.S
> +++ b/arch/powerpc/kernel/head_8xx.S
> @@ -343,10 +343,11 @@ InstructionTLBMiss:
>  	mfspr	r11, SPRN_MD_TWC	/* ....and get the pte address */
>  	lwz	r10, 0(r11)	/* Get the pte */
>  
> +#ifdef CONFIG_SWAP
>  	andi.	r11, r10, _PAGE_ACCESSED | _PAGE_PRESENT
>  	cmpwi	cr0, r11, _PAGE_ACCESSED | _PAGE_PRESENT
>  	bne-	cr0, 2f
> -
> +#endif
>  	/* The Linux PTE won't go exactly into the MMU TLB.
>  	 * Software indicator bits 21 and 28 must be clear.
>  	 * Software indicator bits 24, 25, 26, and 27 must be
> @@ -439,10 +440,11 @@ DataStoreTLBMiss:
>  	 * r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
>  	 * r10 = (r10 & ~PRESENT) | r11;
>  	 */
> +#ifdef CONFIG_SWAP
>  	rlwinm	r11, r10, 32-5, _PAGE_PRESENT
>  	and	r11, r11, r10
>  	rlwimi	r10, r11, 0, _PAGE_PRESENT
> -
> +#endif
>  	/* Honour kernel RO, User NA */
>  	/* 0x200 == Extended encoding, bit 22 */
>  	rlwimi	r10, r10, 32-2, 0x200 /* Copy USER to bit 22, 0x200 */


* Re: [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss.
  2010-03-16 21:19     ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Benjamin Herrenschmidt
@ 2010-03-17  7:35       ` Joakim Tjernlund
  0 siblings, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-17  7:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 2010/03/16 22:19:36:
>
> On Fri, 2010-02-26 at 09:29 +0100, Joakim Tjernlund wrote:
> > +#ifdef CONFIG_MODULES
> > +       /* Only modules will cause ITLB Misses as we always
> > +        * pin the first 8MB of kernel memory */
> >         andi.   r11, r10, 0x0800        /* Address >= 0x80000000 */
> >         beq     3f
> >         lis     r11, swapper_pg_dir@h
> >         ori     r11, r11, swapper_pg_dir@l
> >         rlwimi  r10, r11, 0, 2, 19
> >  3:
> > +#endif
>
> You can optimize that further I think...
>
> You can probably just remove the code above, and add something to
> do_page_fault() that lazily copies the kernel PGD entries from
> swapper_pg_dir to the app pgdir. (You can even pre-fill that when
> creating a new mm).

I did look at this at some point and could not figure out how
to do this.


* Re: [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP.
  2010-03-16 21:20       ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Benjamin Herrenschmidt
@ 2010-03-17  7:40         ` Joakim Tjernlund
  0 siblings, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-17  7:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 2010/03/16 22:20:52:
>
> On Fri, 2010-02-26 at 09:29 +0100, Joakim Tjernlund wrote:
> > Only the swap function cares about the ACCESSED bit in
> > the pte. Do not waste cycles updating ACCESSED when swap
> > is not compiled into the kernel.
> > ---
>
> Your changeset comment is a bit misleading since the code isn't actually
> updating ACCESSED... it's testing if ACCESSED is set and goes to the
> higher level fault if not (which might then update ACCESSED).

Right, I did have one or two variants that did update ACCESSED that
I experimented with; I guess I got a bit confused by that.

The jury is still out on whether this patch is an improvement or not.

    Jocke


* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-08 10:42                       ` Joakim Tjernlund
@ 2010-03-09  6:30                         ` Wolfgang Denk
  0 siblings, 0 replies; 30+ messages in thread
From: Wolfgang Denk @ 2010-03-09  6:30 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, hs, linuxppc-dev

Dear Joakim Tjernlund,

In message <OF1413A940.58E7B20E-ONC12576E0.003A9000-C12576E0.003ACFB7@transmode.se> you wrote:
>
> > I use NFS.
> 
> Then I think it is possible NFS gets in the way of stable measurements. Does anyone
> have experience with running lmbench on NFS?

NFS may have some influence here, but I doubt it is the primary cause
for these variations. The network where Heiko is running these tests
is mostly idle, so it should provide fairly constant conditions. Of
course, the use of the network on the MPC8xx itself will add to the
variation, but again I would not expect such big differences.

Heiko - there is a 10 GB disk attached to the "tqm8xx" system; I
think there should be a usable root file system on it, but I cannot
remember the actual state. Maybe we can use that. Please contact me
on jabber this afternoon!

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Living on Earth may be expensive, but it includes an annual free trip
around the Sun.


* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-08  9:06                     ` Heiko Schocher
@ 2010-03-08 10:42                       ` Joakim Tjernlund
  2010-03-09  6:30                         ` Wolfgang Denk
  0 siblings, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-08 10:42 UTC (permalink / raw)
  To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Heiko Schocher <hs@denx.de> wrote on 2010/03/08 10:06:39:
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> > Heiko Schocher <hs@denx.de> wrote on 2010/03/08 08:46:29:
> >> Hello Joakim,
> >>
> >> Joakim Tjernlund wrote:
> >> [...]
> >>> What would be interesting is to skip patch 3, turn off
> >>> MODULES, add PIN_TLB, and compare that against your unpatched .33 but
> >>> with MODULES off and PIN_TLB on
> >> run     version
> >>
> >> 1-4   Linux2.6.33-rc without module support and PIN_TLB=on
> >> 5-8   Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
> >>
> >>                  L M B E N C H  3 . 0   S U M M A R Y
> >>                  ------------------------------------
> >>        (Alpha software, do not distribute)
> >
> > Hmm, these results vary a lot. The only stable result I can see is:
> >
> >> Memory latencies in nanoseconds - smaller is better
> >>     (WARNING - may not be correct, check graphs)
> >> ------------------------------------------------------------------------------
> >> Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
> >> --------- -------------   ---   ----   ----    --------    --------    -------
> >> tqm8xx    Linux 2.6.33-    66   31.7  183.2       184.0      1163.0    No L2 cache?
> >> tqm8xx    Linux 2.6.33-    66   31.7  183.2       184.0      1164.8    No L2 cache?
> >> tqm8xx    Linux 2.6.33-    66   31.7  183.2       184.0      1163.2    No L2 cache?
> >> tqm8xx    Linux 2.6.33-    66   31.7  183.2       183.8      1163.7    No L2 cache?
> >> tqm8xx    Linux 2.6.33-    66   31.8  172.4       173.2      1147.3    No L2 cache?
> >> tqm8xx    Linux 2.6.33-    66   31.8  172.5       173.2      1148.3    No L2 cache?
> >> tqm8xx    Linux 2.6.33-    66   31.8  172.5       173.1      1146.9    No L2 cache?
> >> tqm8xx    Linux 2.6.33-    66   31.8  172.5       173.2      1147.3    No L2 cache?
> >
> > I don't see why the other results vary so much. Are you using NFS or having much network
> > traffic?
>
> I use NFS.

Then I think it is possible NFS gets in the way of stable measurements. Does anyone
have experience with running lmbench on NFS?

        Jocke


* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-08  8:44                   ` Joakim Tjernlund
@ 2010-03-08  9:06                     ` Heiko Schocher
  2010-03-08 10:42                       ` Joakim Tjernlund
  0 siblings, 1 reply; 30+ messages in thread
From: Heiko Schocher @ 2010-03-08  9:06 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Hello Joakim,

Joakim Tjernlund wrote:
> Heiko Schocher <hs@denx.de> wrote on 2010/03/08 08:46:29:
>> Hello Joakim,
>>
>> Joakim Tjernlund wrote:
>> [...]
>>> What would be interesting is to skip patch 3, turn off
>>> MODULES, add PIN_TLB, and compare that against your unpatched .33 but
>>> with MODULES off and PIN_TLB on
>> run     version
>>
>> 1-4   Linux2.6.33-rc without module support and PIN_TLB=on
>> 5-8   Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
>>
>>                  L M B E N C H  3 . 0   S U M M A R Y
>>                  ------------------------------------
>>        (Alpha software, do not distribute)
> 
> Hmm, these results vary a lot. The only stable result I can see is:
> 
>> Memory latencies in nanoseconds - smaller is better
>>     (WARNING - may not be correct, check graphs)
>> ------------------------------------------------------------------------------
>> Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
>> --------- -------------   ---   ----   ----    --------    --------    -------
>> tqm8xx    Linux 2.6.33-    66   31.7  183.2       184.0      1163.0    No L2 cache?
>> tqm8xx    Linux 2.6.33-    66   31.7  183.2       184.0      1164.8    No L2 cache?
>> tqm8xx    Linux 2.6.33-    66   31.7  183.2       184.0      1163.2    No L2 cache?
>> tqm8xx    Linux 2.6.33-    66   31.7  183.2       183.8      1163.7    No L2 cache?
>> tqm8xx    Linux 2.6.33-    66   31.8  172.4       173.2      1147.3    No L2 cache?
>> tqm8xx    Linux 2.6.33-    66   31.8  172.5       173.2      1148.3    No L2 cache?
>> tqm8xx    Linux 2.6.33-    66   31.8  172.5       173.1      1146.9    No L2 cache?
>> tqm8xx    Linux 2.6.33-    66   31.8  172.5       173.2      1147.3    No L2 cache?
> 
> I don't see why the other results vary so much. Are you using NFS or having much network
> traffic?

I use NFS.

bye
Heiko
-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-08  7:46                 ` Heiko Schocher
@ 2010-03-08  8:44                   ` Joakim Tjernlund
  2010-03-08  9:06                     ` Heiko Schocher
  0 siblings, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-08  8:44 UTC (permalink / raw)
  To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Heiko Schocher <hs@denx.de> wrote on 2010/03/08 08:46:29:
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> [...]
> > What would be interesting is to skip patch 3, turn off
> > MODULES, add PIN_TLB, and compare that against your unpatched .33 but
> > with MODULES off and PIN_TLB on
>
> run     version
>
> 1-4   Linux2.6.33-rc without module support and PIN_TLB=on
> 5-8   Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
>
>                  L M B E N C H  3 . 0   S U M M A R Y
>                  ------------------------------------
>        (Alpha software, do not distribute)

Hmm, these results vary a lot. The only stable result I can see is:

> Memory latencies in nanoseconds - smaller is better
>     (WARNING - may not be correct, check graphs)
> ------------------------------------------------------------------------------
> Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
> --------- -------------   ---   ----   ----    --------    --------    -------
> tqm8xx    Linux 2.6.33-    66   31.7  183.2       184.0      1163.0    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.7  183.2       184.0      1164.8    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.7  183.2       184.0      1163.2    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.7  183.2       183.8      1163.7    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  172.4       173.2      1147.3    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  172.5       173.2      1148.3    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  172.5       173.1      1146.9    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  172.5       173.2      1147.3    No L2 cache?

I don't see why the other results vary so much. Are you using NFS or having much network
traffic?

      Jocke
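
To put a number on the memory latency table quoted above, the following Python sketch computes the relative change between the two groups of runs. It assumes the rows are listed in run order, so that the first four rows are runs 1-4 (unpatched) and the last four are runs 5-8 (patches 1, 2 and 4). The figures are copied from the table; nothing else is measured here.

```python
from statistics import mean

# Latencies in nanoseconds, copied from the quoted lmbench table.
before = {  # runs 1-4: unpatched
    "L1":       [31.7, 31.7, 31.7, 31.7],
    "L2":       [183.2, 183.2, 183.2, 183.2],
    "Main mem": [184.0, 184.0, 184.0, 183.8],
    "Rand mem": [1163.0, 1164.8, 1163.2, 1163.7],
}
after = {   # runs 5-8: patches 1, 2 and 4 applied
    "L1":       [31.8, 31.8, 31.8, 31.8],
    "L2":       [172.4, 172.5, 172.5, 172.5],
    "Main mem": [173.2, 173.2, 173.1, 173.2],
    "Rand mem": [1147.3, 1148.3, 1146.9, 1147.3],
}


def percent_change(old, new):
    """Relative change of the mean, in percent (negative means faster)."""
    return (mean(new) - mean(old)) / mean(old) * 100.0


changes = {name: percent_change(before[name], after[name]) for name in before}
for name, pct in changes.items():
    print(f"{name:9s} {pct:+.2f}%")
```

On these figures the L2 and main memory columns drop by a little under 6%, random memory by about 1.4%, and the L1 column is essentially unchanged.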


* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-05 10:40               ` Joakim Tjernlund
@ 2010-03-08  7:46                 ` Heiko Schocher
  2010-03-08  8:44                   ` Joakim Tjernlund
  0 siblings, 1 reply; 30+ messages in thread
From: Heiko Schocher @ 2010-03-08  7:46 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Hello Joakim,

Joakim Tjernlund wrote:
[...]
> What would be interesting is to skip patch 3, turn off
> MODULES, add PIN_TLB, and compare that against your unpatched .33 but
> with MODULES off and PIN_TLB on

run     version

1-4	Linux2.6.33-rc without module support and PIN_TLB=on
5-8	Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4

                 L M B E N C H  3 . 0   S U M M A R Y
                 ------------------------------------
		 (Alpha software, do not distribute)

Basic system parameters
------------------------------------------------------------------------------
Host                 OS Description              Mhz  tlb  cache  mem   scal
                                                     pages line   par   load
                                                           bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0100    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0300    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0100    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0100    1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
tqm8xx    Linux 2.6.33-   66 2.97 8.91 127. 1238 270. 22.3 92.1 6386 27.K 83.K
tqm8xx    Linux 2.6.33-   66 3.05 8.99 129. 1208 261. 22.3 85.3 6418 27.K 83.K
tqm8xx    Linux 2.6.33-   66 3.05 8.81 128. 1205 270. 22.3 87.3 6342 27.K 82.K
tqm8xx    Linux 2.6.33-   66 3.05 8.82 132. 1215 270. 23.1 86.7 6357 27.K 82.K
tqm8xx    Linux 2.6.33-   66 3.28 9.29 128. 1257 260. 23.9 83.7 6511 28.K 84.K
tqm8xx    Linux 2.6.33-   66 3.34 9.35 126. 1264 271. 23.1 86.6 6437 27.K 84.K
tqm8xx    Linux 2.6.33-   66 3.19 8.97 130. 1212 271. 23.1 95.3 6480 27.K 84.K
tqm8xx    Linux 2.6.33-   66 3.28 8.76 127. 1229 269. 22.9 90.9 6293 27.K 82.K

Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host                 OS  intgr intgr  intgr  intgr  intgr
                          bit   add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx    Linux 2.6.33-   15.2   17.9 1.2500  124.1  202.4
tqm8xx    Linux 2.6.33-   15.6   18.0 1.1900  124.1  196.4
tqm8xx    Linux 2.6.33-   15.2   17.9 1.2400  124.9  202.5
tqm8xx    Linux 2.6.33-   15.2   17.9 1.2400  124.2  196.8
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  124.2  203.6
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  124.2  202.1
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5700  125.0  202.2
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  121.1  196.4

Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS int64  int64  int64  int64  int64
                         bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx    Linux 2.6.33-    15.          12.9 1944.1 1895.2
tqm8xx    Linux 2.6.33-    15.          12.9 1886.3 1894.4
tqm8xx    Linux 2.6.33-    15.          12.9 1944.1 1895.2
tqm8xx    Linux 2.6.33-    15.          12.9 1886.3 1894.8
tqm8xx    Linux 2.6.33-    15.          13.2 1944.1 1894.4
tqm8xx    Linux 2.6.33-    15.          13.2 1944.8 1896.3
tqm8xx    Linux 2.6.33-    15.          13.2 1945.2 1837.4
tqm8xx    Linux 2.6.33-    15.          13.2 1957.8 1907.4

Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host                 OS  float  float  float  float
                         add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
tqm8xx    Linux 2.6.33- 1011.0 1620.2 5467.0 9868.0
tqm8xx    Linux 2.6.33- 1004.5 1630.1 5468.0 9852.0
tqm8xx    Linux 2.6.33- 1012.2 1620.5 5472.0 9855.0
tqm8xx    Linux 2.6.33- 1011.0 1620.2 5469.0 9866.0
tqm8xx    Linux 2.6.33- 1004.8 1617.3 5503.0 9856.0
tqm8xx    Linux 2.6.33- 1004.9 1577.1 5469.0 9859.0
tqm8xx    Linux 2.6.33- 1011.4 1618.5 5470.0 9859.0
tqm8xx    Linux 2.6.33- 1004.9 1620.5 5471.0 9904.0

Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS  double double double double
                         add    mul    div    bogo
--------- ------------- ------  ------ ------ ------
tqm8xx    Linux 2.6.33- 1555.5 2789.5 3725.7  12.8K
tqm8xx    Linux 2.6.33- 1513.2 2772.0 3720.0  12.7K
tqm8xx    Linux 2.6.33- 1555.8 2772.1 3730.0  12.7K
tqm8xx    Linux 2.6.33- 1555.5 2699.0 3725.0  12.7K
tqm8xx    Linux 2.6.33- 1513.8 2699.5 3610.7  12.7K
tqm8xx    Linux 2.6.33- 1566.7 2771.6 3750.0  12.7K
tqm8xx    Linux 2.6.33- 1556.7 2789.2 3612.1  12.6K
tqm8xx    Linux 2.6.33- 1556.7 2698.5 3749.3  12.6K

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
tqm8xx    Linux 2.6.33-   64.4   74.9  130.2  111.1  180.4   123.2   211.1
tqm8xx    Linux 2.6.33-   67.4   81.0  125.0  117.0  183.7   127.7   208.4
tqm8xx    Linux 2.6.33-   67.5   80.5   92.7  115.3  156.9   128.0   183.8
tqm8xx    Linux 2.6.33-   67.0   80.2   90.5  114.6  159.4   126.8   185.8
tqm8xx    Linux 2.6.33-   82.0   87.8   88.0  116.1  149.3   125.5   182.2
tqm8xx    Linux 2.6.33-   81.7   98.5   97.6  123.8  158.1   135.3   188.0
tqm8xx    Linux 2.6.33-   67.9   87.7   90.7  114.9  151.1   127.3   177.9
tqm8xx    Linux 2.6.33-   67.5   80.3   84.6  113.6  145.7   124.8   170.9

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
tqm8xx    Linux 2.6.33-  64.4 254.3 455. 648.0       941.8       2505
tqm8xx    Linux 2.6.33-  67.4 261.2 456. 645.8       909.1       2439
tqm8xx    Linux 2.6.33-  67.5 264.8 459. 638.5       932.0       2447
tqm8xx    Linux 2.6.33-  67.0 262.4 454. 643.9       909.9       2442
tqm8xx    Linux 2.6.33-  82.0 302.1 500. 651.4       937.2       2504
tqm8xx    Linux 2.6.33-  81.7 300.2 510. 643.2       909.7       2490
tqm8xx    Linux 2.6.33-  67.9 266.7 498. 645.5       923.4       2442
tqm8xx    Linux 2.6.33-  67.5 260.8 444. 640.3       917.7       2440

*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS   UDP  RPC/  TCP   RPC/ TCP
                               UDP         TCP  conn
--------- ------------- ----- ----- ----- ----- ----
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
tqm8xx    Linux 2.6.33- 6097.6 3731.3  30.3K 4000.0  4026.0  20.5    31.9 131.9
tqm8xx    Linux 2.6.33- 5747.1 3623.2  32.3K 3952.6  4030.0  16.6    31.0 132.7
tqm8xx    Linux 2.6.33- 5405.4 3610.1  32.3K 3921.6  4004.0  15.5    30.0 131.9
tqm8xx    Linux 2.6.33- 5681.8 3891.1  35.7K 4219.4  3966.0 6.038    30.4 128.7
tqm8xx    Linux 2.6.33-  12.7K 3649.6  34.5K 7092.2  4066.0 3.604    31.4 133.6
tqm8xx    Linux 2.6.33- 5405.4 4032.3  38.5K 5494.5  4036.0  18.1    31.0 128.6
tqm8xx    Linux 2.6.33- 5405.4 3610.1  37.0K 7142.9  4078.0  15.4    31.0 133.2
tqm8xx    Linux 2.6.33- 5714.3 3623.2  30.3K 7194.2  4054.0  12.7    29.9 133.0

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
tqm8xx    Linux 2.6.33- 14.9 16.1 13.0   21.4   55.6   32.4   34.5 55.7  53.0
tqm8xx    Linux 2.6.33- 14.9 16.2 12.9   21.3   55.5   32.4   34.5 55.7  53.0
tqm8xx    Linux 2.6.33- 14.8 16.0 13.0   21.4   55.6   32.4   34.5 55.7  53.0
tqm8xx    Linux 2.6.33- 15.0 16.2 13.8   21.3   55.6   32.4   34.5 55.7  53.0
tqm8xx    Linux 2.6.33- 14.9 16.0 13.4   21.3   55.7   32.5   34.6 55.8  53.2
tqm8xx    Linux 2.6.33- 15.1 16.2 13.6   21.3   55.7   32.5   34.6 55.8  53.2
tqm8xx    Linux 2.6.33- 15.0 16.2 12.9   21.3   55.7   32.5   34.6 55.8  53.2
tqm8xx    Linux 2.6.33- 15.1 16.2 13.1   21.5   55.7   32.5   34.7 55.8  53.2

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
--------- -------------   ---   ----   ----    --------    --------    -------
tqm8xx    Linux 2.6.33-    66   31.7  183.2       184.0      1163.0    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.7  183.2       184.0      1164.8    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.7  183.2       184.0      1163.2    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.7  183.2       183.8      1163.7    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  172.4       173.2      1147.3    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  172.5       173.2      1148.3    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  172.5       173.1      1146.9    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  172.5       173.2      1147.3    No L2 cache?

make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results'
-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-04 16:30             ` Heiko Schocher
  2010-03-05 10:40               ` Joakim Tjernlund
@ 2010-03-07 16:03               ` Joakim Tjernlund
  1 sibling, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-07 16:03 UTC (permalink / raw)
  To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Heiko Schocher <hs@denx.de> wrote on 2010/03/04 17:30:07:

> From: Heiko Schocher <hs@denx.de>
> To: Joakim Tjernlund <joakim.tjernlund@transmode.se>
> Cc: Wolfgang Denk <wd@denx.de>, Klaus-Jürgen <heydeck@kieback-peter.de>,
> linuxppc-dev@ozlabs.org, Scott Wood <scottwood@freescale.com>
> Date: 2010/03/04 17:30
> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> > Wolfgang Denk <wd@denx.de> wrote on 2010/03/04 13:16:56:
> >> From: Wolfgang Denk <wd@denx.de>
> >> To: hs@denx.de
> >> Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>, Klaus-Jürgen
> >> <heydeck@kieback-peter.de>, linuxppc-dev@ozlabs.org, Scott Wood
> >> <scottwood@freescale.com>
> >> Date: 2010/03/04 13:17
> >> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
> >>
> >> Dear Heiko,
> >>
> >> thanks for running the tests.
> >>
> >> In message <4B8F8BB4.6070201@denx.de> you wrote:
> >>> here the results:
> >>>
> >>> run   version
> >>>
> >>> 1-4   2.6.33-rc6 without your patches
> >>> 5-8   2.6.33-rc6 with all your patches
> >>> 9-12   2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED
> >> when no SWAP)
> >>> 13-16   2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
> >> So CONFIG_PIN_TLB improves the performance as expected, while the other
> >> patches don't show any measurable improvement - or am I reading the
> >> results incorrectly?

BTW, I have implemented all of the newer 2.6 TLB/MMU fixes (including the dcbX
fixup) for 2.4 as well. If there is any interest, I can polish them and submit
them for 2.4. I do need an external tester for that, though.

 Jocke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-04 16:30             ` Heiko Schocher
@ 2010-03-05 10:40               ` Joakim Tjernlund
  2010-03-08  7:46                 ` Heiko Schocher
  2010-03-07 16:03               ` Joakim Tjernlund
  1 sibling, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-05 10:40 UTC (permalink / raw)
  To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Heiko Schocher <hs@denx.de> wrote on 2010/03/04 17:30:07:
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> > Wolfgang Denk <wd@denx.de> wrote on 2010/03/04 13:16:56:
> >> From: Wolfgang Denk <wd@denx.de>
> >> To: hs@denx.de
> >> Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>, Klaus-Jürgen
> >> <heydeck@kieback-peter.de>, linuxppc-dev@ozlabs.org, Scott Wood
> >> <scottwood@freescale.com>
> >> Date: 2010/03/04 13:17
> >> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
> >>
> >> Dear Heiko,
> >>
> >> thanks for running the tests.
> >>
> >> In message <4B8F8BB4.6070201@denx.de> you wrote:
> >>> here the results:
> >>>
> >>> run   version
> >>>
> >>> 1-4   2.6.33-rc6 without your patches
> >>> 5-8   2.6.33-rc6 with all your patches
> >>> 9-12   2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED
> >> when no SWAP)
> >>> 13-16   2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
> >> So CONFIG_PIN_TLB improves the performance as expected, while the other
> >> patches don't show any measurable improvement - or am I reading the
> >> results incorrectly?
> >
> > Close but not quite. What stands out most is:
> >
> > Memory latencies in nanoseconds - smaller is better
> >     (WARNING - may not be correct, check graphs)
> > ------------------------------------------------------------------------------
> > Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
> > --------- -------------   ---   ----   ----    --------    --------    -------
> > tqm8xx    Linux 2.6.33-    66   31.8  141.0       184.0      1165.7
> > tqm8xx    Linux 2.6.33-    66   31.8  141.2       184.2      1165.3
> > tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.3      1165.6
> > tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.2      1166.2
> >
> > tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1100.5    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1102.5    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.7    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.6    No L2 cache?
> >
> > tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.1    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.0    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.7    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.2    No L2 cache?
> >
> > tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.7      1099.8    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.6      1100.5    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.7  171.0       171.7      1101.0    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.8  171.0       171.6      1101.3    No L2 cache?
> >
> >
> > Besides the numbers, note how the first group doesn't have a Guesses entry.
> > Is there something odd with the results for the first group?
>
> Hmm.. just to be safe, I made this test again, but it shows also no entry in
> "Guesses" ... Hardware, Linux Source, rootFS, lmbench sources, all the
> same ...

OK

>
> > Also, since you are using MODULES, patch 2 is nullified.
> > Patch 1 is very minor and should not show I think.
> > This leaves patches 3 & 4.
> > There appears to be something funny with patch 3, Don't touch ACCESSED when no SWAP, as
> > it yields bad numbers for Prot Fault, so perhaps I am missing something that needs ACCESSED
> > even if NO_SWAP. Perhaps someone who knows MM in Linux knows?
> > Are there any messages in the kernel log (dmesg)?
>
> I couldn't find anything in the output with dmesg ... but if you
> want this output, I can send it to you.

No, if you can't find anything in there, I won't either.

What would be interesting is to skip patch 3, turn off MODULES, add PIN_TLB,
and compare that against your unpatched .33, also with MODULES off and
PIN_TLB on.

 Jocke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-04 13:06           ` Joakim Tjernlund
@ 2010-03-04 16:30             ` Heiko Schocher
  2010-03-05 10:40               ` Joakim Tjernlund
  2010-03-07 16:03               ` Joakim Tjernlund
  0 siblings, 2 replies; 30+ messages in thread
From: Heiko Schocher @ 2010-03-04 16:30 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Hello Joakim,

Joakim Tjernlund wrote:
> Wolfgang Denk <wd@denx.de> wrote on 2010/03/04 13:16:56:
>> From: Wolfgang Denk <wd@denx.de>
>> To: hs@denx.de
>> Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>, Klaus-Jürgen
>> <heydeck@kieback-peter.de>, linuxppc-dev@ozlabs.org, Scott Wood
>> <scottwood@freescale.com>
>> Date: 2010/03/04 13:17
>> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>>
>> Dear Heiko,
>>
>> thanks for running the tests.
>>
>> In message <4B8F8BB4.6070201@denx.de> you wrote:
>>> here the results:
>>>
>>> run   version
>>>
>>> 1-4   2.6.33-rc6 without your patches
>>> 5-8   2.6.33-rc6 with all your patches
>>> 9-12   2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED
>> when no SWAP)
>>> 13-16   2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
>> So CONFIG_PIN_TLB improves the performance as expected, while the other
>> patches don't show any measurable improvement - or am I reading the
>> results incorrectly?
> 
> Close but not quite. What stands out most is:
> 
> Memory latencies in nanoseconds - smaller is better
>     (WARNING - may not be correct, check graphs)
> ------------------------------------------------------------------------------
> Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
> --------- -------------   ---   ----   ----    --------    --------    -------
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       184.0      1165.7
> tqm8xx    Linux 2.6.33-    66   31.8  141.2       184.2      1165.3
> tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.3      1165.6
> tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.2      1166.2
> 
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1100.5    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1102.5    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.7    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.6    No L2 cache?
> 
> tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.1    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.0    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.7    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.2    No L2 cache?
> 
> tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.7      1099.8    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.6      1100.5    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.7  171.0       171.7      1101.0    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  171.0       171.6      1101.3    No L2 cache?
> 
> 
> Besides the numbers, note how the first group doesn't have a Guesses entry.
> Is there something odd with the results for the first group?

Hmm.. just to be safe, I made this test again, but it shows also no entry in
"Guesses" ... Hardware, Linux Source, rootFS, lmbench sources, all the
same ...

> Also, since you are using MODULES, patch 2 is nullified.
> Patch 1 is very minor and should not show I think.
> This leaves patches 3 & 4.
> There appears to be something funny with patch 3, Don't touch ACCESSED when no SWAP, as
> it yields bad numbers for Prot Fault, so perhaps I am missing something that needs ACCESSED
> even if NO_SWAP. Perhaps someone who knows MM in Linux knows?
> Are there any messages in the kernel log (dmesg)?

I couldn't find anything in the output with dmesg ... but if you
want this output, I can send it to you.

bye
Heiko
-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-04 12:16         ` Wolfgang Denk
@ 2010-03-04 13:06           ` Joakim Tjernlund
  2010-03-04 16:30             ` Heiko Schocher
  0 siblings, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-04 13:06 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Scott Wood, linuxppc-dev, hs

Wolfgang Denk <wd@denx.de> wrote on 2010/03/04 13:16:56:

> From: Wolfgang Denk <wd@denx.de>
> To: hs@denx.de
> Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>, Klaus-Jürgen
> <heydeck@kieback-peter.de>, linuxppc-dev@ozlabs.org, Scott Wood
> <scottwood@freescale.com>
> Date: 2010/03/04 13:17
> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>
> Dear Heiko,
>
> thanks for running the tests.
>
> In message <4B8F8BB4.6070201@denx.de> you wrote:
> >
> > here the results:
> >
> > run   version
> >
> > 1-4   2.6.33-rc6 without your patches
> > 5-8   2.6.33-rc6 with all your patches
> > 9-12   2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED
> when no SWAP)
> > 13-16   2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
>
> So CONFIG_PIN_TLB improves the performance as expected, while the other
> patches don't show any measurable improvement - or am I reading the
> results incorrectly?

Close but not quite. What stands out most is:

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
--------- -------------   ---   ----   ----    --------    --------    -------
tqm8xx    Linux 2.6.33-    66   31.8  141.0       184.0      1165.7
tqm8xx    Linux 2.6.33-    66   31.8  141.2       184.2      1165.3
tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.3      1165.6
tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.2      1166.2

tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1100.5    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1102.5    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.7    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.6    No L2 cache?

tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.1    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.0    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.7    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.2    No L2 cache?

tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.7      1099.8    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.6      1100.5    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.7  171.0       171.7      1101.0    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  171.0       171.6      1101.3    No L2 cache?


Besides the numbers, note how the first group doesn't have a Guesses entry.
Is there something odd with the results for the first group?

Also, since you are using MODULES, patch 2 is nullified.
Patch 1 is very minor and should not show I think.
This leaves patches 3 & 4.
There appears to be something funny with patch 3, Don't touch ACCESSED when no SWAP, as
it yields bad numbers for Prot Fault, so perhaps I am missing something that needs ACCESSED
even if NO_SWAP. Perhaps someone who knows MM in Linux knows?
Are there any messages in the kernel log (dmesg)?
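
To put numbers on the memory latency comparison above, here is a minimal
Python sketch that averages the "Main mem" and "Rand mem" columns for each
group of four runs and reports the change against the unpatched group. The
values are copied from the table in this message and the group labels follow
Heiko's run list; the names GROUPS and summarize are illustrative only and
not part of lmbench.

```python
from statistics import mean

# (main mem ns, rand mem ns) per run, copied from the lmbench table above.
# Group labels follow Heiko's run list; the first group is the baseline.
GROUPS = {
    "1-4 unpatched": [(184.0, 1165.7), (184.2, 1165.3),
                      (184.3, 1165.6), (184.2, 1166.2)],
    "5-8 all patches": [(171.8, 1100.5), (171.8, 1102.5),
                        (171.8, 1101.7), (171.8, 1101.6)],
    "9-12 patches 1,2,4": [(173.4, 1149.1), (173.4, 1149.0),
                           (173.4, 1148.7), (173.4, 1148.2)],
    "13-16 all + PIN_TLB": [(171.7, 1099.8), (171.6, 1100.5),
                            (171.7, 1101.0), (171.6, 1101.3)],
}


def summarize(groups):
    """Return {label: (mean main, mean rand, % change main, % change rand)}.

    Percent changes are relative to the first group in the mapping.
    """
    labels = list(groups)
    base_main = mean(m for m, _ in groups[labels[0]])
    base_rand = mean(r for _, r in groups[labels[0]])
    result = {}
    for label in labels:
        main = mean(m for m, _ in groups[label])
        rand = mean(r for _, r in groups[label])
        result[label] = (
            main,
            rand,
            100.0 * (main - base_main) / base_main,
            100.0 * (rand - base_rand) / base_rand,
        )
    return result


if __name__ == "__main__":
    for label, (main, rand, d_main, d_rand) in summarize(GROUPS).items():
        print(f"{label:22s} main {main:6.1f} ns ({d_main:+5.1f}%)  "
              f"rand {rand:7.1f} ns ({d_rand:+5.1f}%)")
```

On these figures every patched configuration shows roughly 6 to 7% lower
main memory latency than the unpatched runs. Random memory latency drops by
about 5.5% with all patches but only about 1.5% when patch 3 is left out.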

 Jocke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-04 10:30       ` Heiko Schocher
@ 2010-03-04 12:16         ` Wolfgang Denk
  2010-03-04 13:06           ` Joakim Tjernlund
  0 siblings, 1 reply; 30+ messages in thread
From: Wolfgang Denk @ 2010-03-04 12:16 UTC (permalink / raw)
  To: hs; +Cc: Scott Wood, linuxppc-dev

Dear Heiko,

thanks for running the tests.

In message <4B8F8BB4.6070201@denx.de> you wrote:
> 
> here the results:
> 
> run	version
> 
> 1-4	2.6.33-rc6 without your patches
> 5-8	2.6.33-rc6 with all your patches
> 9-12	2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
> 13-16	2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y

So CONFIG_PIN_TLB improves the performance as expected, while the other
patches don't show any measurable improvement - or am I reading the
results incorrectly?


Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
And now remains  That we find out the cause of this effect, Or rather
say, the cause of this defect...           -- Hamlet, Act II, Scene 2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-03 10:38     ` Joakim Tjernlund
@ 2010-03-04 10:30       ` Heiko Schocher
  2010-03-04 12:16         ` Wolfgang Denk
  0 siblings, 1 reply; 30+ messages in thread
From: Heiko Schocher @ 2010-03-04 10:30 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Hello Joakim,

Joakim Tjernlund wrote:
> Could you try reverting patch:
>   8xx: Don't touch ACCESSED when no SWAP.
> and see if that makes a difference?
[...]
> Turning on pinned TLBs(you must turn on ADVANCED_OPTIONS first) could be an improvement,
> regardless of my patches.

here the results:

run	version

1-4	2.6.33-rc6 without your patches
5-8	2.6.33-rc6 with all your patches
9-12	2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
13-16	2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
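
As a reading aid for the 16-row tables that follow, here is a small Python
sketch that maps a run number to its configuration and averages one column
per group of four. It assumes the rows of each table appear in run order
1 to 16, which the run list suggests but does not state outright. The helper
names are hypothetical, and the sample column is "fork proc" from the
"Processor, Processes" table.

```python
from statistics import mean

# Run ranges and descriptions as given in the run list above.
CONFIGS = [
    (range(1, 5), "unpatched"),
    (range(5, 9), "all patches"),
    (range(9, 13), "patches 1, 2 and 4"),
    (range(13, 17), "all patches + CONFIG_PIN_TLB=y"),
]


def config_for_run(run):
    """Return the configuration label for a 1-based run number."""
    for runs, label in CONFIGS:
        if run in runs:
            return label
    raise ValueError(f"run {run} is outside 1..16")


def group_means(column):
    """Average a 16-value column (assumed to be in run order) per group."""
    if len(column) != 16:
        raise ValueError("expected one value per run (16 values)")
    return {label: mean(column[r - 1] for r in runs)
            for runs, label in CONFIGS}


# "fork proc" in microseconds, copied from the Processor table below.
FORK_PROC_US = [6949, 7136, 6889, 6896,
                6785, 7080, 6823, 7037,
                6927, 6868, 7119, 6847,
                6358, 6217, 6364, 6311]

if __name__ == "__main__":
    for label, value in group_means(FORK_PROC_US).items():
        print(f"{label:32s} {value:8.1f} us")
```

For this column the first three groups average between about 6930 and
6970 microseconds, while the CONFIG_PIN_TLB=y group averages about 6310.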


make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'

                 L M B E N C H  3 . 0   S U M M A R Y
                 ------------------------------------
		 (Alpha software, do not distribute)

Basic system parameters
------------------------------------------------------------------------------
Host                 OS Description              Mhz  tlb  cache  mem   scal
                                                     pages line   par   load
                                                           bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0100    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0100    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.1700    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0100    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1


Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
tqm8xx    Linux 2.6.33-   66 2.97 10.3 129. 1377 272. 21.8 91.3 6949 29.K 89.K
tqm8xx    Linux 2.6.33-   66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K
tqm8xx    Linux 2.6.33-   66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K
tqm8xx    Linux 2.6.33-   66 3.06 10.5 124. 1309 272. 21.8 101. 6896 29.K 89.K
tqm8xx    Linux 2.6.33-   66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K
tqm8xx    Linux 2.6.33-   66 3.06 8.90 130. 1343 263. 21.3 84.7 7080 29.K 88.K
tqm8xx    Linux 2.6.33-   66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K
tqm8xx    Linux 2.6.33-   66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K
tqm8xx    Linux 2.6.33-   66 3.06 8.83 128. 1355 269. 20.7 89.2 6927 29.K 87.K
tqm8xx    Linux 2.6.33-   66 3.05 8.84 127. 1344 271. 21.6 90.5 6868 29.K 88.K
tqm8xx    Linux 2.6.33-   66 3.06 8.84 131. 1376 260. 21.4 88.1 7119 29.K 87.K
tqm8xx    Linux 2.6.33-   66 3.05 8.90 122. 1342 272. 21.4 88.6 6847 29.K 88.K
tqm8xx    Linux 2.6.33-   66 3.19 9.10 122. 1205 265. 20.9 90.3 6358 27.K 83.K
tqm8xx    Linux 2.6.33-   66 3.28 9.10 124. 1208 270. 20.9 95.2 6217 27.K 82.K
tqm8xx    Linux 2.6.33-   66 3.19 8.98 125. 1210 270. 21.1 87.9 6364 27.K 83.K
tqm8xx    Linux 2.6.33-   66 3.19 8.86 124. 1237 262. 21.3 90.7 6311 27.K 84.K

Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host                 OS  intgr intgr  intgr  intgr  intgr
                          bit   add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx    Linux 2.6.33-   15.7   18.0 1.5600  124.2  203.1
tqm8xx    Linux 2.6.33-   15.7   17.4 1.5800  121.1  202.8
tqm8xx    Linux 2.6.33-   15.2   17.9 1.6200  124.2  202.7
tqm8xx    Linux 2.6.33-   15.2   17.9 1.6000  125.0  204.0
tqm8xx    Linux 2.6.33-   15.7   18.1 1.5600  124.7  204.4
tqm8xx    Linux 2.6.33-   15.7   18.1 1.5800  124.2  202.8
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  124.2  203.2
tqm8xx    Linux 2.6.33-   15.7   18.1 1.5500  124.5  202.0
tqm8xx    Linux 2.6.33-   15.7   18.1 1.5500  124.5  202.6
tqm8xx    Linux 2.6.33-   15.7   18.1 1.5500  121.0  196.5
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  121.0  202.5
tqm8xx    Linux 2.6.33-   15.7   18.1 1.5500  125.1  196.4
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  124.2  202.1
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  124.2  203.4
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  124.2  196.4
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  124.2  196.5

Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS int64  int64  int64  int64  int64
                         bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx    Linux 2.6.33-    15.          13.3 1952.2 1838.2
tqm8xx    Linux 2.6.33-    15.          13.2 1951.5 1837.8
tqm8xx    Linux 2.6.33-    15.          13.2 1886.7 1907.8
tqm8xx    Linux 2.6.33-    15.          13.2 1951.5 1838.2
tqm8xx    Linux 2.6.33-    15.          13.3 1887.0 1902.2
tqm8xx    Linux 2.6.33-    15.          13.3 1887.4 1901.5
tqm8xx    Linux 2.6.33-    15.          13.3 1886.7 1893.0
tqm8xx    Linux 2.6.33-    15.          13.3 1950.0 1900.4
tqm8xx    Linux 2.6.33-    15.          13.3 1955.2 1906.7
tqm8xx    Linux 2.6.33-    15.          13.2 1943.7 1900.7
tqm8xx    Linux 2.6.33-    15.          13.3 1958.2 1910.4
tqm8xx    Linux 2.6.33-    15.          13.3 1886.7 1900.7
tqm8xx    Linux 2.6.33-    15.          13.3 1943.7 1837.4
tqm8xx    Linux 2.6.33-    15.          13.2 1944.1 1837.4
tqm8xx    Linux 2.6.33-    15.          13.2 1944.4 1906.1
tqm8xx    Linux 2.6.33-    15.          13.2 1957.8 1894.8

Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host                 OS  float  float  float  float
                         add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
tqm8xx    Linux 2.6.33- 1008.9 1629.2 5527.0 9895.0
tqm8xx    Linux 2.6.33- 1008.9 1628.9 5495.0 9892.0
tqm8xx    Linux 2.6.33- 1007.8 1622.0 5499.0 9886.0
tqm8xx    Linux 2.6.33- 1016.5 1628.6 5319.0 9940.0
tqm8xx    Linux 2.6.33- 1008.0 1628.3 5497.0 9879.0
tqm8xx    Linux 2.6.33- 1007.6 1577.4 5495.0 9881.0
tqm8xx    Linux 2.6.33- 1014.8 1627.1 5493.0 9889.0
tqm8xx    Linux 2.6.33- 1004.6 1627.7 5487.0 9881.0
tqm8xx    Linux 2.6.33- 1003.8 1627.1 5490.0 9875.0
tqm8xx    Linux 2.6.33-  977.2 1628.0 5318.0 9924.0
tqm8xx    Linux 2.6.33- 1007.4 1627.7 5490.0 9882.0
tqm8xx    Linux 2.6.33- 1004.7 1628.0 5495.0 9891.0
tqm8xx    Linux 2.6.33- 1011.6 1630.1 5484.0 9855.0
tqm8xx    Linux 2.6.33-  977.0 1621.4 5469.0 9856.0
tqm8xx    Linux 2.6.33- 1011.4 1621.4 5471.0 9856.0
tqm8xx    Linux 2.6.33- 1004.9 1577.1 5470.0 9866.0

Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS  double double double double
                         add    mul    div    bogo
--------- ------------- ------  ------ ------ ------
tqm8xx    Linux 2.6.33- 1562.4 2782.8 3730.7  12.6K
tqm8xx    Linux 2.6.33- 1556.1 2781.5 3724.3  12.6K
tqm8xx    Linux 2.6.33- 1513.9 2801.0 3726.4  12.8K
tqm8xx    Linux 2.6.33- 1556.1 2780.9 3611.4  12.6K
tqm8xx    Linux 2.6.33- 1570.5 2772.6 3742.1  12.6K
tqm8xx    Linux 2.6.33- 1560.1 2703.0 3611.4  12.7K
tqm8xx    Linux 2.6.33- 1560.4 2779.5 3760.7  12.7K
tqm8xx    Linux 2.6.33- 1559.8 2773.0 3742.1  12.6K
tqm8xx    Linux 2.6.33- 1564.7 2699.0 3722.1  12.6K
tqm8xx    Linux 2.6.33- 1560.7 2790.0 3725.7  12.7K
tqm8xx    Linux 2.6.33- 1565.0 2780.0 3749.3  12.7K
tqm8xx    Linux 2.6.33- 1560.4 2700.0 3767.1  12.8K
tqm8xx    Linux 2.6.33- 1555.5 2772.1 3747.9  12.6K
tqm8xx    Linux 2.6.33- 1513.5 2772.5 3725.7  12.6K
tqm8xx    Linux 2.6.33- 1557.0 2772.5 3725.7  12.7K
tqm8xx    Linux 2.6.33- 1514.1 2773.5 3719.3  12.7K

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
tqm8xx    Linux 2.6.33-   92.6  109.6  110.9  137.5  173.8   151.8   199.3
tqm8xx    Linux 2.6.33-   95.8  108.5  104.7  137.1  172.7   150.9   194.7
tqm8xx    Linux 2.6.33-   95.8  118.8   97.5  146.4  162.0   160.8   190.1
tqm8xx    Linux 2.6.33-   92.9  111.9  101.0  138.1  166.6   152.3   192.0
tqm8xx    Linux 2.6.33-   90.8  108.5  116.2  134.3  171.8   147.1   210.0
tqm8xx    Linux 2.6.33-  100.1  111.4  105.0  136.4  173.1   148.3   200.8
tqm8xx    Linux 2.6.33-   98.7  111.3  111.8  135.7  172.5   147.9   200.9
tqm8xx    Linux 2.6.33-   92.0  117.9  109.9  141.6  170.4   154.9   196.4
tqm8xx    Linux 2.6.33-   96.9  112.4   95.4  138.3  165.1   152.2   196.4
tqm8xx    Linux 2.6.33-  100.6  115.8  109.3  138.5  173.3   150.9   199.2
tqm8xx    Linux 2.6.33-  102.2  114.3  109.4  140.9  175.5   153.2   202.0
tqm8xx    Linux 2.6.33-   99.1  114.5  106.5  138.2  174.7   151.7   199.9
tqm8xx    Linux 2.6.33-   69.5   80.5   88.9  119.6  147.3   130.4   178.7
tqm8xx    Linux 2.6.33-   85.8   97.6   79.1  122.3  154.1   132.6   180.1
tqm8xx    Linux 2.6.33-   89.4   93.8  125.7  120.8  178.4   129.5   206.1
tqm8xx    Linux 2.6.33-   88.1  101.8   91.2  121.4  162.8   131.6   191.4

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
tqm8xx    Linux 2.6.33-  92.6 338.4 581. 720.1       1047.       2749
tqm8xx    Linux 2.6.33-  95.8 334.0 595. 725.0       1051.       2754
tqm8xx    Linux 2.6.33-  95.8 330.9 574. 720.1       1047.       2772
tqm8xx    Linux 2.6.33-  92.9 338.8 574. 714.3       1046.       2742
tqm8xx    Linux 2.6.33-  90.8 322.1 576. 734.9       1012.       2706
tqm8xx    Linux 2.6.33- 100.1 326.0 565. 719.5       1027.       2702
tqm8xx    Linux 2.6.33-  98.7 322.8 571. 713.8       1028.       2711
tqm8xx    Linux 2.6.33-  92.0 328.1 549. 714.1       1022.       2696
tqm8xx    Linux 2.6.33-  96.9 327.0 573. 722.3       1036.       2721
tqm8xx    Linux 2.6.33- 100.6 330.4 561. 723.8       1024.       2726
tqm8xx    Linux 2.6.33- 102.2 331.4 590. 728.6       1040.       2753
tqm8xx    Linux 2.6.33-  99.1 330.1 585. 723.5       1023.       2750
tqm8xx    Linux 2.6.33-  69.5 265.9 447. 632.6       909.0       2431
tqm8xx    Linux 2.6.33-  85.8 267.0 492. 650.6       909.4       2455
tqm8xx    Linux 2.6.33-  89.4 295.6 493. 643.0       908.8       2453
tqm8xx    Linux 2.6.33-  88.1 301.0 494. 645.1       907.9       2451

*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS   UDP  RPC/  TCP   RPC/ TCP
                               UDP         TCP  conn
--------- ------------- ----- ----- ----- ----- ----
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
tqm8xx    Linux 2.6.33- 5917.2 3968.3  31.2K 4329.0  4147.0  18.8    34.1 135.2
tqm8xx    Linux 2.6.33- 5714.3 3937.0  32.3K 6060.6  4210.0  14.2    34.5 131.4
tqm8xx    Linux 2.6.33- 5747.1 4000.0  31.2K 4329.0  4114.0 7.692    34.0 133.1
tqm8xx    Linux 2.6.33- 5747.1 4081.6  30.3K 4273.5  4100.0  18.2    34.2 135.0
tqm8xx    Linux 2.6.33- 5714.3 3952.6  31.2K 4273.5  4130.0  33.5    35.1 136.1
tqm8xx    Linux 2.6.33- 5714.3 3906.2  31.2K 6060.6  4105.0  25.7    35.5 135.9
tqm8xx    Linux 2.6.33- 5681.8 3921.6  32.3K 4255.3  4144.0  23.5    35.0 134.9
tqm8xx    Linux 2.6.33- 5649.7 3937.0  30.3K 4237.3  4116.0  21.6    35.3 135.3
tqm8xx    Linux 2.6.33- 5747.1 3921.6  32.3K 4329.0  4107.0  17.7    35.6 131.2
tqm8xx    Linux 2.6.33- 5952.4 3937.0  31.2K 4273.5  4119.0  25.4    35.8 136.4
tqm8xx    Linux 2.6.33- 5848.0 3937.0  32.3K 4484.3  4223.0  14.3    35.4 135.1
tqm8xx    Linux 2.6.33- 6172.8 3984.1  35.7K 4291.8  4210.0  14.4    36.0 135.0
tqm8xx    Linux 2.6.33- 5291.0 3610.1  31.2K 4065.0  3836.0 1.389    30.0 135.7
tqm8xx    Linux 2.6.33- 5524.9 3649.6  29.4K 3906.2  3867.0  14.9    29.8 137.7
tqm8xx    Linux 2.6.33- 5319.1 3649.6  29.4K 4048.6  3873.0  13.3    30.3 135.9
tqm8xx    Linux 2.6.33- 5347.6 3623.2  32.3K 3921.6  3894.0  13.3    30.4 135.8

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
tqm8xx    Linux 2.6.33- 14.8 15.6 10.1   21.0   55.5   32.3   34.5 55.6  53.0
tqm8xx    Linux 2.6.33- 14.8 15.6 10.7   21.0   55.5   32.3   34.5 55.6  53.0
tqm8xx    Linux 2.6.33- 14.8 15.7 12.7   21.0   55.5   32.3   34.5 55.6  53.0
tqm8xx    Linux 2.6.33- 14.8 15.6 13.9   21.0   55.5   32.3   34.5 55.6  53.0
tqm8xx    Linux 2.6.33- 14.8 15.8 12.9   21.0   55.7   32.5   34.6 55.8  53.1
tqm8xx    Linux 2.6.33- 14.8 15.7 14.0   21.0   55.7   32.4   34.6 55.8  53.1
tqm8xx    Linux 2.6.33- 14.8 15.8 12.9   21.0   55.7   32.5   34.6 55.8  53.1
tqm8xx    Linux 2.6.33- 14.8 15.8 13.0   21.0   55.7   32.5   34.6 55.8  53.1
tqm8xx    Linux 2.6.33- 14.8 15.7 14.0   21.0   55.6   32.4   34.6 55.8  53.1
tqm8xx    Linux 2.6.33- 14.7 15.7 12.8   21.0   55.6   32.4   34.6 55.7  53.1
tqm8xx    Linux 2.6.33- 14.6 15.7 12.8   21.0   55.6   32.4   34.6 55.8  53.1
tqm8xx    Linux 2.6.33- 14.8 15.7 12.8   21.0   55.6   32.4   34.6 55.8  53.1
tqm8xx    Linux 2.6.33- 15.0 16.0 13.2   21.3   55.8   32.5   34.7 55.9  53.2
tqm8xx    Linux 2.6.33- 15.0 16.0 13.4   21.3   55.8   32.5   34.7 55.8  53.2
tqm8xx    Linux 2.6.33- 15.0 16.0 13.9   21.3   55.8   32.5   34.7 55.9  53.2
tqm8xx    Linux 2.6.33- 15.0 16.0 13.2   21.2   55.8   32.5   34.6 55.9  53.2

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
--------- -------------   ---   ----   ----    --------    --------    -------
tqm8xx    Linux 2.6.33-    66   31.8  141.0       184.0      1165.7
tqm8xx    Linux 2.6.33-    66   31.8  141.2       184.2      1165.3
tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.3      1165.6
tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.2      1166.2
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1100.5    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1102.5    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.7    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.6    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.1    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.0    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.7    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.2    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.7      1099.8    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.6      1100.5    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.7  171.0       171.7      1101.0    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  171.0       171.6      1101.3    No L2 cache?

make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results'
bye
Heiko
-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-03 10:10   ` Heiko Schocher
@ 2010-03-03 10:38     ` Joakim Tjernlund
  2010-03-04 10:30       ` Heiko Schocher
  0 siblings, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-03 10:38 UTC (permalink / raw)
  To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Heiko Schocher <hs@denx.de> wrote on 2010/03/03 11:10:10:
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> > Heiko Schocher <hs@denx.de> wrote on 2010/03/03 09:02:47:
> [...]
> >> Here the results:
> >> (The first 4 rows are the results for the kernel without your patches,
> >>  the next 4 rows are the results for the kernel with your patches)
> >>
> >> make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
> >
> > I see both ups and downs in this test, don't quite understand why.
> > What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?
>
> Sorry, forgot to say, where to find the sources. You can find them
> here:
>
> http://git.denx.de/?p=linux-2.6-denx.git;a=shortlog;h=refs/heads/tqm8xx

OK, so you have SWAP=no, MODULES=yes, CPU6=no, CPU15=no.
PIN_TLB isn't listed in your defconfig, so I assume
it is no?

MODULES=yes nullifies one optimization.
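
To double-check these settings against the .config that was actually
built, a small script along these lines might help. It is only a sketch:
the option names (CONFIG_8xx_CPU6, CONFIG_8xx_CPU15, CONFIG_PIN_TLB) are
assumed spellings of the options discussed here, and the SAMPLE text is a
made-up stand-in for a real .config, not the tqm8xx one.

```python
import re

# Assumed Kconfig names for the options discussed in this thread.
OPTIONS = [
    "CONFIG_SWAP",
    "CONFIG_MODULES",
    "CONFIG_8xx_CPU6",
    "CONFIG_8xx_CPU15",
    "CONFIG_PIN_TLB",
]


def read_options(config_text, names):
    """Return {name: value}; options that are unset or absent map to 'n'."""
    found = {}
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return {name: found.get(name, "n") for name in names}


# Hypothetical .config fragment, used only to make the sketch runnable.
SAMPLE = """\
# CONFIG_SWAP is not set
CONFIG_MODULES=y
# CONFIG_8xx_CPU6 is not set
"""

state = read_options(SAMPLE, OPTIONS)
for name in OPTIONS:
    print(f"{name}={state[name]}")
```

With a real build, SAMPLE would be replaced by the contents of the
.config in the kernel build directory.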

I don't understand the bad numbers for Prot Fault:
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
tqm8xx    Linux 2.6.33- 5917.2 3968.3  31.2K 4329.0  4147.0  18.8    34.1 135.2
tqm8xx    Linux 2.6.33- 5714.3 3937.0  32.3K 6060.6  4210.0  14.2    34.5 131.4
tqm8xx    Linux 2.6.33- 5747.1 4000.0  31.2K 4329.0  4114.0 7.692    34.0 133.1
tqm8xx    Linux 2.6.33- 5747.1 4081.6  30.3K 4273.5  4100.0  18.2    34.2 135.0
tqm8xx    Linux 2.6.33- 5714.3 3952.6  31.2K 4273.5  4130.0  33.5    35.1 136.1
tqm8xx    Linux 2.6.33- 5714.3 3906.2  31.2K 6060.6  4105.0  25.7    35.5 135.9
tqm8xx    Linux 2.6.33- 5681.8 3921.6  32.3K 4255.3  4144.0  23.5    35.0 134.9
tqm8xx    Linux 2.6.33- 5649.7 3937.0  30.3K 4237.3  4116.0  21.6    35.3 135.3
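
To put a number on it, the Prot Fault column can be averaged per kernel.
The sketch below assumes, as stated in the report, that the first 4 rows
are the unpatched kernel and the next 4 rows the patched one; the values
are copied from the table above and the helper name is made up.

```python
from statistics import mean

# Prot Fault column in microseconds, copied from the table above.
unpatched = [18.8, 14.2, 7.692, 18.2]   # first 4 rows: without the patches
patched = [33.5, 25.7, 23.5, 21.6]      # next 4 rows: with the patches


def summarize(before, after):
    """Return (mean before, mean after, relative change in percent)."""
    before_avg = mean(before)
    after_avg = mean(after)
    return before_avg, after_avg, (after_avg - before_avg) / before_avg * 100.0


before_mean, after_mean, change_pct = summarize(unpatched, patched)
print(f"unpatched: {before_mean:.1f} us, patched: {after_mean:.1f} us, "
      f"change: {change_pct:+.0f}%")
```

With only 4 samples per kernel and a spread from 7.692 to 18.8 in the
unpatched rows alone, the averages should be read as a rough indication.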

Could you try reverting patch:
  8xx: Don't touch ACCESSED when no SWAP.
and see if that makes a difference?

Turning on pinned TLBs (you must turn on ADVANCED_OPTIONS first) could be an
improvement, regardless of my patches.

    Jocke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-03  8:48 ` Joakim Tjernlund
  2010-03-03  8:59   ` Joakim Tjernlund
@ 2010-03-03 10:10   ` Heiko Schocher
  2010-03-03 10:38     ` Joakim Tjernlund
  1 sibling, 1 reply; 30+ messages in thread
From: Heiko Schocher @ 2010-03-03 10:10 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Hello Joakim,

Joakim Tjernlund wrote:
> Heiko Schocher <hs@denx.de> wrote on 2010/03/03 09:02:47:
[...]
>> Here the results:
>> (The first 4 rows are the results for the kernel without your patches,
>>  the next 4 rows are the results for the kernel with your patches)
>>
>> make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
> 
> I see both ups and downs in this test, don't quite understand why.
> What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?

Sorry, I forgot to say where to find the sources. You can find them
here:

http://git.denx.de/?p=linux-2.6-denx.git;a=shortlog;h=refs/heads/tqm8xx

bye
Heiko
-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-03  8:48 ` Joakim Tjernlund
@ 2010-03-03  8:59   ` Joakim Tjernlund
  2010-03-03 10:10   ` Heiko Schocher
  1 sibling, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-03  8:59 UTC (permalink / raw)
  Cc: Scott Wood, linuxppc-dev, hs, Wolfgang Denk

>
> Heiko Schocher <hs@denx.de> wrote on 2010/03/03 09:02:47:
> >
> > Hello Joakim,
> >
> > I tried your 4 patches on a MPC855M based system:
>
> Thanks a lot for testing this for me!
>
> >
> > -bash-3.2# cat /proc/cpuinfo
> > processor       : 0
> > cpu             : 8xx
> > clock           : 66.000000MHz
> > revision        : 0.0 (pvr 0050 0000)
> > bogomips        : 8.25
> > timebase        : 4125000
> > platform        : TQM8xx
> > model           : TQM8xx
> > Memory          : 32 MB
> > -bash-3.2# cat /proc/version
> > Linux version 2.6.33-rc6-01500-gbddcb41-dirty (hs@xpert.denx.de) (gcc version
> > 4.2.2) #9 Tue Mar 2 18:08:49 CET 2010
> > -bash-3.2#
> >
> > First I looked at the boot time:
> >
> > Booting Linux:
> >
> >                                                                        2.6.33  2.6.33-tuned
> > ... until "Freeing unused kernel memory" message (= enter user space)  ~4s     ~4s
> > ... until "login:" message (= full multi-user mode)                    56s     56s
> >
> > and I did a Performance test with lmbench, see:
> > http://sourceforge.net/projects/lmbench
> >
> > Here the results:
> > (The first 4 rows are the results for the kernel without your patches,
> >  the next 4 rows are the results for the kernel with your patches)
> >
> > make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
>
> I see both ups and downs in this test, don't quite understand why.
> What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?

I forgot to ask about PIN_TLB too.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-03  8:02 Heiko Schocher
@ 2010-03-03  8:48 ` Joakim Tjernlund
  2010-03-03  8:59   ` Joakim Tjernlund
  2010-03-03 10:10   ` Heiko Schocher
  0 siblings, 2 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-03  8:48 UTC (permalink / raw)
  To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Heiko Schocher <hs@denx.de> wrote on 2010/03/03 09:02:47:
>
> Hello Joakim,
>
> I tried your 4 patches on a MPC855M based system:

Thanks a lot for testing this for me!

>
> -bash-3.2# cat /proc/cpuinfo
> processor       : 0
> cpu             : 8xx
> clock           : 66.000000MHz
> revision        : 0.0 (pvr 0050 0000)
> bogomips        : 8.25
> timebase        : 4125000
> platform        : TQM8xx
> model           : TQM8xx
> Memory          : 32 MB
> -bash-3.2# cat /proc/version
> Linux version 2.6.33-rc6-01500-gbddcb41-dirty (hs@xpert.denx.de) (gcc version
> 4.2.2) #9 Tue Mar 2 18:08:49 CET 2010
> -bash-3.2#
>
> First I looked at the boot time:
>
> Booting Linux:
>
>                                                                        2.6.33  2.6.33-tuned
> ... until "Freeing unused kernel memory" message (= enter user space)  ~4s     ~4s
> ... until "login:" message (= full multi-user mode)                    56s     56s
>
> and I did a Performance test with lmbench, see:
> http://sourceforge.net/projects/lmbench
>
> Here the results:
> (The first 4 rows are the results for the kernel without your patches,
>  the next 4 rows are the results for the kernel with your patches)
>
> make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'

I see both ups and downs in this test, don't quite understand why.
What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?


>
>                  L M B E N C H  3 . 0   S U M M A R Y
>                  ------------------------------------
>        (Alpha software, do not distribute)
>
> Basic system parameters
> ------------------------------------------------------------------------------
> Host                 OS Description              Mhz  tlb  cache  mem   scal
>                                                      pages line   par   load
>                                                            bytes
> --------- ------------- ----------------------- ---- ----- ----- ------ ----
> tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
> tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
> tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
> tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
> tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
> tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
> tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
> tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
>
> Processor, Processes - times in microseconds - smaller is better
> ------------------------------------------------------------------------------
> Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
>                              call  I/O stat clos TCP  inst hndl proc proc proc
> --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
> tqm8xx    Linux 2.6.33-   66 2.97 10.3 129. 1377 272. 21.8 91.3 6949 29.K 89.K
> tqm8xx    Linux 2.6.33-   66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K
> tqm8xx    Linux 2.6.33-   66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K
> tqm8xx    Linux 2.6.33-   66 3.06 10.5 124. 1309 272. 21.8 101. 6896 29.K 89.K
> tqm8xx    Linux 2.6.33-   66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K
> tqm8xx    Linux 2.6.33-   66 3.06 8.90 130. 1343 263. 21.3 84.7 7080 29.K 88.K
> tqm8xx    Linux 2.6.33-   66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K
> tqm8xx    Linux 2.6.33-   66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K
>

[SNIP integer/float test, these are not relevant]

>
> Context switching - times in microseconds - smaller is better
> -------------------------------------------------------------------------
> Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
>                          ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
> --------- ------------- ------ ------ ------ ------ ------ ------- -------
> tqm8xx    Linux 2.6.33-   92.6  109.6  110.9  137.5  173.8   151.8   199.3
> tqm8xx    Linux 2.6.33-   95.8  108.5  104.7  137.1  172.7   150.9   194.7
> tqm8xx    Linux 2.6.33-   95.8  118.8   97.5  146.4  162.0   160.8   190.1
> tqm8xx    Linux 2.6.33-   92.9  111.9  101.0  138.1  166.6   152.3   192.0
> tqm8xx    Linux 2.6.33-   90.8  108.5  116.2  134.3  171.8   147.1   210.0
> tqm8xx    Linux 2.6.33-  100.1  111.4  105.0  136.4  173.1   148.3   200.8
> tqm8xx    Linux 2.6.33-   98.7  111.3  111.8  135.7  172.5   147.9   200.9
> tqm8xx    Linux 2.6.33-   92.0  117.9  109.9  141.6  170.4   154.9   196.4
>
> *Local* Communication latencies in microseconds - smaller is better
> ---------------------------------------------------------------------
> Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
>                         ctxsw       UNIX         UDP         TCP conn
> --------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
> tqm8xx    Linux 2.6.33-  92.6 338.4 581. 720.1       1047.       2749
> tqm8xx    Linux 2.6.33-  95.8 334.0 595. 725.0       1051.       2754
> tqm8xx    Linux 2.6.33-  95.8 330.9 574. 720.1       1047.       2772
> tqm8xx    Linux 2.6.33-  92.9 338.8 574. 714.3       1046.       2742
> tqm8xx    Linux 2.6.33-  90.8 322.1 576. 734.9       1012.       2706
> tqm8xx    Linux 2.6.33- 100.1 326.0 565. 719.5       1027.       2702
> tqm8xx    Linux 2.6.33-  98.7 322.8 571. 713.8       1028.       2711
> tqm8xx    Linux 2.6.33-  92.0 328.1 549. 714.1       1022.       2696
>
> *Remote* Communication latencies in microseconds - smaller is better
> ---------------------------------------------------------------------
> Host                 OS   UDP  RPC/  TCP   RPC/ TCP
>                                UDP         TCP  conn
> --------- ------------- ----- ----- ----- ----- ----
> tqm8xx    Linux 2.6.33-
> tqm8xx    Linux 2.6.33-
> tqm8xx    Linux 2.6.33-
> tqm8xx    Linux 2.6.33-
> tqm8xx    Linux 2.6.33-
> tqm8xx    Linux 2.6.33-
> tqm8xx    Linux 2.6.33-
> tqm8xx    Linux 2.6.33-
>
> File & VM system latencies in microseconds - smaller is better
> -------------------------------------------------------------------------------
> Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
>                         Create Delete Create Delete Latency Fault  Fault  selct
> --------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
> tqm8xx    Linux 2.6.33- 5917.2 3968.3  31.2K 4329.0  4147.0  18.8    34.1 135.2
> tqm8xx    Linux 2.6.33- 5714.3 3937.0  32.3K 6060.6  4210.0  14.2    34.5 131.4
> tqm8xx    Linux 2.6.33- 5747.1 4000.0  31.2K 4329.0  4114.0 7.692    34.0 133.1
> tqm8xx    Linux 2.6.33- 5747.1 4081.6  30.3K 4273.5  4100.0  18.2    34.2 135.0
> tqm8xx    Linux 2.6.33- 5714.3 3952.6  31.2K 4273.5  4130.0  33.5    35.1 136.1
> tqm8xx    Linux 2.6.33- 5714.3 3906.2  31.2K 6060.6  4105.0  25.7    35.5 135.9
> tqm8xx    Linux 2.6.33- 5681.8 3921.6  32.3K 4255.3  4144.0  23.5    35.0 134.9
> tqm8xx    Linux 2.6.33- 5649.7 3937.0  30.3K 4237.3  4116.0  21.6    35.3 135.3
>
> *Local* Communication bandwidths in MB/s - bigger is better
> -----------------------------------------------------------------------------
> Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
>                              UNIX      reread reread (libc) (hand) read write
> --------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
> tqm8xx    Linux 2.6.33- 14.8 15.6 10.1   21.0   55.5   32.3   34.5 55.6  53.0
> tqm8xx    Linux 2.6.33- 14.8 15.6 10.7   21.0   55.5   32.3   34.5 55.6  53.0
> tqm8xx    Linux 2.6.33- 14.8 15.7 12.7   21.0   55.5   32.3   34.5 55.6  53.0
> tqm8xx    Linux 2.6.33- 14.8 15.6 13.9   21.0   55.5   32.3   34.5 55.6  53.0
> tqm8xx    Linux 2.6.33- 14.8 15.8 12.9   21.0   55.7   32.5   34.6 55.8  53.1
> tqm8xx    Linux 2.6.33- 14.8 15.7 14.0   21.0   55.7   32.4   34.6 55.8  53.1
> tqm8xx    Linux 2.6.33- 14.8 15.8 12.9   21.0   55.7   32.5   34.6 55.8  53.1
> tqm8xx    Linux 2.6.33- 14.8 15.8 13.0   21.0   55.7   32.5   34.6 55.8  53.1
>
> Memory latencies in nanoseconds - smaller is better
>     (WARNING - may not be correct, check graphs)
> ------------------------------------------------------------------------------
> Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
> --------- -------------   ---   ----   ----    --------    --------    -------
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       184.0      1165.7
> tqm8xx    Linux 2.6.33-    66   31.8  141.2       184.2      1165.3
> tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.3      1165.6
> tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.2      1166.2
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1100.5    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1102.5    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.7    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.6    No L2 cache?
> make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results'
>
> bye
> Heiko
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
@ 2010-03-03  8:02 Heiko Schocher
  2010-03-03  8:48 ` Joakim Tjernlund
  0 siblings, 1 reply; 30+ messages in thread
From: Heiko Schocher @ 2010-03-03  8:02 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Hello Joakim,

I tried your 4 patches on an MPC855M-based system:

-bash-3.2# cat /proc/cpuinfo
processor       : 0
cpu             : 8xx
clock           : 66.000000MHz
revision        : 0.0 (pvr 0050 0000)
bogomips        : 8.25
timebase        : 4125000
platform        : TQM8xx
model           : TQM8xx
Memory          : 32 MB
-bash-3.2# cat /proc/version
Linux version 2.6.33-rc6-01500-gbddcb41-dirty (hs@xpert.denx.de) (gcc version 4.2.2) #9 Tue Mar 2 18:08:49 CET 2010
-bash-3.2#

First I looked at the boot time:

Booting Linux:

                                                                       2.6.33  2.6.33-tuned
... until "Freeing unused kernel memory" message (= enter user space)  ~4s     ~4s
... until "login:" message (= full multi-user mode)                    56s     56s

I also ran a performance test with lmbench, see:
http://sourceforge.net/projects/lmbench

Here are the results:
(The first 4 rows are the results for the kernel without your patches,
 the next 4 rows are the results for the kernel with your patches)

make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'

                 L M B E N C H  3 . 0   S U M M A R Y
                 ------------------------------------
		 (Alpha software, do not distribute)

Basic system parameters
------------------------------------------------------------------------------
Host                 OS Description              Mhz  tlb  cache  mem   scal
                                                     pages line   par   load
                                                           bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    32    16 1.0400    1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
tqm8xx    Linux 2.6.33-   66 2.97 10.3 129. 1377 272. 21.8 91.3 6949 29.K 89.K
tqm8xx    Linux 2.6.33-   66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K
tqm8xx    Linux 2.6.33-   66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K
tqm8xx    Linux 2.6.33-   66 3.06 10.5 124. 1309 272. 21.8 101. 6896 29.K 89.K
tqm8xx    Linux 2.6.33-   66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K
tqm8xx    Linux 2.6.33-   66 3.06 8.90 130. 1343 263. 21.3 84.7 7080 29.K 88.K
tqm8xx    Linux 2.6.33-   66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K
tqm8xx    Linux 2.6.33-   66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K

Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host                 OS  intgr intgr  intgr  intgr  intgr
                          bit   add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx    Linux 2.6.33-   15.7   18.0 1.5600  124.2  203.1
tqm8xx    Linux 2.6.33-   15.7   17.4 1.5800  121.1  202.8
tqm8xx    Linux 2.6.33-   15.2   17.9 1.6200  124.2  202.7
tqm8xx    Linux 2.6.33-   15.2   17.9 1.6000  125.0  204.0
tqm8xx    Linux 2.6.33-   15.7   18.1 1.5600  124.7  204.4
tqm8xx    Linux 2.6.33-   15.7   18.1 1.5800  124.2  202.8
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  124.2  203.2
tqm8xx    Linux 2.6.33-   15.7   18.1 1.5500  124.5  202.0

Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS int64  int64  int64  int64  int64
                         bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx    Linux 2.6.33-    15.          13.3 1952.2 1838.2
tqm8xx    Linux 2.6.33-    15.          13.2 1951.5 1837.8
tqm8xx    Linux 2.6.33-    15.          13.2 1886.7 1907.8
tqm8xx    Linux 2.6.33-    15.          13.2 1951.5 1838.2
tqm8xx    Linux 2.6.33-    15.          13.3 1887.0 1902.2
tqm8xx    Linux 2.6.33-    15.          13.3 1887.4 1901.5
tqm8xx    Linux 2.6.33-    15.          13.3 1886.7 1893.0
tqm8xx    Linux 2.6.33-    15.          13.3 1950.0 1900.4

Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host                 OS  float  float  float  float
                         add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
tqm8xx    Linux 2.6.33- 1008.9 1629.2 5527.0 9895.0
tqm8xx    Linux 2.6.33- 1008.9 1628.9 5495.0 9892.0
tqm8xx    Linux 2.6.33- 1007.8 1622.0 5499.0 9886.0
tqm8xx    Linux 2.6.33- 1016.5 1628.6 5319.0 9940.0
tqm8xx    Linux 2.6.33- 1008.0 1628.3 5497.0 9879.0
tqm8xx    Linux 2.6.33- 1007.6 1577.4 5495.0 9881.0
tqm8xx    Linux 2.6.33- 1014.8 1627.1 5493.0 9889.0
tqm8xx    Linux 2.6.33- 1004.6 1627.7 5487.0 9881.0

Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS  double double double double
                         add    mul    div    bogo
--------- ------------- ------  ------ ------ ------
tqm8xx    Linux 2.6.33- 1562.4 2782.8 3730.7  12.6K
tqm8xx    Linux 2.6.33- 1556.1 2781.5 3724.3  12.6K
tqm8xx    Linux 2.6.33- 1513.9 2801.0 3726.4  12.8K
tqm8xx    Linux 2.6.33- 1556.1 2780.9 3611.4  12.6K
tqm8xx    Linux 2.6.33- 1570.5 2772.6 3742.1  12.6K
tqm8xx    Linux 2.6.33- 1560.1 2703.0 3611.4  12.7K
tqm8xx    Linux 2.6.33- 1560.4 2779.5 3760.7  12.7K
tqm8xx    Linux 2.6.33- 1559.8 2773.0 3742.1  12.6K

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
tqm8xx    Linux 2.6.33-   92.6  109.6  110.9  137.5  173.8   151.8   199.3
tqm8xx    Linux 2.6.33-   95.8  108.5  104.7  137.1  172.7   150.9   194.7
tqm8xx    Linux 2.6.33-   95.8  118.8   97.5  146.4  162.0   160.8   190.1
tqm8xx    Linux 2.6.33-   92.9  111.9  101.0  138.1  166.6   152.3   192.0
tqm8xx    Linux 2.6.33-   90.8  108.5  116.2  134.3  171.8   147.1   210.0
tqm8xx    Linux 2.6.33-  100.1  111.4  105.0  136.4  173.1   148.3   200.8
tqm8xx    Linux 2.6.33-   98.7  111.3  111.8  135.7  172.5   147.9   200.9
tqm8xx    Linux 2.6.33-   92.0  117.9  109.9  141.6  170.4   154.9   196.4

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
tqm8xx    Linux 2.6.33-  92.6 338.4 581. 720.1       1047.       2749
tqm8xx    Linux 2.6.33-  95.8 334.0 595. 725.0       1051.       2754
tqm8xx    Linux 2.6.33-  95.8 330.9 574. 720.1       1047.       2772
tqm8xx    Linux 2.6.33-  92.9 338.8 574. 714.3       1046.       2742
tqm8xx    Linux 2.6.33-  90.8 322.1 576. 734.9       1012.       2706
tqm8xx    Linux 2.6.33- 100.1 326.0 565. 719.5       1027.       2702
tqm8xx    Linux 2.6.33-  98.7 322.8 571. 713.8       1028.       2711
tqm8xx    Linux 2.6.33-  92.0 328.1 549. 714.1       1022.       2696

*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS   UDP  RPC/  TCP   RPC/ TCP
                               UDP         TCP  conn
--------- ------------- ----- ----- ----- ----- ----
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
tqm8xx    Linux 2.6.33- 5917.2 3968.3  31.2K 4329.0  4147.0  18.8    34.1 135.2
tqm8xx    Linux 2.6.33- 5714.3 3937.0  32.3K 6060.6  4210.0  14.2    34.5 131.4
tqm8xx    Linux 2.6.33- 5747.1 4000.0  31.2K 4329.0  4114.0 7.692    34.0 133.1
tqm8xx    Linux 2.6.33- 5747.1 4081.6  30.3K 4273.5  4100.0  18.2    34.2 135.0
tqm8xx    Linux 2.6.33- 5714.3 3952.6  31.2K 4273.5  4130.0  33.5    35.1 136.1
tqm8xx    Linux 2.6.33- 5714.3 3906.2  31.2K 6060.6  4105.0  25.7    35.5 135.9
tqm8xx    Linux 2.6.33- 5681.8 3921.6  32.3K 4255.3  4144.0  23.5    35.0 134.9
tqm8xx    Linux 2.6.33- 5649.7 3937.0  30.3K 4237.3  4116.0  21.6    35.3 135.3

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
tqm8xx    Linux 2.6.33- 14.8 15.6 10.1   21.0   55.5   32.3   34.5 55.6  53.0
tqm8xx    Linux 2.6.33- 14.8 15.6 10.7   21.0   55.5   32.3   34.5 55.6  53.0
tqm8xx    Linux 2.6.33- 14.8 15.7 12.7   21.0   55.5   32.3   34.5 55.6  53.0
tqm8xx    Linux 2.6.33- 14.8 15.6 13.9   21.0   55.5   32.3   34.5 55.6  53.0
tqm8xx    Linux 2.6.33- 14.8 15.8 12.9   21.0   55.7   32.5   34.6 55.8  53.1
tqm8xx    Linux 2.6.33- 14.8 15.7 14.0   21.0   55.7   32.4   34.6 55.8  53.1
tqm8xx    Linux 2.6.33- 14.8 15.8 12.9   21.0   55.7   32.5   34.6 55.8  53.1
tqm8xx    Linux 2.6.33- 14.8 15.8 13.0   21.0   55.7   32.5   34.6 55.8  53.1

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
--------- -------------   ---   ----   ----    --------    --------    -------
tqm8xx    Linux 2.6.33-    66   31.8  141.0       184.0      1165.7
tqm8xx    Linux 2.6.33-    66   31.8  141.2       184.2      1165.3
tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.3      1165.6
tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.2      1166.2
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1100.5    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1102.5    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.7    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.6    No L2 cache?
make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results'

bye
Heiko

-- 
DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


* [PATCH 0/4] 8xx: Optimize TLB Miss code.
@ 2010-03-02 15:37 Joakim Tjernlund
  0 siblings, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-02 15:37 UTC (permalink / raw)
  To: linuxppc-dev, Scott Wood

This set of patches tries to optimize the TLB code on 8xx even
more. If they work, it should be a noticeable performance
boost.

I would be very happy if you could test them for me.

 - v2:
   Since Scott has done some testing of these patches, I am
   resending them with my SOB.
   Scott, can you "bless" these patches too?

Joakim Tjernlund (4):
  8xx: Optimze TLB Miss handlers
  8xx: Avoid testing for kernel space in ITLB Miss.
  8xx: Don't touch ACCESSED when no SWAP.
  8xx: Use SPRG2 and DAR registers to stash r11 and cr.

 arch/powerpc/kernel/head_8xx.S |   70 +++++++++++++++++++++++++++-------------
 1 files changed, 47 insertions(+), 23 deletions(-)


end of thread, other threads:[~2010-03-17  7:40 UTC | newest]

Thread overview: 30+ messages
2010-02-26  8:29 [PATCH 0/4] 8xx: Optimize TLB Miss code Joakim Tjernlund
2010-02-26  8:29 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Joakim Tjernlund
2010-02-26  8:29   ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
2010-02-26  8:29     ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Joakim Tjernlund
2010-02-26  8:29       ` [PATCH 4/4] 8xx: Use SPRG2 and DAR registers to stash r11 and cr Joakim Tjernlund
2010-03-16 21:20       ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Benjamin Herrenschmidt
2010-03-17  7:40         ` Joakim Tjernlund
2010-03-16 21:19     ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Benjamin Herrenschmidt
2010-03-17  7:35       ` Joakim Tjernlund
2010-02-26 19:50   ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Scott Wood
2010-02-27 15:23     ` Joakim Tjernlund
2010-02-26 20:10   ` Kumar Gala
2010-02-27 15:25     ` Joakim Tjernlund
2010-03-02 15:37 [PATCH 0/4] 8xx: Optimize TLB Miss code Joakim Tjernlund
2010-03-03  8:02 Heiko Schocher
2010-03-03  8:48 ` Joakim Tjernlund
2010-03-03  8:59   ` Joakim Tjernlund
2010-03-03 10:10   ` Heiko Schocher
2010-03-03 10:38     ` Joakim Tjernlund
2010-03-04 10:30       ` Heiko Schocher
2010-03-04 12:16         ` Wolfgang Denk
2010-03-04 13:06           ` Joakim Tjernlund
2010-03-04 16:30             ` Heiko Schocher
2010-03-05 10:40               ` Joakim Tjernlund
2010-03-08  7:46                 ` Heiko Schocher
2010-03-08  8:44                   ` Joakim Tjernlund
2010-03-08  9:06                     ` Heiko Schocher
2010-03-08 10:42                       ` Joakim Tjernlund
2010-03-09  6:30                         ` Wolfgang Denk
2010-03-07 16:03               ` Joakim Tjernlund
