* [PATCH 0/4] 8xx: Optimize TLB Miss code.
@ 2010-02-26 8:29 Joakim Tjernlund
2010-02-26 8:29 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Joakim Tjernlund
0 siblings, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-26 8:29 UTC (permalink / raw)
To: linuxppc-dev
This patch set tries to optimize the TLB code on 8xx even
further. If the patches work, they should give a noticeable
performance boost.
I would be very happy if you could test them for me.
Joakim Tjernlund (4):
8xx: Optimze TLB Miss handlers
8xx: Avoid testing for kernel space in ITLB Miss.
8xx: Don't touch ACCESSED when no SWAP.
8xx: Use SPRG2 and DAR registers to stash r11 and cr.
arch/powerpc/kernel/head_8xx.S | 70 +++++++++++++++++++++++++++-------------
1 files changed, 47 insertions(+), 23 deletions(-)
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/4] 8xx: Optimze TLB Miss handlers
2010-02-26 8:29 [PATCH 0/4] 8xx: Optimize TLB Miss code Joakim Tjernlund
@ 2010-02-26 8:29 ` Joakim Tjernlund
2010-02-26 8:29 ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-26 8:29 UTC (permalink / raw)
To: linuxppc-dev
This removes a couple of instructions from the TLB Miss
handlers without changing functionality.
---
arch/powerpc/kernel/head_8xx.S | 11 +++--------
1 files changed, 3 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 3ef743f..ecc4a02 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -343,17 +343,14 @@ InstructionTLBMiss:
cmpwi cr0, r11, _PAGE_ACCESSED | _PAGE_PRESENT
bne- cr0, 2f
- /* Clear PP lsb, 0x400 */
- rlwinm r10, r10, 0, 22, 20
-
/* The Linux PTE won't go exactly into the MMU TLB.
- * Software indicator bits 22 and 28 must be clear.
+ * Software indicator bits 21 and 28 must be clear.
* Software indicator bits 24, 25, 26, and 27 must be
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
li r11, 0x00f0
- rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
+ rlwimi r10, r11, 0, 0x07f8 /* Set 24-27, clear 21-23,28 */
DO_8xx_CPU6(0x2d80, r3)
mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
@@ -444,9 +441,7 @@ DataStoreTLBMiss:
/* Honour kernel RO, User NA */
/* 0x200 == Extended encoding, bit 22 */
- /* r11 = (r10 & _PAGE_USER) >> 2 */
- rlwinm r11, r10, 32-2, 0x200
- or r10, r11, r10
+ rlwimi r10, r10, 32-2, 0x200 /* Copy USER to bit 22, 0x200 */
/* r11 = (r10 & _PAGE_RW) >> 1 */
rlwinm r11, r10, 32-1, 0x200
or r10, r11, r10
--
1.6.4.4
^ permalink raw reply related [flat|nested] 30+ messages in thread
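Editorial note: the DataStoreTLBMiss hunk above folds an rlwinm/or pair into a single rlwimi. The sketch below models both sequences in C so the bit arithmetic can be checked by hand. The helper names (rotl32, rlwinm, rlwimi, copy_user_old, copy_user_new) are illustrative stand-ins and are not taken from the kernel source. One thing the model makes visible: the two sequences agree whenever bit 0x200 is clear on entry, because the old code ORs the copied bit in while rlwimi overwrites it. Whether 0x200 is always clear at that point is not something this sketch verifies.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Model of "rlwinm rA, rS, SH, MASK": rotate left, then AND with MASK. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, uint32_t mask)
{
    return rotl32(rs, sh) & mask;
}

/* Model of "rlwimi rA, rS, SH, MASK": rotate left, then insert the
 * masked bits into rA, leaving the other bits of rA untouched. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, uint32_t mask)
{
    return (rotl32(rs, sh) & mask) | (ra & ~mask);
}

/* Old sequence: r11 = (r10 rotated right by 2) & 0x200; r10 = r11 | r10 */
static uint32_t copy_user_old(uint32_t r10)
{
    uint32_t r11 = rlwinm(r10, 32 - 2, 0x200);
    return r11 | r10;
}

/* New sequence: rlwimi r10, r10, 32-2, 0x200 */
static uint32_t copy_user_new(uint32_t r10)
{
    return rlwimi(r10, r10, 32 - 2, 0x200);
}
```

With bit 0x800 set and 0x200 clear, both functions return the input with 0x200 additionally set; with only 0x200 set, the old sequence keeps it and the new one clears it.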
* [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss.
2010-02-26 8:29 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Joakim Tjernlund
@ 2010-02-26 8:29 ` Joakim Tjernlund
2010-02-26 8:29 ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Joakim Tjernlund
2010-03-16 21:19 ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Benjamin Herrenschmidt
2010-02-26 19:50 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Scott Wood
2010-02-26 20:10 ` Kumar Gala
2 siblings, 2 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-26 8:29 UTC (permalink / raw)
To: linuxppc-dev
Only modules will cause ITLB Misses as we always pin
the first 8MB of kernel memory.
---
arch/powerpc/kernel/head_8xx.S | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index ecc4a02..84ca1d9 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -318,12 +318,16 @@ InstructionTLBMiss:
/* If we are faulting a kernel address, we have to use the
* kernel page tables.
*/
+#ifdef CONFIG_MODULES
+ /* Only modules will cause ITLB Misses as we always
+ * pin the first 8MB of kernel memory */
andi. r11, r10, 0x0800 /* Address >= 0x80000000 */
beq 3f
lis r11, swapper_pg_dir@h
ori r11, r11, swapper_pg_dir@l
rlwimi r10, r11, 0, 2, 19
3:
+#endif
lwz r11, 0(r10) /* Get the level 1 entry */
rlwinm. r10, r11,0,0,19 /* Extract page descriptor page address */
beq 2f /* If zero, don't try to find a pte */
--
1.6.4.4
^ permalink raw reply related [flat|nested] 30+ messages in thread
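Editorial note: the "andi. r11, r10, 0x0800" test in the hunk above looks at the level-1 table entry address in r10, not at the faulting address itself. The sketch below assumes the usual MPC8xx M_TWB layout (a 4 KB aligned level-1 base in the upper 20 bits, followed by the 10-bit level-1 index shifted left by 2); that layout comes from the hardware manual as I recall it, not from this thread, so treat it as an assumption. All function names are hypothetical.

```c
#include <stdint.h>

/* Assumed model of M_TWB: level-1 table base | (top 10 bits of EA) << 2. */
static uint32_t twb_l1_entry_addr(uint32_t l1_base, uint32_t ea)
{
    return (l1_base & 0xfffff000u) | ((ea >> 22) << 2);
}

/* Model of "andi. r11, r10, 0x0800": bit 0x800 of the entry address is
 * the most significant bit of the level-1 index, so it is set exactly
 * when the effective address is >= 0x80000000. */
static int is_upper_half(uint32_t twb)
{
    return (twb & 0x0800u) != 0;
}

/* Model of "rlwimi r10, r11, 0, 2, 19": replace IBM bits 2..19
 * (mask 0x3ffff000) of the entry address with those of swapper_pg_dir,
 * keeping the index in the low 12 bits. */
static uint32_t use_kernel_pgdir(uint32_t twb, uint32_t swapper_pg_dir)
{
    const uint32_t mask = 0x3ffff000u;
    return (swapper_pg_dir & mask) | (twb & ~mask);
}
```

The base addresses used when trying this out (for example 0x00123000 for a task and 0xc0234000 for swapper_pg_dir) are made-up values for illustration only.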
* [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP.
2010-02-26 8:29 ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
@ 2010-02-26 8:29 ` Joakim Tjernlund
2010-02-26 8:29 ` [PATCH 4/4] 8xx: Use SPRG2 and DAR registers to stash r11 and cr Joakim Tjernlund
2010-03-16 21:20 ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Benjamin Herrenschmidt
2010-03-16 21:19 ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Benjamin Herrenschmidt
1 sibling, 2 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-26 8:29 UTC (permalink / raw)
To: linuxppc-dev
Only the swap function cares about the ACCESSED bit in
the pte. Do not waste cycles updating ACCESSED when swap
is not compiled into the kernel.
---
arch/powerpc/kernel/head_8xx.S | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 84ca1d9..6478a96 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -343,10 +343,11 @@ InstructionTLBMiss:
mfspr r11, SPRN_MD_TWC /* ....and get the pte address */
lwz r10, 0(r11) /* Get the pte */
+#ifdef CONFIG_SWAP
andi. r11, r10, _PAGE_ACCESSED | _PAGE_PRESENT
cmpwi cr0, r11, _PAGE_ACCESSED | _PAGE_PRESENT
bne- cr0, 2f
-
+#endif
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 21 and 28 must be clear.
* Software indicator bits 24, 25, 26, and 27 must be
@@ -439,10 +440,11 @@ DataStoreTLBMiss:
* r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
* r10 = (r10 & ~PRESENT) | r11;
*/
+#ifdef CONFIG_SWAP
rlwinm r11, r10, 32-5, _PAGE_PRESENT
and r11, r11, r10
rlwimi r10, r11, 0, _PAGE_PRESENT
-
+#endif
/* Honour kernel RO, User NA */
/* 0x200 == Extended encoding, bit 22 */
rlwimi r10, r10, 32-2, 0x200 /* Copy USER to bit 22, 0x200 */
--
1.6.4.4
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 4/4] 8xx: Use SPRG2 and DAR registers to stash r11 and cr.
2010-02-26 8:29 ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Joakim Tjernlund
@ 2010-02-26 8:29 ` Joakim Tjernlund
2010-03-16 21:20 ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Benjamin Herrenschmidt
1 sibling, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-26 8:29 UTC (permalink / raw)
To: linuxppc-dev
This avoids storing these registers in memory.
CPU6 errata will still use the old way.
Remove some G2 leftover accesses from 2.4
---
arch/powerpc/kernel/head_8xx.S | 49 +++++++++++++++++++++++++++++----------
1 files changed, 36 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 6478a96..1f1a04b 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -71,9 +71,6 @@ _ENTRY(_start);
* in the first level table, but that would require many changes to the
* Linux page directory/table functions that I don't want to do right now.
*
- * I used to use SPRG2 for a temporary register in the TLB handler, but it
- * has since been put to other uses. I now use a hack to save a register
- * and the CCR at memory location 0.....Someday I'll fix this.....
* -- Dan
*/
.globl __start
@@ -302,8 +299,13 @@ InstructionTLBMiss:
DO_8xx_CPU6(0x3f80, r3)
mtspr SPRN_M_TW, r10 /* Save a couple of working registers */
mfcr r10
+#ifdef CONFIG_8xx_CPU6
stw r10, 0(r0)
stw r11, 4(r0)
+#else
+ mtspr SPRN_DAR, r10
+ mtspr SPRN_SPRG2, r11
+#endif
mfspr r10, SPRN_SRR0 /* Get effective address of fault */
#ifdef CONFIG_8xx_CPU15
addi r11, r10, 0x1000
@@ -359,13 +361,19 @@ InstructionTLBMiss:
DO_8xx_CPU6(0x2d80, r3)
mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
- mfspr r10, SPRN_M_TW /* Restore registers */
+ /* Restore registers */
+#ifndef CONFIG_8xx_CPU6
+ mfspr r10, SPRN_DAR
+ mtcr r10
+ mtspr SPRN_DAR, r11 /* Tag DAR */
+ mfspr r11, SPRN_SPRG2
+#else
lwz r11, 0(r0)
mtcr r11
lwz r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
lwz r3, 8(r0)
#endif
+ mfspr r10, SPRN_M_TW
rfi
2:
mfspr r11, SPRN_SRR1
@@ -375,13 +383,20 @@ InstructionTLBMiss:
rlwinm r11, r11, 0, 0xffff
mtspr SPRN_SRR1, r11
- mfspr r10, SPRN_M_TW /* Restore registers */
+ /* Restore registers */
+#ifndef CONFIG_8xx_CPU6
+ mfspr r10, SPRN_DAR
+ mtcr r10
+ li r11, 0x00f0
+ mtspr SPRN_DAR, r11 /* Tag DAR */
+ mfspr r11, SPRN_SPRG2
+#else
lwz r11, 0(r0)
mtcr r11
lwz r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
lwz r3, 8(r0)
#endif
+ mfspr r10, SPRN_M_TW
b InstructionAccess
. = 0x1200
@@ -392,8 +407,13 @@ DataStoreTLBMiss:
DO_8xx_CPU6(0x3f80, r3)
mtspr SPRN_M_TW, r10 /* Save a couple of working registers */
mfcr r10
+#ifdef CONFIG_8xx_CPU6
stw r10, 0(r0)
stw r11, 4(r0)
+#else
+ mtspr SPRN_DAR, r10
+ mtspr SPRN_SPRG2, r11
+#endif
mfspr r10, SPRN_M_TWB /* Get level 1 table entry address */
/* If we are faulting a kernel address, we have to use the
@@ -461,18 +481,24 @@ DataStoreTLBMiss:
* of the MMU.
*/
2: li r11, 0x00f0
- mtspr SPRN_DAR,r11 /* Tag DAR */
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x3d80, r3)
mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
- mfspr r10, SPRN_M_TW /* Restore registers */
+ /* Restore registers */
+#ifndef CONFIG_8xx_CPU6
+ mfspr r10, SPRN_DAR
+ mtcr r10
+ mtspr SPRN_DAR, r11 /* Tag DAR */
+ mfspr r11, SPRN_SPRG2
+#else
+ mtspr SPRN_DAR, r11 /* Tag DAR */
lwz r11, 0(r0)
mtcr r11
lwz r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
lwz r3, 8(r0)
#endif
+ mfspr r10, SPRN_M_TW
rfi
/* This is an instruction TLB error on the MPC8xx. This could be due
@@ -684,9 +710,6 @@ start_here:
tophys(r4,r2)
addi r4,r4,THREAD /* init task's THREAD */
mtspr SPRN_SPRG_THREAD,r4
- li r3,0
- /* XXX What is that for ? SPRG2 appears otherwise unused on 8xx */
- mtspr SPRN_SPRG2,r3 /* 0 => r1 has kernel sp */
/* stack */
lis r1,init_thread_union@ha
--
1.6.4.4
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 1/4] 8xx: Optimze TLB Miss handlers
2010-02-26 8:29 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Joakim Tjernlund
2010-02-26 8:29 ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
@ 2010-02-26 19:50 ` Scott Wood
2010-02-27 15:23 ` Joakim Tjernlund
2010-02-26 20:10 ` Kumar Gala
2 siblings, 1 reply; 30+ messages in thread
From: Scott Wood @ 2010-02-26 19:50 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev
On Fri, Feb 26, 2010 at 09:29:40AM +0100, Joakim Tjernlund wrote:
> This removes a couple of instructions from the TLB Miss
> handlers without changing functionality.
> ---
Did a quick test of the patchset, seems to work OK (without CONFIG_SWAP or
CONFIG_MODULES). Didn't try with CONFIG_8xx_CPU6.
-Scott
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/4] 8xx: Optimze TLB Miss handlers
2010-02-26 8:29 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Joakim Tjernlund
2010-02-26 8:29 ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
2010-02-26 19:50 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Scott Wood
@ 2010-02-26 20:10 ` Kumar Gala
2010-02-27 15:25 ` Joakim Tjernlund
2 siblings, 1 reply; 30+ messages in thread
From: Kumar Gala @ 2010-02-26 20:10 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev
On Feb 26, 2010, at 2:29 AM, Joakim Tjernlund wrote:
> li r11, 0x00f0
> - rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
> + rlwimi r10, r11, 0, 0x07f8 /* Set 24-27, clear 21-23,28 */
> DO_8xx_CPU6(0x2d80, r3)
> mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
Cool, didn't know 'as' supported this notation.
- k
^ permalink raw reply [flat|nested] 30+ messages in thread
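Editorial note: the notation discussed above gives the rotate instructions a literal mask in place of the MB, ME bit pair. The two forms describe the same thing, and a small C helper makes the correspondence concrete. The function name ppc_mask is hypothetical. It uses IBM bit numbering (bit 0 is the most significant bit) and handles the wrap-around case where MB is greater than ME.

```c
#include <stdint.h>

/* Build the 32-bit mask selected by an MB, ME pair in IBM bit numbering
 * (bit 0 = most significant). When mb > me the mask wraps around,
 * covering everything except the bits strictly between me and mb. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xffffffffu >> mb;        /* bits mb..31 */
    uint32_t to_me   = 0xffffffffu << (31 - me); /* bits 0..me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}
```

For the values in this thread: ppc_mask(24, 28) is 0x000000f8 (the old rlwimi), ppc_mask(21, 28) is 0x000007f8 (the new one), and ppc_mask(22, 20) is 0xfffffbff, which is the "clear 0x400" mask of the removed rlwinm.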
* Re: [PATCH 1/4] 8xx: Optimze TLB Miss handlers
2010-02-26 19:50 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Scott Wood
@ 2010-02-27 15:23 ` Joakim Tjernlund
0 siblings, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-27 15:23 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev
Scott Wood <scottwood@freescale.com> wrote on 2010/02/26 20:50:18:
>
> On Fri, Feb 26, 2010 at 09:29:40AM +0100, Joakim Tjernlund wrote:
> > This removes a couple of instructions from the TLB Miss
> > handlers without changing functionality.
> > ---
>
> Did a quick test of the patchset, seems to work OK (without CONFIG_SWAP or
> CONFIG_MODULES). Didn't try with CONFIG_8xx_CPU6.
Cool, thanks a lot!
Not sure anyone is using 2.6 with CPU6 errata. Seems it was fixed years ago.
Should I resend the whole series with SOB line or just include it here?
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/4] 8xx: Optimze TLB Miss handlers
2010-02-26 20:10 ` Kumar Gala
@ 2010-02-27 15:25 ` Joakim Tjernlund
0 siblings, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-02-27 15:25 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev
Kumar Gala <galak@kernel.crashing.org> wrote on 2010/02/26 21:10:31:
>
>
> On Feb 26, 2010, at 2:29 AM, Joakim Tjernlund wrote:
>
> > li r11, 0x00f0
> > - rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
> > + rlwimi r10, r11, 0, 0x07f8 /* Set 24-27, clear 21-23,28 */
> > DO_8xx_CPU6(0x2d80, r3)
> > mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
>
> Cool, didn't know 'as' supported this notation.
Yeah, it was Scott who gave me the clue, and from what I can tell it
is an official syntax form. I find it much easier to understand.
Jocke
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss.
2010-02-26 8:29 ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
2010-02-26 8:29 ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Joakim Tjernlund
@ 2010-03-16 21:19 ` Benjamin Herrenschmidt
2010-03-17 7:35 ` Joakim Tjernlund
1 sibling, 1 reply; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-16 21:19 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev
On Fri, 2010-02-26 at 09:29 +0100, Joakim Tjernlund wrote:
> +#ifdef CONFIG_MODULES
> + /* Only modules will cause ITLB Misses as we always
> + * pin the first 8MB of kernel memory */
> andi. r11, r10, 0x0800 /* Address >= 0x80000000 */
> beq 3f
> lis r11, swapper_pg_dir@h
> ori r11, r11, swapper_pg_dir@l
> rlwimi r10, r11, 0, 2, 19
> 3:
> +#endif
You can optimize that further I think...
You can probably just remove the code above, and add something to
do_page_fault() that lazily copies the kernel PGD entries from
swapper_pg_dir to the app pgdir. (You can even pre-fill that when
creating a new mm).
Cheers,
Ben.
^ permalink raw reply [flat|nested] 30+ messages in thread
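Editorial note: the lazy-copy idea suggested above can be sketched in plain C over two arrays that stand in for page directories. This is only a model of the technique, not kernel code: the types, constants and function names are hypothetical, the 0x80000000 boundary is taken from the comment in the patch, and a 1024-entry directory with 4 MB per entry is an assumption. A real implementation would live in the fault path and use the kernel's own pgd accessors.

```c
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PGD 1024u        /* assumed: 4 MB covered per entry */
#define PGDIR_SHIFT  22
#define KERNEL_START 0x80000000u  /* boundary used by the ITLB miss test */

typedef uint32_t pgd_entry;       /* 0 means "no level-2 table" here */

/* Called from the (modelled) page fault path. Returns 1 if the fault was
 * resolved by copying a kernel entry into the task's directory, 0 if the
 * caller has to treat it as an ordinary fault. */
static int lazy_sync_kernel_pgd(pgd_entry *app_pgd,
                                const pgd_entry *kernel_pgd,
                                uint32_t fault_addr)
{
    size_t idx;

    if (fault_addr < KERNEL_START)
        return 0;                         /* user address: not our case */
    idx = fault_addr >> PGDIR_SHIFT;
    if (kernel_pgd[idx] == 0)
        return 0;                         /* not mapped in the kernel either */
    if (app_pgd[idx] == kernel_pgd[idx])
        return 0;                         /* already in sync: a real fault */
    app_pgd[idx] = kernel_pgd[idx];
    return 1;
}

/* Optional pre-fill when a new address space is created, so that most
 * kernel addresses never reach the lazy path. */
static void prefill_kernel_pgd(pgd_entry *app_pgd, const pgd_entry *kernel_pgd)
{
    size_t i;

    for (i = KERNEL_START >> PGDIR_SHIFT; i < PTRS_PER_PGD; i++)
        app_pgd[i] = kernel_pgd[i];
}
```

With the kernel half of every task directory kept in sync this way, the miss handler could walk the current directory for all addresses, which is what would allow the swapper_pg_dir special case to go away.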
* Re: [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP.
2010-02-26 8:29 ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Joakim Tjernlund
2010-02-26 8:29 ` [PATCH 4/4] 8xx: Use SPRG2 and DAR registers to stash r11 and cr Joakim Tjernlund
@ 2010-03-16 21:20 ` Benjamin Herrenschmidt
2010-03-17 7:40 ` Joakim Tjernlund
1 sibling, 1 reply; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-16 21:20 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev
On Fri, 2010-02-26 at 09:29 +0100, Joakim Tjernlund wrote:
> Only the swap function cares about the ACCESSED bit in
> the pte. Do not waste cycles updating ACCESSED when swap
> is not compiled into the kernel.
> ---
Your changeset comment is a bit misleading since the code isn't actually
updating ACCESSED... it's testing if ACCESSED is set and goes to the
higher level fault if not (which might then update ACCESSED).
Cheers,
Ben.
> arch/powerpc/kernel/head_8xx.S | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> index 84ca1d9..6478a96 100644
> --- a/arch/powerpc/kernel/head_8xx.S
> +++ b/arch/powerpc/kernel/head_8xx.S
> @@ -343,10 +343,11 @@ InstructionTLBMiss:
> mfspr r11, SPRN_MD_TWC /* ....and get the pte address */
> lwz r10, 0(r11) /* Get the pte */
>
> +#ifdef CONFIG_SWAP
> andi. r11, r10, _PAGE_ACCESSED | _PAGE_PRESENT
> cmpwi cr0, r11, _PAGE_ACCESSED | _PAGE_PRESENT
> bne- cr0, 2f
> -
> +#endif
> /* The Linux PTE won't go exactly into the MMU TLB.
> * Software indicator bits 21 and 28 must be clear.
> * Software indicator bits 24, 25, 26, and 27 must be
> @@ -439,10 +440,11 @@ DataStoreTLBMiss:
> * r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
> * r10 = (r10 & ~PRESENT) | r11;
> */
> +#ifdef CONFIG_SWAP
> rlwinm r11, r10, 32-5, _PAGE_PRESENT
> and r11, r11, r10
> rlwimi r10, r11, 0, _PAGE_PRESENT
> -
> +#endif
> /* Honour kernel RO, User NA */
> /* 0x200 == Extended encoding, bit 22 */
> rlwimi r10, r10, 32-2, 0x200 /* Copy USER to bit 22, 0x200 */
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss.
2010-03-16 21:19 ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Benjamin Herrenschmidt
@ 2010-03-17 7:35 ` Joakim Tjernlund
0 siblings, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-17 7:35 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 2010/03/16 22:19:36:
>
> On Fri, 2010-02-26 at 09:29 +0100, Joakim Tjernlund wrote:
> > +#ifdef CONFIG_MODULES
> > + /* Only modules will cause ITLB Misses as we always
> > + * pin the first 8MB of kernel memory */
> > andi. r11, r10, 0x0800 /* Address >= 0x80000000 */
> > beq 3f
> > lis r11, swapper_pg_dir@h
> > ori r11, r11, swapper_pg_dir@l
> > rlwimi r10, r11, 0, 2, 19
> > 3:
> > +#endif
>
> You can optimize that further I think...
>
> You can probably just remove the code above, and add something to
> do_page_fault() that lazily copies the kernel PGD entries from
> swapper_pg_dir to the app pgdir. (You can even pre-fill that when
> creating a new mm).
I did look at this at some point and could not figure out how
to do this.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP.
2010-03-16 21:20 ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Benjamin Herrenschmidt
@ 2010-03-17 7:40 ` Joakim Tjernlund
0 siblings, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-17 7:40 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 2010/03/16 22:20:52:
>
> On Fri, 2010-02-26 at 09:29 +0100, Joakim Tjernlund wrote:
> > Only the swap function cares about the ACCESSED bit in
> > the pte. Do not waste cycles updating ACCESSED when swap
> > is not compiled into the kernel.
> > ---
>
> Your changeset comment is a bit misleading since the code isn't actually
> updating ACCESSED... it's testing if ACCESSED is set and goes to the
> higher level fault if not (which might then update ACCESSED).
Right, I did have one or two variants that did update ACCESSED that
I experimented with; I guess I got a bit confused by that.
The jury is still out on whether this patch is an improvement or not.
Jocke
^ permalink raw reply [flat|nested] 30+ messages in thread
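Editorial note: the point made above, that the guarded code tests ACCESSED but never writes it, can be seen from a C model of the two sequences that patch 3 places under CONFIG_SWAP. The constants are stand-ins: PRESENT = 0x0001 and ACCESSED = 0x0020 are assumptions chosen to match the shift by 5 in the patch comment, and the function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_PRESENT  0x0001u  /* assumed value */
#define PAGE_ACCESSED 0x0020u  /* assumed value: PRESENT << 5 */

/* InstructionTLBMiss: andi. / cmpwi / bne- 2f.
 * Returns 1 if the pte may be loaded into the TLB, 0 if the handler
 * branches to the slower fault path. The pte itself is not modified. */
static int itlb_pte_usable(uint32_t pte)
{
    const uint32_t need = PAGE_ACCESSED | PAGE_PRESENT;
    return (pte & need) == need;
}

/* DataStoreTLBMiss: rlwinm / and / rlwimi.
 * Returns the value headed for the TLB: PRESENT is kept only when
 * ACCESSED is also set. ACCESSED itself is passed through unchanged. */
static uint32_t dtlb_fold_accessed(uint32_t pte)
{
    uint32_t r11 = (pte >> 5) & PAGE_PRESENT;  /* ACCESSED moved to bit 0 */
    r11 &= pte;                                /* PRESENT && ACCESSED     */
    return (pte & ~PAGE_PRESENT) | r11;
}
```

In both cases a pte without ACCESSED ends up in the fault path, either through the explicit branch or through a TLB entry that is marked not present, and it is that later path which may set the bit.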
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-08 10:42 ` Joakim Tjernlund
@ 2010-03-09 6:30 ` Wolfgang Denk
0 siblings, 0 replies; 30+ messages in thread
From: Wolfgang Denk @ 2010-03-09 6:30 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, hs, linuxppc-dev
Dear Joakim Tjernlund,
In message <OF1413A940.58E7B20E-ONC12576E0.003A9000-C12576E0.003ACFB7@transmode.se> you wrote:
>
> > I use NFS.
>
> Then I think it is possible that NFS gets in the way of stable measurements. Does anyone
> have experience with running lmbench on NFS?
NFS may have some influence here, but I doubt it is the primary cause
for these variations. The network where Heiko is running these tests
is mostly idle, so it should provide fairly constant conditions. Of
course, the use of the network on the MPC8xx itself will add to the
variation, but again I would not expect so big differences.
Heiko - there is a 10 GB disk attached to the "tqm8xx" system; I
think there should be a usable root file system on it, but I cannot
remember the actual state. Maybe we can use that. Please contact me
on jabber this afternoon!
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Living on Earth may be expensive, but it includes an annual free trip
around the Sun.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-08 9:06 ` Heiko Schocher
@ 2010-03-08 10:42 ` Joakim Tjernlund
2010-03-09 6:30 ` Wolfgang Denk
0 siblings, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-08 10:42 UTC (permalink / raw)
To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Heiko Schocher <hs@denx.de> wrote on 2010/03/08 10:06:39:
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> > Heiko Schocher <hs@denx.de> wrote on 2010/03/08 08:46:29:
> >> Hello Joakim,
> >>
> >> Joakim Tjernlund wrote:
> >> [...]
> >>> What would be interesting is to skip patch 3, turn off
> >>> MODULES, add PIN_TLB, and compare that against your unpatched .33
> >>> with MODULES off and PIN_TLB on
> >> run version
> >>
> >> 1-4 Linux2.6.33-rc without module support and PIN_TLB=on
> >> 5-8 Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
> >>
> >> L M B E N C H 3 . 0 S U M M A R Y
> >> ------------------------------------
> >> (Alpha software, do not distribute)
> >
> > Hmm, these results vary a lot. The only stable result I can see is:
> >
> >> Memory latencies in nanoseconds - smaller is better
> >> (WARNING - may not be correct, check graphs)
> >> ------------------------------------------------------------------------------
> >> Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
> >> --------- ------------- --- ---- ---- -------- -------- -------
> >> tqm8xx Linux 2.6.33- 66 31.7 183.2 184.0 1163.0 No L2 cache?
> >> tqm8xx Linux 2.6.33- 66 31.7 183.2 184.0 1164.8 No L2 cache?
> >> tqm8xx Linux 2.6.33- 66 31.7 183.2 184.0 1163.2 No L2 cache?
> >> tqm8xx Linux 2.6.33- 66 31.7 183.2 183.8 1163.7 No L2 cache?
> >> tqm8xx Linux 2.6.33- 66 31.8 172.4 173.2 1147.3 No L2 cache?
> >> tqm8xx Linux 2.6.33- 66 31.8 172.5 173.2 1148.3 No L2 cache?
> >> tqm8xx Linux 2.6.33- 66 31.8 172.5 173.1 1146.9 No L2 cache?
> >> tqm8xx Linux 2.6.33- 66 31.8 172.5 173.2 1147.3 No L2 cache?
> >
> > I don't see why the other results vary so much. Are you using NFS or having much network
> > traffic?
>
> I use NFS.
Then I think it is possible that NFS gets in the way of stable measurements. Does anyone
have experience with running lmbench on NFS?
Jocke
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-08 8:44 ` Joakim Tjernlund
@ 2010-03-08 9:06 ` Heiko Schocher
2010-03-08 10:42 ` Joakim Tjernlund
0 siblings, 1 reply; 30+ messages in thread
From: Heiko Schocher @ 2010-03-08 9:06 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Hello Joakim,
Joakim Tjernlund wrote:
> Heiko Schocher <hs@denx.de> wrote on 2010/03/08 08:46:29:
>> Hello Joakim,
>>
>> Joakim Tjernlund wrote:
>> [...]
>>> What would be interesting is to skip patch 3, turn off
>>> MODULES, add PIN_TLB, and compare that against your unpatched .33
>>> with MODULES off and PIN_TLB on
>> run version
>>
>> 1-4 Linux2.6.33-rc without module support and PIN_TLB=on
>> 5-8 Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
>>
>> L M B E N C H 3 . 0 S U M M A R Y
>> ------------------------------------
>> (Alpha software, do not distribute)
>
> Hmm, these results vary a lot. The only stable result I can see is:
>
>> Memory latencies in nanoseconds - smaller is better
>> (WARNING - may not be correct, check graphs)
>> ------------------------------------------------------------------------------
>> Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
>> --------- ------------- --- ---- ---- -------- -------- -------
>> tqm8xx Linux 2.6.33- 66 31.7 183.2 184.0 1163.0 No L2 cache?
>> tqm8xx Linux 2.6.33- 66 31.7 183.2 184.0 1164.8 No L2 cache?
>> tqm8xx Linux 2.6.33- 66 31.7 183.2 184.0 1163.2 No L2 cache?
>> tqm8xx Linux 2.6.33- 66 31.7 183.2 183.8 1163.7 No L2 cache?
>> tqm8xx Linux 2.6.33- 66 31.8 172.4 173.2 1147.3 No L2 cache?
>> tqm8xx Linux 2.6.33- 66 31.8 172.5 173.2 1148.3 No L2 cache?
>> tqm8xx Linux 2.6.33- 66 31.8 172.5 173.1 1146.9 No L2 cache?
>> tqm8xx Linux 2.6.33- 66 31.8 172.5 173.2 1147.3 No L2 cache?
>
> I don't see why the other results vary so much. Are you using NFS or having much network
> traffic?
I use NFS.
bye
Heiko
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-08 7:46 ` Heiko Schocher
@ 2010-03-08 8:44 ` Joakim Tjernlund
2010-03-08 9:06 ` Heiko Schocher
0 siblings, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-08 8:44 UTC (permalink / raw)
To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Heiko Schocher <hs@denx.de> wrote on 2010/03/08 08:46:29:
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> [...]
> > What would be interesting is to skip patch 3, turn off
> > MODULES, add PIN_TLB, and compare that against your unpatched .33
> > with MODULES off and PIN_TLB on
>
> run version
>
> 1-4 Linux2.6.33-rc without module support and PIN_TLB=on
> 5-8 Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
>
> L M B E N C H 3 . 0 S U M M A R Y
> ------------------------------------
> (Alpha software, do not distribute)
Hmm, these results vary a lot. The only stable result I can see is:
> Memory latencies in nanoseconds - smaller is better
> (WARNING - may not be correct, check graphs)
> ------------------------------------------------------------------------------
> Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
> --------- ------------- --- ---- ---- -------- -------- -------
> tqm8xx Linux 2.6.33- 66 31.7 183.2 184.0 1163.0 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.7 183.2 184.0 1164.8 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.7 183.2 184.0 1163.2 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.7 183.2 183.8 1163.7 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 172.4 173.2 1147.3 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 172.5 173.2 1148.3 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 172.5 173.1 1146.9 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 172.5 173.2 1147.3 No L2 cache?
I don't see why the other results vary so much. Are you using NFS or having much network
traffic?
Jocke
^ permalink raw reply [flat|nested] 30+ messages in thread
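Editorial note: to put a number on the one stable result quoted above, the sketch below averages the "Main mem" and "Rand mem" columns for runs 1-4 (unpatched) and runs 5-8 (patches 1, 2 and 4) and reports the relative reduction. The figures are copied from the lmbench table in this thread; the helper names are hypothetical. By this arithmetic the main memory latency drops by a little under 6 percent and the random memory latency by roughly 1.4 percent. Four runs per configuration is a small sample, so this is an indication rather than a measurement of the patches' effect.

```c
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Relative reduction from "before" to "after", in percent. */
static double percent_reduction(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* lmbench "Main mem" latency in ns, runs 1-4 and 5-8 from the thread. */
static const double main_mem_before[4] = { 184.0, 184.0, 184.0, 183.8 };
static const double main_mem_after[4]  = { 173.2, 173.2, 173.1, 173.2 };

/* lmbench "Rand mem" latency in ns, runs 1-4 and 5-8 from the thread. */
static const double rand_mem_before[4] = { 1163.0, 1164.8, 1163.2, 1163.7 };
static const double rand_mem_after[4]  = { 1147.3, 1148.3, 1146.9, 1147.3 };
```

A caller would compute, for example, percent_reduction(mean(main_mem_before, 4), mean(main_mem_after, 4)).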
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-05 10:40 ` Joakim Tjernlund
@ 2010-03-08 7:46 ` Heiko Schocher
2010-03-08 8:44 ` Joakim Tjernlund
0 siblings, 1 reply; 30+ messages in thread
From: Heiko Schocher @ 2010-03-08 7:46 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Hello Joakim,
Joakim Tjernlund wrote:
[...]
> What would be interesting is to skip patch 3, turn off
> MODULES, add PIN_TLB, and compare that against your unpatched .33
> with MODULES off and PIN_TLB on
run version
1-4 Linux2.6.33-rc without module support and PIN_TLB=on
5-8 Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
L M B E N C H 3 . 0 S U M M A R Y
------------------------------------
(Alpha software, do not distribute)
Basic system parameters
------------------------------------------------------------------------------
Host OS Description Mhz tlb cache mem scal
pages line par load
bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 28 16 1.0100 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 28 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 28 16 1.0300 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 28 16 1.0100 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 28 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 28 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 28 16 1.0100 1
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
tqm8xx Linux 2.6.33- 66 2.97 8.91 127. 1238 270. 22.3 92.1 6386 27.K 83.K
tqm8xx Linux 2.6.33- 66 3.05 8.99 129. 1208 261. 22.3 85.3 6418 27.K 83.K
tqm8xx Linux 2.6.33- 66 3.05 8.81 128. 1205 270. 22.3 87.3 6342 27.K 82.K
tqm8xx Linux 2.6.33- 66 3.05 8.82 132. 1215 270. 23.1 86.7 6357 27.K 82.K
tqm8xx Linux 2.6.33- 66 3.28 9.29 128. 1257 260. 23.9 83.7 6511 28.K 84.K
tqm8xx Linux 2.6.33- 66 3.34 9.35 126. 1264 271. 23.1 86.6 6437 27.K 84.K
tqm8xx Linux 2.6.33- 66 3.19 8.97 130. 1212 271. 23.1 95.3 6480 27.K 84.K
tqm8xx Linux 2.6.33- 66 3.28 8.76 127. 1229 269. 22.9 90.9 6293 27.K 82.K
Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host OS intgr intgr intgr intgr intgr
bit add mul div mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx Linux 2.6.33- 15.2 17.9 1.2500 124.1 202.4
tqm8xx Linux 2.6.33- 15.6 18.0 1.1900 124.1 196.4
tqm8xx Linux 2.6.33- 15.2 17.9 1.2400 124.9 202.5
tqm8xx Linux 2.6.33- 15.2 17.9 1.2400 124.2 196.8
tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 203.6
tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 202.1
tqm8xx Linux 2.6.33- 15.7 17.9 1.5700 125.0 202.2
tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 121.1 196.4
Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host OS int64 int64 int64 int64 int64
bit add mul div mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx Linux 2.6.33- 15. 12.9 1944.1 1895.2
tqm8xx Linux 2.6.33- 15. 12.9 1886.3 1894.4
tqm8xx Linux 2.6.33- 15. 12.9 1944.1 1895.2
tqm8xx Linux 2.6.33- 15. 12.9 1886.3 1894.8
tqm8xx Linux 2.6.33- 15. 13.2 1944.1 1894.4
tqm8xx Linux 2.6.33- 15. 13.2 1944.8 1896.3
tqm8xx Linux 2.6.33- 15. 13.2 1945.2 1837.4
tqm8xx Linux 2.6.33- 15. 13.2 1957.8 1907.4
Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host OS float float float float
add mul div bogo
--------- ------------- ------ ------ ------ ------
tqm8xx Linux 2.6.33- 1011.0 1620.2 5467.0 9868.0
tqm8xx Linux 2.6.33- 1004.5 1630.1 5468.0 9852.0
tqm8xx Linux 2.6.33- 1012.2 1620.5 5472.0 9855.0
tqm8xx Linux 2.6.33- 1011.0 1620.2 5469.0 9866.0
tqm8xx Linux 2.6.33- 1004.8 1617.3 5503.0 9856.0
tqm8xx Linux 2.6.33- 1004.9 1577.1 5469.0 9859.0
tqm8xx Linux 2.6.33- 1011.4 1618.5 5470.0 9859.0
tqm8xx Linux 2.6.33- 1004.9 1620.5 5471.0 9904.0
Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host OS double double double double
add mul div bogo
--------- ------------- ------ ------ ------ ------
tqm8xx Linux 2.6.33- 1555.5 2789.5 3725.7 12.8K
tqm8xx Linux 2.6.33- 1513.2 2772.0 3720.0 12.7K
tqm8xx Linux 2.6.33- 1555.8 2772.1 3730.0 12.7K
tqm8xx Linux 2.6.33- 1555.5 2699.0 3725.0 12.7K
tqm8xx Linux 2.6.33- 1513.8 2699.5 3610.7 12.7K
tqm8xx Linux 2.6.33- 1566.7 2771.6 3750.0 12.7K
tqm8xx Linux 2.6.33- 1556.7 2789.2 3612.1 12.6K
tqm8xx Linux 2.6.33- 1556.7 2698.5 3749.3 12.6K
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
tqm8xx Linux 2.6.33- 64.4 74.9 130.2 111.1 180.4 123.2 211.1
tqm8xx Linux 2.6.33- 67.4 81.0 125.0 117.0 183.7 127.7 208.4
tqm8xx Linux 2.6.33- 67.5 80.5 92.7 115.3 156.9 128.0 183.8
tqm8xx Linux 2.6.33- 67.0 80.2 90.5 114.6 159.4 126.8 185.8
tqm8xx Linux 2.6.33- 82.0 87.8 88.0 116.1 149.3 125.5 182.2
tqm8xx Linux 2.6.33- 81.7 98.5 97.6 123.8 158.1 135.3 188.0
tqm8xx Linux 2.6.33- 67.9 87.7 90.7 114.9 151.1 127.3 177.9
tqm8xx Linux 2.6.33- 67.5 80.3 84.6 113.6 145.7 124.8 170.9
*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
tqm8xx Linux 2.6.33- 64.4 254.3 455. 648.0 941.8 2505
tqm8xx Linux 2.6.33- 67.4 261.2 456. 645.8 909.1 2439
tqm8xx Linux 2.6.33- 67.5 264.8 459. 638.5 932.0 2447
tqm8xx Linux 2.6.33- 67.0 262.4 454. 643.9 909.9 2442
tqm8xx Linux 2.6.33- 82.0 302.1 500. 651.4 937.2 2504
tqm8xx Linux 2.6.33- 81.7 300.2 510. 643.2 909.7 2490
tqm8xx Linux 2.6.33- 67.9 266.7 498. 645.5 923.4 2442
tqm8xx Linux 2.6.33- 67.5 260.8 444. 640.3 917.7 2440
*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS UDP RPC/ TCP RPC/ TCP
UDP TCP conn
--------- ------------- ----- ----- ----- ----- ----
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page 100fd
Create Delete Create Delete Latency Fault Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
tqm8xx Linux 2.6.33- 6097.6 3731.3 30.3K 4000.0 4026.0 20.5 31.9 131.9
tqm8xx Linux 2.6.33- 5747.1 3623.2 32.3K 3952.6 4030.0 16.6 31.0 132.7
tqm8xx Linux 2.6.33- 5405.4 3610.1 32.3K 3921.6 4004.0 15.5 30.0 131.9
tqm8xx Linux 2.6.33- 5681.8 3891.1 35.7K 4219.4 3966.0 6.038 30.4 128.7
tqm8xx Linux 2.6.33- 12.7K 3649.6 34.5K 7092.2 4066.0 3.604 31.4 133.6
tqm8xx Linux 2.6.33- 5405.4 4032.3 38.5K 5494.5 4036.0 18.1 31.0 128.6
tqm8xx Linux 2.6.33- 5405.4 3610.1 37.0K 7142.9 4078.0 15.4 31.0 133.2
tqm8xx Linux 2.6.33- 5714.3 3623.2 30.3K 7194.2 4054.0 12.7 29.9 133.0
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host OS Pipe AF TCP File Mmap Bcopy Bcopy Mem Mem
UNIX reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
tqm8xx Linux 2.6.33- 14.9 16.1 13.0 21.4 55.6 32.4 34.5 55.7 53.0
tqm8xx Linux 2.6.33- 14.9 16.2 12.9 21.3 55.5 32.4 34.5 55.7 53.0
tqm8xx Linux 2.6.33- 14.8 16.0 13.0 21.4 55.6 32.4 34.5 55.7 53.0
tqm8xx Linux 2.6.33- 15.0 16.2 13.8 21.3 55.6 32.4 34.5 55.7 53.0
tqm8xx Linux 2.6.33- 14.9 16.0 13.4 21.3 55.7 32.5 34.6 55.8 53.2
tqm8xx Linux 2.6.33- 15.1 16.2 13.6 21.3 55.7 32.5 34.6 55.8 53.2
tqm8xx Linux 2.6.33- 15.0 16.2 12.9 21.3 55.7 32.5 34.6 55.8 53.2
tqm8xx Linux 2.6.33- 15.1 16.2 13.1 21.5 55.7 32.5 34.7 55.8 53.2
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
--------- ------------- --- ---- ---- -------- -------- -------
tqm8xx Linux 2.6.33- 66 31.7 183.2 184.0 1163.0 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.7 183.2 184.0 1164.8 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.7 183.2 184.0 1163.2 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.7 183.2 183.8 1163.7 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 172.4 173.2 1147.3 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 172.5 173.2 1148.3 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 172.5 173.1 1146.9 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 172.5 173.2 1147.3 No L2 cache?
make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results'
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-04 16:30 ` Heiko Schocher
2010-03-05 10:40 ` Joakim Tjernlund
@ 2010-03-07 16:03 ` Joakim Tjernlund
1 sibling, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-07 16:03 UTC (permalink / raw)
To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Heiko Schocher <hs@denx.de> wrote on 2010/03/04 17:30:07:
> From: Heiko Schocher <hs@denx.de>
> To: Joakim Tjernlund <joakim.tjernlund@transmode.se>
> Cc: Wolfgang Denk <wd@denx.de>, Klaus-Jürgen <heydeck@kieback-peter.de>,
> linuxppc-dev@ozlabs.org, Scott Wood <scottwood@freescale.com>
> Date: 2010/03/04 17:30
> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> > Wolfgang Denk <wd@denx.de> wrote on 2010/03/04 13:16:56:
> >> From: Wolfgang Denk <wd@denx.de>
> >> To: hs@denx.de
> >> Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>, Klaus-Jürgen
> >> <heydeck@kieback-peter.de>, linuxppc-dev@ozlabs.org, Scott Wood
> >> <scottwood@freescale.com>
> >> Date: 2010/03/04 13:17
> >> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
> >>
> >> Dear Heiko,
> >>
> >> thanks for running the tests.
> >>
> >> In message <4B8F8BB4.6070201@denx.de> you wrote:
> >>> here the results:
> >>>
> >>> run version
> >>>
> >>> 1-4 2.6.33-rc6 without your patches
> >>> 5-8 2.6.33-rc6 with all your patches
> >>> 9-12 2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
> >>> 13-16 2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
> >> So CONFIG_PIN_TLB improves the performance as expected, while the other
> >> patches don't show any measurable improvement - or am I reading the
> >> results incorrectly?
BTW, I have implemented all of the newer 2.6 TLB/MMU fixes (including the dcbX
fixup) for 2.4 as well.
If there is any interest, I can polish them and submit them for 2.4. I do need an external tester
for that though.
Jocke
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-04 16:30 ` Heiko Schocher
@ 2010-03-05 10:40 ` Joakim Tjernlund
2010-03-08 7:46 ` Heiko Schocher
2010-03-07 16:03 ` Joakim Tjernlund
1 sibling, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-05 10:40 UTC (permalink / raw)
To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Heiko Schocher <hs@denx.de> wrote on 2010/03/04 17:30:07:
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> > Wolfgang Denk <wd@denx.de> wrote on 2010/03/04 13:16:56:
> >> From: Wolfgang Denk <wd@denx.de>
> >> To: hs@denx.de
> >> Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>, Klaus-Jürgen
> >> <heydeck@kieback-peter.de>, linuxppc-dev@ozlabs.org, Scott Wood
> >> <scottwood@freescale.com>
> >> Date: 2010/03/04 13:17
> >> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
> >>
> >> Dear Heiko,
> >>
> >> thanks for running the tests.
> >>
> >> In message <4B8F8BB4.6070201@denx.de> you wrote:
> >>> here the results:
> >>>
> >>> run version
> >>>
> >>> 1-4 2.6.33-rc6 without your patches
> >>> 5-8 2.6.33-rc6 with all your patches
> >>> 9-12 2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
> >>> 13-16 2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
> >> So CONFIG_PIN_TLB improves the performance as expected, while the other
> >> patches don't show any measurable improvement - or am I reading the
> >> results incorrectly?
> >
> > Close but not quite. What stands out most is:
> >
> > Memory latencies in nanoseconds - smaller is better
> > (WARNING - may not be correct, check graphs)
> > ------------------------------------------------------------------------------
> > Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
> > --------- ------------- --- ---- ---- -------- -------- -------
> > tqm8xx Linux 2.6.33- 66 31.8 141.0 184.0 1165.7
> > tqm8xx Linux 2.6.33- 66 31.8 141.2 184.2 1165.3
> > tqm8xx Linux 2.6.33- 66 31.8 141.3 184.3 1165.6
> > tqm8xx Linux 2.6.33- 66 31.8 141.3 184.2 1166.2
> >
> > tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1100.5 No L2 cache?
> > tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1102.5 No L2 cache?
> > tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.7 No L2 cache?
> > tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.6 No L2 cache?
> >
> > tqm8xx Linux 2.6.33- 66 31.8 141.1 173.4 1149.1 No L2 cache?
> > tqm8xx Linux 2.6.33- 66 31.8 141.1 173.4 1149.0 No L2 cache?
> > tqm8xx Linux 2.6.33- 66 31.7 141.1 173.4 1148.7 No L2 cache?
> > tqm8xx Linux 2.6.33- 66 31.7 141.1 173.4 1148.2 No L2 cache?
> >
> > tqm8xx Linux 2.6.33- 66 31.8 171.1 171.7 1099.8 No L2 cache?
> > tqm8xx Linux 2.6.33- 66 31.8 171.1 171.6 1100.5 No L2 cache?
> > tqm8xx Linux 2.6.33- 66 31.7 171.0 171.7 1101.0 No L2 cache?
> > tqm8xx Linux 2.6.33- 66 31.8 171.0 171.6 1101.3 No L2 cache?
> >
> >
> > Besides the numbers, note how the first group doesn't have a Guesses entry.
> > Is there something odd with the results for the first group?
>
> Hmm.. just to be safe, I made this test again, but it shows also no entry in
> "Guesses" ... Hardware, Linux Source, rootFS, lmbench sources, all the
> same ...
OK
>
> > Also, since you are using MODULES, patch 2 is nullified.
> > Patch 1 is very minor and should not show I think.
> > This leaves patches 3 & 4.
> > There appears to be something funny with patch 3, Don't touch ACCESSED when no SWAP, as
> > it yields bad numbers for Prot Fault, so perhaps I am missing something that needs ACCESSED
> > even if NO_SWAP. Perhaps someone who knows MM in Linux knows?
> > Are there any messages in the kernel log (dmesg)?
>
> I couldn't find anything in the output with dmesg ... but if you
> want this output, I can send it to you.
No, if you can't find anything in there, I won't either.
What would be interesting is to skip patch 3, turn off MODULES,
add PIN_TLB, and compare that against your unpatched .33, also
with MODULES off and PIN_TLB on.
Jocke
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-04 13:06 ` Joakim Tjernlund
@ 2010-03-04 16:30 ` Heiko Schocher
2010-03-05 10:40 ` Joakim Tjernlund
2010-03-07 16:03 ` Joakim Tjernlund
0 siblings, 2 replies; 30+ messages in thread
From: Heiko Schocher @ 2010-03-04 16:30 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Hello Joakim,
Joakim Tjernlund wrote:
> Wolfgang Denk <wd@denx.de> wrote on 2010/03/04 13:16:56:
>> From: Wolfgang Denk <wd@denx.de>
>> To: hs@denx.de
>> Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>, Klaus-Jürgen
>> <heydeck@kieback-peter.de>, linuxppc-dev@ozlabs.org, Scott Wood
>> <scottwood@freescale.com>
>> Date: 2010/03/04 13:17
>> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>>
>> Dear Heiko,
>>
>> thanks for running the tests.
>>
>> In message <4B8F8BB4.6070201@denx.de> you wrote:
>>> here the results:
>>>
>>> run version
>>>
>>> 1-4 2.6.33-rc6 without your patches
>>> 5-8 2.6.33-rc6 with all your patches
>>> 9-12 2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
>>> 13-16 2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
>> So CONFIG_PIN_TLB improves the performance as expected, while the other
>> patches don't show any measurable improvement - or am I reading the
>> results incorrectly?
>
> Close but not quite. What stands out most is:
>
> Memory latencies in nanoseconds - smaller is better
> (WARNING - may not be correct, check graphs)
> ------------------------------------------------------------------------------
> Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
> --------- ------------- --- ---- ---- -------- -------- -------
> tqm8xx Linux 2.6.33- 66 31.8 141.0 184.0 1165.7
> tqm8xx Linux 2.6.33- 66 31.8 141.2 184.2 1165.3
> tqm8xx Linux 2.6.33- 66 31.8 141.3 184.3 1165.6
> tqm8xx Linux 2.6.33- 66 31.8 141.3 184.2 1166.2
>
> tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1100.5 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1102.5 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.7 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.6 No L2 cache?
>
> tqm8xx Linux 2.6.33- 66 31.8 141.1 173.4 1149.1 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 141.1 173.4 1149.0 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.7 141.1 173.4 1148.7 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.7 141.1 173.4 1148.2 No L2 cache?
>
> tqm8xx Linux 2.6.33- 66 31.8 171.1 171.7 1099.8 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 171.1 171.6 1100.5 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.7 171.0 171.7 1101.0 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 171.0 171.6 1101.3 No L2 cache?
>
>
> Besides the numbers, note how the first group doesn't have a Guesses entry.
> Is there something odd with the results for the first group?
Hmm.. just to be safe, I made this test again, but it shows also no entry in
"Guesses" ... Hardware, Linux Source, rootFS, lmbench sources, all the
same ...
> Also, since you are using MODULES, patch 2 is nullified.
> Patch 1 is very minor and should not show I think.
> This leaves patches 3 & 4.
> There appears to be something funny with patch 3, Don't touch ACCESSED when no SWAP, as
> it yields bad numbers for Prot Fault, so perhaps I am missing something that needs ACCESSED
> even if NO_SWAP. Perhaps someone who knows MM in Linux knows?
> Are there any messages in the kernel log (dmesg)?
I couldn't find anything in the output with dmesg ... but if you
want this output, I can send it to you.
bye
Heiko
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-04 12:16 ` Wolfgang Denk
@ 2010-03-04 13:06 ` Joakim Tjernlund
2010-03-04 16:30 ` Heiko Schocher
0 siblings, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-04 13:06 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Scott Wood, linuxppc-dev, hs
Wolfgang Denk <wd@denx.de> wrote on 2010/03/04 13:16:56:
> From: Wolfgang Denk <wd@denx.de>
> To: hs@denx.de
> Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>, Klaus-Jürgen
> <heydeck@kieback-peter.de>, linuxppc-dev@ozlabs.org, Scott Wood
> <scottwood@freescale.com>
> Date: 2010/03/04 13:17
> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>
> Dear Heiko,
>
> thanks for running the tests.
>
> In message <4B8F8BB4.6070201@denx.de> you wrote:
> >
> > here the results:
> >
> > run version
> >
> > 1-4 2.6.33-rc6 without your patches
> > 5-8 2.6.33-rc6 with all your patches
> > 9-12 2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
> > 13-16 2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
>
> So CONFIG_PIN_TLB improves the performance as expected, while the other
> patches don't show any measurable improvement - or am I reading the
> results incorrectly?
Close but not quite. What stands out most is:
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
--------- ------------- --- ---- ---- -------- -------- -------
tqm8xx Linux 2.6.33- 66 31.8 141.0 184.0 1165.7
tqm8xx Linux 2.6.33- 66 31.8 141.2 184.2 1165.3
tqm8xx Linux 2.6.33- 66 31.8 141.3 184.3 1165.6
tqm8xx Linux 2.6.33- 66 31.8 141.3 184.2 1166.2
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1100.5 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1102.5 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.7 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.6 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.1 173.4 1149.1 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.1 173.4 1149.0 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.7 141.1 173.4 1148.7 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.7 141.1 173.4 1148.2 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 171.1 171.7 1099.8 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 171.1 171.6 1100.5 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.7 171.0 171.7 1101.0 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 171.0 171.6 1101.3 No L2 cache?
Besides the numbers, note how the first group doesn't have a Guesses entry.
Is there something odd with the results for the first group?
Also, since you are using MODULES, patch 2 is nullified.
Patch 1 is very minor and should not show I think.
This leaves patches 3 & 4.
There appears to be something funny with patch 3, Don't touch ACCESSED when no SWAP, as
it yields bad numbers for Prot Fault, so perhaps I am missing something that needs ACCESSED
even if NO_SWAP. Perhaps someone who knows MM in Linux knows?
Are there any messages in the kernel log (dmesg)?
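The Prot Fault remark can be cross-checked with a small sketch. It assumes the Prot Fault column (microseconds) of the "File & VM system latencies" table in Heiko's lmbench summary, grouped four runs per configuration; the dictionary keys are shorthand labels, not names from the thread. The median is used rather than the mean because single runs such as 1.389 and 7.692 look like outliers.

```python
from statistics import median

# Prot Fault latencies in microseconds, copied from the lmbench summary.
# Keys are shorthand for the four configurations (hypothetical labels).
PROT_FAULT_US = {
    "unpatched (runs 1-4)": [18.8, 14.2, 7.692, 18.2],
    "all patches (runs 5-8)": [33.5, 25.7, 23.5, 21.6],
    "patches 1,2,4 (runs 9-12)": [17.7, 25.4, 14.3, 14.4],
    "all patches + PIN_TLB (runs 13-16)": [1.389, 14.9, 13.3, 13.3],
}

# One median per configuration; with four samples this is the mean
# of the two middle values.
prot_fault_median = {
    label: median(samples) for label, samples in PROT_FAULT_US.items()
}

for label, value in prot_fault_median.items():
    print(f"{label:36s} {value:6.2f} us")
```

With these numbers the "all patches" group has the highest median, which matches the observation about patch 3. The pinned-TLB runs also include patch 3 and do not show the same increase, so four samples per group are probably too few to settle the question.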
Jocke
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-04 10:30 ` Heiko Schocher
@ 2010-03-04 12:16 ` Wolfgang Denk
2010-03-04 13:06 ` Joakim Tjernlund
0 siblings, 1 reply; 30+ messages in thread
From: Wolfgang Denk @ 2010-03-04 12:16 UTC (permalink / raw)
To: hs; +Cc: Scott Wood, linuxppc-dev
Dear Heiko,
thanks for running the tests.
In message <4B8F8BB4.6070201@denx.de> you wrote:
>
> here the results:
>
> run version
>
> 1-4 2.6.33-rc6 without your patches
> 5-8 2.6.33-rc6 with all your patches
> 9-12 2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
> 13-16 2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
So CONFIG_PIN_TLB improves the performance as expected, while the other
patches don't show any measurable improvement - or am I reading the
results incorrectly?
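One way to check this reading is to average a single column per configuration. The following is a minimal sketch, assuming the 2p/0K column of the "Context switching" table in Heiko's lmbench summary and four consecutive runs per configuration; the group labels and the helper `group_means` are illustrative names, not part of lmbench.

```python
from statistics import mean

# 2p/0K context-switch times in microseconds, copied from the
# "Context switching" table of the lmbench summary (runs 1-16).
CTXSW_2P_0K = [
    92.6, 95.8, 95.8, 92.9,     # runs 1-4
    90.8, 100.1, 98.7, 92.0,    # runs 5-8
    96.9, 100.6, 102.2, 99.1,   # runs 9-12
    69.5, 85.8, 89.4, 88.1,     # runs 13-16
]

# Shorthand labels for the "run / version" list (hypothetical names).
GROUPS = ["unpatched", "all patches", "patches 1,2,4", "all patches + PIN_TLB"]


def group_means(values, labels, runs_per_group=4):
    """Average consecutive blocks of runs_per_group values, one block per label."""
    return {
        label: mean(values[i * runs_per_group:(i + 1) * runs_per_group])
        for i, label in enumerate(labels)
    }


ctxsw_means = group_means(CTXSW_2P_0K, GROUPS)
for label, value in ctxsw_means.items():
    print(f"{label:24s} {value:6.1f} us")
```

For this column only the pinned-TLB group averages clearly lower than the unpatched runs, which is consistent with the reading above. Other columns may behave differently.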
Best regards,
Wolfgang Denk
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
And now remains That we find out the cause of this effect, Or rather
say, the cause of this defect... -- Hamlet, Act II, Scene 2
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-03 10:38 ` Joakim Tjernlund
@ 2010-03-04 10:30 ` Heiko Schocher
2010-03-04 12:16 ` Wolfgang Denk
0 siblings, 1 reply; 30+ messages in thread
From: Heiko Schocher @ 2010-03-04 10:30 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Hello Joakim,
Joakim Tjernlund wrote:
> Could you try reverting patch:
> 8xx: Don't touch ACCESSED when no SWAP.
> and see if that makes a difference?
[...]
> Turning on pinned TLBs(you must turn on ADVANCED_OPTIONS first) could be an improvement,
> regardless of my patches.
here the results:
run version
1-4 2.6.33-rc6 without your patches
5-8 2.6.33-rc6 with all your patches
9-12 2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
13-16 2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
L M B E N C H 3 . 0 S U M M A R Y
------------------------------------
(Alpha software, do not distribute)
Basic system parameters
------------------------------------------------------------------------------
Host OS Description Mhz tlb cache mem scal
pages line par load
bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0100 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0100 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 28 16 1.1700 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0100 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 28 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
tqm8xx Linux 2.6.33- 66 2.97 10.3 129. 1377 272. 21.8 91.3 6949 29.K 89.K
tqm8xx Linux 2.6.33- 66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K
tqm8xx Linux 2.6.33- 66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K
tqm8xx Linux 2.6.33- 66 3.06 10.5 124. 1309 272. 21.8 101. 6896 29.K 89.K
tqm8xx Linux 2.6.33- 66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K
tqm8xx Linux 2.6.33- 66 3.06 8.90 130. 1343 263. 21.3 84.7 7080 29.K 88.K
tqm8xx Linux 2.6.33- 66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K
tqm8xx Linux 2.6.33- 66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K
tqm8xx Linux 2.6.33- 66 3.06 8.83 128. 1355 269. 20.7 89.2 6927 29.K 87.K
tqm8xx Linux 2.6.33- 66 3.05 8.84 127. 1344 271. 21.6 90.5 6868 29.K 88.K
tqm8xx Linux 2.6.33- 66 3.06 8.84 131. 1376 260. 21.4 88.1 7119 29.K 87.K
tqm8xx Linux 2.6.33- 66 3.05 8.90 122. 1342 272. 21.4 88.6 6847 29.K 88.K
tqm8xx Linux 2.6.33- 66 3.19 9.10 122. 1205 265. 20.9 90.3 6358 27.K 83.K
tqm8xx Linux 2.6.33- 66 3.28 9.10 124. 1208 270. 20.9 95.2 6217 27.K 82.K
tqm8xx Linux 2.6.33- 66 3.19 8.98 125. 1210 270. 21.1 87.9 6364 27.K 83.K
tqm8xx Linux 2.6.33- 66 3.19 8.86 124. 1237 262. 21.3 90.7 6311 27.K 84.K
Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host OS intgr intgr intgr intgr intgr
bit add mul div mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx Linux 2.6.33- 15.7 18.0 1.5600 124.2 203.1
tqm8xx Linux 2.6.33- 15.7 17.4 1.5800 121.1 202.8
tqm8xx Linux 2.6.33- 15.2 17.9 1.6200 124.2 202.7
tqm8xx Linux 2.6.33- 15.2 17.9 1.6000 125.0 204.0
tqm8xx Linux 2.6.33- 15.7 18.1 1.5600 124.7 204.4
tqm8xx Linux 2.6.33- 15.7 18.1 1.5800 124.2 202.8
tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 203.2
tqm8xx Linux 2.6.33- 15.7 18.1 1.5500 124.5 202.0
tqm8xx Linux 2.6.33- 15.7 18.1 1.5500 124.5 202.6
tqm8xx Linux 2.6.33- 15.7 18.1 1.5500 121.0 196.5
tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 121.0 202.5
tqm8xx Linux 2.6.33- 15.7 18.1 1.5500 125.1 196.4
tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 202.1
tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 203.4
tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 196.4
tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 196.5
Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host OS int64 int64 int64 int64 int64
bit add mul div mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx Linux 2.6.33- 15. 13.3 1952.2 1838.2
tqm8xx Linux 2.6.33- 15. 13.2 1951.5 1837.8
tqm8xx Linux 2.6.33- 15. 13.2 1886.7 1907.8
tqm8xx Linux 2.6.33- 15. 13.2 1951.5 1838.2
tqm8xx Linux 2.6.33- 15. 13.3 1887.0 1902.2
tqm8xx Linux 2.6.33- 15. 13.3 1887.4 1901.5
tqm8xx Linux 2.6.33- 15. 13.3 1886.7 1893.0
tqm8xx Linux 2.6.33- 15. 13.3 1950.0 1900.4
tqm8xx Linux 2.6.33- 15. 13.3 1955.2 1906.7
tqm8xx Linux 2.6.33- 15. 13.2 1943.7 1900.7
tqm8xx Linux 2.6.33- 15. 13.3 1958.2 1910.4
tqm8xx Linux 2.6.33- 15. 13.3 1886.7 1900.7
tqm8xx Linux 2.6.33- 15. 13.3 1943.7 1837.4
tqm8xx Linux 2.6.33- 15. 13.2 1944.1 1837.4
tqm8xx Linux 2.6.33- 15. 13.2 1944.4 1906.1
tqm8xx Linux 2.6.33- 15. 13.2 1957.8 1894.8
Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host OS float float float float
add mul div bogo
--------- ------------- ------ ------ ------ ------
tqm8xx Linux 2.6.33- 1008.9 1629.2 5527.0 9895.0
tqm8xx Linux 2.6.33- 1008.9 1628.9 5495.0 9892.0
tqm8xx Linux 2.6.33- 1007.8 1622.0 5499.0 9886.0
tqm8xx Linux 2.6.33- 1016.5 1628.6 5319.0 9940.0
tqm8xx Linux 2.6.33- 1008.0 1628.3 5497.0 9879.0
tqm8xx Linux 2.6.33- 1007.6 1577.4 5495.0 9881.0
tqm8xx Linux 2.6.33- 1014.8 1627.1 5493.0 9889.0
tqm8xx Linux 2.6.33- 1004.6 1627.7 5487.0 9881.0
tqm8xx Linux 2.6.33- 1003.8 1627.1 5490.0 9875.0
tqm8xx Linux 2.6.33- 977.2 1628.0 5318.0 9924.0
tqm8xx Linux 2.6.33- 1007.4 1627.7 5490.0 9882.0
tqm8xx Linux 2.6.33- 1004.7 1628.0 5495.0 9891.0
tqm8xx Linux 2.6.33- 1011.6 1630.1 5484.0 9855.0
tqm8xx Linux 2.6.33- 977.0 1621.4 5469.0 9856.0
tqm8xx Linux 2.6.33- 1011.4 1621.4 5471.0 9856.0
tqm8xx Linux 2.6.33- 1004.9 1577.1 5470.0 9866.0
Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host OS double double double double
add mul div bogo
--------- ------------- ------ ------ ------ ------
tqm8xx Linux 2.6.33- 1562.4 2782.8 3730.7 12.6K
tqm8xx Linux 2.6.33- 1556.1 2781.5 3724.3 12.6K
tqm8xx Linux 2.6.33- 1513.9 2801.0 3726.4 12.8K
tqm8xx Linux 2.6.33- 1556.1 2780.9 3611.4 12.6K
tqm8xx Linux 2.6.33- 1570.5 2772.6 3742.1 12.6K
tqm8xx Linux 2.6.33- 1560.1 2703.0 3611.4 12.7K
tqm8xx Linux 2.6.33- 1560.4 2779.5 3760.7 12.7K
tqm8xx Linux 2.6.33- 1559.8 2773.0 3742.1 12.6K
tqm8xx Linux 2.6.33- 1564.7 2699.0 3722.1 12.6K
tqm8xx Linux 2.6.33- 1560.7 2790.0 3725.7 12.7K
tqm8xx Linux 2.6.33- 1565.0 2780.0 3749.3 12.7K
tqm8xx Linux 2.6.33- 1560.4 2700.0 3767.1 12.8K
tqm8xx Linux 2.6.33- 1555.5 2772.1 3747.9 12.6K
tqm8xx Linux 2.6.33- 1513.5 2772.5 3725.7 12.6K
tqm8xx Linux 2.6.33- 1557.0 2772.5 3725.7 12.7K
tqm8xx Linux 2.6.33- 1514.1 2773.5 3719.3 12.7K
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
tqm8xx Linux 2.6.33- 92.6 109.6 110.9 137.5 173.8 151.8 199.3
tqm8xx Linux 2.6.33- 95.8 108.5 104.7 137.1 172.7 150.9 194.7
tqm8xx Linux 2.6.33- 95.8 118.8 97.5 146.4 162.0 160.8 190.1
tqm8xx Linux 2.6.33- 92.9 111.9 101.0 138.1 166.6 152.3 192.0
tqm8xx Linux 2.6.33- 90.8 108.5 116.2 134.3 171.8 147.1 210.0
tqm8xx Linux 2.6.33- 100.1 111.4 105.0 136.4 173.1 148.3 200.8
tqm8xx Linux 2.6.33- 98.7 111.3 111.8 135.7 172.5 147.9 200.9
tqm8xx Linux 2.6.33- 92.0 117.9 109.9 141.6 170.4 154.9 196.4
tqm8xx Linux 2.6.33- 96.9 112.4 95.4 138.3 165.1 152.2 196.4
tqm8xx Linux 2.6.33- 100.6 115.8 109.3 138.5 173.3 150.9 199.2
tqm8xx Linux 2.6.33- 102.2 114.3 109.4 140.9 175.5 153.2 202.0
tqm8xx Linux 2.6.33- 99.1 114.5 106.5 138.2 174.7 151.7 199.9
tqm8xx Linux 2.6.33- 69.5 80.5 88.9 119.6 147.3 130.4 178.7
tqm8xx Linux 2.6.33- 85.8 97.6 79.1 122.3 154.1 132.6 180.1
tqm8xx Linux 2.6.33- 89.4 93.8 125.7 120.8 178.4 129.5 206.1
tqm8xx Linux 2.6.33- 88.1 101.8 91.2 121.4 162.8 131.6 191.4
*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
tqm8xx Linux 2.6.33- 92.6 338.4 581. 720.1 1047. 2749
tqm8xx Linux 2.6.33- 95.8 334.0 595. 725.0 1051. 2754
tqm8xx Linux 2.6.33- 95.8 330.9 574. 720.1 1047. 2772
tqm8xx Linux 2.6.33- 92.9 338.8 574. 714.3 1046. 2742
tqm8xx Linux 2.6.33- 90.8 322.1 576. 734.9 1012. 2706
tqm8xx Linux 2.6.33- 100.1 326.0 565. 719.5 1027. 2702
tqm8xx Linux 2.6.33- 98.7 322.8 571. 713.8 1028. 2711
tqm8xx Linux 2.6.33- 92.0 328.1 549. 714.1 1022. 2696
tqm8xx Linux 2.6.33- 96.9 327.0 573. 722.3 1036. 2721
tqm8xx Linux 2.6.33- 100.6 330.4 561. 723.8 1024. 2726
tqm8xx Linux 2.6.33- 102.2 331.4 590. 728.6 1040. 2753
tqm8xx Linux 2.6.33- 99.1 330.1 585. 723.5 1023. 2750
tqm8xx Linux 2.6.33- 69.5 265.9 447. 632.6 909.0 2431
tqm8xx Linux 2.6.33- 85.8 267.0 492. 650.6 909.4 2455
tqm8xx Linux 2.6.33- 89.4 295.6 493. 643.0 908.8 2453
tqm8xx Linux 2.6.33- 88.1 301.0 494. 645.1 907.9 2451
*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS UDP RPC/ TCP RPC/ TCP
UDP TCP conn
--------- ------------- ----- ----- ----- ----- ----
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page 100fd
Create Delete Create Delete Latency Fault Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
tqm8xx Linux 2.6.33- 5917.2 3968.3 31.2K 4329.0 4147.0 18.8 34.1 135.2
tqm8xx Linux 2.6.33- 5714.3 3937.0 32.3K 6060.6 4210.0 14.2 34.5 131.4
tqm8xx Linux 2.6.33- 5747.1 4000.0 31.2K 4329.0 4114.0 7.692 34.0 133.1
tqm8xx Linux 2.6.33- 5747.1 4081.6 30.3K 4273.5 4100.0 18.2 34.2 135.0
tqm8xx Linux 2.6.33- 5714.3 3952.6 31.2K 4273.5 4130.0 33.5 35.1 136.1
tqm8xx Linux 2.6.33- 5714.3 3906.2 31.2K 6060.6 4105.0 25.7 35.5 135.9
tqm8xx Linux 2.6.33- 5681.8 3921.6 32.3K 4255.3 4144.0 23.5 35.0 134.9
tqm8xx Linux 2.6.33- 5649.7 3937.0 30.3K 4237.3 4116.0 21.6 35.3 135.3
tqm8xx Linux 2.6.33- 5747.1 3921.6 32.3K 4329.0 4107.0 17.7 35.6 131.2
tqm8xx Linux 2.6.33- 5952.4 3937.0 31.2K 4273.5 4119.0 25.4 35.8 136.4
tqm8xx Linux 2.6.33- 5848.0 3937.0 32.3K 4484.3 4223.0 14.3 35.4 135.1
tqm8xx Linux 2.6.33- 6172.8 3984.1 35.7K 4291.8 4210.0 14.4 36.0 135.0
tqm8xx Linux 2.6.33- 5291.0 3610.1 31.2K 4065.0 3836.0 1.389 30.0 135.7
tqm8xx Linux 2.6.33- 5524.9 3649.6 29.4K 3906.2 3867.0 14.9 29.8 137.7
tqm8xx Linux 2.6.33- 5319.1 3649.6 29.4K 4048.6 3873.0 13.3 30.3 135.9
tqm8xx Linux 2.6.33- 5347.6 3623.2 32.3K 3921.6 3894.0 13.3 30.4 135.8
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host OS Pipe AF TCP File Mmap Bcopy Bcopy Mem Mem
UNIX reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
tqm8xx Linux 2.6.33- 14.8 15.6 10.1 21.0 55.5 32.3 34.5 55.6 53.0
tqm8xx Linux 2.6.33- 14.8 15.6 10.7 21.0 55.5 32.3 34.5 55.6 53.0
tqm8xx Linux 2.6.33- 14.8 15.7 12.7 21.0 55.5 32.3 34.5 55.6 53.0
tqm8xx Linux 2.6.33- 14.8 15.6 13.9 21.0 55.5 32.3 34.5 55.6 53.0
tqm8xx Linux 2.6.33- 14.8 15.8 12.9 21.0 55.7 32.5 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 14.8 15.7 14.0 21.0 55.7 32.4 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 14.8 15.8 12.9 21.0 55.7 32.5 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 14.8 15.8 13.0 21.0 55.7 32.5 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 14.8 15.7 14.0 21.0 55.6 32.4 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 14.7 15.7 12.8 21.0 55.6 32.4 34.6 55.7 53.1
tqm8xx Linux 2.6.33- 14.6 15.7 12.8 21.0 55.6 32.4 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 14.8 15.7 12.8 21.0 55.6 32.4 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 15.0 16.0 13.2 21.3 55.8 32.5 34.7 55.9 53.2
tqm8xx Linux 2.6.33- 15.0 16.0 13.4 21.3 55.8 32.5 34.7 55.8 53.2
tqm8xx Linux 2.6.33- 15.0 16.0 13.9 21.3 55.8 32.5 34.7 55.9 53.2
tqm8xx Linux 2.6.33- 15.0 16.0 13.2 21.2 55.8 32.5 34.6 55.9 53.2
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
--------- ------------- --- ---- ---- -------- -------- -------
tqm8xx Linux 2.6.33- 66 31.8 141.0 184.0 1165.7
tqm8xx Linux 2.6.33- 66 31.8 141.2 184.2 1165.3
tqm8xx Linux 2.6.33- 66 31.8 141.3 184.3 1165.6
tqm8xx Linux 2.6.33- 66 31.8 141.3 184.2 1166.2
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1100.5 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1102.5 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.7 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.6 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.1 173.4 1149.1 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.1 173.4 1149.0 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.7 141.1 173.4 1148.7 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.7 141.1 173.4 1148.2 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 171.1 171.7 1099.8 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 171.1 171.6 1100.5 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.7 171.0 171.7 1101.0 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 171.0 171.6 1101.3 No L2 cache?
make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results'
bye
Heiko
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-03 10:10 ` Heiko Schocher
@ 2010-03-03 10:38 ` Joakim Tjernlund
2010-03-04 10:30 ` Heiko Schocher
0 siblings, 1 reply; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-03 10:38 UTC (permalink / raw)
To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Heiko Schocher <hs@denx.de> wrote on 2010/03/03 11:10:10:
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> > Heiko Schocher <hs@denx.de> wrote on 2010/03/03 09:02:47:
> [...]
> >> Here the results:
> >> (The first 4 rows are the results for the kernel without your patches,
> >> the next 4 rows are the results for the kernel with your patches)
> >>
> >> make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
> >
> > I see both ups and downs in this test, don't quite understand why.
> > What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?
>
> Sorry, I forgot to say where to find the sources. You can find them
> here:
>
> http://git.denx.de/?p=linux-2.6-denx.git;a=shortlog;h=refs/heads/tqm8xx
OK, so you have SWAP=no, MODULES=yes, CPU6=no, CPU15=no.
PIN_TLB isn't listed in your defconfig, so I assume
it is no?
MODULES=yes nullifies one optimization.
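One way to answer the config question for a given build is to read the options straight out of the kernel .config. The sketch below is only illustrative: the mapping from the short names used in this thread to Kconfig symbols (CONFIG_8xx_CPU6, CONFIG_8xx_CPU15, CONFIG_PIN_TLB) is an assumption that should be checked against the tree being built, and `read_options` and the sample text are hypothetical names and data made up for the example.

```python
import re

# Short names used in the thread, mapped to the Kconfig symbols they are
# ASSUMED to correspond to. Verify these against your own kernel tree.
OPTIONS = {
    "SWAP": "CONFIG_SWAP",
    "MODULES": "CONFIG_MODULES",
    "CPU6": "CONFIG_8xx_CPU6",
    "CPU15": "CONFIG_8xx_CPU15",
    "PIN_TLB": "CONFIG_PIN_TLB",
}


def read_options(config_text):
    """Return {short name: "yes" | "no"} for the options above.

    An option counts as "yes" only if the text has a line SYMBOL=y or
    SYMBOL=m. A "# SYMBOL is not set" line, or no line at all, counts as "no".
    """
    result = {}
    for name, symbol in OPTIONS.items():
        pattern = r"^" + re.escape(symbol) + r"=(y|m)$"
        found = re.search(pattern, config_text, re.MULTILINE)
        result[name] = "yes" if found else "no"
    return result


# Hypothetical .config excerpt, not taken from the tqm8xx tree.
sample = """\
# CONFIG_SWAP is not set
CONFIG_MODULES=y
# CONFIG_8xx_CPU6 is not set
# CONFIG_8xx_CPU15 is not set
"""

for name, value in read_options(sample).items():
    print(f"{name}={value}")
```

On a real build, the text would come from the .config in the build directory instead of the inline sample.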
I don't understand the bad numbers for Prot Fault:
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page 100fd
Create Delete Create Delete Latency Fault Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
tqm8xx Linux 2.6.33- 5917.2 3968.3 31.2K 4329.0 4147.0 18.8 34.1 135.2
tqm8xx Linux 2.6.33- 5714.3 3937.0 32.3K 6060.6 4210.0 14.2 34.5 131.4
tqm8xx Linux 2.6.33- 5747.1 4000.0 31.2K 4329.0 4114.0 7.692 34.0 133.1
tqm8xx Linux 2.6.33- 5747.1 4081.6 30.3K 4273.5 4100.0 18.2 34.2 135.0
tqm8xx Linux 2.6.33- 5714.3 3952.6 31.2K 4273.5 4130.0 33.5 35.1 136.1
tqm8xx Linux 2.6.33- 5714.3 3906.2 31.2K 6060.6 4105.0 25.7 35.5 135.9
tqm8xx Linux 2.6.33- 5681.8 3921.6 32.3K 4255.3 4144.0 23.5 35.0 134.9
tqm8xx Linux 2.6.33- 5649.7 3937.0 30.3K 4237.3 4116.0 21.6 35.3 135.3
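To put a number on the Prot Fault difference, the two groups of four runs can be averaged. This is a minimal sketch using only the Prot Fault column from the table above (first four rows unpatched, last four patched); the variable and function names are made up for the example, and four samples per group is too few to draw firm conclusions.

```python
from statistics import mean

# Prot Fault latencies in microseconds, copied from the table above.
prot_fault_unpatched = [18.8, 14.2, 7.692, 18.2]  # first 4 rows
prot_fault_patched = [33.5, 25.7, 23.5, 21.6]     # next 4 rows


def summarize(label, samples):
    """Format the mean, minimum and maximum of one group of runs."""
    return (f"{label}: mean={mean(samples):.1f} us, "
            f"min={min(samples)}, max={max(samples)}")


print(summarize("unpatched", prot_fault_unpatched))
print(summarize("patched", prot_fault_patched))
```

The spread inside each group is large compared with the gap between the groups, so part of the difference may be run-to-run noise.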
Could you try reverting the patch:
8xx: Don't touch ACCESSED when no SWAP.
and see if that makes a difference?
Turning on pinned TLBs (you must turn on ADVANCED_OPTIONS first) could be an improvement,
regardless of my patches.
Jocke
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-03 8:48 ` Joakim Tjernlund
2010-03-03 8:59 ` Joakim Tjernlund
@ 2010-03-03 10:10 ` Heiko Schocher
2010-03-03 10:38 ` Joakim Tjernlund
1 sibling, 1 reply; 30+ messages in thread
From: Heiko Schocher @ 2010-03-03 10:10 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Hello Joakim,
Joakim Tjernlund wrote:
> Heiko Schocher <hs@denx.de> wrote on 2010/03/03 09:02:47:
[...]
>> Here the results:
>> (The first 4 rows are the results for the kernel without your patches,
>> the next 4 rows are the results for the kernel with your patches)
>>
>> make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
>
> I see both ups and downs in this test, don't quite understand why.
> What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?
Sorry, I forgot to say where to find the sources. You can find them
here:
http://git.denx.de/?p=linux-2.6-denx.git;a=shortlog;h=refs/heads/tqm8xx
bye
Heiko
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-03 8:48 ` Joakim Tjernlund
@ 2010-03-03 8:59 ` Joakim Tjernlund
2010-03-03 10:10 ` Heiko Schocher
1 sibling, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-03 8:59 UTC (permalink / raw)
Cc: Scott Wood, linuxppc-dev, hs, Wolfgang Denk
>
> Heiko Schocher <hs@denx.de> wrote on 2010/03/03 09:02:47:
> >
> > Hello Joakim,
> >
> > I tried your 4 patches on a MPC855M based system:
>
> Thanks a lot for testing this for me!
>
> >
> > -bash-3.2# cat /proc/cpuinfo
> > processor : 0
> > cpu : 8xx
> > clock : 66.000000MHz
> > revision : 0.0 (pvr 0050 0000)
> > bogomips : 8.25
> > timebase : 4125000
> > platform : TQM8xx
> > model : TQM8xx
> > Memory : 32 MB
> > -bash-3.2# cat /proc/version
> > Linux version 2.6.33-rc6-01500-gbddcb41-dirty (hs@xpert.denx.de) (gcc version
> > 4.2.2) #9 Tue Mar 2 18:08:49 CET 2010
> > -bash-3.2#
> >
> > First I looked at the boot time:
> >
> > Booting Linux:
> >
> >                                                                        2.6.33  2.6.33-tuned
> > ... until "Freeing unused kernel memory" message (= enter user space)  ~4s     ~4s
> > ... until "login:" message (= full multi-user mode)                    56s     56s
> >
> > and I did a Performance test with lmbench, see:
> > http://sourceforge.net/projects/lmbench
> >
> > Here the results:
> > (The first 4 rows are the results for the kernel without your patches,
> > the next 4 rows are the results for the kernel with your patches)
> >
> > make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
>
> I see both ups and downs in this test, don't quite understand why.
> What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?
I forgot to ask about PIN_TLB too.
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-03 8:02 Heiko Schocher
@ 2010-03-03 8:48 ` Joakim Tjernlund
2010-03-03 8:59 ` Joakim Tjernlund
2010-03-03 10:10 ` Heiko Schocher
0 siblings, 2 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-03 8:48 UTC (permalink / raw)
To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Heiko Schocher <hs@denx.de> wrote on 2010/03/03 09:02:47:
>
> Hello Joakim,
>
> I tried your 4 patches on a MPC855M based system:
Thanks a lot for testing this for me!
>
> -bash-3.2# cat /proc/cpuinfo
> processor : 0
> cpu : 8xx
> clock : 66.000000MHz
> revision : 0.0 (pvr 0050 0000)
> bogomips : 8.25
> timebase : 4125000
> platform : TQM8xx
> model : TQM8xx
> Memory : 32 MB
> -bash-3.2# cat /proc/version
> Linux version 2.6.33-rc6-01500-gbddcb41-dirty (hs@xpert.denx.de) (gcc version
> 4.2.2) #9 Tue Mar 2 18:08:49 CET 2010
> -bash-3.2#
>
> First I looked at the boot time:
>
> Booting Linux:
>
>                                                                        2.6.33  2.6.33-tuned
> ... until "Freeing unused kernel memory" message (= enter user space)  ~4s     ~4s
> ... until "login:" message (= full multi-user mode)                    56s     56s
>
> and I did a Performance test with lmbench, see:
> http://sourceforge.net/projects/lmbench
>
> Here the results:
> (The first 4 rows are the results for the kernel without your patches,
> the next 4 rows are the results for the kernel with your patches)
>
> make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
I see both ups and downs in this test, don't quite understand why.
What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?
>
> L M B E N C H 3 . 0 S U M M A R Y
> ------------------------------------
> (Alpha software, do not distribute)
>
> Basic system parameters
> ------------------------------------------------------------------------------
> Host OS Description Mhz tlb cache mem scal
> pages line par load
> bytes
> --------- ------------- ----------------------- ---- ----- ----- ------ ----
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
>
> Processor, Processes - times in microseconds - smaller is better
> ------------------------------------------------------------------------------
> Host OS Mhz null null open slct sig sig fork exec sh
> call I/O stat clos TCP inst hndl proc proc proc
> --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
> tqm8xx Linux 2.6.33- 66 2.97 10.3 129. 1377 272. 21.8 91.3 6949 29.K 89.K
> tqm8xx Linux 2.6.33- 66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K
> tqm8xx Linux 2.6.33- 66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K
> tqm8xx Linux 2.6.33- 66 3.06 10.5 124. 1309 272. 21.8 101. 6896 29.K 89.K
> tqm8xx Linux 2.6.33- 66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K
> tqm8xx Linux 2.6.33- 66 3.06 8.90 130. 1343 263. 21.3 84.7 7080 29.K 88.K
> tqm8xx Linux 2.6.33- 66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K
> tqm8xx Linux 2.6.33- 66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K
>
[SNIP integer/float test, these are not relevant]
>
> Context switching - times in microseconds - smaller is better
> -------------------------------------------------------------------------
> Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
> ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
> --------- ------------- ------ ------ ------ ------ ------ ------- -------
> tqm8xx Linux 2.6.33- 92.6 109.6 110.9 137.5 173.8 151.8 199.3
> tqm8xx Linux 2.6.33- 95.8 108.5 104.7 137.1 172.7 150.9 194.7
> tqm8xx Linux 2.6.33- 95.8 118.8 97.5 146.4 162.0 160.8 190.1
> tqm8xx Linux 2.6.33- 92.9 111.9 101.0 138.1 166.6 152.3 192.0
> tqm8xx Linux 2.6.33- 90.8 108.5 116.2 134.3 171.8 147.1 210.0
> tqm8xx Linux 2.6.33- 100.1 111.4 105.0 136.4 173.1 148.3 200.8
> tqm8xx Linux 2.6.33- 98.7 111.3 111.8 135.7 172.5 147.9 200.9
> tqm8xx Linux 2.6.33- 92.0 117.9 109.9 141.6 170.4 154.9 196.4
>
> *Local* Communication latencies in microseconds - smaller is better
> ---------------------------------------------------------------------
> Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
> ctxsw UNIX UDP TCP conn
> --------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
> tqm8xx Linux 2.6.33- 92.6 338.4 581. 720.1 1047. 2749
> tqm8xx Linux 2.6.33- 95.8 334.0 595. 725.0 1051. 2754
> tqm8xx Linux 2.6.33- 95.8 330.9 574. 720.1 1047. 2772
> tqm8xx Linux 2.6.33- 92.9 338.8 574. 714.3 1046. 2742
> tqm8xx Linux 2.6.33- 90.8 322.1 576. 734.9 1012. 2706
> tqm8xx Linux 2.6.33- 100.1 326.0 565. 719.5 1027. 2702
> tqm8xx Linux 2.6.33- 98.7 322.8 571. 713.8 1028. 2711
> tqm8xx Linux 2.6.33- 92.0 328.1 549. 714.1 1022. 2696
>
> *Remote* Communication latencies in microseconds - smaller is better
> ---------------------------------------------------------------------
> Host OS UDP RPC/ TCP RPC/ TCP
> UDP TCP conn
> --------- ------------- ----- ----- ----- ----- ----
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
>
> File & VM system latencies in microseconds - smaller is better
> -------------------------------------------------------------------------------
> Host OS 0K File 10K File Mmap Prot Page 100fd
> Create Delete Create Delete Latency Fault Fault selct
> --------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
> tqm8xx Linux 2.6.33- 5917.2 3968.3 31.2K 4329.0 4147.0 18.8 34.1 135.2
> tqm8xx Linux 2.6.33- 5714.3 3937.0 32.3K 6060.6 4210.0 14.2 34.5 131.4
> tqm8xx Linux 2.6.33- 5747.1 4000.0 31.2K 4329.0 4114.0 7.692 34.0 133.1
> tqm8xx Linux 2.6.33- 5747.1 4081.6 30.3K 4273.5 4100.0 18.2 34.2 135.0
> tqm8xx Linux 2.6.33- 5714.3 3952.6 31.2K 4273.5 4130.0 33.5 35.1 136.1
> tqm8xx Linux 2.6.33- 5714.3 3906.2 31.2K 6060.6 4105.0 25.7 35.5 135.9
> tqm8xx Linux 2.6.33- 5681.8 3921.6 32.3K 4255.3 4144.0 23.5 35.0 134.9
> tqm8xx Linux 2.6.33- 5649.7 3937.0 30.3K 4237.3 4116.0 21.6 35.3 135.3
>
> *Local* Communication bandwidths in MB/s - bigger is better
> -----------------------------------------------------------------------------
> Host OS Pipe AF TCP File Mmap Bcopy Bcopy Mem Mem
> UNIX reread reread (libc) (hand) read write
> --------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
> tqm8xx Linux 2.6.33- 14.8 15.6 10.1 21.0 55.5 32.3 34.5 55.6 53.0
> tqm8xx Linux 2.6.33- 14.8 15.6 10.7 21.0 55.5 32.3 34.5 55.6 53.0
> tqm8xx Linux 2.6.33- 14.8 15.7 12.7 21.0 55.5 32.3 34.5 55.6 53.0
> tqm8xx Linux 2.6.33- 14.8 15.6 13.9 21.0 55.5 32.3 34.5 55.6 53.0
> tqm8xx Linux 2.6.33- 14.8 15.8 12.9 21.0 55.7 32.5 34.6 55.8 53.1
> tqm8xx Linux 2.6.33- 14.8 15.7 14.0 21.0 55.7 32.4 34.6 55.8 53.1
> tqm8xx Linux 2.6.33- 14.8 15.8 12.9 21.0 55.7 32.5 34.6 55.8 53.1
> tqm8xx Linux 2.6.33- 14.8 15.8 13.0 21.0 55.7 32.5 34.6 55.8 53.1
>
> Memory latencies in nanoseconds - smaller is better
> (WARNING - may not be correct, check graphs)
> ------------------------------------------------------------------------------
> Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
> --------- ------------- --- ---- ---- -------- -------- -------
> tqm8xx Linux 2.6.33- 66 31.8 141.0 184.0 1165.7
> tqm8xx Linux 2.6.33- 66 31.8 141.2 184.2 1165.3
> tqm8xx Linux 2.6.33- 66 31.8 141.3 184.3 1165.6
> tqm8xx Linux 2.6.33- 66 31.8 141.3 184.2 1166.2
> tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1100.5 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1102.5 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.7 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.6 No L2 cache?
> make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results'
>
> bye
> Heiko
>
> --
> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
>
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
@ 2010-03-03 8:02 Heiko Schocher
2010-03-03 8:48 ` Joakim Tjernlund
0 siblings, 1 reply; 30+ messages in thread
From: Heiko Schocher @ 2010-03-03 8:02 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Hello Joakim,
I tried your 4 patches on an MPC855M-based system:
-bash-3.2# cat /proc/cpuinfo
processor : 0
cpu : 8xx
clock : 66.000000MHz
revision : 0.0 (pvr 0050 0000)
bogomips : 8.25
timebase : 4125000
platform : TQM8xx
model : TQM8xx
Memory : 32 MB
-bash-3.2# cat /proc/version
Linux version 2.6.33-rc6-01500-gbddcb41-dirty (hs@xpert.denx.de) (gcc version 4.2.2) #9 Tue Mar 2 18:08:49 CET 2010
-bash-3.2#
First I looked at the boot time:
Booting Linux:
                                                                       2.6.33  2.6.33-tuned
... until "Freeing unused kernel memory" message (= enter user space)  ~4s     ~4s
... until "login:" message (= full multi-user mode)                    56s     56s
I also ran a performance test with lmbench, see:
http://sourceforge.net/projects/lmbench
Here are the results:
(The first 4 rows are the results for the kernel without your patches,
the next 4 rows are the results for the kernel with your patches)
make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
L M B E N C H 3 . 0 S U M M A R Y
------------------------------------
(Alpha software, do not distribute)
Basic system parameters
------------------------------------------------------------------------------
Host OS Description Mhz tlb cache mem scal
pages line par load
bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
tqm8xx Linux 2.6.33- 66 2.97 10.3 129. 1377 272. 21.8 91.3 6949 29.K 89.K
tqm8xx Linux 2.6.33- 66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K
tqm8xx Linux 2.6.33- 66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K
tqm8xx Linux 2.6.33- 66 3.06 10.5 124. 1309 272. 21.8 101. 6896 29.K 89.K
tqm8xx Linux 2.6.33- 66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K
tqm8xx Linux 2.6.33- 66 3.06 8.90 130. 1343 263. 21.3 84.7 7080 29.K 88.K
tqm8xx Linux 2.6.33- 66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K
tqm8xx Linux 2.6.33- 66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K
Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host OS intgr intgr intgr intgr intgr
bit add mul div mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx Linux 2.6.33- 15.7 18.0 1.5600 124.2 203.1
tqm8xx Linux 2.6.33- 15.7 17.4 1.5800 121.1 202.8
tqm8xx Linux 2.6.33- 15.2 17.9 1.6200 124.2 202.7
tqm8xx Linux 2.6.33- 15.2 17.9 1.6000 125.0 204.0
tqm8xx Linux 2.6.33- 15.7 18.1 1.5600 124.7 204.4
tqm8xx Linux 2.6.33- 15.7 18.1 1.5800 124.2 202.8
tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 203.2
tqm8xx Linux 2.6.33- 15.7 18.1 1.5500 124.5 202.0
Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host OS int64 int64 int64 int64 int64
bit add mul div mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx Linux 2.6.33- 15. 13.3 1952.2 1838.2
tqm8xx Linux 2.6.33- 15. 13.2 1951.5 1837.8
tqm8xx Linux 2.6.33- 15. 13.2 1886.7 1907.8
tqm8xx Linux 2.6.33- 15. 13.2 1951.5 1838.2
tqm8xx Linux 2.6.33- 15. 13.3 1887.0 1902.2
tqm8xx Linux 2.6.33- 15. 13.3 1887.4 1901.5
tqm8xx Linux 2.6.33- 15. 13.3 1886.7 1893.0
tqm8xx Linux 2.6.33- 15. 13.3 1950.0 1900.4
Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host OS float float float float
add mul div bogo
--------- ------------- ------ ------ ------ ------
tqm8xx Linux 2.6.33- 1008.9 1629.2 5527.0 9895.0
tqm8xx Linux 2.6.33- 1008.9 1628.9 5495.0 9892.0
tqm8xx Linux 2.6.33- 1007.8 1622.0 5499.0 9886.0
tqm8xx Linux 2.6.33- 1016.5 1628.6 5319.0 9940.0
tqm8xx Linux 2.6.33- 1008.0 1628.3 5497.0 9879.0
tqm8xx Linux 2.6.33- 1007.6 1577.4 5495.0 9881.0
tqm8xx Linux 2.6.33- 1014.8 1627.1 5493.0 9889.0
tqm8xx Linux 2.6.33- 1004.6 1627.7 5487.0 9881.0
Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host OS double double double double
add mul div bogo
--------- ------------- ------ ------ ------ ------
tqm8xx Linux 2.6.33- 1562.4 2782.8 3730.7 12.6K
tqm8xx Linux 2.6.33- 1556.1 2781.5 3724.3 12.6K
tqm8xx Linux 2.6.33- 1513.9 2801.0 3726.4 12.8K
tqm8xx Linux 2.6.33- 1556.1 2780.9 3611.4 12.6K
tqm8xx Linux 2.6.33- 1570.5 2772.6 3742.1 12.6K
tqm8xx Linux 2.6.33- 1560.1 2703.0 3611.4 12.7K
tqm8xx Linux 2.6.33- 1560.4 2779.5 3760.7 12.7K
tqm8xx Linux 2.6.33- 1559.8 2773.0 3742.1 12.6K
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
tqm8xx Linux 2.6.33- 92.6 109.6 110.9 137.5 173.8 151.8 199.3
tqm8xx Linux 2.6.33- 95.8 108.5 104.7 137.1 172.7 150.9 194.7
tqm8xx Linux 2.6.33- 95.8 118.8 97.5 146.4 162.0 160.8 190.1
tqm8xx Linux 2.6.33- 92.9 111.9 101.0 138.1 166.6 152.3 192.0
tqm8xx Linux 2.6.33- 90.8 108.5 116.2 134.3 171.8 147.1 210.0
tqm8xx Linux 2.6.33- 100.1 111.4 105.0 136.4 173.1 148.3 200.8
tqm8xx Linux 2.6.33- 98.7 111.3 111.8 135.7 172.5 147.9 200.9
tqm8xx Linux 2.6.33- 92.0 117.9 109.9 141.6 170.4 154.9 196.4
*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
tqm8xx Linux 2.6.33- 92.6 338.4 581. 720.1 1047. 2749
tqm8xx Linux 2.6.33- 95.8 334.0 595. 725.0 1051. 2754
tqm8xx Linux 2.6.33- 95.8 330.9 574. 720.1 1047. 2772
tqm8xx Linux 2.6.33- 92.9 338.8 574. 714.3 1046. 2742
tqm8xx Linux 2.6.33- 90.8 322.1 576. 734.9 1012. 2706
tqm8xx Linux 2.6.33- 100.1 326.0 565. 719.5 1027. 2702
tqm8xx Linux 2.6.33- 98.7 322.8 571. 713.8 1028. 2711
tqm8xx Linux 2.6.33- 92.0 328.1 549. 714.1 1022. 2696
*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS UDP RPC/ TCP RPC/ TCP
UDP TCP conn
--------- ------------- ----- ----- ----- ----- ----
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page 100fd
Create Delete Create Delete Latency Fault Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
tqm8xx Linux 2.6.33- 5917.2 3968.3 31.2K 4329.0 4147.0 18.8 34.1 135.2
tqm8xx Linux 2.6.33- 5714.3 3937.0 32.3K 6060.6 4210.0 14.2 34.5 131.4
tqm8xx Linux 2.6.33- 5747.1 4000.0 31.2K 4329.0 4114.0 7.692 34.0 133.1
tqm8xx Linux 2.6.33- 5747.1 4081.6 30.3K 4273.5 4100.0 18.2 34.2 135.0
tqm8xx Linux 2.6.33- 5714.3 3952.6 31.2K 4273.5 4130.0 33.5 35.1 136.1
tqm8xx Linux 2.6.33- 5714.3 3906.2 31.2K 6060.6 4105.0 25.7 35.5 135.9
tqm8xx Linux 2.6.33- 5681.8 3921.6 32.3K 4255.3 4144.0 23.5 35.0 134.9
tqm8xx Linux 2.6.33- 5649.7 3937.0 30.3K 4237.3 4116.0 21.6 35.3 135.3
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host OS Pipe AF TCP File Mmap Bcopy Bcopy Mem Mem
UNIX reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
tqm8xx Linux 2.6.33- 14.8 15.6 10.1 21.0 55.5 32.3 34.5 55.6 53.0
tqm8xx Linux 2.6.33- 14.8 15.6 10.7 21.0 55.5 32.3 34.5 55.6 53.0
tqm8xx Linux 2.6.33- 14.8 15.7 12.7 21.0 55.5 32.3 34.5 55.6 53.0
tqm8xx Linux 2.6.33- 14.8 15.6 13.9 21.0 55.5 32.3 34.5 55.6 53.0
tqm8xx Linux 2.6.33- 14.8 15.8 12.9 21.0 55.7 32.5 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 14.8 15.7 14.0 21.0 55.7 32.4 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 14.8 15.8 12.9 21.0 55.7 32.5 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 14.8 15.8 13.0 21.0 55.7 32.5 34.6 55.8 53.1
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
--------- ------------- --- ---- ---- -------- -------- -------
tqm8xx Linux 2.6.33- 66 31.8 141.0 184.0 1165.7
tqm8xx Linux 2.6.33- 66 31.8 141.2 184.2 1165.3
tqm8xx Linux 2.6.33- 66 31.8 141.3 184.3 1165.6
tqm8xx Linux 2.6.33- 66 31.8 141.3 184.2 1166.2
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1100.5 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1102.5 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.7 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.6 No L2 cache?
make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results'
bye
Heiko
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
* [PATCH 0/4] 8xx: Optimize TLB Miss code.
@ 2010-03-02 15:37 Joakim Tjernlund
0 siblings, 0 replies; 30+ messages in thread
From: Joakim Tjernlund @ 2010-03-02 15:37 UTC (permalink / raw)
To: linuxppc-dev, Scott Wood
This set of patches tries to optimize the TLB code on 8xx even
more. If they work, there should be a noticeable performance
boost.
I would be very happy if you could test them for me.
- v2:
Since Scott has done some testing of these patches, I am resending
them with my SOB (Signed-off-by).
Scott, can you "bless" these patches too?
Joakim Tjernlund (4):
8xx: Optimze TLB Miss handlers
8xx: Avoid testing for kernel space in ITLB Miss.
8xx: Don't touch ACCESSED when no SWAP.
8xx: Use SPRG2 and DAR registers to stash r11 and cr.
arch/powerpc/kernel/head_8xx.S | 70 +++++++++++++++++++++++++++-------------
1 files changed, 47 insertions(+), 23 deletions(-)
Thread overview: 30+ messages
2010-02-26 8:29 [PATCH 0/4] 8xx: Optimize TLB Miss code Joakim Tjernlund
2010-02-26 8:29 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Joakim Tjernlund
2010-02-26 8:29 ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
2010-02-26 8:29 ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Joakim Tjernlund
2010-02-26 8:29 ` [PATCH 4/4] 8xx: Use SPRG2 and DAR registers to stash r11 and cr Joakim Tjernlund
2010-03-16 21:20 ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Benjamin Herrenschmidt
2010-03-17 7:40 ` Joakim Tjernlund
2010-03-16 21:19 ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Benjamin Herrenschmidt
2010-03-17 7:35 ` Joakim Tjernlund
2010-02-26 19:50 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Scott Wood
2010-02-27 15:23 ` Joakim Tjernlund
2010-02-26 20:10 ` Kumar Gala
2010-02-27 15:25 ` Joakim Tjernlund
2010-03-02 15:37 [PATCH 0/4] 8xx: Optimize TLB Miss code Joakim Tjernlund
2010-03-03 8:02 Heiko Schocher
2010-03-03 8:48 ` Joakim Tjernlund
2010-03-03 8:59 ` Joakim Tjernlund
2010-03-03 10:10 ` Heiko Schocher
2010-03-03 10:38 ` Joakim Tjernlund
2010-03-04 10:30 ` Heiko Schocher
2010-03-04 12:16 ` Wolfgang Denk
2010-03-04 13:06 ` Joakim Tjernlund
2010-03-04 16:30 ` Heiko Schocher
2010-03-05 10:40 ` Joakim Tjernlund
2010-03-08 7:46 ` Heiko Schocher
2010-03-08 8:44 ` Joakim Tjernlund
2010-03-08 9:06 ` Heiko Schocher
2010-03-08 10:42 ` Joakim Tjernlund
2010-03-09 6:30 ` Wolfgang Denk
2010-03-07 16:03 ` Joakim Tjernlund