* [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode
@ 2007-02-14 19:08 Marcelo Tosatti
  2007-02-14 19:55 ` Dave Jones
  2007-02-14 21:16 ` Alan
  0 siblings, 2 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2007-02-14 19:08 UTC (permalink / raw)
  To: Jordan Crouse, Andrew Morton; +Cc: linux-kernel


movntq instruction is supported by Geode CPU's, so use
fast_clear_page/fast_copy_page versions that have it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

diff --git a/arch/i386/lib/mmx.c b/arch/i386/lib/mmx.c
index 28084d2..ddc1421 100644
--- a/arch/i386/lib/mmx.c
+++ b/arch/i386/lib/mmx.c
@@ -121,7 +121,7 @@ void *_mmx_memcpy(void *to, const void *
 	return p;
 }
 
-#ifdef CONFIG_MK7
+#if defined (CONFIG_MK7) || defined(CONFIG_MGEODE_LX)
 
 /*
  *	The K7 has streaming cache bypass load/store. The Cyrix III, K6 and
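
For readers unfamiliar with the routines this #if selects, here is a rough user-space sketch of clearing a page with non-temporal stores. It is not the code in mmx.c: it assumes SSE2 and uses _mm_stream_si128 (movntdq) as a stand-in for the MMX movntq, and the names clear_page_cached, clear_page_streaming and PAGE_SIZE_BYTES are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define PAGE_SIZE_BYTES 4096u	/* i386 page size */

/* Clear one page with ordinary (cached) stores. */
static void clear_page_cached(void *page)
{
	memset(page, 0, PAGE_SIZE_BYTES);
}

/*
 * Clear one page with non-temporal stores when SSE2 is available.
 * Assumption: movntdq stands in for the movntq used by the kernel code.
 * `page` must be 16-byte aligned.
 */
static void clear_page_streaming(void *page)
{
	assert(((uintptr_t)page & 15u) == 0);
#if defined(__SSE2__)
	__m128i zero = _mm_setzero_si128();
	__m128i *p = (__m128i *)page;
	size_t i;

	for (i = 0; i < PAGE_SIZE_BYTES / sizeof(__m128i); i++)
		_mm_stream_si128(&p[i], zero);
	_mm_sfence();	/* order the weakly ordered streaming stores */
#else
	clear_page_cached(page);	/* portable fallback without SSE2 */
#endif
}
```

Both functions leave the page zeroed; they differ only in whether the stores go through the cache, which is the point the rest of the thread argues about.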


* Re: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode
  2007-02-14 19:08 [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode Marcelo Tosatti
@ 2007-02-14 19:55 ` Dave Jones
  2007-02-14 20:17   ` Marcelo Tosatti
  2007-02-14 21:16 ` Alan
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Jones @ 2007-02-14 19:55 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Jordan Crouse, Andrew Morton, linux-kernel

On Wed, Feb 14, 2007 at 05:08:39PM -0200, Marcelo Tosatti wrote:
 > 
 > movntq instruction is supported by Geode CPU's, so use
 > fast_clear_page/fast_copy_page versions that have it.

it's supported, but is it a win ?
The same was also true of the VIA C3/C7's, but due to
poor memory bandwidth, it turned out to be slower in most cases.

		Dave

-- 
http://www.codemonkey.org.uk


* Re: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode
  2007-02-14 19:55 ` Dave Jones
@ 2007-02-14 20:17   ` Marcelo Tosatti
  2007-02-14 20:47     ` Dave Jones
  2007-02-14 21:23     ` Arjan van de Ven
  0 siblings, 2 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2007-02-14 20:17 UTC (permalink / raw)
  To: Dave Jones, Marcelo Tosatti, Jordan Crouse, Andrew Morton, linux-kernel

On Wed, Feb 14, 2007 at 02:55:46PM -0500, Dave Jones wrote:
> On Wed, Feb 14, 2007 at 05:08:39PM -0200, Marcelo Tosatti wrote:
>  > 
>  > movntq instruction is supported by Geode CPU's, so use
>  > fast_clear_page/fast_copy_page versions that have it.
> 
> it's supported, but is it a win ?
> The same was also true of the VIA C3/C7's, but due to
> poor memory bandwidth, it turned out to be slower in most cases.

Do you have the numbers for VIA C3/C7 around?

The Geode benefits from movntq instead of movq:

[marcelo@localhost ~]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : Geode by NSC
cpu family      : 5
model           : 5
model name      : Geode(TM) Integrated Processor by National Semi
stepping        : 2
cpu MHz         : 364.898
cache size      : 32 KB
...

[marcelo@localhost ~]$ wget http://www.fenrus.demon.nl/athlon.c
...

[marcelo@localhost ~]$ ./athlon
Athlon test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
clear_page() tests
clear_page function 'warm up run'        took 9565 cycles per page
clear_page function '2.4 non MMX'        took 3347 cycles per page
clear_page function '2.4 MMX fallback'   took 3389 cycles per page
clear_page function '2.4 MMX version'    took 2920 cycles per page
clear_page function 'faster_clear_page'  took 2912 cycles per page
clear_page function 'even_faster_clear'  took 2863 cycles per page

copy_page() tests
copy_page function 'warm up run'         took 9409 cycles per page
copy_page function '2.4 non MMX'         took 13161 cycles per page
copy_page function '2.4 MMX fallback'    took 13033 cycles per page
copy_page function '2.4 MMX version'     took 9288 cycles per page
copy_page function 'faster_copy'         took 9806 cycles per page
copy_page function 'even_faster'         took 8990 cycles per page
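
To put the cycle counts above in relative terms, a small helper can compute the percentage of cycles saved against a chosen baseline row. This is plain arithmetic on the numbers printed above; the helper name percent_saved is hypothetical, and nothing here says which rows use movntq.

```c
#include <assert.h>

/* Percentage of cycles saved by `cycles` relative to `baseline`. */
static double percent_saved(unsigned long baseline, unsigned long cycles)
{
	assert(baseline > 0);
	return 100.0 * ((double)baseline - (double)cycles) / (double)baseline;
}
```

With the figures above, percent_saved(3347, 2863) is about 14.5 ('even_faster_clear' against '2.4 non MMX'), percent_saved(13161, 8990) is about 31.7 ('even_faster' against '2.4 non MMX'), and percent_saved(9288, 9806) is about -5.6, i.e. 'faster_copy' is slower than the '2.4 MMX version' in this run.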



* Re: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode
  2007-02-14 20:17   ` Marcelo Tosatti
@ 2007-02-14 20:47     ` Dave Jones
  2007-02-14 21:23     ` Arjan van de Ven
  1 sibling, 0 replies; 8+ messages in thread
From: Dave Jones @ 2007-02-14 20:47 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Jordan Crouse, Andrew Morton, linux-kernel

On Wed, Feb 14, 2007 at 06:17:36PM -0200, Marcelo Tosatti wrote:
 > On Wed, Feb 14, 2007 at 02:55:46PM -0500, Dave Jones wrote:
 > > On Wed, Feb 14, 2007 at 05:08:39PM -0200, Marcelo Tosatti wrote:
 > >  > 
 > >  > movntq instruction is supported by Geode CPU's, so use
 > >  > fast_clear_page/fast_copy_page versions that have it.
 > > 
 > > it's supported, but is it a win ?
 > > The same was also true of the VIA C3/C7's, but due to
 > > poor memory bandwidth, it turned out to be slower in most cases.
 > 
 > Do you have the numbers for VIA C3/C7 around?

I don't, and my 3DNow!-capable C3s are unplugged right now.
The newer generation (including the C7) has SSE/SSE2 instead,
which seems to be faster (measured with a different benchmark app that uses SSE):

clear_page function 'normal clear_page()'        took 9425 cycles per page (620.3 MB/s)
clear_page function 'new clear_page()   '        took 3840 cycles per page (1522.7 MB/s)

copy_page function 'normal copy_page()'  took 11453 cycles per page (510.5 MB/s)
copy_page function 'new copy_page()   '  took 5024 cycles per page (1163.7 MB/s)
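
The MB/s column appears to follow directly from the cycle counts. A minimal sketch of the conversion, assuming a 4096-byte page and 1 MB = 2^20 bytes (neither is stated in the output, and the function names are hypothetical):

```c
#include <assert.h>

#define PAGE_BYTES 4096.0		/* i386 page size */
#define MIB        (1024.0 * 1024.0)

/* Throughput in MiB/s for a routine taking `cycles` per page at `clock_hz`. */
static double page_mib_per_sec(double cycles, double clock_hz)
{
	assert(cycles > 0.0);
	return PAGE_BYTES * clock_hz / cycles / MIB;
}

/* Clock in Hz implied by a reported (cycles per page, MiB/s) pair. */
static double implied_clock_hz(double cycles, double mib_per_sec)
{
	return mib_per_sec * MIB * cycles / PAGE_BYTES;
}
```

Under those assumptions all four rows above imply a clock of roughly 1.497 GHz, e.g. implied_clock_hz(9425, 620.3). That clock is an inference from the printed numbers, not something the benchmark output states.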


		Dave

-- 
http://www.codemonkey.org.uk


* Re: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode
  2007-02-14 19:08 [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode Marcelo Tosatti
  2007-02-14 19:55 ` Dave Jones
@ 2007-02-14 21:16 ` Alan
  2007-02-15 15:01   ` Marcelo Tosatti
  1 sibling, 1 reply; 8+ messages in thread
From: Alan @ 2007-02-14 21:16 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Jordan Crouse, Andrew Morton, linux-kernel

On Wed, 14 Feb 2007 17:08:39 -0200
Marcelo Tosatti <marcelo@kvack.org> wrote:

> 
> movntq instruction is supported by Geode CPU's, so use
> fast_clear_page/fast_copy_page versions that have it.

Is it actually faster for macro performance not just microbenchmarking ?

Alan


* Re: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode
  2007-02-14 20:17   ` Marcelo Tosatti
  2007-02-14 20:47     ` Dave Jones
@ 2007-02-14 21:23     ` Arjan van de Ven
  1 sibling, 0 replies; 8+ messages in thread
From: Arjan van de Ven @ 2007-02-14 21:23 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Dave Jones, Jordan Crouse, Andrew Morton, linux-kernel

On Wed, 2007-02-14 at 18:17 -0200, Marcelo Tosatti wrote:
> On Wed, Feb 14, 2007 at 02:55:46PM -0500, Dave Jones wrote:
> > On Wed, Feb 14, 2007 at 05:08:39PM -0200, Marcelo Tosatti wrote:
> >  > 
> >  > movntq instruction is supported by Geode CPU's, so use
> >  > fast_clear_page/fast_copy_page versions that have it.
> > 
> > it's supported, but is it a win ?
> > The same was also true of the VIA C3/C7's, but due to
> > poor memory bandwidth, it turned out to be slower in most cases.
> 
> Do you have the numbers for VIA C3/C7 around?
> 
> The Geode benefits from movntq instead of movq:
> 
> [marcelo@localhost ~]$ cat /proc/cpuinfo
> processor       : 0
> vendor_id       : Geode by NSC
> cpu family      : 5
> model           : 5
> model name      : Geode(TM) Integrated Processor by National Semi
> stepping        : 2
> cpu MHz         : 364.898
> cache size      : 32 KB
> ...
> 
> [marcelo@localhost ~]$ wget http://www.fenrus.demon.nl/athlon.c
> ...


Btw, there is a caveat with this program: it does not show that the movntq
version evicts the data RIGHT AFTER THE COPY, so if you use the data again
you pay the memory bandwidth price AGAIN...
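
One way to make that hidden cost visible is to time the copy together with an immediate read of the destination, rather than the copy alone. The following is only a sketch of such a harness, not part of athlon.c: all names are hypothetical, clock_gettime stands in for a cycle counter, and only a plain memcpy variant is shown; a streaming-store copy would be passed in through the same function pointer.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES 4096u

typedef void (*copy_page_fn)(void *dst, const void *src);

/* Baseline copy using ordinary cached stores. */
static void copy_page_plain(void *dst, const void *src)
{
	memcpy(dst, src, PAGE_BYTES);
}

/* Read every byte of the page, forcing it back into cache if it was evicted. */
static uint32_t touch_page(const void *page)
{
	const volatile unsigned char *p = page;
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < PAGE_BYTES; i++)
		sum += p[i];
	return sum;
}

static double now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/*
 * Time `iters` rounds (iters > 0) of a copy followed by a read of the
 * destination. Returns average nanoseconds per round; *checksum gets the
 * last read-back sum so the reads are not optimised away.
 */
static double time_copy_then_use(copy_page_fn copy, void *dst, const void *src,
				 unsigned iters, uint32_t *checksum)
{
	double start = now_ns();
	uint32_t sum = 0;
	unsigned i;

	for (i = 0; i < iters; i++) {
		copy(dst, src);
		sum = touch_page(dst);
	}
	*checksum = sum;
	return (now_ns() - start) / iters;
}
```

Comparing this figure for two copy routines, instead of the copy-only time, charges each routine for whatever it leaves out of the cache.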



* Re: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode
  2007-02-14 21:16 ` Alan
@ 2007-02-15 15:01   ` Marcelo Tosatti
  0 siblings, 0 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2007-02-15 15:01 UTC (permalink / raw)
  To: Alan; +Cc: Marcelo Tosatti, Jordan Crouse, Andrew Morton, linux-kernel

On Wed, Feb 14, 2007 at 09:16:46PM +0000, Alan wrote:
> On Wed, 14 Feb 2007 17:08:39 -0200
> Marcelo Tosatti <marcelo@kvack.org> wrote:
> 
> > 
> > movntq instruction is supported by Geode CPU's, so use
> > fast_clear_page/fast_copy_page versions that have it.
> 
> Is it actually faster for macro performance not just microbenchmarking ?

A COW-intensive private mmap() benchmark shows the kernel spending
_more_ time inside mmx_copy_page() with movntq than with movq.

So it's not clear whether the patch is actually a win; please drop it.
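
The benchmark itself was not posted. As a hedged sketch of what a COW-heavy private-mapping workload can look like, the function below faults in a private anonymous mapping, forks, and has the child write one byte to every page so that each write makes the kernel copy a page. The anonymous mapping, the fork, and the name cow_touch_pages are assumptions for illustration, not a description of the test actually run.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 0 if the child dirtied every page and the parent's copy stayed
 * intact (i.e. the writes were resolved by copy-on-write), -1 on error.
 */
static int cow_touch_pages(size_t npages)
{
	long psz = sysconf(_SC_PAGESIZE);
	unsigned char *mem;
	size_t page, len, i;
	pid_t pid;
	int status, ok = 0;

	if (psz <= 0 || npages == 0)
		return -1;
	page = (size_t)psz;
	len = npages * page;

	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;
	memset(mem, 0x55, len);	/* fault every page in before forking */

	pid = fork();
	if (pid == 0) {
		for (i = 0; i < npages; i++)
			mem[i * page] = 0xAA;	/* first write to a shared page */
		_exit(0);
	}
	if (pid > 0 && waitpid(pid, &status, 0) == pid &&
	    WIFEXITED(status) && WEXITSTATUS(status) == 0) {
		ok = 1;
		for (i = 0; i < npages; i++)
			if (mem[i * page] != 0x55)
				ok = 0;	/* parent must not see child writes */
	}
	munmap(mem, len);
	return ok ? 0 : -1;
}
```

Running this in a loop under a kernel profiler is one way to see how much time lands in the page-copy routine, which is the kind of macro-level measurement Alan asked about.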


