* Some benchmarks on ARM
@ 2010-07-02 18:02 Robert Schwebel
  2010-07-02 20:34 ` Magnus Lilja
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Robert Schwebel @ 2010-07-02 18:02 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

We have recently made some benchmarks, in order to get a little bit
better feeling for where ARM CPUs are today, especially when it comes
to the "recent" ones, and in comparison to the Atom. So we collected a
few benchmarks (most from lmbench) and did some actual measurements.

Here is a little article:
http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html

I'm pretty sure that there are quite a few things where people on ALKML
have good ideas where the effects come from or how to improve the
methodology - so I'd be glad to get some feedback from the community!

All measurements have been done on 2.6.34.

Thanks,

rsc
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Some benchmarks on ARM
  2010-07-02 18:02 Some benchmarks on ARM Robert Schwebel
@ 2010-07-02 20:34 ` Magnus Lilja
  2010-07-05 12:24   ` Marc Kleine-Budde
  2010-07-03  5:44 ` Nicolas Pitre
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Magnus Lilja @ 2010-07-02 20:34 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robert,

On 2010-07-02 20:02, Robert Schwebel wrote:
> Hi,
> 
> We have recently made some benchmarks, in order to get a little bit
> better feeling for where ARM CPUs are today, especially when it comes
> to the "recent" ones, and in comparison to the Atom. So we collected a
> few benchmarks (most from lmbench) and did some actual measurements.
> 
> Here is a little article:
> http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html
> 
> I'm pretty sure that there are quite a few things where people on ALKML
> have good ideas where the effects come from or how to improve the
> methodology - so I'd be glad to get some feedback from the community!


It would be nice if you could add the exact command lines you used for the different tests so it's easy to run the same tests on other boards as well. I suppose you added some flag to gcc to make use of the floating point hardware in i.MX35?

Could you also add information on the RAM characteristics of each board? DDR1/DDR2, 16 bit/32 bit wide, and speed (MHz).

 

Regards, Magnus Lilja


* Some benchmarks on ARM
  2010-07-02 18:02 Some benchmarks on ARM Robert Schwebel
  2010-07-02 20:34 ` Magnus Lilja
@ 2010-07-03  5:44 ` Nicolas Pitre
  2010-07-05 13:04   ` Maurus Cuelenaere
  2010-07-03 19:48 ` Baruch Siach
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Nicolas Pitre @ 2010-07-03  5:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2 Jul 2010, Robert Schwebel wrote:

> Hi,
> 
> We have recently made some benchmarks, in order to get a little bit
> better feeling for where ARM CPUs are today, especially when it comes
> to the "recent" ones, and in comparison to the Atom. So we collected a
> few benchmarks (most from lmbench) and did some actual measurements.
> 
> Here is a little article:
> http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html
> 
> I'm pretty sure that there are quite a few things where people on ALKML
> have good ideas where the effects come from or how to improve the
> methodology - so I'd be glad to get some feedback from the community!

It would be nice if you could add measurements for recent Marvell 
products there, such as the Kirkwood (think SheevaPlug or the like 
running at 1.2 GHz), or Dove.  I would expect memory throughput on those
to be quite good.

Also, your article completely omits the other important metric, which is
power consumption.  After all, what would be most interesting is not the
various performance numbers per MHz but those performance numbers per
Watt.
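
The two normalisations can be sketched as follows. This is only an illustration: the function names are made up for this example, and the input values are placeholders, not measurements of any board.

```python
def per_mhz(score, clock_mhz):
    """Benchmark score normalised by core clock (score per MHz)."""
    return score / clock_mhz


def per_watt(score, watts):
    """Benchmark score normalised by measured power draw (score per Watt)."""
    return score / watts


# Placeholder values for illustration only, NOT measured data.
score = 300.0      # e.g. a throughput figure in MiB/s
clock_mhz = 600.0  # core clock
watts = 2.0        # board power under load

print(per_mhz(score, clock_mhz))  # 0.5
print(per_watt(score, watts))     # 150.0
```

Two boards with the same score per MHz can still differ widely in score per Watt, which is why both columns would be worth reporting.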


Nicolas


* Some benchmarks on ARM
  2010-07-02 18:02 Some benchmarks on ARM Robert Schwebel
  2010-07-02 20:34 ` Magnus Lilja
  2010-07-03  5:44 ` Nicolas Pitre
@ 2010-07-03 19:48 ` Baruch Siach
  2010-07-03 20:08 ` Gilles Chanteperdrix
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Baruch Siach @ 2010-07-03 19:48 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robert,

On Fri, Jul 02, 2010 at 08:02:57PM +0200, Robert Schwebel wrote:
> We have recently made some benchmarks, in order to get a little bit
> better feeling for where ARM CPUs are today, especially when it comes
> to the "recent" ones, and in comparison to the Atom. So we collected a
> few benchmarks (most from lmbench) and did some actual measurements.
> 
> Here is a little article:
> http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html

The ARM architecture version in the list of tested hardware is wrong for the 
ARM1136 and Cortex-A8 cores. Should be ARMv6 and ARMv7, respectively.
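
As a quick reference, the mapping can be written down as a small table. Only the ARM1136 and Cortex-A8 entries come from the correction above; the other entries are added from general knowledge of these cores, and the helper name is hypothetical.

```python
# Core -> ARM architecture version.
CORE_ARCH = {
    "ARM920T": "ARMv4T",            # core used in the AT91RM9200
    "XScale PXA270": "ARMv5TE",
    "Feroceon 88FR131": "ARMv5TE",  # Marvell Kirkwood
    "ARM1136": "ARMv6",
    "Cortex-A8": "ARMv7-A",
}


def arch_of(core):
    """Return the architecture version for a core, or None if it is not listed."""
    return CORE_ARCH.get(core)


print(arch_of("ARM1136"))    # ARMv6
print(arch_of("Cortex-A8"))  # ARMv7-A
```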

baruch

-- 
                                                     ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch at tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -


* Some benchmarks on ARM
  2010-07-02 18:02 Some benchmarks on ARM Robert Schwebel
                   ` (2 preceding siblings ...)
  2010-07-03 19:48 ` Baruch Siach
@ 2010-07-03 20:08 ` Gilles Chanteperdrix
  2010-07-03 20:28   ` Russell King - ARM Linux
  2010-07-05  8:51 ` Colin Tuckley
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-03 20:08 UTC (permalink / raw)
  To: linux-arm-kernel

Robert Schwebel wrote:
> Hi,
> 
> We have recently made some benchmarks, in order to get a little bit
> better feeling for where ARM CPUs are today, especially when it comes
> to the "recent" ones, and in comparison to the Atom. So we collected a
> few benchmarks (most from lmbench) and did some actual measurements.
> 
> Here is a little article:
> http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html
> 
> I'm pretty sure that there are quite a few things where people on ALKML
> have good ideas where the effects come from or how to improve the
> methodology - so I'd be glad to get some feedback from the community!
> 
> All measurements have been done on 2.6.34.

The context switch time for PXA270 looks really suspicious. The worst
case context switch time of an AT91RM9200, an armv4 running at 180MHz,
is less than 300us, so I doubt that the context switch time of a PXA
can be that much worse.

-- 
					    Gilles.


* Some benchmarks on ARM
  2010-07-03 20:08 ` Gilles Chanteperdrix
@ 2010-07-03 20:28   ` Russell King - ARM Linux
  2010-07-04  9:47     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2010-07-03 20:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Jul 03, 2010 at 10:08:29PM +0200, Gilles Chanteperdrix wrote:
> Robert Schwebel wrote:
> > Hi,
> > 
> > We have recently made some benchmarks, in order to get a little bit
> > better feeling for where ARM CPUs are today, especially when it comes
> > to the "recent" ones, and in comparison to the Atom. So we collected a
> > few benchmarks (most from lmbench) and did some actual measurements.
> > 
> > Here is a little article:
> > http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html
> > 
> > I'm pretty sure that there are quite a few things where people on ALKML
> > have good ideas where the effects come from or how to improve the
> > methodology - so I'd be glad to get some feedback from the community!
> > 
> > All measurements have been done on 2.6.34.
> 
> The context switch time for PXA270 looks really suspicious. The worst
> case context switch time of an AT91RM9200, an armv4 running at 180MHz,
> is less than 300us, so I doubt that the context switch time of a PXA
> can be that much worse.

The measurement is of the thread and MM switch time, which'll involve
cache flushes on <= ARMv5.

On PXA, we have to 'read' (via means of D cache line allocations) 32K
of data into the cache in order to cause the existing data to be written
out.  These are done from a range of addresses which don't exist in the
page tables, and so should not cause any bus activity other than the
write-outs.  However, the cache still has to interact with the MMU to
try to fetch the requested data - which probably consumes some cycles.

ARM920 on the other hand can walk through every cache line and clean+
invalidate it.  It has 64 lines in each segment, and 8 segments, each
line 32 bytes long, which gives a cache size of 16K.

I'd therefore expect ARM920's cache flushing to be quicker (in terms
of cycles consumed) than PXA.
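
The arithmetic behind that comparison, as a small sketch. The ARM920 geometry and the 32K figure for PXA are taken from the text above; the 32-byte line size for PXA is an assumption made for this example.

```python
def cache_bytes(segments, lines_per_segment, line_bytes):
    """Total cache size from its geometry."""
    return segments * lines_per_segment * line_bytes


# ARM920: 8 segments, 64 lines each, 32 bytes per line.
arm920_lines = 8 * 64
arm920_size = cache_bytes(8, 64, 32)

# PXA: 32K must be read in to push the old contents out.
PXA_FLUSH_BYTES = 32 * 1024
PXA_LINE_BYTES = 32  # assumption, not stated in the text
pxa_allocations = PXA_FLUSH_BYTES // PXA_LINE_BYTES

print(arm920_lines, arm920_size)  # 512 16384
print(pxa_allocations)            # 1024
```

Under that assumption the PXA flush touches twice as many lines as the ARM920 one, before counting any extra cost per line allocation.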


* Some benchmarks on ARM
  2010-07-03 20:28   ` Russell King - ARM Linux
@ 2010-07-04  9:47     ` Gilles Chanteperdrix
  0 siblings, 0 replies; 29+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-04  9:47 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux wrote:
> On Sat, Jul 03, 2010 at 10:08:29PM +0200, Gilles Chanteperdrix wrote:
>> Robert Schwebel wrote:
>>> Hi,
>>>
>>> We have recently made some benchmarks, in order to get a little bit
>>> better feeling for where ARM CPUs are today, especially when it comes
>>> to the "recent" ones, and in comparison to the Atom. So we collected a
>>> few benchmarks (most from lmbench) and did some actual measurements.
>>>
>>> Here is a little article:
>>> http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html
>>>
>>> I'm pretty sure that there are quite a few things where people on ALKML
>>> have good ideas where the effects come from or how to improve the
>>> methodology - so I'd be glad to get some feedback from the community!
>>>
>>> All measurements have been done on 2.6.34.
>> The context switch time for PXA270 looks really suspicious. The worst
>> case context switch time of an AT91RM9200, an armv4 running at 180MHz,
>> is less than 300us, so I doubt that the context switch time of a PXA
>> can be that much worse.
> 
> The measurement is of the thread and MM switch time, which'll involve
> cache flushes on <= ARMv5.
> 
> On PXA, we have to 'read' (via means of D cache line allocations) 32K
> of data into the cache in order to cause the existing data to be written
> out.  These are done from a range of addresses which don't exist in the
> page tables, and so should not cause any bus activity other than the
> write-outs.  However, the cache still has to interact with the MMU to
> try to fetch the requested data - which probably consumes some cycles.
> 
> ARM920 on the other hand can walk through every cache line and clean+
> invalidate it.  It has 64 lines in each segment, and 8 segments, each
> line 32 bytes long, which gives a cache size of 16K.
> 
> I'd therefore expect ARM920's cache flushing to be quicker (in terms
> of cycles consumed) than PXA.

Ok. But the cache flush accounts for only around 70us of the 290us
of context switch time on the AT91RM9200. Looks like the cache flush on
PXA would have to take a lot longer.
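
Put as numbers, using only the figures quoted in this thread (70us flush, 290us total, 180MHz clock); a rough sketch, nothing more:

```python
FLUSH_US = 70.0    # cache flush share on the AT91RM9200
SWITCH_US = 290.0  # worst-case context switch time
CLOCK_MHZ = 180.0  # core clock, i.e. cycles per microsecond

flush_fraction = FLUSH_US / SWITCH_US
remaining_us = SWITCH_US - FLUSH_US
flush_cycles = FLUSH_US * CLOCK_MHZ

print(f"{flush_fraction:.0%}")  # 24%
print(remaining_us)             # 220.0
print(flush_cycles)             # 12600.0
```

So roughly three quarters of the AT91RM9200 figure is not cache flushing at all.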

-- 
					    Gilles.


* Some benchmarks on ARM
  2010-07-02 18:02 Some benchmarks on ARM Robert Schwebel
                   ` (3 preceding siblings ...)
  2010-07-03 20:08 ` Gilles Chanteperdrix
@ 2010-07-05  8:51 ` Colin Tuckley
  2010-07-05 12:29   ` Marc Kleine-Budde
  2010-07-05 12:41   ` Marc Kleine-Budde
  2010-07-29 16:54 ` Robert Schwebel
  2010-08-19  5:36 ` shiraz hashim
  6 siblings, 2 replies; 29+ messages in thread
From: Colin Tuckley @ 2010-07-05  8:51 UTC (permalink / raw)
  To: linux-arm-kernel

> -----Original Message-----
> kernel-bounces at lists.infradead.org] On Behalf Of Robert Schwebel
> Sent: 02 July 2010 19:03

> We have recently made some benchmarks, in order to get a little bit
> better feeling for where ARM CPUs are today, especially when it comes
> to the "recent" ones, and in comparison to the Atom. So we collected a
> few benchmarks (most from lmbench) and did some actual measurements.

You have the family wrong for (at least) the Cortex-A8, it's a v7. This probably accounts for some of its slow relative performance since you probably used incorrect compiler switches.

> All measurements have been done on 2.6.34.

How were the kernels compiled? Was thumb mode used for the ones that support it?

You didn't mention the cache state anywhere - it can make a big difference.

Regards,

Colin

--
Colin Tuckley - ARM Ltd.
110 Fulbourn Rd
Cambridge, CB1 9NJ
Tel: +44 1223 400536


* Some benchmarks on ARM
  2010-07-02 20:34 ` Magnus Lilja
@ 2010-07-05 12:24   ` Marc Kleine-Budde
  2010-07-05 14:00     ` Russell King - ARM Linux
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Kleine-Budde @ 2010-07-05 12:24 UTC (permalink / raw)
  To: linux-arm-kernel

Magnus Lilja wrote:
> Hi Robert,
> 
> On 2010-07-02 20:02, Robert Schwebel wrote:
>> Hi,
>>
>> We have recently made some benchmarks, in order to get a little bit
>> better feeling for where ARM CPUs are today, especially when it comes
>> to the "recent" ones, and in comparison to the Atom. So we collected a
>> few benchmarks (most from lmbench) and did some actual measurements.
>>
>> Here is a little article:
>> http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html
>>
>> I'm pretty sure that there are quite a few things where people on ALKML
>> have good ideas where the effects come from or how to improve the
>> methodology - so I'd be glad to get some feedback from the community!
> 
> 

> It would be nice if you could add the exact command lines you used
> for the different tests so it's easy to run the same tests on other
> boards as well. I suppose you added some flag to gcc to make use of
> the floating point hardware in i.MX35?

We used a compiler generating hardware floating point instructions by
default:

gcc was configured with:
"--with-float=softfp --with-fpu=vfp --with-cpu=arm1136jf-s"

here the complete output of gcc -v:
Using built-in specs.
Target: arm-1136jfs-linux-gnueabi
Configured with:
/home/frogger/pengutronix/toolchain/releases/OSELAS.Toolchain-1.99.3/platform-arm-1136jfs-linux-gnueabi-gcc-4.3.2-glibc-2.8-binutils-2.19-kernel-2.6.27-sanitized/build-cross/gcc-4.3.2/configure
--target=arm-1136jfs-linux-gnueabi
--with-sysroot=/home/frogger/pengutronix/toolchain/releases/OSELAS.Toolchain-1.99.3/inst/opt/OSELAS.Toolchain-1.99.3/arm-1136jfs-linux-gnueabi/gcc-4.3.2-glibc-2.8-binutils-2.19-kernel-2.6.27-sanitized/sysroot-arm-1136jfs-linux-gnueabi
--disable-multilib --with-float=softfp --with-fpu=vfp
--with-cpu=arm1136jf-s --enable-__cxa_atexit --disable-sjlj-exceptions
--disable-nls --disable-decimal-float --disable-fixed-point
--disable-win32-registry --enable-symvers=gnu
--with-pkgversion=OSELAS.Toolchain-1.99.3
--with-gmp=/home/frogger/pengutronix/toolchain/releases/OSELAS.Toolchain-1.99.3/platform-arm-1136jfs-linux-gnueabi-gcc-4.3.2-glibc-2.8-binutils-2.19-kernel-2.6.27-sanitized/sysroot-host
--with-mpfr=/home/frogger/pengutronix/toolchain/releases/OSELAS.Toolchain-1.99.3/platform-arm-1136jfs-linux-gnueabi-gcc-4.3.2-glibc-2.8-binutils-2.19-kernel-2.6.27-sanitized/sysroot-host
--prefix=/home/frogger/pengutronix/toolchain/releases/OSELAS.Toolchain-1.99.3/inst/opt/OSELAS.Toolchain-1.99.3/arm-1136jfs-linux-gnueabi/gcc-4.3.2-glibc-2.8-binutils-2.19-kernel-2.6.27-sanitized
--enable-languages=c,c++ --enable-threads=posix --enable-c99
--enable-long-long --enable-libstdcxx-debug --enable-profile
--enable-shared --enable-libssp --enable-checking=release
Thread model: posix
gcc version 4.3.2 (OSELAS.Toolchain-1.99.3)
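
For reference, GCC's ARM float ABI setting has three values, and the --with-float configure option selects the default. The table below is a sketch of their documented meaning; the dictionary layout and helper name are made up for this example.

```python
# What each GCC ARM float ABI value allows.
FLOAT_ABI = {
    # FP done by library calls, FP arguments in core registers.
    "soft":   {"fpu_instructions": False, "fp_args_in_fpu_regs": False},
    # FPU instructions allowed, but soft-float calling convention kept.
    "softfp": {"fpu_instructions": True,  "fp_args_in_fpu_regs": False},
    # FPU instructions and FPU-specific calling convention.
    "hard":   {"fpu_instructions": True,  "fp_args_in_fpu_regs": True},
}


def may_use_fpu(float_abi):
    """True if the compiler is allowed to emit FPU instructions."""
    return FLOAT_ABI[float_abi]["fpu_instructions"]


print(may_use_fpu("softfp"))  # True
print(may_use_fpu("soft"))    # False
```

So "softfp" together with --with-fpu=vfp permits VFP instructions while staying link-compatible with soft-float code.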

cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


* Some benchmarks on ARM
  2010-07-05  8:51 ` Colin Tuckley
@ 2010-07-05 12:29   ` Marc Kleine-Budde
  2010-07-05 12:41   ` Marc Kleine-Budde
  1 sibling, 0 replies; 29+ messages in thread
From: Marc Kleine-Budde @ 2010-07-05 12:29 UTC (permalink / raw)
  To: linux-arm-kernel

Colin Tuckley wrote:
>> -----Original Message-----
>> kernel-bounces at lists.infradead.org] On Behalf Of Robert Schwebel
>> Sent: 02 July 2010 19:03
> 
>> We have recently made some benchmarks, in order to get a little bit
>> better feeling for where ARM CPUs are today, especially when it comes
>> to the "recent" ones, and in comparison to the Atom. So we collected a
>> few benchmarks (most from lmbench) and did some actual measurements.
> 
> You have the family wrong for (at least) the Cortex-A8, it's a v7. This probably accounts for some of its slow relative performance since you probably used incorrect compiler switches.
> 
>> All measurements have been done on 2.6.34.
> 
> How were the kernels compiled? Was thumb mode used for the ones that support it?

gcc version 4.3.2 was used for all kernels. The kernel .config will be added
to the website.

> You didn't mention the cache state anywhere - it can make a big difference.

cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


* Some benchmarks on ARM
  2010-07-05  8:51 ` Colin Tuckley
  2010-07-05 12:29   ` Marc Kleine-Budde
@ 2010-07-05 12:41   ` Marc Kleine-Budde
  2010-07-05 12:45     ` Marc Kleine-Budde
  1 sibling, 1 reply; 29+ messages in thread
From: Marc Kleine-Budde @ 2010-07-05 12:41 UTC (permalink / raw)
  To: linux-arm-kernel

Colin Tuckley wrote:
>> -----Original Message-----
>> kernel-bounces at lists.infradead.org] On Behalf Of Robert Schwebel
>> Sent: 02 July 2010 19:03
> 
>> We have recently made some benchmarks, in order to get a little bit
>> better feeling for where ARM CPUs are today, especially when it comes
>> to the "recent" ones, and in comparison to the Atom. So we collected a
>> few benchmarks (most from lmbench) and did some actual measurements.
> 
> You have the family wrong for (at least) the Cortex-A8, it's a v7. This probably accounts for some of its slow relative performance since you probably used incorrect compiler switches.
> 
>> All measurements have been done on 2.6.34.
> 
> How were the kernels compiled? Was thumb mode used for the ones that support it?
> 
> You didn't mention the cache state anywhere - it can make a big difference.

After booting, all tests were done via ssh; the system was otherwise "idle".

We're just repeating the tests with "echo 1 > /proc/sys/vm/drop_caches"
prior to each test.

cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


* Some benchmarks on ARM
  2010-07-05 12:41   ` Marc Kleine-Budde
@ 2010-07-05 12:45     ` Marc Kleine-Budde
  0 siblings, 0 replies; 29+ messages in thread
From: Marc Kleine-Budde @ 2010-07-05 12:45 UTC (permalink / raw)
  To: linux-arm-kernel

Marc Kleine-Budde wrote:
> Colin Tuckley wrote:
>>> -----Original Message-----
>>> kernel-bounces at lists.infradead.org] On Behalf Of Robert Schwebel
>>> Sent: 02 July 2010 19:03
>>> We have recently made some benchmarks, in order to get a little bit
>>> better feeling for where ARM CPUs are today, especially when it comes
>>> to the "recent" ones, and in comparison to the Atom. So we collected a
>>> few benchmarks (most from lmbench) and did some actual measurements.
>> You have the family wrong for (at least) the Cortex-A8, it's a v7. This probably accounts for some of its slow relative performance since you probably used incorrect compiler switches.
>>
>>> All measurements have been done on 2.6.34.
>> How were the kernels compiled? Was thumb mode used for the ones that support it?
>>
>> You didn't mention the cache state anywhere - it can make a big difference.
> 
> After booting, all tests were done via ssh; the system was otherwise "idle".
> 
> We're just repeating the tests with "echo 1 > /proc/sys/vm/drop_caches"
> prior to each test.

We'll use "sync; echo 3 > /proc/sys/vm/drop_caches"
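
The same step could be scripted from a test harness roughly like this. It is a sketch only: the function name and the path parameter are made up for this example, and writing to the real file needs root. Mode 1 drops the page cache, 2 drops dentries and inodes, 3 drops both.

```python
import os


def drop_caches(mode=3, path="/proc/sys/vm/drop_caches"):
    """Flush dirty data to disk, then ask the kernel to drop clean caches.

    The path parameter exists only so the function can be tried
    against an ordinary file.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    os.sync()  # same role as the leading "sync"
    with open(path, "w") as f:
        f.write(f"{mode}\n")


# Usage before each benchmark run (as root):
#     drop_caches(3)
```

Note that this clears the kernel's page cache and dentry/inode caches, not the CPU caches.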

Marc
-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


* Some benchmarks on ARM
  2010-07-03  5:44 ` Nicolas Pitre
@ 2010-07-05 13:04   ` Maurus Cuelenaere
  2010-07-05 13:23     ` Robert Schwebel
  0 siblings, 1 reply; 29+ messages in thread
From: Maurus Cuelenaere @ 2010-07-05 13:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 03-07-10 07:44, Nicolas Pitre wrote:
> On Fri, 2 Jul 2010, Robert Schwebel wrote:
> 
>> Hi,
>>
>> We have recently made some benchmarks, in order to get a little bit
>> better feeling for where ARM CPUs are today, especially when it comes
>> to the "recent" ones, and in comparison to the Atom. So we collected a
>> few benchmarks (most from lmbench) and did some actual measurements.
>>
>> Here is a little article:
>> http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html
>>
>> I'm pretty sure that there are quite a few things where people on ALKML
>> have good ideas where the effects come from or how to improve the
>> methodology - so I'd be glad to get some feedback from the community!
> 
> It would be nice if you could add measurements for recent Marvell 
> products there, such as the Kirkwood (think SheevaPlug or the like 
> running at 1.2 GHz), or Dove.  I would expect memory throughput on those
> to be quite good.

Some quick tests of lmbench on a Sheevaplug:

mcuelenaere@kot:/dev/shm/lmbench3/bin/armv5tel-linux-gnu$ ./lat_ops
integer bit: 0.85 nanoseconds
integer add: 0.02 nanoseconds
integer mul: 0.42 nanoseconds
integer div: 147.77 nanoseconds
integer mod: 36.94 nanoseconds
int64 bit: 1.71 nanoseconds
int64 add: 0.04 nanoseconds
int64 mul: 0.92 nanoseconds
int64 div: 425.89 nanoseconds
int64 mod: 273.85 nanoseconds
float add: 36.25 nanoseconds
float mul: 30.32 nanoseconds
float div: 161.29 nanoseconds
double add: 51.21 nanoseconds
double mul: 46.31 nanoseconds
double div: 542.06 nanoseconds
float bogomflops: 325.59 nanoseconds
double bogomflops: 799.14 nanoseconds

mcuelenaere@kot:/dev/shm/lmbench3/bin/armv5tel-linux-gnu$ mbw 128
Long uses 4 bytes. Allocating 2*33554432 elements = 268435456 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 10 runs per test.
0	Method: MEMCPY	Elapsed: 0.48203	MiB: 128.00000	Copy: 265.546 MiB/s
1	Method: MEMCPY	Elapsed: 0.48165	MiB: 128.00000	Copy: 265.751 MiB/s
2	Method: MEMCPY	Elapsed: 0.48163	MiB: 128.00000	Copy: 265.764 MiB/s
3	Method: MEMCPY	Elapsed: 0.49714	MiB: 128.00000	Copy: 257.473 MiB/s
4	Method: MEMCPY	Elapsed: 0.48168	MiB: 128.00000	Copy: 265.737 MiB/s
5	Method: MEMCPY	Elapsed: 0.48163	MiB: 128.00000	Copy: 265.764 MiB/s
6	Method: MEMCPY	Elapsed: 0.49695	MiB: 128.00000	Copy: 257.570 MiB/s
7	Method: MEMCPY	Elapsed: 0.48196	MiB: 128.00000	Copy: 265.579 MiB/s
8	Method: MEMCPY	Elapsed: 0.48164	MiB: 128.00000	Copy: 265.761 MiB/s
9	Method: MEMCPY	Elapsed: 0.49695	MiB: 128.00000	Copy: 257.570 MiB/s
AVG	Method: MEMCPY	Elapsed: 0.48633	MiB: 128.00000	Copy: 263.198 MiB/s
0	Method: DUMB	Elapsed: 0.29804	MiB: 128.00000	Copy: 429.475 MiB/s
1	Method: DUMB	Elapsed: 0.29807	MiB: 128.00000	Copy: 429.429 MiB/s
2	Method: DUMB	Elapsed: 0.29815	MiB: 128.00000	Copy: 429.310 MiB/s
3	Method: DUMB	Elapsed: 0.29800	MiB: 128.00000	Copy: 429.530 MiB/s
4	Method: DUMB	Elapsed: 0.31337	MiB: 128.00000	Copy: 408.458 MiB/s
5	Method: DUMB	Elapsed: 0.29805	MiB: 128.00000	Copy: 429.462 MiB/s
6	Method: DUMB	Elapsed: 0.29808	MiB: 128.00000	Copy: 429.411 MiB/s
7	Method: DUMB	Elapsed: 0.29801	MiB: 128.00000	Copy: 429.510 MiB/s
8	Method: DUMB	Elapsed: 0.29809	MiB: 128.00000	Copy: 429.403 MiB/s
9	Method: DUMB	Elapsed: 0.31339	MiB: 128.00000	Copy: 408.437 MiB/s
AVG	Method: DUMB	Elapsed: 0.30113	MiB: 128.00000	Copy: 425.072 MiB/s
0	Method: MCBLOCK	Elapsed: 0.21906	MiB: 128.00000	Copy: 584.317 MiB/s
1	Method: MCBLOCK	Elapsed: 0.21554	MiB: 128.00000	Copy: 593.852 MiB/s
2	Method: MCBLOCK	Elapsed: 0.21577	MiB: 128.00000	Copy: 593.238 MiB/s
3	Method: MCBLOCK	Elapsed: 0.21671	MiB: 128.00000	Copy: 590.646 MiB/s
4	Method: MCBLOCK	Elapsed: 0.21479	MiB: 128.00000	Copy: 595.942 MiB/s
5	Method: MCBLOCK	Elapsed: 0.23519	MiB: 128.00000	Copy: 544.232 MiB/s
6	Method: MCBLOCK	Elapsed: 0.21705	MiB: 128.00000	Copy: 589.734 MiB/s
7	Method: MCBLOCK	Elapsed: 0.59684	MiB: 128.00000	Copy: 214.464 MiB/s
8	Method: MCBLOCK	Elapsed: 0.21699	MiB: 128.00000	Copy: 589.889 MiB/s
9	Method: MCBLOCK	Elapsed: 0.21418	MiB: 128.00000	Copy: 597.642 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.25621	MiB: 128.00000	Copy: 499.589 MiB/s
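
Run 7 of the MCBLOCK series (214.464 MiB/s) is far below the other nine and pulls the reported average down. A median is less sensitive to a single outlier like that; a small sketch using the ten MCBLOCK rates listed above:

```python
import statistics

# MCBLOCK copy rates in MiB/s, runs 0..9, copied from the output above.
mcblock_mib_s = [584.317, 593.852, 593.238, 590.646, 595.942,
                 544.232, 589.734, 214.464, 589.889, 597.642]

mean_rate = statistics.mean(mcblock_mib_s)
median_rate = statistics.median(mcblock_mib_s)

print(f"mean of rates:   {mean_rate:.1f} MiB/s")    # 549.4
print(f"median of rates: {median_rate:.1f} MiB/s")  # 590.3
```

The AVG line printed by mbw (499.589 MiB/s) is lower still; it matches 128 MiB divided by the mean elapsed time rather than the mean of the per-run rates.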

Couldn't get lat_ctx to work.

mcuelenaere@kot:/dev/shm/lmbench3/bin/armv5tel-linux-gnu$ ./lat_syscall open
Simple open/close: 7.2754 microseconds
mcuelenaere@kot:/dev/shm/lmbench3/bin/armv5tel-linux-gnu$ ./lat_syscall open /dev/shm/lmbench3.tar
Simple open/close: 6.9399 microseconds

mcuelenaere@kot:/dev/shm/lmbench3/bin/armv5tel-linux-gnu$ ./lat_proc fork
Process fork+exit: 763.5714 microseconds

mcuelenaere@kot:/dev/shm/lmbench3/bin/armv5tel-linux-gnu$ cat /proc/cpuinfo
Processor	: Feroceon 88FR131 rev 1 (v5l)
BogoMIPS	: 1192.75
Features	: swp half thumb fastmult edsp 
CPU implementer	: 0x56
CPU architecture: 5TE
CPU variant	: 0x2
CPU part	: 0x131
CPU revision	: 1

Hardware	: Marvell SheevaPlug Reference Board
Revision	: 0000
Serial		: 0000000000000000


I'm not sure if I'm doing this right, but it looks like the Sheevaplug beats all ARM chips (except
on FP) on the tests done at [1]. Looks like these tests heavily depend on the clock frequency.

[1]: http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html
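
One way to take the clock frequency out of the picture is to convert the lat_ops latencies from nanoseconds to core cycles. A minimal sketch, assuming the SheevaPlug core runs at 1.2 GHz as mentioned earlier in the thread; the helper name is made up for this example.

```python
def ns_to_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to core clock cycles."""
    return latency_ns * clock_ghz


CLOCK_GHZ = 1.2  # assumption: nominal SheevaPlug core clock

# A few lat_ops results from above, in nanoseconds.
lat_ops_ns = {
    "integer div": 147.77,
    "integer mod": 36.94,
    "float add": 36.25,
    "double div": 542.06,
}

for op, ns in lat_ops_ns.items():
    print(f"{op}: {ns_to_cycles(ns, CLOCK_GHZ):.1f} cycles")
```

Comparing cycles per operation between boards shows how much of a difference is due to the core itself rather than to its clock.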

-- 
Maurus Cuelenaere


* Some benchmarks on ARM
  2010-07-05 13:04   ` Maurus Cuelenaere
@ 2010-07-05 13:23     ` Robert Schwebel
  2010-07-05 13:31       ` Mike Rapoport
  2010-07-06 14:02       ` Pavel Machek
  0 siblings, 2 replies; 29+ messages in thread
From: Robert Schwebel @ 2010-07-05 13:23 UTC (permalink / raw)
  To: linux-arm-kernel

Maurus,

On Mon, Jul 05, 2010 at 03:04:33PM +0200, Maurus Cuelenaere wrote:
> Some quick tests of lmbench on a Sheevaplug:

Thanks a lot - we'll update the document soon, with the exact command
lines included.

> I'm not sure if I'm doing this right, but it looks like the Sheevaplug
> beats all ARM chips (except on FP) on the tests done at [1]. Looks
> like these tests heavily depend on the clock frequency.

If anyone has an idea for better benchmarks, I'd be very interested. We
have been searching for benchmarks which:

- can be easily cross compiled to all involved platforms
- show the different aspects of the systems

rsc
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Some benchmarks on ARM
  2010-07-05 13:23     ` Robert Schwebel
@ 2010-07-05 13:31       ` Mike Rapoport
  2010-07-05 13:42         ` Robert Schwebel
  2010-07-05 13:53         ` Marek Vasut
  2010-07-06 14:02       ` Pavel Machek
  1 sibling, 2 replies; 29+ messages in thread
From: Mike Rapoport @ 2010-07-05 13:31 UTC (permalink / raw)
  To: linux-arm-kernel

Robert Schwebel wrote:
> Maurus,
> 
> On Mon, Jul 05, 2010 at 03:04:33PM +0200, Maurus Cuelenaere wrote:
>> Some quick tests of lmbench on a Sheevaplug:
> 
> Thanks a lot - we'll update the document soon, with the exact command
> lines included.
> 
>> I'm not sure if I'm doing this right, but it looks like the Sheevaplug
>> beats all ARM chips (except on FP) on the tests done at [1]. Looks
>> like these tests heavily depend on the clock frequency.
> 
> If anyone has an idea for better benchmarks, I'd be very interested. We
> have been searching for benchmarks which:
> 
> - can be easily cross compiled to all involved platforms
> - show the different aspects of the systems

Native kernel build? ;-)

> rsc


-- 
Sincerely yours,
Mike.


* Some benchmarks on ARM
  2010-07-05 13:31       ` Mike Rapoport
@ 2010-07-05 13:42         ` Robert Schwebel
  2010-07-05 14:15           ` Nicolas Pitre
  2010-07-05 13:53         ` Marek Vasut
  1 sibling, 1 reply; 29+ messages in thread
From: Robert Schwebel @ 2010-07-05 13:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 05, 2010 at 04:31:27PM +0300, Mike Rapoport wrote:
> Native kernel build? ;-)

Hmm, I don't have a native compiler for all platforms yet.

rsc
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

* Some benchmarks on ARM
  2010-07-05 13:31       ` Mike Rapoport
  2010-07-05 13:42         ` Robert Schwebel
@ 2010-07-05 13:53         ` Marek Vasut
  1 sibling, 0 replies; 29+ messages in thread
From: Marek Vasut @ 2010-07-05 13:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Monday, 5 July 2010 15:31:27, Mike Rapoport wrote:
> Robert Schwebel wrote:
> > Maurus,
> > 
> > On Mon, Jul 05, 2010 at 03:04:33PM +0200, Maurus Cuelenaere wrote:
> >> Some quick tests of lmbench on a Sheevaplug:
> > Thanks a lot - we'll update the document soon, with the exact command
> > lines included.
> > 
> >> I'm not sure if I'm doing this right, but it looks like the Sheevaplug
> >> beats all ARM chips (except on FP) on the tests done at [1]. Looks
> >> like these tests heavily depend on the clock frequency.
> > 
> > If anyone has an idea for better benchmarks, I'd be very interested. We
> > have been searching for benchmarks which:
> > 
> > - can be easily cross compiled to all involved platforms
> > - show the different aspects of the systems
> 
> Native kernel build? ;-)

About 7 hours on an XScale PXA270 with 256MB of SDRAM, with all XScale
platforms compiled in :)

> 
> > rsc

* Some benchmarks on ARM
  2010-07-05 12:24   ` Marc Kleine-Budde
@ 2010-07-05 14:00     ` Russell King - ARM Linux
  2010-07-05 15:14       ` Måns Rullgård
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King - ARM Linux @ 2010-07-05 14:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 05, 2010 at 02:24:03PM +0200, Marc Kleine-Budde wrote:
> We used a compiler generating hardware floating point instruction by
> default:
> 
> gcc was configured with:
> "--with-float=softfp --with-fpu=vfp --with-cpu=arm1136jf-s"

It is unclear whether this generates hardware floating point instructions.
The --with-float=softfp suggests it's using soft-fp.

Please check whether the resulting binaries are using hard-fp or soft-fp
by looking for VFP instructions.
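
One way to do this check is to disassemble a benchmark binary (for
example with the toolchain's "objdump -d") and count VFP mnemonics. The
following is a minimal sketch under stated assumptions, not a definitive
tool: the helper name count_vfp_instructions is hypothetical, the
mnemonic list is illustrative rather than exhaustive, and the sample
lines are hand-written in objdump's tab-separated layout.

```python
import re

# Illustrative (not exhaustive) VFP mnemonics: unified syntax such as
# "vadd.f64", and older names such as "faddd" or "fldd".
# Note: NEON integer forms like "vadd.i32" would also match; that does not
# matter on cores without NEON such as the ARM1136.
VFP_PATTERN = re.compile(
    r"^(v(add|sub|mul|div|mla|mls|ldr|str|mov|cvt|cmp|sqrt|neg|abs)"
    r"|f(add|sub|mul|div|mac|ld|st|cpy|cmp|sqrt|neg|abs)[sd])(\.|$)"
)

def count_vfp_instructions(disassembly: str) -> int:
    """Count lines whose mnemonic field looks like a VFP instruction."""
    count = 0
    for line in disassembly.splitlines():
        # Instruction lines look like: "addr:<TAB>encoding <TAB>mnemonic<TAB>operands"
        fields = line.split("\t")
        if len(fields) >= 3 and VFP_PATTERN.match(fields[2].strip()):
            count += 1
    return count

# Hand-written sample; the encodings are placeholders for illustration.
SAMPLE = (
    "    8394:\tee070a90 \tvmov\ts15, r0\n"
    "    8398:\tee377b06 \tvadd.f64\td7, d7, d6\n"
    "    839c:\te0810002 \tadd\tr0, r1, r2\n"
    "    83a0:\teb000010 \tbl\t83e8 <__aeabi_dadd>\n"
)

print(count_vfp_instructions(SAMPLE))
```

A count of zero, together with calls to library helpers such as
__aeabi_dadd, would point to software floating point; a non-zero count
shows that at least some hardware VFP code was generated.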

* Some benchmarks on ARM
  2010-07-05 13:42         ` Robert Schwebel
@ 2010-07-05 14:15           ` Nicolas Pitre
  2010-07-06  5:36             ` Mike Rapoport
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Pitre @ 2010-07-05 14:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 5 Jul 2010, Robert Schwebel wrote:

> On Mon, Jul 05, 2010 at 04:31:27PM +0300, Mike Rapoport wrote:
> > Native kernel build? ;-)
> 
> Hmm, I don't have a native compiler for all platforms yet.

Kernel compile is a really bad test in this context anyway, unless you 
compile the exact same kernel target using the same config with the SAME 
gcc version on all test machines, and using the same filesystem type and 
medium.

It is best to keep kernel compilation test for validation of 
improvements to a single system.
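
The conditions listed above can be written down as a small check that
is run before two kernel build times are compared. This is only a
sketch: the key names and the example values are hypothetical and were
chosen for illustration, they are not taken from any of the test
machines.

```python
# Conditions named above that must match across machines.
# The key names are hypothetical labels chosen for this sketch.
REQUIRED_KEYS = ("kernel_target", "kernel_config", "gcc_version",
                 "filesystem", "medium")

def mismatched_keys(setup_a: dict, setup_b: dict) -> list:
    """Return the conditions on which two build setups differ."""
    return [key for key in REQUIRED_KEYS
            if setup_a.get(key) != setup_b.get(key)]

# Made-up example values, not taken from the thread.
board_a = {"kernel_target": "zImage", "kernel_config": "defconfig-x",
           "gcc_version": "4.4.1", "filesystem": "ext3", "medium": "usb-disk"}
board_b = dict(board_a, gcc_version="4.3.2", medium="nfs")

print(mismatched_keys(board_a, board_b))
```

Only when the returned list is empty would the two build times be
comparable in the sense described above.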


Nicolas

* Some benchmarks on ARM
  2010-07-05 14:00     ` Russell King - ARM Linux
@ 2010-07-05 15:14       ` Måns Rullgård
  0 siblings, 0 replies; 29+ messages in thread
From: Måns Rullgård @ 2010-07-05 15:14 UTC (permalink / raw)
  To: linux-arm-kernel

Russell King - ARM Linux <linux@arm.linux.org.uk> writes:

> On Mon, Jul 05, 2010 at 02:24:03PM +0200, Marc Kleine-Budde wrote:
>> We used a compiler generating hardware floating point instruction by
>> default:
>> 
>> gcc was configured with:
>> "--with-float=softfp --with-fpu=vfp --with-cpu=arm1136jf-s"
>
> It is unclear whether this generates hardware floating point instructions.
> The --with-float=softfp suggests it's using soft-fp.

That configuration will generate hardware vfp instructions while using
softfloat calling conventions (arguments passed in integer registers).
A pure softfloat configuration would be --with-float=soft.

-- 
Måns Rullgård
mans at mansr.com

* Some benchmarks on ARM
  2010-07-05 14:15           ` Nicolas Pitre
@ 2010-07-06  5:36             ` Mike Rapoport
  0 siblings, 0 replies; 29+ messages in thread
From: Mike Rapoport @ 2010-07-06  5:36 UTC (permalink / raw)
  To: linux-arm-kernel

Nicolas Pitre wrote:
> On Mon, 5 Jul 2010, Robert Schwebel wrote:
> 
>> On Mon, Jul 05, 2010 at 04:31:27PM +0300, Mike Rapoport wrote:
>>> Native kernel build? ;-)
>> Hmm, I don't have a native compiler for all platforms yet.
> 
> Kernel compile is a really bad test in this context anyway, unless you 
> compile the exact same kernel target using the same config with the SAME 
> gcc version on all test machines, and using the same filesystem type and 
> medium.

It's quite possible with, say, Debian on USB disk...

> It is best to keep kernel compilation test for validation of 
> improvements to a single system.
> 
> 
> Nicolas


-- 
Sincerely yours,
Mike.

* Some benchmarks on ARM
  2010-07-05 13:23     ` Robert Schwebel
  2010-07-05 13:31       ` Mike Rapoport
@ 2010-07-06 14:02       ` Pavel Machek
  1 sibling, 0 replies; 29+ messages in thread
From: Pavel Machek @ 2010-07-06 14:02 UTC (permalink / raw)
  To: linux-arm-kernel

Hi!

> > I'm not sure if I'm doing this right, but it looks like the Sheevaplug
> > beats all ARM chips (except on FP) on the tests done at [1]. Looks
> > like these tests heavily depend on the clock frequency.
> 
> If anyone has an idea for better benchmarks, I'd be very interested. We
> have been searching for benchmarks which:
> 
> - can be easily cross compiled to all involved platforms
> - show the different aspects of the systems

You tested stuff like syscalls/fork latencies... but usually most of
the time is spent in kernel. What about pure userlevel computing
stuff, like mp3 encoding (lame), maybe video transcoding, maybe grep
could be benchmarked, and maybe factor is worth benchmarking
(http://pavelmachek.livejournal.com/77425.html)...
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Some benchmarks on ARM
  2010-07-02 18:02 Some benchmarks on ARM Robert Schwebel
                   ` (4 preceding siblings ...)
  2010-07-05  8:51 ` Colin Tuckley
@ 2010-07-29 16:54 ` Robert Schwebel
  2010-07-30 10:19   ` Richard Cochran
  2010-08-19  5:36 ` shiraz hashim
  6 siblings, 1 reply; 29+ messages in thread
From: Robert Schwebel @ 2010-07-29 16:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 02, 2010 at 08:02:57PM +0200, Robert Schwebel wrote:
> We have recently made some benchmarks, in order to get a little bit
> better feeling about where ARM cpus are today, especially when it
> comes to the "recent" ones, and in comparison to the Atom.

Thanks to everyone who posted feedback!

An updated version of the article is now here:
http://www.pengutronix.de/development/kernel/arm-benchmarks-20100729_en.html

Changes:

- all kernels ported to 2.6.34
- Atom without -rt
- all kernel configs available
- all benchmark commandlines added
- more info about compilers and generated code
- info about memory types and bus widths

rsc
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

* Some benchmarks on ARM
  2010-07-29 16:54 ` Robert Schwebel
@ 2010-07-30 10:19   ` Richard Cochran
  2010-07-30 11:40     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Cochran @ 2010-07-30 10:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 29, 2010 at 06:54:13PM +0200, Robert Schwebel wrote:
> Thanks to everyone who posted feedback!
> 
> An updated version of the article is now here:
> http://www.pengutronix.de/development/kernel/arm-benchmarks-20100729_en.html

Nice report.

It would be interesting for me if you could give the FCSE patch a try
on the v5 machines. Any chance of that happening?

Thanks,
Richard

* Some benchmarks on ARM
  2010-07-30 10:19   ` Richard Cochran
@ 2010-07-30 11:40     ` Gilles Chanteperdrix
  0 siblings, 0 replies; 29+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-30 11:40 UTC (permalink / raw)
  To: linux-arm-kernel

Richard Cochran wrote:
> On Thu, Jul 29, 2010 at 06:54:13PM +0200, Robert Schwebel wrote:
>> Thanks to everyone who posted feedback!
>>
>> An updated version of the article is now here:
>> http://www.pengutronix.de/development/kernel/arm-benchmarks-20100729_en.html
> 
> Nice report.
> 
> It would be interesting for me if you could give the FCSE patch a try
> on the v5 machines. Any chance of that happening?

As I already said, I am very suspicious about the results of this
benchmark on PXA. We get user-space scheduling latencies under 300us on
PXA with Xenomai, and as you know, the worst case user-space scheduling
latency includes a context switch, so, this means that the context
switch is less than 300us. However, these benchmarks show some context
switches around 600us, so I suspect the measurement measures more than
just a context switch, maybe the execution of a long standing interrupt
or more probably a soft irq.
The gain induced by the FCSE patch is between 50 and 100us on the
machines where we measured it, so, it will not make a big difference on
a context switch time of 600us.
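
The arithmetic behind that last point can be sketched as follows, using
only the figures quoted in this message (all in microseconds). The
function name relative_gain is made up for illustration.

```python
# Figures quoted in this message, all in microseconds.
measured_switch_us = 600.0     # context switch time shown by the benchmark
fcse_gain_us = (50.0, 100.0)   # range of gains measured with the FCSE patch

def relative_gain(gain_us: float, baseline_us: float) -> float:
    """Fraction of the baseline that a fixed gain removes."""
    return gain_us / baseline_us

low, high = (relative_gain(g, measured_switch_us) for g in fcse_gain_us)
print(f"{low:.1%} to {high:.1%}")
```

Against a 600us baseline the patch removes well under a fifth of the
measured time, which is why it cannot explain the gap to the sub-300us
figure seen with Xenomai.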

Anyway, I have not worked on the FCSE patch for 2.6.34, I was waiting
for 2.6.35 to be released to work on the two at a time, but if anyone is
interested, I can get it working before that.

-- 
					    Gilles.

* Some benchmarks on ARM
  2010-07-02 18:02 Some benchmarks on ARM Robert Schwebel
                   ` (5 preceding siblings ...)
  2010-07-29 16:54 ` Robert Schwebel
@ 2010-08-19  5:36 ` shiraz hashim
  2010-08-19  6:28   ` Robert Schwebel
  6 siblings, 1 reply; 29+ messages in thread
From: shiraz hashim @ 2010-08-19  5:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robert,

On Fri, Jul 2, 2010 at 11:32 PM, Robert Schwebel
<r.schwebel@pengutronix.de> wrote:
> Hi,
>
> We have recently made some benchmarks, in order to get a little bit
> better feeling about where ARM cpus are today, especially when it comes
> to the "recent" ones, and in comparison to the Atom. So we collected a
> few benchmarks (most from lmbench) and did some actual measurements.
>
> Here is a little article:
> http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html

Thanks for this link, it was quite informative. Could you also please
share the detailed report of the lmbench tests which you ran on these
platforms? For example, the average CPU bandwidth figures (as in the
bw_mem test) do not show the significant impact of caches and DDR.

-- 
regards
Shiraz Hashim

* Some benchmarks on ARM
  2010-08-19  5:36 ` shiraz hashim
@ 2010-08-19  6:28   ` Robert Schwebel
  2010-08-19  7:10     ` shiraz hashim
  0 siblings, 1 reply; 29+ messages in thread
From: Robert Schwebel @ 2010-08-19  6:28 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, Aug 19, 2010 at 11:06:04AM +0530, shiraz hashim wrote:
> > Here is a little article:
> > http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html
>
> Thanks for this link, it was quite informative. Can you also please
> share the detailed report of lmbench tests which you run on these
> platforms. For example, average cpu bandwidth figures (as in bw_mem
> test) are not giving information on significant impact of caches and
> DDR.

Please check the updated version of the article - the page you've linked
above contains a link at the top. The new version contains all the
commands which have been used.

rsc
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

* Some benchmarks on ARM
  2010-08-19  6:28   ` Robert Schwebel
@ 2010-08-19  7:10     ` shiraz hashim
  0 siblings, 0 replies; 29+ messages in thread
From: shiraz hashim @ 2010-08-19  7:10 UTC (permalink / raw)
  To: linux-arm-kernel

Thanks Robert,

On Thu, Aug 19, 2010 at 11:58 AM, Robert Schwebel
<r.schwebel@pengutronix.de> wrote:
> Hi,
>
> On Thu, Aug 19, 2010 at 11:06:04AM +0530, shiraz hashim wrote:
>> > Here is a little article:
>> > http://www.pengutronix.de/development/kernel/arm-benchmarks-20100702_en.html
>>
>> Thanks for this link, it was quite informative. Can you also please
>> share the detailed report of lmbench tests which you run on these
>> platforms. For example, average cpu bandwidth figures (as in bw_mem
>> test) are not giving information on significant impact of caches and
>> DDR.
>
> Please check the updated version of the article - the page you've linked
> above contains a link at the top. The new version contains all the
> commands which have been used.

Now I see it, and it is much clearer.

-- 
regards
Shiraz Hashim

* Some benchmarks on ARM
@ 2010-07-30 14:47 Tomasz Stanislawski
  0 siblings, 0 replies; 29+ messages in thread
From: Tomasz Stanislawski @ 2010-07-30 14:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 29, 2010 at 06:54:13PM +0200, Robert Schwebel wrote:
> Thanks to everyone who posted feedback!
> 
> An updated version of the article is now here:
> http://www.pengutronix.de/development/kernel/arm-benchmarks-20100729_en.html

Thank you for the interesting analysis. I have run your tests on an S5PC110
(HummingBird). I cannot tell what the memory type is. The results fall in line
with the BeagleBoard results once clock scaling is taken into account, except
for test 3 (context switch), which is three times as fast for a 60% clock
advantage. There are also minor differences in the memory-related tests
(bandwidth and fork). All of these appear to be caused by the memory subsystem.
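
The clock-scaling comparison for test 3 can be sketched as below. It
assumes that performance would otherwise scale linearly with clock
frequency, which is itself a simplification; the function name
residual_speedup is made up for illustration, and the two inputs are
the "three times as fast" and "60% clock advantage" figures from the
paragraph above.

```python
def residual_speedup(observed_speedup: float, clock_ratio: float) -> float:
    """Speedup that remains after dividing out the clock frequency ratio."""
    return observed_speedup / clock_ratio

# Test 3: three times as fast, with a 60% higher clock (ratio 1.6).
leftover = residual_speedup(3.0, 1.6)
print(f"{leftover:.3f}")
```

A value close to 1 would mean the clock alone explains the difference;
anything clearly above 1 has to come from somewhere else, for example
the memory subsystem.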

LMBench was compiled with the command:
make CC="arm-linux-gnueabi-gcc -O2 -march=armv7-a -mtune=cortex-a8 -mfloat-abi=softfp -mfpu=neon"
The compiler was vanilla GCC 4.4.1.

LMBench version: 3.0-a9

Environment:
Debian:~/lmbench/bin/i686-pc-linux-gnu# uname -a
Linux Debian 2.6.34-rc6 #2 PREEMPT Fri Jul 30 15:31:57 CEST 2010 armv7l GNU/Linux
Debian:~/lmbench/bin/i686-pc-linux-gnu# cat /proc/cpuinfo

Processor       : ARMv7 Processor rev 2 (v7l)
BogoMIPS        : 797.90
Features        : swp half thumb fastmult vfp edsp neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc08
CPU revision    : 2

----------------------------------------------
Test 1
Debian:~/lmbench/bin/# lat_ops -W 100 -N 100
Result: 12.52
Comments: The latency of a single instruction is closely related to the depth
of the CPU's pipeline and as such is not a reliable indicator of usable
performance. The mean IPC is more closely related to the processor's speed.

----------------------------------------------
Test 2
Result: 340.92

----------------------------------------------
Test 3
Result: 11.24

----------------------------------------------
Test 4
Result: 6.8816

----------------------------------------------
Test 5
Result: 780.1429

Best regards,
Tomasz Stanislawski
Samsung Poland R&D Center

Thread overview: 29+ messages
2010-07-02 18:02 Some benchmarks on ARM Robert Schwebel
2010-07-02 20:34 ` Magnus Lilja
2010-07-05 12:24   ` Marc Kleine-Budde
2010-07-05 14:00     ` Russell King - ARM Linux
2010-07-05 15:14       ` Måns Rullgård
2010-07-03  5:44 ` Nicolas Pitre
2010-07-05 13:04   ` Maurus Cuelenaere
2010-07-05 13:23     ` Robert Schwebel
2010-07-05 13:31       ` Mike Rapoport
2010-07-05 13:42         ` Robert Schwebel
2010-07-05 14:15           ` Nicolas Pitre
2010-07-06  5:36             ` Mike Rapoport
2010-07-05 13:53         ` Marek Vasut
2010-07-06 14:02       ` Pavel Machek
2010-07-03 19:48 ` Baruch Siach
2010-07-03 20:08 ` Gilles Chanteperdrix
2010-07-03 20:28   ` Russell King - ARM Linux
2010-07-04  9:47     ` Gilles Chanteperdrix
2010-07-05  8:51 ` Colin Tuckley
2010-07-05 12:29   ` Marc Kleine-Budde
2010-07-05 12:41   ` Marc Kleine-Budde
2010-07-05 12:45     ` Marc Kleine-Budde
2010-07-29 16:54 ` Robert Schwebel
2010-07-30 10:19   ` Richard Cochran
2010-07-30 11:40     ` Gilles Chanteperdrix
2010-08-19  5:36 ` shiraz hashim
2010-08-19  6:28   ` Robert Schwebel
2010-08-19  7:10     ` shiraz hashim
2010-07-30 14:47 Tomasz Stanislawski
