linux-arm-kernel.lists.infradead.org archive mirror
* lib/GCD.c regression on arm
@ 2016-07-15 13:51 Cheah Kok Cheong
  2016-07-18 12:15 ` Jisheng Zhang
From: Cheah Kok Cheong @ 2016-07-15 13:51 UTC
  To: linux-arm-kernel

Commit fff7fb0b2d90 ("lib/GCD.c: use binary GCD algorithm instead of Euclidean")
completely replaced the Euclidean algorithm with the binary algorithm.
Two variants are provided, selected via Kconfig depending on whether
a fast __ffs (find least significant set bit) instruction is available.

For ARMv5 and above, the fast __ffs version is used, as is evident from
arch/arm/mm/Kconfig.
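
For readers unfamiliar with the variants, here is a minimal user-space sketch
of the three approaches being compared. It is not the kernel's lib/gcd.c: the
function names are made up for illustration, __builtin_ctzl (a GCC/Clang
builtin) stands in for the kernel's __ffs(), and the shift-loop version is a
simplified form that may differ in detail from the kernel's even/odd variant.

```c
/* Euclidean algorithm: one division (remainder) per step. */
static unsigned long gcd_euclid(unsigned long a, unsigned long b)
{
	while (b) {
		unsigned long t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/*
 * Binary GCD, "fast ffs" flavour: trailing zeros are stripped in one
 * step. __builtin_ctzl is an assumed stand-in for the kernel's __ffs().
 */
static unsigned long gcd_binary_ffs(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;			/* gcd(x, 0) == x */

	b >>= __builtin_ctzl(b);		/* make b odd */
	for (;;) {
		a >>= __builtin_ctzl(a);	/* make a odd */
		if (a == b)
			return a << __builtin_ctzl(r);	/* restore common 2s */
		if (a < b) {
			unsigned long t = a;
			a = b;
			b = t;
		}
		a -= b;				/* odd - odd: even, nonzero */
	}
}

/*
 * Binary GCD without ffs: trailing zeros are stripped one bit at a
 * time. The common power of two is kept in place, so no final shift
 * is needed.
 */
static unsigned long gcd_binary_noffs(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	r &= -r;				/* lowest set bit of a | b */

	while (!(b & r))
		b >>= 1;
	for (;;) {
		while (!(a & r))
			a >>= 1;
		if (a == b)
			return a;
		if (a < b) {
			unsigned long t = a;
			a = b;
			b = t;
		}
		a -= b;
	}
}
```

All three return the same value for any pair of inputs; the only difference
is how much work each iteration does, which is what the benchmark measures.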

I benchmarked gcd performance using the code provided in the commit,
on a Cortex-A9 based MediaTek MT6577. I did three runs at each of three
different settings.
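
A timing harness in the spirit of such a benchmark could look like the sketch
below. This is not the program from the commit: gcd_fn, gcd_euclid_ref,
time_gcd, the input generator and the use of clock() are all assumptions made
for illustration.

```c
#include <time.h>

typedef unsigned long (*gcd_fn)(unsigned long, unsigned long);

/* Reference implementation; any gcd variant with this signature can be timed. */
static unsigned long gcd_euclid_ref(unsigned long a, unsigned long b)
{
	while (b) {
		unsigned long t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/*
 * Call fn on `pairs` pseudo-random input pairs and return the elapsed
 * CPU time in clock() ticks. *sum receives a checksum of the results so
 * the compiler cannot discard the calls and so that different variants
 * can be cross-checked against each other.
 */
static clock_t time_gcd(gcd_fn fn, unsigned long pairs, unsigned long *sum)
{
	unsigned long x = 12345, acc = 0;
	clock_t start = clock();

	for (unsigned long i = 0; i < pairs; i++) {
		unsigned long a, b;

		/* Simple LCG; the seed is fixed so every variant sees the same inputs. */
		x = x * 1664525UL + 1013904223UL;
		a = x;
		x = x * 1664525UL + 1013904223UL;
		b = x;
		acc += fn(a, b);
	}
	*sum = acc;
	return clock() - start;
}
```

Running time_gcd() once per variant with the same pair count, and comparing
both the tick counts and the checksums, gives the kind of per-variant
"elapsed" figures shown in the results.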

The binary algorithm with fast __ffs is slower than the Euclidean
algorithm. The non-ffs version [even/odd variant] performs comparably
to the Euclidean algorithm.

It will be interesting to see whether this also holds on other platforms
with ARMv5 and above. Hopefully others will do some testing.
If it does, then we should "select CPU_NO_EFFICIENT_FFS" in our
Kconfig.

Thanks.
Best Regards,
Cheah

Cross-compiled with '-O2'.

Euclidean                 Binary with ffs           Binary no ffs


gcd -r 50000 -n 10        

gcd0: elapsed 25766       gcd0: elapsed 25766       gcd0: elapsed 25765
gcd1: elapsed 19994       gcd1: elapsed 20224       gcd1: elapsed 19843
gcd2: elapsed 20071       gcd2: elapsed 20533       gcd2: elapsed 20151
gcd3: elapsed 20070       gcd3: elapsed 20380       gcd3: elapsed 19919
gcd4: elapsed 20148       gcd4: elapsed 20610       gcd4: elapsed 20151
PASS                      PASS                      PASS
           
gcd0: elapsed 26690       gcd0: elapsed 26612       gcd0: elapsed 24381
gcd1: elapsed 20224       gcd1: elapsed 20379       gcd1: elapsed 19765
gcd2: elapsed 20224       gcd2: elapsed 20304       gcd2: elapsed 19842
gcd3: elapsed 20148       gcd3: elapsed 20302       gcd3: elapsed 19919
gcd4: elapsed 20301       gcd4: elapsed 20302       gcd4: elapsed 19919
PASS                      PASS                      PASS
                                         
gcd0: elapsed 25842       gcd0: elapsed 26459       gcd0: elapsed 25457
gcd1: elapsed 20454       gcd1: elapsed 20532       gcd1: elapsed 20225
gcd2: elapsed 20378       gcd2: elapsed 20762       gcd2: elapsed 20226
gcd3: elapsed 20378       gcd3: elapsed 20378       gcd3: elapsed 20148
gcd4: elapsed 20532       gcd4: elapsed 20918       gcd4: elapsed 20301
PASS                      PASS                      PASS


gcd -r 1000 -n 100
                                            
gcd0: elapsed 245873      gcd0: elapsed 252957      gcd0: elapsed 245571
gcd1: elapsed 191290      gcd1: elapsed 198345      gcd1: elapsed 192513
gcd2: elapsed 192672      gcd2: elapsed 199579      gcd2: elapsed 192978
gcd3: elapsed 191366      gcd3: elapsed 198728      gcd3: elapsed 192283
gcd4: elapsed 193134      gcd4: elapsed 200884      gcd4: elapsed 193669
PASS                      PASS                      PASS

gcd0: elapsed 245180      gcd0: elapsed 251113      gcd0: elapsed 250573
gcd1: elapsed 191755      gcd1: elapsed 196800      gcd1: elapsed 194729
gcd2: elapsed 192286      gcd2: elapsed 198654      gcd2: elapsed 195574
gcd3: elapsed 191601      gcd3: elapsed 197344      gcd3: elapsed 194965
gcd4: elapsed 193135      gcd4: elapsed 200268      gcd4: elapsed 197037
PASS                      PASS                      PASS

gcd0: elapsed 243412      gcd0: elapsed 252189      gcd0: elapsed 247876
gcd1: elapsed 190447      gcd1: elapsed 197192      gcd1: elapsed 193355
gcd2: elapsed 192288      gcd2: elapsed 199042      gcd2: elapsed 193437
gcd3: elapsed 190755      gcd3: elapsed 198957      gcd3: elapsed 193660
gcd4: elapsed 192672      gcd4: elapsed 200346      gcd4: elapsed 194586
PASS                      PASS                      PASS


gcd -n 1000

gcd0: elapsed 2636655     gcd0: elapsed 2701340     gcd0: elapsed 2622109
gcd1: elapsed 2055411     gcd1: elapsed 2153446     gcd1: elapsed 2053342
gcd2: elapsed 2064420     gcd2: elapsed 2162496     gcd2: elapsed 2066503
gcd3: elapsed 2055151     gcd3: elapsed 2163201     gcd3: elapsed 2055161
gcd4: elapsed 2071591     gcd4: elapsed 2171636     gcd4: elapsed 2074488
PASS                      PASS                      PASS

gcd0: elapsed 2636512     gcd0: elapsed 2719436     gcd0: elapsed 2613575
gcd1: elapsed 2060157     gcd1: elapsed 2159284     gcd1: elapsed 2046187
gcd2: elapsed 2069242     gcd2: elapsed 2163944     gcd2: elapsed 2056430
gcd3: elapsed 2060436     gcd3: elapsed 2166796     gcd3: elapsed 2046933
gcd4: elapsed 2074188     gcd4: elapsed 2176243     gcd4: elapsed 2065170
PASS                      PASS                      PASS

gcd0: elapsed 2614949     gcd0: elapsed 2708342     gcd0: elapsed 2632962
gcd1: elapsed 2044957     gcd1: elapsed 2157985     gcd1: elapsed 2055475
gcd2: elapsed 2054496     gcd2: elapsed 2170720     gcd2: elapsed 2068926
gcd3: elapsed 2044838     gcd3: elapsed 2167954     gcd3: elapsed 2055305
gcd4: elapsed 2059033     gcd4: elapsed 2176002     gcd4: elapsed 2079856
PASS                      PASS                      PASS
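
To put numbers on the comparison, the relative difference can be computed
from the gcd1 row of the first run at each setting. The sketch below does
that arithmetic; pct_slower and print_gcd1_overhead are helper names made up
for this illustration, and picking gcd1 and the first run is an arbitrary
choice.

```c
#include <stdio.h>

/* Percentage by which `other` exceeds `base` (negative means faster). */
static double pct_slower(double base, double other)
{
	return (other - base) / base * 100.0;
}

/* gcd1, first run of each setting: Euclidean, binary with ffs, binary no ffs. */
static const unsigned long gcd1_first_run[3][3] = {
	{   19994,   20224,   19843 },	/* gcd -r 50000 -n 10 */
	{  191290,  198345,  192513 },	/* gcd -r 1000 -n 100 */
	{ 2055411, 2153446, 2053342 },	/* gcd -n 1000 */
};

/* Print the difference of both binary variants relative to Euclidean. */
static void print_gcd1_overhead(void)
{
	for (int i = 0; i < 3; i++)
		printf("with ffs: %+.2f%%   no ffs: %+.2f%%\n",
		       pct_slower(gcd1_first_run[i][0], gcd1_first_run[i][1]),
		       pct_slower(gcd1_first_run[i][0], gcd1_first_run[i][2]));
}
```

On these figures the ffs variant comes out roughly 1% to 5% slower than
Euclidean, while the non-ffs variant stays within about 1% either way.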


* lib/GCD.c regression on arm
  2016-07-15 13:51 lib/GCD.c regression on arm Cheah Kok Cheong
@ 2016-07-18 12:15 ` Jisheng Zhang
  2016-07-19  6:52   ` Cheah Kok Cheong
From: Jisheng Zhang @ 2016-07-18 12:15 UTC
  To: linux-arm-kernel

Dear Cheah,

On Fri, 15 Jul 2016 21:51:10 +0800 Cheah Kok Cheong wrote:

> Commit fff7fb0b2d90 ("lib/GCD.c: use binary GCD algorithm instead of Euclidean")
> completely replaced the Euclidean algorithm with the binary algorithm.
> Two variants are provided, selected via Kconfig depending on whether
> a fast __ffs (find least significant set bit) instruction is available.
> 
> For ARMv5 and above, the fast __ffs version is used, as is evident from
> arch/arm/mm/Kconfig.
> 
> I benchmarked gcd performance using the code provided in the commit,
> on a Cortex-A9 based MediaTek MT6577. I did three runs at each of three
> different settings.
> 
> The binary algorithm with fast __ffs is slower than the Euclidean
> algorithm. The non-ffs version [even/odd variant] performs comparably
> to the Euclidean algorithm.

Interesting. Using the code in the commit, I get the following results
on a Cortex-A53 (CA53) platform.

Built with the aarch64 toolchain, -O2 -mcpu=cortex-a53:

~ # /a53 -r 500000 -n 10
gcd0: elapsed 10170
gcd1: elapsed 11340
gcd2: elapsed 13590
gcd3: elapsed 11700
gcd4: elapsed 14230
PASS

Built with the armhf toolchain, -O2 -mcpu=cortex-a53:

~ # /a53_32 -r 500000 -n 10
gcd0: elapsed 9490
gcd1: elapsed 10220
gcd2: elapsed 10790
gcd3: elapsed 10270
gcd4: elapsed 10850
PASS




* lib/GCD.c regression on arm
  2016-07-18 12:15 ` Jisheng Zhang
@ 2016-07-19  6:52   ` Cheah Kok Cheong
From: Cheah Kok Cheong @ 2016-07-19  6:52 UTC
  To: linux-arm-kernel

Dear Jisheng,

It looks like you have found a different kind of problem on arm64.
That's a big hit in the 64-bit build.

On Mon, Jul 18, 2016 at 08:15:49PM +0800, Jisheng Zhang wrote:
> Dear Cheah,
> 
> Interesting. Using the code in the commit, I get the following results
> on a Cortex-A53 (CA53) platform.
> 
> Built with the aarch64 toolchain, -O2 -mcpu=cortex-a53:
> 
> ~ # /a53 -r 500000 -n 10
> gcd0: elapsed 10170
> gcd1: elapsed 11340
> gcd2: elapsed 13590
> gcd3: elapsed 11700
> gcd4: elapsed 14230
> PASS
> 
> Built with the armhf toolchain, -O2 -mcpu=cortex-a53:
> 
> ~ # /a53_32 -r 500000 -n 10
> gcd0: elapsed 9490
> gcd1: elapsed 10220
> gcd2: elapsed 10790
> gcd3: elapsed 10270
> gcd4: elapsed 10850
> PASS
> 

