* lib/GCD.c regression on arm
From: Cheah Kok Cheong @ 2016-07-15 13:51 UTC
To: linux-arm-kernel

Commit fff7fb0b2d90 ("lib/GCD.c: use binary GCD algorithm instead of Euclidean")
replaced the Euclidean algorithm entirely with the binary algorithm.
Two variants are provided, selected via Kconfig depending on whether
a fast __ffs (find least significant set bit) instruction is available.
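
The two variants can be sketched in C as below. This is an illustrative
sketch of the binary GCD technique, not the code in lib/gcd.c: GCC/Clang's
__builtin_ctzl stands in for the kernel's __ffs, the names gcd_ffs,
gcd_evenodd, gcd_sketch and swap_ul are hypothetical, and
CONFIG_CPU_NO_EFFICIENT_FFS is used as a plain preprocessor macro in place
of the Kconfig symbol.

```c
/* Hypothetical helper: swap two unsigned longs. */
static void swap_ul(unsigned long *x, unsigned long *y)
{
	unsigned long t = *x;

	*x = *y;
	*y = t;
}

/*
 * Binary GCD using a find-first-set primitive. __builtin_ctzl stands in
 * for __ffs; both are undefined for 0, so only non-zero values are passed.
 */
static unsigned long gcd_ffs(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;	/* lowest set bit of r = common 2^k */

	if (!a || !b)
		return r;		/* gcd(x, 0) == x */

	b >>= __builtin_ctzl(b);	/* make b odd in one step */
	if (b == 1)
		return r & -r;

	for (;;) {
		a >>= __builtin_ctzl(a);	/* make a odd in one step */
		if (a == 1)
			return r & -r;
		if (a == b)
			return a << __builtin_ctzl(r);	/* restore 2^k */
		if (a < b)
			swap_ul(&a, &b);
		a -= b;			/* odd - odd: even and non-zero */
	}
}

/*
 * Even/odd variant without find-first-set: the common factor 2^k stays
 * in both operands and extra factors of two are shifted out one bit at
 * a time.
 */
static unsigned long gcd_evenodd(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	r &= -r;			/* isolate 2^k */

	while (!(b & r))
		b >>= 1;
	if (b == r)
		return r;

	for (;;) {
		while (!(a & r))
			a >>= 1;
		if (a == r)
			return r;
		if (a == b)
			return a;
		if (a < b)
			swap_ul(&a, &b);
		a -= b;			/* a = 2^(k+1) * m */
		a >>= 1;		/* a = 2^k * m */
		if (a & r)		/* m odd: use (m + b_odd) / 2 instead */
			a += b;
		a >>= 1;
	}
}

/* Build-time selection, mirroring the Kconfig switch. */
static unsigned long gcd_sketch(unsigned long a, unsigned long b)
{
#ifdef CONFIG_CPU_NO_EFFICIENT_FFS
	return gcd_evenodd(a, b);
#else
	return gcd_ffs(a, b);
#endif
}
```

Building with -DCONFIG_CPU_NO_EFFICIENT_FFS makes gcd_sketch() use the
even/odd variant, which mirrors what selecting CPU_NO_EFFICIENT_FFS in
Kconfig would do; gcd_sketch(12, 18) returns 6 either way.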

For ARMv5 and above, the fast __ffs variant is used, as can be seen in
arch/arm/mm/Kconfig.
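
ARMv5 introduced the CLZ (count leading zeros) instruction, which is what a
fast __ffs is generally built on there: isolating the lowest set bit with
x & -x and counting leading zeros gives its index in a fixed number of
steps. A sketch of that idea next to a bit-at-a-time fallback, assuming
GCC/Clang's __builtin_clzl (which the compiler can typically lower to CLZ
on ARMv5 and later); ffs_clz and ffs_loop are hypothetical names, and this
is not the arch/arm implementation.

```c
#include <limits.h>

#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Index of the least significant set bit via a leading-zero count:
 * x & -x isolates the lowest set bit, whose index is (bits - 1) - clz.
 * Undefined for x == 0, like the kernel's __ffs.
 */
static unsigned int ffs_clz(unsigned long x)
{
	return (unsigned int)(ULONG_BITS - 1 - __builtin_clzl(x & -x));
}

/* Fallback without a CLZ-like instruction: test one bit per iteration. */
static unsigned int ffs_loop(unsigned long x)
{
	unsigned int n = 0;

	while (!(x & 1)) {
		x >>= 1;
		n++;
	}
	return n;
}
```

Both return the same index for any non-zero input (ffs_clz(12) and
ffs_loop(12) are both 2); the difference is that the loop's cost grows
with the number of trailing zeros, which is the situation
CPU_NO_EFFICIENT_FFS describes.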

I benchmarked gcd performance on a Cortex-A9 based MediaTek MT6577,
using the code provided in the commit. I used three settings, with
three runs each.
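
For anyone repeating the comparison on other boards, a minimal timing loop
might look like the sketch below. It is not the benchmark program from the
commit (its -r/-n options are not reproduced); gcd_euclid, time_gcd and
fill_inputs are hypothetical, and clock() measures processor time, so
numbers are only comparable on the same machine with the same build flags.

```c
#include <stddef.h>
#include <time.h>

typedef unsigned long (*gcd_fn)(unsigned long, unsigned long);

/* Division-based Euclidean algorithm, the baseline in the comparison. */
static unsigned long gcd_euclid(unsigned long a, unsigned long b)
{
	while (b) {
		unsigned long t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Run fn over n input pairs, `rounds` times. Results are summed into
 * *checksum so the calls cannot be discarded, and so two implementations
 * can be cross-checked. Returns elapsed processor time in clock() ticks.
 */
static clock_t time_gcd(gcd_fn fn, const unsigned long *a,
			const unsigned long *b, size_t n,
			unsigned int rounds, unsigned long *checksum)
{
	unsigned long sum = 0;
	clock_t start = clock();

	for (unsigned int r = 0; r < rounds; r++)
		for (size_t i = 0; i < n; i++)
			sum += fn(a[i], b[i]);

	*checksum = sum;
	return clock() - start;
}

/* Fill the inputs from a simple deterministic linear congruential sequence. */
static void fill_inputs(unsigned long *a, unsigned long *b, size_t n)
{
	unsigned long x = 12345;

	for (size_t i = 0; i < n; i++) {
		x = x * 1103515245UL + 12345UL;
		a[i] = x;
		x = x * 1103515245UL + 12345UL;
		b[i] = x;
	}
}
```

Passing any other function with the same signature, such as a binary GCD
implementation, times it the same way; equal checksums over the same
inputs show that the two implementations agree.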

The binary algorithm with fast __ffs is slower than the Euclidean
algorithm, while the non-ffs version (the even/odd variant) performs
comparably to the Euclidean algorithm.

It will be interesting to see whether this also holds for other ARMv5
and later platforms; hopefully others will do some testing. If it does,
we should add "select CPU_NO_EFFICIENT_FFS" to our Kconfig.

Thanks.
Best Regards,
Cheah

Cross-compiled with -O2; three runs per setting.

gcd -r 50000 -n 10

              Euclidean   Binary (ffs)   Binary (no ffs)
run 1  gcd0       25766          25766             25765
       gcd1       19994          20224             19843
       gcd2       20071          20533             20151
       gcd3       20070          20380             19919
       gcd4       20148          20610             20151
       result      PASS           PASS              PASS

run 2  gcd0       26690          26612             24381
       gcd1       20224          20379             19765
       gcd2       20224          20304             19842
       gcd3       20148          20302             19919
       gcd4       20301          20302             19919
       result      PASS           PASS              PASS

run 3  gcd0       25842          26459             25457
       gcd1       20454          20532             20225
       gcd2       20378          20762             20226
       gcd3       20378          20378             20148
       gcd4       20532          20918             20301
       result      PASS           PASS              PASS

gcd -r 1000 -n 100

              Euclidean   Binary (ffs)   Binary (no ffs)
run 1  gcd0      245873         252957            245571
       gcd1      191290         198345            192513
       gcd2      192672         199579            192978
       gcd3      191366         198728            192283
       gcd4      193134         200884            193669
       result      PASS           PASS              PASS

run 2  gcd0      245180         251113            250573
       gcd1      191755         196800            194729
       gcd2      192286         198654            195574
       gcd3      191601         197344            194965
       gcd4      193135         200268            197037
       result      PASS           PASS              PASS

run 3  gcd0      243412         252189            247876
       gcd1      190447         197192            193355
       gcd2      192288         199042            193437
       gcd3      190755         198957            193660
       gcd4      192672         200346            194586
       result      PASS           PASS              PASS

gcd -n 1000

              Euclidean   Binary (ffs)   Binary (no ffs)
run 1  gcd0     2636655        2701340           2622109
       gcd1     2055411        2153446           2053342
       gcd2     2064420        2162496           2066503
       gcd3     2055151        2163201           2055161
       gcd4     2071591        2171636           2074488
       result      PASS           PASS              PASS

run 2  gcd0     2636512        2719436           2613575
       gcd1     2060157        2159284           2046187
       gcd2     2069242        2163944           2056430
       gcd3     2060436        2166796           2046933
       gcd4     2074188        2176243           2065170
       result      PASS           PASS              PASS

run 3  gcd0     2614949        2708342           2632962
       gcd1     2044957        2157985           2055475
       gcd2     2054496        2170720           2068926
       gcd3     2044838        2167954           2055305
       gcd4     2059033        2176002           2079856
       result      PASS           PASS              PASS

From: Jisheng Zhang @ 2016-07-18 12:15 UTC
To: linux-arm-kernel
Dear Cheah,
On Fri, 15 Jul 2016 21:51:10 +0800 Cheah Kok Cheong wrote:
> Commit fff7fb0b2d90 ("lib/GCD.c: use binary GCD algorithm instead of Euclidean")
> replaced the Euclidean algorithm totally with the Binary algorithm.
> Two variants were provided and selected via Kconfig depending on whether
> a fast __ffs (find least significant set bit) instruction is available.
>
> For arm v5 and above the fast __ffs version is used as evident in
> arch/arm/mm/Kconfig.
>
> I benchmarked the gcd performance using the code provided in the commit
> with a Cortex-A9 based Mediatek MT6577. Three runs at different settings
> were used.
>
> The performance with fast __ffs Binary algo is slower than the Euclidean
> algo. Using the non ffs version [even/odd variant] gives a comparable
> performance as the Euclidean algo.

Interesting. Using the code in the commit, I get the following results
on a Cortex-A53 platform.

Built with the aarch64 toolchain, -O2 -mcpu=cortex-a53:
~ # /a53 -r 500000 -n 10
gcd0: elapsed 10170
gcd1: elapsed 11340
gcd2: elapsed 13590
gcd3: elapsed 11700
gcd4: elapsed 14230
PASS

Built with the armhf toolchain, -O2 -mcpu=cortex-a53:
~ # /a53_32 -r 500000 -n 10
gcd0: elapsed 9490
gcd1: elapsed 10220
gcd2: elapsed 10790
gcd3: elapsed 10270
gcd4: elapsed 10850
PASS

From: Cheah Kok Cheong @ 2016-07-19 6:52 UTC
To: linux-arm-kernel
Dear Jisheng,
It looks like you have found a different kind of problem on arm64.
That's a big hit in the 64-bit build.