[Xenomai] Building with hard float: cannot open shared object file libpthread


* [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
@ 2015-03-17 16:19 Steve B
  2015-03-17 18:06 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 32+ messages in thread
From: Steve B @ 2015-03-17 16:19 UTC (permalink / raw)
  To: Xenomai

Hello all,

It has come to my attention that when compiling with arm-linux-gnueabi-gcc
my application is using a libm that seems to be doing soft floating point
routines when I really want to be using hard float.
I built a simple test application for now that does just one math operation
and measures the amount of time taken.

Switching my compiler to arm-linux-gnueabihf and adding the -mhard-float
flag, I get this error immediately at runtime:
error while loading shared libraries: libpthread_rt.so.1: cannot open
shared object file: No such file or directory

If I take away the -mhard-float flag, the program runs but the one math
operation (with a corner case argument that I found to take a long time to
compute) takes about 3000 microseconds to run.
If I put the -mhard-float flag back in and take away Xenomai and run as a
regular Linux application, the same operation takes around 60 microseconds.

I also tried building the Xenomai on my target system with the
arm-linux-gnueabihf compiler and the -mhard-float flag at configure time,
but this didn't fix the problem.
Has anybody run across this before?

Thanks,

Steve


* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-17 16:19 [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1 Steve B
@ 2015-03-17 18:06 ` Gilles Chanteperdrix
  2015-03-17 18:33   ` Steve B
  0 siblings, 1 reply; 32+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-17 18:06 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Tue, Mar 17, 2015 at 09:19:29AM -0700, Steve B wrote:
> Hello all,
> 
> It has come to my attention that when compiling with arm-linux-gnueabi-gcc
> my application is using a libm that seems to be doing soft floating point
> routines when I really want to be using hard float.
> I built a simple test application for now that does just one math operation
> and measures the amount of time taken.
> 
> Switching my compiler to arm-linux-gnueabihf and adding the -mhard-float
> flag, I get this error immediately at runtime:
> error while loading shared libraries: libpthread_rt.so.1: cannot open
> shared object file: No such file or directory
> 
> If I take away the -mhard-float flag, the program runs but the one math
> operation (with a corner case argument that I found to take a long time to
> compute) takes about 3000 microseconds to run.
> If I put the -mhard-float flag back in and take away Xenomai and run as a
> regular Linux application, the same operation takes around 60 microseconds.
> 
> I also tried building the Xenomai on my target system with the
> arm-linux-gnueabihf compiler and the -mhard-float flag at configure time,
> but this didn't fix the problem.
> Has anybody run across this before?

The problem is probably that the libc/libm you are using is not
using hard floats, the loader will refuse to load a program using
hard floats with a soft float library. An intermediate solution may
be to use hardware floating point with soft float ABI, by passing
-mfloat-abi=softfp to gcc.

You do not really get to choose the floating points you can use with
a given toolchain, the possibilities are fixed when the toolchain is
compiled.

-- 
					    Gilles.



* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-17 18:06 ` Gilles Chanteperdrix
@ 2015-03-17 18:33   ` Steve B
  2015-03-17 18:38     ` Gilles Chanteperdrix
  2015-03-17 19:24     ` Lennart Sorensen
  0 siblings, 2 replies; 32+ messages in thread
From: Steve B @ 2015-03-17 18:33 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Tue, Mar 17, 2015 at 11:06 AM, Gilles Chanteperdrix <
gilles.chanteperdrix@xenomai.org> wrote:

> On Tue, Mar 17, 2015 at 09:19:29AM -0700, Steve B wrote:
> > Hello all,
> >
> > It has come to my attention that when compiling with
> arm-linux-gnueabi-gcc
> > my application is using a libm that seems to be doing soft floating point
> > routines when I really want to be using hard float.
> > I built a simple test application for now that does just one math
> operation
> > and measures the amount of time taken.
> >
> > Switching my compiler to arm-linux-gnueabihf and adding the -mhard-float
> > flag, I get this error immediately at runtime:
> > error while loading shared libraries: libpthread_rt.so.1: cannot open
> > shared object file: No such file or directory
> >
> > If I take away the -mhard-float flag, the program runs but the one math
> > operation (with a corner case argument that I found to take a long time
> to
> > compute) takes about 3000 microseconds to run.
> > If I put the -mhard-float flag back in and take away Xenomai and run as a
> > regular Linux application, the same operation takes around 60
> microseconds.
> >
> > I also tried building the Xenomai on my target system with the
> > arm-linux-gnueabihf compiler and the -mhard-float flag at configure time,
> > but this didn't fix the problem.
> > Has anybody run across this before?
>
> The problem is probably that the libc/libm you are using is not
> using hard floats, the loader will refuse to load a program using
> hard floats with a soft float library. An intermediate solution may
> be to use hardware floating point with soft float ABI, by passing
> -mfloat-abi=softfp to gcc.
>
> You do not really get to choose the floating points you can use with
> a given toolchain, the possibilities are fixed when the toolchain is
> compiled.
>
> --
>                                             Gilles.
>

Thanks! This actually doesn't work at compile time for some reason. The ld
step complains that the output binary uses VFP register arguments while my
main.o file does not... does anybody have any hints on this?

If I do -mfloat-abi=hard I can compile and run but my math operation still
takes too long, which is counter-intuitive to me...

Steve


* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-17 18:33   ` Steve B
@ 2015-03-17 18:38     ` Gilles Chanteperdrix
       [not found]       ` <CAEMXjGzZn3JWCsxAkC+dFL0tLWk_FZpsNzB=YkSHYzCS2QEKmA@mail.gmail.com>
  2015-03-17 19:24     ` Lennart Sorensen
  1 sibling, 1 reply; 32+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-17 18:38 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Tue, Mar 17, 2015 at 11:33:58AM -0700, Steve B wrote:
> On Tue, Mar 17, 2015 at 11:06 AM, Gilles Chanteperdrix <
> gilles.chanteperdrix@xenomai.org> wrote:
> 
> > On Tue, Mar 17, 2015 at 09:19:29AM -0700, Steve B wrote:
> > > Hello all,
> > >
> > > It has come to my attention that when compiling with
> > arm-linux-gnueabi-gcc
> > > my application is using a libm that seems to be doing soft floating point
> > > routines when I really want to be using hard float.
> > > I built a simple test application for now that does just one math
> > operation
> > > and measures the amount of time taken.
> > >
> > > Switching my compiler to arm-linux-gnueabihf and adding the -mhard-float
> > > flag, I get this error immediately at runtime:
> > > error while loading shared libraries: libpthread_rt.so.1: cannot open
> > > shared object file: No such file or directory
> > >
> > > If I take away the -mhard-float flag, the program runs but the one math
> > > operation (with a corner case argument that I found to take a long time
> > to
> > > compute) takes about 3000 microseconds to run.
> > > If I put the -mhard-float flag back in and take away Xenomai and run as a
> > > regular Linux application, the same operation takes around 60
> > microseconds.
> > >
> > > I also tried building the Xenomai on my target system with the
> > > arm-linux-gnueabihf compiler and the -mhard-float flag at configure time,
> > > but this didn't fix the problem.
> > > Has anybody run across this before?
> >
> > The problem is probably that the libc/libm you are using is not
> > using hard floats, the loader will refuse to load a program using
> > hard floats with a soft float library. An intermediate solution may
> > be to use hardware floating point with soft float ABI, by passing
> > -mfloat-abi=softfp to gcc.
> >
> > You do not really get to choose the floating points you can use with
> > a given toolchain, the possibilities are fixed when the toolchain is
> > compiled.
> >
> > --
> >                                             Gilles.
> >
> 
> Thanks! This actually doesn't work at compile time for some reason. The ld
> step complains that the output binary uses VFP register arguments while my
> main.o file does not... does anybody have any hints on this?
> 
> If I do mfloat-abi=hard I can compile and run but my math operation still
> takes too long, which is counter-intuitive to me...

As I said, you do not really get to choose any option with a given
toolchain. A toolchain, and the libraries it contains support only one
floating point type (or two, if it is using soft floating point
ABI), not the three possibilities.

What is the operation that takes long ?

-- 
					    Gilles.



* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
       [not found]       ` <CAEMXjGzZn3JWCsxAkC+dFL0tLWk_FZpsNzB=YkSHYzCS2QEKmA@mail.gmail.com>
@ 2015-03-17 19:18         ` Gilles Chanteperdrix
  0 siblings, 0 replies; 32+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-17 19:18 UTC (permalink / raw)
  To: Steve B

On Tue, Mar 17, 2015 at 12:13:07PM -0700, Steve B wrote:
> On Tue, Mar 17, 2015 at 11:38 AM, Gilles Chanteperdrix <
> gilles.chanteperdrix@xenomai.org> wrote:
> 
> > On Tue, Mar 17, 2015 at 11:33:58AM -0700, Steve B wrote:
> > > On Tue, Mar 17, 2015 at 11:06 AM, Gilles Chanteperdrix <
> > > gilles.chanteperdrix@xenomai.org> wrote:
> > >
> > > > On Tue, Mar 17, 2015 at 09:19:29AM -0700, Steve B wrote:
> > > > > Hello all,
> > > > >
> > > > > It has come to my attention that when compiling with
> > > > arm-linux-gnueabi-gcc
> > > > > my application is using a libm that seems to be doing soft floating
> > point
> > > > > routines when I really want to be using hard float.
> > > > > I built a simple test application for now that does just one math
> > > > operation
> > > > > and measures the amount of time taken.
> > > > >
> > > > > Switching my compiler to arm-linux-gnueabihf and adding the
> > -mhard-float
> > > > > flag, I get this error immediately at runtime:
> > > > > error while loading shared libraries: libpthread_rt.so.1: cannot open
> > > > > shared object file: No such file or directory
> > > > >
> > > > > If I take away the -mhard-float flag, the program runs but the one
> > math
> > > > > operation (with a corner case argument that I found to take a long
> > time
> > > > to
> > > > > compute) takes about 3000 microseconds to run.
> > > > > If I put the -mhard-float flag back in and take away Xenomai and run
> > as a
> > > > > regular Linux application, the same operation takes around 60
> > > > microseconds.
> > > > >
> > > > > I also tried building the Xenomai on my target system with the
> > > > > arm-linux-gnueabihf compiler and the -mhard-float flag at configure
> > time,
> > > > > but this didn't fix the problem.
> > > > > Has anybody run across this before?
> > > >
> > > > The problem is probably that the libc/libm you are using is not
> > > > using hard floats, the loader will refuse to load a program using
> > > > hard floats with a soft float library. An intermediate solution may
> > > > be to use hardware floating point with soft float ABI, by passing
> > > > -mfloat-abi=softfp to gcc.
> > > >
> > > > You do not really get to choose the floating points you can use with
> > > > a given toolchain, the possibilities are fixed when the toolchain is
> > > > compiled.
> > > >
> > > > --
> > > >                                             Gilles.
> > > >
> > >
> > > Thanks! This actually doesn't work at compile time for some reason. The
> > ld
> > > step complains that the output binary uses VFP register arguments while
> > my
> > > main.o file does not... does anybody have any hints on this?
> > >
> > > If I do mfloat-abi=hard I can compile and run but my math operation still
> > > takes too long, which is counter-intuitive to me...
> >
> > As I said, you do not really get to choose any option with a given
> > toolchain. A toolchain, and the libraries it contains support only one
> > floating point type (or two, if it is using soft floating point
> > ABI), not the three possibilities.
> >
> > What is the operation that takes long ?
> >
> > --
> >                                             Gilles.
> >
> 
> It's a pow() function. I found in my actual application there are some
> times when it takes several hundred microseconds, so I captured some of the
> arguments causing it to take so long and built a quick test program to just
> call it once with those arguments.
> I will keep at it, I'm sure there must be a solution somewhere...

Sorry to insist, but, to change the type of floating points, you
need to change toolchain.

There are three ways for floating points:
1- soft floating point
2- hard floating point with soft ABI (float function arguments passed
in integer registers)
3- hard floating point with hard ABI.

The compiler can generally generate the three types, but the
libraries (libc, libm) shipped with a compiler are of one type.
Generally, binaries of one type can not be linked with those of
another type, except for types 1 and 2.
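
The three types above correspond to gcc's -mfloat-abi=soft, softfp and
hard. As a rough sketch of the linking rule (the FloatMode class and the
can_link helper are made up for illustration and are not part of any
toolchain), compatibility comes down to whether float arguments travel
in FPU registers:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatMode:
    """Illustrative model of one -mfloat-abi setting (hypothetical type)."""
    name: str
    uses_fpu: bool          # does the compiler emit FPU instructions?
    args_in_fpu_regs: bool  # are float arguments passed in VFP registers?


SOFT = FloatMode("soft", uses_fpu=False, args_in_fpu_regs=False)     # type 1
SOFTFP = FloatMode("softfp", uses_fpu=True, args_in_fpu_regs=False)  # type 2
HARD = FloatMode("hard", uses_fpu=True, args_in_fpu_regs=True)       # type 3


def can_link(a: FloatMode, b: FloatMode) -> bool:
    """Objects can be mixed only if they agree on how floats are passed."""
    return a.args_in_fpu_regs == b.args_in_fpu_regs


if __name__ == "__main__":
    for a, b in [(SOFT, SOFTFP), (SOFTFP, HARD), (SOFT, HARD)]:
        print(a.name, "+", b.name, "->", "ok" if can_link(a, b) else "mismatch")
```

Under this model types 1 and 2 mix freely because they share the integer
register calling convention, while type 3 stands alone.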

Now, what I did not get yet is: what type are your libraries ? Is
this the soft float pow or the hard float that is slow ?

-- 
					    Gilles.



* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-17 18:33   ` Steve B
  2015-03-17 18:38     ` Gilles Chanteperdrix
@ 2015-03-17 19:24     ` Lennart Sorensen
  2015-03-17 19:57       ` Steve B
  1 sibling, 1 reply; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-17 19:24 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Tue, Mar 17, 2015 at 11:33:58AM -0700, Steve B wrote:
> Thanks! This actually doesn't work at compile time for some reason. The ld
> step complains that the output binary uses VFP register arguments while my
> main.o file does not... does anybody have any hints on this?
> 
> If I do mfloat-abi=hard I can compile and run but my math operation still
> takes too long, which is counter-intuitive to me...

Your entire system (all libs you use, etc) and your program all have to
be compiled with the same ABI.  So either you use gnueabi or gnueabihf.
You don't get to mix and match.

So if you use debian or ubuntu armhf, then your system is gnueabihf,
and if you use armel, then your system is gnueabi (which uses softfloat
for argument passing).

Of course if you use a multiarch system you could have libs built and
installed for both methods at the same time and each program would load
the libraries that matched its ABI.  That's probably too complicated to
deal with in general though.

But if your libc and such are built with gnueabi, then you can not build
a single program with gnueabihf unless it requires no libraries from
the system at all (so a statically linked binary could still run as long
as the kernel is built to run both ABIs).
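
One way to check which convention a given binary or library was built
for is to look at the flags in its ELF header. The sketch below is only
an illustration, not a replacement for readelf: it assumes a 32-bit ARM
EABI version 5 file and a toolchain recent enough to set the float-ABI
bits in e_flags (older ones may leave them clear, in which case it
reports "unknown"). The function names are invented here.

```python
import struct

EM_ARM = 40                       # e_machine value for 32-bit ARM
EF_ARM_EABIMASK = 0xFF000000      # EABI version field inside e_flags
EF_ARM_EABI_VER5 = 0x05000000
EF_ARM_ABI_FLOAT_SOFT = 0x200     # float args in integer registers
EF_ARM_ABI_FLOAT_HARD = 0x400     # float args in VFP registers


def arm_float_abi(header: bytes) -> str:
    """Classify the first 40 bytes of an ELF file by ARM float ABI."""
    if len(header) < 40 or header[:4] != b"\x7fELF":
        return "not-elf"
    if header[4] != 1:            # EI_CLASS: 1 means ELFCLASS32
        return "not-arm32"
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 means little endian
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    if e_machine != EM_ARM:
        return "not-arm32"
    (e_flags,) = struct.unpack_from(endian + "I", header, 36)
    if (e_flags & EF_ARM_EABIMASK) != EF_ARM_EABI_VER5:
        return "unknown"
    if e_flags & EF_ARM_ABI_FLOAT_HARD:
        return "hard"
    if e_flags & EF_ARM_ABI_FLOAT_SOFT:
        return "soft"
    return "unknown"


def file_float_abi(path: str) -> str:
    """Convenience wrapper: read the header of the file at `path`."""
    with open(path, "rb") as f:
        return arm_float_abi(f.read(40))
```

Note that "soft" here covers both soft and softfp builds, since those
two share the same calling convention. Running file_float_abi over the
program and every library it loads should give the same answer for all
of them.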

ARM has unfortunately gone through a number of ABIs over the years that
are not entirely compatible (especially when it comes to floating point
where they are pretty much entirely incompatible).

-- 
Len Sorensen


* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-17 19:24     ` Lennart Sorensen
@ 2015-03-17 19:57       ` Steve B
  2015-03-17 20:02         ` Gilles Chanteperdrix
  2015-03-17 21:34         ` Lennart Sorensen
  0 siblings, 2 replies; 32+ messages in thread
From: Steve B @ 2015-03-17 19:57 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Xenomai

On Tue, Mar 17, 2015 at 12:24 PM, Lennart Sorensen <
lsorense@csclub.uwaterloo.ca> wrote:

> On Tue, Mar 17, 2015 at 11:33:58AM -0700, Steve B wrote:
> > Thanks! This actually doesn't work at compile time for some reason. The
> ld
> > step complains that the output binary uses VFP register arguments while
> my
> > main.o file does not... does anybody have any hints on this?
> >
> > If I do mfloat-abi=hard I can compile and run but my math operation still
> > takes too long, which is counter-intuitive to me...
>
> Your entire system (all libs you use, etc) and your program all have to
> be compiled with the same ABI.  So either you use gnueabi or gnueabihf.
> You don't get to mix and match.
>
> So if you use debian or ubuntu armhf, then your system is gnueabihf,
> and if you use armel, then your system is gnueabi (which uses softfloat
> for argument passing).
>
> Of course if you use a multiarch system you could have libs built and
> installed for both methods at the same time and each program would load
> the libraries that matched its ABI.  That's probably too complicated to
> deal with in general though.
>
> But if your libc and such are built with gnueabi, then you can not build
> a single program with gnueabihf unless it requires no libraries from
> the system at all (so a staticly linked binary could still run as long
> as the kernel is built to run both ABIs).
>
> ARM has unfortunately gone through a number of ABIs over the years that
> are not entirely compatible (especially when it comes to floating point
> where they are pretty much entirely incompatible).
>
> --
> Len Sorensen
>

Thanks guys,
Originally some other engineers on my project specified that we use
Angstrom Linux which seems to only work with gnueabi.
I noticed that the pow() function was causing problems so I looked at the
disassembly of the libm.so with that toolchain and it looks suspiciously
like soft floating point routines.

So I'm trying to run my simple test program on a Debian distribution which
is built with gnueabihf. Seems like the issue is that if I use the
-mhard-float option, I can't run with Xenomai, even if I built the Xenomai
libraries with -mhard-float enabled.
If I don't add in -mhard-float when I compile the program, the pow()
function still takes much too long... if I enable -mhard-float and take
away the Xenomai libs, then the pow() function executes in a reasonable
amount of time.
I guess there still may be something wrong in making sure everything is
built the same.

Hopefully that makes more sense?


* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-17 19:57       ` Steve B
@ 2015-03-17 20:02         ` Gilles Chanteperdrix
  2015-03-17 21:34         ` Lennart Sorensen
  1 sibling, 0 replies; 32+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-17 20:02 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Tue, Mar 17, 2015 at 12:57:43PM -0700, Steve B wrote:
> On Tue, Mar 17, 2015 at 12:24 PM, Lennart Sorensen <
> lsorense@csclub.uwaterloo.ca> wrote:
> 
> > On Tue, Mar 17, 2015 at 11:33:58AM -0700, Steve B wrote:
> > > Thanks! This actually doesn't work at compile time for some reason. The
> > ld
> > > step complains that the output binary uses VFP register arguments while
> > my
> > > main.o file does not... does anybody have any hints on this?
> > >
> > > If I do mfloat-abi=hard I can compile and run but my math operation still
> > > takes too long, which is counter-intuitive to me...
> >
> > Your entire system (all libs you use, etc) and your program all have to
> > be compiled with the same ABI.  So either you use gnueabi or gnueabihf.
> > You don't get to mix and match.
> >
> > So if you use debian or ubuntu armhf, then your system is gnueabihf,
> > and if you use armel, then your system is gnueabi (which uses softfloat
> > for argument passing).
> >
> > Of course if you use a multiarch system you could have libs built and
> > installed for both methods at the same time and each program would load
> > the libraries that matched its ABI.  That's probably too complicated to
> > deal with in general though.
> >
> > But if your libc and such are built with gnueabi, then you can not build
> > a single program with gnueabihf unless it requires no libraries from
> > the system at all (so a staticly linked binary could still run as long
> > as the kernel is built to run both ABIs).
> >
> > ARM has unfortunately gone through a number of ABIs over the years that
> > are not entirely compatible (especially when it comes to floating point
> > where they are pretty much entirely incompatible).
> >
> > --
> > Len Sorensen
> >
> 
> Thanks guys,
> Originally some other engineers on my project specified that we use
> Angstrom Linux which seems to only work with gnueabi.
> I noticed that the pow() function was causing problems so I looked at the
> disassembly of the libm.so with that toolchain and it looks suspiciously
> like soft floating point routines.
> 
> So I'm trying to run my simple test program on a Debian distribution which
> is built with gnueabihf. Seems like the issue is that if I use the
> -mhard-float option, I can't run with Xenomai, even if I built the Xenomai
> libraries with -mhard-float enabled.
> If I don't add in -mhard-float when I compile the program, the pow()
> function still takes much too long... if I enable -mhard-float and take
> away the Xenomai libs, then the pow() function executes in a reasonable
> amount of time.
> I guess there still may be something wrong in making sure everything is
> built the same.
> 
> Hopefully that makes more sense?

With a hard float toolchain, you normally do not need to pass
-mhard-float, it should always generate hard float code. Are you
sure you do not have some other flags on the compiler command line?

FWIW, xenomai has been tested for years on omap3, omap4, and now
at91sama5d3 with hard float toolchains.

-- 
					    Gilles.



* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-17 19:57       ` Steve B
  2015-03-17 20:02         ` Gilles Chanteperdrix
@ 2015-03-17 21:34         ` Lennart Sorensen
  2015-03-19  0:42           ` Steve B
  1 sibling, 1 reply; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-17 21:34 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Tue, Mar 17, 2015 at 12:57:43PM -0700, Steve B wrote:
> Originally some other engineers on my project specified that we use
> Angstrom Linux which seems to only work with gnueabi.
> I noticed that the pow() function was causing problems so I looked at the
> disassembly of the libm.so with that toolchain and it looks suspiciously
> like soft floating point routines.
> 
> So I'm trying to run my simple test program on a Debian distribution which
> is built with gnueabihf. Seems like the issue is that if I use the
> -mhard-float option, I can't run with Xenomai, even if I built the Xenomai
> libraries with -mhard-float enabled.
> If I don't add in -mhard-float when I compile the program, the pow()
> function still takes much too long... if I enable -mhard-float and take
> away the Xenomai libs, then the pow() function executes in a reasonable
> amount of time.
> I guess there still may be something wrong in making sure everything is
> built the same.
> 
> Hopefully that makes more sense?

Xenomai runs fine on Debian with hard float, as long as your whole system
is built that way.  No specifying of -mhard-float, just using what the
tool chain of the system does by default, and of course on debian armhf,
that is hard float.

-- 
Len Sorensen



* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-17 21:34         ` Lennart Sorensen
@ 2015-03-19  0:42           ` Steve B
  2015-03-19 14:07             ` Lennart Sorensen
  0 siblings, 1 reply; 32+ messages in thread
From: Steve B @ 2015-03-19  0:42 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Xenomai

On Tue, Mar 17, 2015 at 2:34 PM, Lennart Sorensen <
lsorense@csclub.uwaterloo.ca> wrote:

> On Tue, Mar 17, 2015 at 12:57:43PM -0700, Steve B wrote:
> > Originally some other engineers on my project specified that we use
> > Angstrom Linux which seems to only work with gnueabi.
> > I noticed that the pow() function was causing problems so I looked at the
> > disassembly of the libm.so with that toolchain and it looks suspiciously
> > like soft floating point routines.
> >
> > So I'm trying to run my simple test program on a Debian distribution
> which
> > is built with gnueabihf. Seems like the issue is that if I use the
> > -mhard-float option, I can't run with Xenomai, even if I built the
> Xenomai
> > libraries with -mhard-float enabled.
> > If I don't add in -mhard-float when I compile the program, the pow()
> > function still takes much too long... if I enable -mhard-float and take
> > away the Xenomai libs, then the pow() function executes in a reasonable
> > amount of time.
> > I guess there still may be something wrong in making sure everything is
> > built the same.
> >
> > Hopefully that makes more sense?
>
> Xenomai runs fine on Debian with hard float, as long as your whole system
> is built that way.  No specifying of -mhard-float, just using what the
> tool chain of the system does by default, and of course on debian armhf,
> that is hard float.
>
> --
> Len Sorensen
>

Hello again, all.
It turns out that when I ran without Xenomai and with the -mhard-float
option enabled, the pow() function was only returning one of the original
arguments as a result, and thus wasn't working properly!

Without the -mhard-float option in my compile, it seems that the gnueabihf
case actually has worse timing than the regular gnueabi, which is kind of
puzzling. This is definitely not a Xenomai issue though so I will check
with some other software folks on site and see if they have any thoughts.
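
Since the fast run turned out to be returning a wrong value, a timing
test of this kind is more trustworthy if it also checks the result. A
minimal sketch of that idea follows, written in Python for brevity (the
test program in this thread was a C application, and its actual
arguments were never posted, so the values used below are placeholders).
The names time_pow and check_pow are hypothetical.

```python
import math
import time


def time_pow(base: float, exponent: float, repeats: int = 1000):
    """Call pow() repeatedly; return (last result, worst time in microseconds)."""
    worst_ns = 0
    result = float("nan")
    for _ in range(repeats):
        start = time.perf_counter_ns()
        result = math.pow(base, exponent)
        elapsed = time.perf_counter_ns() - start
        worst_ns = max(worst_ns, elapsed)
    return result, worst_ns / 1000.0


def check_pow(base: float, exponent: float, rel_tol: float = 1e-9):
    """Time pow() and compare it with exp(exponent * log(base)), for base > 0."""
    result, worst_us = time_pow(base, exponent)
    reference = math.exp(exponent * math.log(base))
    return math.isclose(result, reference, rel_tol=rel_tol), worst_us


if __name__ == "__main__":
    # Placeholder arguments, not the corner case from the thread.
    ok, worst_us = check_pow(1.0000001, 12345.678)
    print("result plausible:", ok, "worst case (us):", worst_us)
```

A run that is fast but fails the comparison, as happened here, points at
a broken build rather than a quick pow().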

Thanks,

Steve


* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19  0:42           ` Steve B
@ 2015-03-19 14:07             ` Lennart Sorensen
  2015-03-19 14:40               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-19 14:07 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Wed, Mar 18, 2015 at 05:42:02PM -0700, Steve B wrote:
> Hello again, all.
> It turns out that when I ran without Xenomai and with the -mhard-float
> option enabled, the pow() function was only returning one of the original
> arguments as a result, and thus wasn't working properly!
> 
> Without the -mhard-float option in my compile, it seems that the gnueabihf
> case actually has worse timing than the regular gnueabi, which is kind of
> puzzling. This is definitely not a Xenomai issue though so I will check
> with some other software folks on site and see if they have any thoughts.

Well it is possible the FPU is so bad at that function, that the software
implementation using integers only can be faster.

Which CPU are you running this on?  I remember the Cortex-A8 is known for
having a horribly slow FPU (neon code generally runs quite a bit faster,
which isn't true of later Cortex-A designs, where the FPU is usually
faster than neon or at least equal).

-- 
Len Sorensen



* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 14:07             ` Lennart Sorensen
@ 2015-03-19 14:40               ` Gilles Chanteperdrix
  2015-03-19 15:59                 ` Lennart Sorensen
  0 siblings, 1 reply; 32+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-19 14:40 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Xenomai

On Thu, Mar 19, 2015 at 10:07:03AM -0400, Lennart Sorensen wrote:
> On Wed, Mar 18, 2015 at 05:42:02PM -0700, Steve B wrote:
> > Hello again, all.
> > It turns out that when I ran without Xenomai and with the -mhard-float
> > option enabled, the pow() function was only returning one of the original
> > arguments as a result, and thus wasn't working properly!
> > 
> > Without the -mhard-float option in my compile, it seems that the gnueabihf
> > case actually has worse timing than the regular gnueabi, which is kind of
> > puzzling. This is definitely not a Xenomai issue though so I will check
> > with some other software folks on site and see if they have any thoughts.
> 
> Well it is possible the FPU is so bad at that function, that the software
> implementation using intergers only can be faster.
> 
> Which CPU are you running this on?  I remember the Cortex-A8 is known for
> having a horribly slow FPU (neon code generally runs quite a bit faster,
> which isn't true of later Cortex-A designs, where the FPU is usually
> faster than neon or at least equal).

I have used cortex a8 to do some moderate calculations, but only
operating on 2Mpixel images got me near the millisecond. Doing
things like polynomial fit was taking microseconds.

How is pow implemented anyway, exp(a * ln(b)) or is there a smarter
solution ? Is b an integer in your case ? What are the values of a
and b ?
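
To make the question concrete: with exp(a * ln(b)), b is the base and a
the exponent. The sketch below shows that textbook formula next to
repeated squaring, which is one way an integral exponent could be
special-cased. Neither function is claimed to match what glibc does, and
treating the integer case as an integral exponent is an assumption made
for this illustration; the function names are made up.

```python
import math


def pow_via_exp_log(base: float, exponent: float) -> float:
    """Textbook pow for base > 0: exp(exponent * ln(base))."""
    if base <= 0.0:
        raise ValueError("this sketch only handles base > 0")
    return math.exp(exponent * math.log(base))


def pow_by_squaring(base: float, n: int) -> float:
    """pow for an integer exponent using only multiplications."""
    if n < 0:
        return 1.0 / pow_by_squaring(base, -n)
    result = 1.0
    while n:
        if n & 1:          # this bit of the exponent is set
            result *= base
        base *= base       # square for the next bit
        n >>= 1
    return result


if __name__ == "__main__":
    print(pow_via_exp_log(2.0, 10.0), pow_by_squaring(2.0, 10))
```

Repeated squaring needs roughly log2(n) multiplications and no calls to
exp or log, which matters if those calls are the slow part.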

-- 
					    Gilles.



* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 14:40               ` Gilles Chanteperdrix
@ 2015-03-19 15:59                 ` Lennart Sorensen
  2015-03-19 16:04                   ` Gilles Chanteperdrix
  0 siblings, 1 reply; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-19 15:59 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Thu, Mar 19, 2015 at 03:40:41PM +0100, Gilles Chanteperdrix wrote:
> I have used cortex a8 to do some moderate calculations, but only
> operating on 2Mpixel images got me near the millisecond. Doing
> things like polynomial fit was taking microseconds.
> 
> How is pow implemented anyway, exp(a * ln(b)) or is there a smarter
> solution ? Is b an integer in your case ? What are the values of a
> and b ?

Well from what I have read, the VFP FPU on the Cortex-A8 is 1/10 the
speed of the VFP FPU in the Cortex-A9 and newer.  So very very slow.

Looking at glibc-2.19 in ./sysdeps/ieee754/dbl-64/e_pow.c the
implementation involves a lot of floating point calculations.  It's 130
lines long, and I am not having much luck understanding it.  It certainly
has lots of checking that values aren't too small or too big and would
cause under or over flow, and various other checks.  It calls log1, and
exp and various other functions which may themselves be pretty big too.
So on a slow FPU like a Cortex-A8, it looks like pow() could be very slow.

-- 
Len Sorensen


* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 15:59                 ` Lennart Sorensen
@ 2015-03-19 16:04                   ` Gilles Chanteperdrix
  2015-03-19 16:43                     ` Lennart Sorensen
  0 siblings, 1 reply; 32+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-19 16:04 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Xenomai

On Thu, Mar 19, 2015 at 11:59:39AM -0400, Lennart Sorensen wrote:
> On Thu, Mar 19, 2015 at 03:40:41PM +0100, Gilles Chanteperdrix wrote:
> > I have used cortex a8 to do some moderate calculations, but only
> > operating on 2Mpixel images got me near the millisecond. Doing
> > things like polynomial fit was taking microseconds.
> > 
> > How is pow implemented anyway, exp(a * ln(b)) or is there a smarter
> > solution ? Is b an integer in your case ? What are the values of a
> > and b ?
> 
> Well from what I have read, the VFP FPU on the Cortex-A8 is 1/10 the
> speed of the VFP FPU in the Cortex-A9 and newer.  So very very slow.
> 
> Looking at glibc-2.19 in ./sysdeps/ieee754/dbl-64/e_pow.c the
> implementation involves a lot of floating point calculations.  It's 130
> lines long, and I am not having much luck understanding it.  It certainly
> has lots of checking that values aren't too small or too big and would
> cause under or over flow, and various other checks.  It calls log1, and
> exp and various other functions which may themselves be pretty big too.
> So on a slow FPU like a Cortex-A8, it looks like pow() could be very slow.

My point was, there may be some pathological values for which it is
better to use expm1 or log1p instead of log or exp, and so to avoid pow
entirely. Also, I am not sure pow optimizes the case where b is an
integer. It would be interesting to know the actual values of a and
b which cause pow to explode.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 16:04                   ` Gilles Chanteperdrix
@ 2015-03-19 16:43                     ` Lennart Sorensen
  2015-03-19 16:48                       ` Gilles Chanteperdrix
  2015-03-19 16:49                       ` Steve B
  0 siblings, 2 replies; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-19 16:43 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Thu, Mar 19, 2015 at 05:04:03PM +0100, Gilles Chanteperdrix wrote:
> My point was, there may be some pathological values that may involve
> using expm1 or logp1 instead of log or exp, and so to avoid pow
> entirely. Also, I am not sure pow optimizes the case where b is an
> integer. It would be interesting to know the actual values of a and
> b which cause pow to explode.

Could be.

For fun I checked how many instructions it took to execute 
pow(1.234000,12.200000) to get 13.003041 and according to my gdb run,
it took 1151 instructions.  Now I did not enable optimization for that
build, which might matter.  Those were of course not all floating point
instructions, but quite a few of them are.

With -O3, it dropped to 303 instructions.

I should try the same test on armel to see what difference it shows just
because I am curious.  I would have to setup an armel chroot to run
in first.

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 16:43                     ` Lennart Sorensen
@ 2015-03-19 16:48                       ` Gilles Chanteperdrix
  2015-03-19 17:26                         ` Lennart Sorensen
  2015-03-19 16:49                       ` Steve B
  1 sibling, 1 reply; 32+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-19 16:48 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Xenomai

On Thu, Mar 19, 2015 at 12:43:26PM -0400, Lennart Sorensen wrote:
> On Thu, Mar 19, 2015 at 05:04:03PM +0100, Gilles Chanteperdrix wrote:
> > My point was, there may be some pathological values that may involve
> > using expm1 or logp1 instead of log or exp, and so to avoid pow
> > entirely. Also, I am not sure pow optimizes the case where b is an
> > integer. It would be interesting to know the actual values of a and
> > b which cause pow to explode.
> 
> Could be.
> 
> For fun I checked how many instructions it took to execute 
> pow(1.234000,12.200000) to get 13.003041 and according to my gdb run,
> it took 1151 instructions.  Now I did not enable optimization for that
> build, which might matter.  Those were of course not all floating pointer
> instructions, but quite a few of them are.
> 
> With -O3, it dropped to 303 instructions.
> 
> I should try the same test on armel to see what difference it shows just
> because I am curious.  I would have to setup an armel chroot to run
> in first.

300 or 1000 instructions are executed in a very short time, not in
3 milliseconds, otherwise it would mean that each instruction takes
10us or so to execute. Typically, in an ideal situation on a 1GHz
processor, an instruction takes 1ns to execute, that is several orders
of magnitude smaller. So, even if the situation is not ideal, that is
still far from the numbers reported by Steve.

So, again, it would be interesting to know the actual values Steve
has that cause pow to go crazy.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 16:43                     ` Lennart Sorensen
  2015-03-19 16:48                       ` Gilles Chanteperdrix
@ 2015-03-19 16:49                       ` Steve B
  2015-03-19 16:54                         ` Gilles Chanteperdrix
  2015-03-19 17:48                         ` Lennart Sorensen
  1 sibling, 2 replies; 32+ messages in thread
From: Steve B @ 2015-03-19 16:49 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Xenomai

On Thu, Mar 19, 2015 at 9:43 AM, Lennart Sorensen <
lsorense@csclub.uwaterloo.ca> wrote:

> On Thu, Mar 19, 2015 at 05:04:03PM +0100, Gilles Chanteperdrix wrote:
> > My point was, there may be some pathological values that may involve
> > using expm1 or logp1 instead of log or exp, and so to avoid pow
> > entirely. Also, I am not sure pow optimizes the case where b is an
> > integer. It would be interesting to know the actual values of a and
> > b which cause pow to explode.
>
> Could be.
>
> For fun I checked how many instructions it took to execute
> pow(1.234000,12.200000) to get 13.003041 and according to my gdb run,
> it took 1151 instructions.  Now I did not enable optimization for that
> build, which might matter.  Those were of course not all floating pointer
> instructions, but quite a few of them are.
>
> With -O3, it dropped to 303 instructions.
>
> I should try the same test on armel to see what difference it shows just
> because I am curious.  I would have to setup an armel chroot to run
> in first.
>
> --
> Len Sorensen
>

Hi guys,
Here are some input values that have caused problems for me:

b=0.975800; c= 7.000000;
b = -1789009.391544; c = 6.000000;
b= 42442350436303.453125; c = 4.500000;

(where I am doing a = pow(b,c);)
Interestingly the first two took a long time in my actual application but
not in my test program where I was just plugging them straight into the
pow() function. I guess there may be some difference in compile options
that I need to take a look at to see what is going on.
And the third one (yeah, that's a huge number!) takes a very long time no
matter what, and seems to take much longer with the hf compiler.

Let me know if this looks interesting..
My original plan was to try to get my algorithm designer to get rid of the
pow() calls wherever he can, and hopefully that will get us straightened
out...

Steve

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 16:49                       ` Steve B
@ 2015-03-19 16:54                         ` Gilles Chanteperdrix
  2015-03-19 18:00                           ` Steve B
  2015-03-19 17:48                         ` Lennart Sorensen
  1 sibling, 1 reply; 32+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-19 16:54 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Thu, Mar 19, 2015 at 09:49:45AM -0700, Steve B wrote:
> On Thu, Mar 19, 2015 at 9:43 AM, Lennart Sorensen <
> lsorense@csclub.uwaterloo.ca> wrote:
> 
> > On Thu, Mar 19, 2015 at 05:04:03PM +0100, Gilles Chanteperdrix wrote:
> > > My point was, there may be some pathological values that may involve
> > > using expm1 or logp1 instead of log or exp, and so to avoid pow
> > > entirely. Also, I am not sure pow optimizes the case where b is an
> > > integer. It would be interesting to know the actual values of a and
> > > b which cause pow to explode.
> >
> > Could be.
> >
> > For fun I checked how many instructions it took to execute
> > pow(1.234000,12.200000) to get 13.003041 and according to my gdb run,
> > it took 1151 instructions.  Now I did not enable optimization for that
> > build, which might matter.  Those were of course not all floating pointer
> > instructions, but quite a few of them are.
> >
> > With -O3, it dropped to 303 instructions.
> >
> > I should try the same test on armel to see what difference it shows just
> > because I am curious.  I would have to setup an armel chroot to run
> > in first.
> >
> > --
> > Len Sorensen
> >
> 
> Hi guys,
> Here are some input values that have caused problems for me:
> 
> b=0.975800; c= 7.000000;
> b = -1789009.391544; c = 6.000000;
> b= 42442350436303.453125; c = 4.500000;
> 
> (where I am doing a = pow(b,c);)
> Interestingly the first two took a long time in my actual application but
> not in my test program where I was just plugging them straight into the
> pow() function. I guess there may be some difference in compile options
> that I need to take a look at to see what is going on.
> And the third one (yeah, that's a huge number!) takes a very long time no
> matter what, and seems to take much longer with the hf compiler.
> 
> Let me know if this looks interesting..
> My original plan was to try to get my algorithm designer to get rid of the
> pow() calls wherever he can, and hopefully that will get us straightened
> out...

For the first example, b being close to 1, and pow(b,c) being 
exp(c * log(b))

you may want to try exp(c * log1p(b - 1))

Also, since the exponents are integers, you may want to try the
Russian peasant algorithm (exponentiation by squaring).
4.5 is 9 / 2, so pow(b, 4.5) is sqrt(pow(b, 9)).


> 
> Steve

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 16:48                       ` Gilles Chanteperdrix
@ 2015-03-19 17:26                         ` Lennart Sorensen
  2015-03-19 20:06                           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-19 17:26 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Thu, Mar 19, 2015 at 05:48:31PM +0100, Gilles Chanteperdrix wrote:
> 300 or 1000 instructions are executed in a very short time, not in a
> 3 milliseconds, otherwise it would mean that each instruction takes
> 10us or so to execute. Typically, in an ideal situation on a 1GHz
> processor, an instruction takes 1ns to execute, that is an order of
> magnitude smaller. So, even if the situation is not ideal, that is
> still far from the numbers reported by Steve. 

Well apparently on the Cortex-A8 FPU instructions take 10 or more clock
cycles to execute, versus 1 in most cases on the Cortex-A9, so 300
instructions starts to add up.

> So, again, it would be interesting to know the actual values Steve
> has that cause pow to go crazy.

Yes it would.

-- 
Len Sorensen


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 16:49                       ` Steve B
  2015-03-19 16:54                         ` Gilles Chanteperdrix
@ 2015-03-19 17:48                         ` Lennart Sorensen
  1 sibling, 0 replies; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-19 17:48 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Thu, Mar 19, 2015 at 09:49:45AM -0700, Steve B wrote:
> Hi guys,
> Here are some input values that have caused problems for me:
> 
> b=0.975800; c= 7.000000;
> b = -1789009.391544; c = 6.000000;
> b= 42442350436303.453125; c = 4.500000;

I am trying the last one right now.  It is still counting instructions.
It has been at it for many minutes.  According to
http://developerblog.redhat.com/2015/01/02/improving-math-performance-in-glibc/
the way it is implemented, it uses a lookup table in many cases, but if
it determines the lookup table method won't be accurate, then it goes off
and calculates the solution in multiple steps, which includes calculating
a 768-bit precision result, and then rounding that to a double.  It is
of course very slow when this happens.  The page mentions improving the
slowest path of pow by 8 times.  Of course your current glibc won't have
those improvements yet.  I will post the number of instructions when I
get it.  I may have to go get lunch first.  It is at 94000 instructions
so far and I have no idea how much might be left.

Well actually it seems to say glibc 2.18 has those improvements.  I should
try a run compiled against that (I was using 2.13 from Debian Wheezy)

> (where I am doing a = pow(b,c);)
> Interestingly the first two took a long time in my actual application but
> not in my test program where I was just plugging them straight into the
> pow() function. I guess there may be some difference in compile options
> that I need to take a look at to see what is going on.
> And the third one (yeah, that's a huge number!) takes a very long time no
> matter what, and seems to take much longer with the hf compiler.
> 
> Let me know if this looks interesting..
> My original plan was to try to get my algorithm designer to get rid of the
> pow() calls wherever he can, and hopefully that will get us straightened
> out...

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 16:54                         ` Gilles Chanteperdrix
@ 2015-03-19 18:00                           ` Steve B
  2015-03-19 18:05                             ` Lennart Sorensen
  2015-03-19 20:03                             ` Gilles Chanteperdrix
  0 siblings, 2 replies; 32+ messages in thread
From: Steve B @ 2015-03-19 18:00 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Thu, Mar 19, 2015 at 9:54 AM, Gilles Chanteperdrix <
gilles.chanteperdrix@xenomai.org> wrote:

> On Thu, Mar 19, 2015 at 09:49:45AM -0700, Steve B wrote:
> > On Thu, Mar 19, 2015 at 9:43 AM, Lennart Sorensen <
> > lsorense@csclub.uwaterloo.ca> wrote:
> >
> > > On Thu, Mar 19, 2015 at 05:04:03PM +0100, Gilles Chanteperdrix wrote:
> > > > My point was, there may be some pathological values that may involve
> > > > using expm1 or logp1 instead of log or exp, and so to avoid pow
> > > > entirely. Also, I am not sure pow optimizes the case where b is an
> > > > integer. It would be interesting to know the actual values of a and
> > > > b which cause pow to explode.
> > >
> > > Could be.
> > >
> > > For fun I checked how many instructions it took to execute
> > > pow(1.234000,12.200000) to get 13.003041 and according to my gdb run,
> > > it took 1151 instructions.  Now I did not enable optimization for that
> > > build, which might matter.  Those were of course not all floating
> pointer
> > > instructions, but quite a few of them are.
> > >
> > > With -O3, it dropped to 303 instructions.
> > >
> > > I should try the same test on armel to see what difference it shows
> just
> > > because I am curious.  I would have to setup an armel chroot to run
> > > in first.
> > >
> > > --
> > > Len Sorensen
> > >
> >
> > Hi guys,
> > Here are some input values that have caused problems for me:
> >
> > b=0.975800; c= 7.000000;
> > b = -1789009.391544; c = 6.000000;
> > b= 42442350436303.453125; c = 4.500000;
> >
> > (where I am doing a = pow(b,c);)
> > Interestingly the first two took a long time in my actual application but
> > not in my test program where I was just plugging them straight into the
> > pow() function. I guess there may be some difference in compile options
> > that I need to take a look at to see what is going on.
> > And the third one (yeah, that's a huge number!) takes a very long time no
> > matter what, and seems to take much longer with the hf compiler.
> >
> > Let me know if this looks interesting..
> > My original plan was to try to get my algorithm designer to get rid of
> the
> > pow() calls wherever he can, and hopefully that will get us straightened
> > out...
>
> For the first example, b being close to 1, and pow(b,c) being
> exp(c * log(b))
>
> you may want to try exp(c * log1p(b - 1))
>
> Also, since the exponents are integers
> you may want to try the russian peon algorithm
> 4.5 is 9 / 2 so pow(b, 4.5) is sqrt(pow(b, 9))
>
>
> >
> > Steve
>
> --
>                                             Gilles.
>

Yes, that's a good idea, or maybe sqrt(b)*b*b*b*b would even work better,
since it's just a few multiplies and a square root.

Steve

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 18:00                           ` Steve B
@ 2015-03-19 18:05                             ` Lennart Sorensen
  2015-03-19 19:00                               ` Lennart Sorensen
  2015-03-19 20:03                             ` Gilles Chanteperdrix
  1 sibling, 1 reply; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-19 18:05 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Thu, Mar 19, 2015 at 11:00:12AM -0700, Steve B wrote:
> Yes, that's a good idea, or maybe sqrt(b)*b*b*b*b would even work better,
> since it's just a few multiplies and a square root.

The run just passed 165000 instructions for that one call to pow().

-- 
Len Sorensen


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 18:05                             ` Lennart Sorensen
@ 2015-03-19 19:00                               ` Lennart Sorensen
  2015-03-19 19:12                                 ` Steve B
  0 siblings, 1 reply; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-19 19:00 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Thu, Mar 19, 2015 at 02:05:59PM -0400, Lennart Sorensen wrote:
> On Thu, Mar 19, 2015 at 11:00:12AM -0700, Steve B wrote:
> > Yes, that's a good idea, or maybe sqrt(b)*b*b*b*b would even work better,
> > since it's just a few multiplies and a square root.
> 
> The run just passed 165000 instructions for that one call to pow().

Total came to: 249316 instructions.

Ouch!

-- 
Len Sorensen


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 19:00                               ` Lennart Sorensen
@ 2015-03-19 19:12                                 ` Steve B
  2015-03-19 19:24                                   ` Lennart Sorensen
  0 siblings, 1 reply; 32+ messages in thread
From: Steve B @ 2015-03-19 19:12 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Xenomai

On Thu, Mar 19, 2015 at 12:00 PM, Lennart Sorensen <
lsorense@csclub.uwaterloo.ca> wrote:

> On Thu, Mar 19, 2015 at 02:05:59PM -0400, Lennart Sorensen wrote:
> > On Thu, Mar 19, 2015 at 11:00:12AM -0700, Steve B wrote:
> > > Yes, that's a good idea, or maybe sqrt(b)*b*b*b*b would even work
> better,
> > > since it's just a few multiplies and a square root.
> >
> > The run just passed 165000 instructions for that one call to pow().
>
> Total came to: 249316 instructions.
>
> Ouch!
>
> --
> Len Sorensen
>

Thanks for looking into it.. that's pretty close to my 3ms figure at 10
clocks per instruction.
I'm curious why it's actually a bit better with the non-hf compiler, but
I think it may be a moot point. The more important thing is to replace
these with alternative ways to compute the same result. Doing everything
fixed point could have been a good design choice from the beginning, as
well...
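
The back-of-the-envelope estimate can be written down as a small
helper. estimate_us is a hypothetical name, and both the 10 cycles per
instruction and the 1 GHz clock are assumptions: the count of 249316
was taken on a different machine, and not every instruction in it is a
floating point one.

```c
#include <assert.h>

/* Estimated run time in microseconds for a number of instructions,
 * each costing cycles_per_insn cycles on a core clocked at clock_hz. */
static double estimate_us(double instructions, double cycles_per_insn,
                          double clock_hz)
{
    assert(clock_hz > 0.0);
    return instructions * cycles_per_insn / clock_hz * 1e6;
}
```

With estimate_us(249316.0, 10.0, 1e9) this gives about 2493 us, i.e.
roughly 2.5 ms, which is in the same range as the measured 3 ms.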

Steve

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 19:12                                 ` Steve B
@ 2015-03-19 19:24                                   ` Lennart Sorensen
  2015-03-19 20:33                                     ` Lennart Sorensen
  0 siblings, 1 reply; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-19 19:24 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Thu, Mar 19, 2015 at 12:12:06PM -0700, Steve B wrote:
> On Thu, Mar 19, 2015 at 12:00 PM, Lennart Sorensen <
> lsorense@csclub.uwaterloo.ca> wrote:
> 
> > On Thu, Mar 19, 2015 at 02:05:59PM -0400, Lennart Sorensen wrote:
> > > On Thu, Mar 19, 2015 at 11:00:12AM -0700, Steve B wrote:
> > > > Yes, that's a good idea, or maybe sqrt(b)*b*b*b*b would even work
> > better,
> > > > since it's just a few multiplies and a square root.
> > >
> > > The run just passed 165000 instructions for that one call to pow().
> >
> > Total came to: 249316 instructions.
> >
> > Ouch!
> >
> > --
> > Len Sorensen
> >
> 
> Thanks for looking into it.. that's pretty close to my 3ms figure at 10
> clocks per instruction.
> I'm a curious why it's actually a bit better with the non-hf compiler, but
> I think it may be a moot point. The more important thing is to replace
> these with alternative ways to compute the same result. Doing everything
> fixed point could have been a good design choice from the beginning, as
> well...

Well the softfloat would not be 10 cycles per instruction since it isn't
doing floating point.  It is possible that emulating the FPU in softfloat
is actually faster for this case on the Cortex-A8.

I am running a count on the number of instructions used with softfloat
right now.  It is at 130000 and counting.

After that finishes I will do a run with a newer glibc and see if they
actually made a noticeable improvement.

Good thing I have a decently fast arm to run this on (Using a 1GHz
Cortex-A15).

-- 
Len Sorensen


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 18:00                           ` Steve B
  2015-03-19 18:05                             ` Lennart Sorensen
@ 2015-03-19 20:03                             ` Gilles Chanteperdrix
  2015-03-19 21:22                               ` Gilles Chanteperdrix
  1 sibling, 1 reply; 32+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-19 20:03 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Thu, Mar 19, 2015 at 11:00:12AM -0700, Steve B wrote:
> On Thu, Mar 19, 2015 at 9:54 AM, Gilles Chanteperdrix <
> gilles.chanteperdrix@xenomai.org> wrote:
> 
> > On Thu, Mar 19, 2015 at 09:49:45AM -0700, Steve B wrote:
> > > On Thu, Mar 19, 2015 at 9:43 AM, Lennart Sorensen <
> > > lsorense@csclub.uwaterloo.ca> wrote:
> > >
> > > > On Thu, Mar 19, 2015 at 05:04:03PM +0100, Gilles Chanteperdrix wrote:
> > > > > My point was, there may be some pathological values that may involve
> > > > > using expm1 or logp1 instead of log or exp, and so to avoid pow
> > > > > entirely. Also, I am not sure pow optimizes the case where b is an
> > > > > integer. It would be interesting to know the actual values of a and
> > > > > b which cause pow to explode.
> > > >
> > > > Could be.
> > > >
> > > > For fun I checked how many instructions it took to execute
> > > > pow(1.234000,12.200000) to get 13.003041 and according to my gdb run,
> > > > it took 1151 instructions.  Now I did not enable optimization for that
> > > > build, which might matter.  Those were of course not all floating
> > pointer
> > > > instructions, but quite a few of them are.
> > > >
> > > > With -O3, it dropped to 303 instructions.
> > > >
> > > > I should try the same test on armel to see what difference it shows
> > just
> > > > because I am curious.  I would have to setup an armel chroot to run
> > > > in first.
> > > >
> > > > --
> > > > Len Sorensen
> > > >
> > >
> > > Hi guys,
> > > Here are some input values that have caused problems for me:
> > >
> > > b=0.975800; c= 7.000000;
> > > b = -1789009.391544; c = 6.000000;
> > > b= 42442350436303.453125; c = 4.500000;
> > >
> > > (where I am doing a = pow(b,c);)
> > > Interestingly the first two took a long time in my actual application but
> > > not in my test program where I was just plugging them straight into the
> > > pow() function. I guess there may be some difference in compile options
> > > that I need to take a look at to see what is going on.
> > > And the third one (yeah, that's a huge number!) takes a very long time no
> > > matter what, and seems to take much longer with the hf compiler.
> > >
> > > Let me know if this looks interesting..
> > > My original plan was to try to get my algorithm designer to get rid of
> > the
> > > pow() calls wherever he can, and hopefully that will get us straightened
> > > out...
> >
> > For the first example, b being close to 1, and pow(b,c) being
> > exp(c * log(b))
> >
> > you may want to try exp(c * log1p(b - 1))
> >
> > Also, since the exponents are integers
> > you may want to try the russian peon algorithm
> > 4.5 is 9 / 2 so pow(b, 4.5) is sqrt(pow(b, 9))
> >
> >
> > >
> > > Steve
> >
> > --
> >                                             Gilles.
> >
> 
> Yes, that's a good idea, or maybe sqrt(b)*b*b*b*b would even work better,
> since it's just a few multiplies and a square root.

For b^4, you want to do
x = b * b
y = x * x

That is only two multiplications (instead of 4). I believe the
russian peasant would do it that way. It gives a reduced number of
multiplications (it is not optimal though), and that works for any
integer power.

Anyway, I do not believe the pow function is supposed to be
efficient for integer powers, I do not know if there is another
glibc function, but if there is not, the russian peasant is simple
and efficient.
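
A minimal sketch of the Russian peasant approach for non-negative
integer exponents follows. pow_int is a hypothetical name, not a glibc
function, and the result may differ from pow() in the last bits since
every multiplication rounds.

```c
#include <assert.h>

/* b raised to a non-negative integer power n by repeated squaring. */
static double pow_int(double b, int n)
{
    assert(n >= 0);
    unsigned int e = (unsigned int)n;
    double result = 1.0;

    while (e != 0) {
        if (e & 1u)          /* low bit set: this power of b contributes */
            result *= b;
        e >>= 1;
        if (e != 0)          /* skip the final, unused squaring */
            b *= b;
    }
    return result;
}
```

The loop runs once per bit of n, so the number of multiplications grows
with log2(n) rather than with n. It also handles a negative base, e.g.
pow_int(-1789009.391544, 6) for the second example.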

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 17:26                         ` Lennart Sorensen
@ 2015-03-19 20:06                           ` Gilles Chanteperdrix
  2015-03-19 20:32                             ` Lennart Sorensen
  0 siblings, 1 reply; 32+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-19 20:06 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Xenomai

On Thu, Mar 19, 2015 at 01:26:15PM -0400, Lennart Sorensen wrote:
> On Thu, Mar 19, 2015 at 05:48:31PM +0100, Gilles Chanteperdrix wrote:
> > 300 or 1000 instructions are executed in a very short time, not in a
> > 3 milliseconds, otherwise it would mean that each instruction takes
> > 10us or so to execute. Typically, in an ideal situation on a 1GHz
> > processor, an instruction takes 1ns to execute, that is an order of
> > magnitude smaller. So, even if the situation is not ideal, that is
> > still far from the numbers reported by Steve. 
> 
> Well apparently on the Cortex-A8 FPU instructions take 10 or more clock
> cycles to execute, versus 1 in most cases on the Cortex-A9, so 300
> instructions starts to add up.

10 cycles at 1 GHz, is still 10ns, not 10us. That is still 1/1000.

And I do not really believe the cortex a8 is 10 times slower than
the cortex a9, I have benchmarked hand optimized 12x12 matrix
multiplications using doubles (so which can not be implemented with
neon), on both omap3 at 720MHz and omap4 at 1GHz, and I do not
remember such difference between the benchmarks. Maybe A9 is 2 or 3
times faster, but not 10 times.

-- 
					    Gilles.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 20:06                           ` Gilles Chanteperdrix
@ 2015-03-19 20:32                             ` Lennart Sorensen
  0 siblings, 0 replies; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-19 20:32 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Thu, Mar 19, 2015 at 09:06:57PM +0100, Gilles Chanteperdrix wrote:
> 10 cycles at 1 GHz, is still 10ns, not 10us. That is still 1/1000.
> 
> And I do not really believe the cortex a8 is 10 times slower than
> the cortex a9, I have benchmarked hand optimized 12x12 matrix
> multiplications using doubles (so which can not be implemented with
> neon), on both omap3 at 720MHz and omap4 at 1GHz, and I do not
> remember such difference between the benchmarks. Maybe A9 is 2 or 3
> times faster, but not 10 times.

The A8 is not 1/10 the speed of an A9.  The FPU of the A8 is about 1/10
the speed of the A9, but given there is often integer math going on and
other logic, you won't be hurt that bad in general.  My experience on
generic code is that the A9 is about twice the speed of the A8, so your
2 or 3 times agrees with that.  Really FPU heavy code would be worse
than that, but I doubt it would be easy to make code that actually showed
a 10 times difference between them.

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 19:24                                   ` Lennart Sorensen
@ 2015-03-19 20:33                                     ` Lennart Sorensen
  2015-03-20 13:58                                       ` Lennart Sorensen
  0 siblings, 1 reply; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-19 20:33 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Thu, Mar 19, 2015 at 03:24:15PM -0400, Lennart Sorensen wrote:
> I am running a count on the number of instructions used with softfloat
> right now.  It is at 130000 and counting.

Softfloat is now at 642000 and counting.

-- 
Len Sorensen


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 20:03                             ` Gilles Chanteperdrix
@ 2015-03-19 21:22                               ` Gilles Chanteperdrix
  2015-03-19 21:29                                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 32+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-19 21:22 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Thu, Mar 19, 2015 at 09:03:02PM +0100, Gilles Chanteperdrix wrote:
> On Thu, Mar 19, 2015 at 11:00:12AM -0700, Steve B wrote:
> > On Thu, Mar 19, 2015 at 9:54 AM, Gilles Chanteperdrix <
> > gilles.chanteperdrix@xenomai.org> wrote:
> > 
> > > On Thu, Mar 19, 2015 at 09:49:45AM -0700, Steve B wrote:
> > > > On Thu, Mar 19, 2015 at 9:43 AM, Lennart Sorensen <
> > > > lsorense@csclub.uwaterloo.ca> wrote:
> > > >
> > > > > On Thu, Mar 19, 2015 at 05:04:03PM +0100, Gilles Chanteperdrix wrote:
> > > > > > My point was, there may be some pathological values that may involve
> > > > > > using expm1 or logp1 instead of log or exp, and so to avoid pow
> > > > > > entirely. Also, I am not sure pow optimizes the case where b is an
> > > > > > integer. It would be interesting to know the actual values of a and
> > > > > > b which cause pow to explode.
> > > > >
> > > > > Could be.
> > > > >
> > > > > For fun I checked how many instructions it took to execute
> > > > > pow(1.234000,12.200000) to get 13.003041 and according to my gdb run,
> > > > > it took 1151 instructions.  Now I did not enable optimization for that
> > > > > build, which might matter.  Those were of course not all floating
> > > pointer
> > > > > instructions, but quite a few of them are.
> > > > >
> > > > > With -O3, it dropped to 303 instructions.
> > > > >
> > > > > > I should try the same test on armel to see what difference it shows
> > > > > > just because I am curious.  I would have to setup an armel chroot to
> > > > > > run in first.
> > > > >
> > > > > --
> > > > > Len Sorensen
> > > > >
> > > >
> > > > Hi guys,
> > > > Here are some input values that have caused problems for me:
> > > >
> > > > b=0.975800; c= 7.000000;
> > > > b = -1789009.391544; c = 6.000000;
> > > > b= 42442350436303.453125; c = 4.500000;
> > > >
> > > > (where I am doing a = pow(b,c);)
> > > > Interestingly the first two took a long time in my actual application but
> > > > not in my test program where I was just plugging them straight into the
> > > > pow() function. I guess there may be some difference in compile options
> > > > that I need to take a look at to see what is going on.
> > > > And the third one (yeah, that's a huge number!) takes a very long time no
> > > > matter what, and seems to take much longer with the hf compiler.
> > > >
> > > > Let me know if this looks interesting..
> > > > > My original plan was to try to get my algorithm designer to get rid of
> > > > > the pow() calls wherever he can, and hopefully that will get us
> > > > > straightened out...
> > >
> > > For the first example, b being close to 1, and pow(b,c) being
> > > exp(c * log(b))
> > >
> > > you may want to try exp(c * log1p(b - 1))
> > >
> > > Also, since the exponents are integers, you may want to try the
> > > Russian peasant algorithm. And 4.5 is 9 / 2, so pow(b, 4.5) is
> > > sqrt(pow(b, 9)).
> > >
> > >
> > > >
> > > > Steve
> > >
> > > --
> > >                                             Gilles.
> > >
> > 
> > Yes, that's a good idea, or maybe sqrt(b)*b*b*b*b would even work better,
> > since it's just a few multiplies and a square root.
> 
> For b^4, you want to do
> x = b * b
> y = x * x
> 
> That is only two multiplications (instead of 4). I believe the
> russian peasant would do it that way. It gives a reduced number of
> multiplications (it is not optimal though), and that works for any
> integer power.
> 
> Anyway, I do not believe the pow function is supposed to be
> efficient for integer powers, I do not know if there is another
> glibc function, but if there is not, the russian peasant is simple
> and efficient.

The first Google hit for Russian peasant exponentiation,
lafstern.org/matt/col3.pdf
looks wrong to me.

I had more luck searching in another language, and found
http://84.55.172.83/le_nagae/info/exoX/exo2/exo2.html

The implementation given there simplifies to:

double pow_int(double x, unsigned long n)
{
	double res = 1.0, p = x;
	
	while (n) {
		if (n & 1)
			res *= p;
		p *= p;
		n >>= 1;
	}

	return res;
}

This can be used as a drop-in replacement for pow when the exponent is a
non-negative integer.

I have checked with the following main:

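/* Headers for pow, printf, random/atoi/exit, time and getpid */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, const char *argv[])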
{
	int seed;

	if (argc >= 2)
		seed = atoi(argv[1]); /* Allow replaying seed */
	else 
		seed = time(NULL) ^ getpid();

	srandom(seed);
	printf("seed: 0x%08x\n", seed);


	for (;;) {
		double f = random();
		unsigned long n = random();

		if ((pow_int(f, n) != pow(f, n)) / pow_int(f, n) > 1e-6) {
			fprintf(stderr, "Test failed for f: %g, n: %lu\n"
				"pow_int: %g, pow: %g\n",
				f, n, pow_int(f, n), pow(f, n));
			exit(EXIT_FAILURE);
		}
	}

	return EXIT_SUCCESS;
}

This confirms that it returns the same value as pow, modulo round-off
errors, for a large number of values, so it is probably right.

-- 
					    Gilles.



* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 21:22                               ` Gilles Chanteperdrix
@ 2015-03-19 21:29                                 ` Gilles Chanteperdrix
  0 siblings, 0 replies; 32+ messages in thread
From: Gilles Chanteperdrix @ 2015-03-19 21:29 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Thu, Mar 19, 2015 at 10:22:04PM +0100, Gilles Chanteperdrix wrote:
> 		if ((pow_int(f, n) != pow(f, n)) / pow_int(f, n) > 1e-6) {
That should be a subtraction, not a comparison:
		if ((pow_int(f, n) - pow(f, n)) / pow_int(f, n) > 1e-6) {

-- 
					    Gilles.



* Re: [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1
  2015-03-19 20:33                                     ` Lennart Sorensen
@ 2015-03-20 13:58                                       ` Lennart Sorensen
  0 siblings, 0 replies; 32+ messages in thread
From: Lennart Sorensen @ 2015-03-20 13:58 UTC (permalink / raw)
  To: Steve B; +Cc: Xenomai

On Thu, Mar 19, 2015 at 04:33:13PM -0400, Lennart Sorensen wrote:
> On Thu, Mar 19, 2015 at 03:24:15PM -0400, Lennart Sorensen wrote:
> > I am running a count on the number of instructions used with softfloat
> > right now.  It is at 130000 and counting.
> 
> Softfloat is now at 642000 and counting.

At 1600000 instructions my softfloat test ran out of RAM.

Hard float using glibc 2.19 needed 149339, so that is quite an improvement
over glibc 2.13.  Still quite a lot though.

-- 
Len Sorensen



end of thread, other threads:[~2015-03-20 13:58 UTC | newest]

Thread overview: 32+ messages
2015-03-17 16:19 [Xenomai] Building with hard float: cannot open shared object file libpthread_rt.so.1 Steve B
2015-03-17 18:06 ` Gilles Chanteperdrix
2015-03-17 18:33   ` Steve B
2015-03-17 18:38     ` Gilles Chanteperdrix
     [not found]       ` <CAEMXjGzZn3JWCsxAkC+dFL0tLWk_FZpsNzB=YkSHYzCS2QEKmA@mail.gmail.com>
2015-03-17 19:18         ` Gilles Chanteperdrix
2015-03-17 19:24     ` Lennart Sorensen
2015-03-17 19:57       ` Steve B
2015-03-17 20:02         ` Gilles Chanteperdrix
2015-03-17 21:34         ` Lennart Sorensen
2015-03-19  0:42           ` Steve B
2015-03-19 14:07             ` Lennart Sorensen
2015-03-19 14:40               ` Gilles Chanteperdrix
2015-03-19 15:59                 ` Lennart Sorensen
2015-03-19 16:04                   ` Gilles Chanteperdrix
2015-03-19 16:43                     ` Lennart Sorensen
2015-03-19 16:48                       ` Gilles Chanteperdrix
2015-03-19 17:26                         ` Lennart Sorensen
2015-03-19 20:06                           ` Gilles Chanteperdrix
2015-03-19 20:32                             ` Lennart Sorensen
2015-03-19 16:49                       ` Steve B
2015-03-19 16:54                         ` Gilles Chanteperdrix
2015-03-19 18:00                           ` Steve B
2015-03-19 18:05                             ` Lennart Sorensen
2015-03-19 19:00                               ` Lennart Sorensen
2015-03-19 19:12                                 ` Steve B
2015-03-19 19:24                                   ` Lennart Sorensen
2015-03-19 20:33                                     ` Lennart Sorensen
2015-03-20 13:58                                       ` Lennart Sorensen
2015-03-19 20:03                             ` Gilles Chanteperdrix
2015-03-19 21:22                               ` Gilles Chanteperdrix
2015-03-19 21:29                                 ` Gilles Chanteperdrix
2015-03-19 17:48                         ` Lennart Sorensen
