All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gabriel Paubert <paubert@iram.es>
To: Stephen N Chivers <schivers@csc.com.au>
Cc: Chris Proctor <cproctor@csc.com.au>, linuxppc-dev@lists.ozlabs.org
Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
Date: Thu, 6 Feb 2014 09:26:37 +0100	[thread overview]
Message-ID: <20140206082635.GA7048@visitor2.iram.es> (raw)
In-Reply-To: <OF784BA598.ACFE9DA4-ONCA257C76.00827472-CA257C77.000BCFAD@csc.com>

On Thu, Feb 06, 2014 at 12:09:00PM +1000, Stephen N Chivers wrote:
> I have a MPC8548e based board and an application that makes
> extensive use of floating point including numerous calls to cos.
> In the same program there is the use of an sqlite database.
> 
> The kernel is derived from 2.6.31 and is compiled with math emulation.
> 
> At some point after the reading of the SQLITE database, the
> return result from cos goes from an in range value to an out
> of range value.
> 
> This is as a result of the FP rounding mode mutating from "round to 
> nearest"
> to "round toward zero".
> 
> The cos in the glibc version being used is known to be sensitive to 
> rounding
> direction and Joseph Myers has previously fixed glibc.
> 
> The failure does not occur on a machine that has a hardware floating
> point unit (a MPC7410 processor).
> 
> I have traced the mutation to the following series of instructions:
> 
>         mffs            f0
>         mtfsb1          4*cr7+so
>         mtfsb0          4*cr7+eq
>         fadd            f13,f1,f2
>         mtfsf           1, f0
> 
> The instructions are part of the stream emitted by gcc for the conversion
> of a 128 bit floating point value into an integer in the sqlite database 
> read.
> 
> Immediately before the execution of the mffs instruction the "rounding
> mode" is "round to nearest".
> 
> On the MPC8548 board, the execution of the mtfsf instruction does not
> restore the rounding mode to "round to nearest".
> 
> I believe that the mask computation in mtfsf.c is incorrect and is 
> reversed.
> 
> In the latest version of the file (linux-3.14-rc1), the mask is computed 
> by:
> 
>                  mask = 0;
>                  if (FM & (1 << 0))
>                         mask |= 0x90000000;
>                  if (FM & (1 << 1))
>                         mask |= 0x0f000000;
>                  if (FM & (1 << 2))
>                         mask |= 0x00f00000;
>                  if (FM & (1 << 3))
>                         mask |= 0x000f0000;
>                  if (FM & (1 << 4))
>                         mask |= 0x0000f000;
>                  if (FM & (1 << 5))
>                         mask |= 0x00000f00;
>                  if (FM & (1 << 6))
>                         mask |= 0x000000f0;
>                  if (FM & (1 << 7))
>                         mask |= 0x0000000f;
> 
> I think it should be:
> 
>                 mask = 0;
>                 if (FM & (1 << 0))
>                         mask |= 0x0000000f;
>                 if (FM & (1 << 1))
>                         mask |= 0x000000f0;
>                 if (FM & (1 << 2))
>                         mask |= 0x00000f00;
>                 if (FM & (1 << 3))
>                         mask |= 0x0000f000;
>                 if (FM & (1 << 4))
>                         mask |= 0x000f0000;
>                 if (FM & (1 << 5))
>                         mask |= 0x00f00000;
>                 if (FM & (1 << 6))
>                         mask |= 0x0f000000;
>                 if (FM & (1 << 7))
>                         mask |= 0x90000000;
> 
> With the above mask computation I get consistent results for both the 
> MPC8548
> and MPC7410 boards.
> 
> Am I missing something subtle?

No I think you are correct. This said, this code may probably be optimized 
to eliminate a lot of the conditional branches. I think that:

mask = (FM & 1);
mask |= (FM << 3) & 0x10;
mask |= (FM << 6) & 0x100;
mask |= (FM << 9) & 0x1000;
mask |= (FM << 12) & 0x10000;
mask |= (FM << 15) & 0x100000;
mask |= (FM << 18) & 0x1000000;
mask |= (FM << 21) & 0x10000000;
mask *= 15;

should do the job, in less code space and without a single branch.

Each one of the "mask |=" lines should be translated into an
rlwinm instruction followed by an "or". Actually it should be possible
to transform each of these lines into a single "rlwimi" instruction
but I don't know how to coerce gcc to reach this level of optimization.

Another way of optomizing this could be:

mask = (FM & 0x0f) | ((FM << 12) & 0x000f0000);
mask = (mask & 0x00030003) | ((mask << 6) & 0x03030303);
mask = (mask & 0x01010101) | ((mask << 3) & 0x10101010);
mask *= 15;

It's not easy to see which of the solutions is faster, the second one
needs to create quite a few constants, but its dependency length is 
lower. It is very likely that the first solution is faster in cache-cold
case and the second in cache-hot. 

Regardless, the original code is rather naïve, larger and slower in all cases,
with timing variation depending on branch mispredictions.

	Regards,
	Gabriel

  reply	other threads:[~2014-02-06  9:09 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-06  2:09 arch/powerpc/math-emu/mtfsf.c - incorrect mask? Stephen N Chivers
2014-02-06  8:26 ` Gabriel Paubert [this message]
2014-02-07  1:27   ` Stephen N Chivers
2014-02-07 10:10     ` Gabriel Paubert
2014-02-07 20:49       ` James Yang
2014-02-09 19:42         ` Stephen N Chivers
2014-02-10 16:50           ` James Yang
2014-02-10 11:03         ` Gabriel Paubert
2014-02-10 11:17           ` David Laight
2014-02-10 12:21             ` Gabriel Paubert
2014-02-10 12:32               ` David Laight
2014-02-10 13:00                 ` Gabriel Paubert
2014-02-10 17:03           ` James Yang
2014-02-11  7:26             ` Gabriel Paubert
2014-02-11 20:57               ` Linux-3.14-rc2: Order of serial node compatibles in DTS files Stephen N Chivers
     [not found]                 ` <OF6F6A0029.3B20EF5B-ONCA257C7C.0071BFDA-CA257C7C.00732AD3-SmukeSwxQOQ@public.gmane.org>
2014-02-11 22:33                   ` Kumar Gala
2014-02-11 22:33                     ` Kumar Gala
     [not found]                     ` <63AEBD99-AA87-4FD7-BBDA-0CE419959F14-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
2014-02-11 22:51                       ` Sebastian Hesselbarth
2014-02-11 22:51                         ` Sebastian Hesselbarth
2014-02-11 23:38                         ` Stephen N Chivers
     [not found]                           ` <OFB203CA90.B048F8AA-ONCA257C7C.0081A816-CA257C7C.0081DCE3-SmukeSwxQOQ@public.gmane.org>
2014-02-11 23:43                             ` Sebastian Hesselbarth
2014-02-11 23:43                               ` Sebastian Hesselbarth
     [not found]                               ` <52FAB5A7.7080208-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-02-12 11:00                                 ` Arnd Bergmann
2014-02-12 11:00                                   ` Arnd Bergmann
     [not found]                         ` <52FAA97F.4060600-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-02-11 23:41                           ` Scott Wood
2014-02-11 23:41                             ` Scott Wood
     [not found]                             ` <1392162080.6733.404.camel-88ow+0ZRuxG2UiBs7uKeOtHuzzzSOjJt@public.gmane.org>
2014-02-11 23:46                               ` Sebastian Hesselbarth
2014-02-11 23:46                                 ` Sebastian Hesselbarth
     [not found]                                 ` <52FAB65C.4090201-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-02-12  0:21                                   ` Stephen N Chivers
2014-02-12  0:21                                     ` Stephen N Chivers
     [not found]                                     ` <OF9BA019D5.46DA1E29-ONCA257C7D.000089EA-CA257C7D.00020333-SmukeSwxQOQ@public.gmane.org>
2014-02-12  5:28                                       ` Kevin Hao
2014-02-12  5:28                                         ` Kevin Hao
     [not found]                                         ` <20140212052816.GA15434-2Y/eJ/Q4tBKYoDLmhA0N63T7OC0tnCxdQQ4Iyu8u01E@public.gmane.org>
2014-02-12  8:30                                           ` Sebastian Hesselbarth
2014-02-12  8:30                                             ` Sebastian Hesselbarth
     [not found]                                             ` <52FB3108.3000202-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-02-12 10:31                                               ` Kevin Hao
2014-02-12 10:31                                                 ` Kevin Hao
     [not found]                                                 ` <20140212103153.GC15434-2Y/eJ/Q4tBKYoDLmhA0N63T7OC0tnCxdQQ4Iyu8u01E@public.gmane.org>
2014-02-12 11:26                                                   ` Sebastian Hesselbarth
2014-02-12 11:26                                                     ` Sebastian Hesselbarth
     [not found]                                                     ` <52FB5A56.2050402-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-02-12 11:32                                                       ` Kevin Hao
2014-02-12 11:32                                                         ` Kevin Hao
2014-02-12  8:25                                       ` Sebastian Hesselbarth
2014-02-12  8:25                                         ` Sebastian Hesselbarth
     [not found]                                         ` <52FB2FF4.6060905-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-02-12 10:35                                           ` Kevin Hao
2014-02-12 10:35                                             ` Kevin Hao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140206082635.GA7048@visitor2.iram.es \
    --to=paubert@iram.es \
    --cc=cproctor@csc.com.au \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=schivers@csc.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.