On Mon, 2 Mar 2020, Richard Henderson wrote:
> On 3/2/20 3:42 AM, BALATON Zoltan wrote:
>>> The "hardfloat" option works (with other targets) only with ieee754
>>> accumulative exceptions, when the most common of those exceptions, inexact, has
>>> already been raised.  And thus need not be raised a second time.
>>
>> Why exactly is it done that way? What are the differences between IEEE FP
>> implementations that prevent using hardfloat most of the time instead of only
>> using it in some (although supposedly common) special cases?
>
> While it is possible to read the host's ieee exception word after the hardfloat
> operation, there are two reasons that is undesirable:
>
> (1) It is *slow*.  So slow that it's faster to run the softfloat code instead.
> I thought it would be easier to find the benchmark numbers that Emilio
> generated when this was tested, but I can't find them.

I remember those benchmarks too, and the paper Alex referred to confirmed 
this as well. I've also found that enabling hardfloat for PPC without 
doing anything else is slightly slower (on a recent CPU; on older CPUs it 
could be even slower). Interestingly, however, it does give a speedup for 
vector instructions (maybe because they don't clear flags between each 
sub-operation). Does that mean these vector instruction helpers are also 
buggy regarding exceptions?

> (2) IEEE has a number of implementation choices for corner cases, and we need
> to implement the target's choices, not the host's choices.

But how is that related to the inexact flag and the 
float_round_nearest_even rounding mode, which are the only two things the 
can_use_fpu() function checks for?
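
For reference, the check I'm asking about amounts to roughly the following. 
This is a simplified sketch, not the actual code in fpu/softfloat.c: the 
struct, the enum values and every *_sketch / SKETCH_* name are stand-ins 
made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for QEMU's softfloat types; the values are illustrative only. */
enum { SKETCH_FLAG_INEXACT = 1 << 0, SKETCH_FLAG_OVERFLOW = 1 << 1 };
enum { SKETCH_ROUND_NEAREST_EVEN = 0, SKETCH_ROUND_TO_ZERO = 1 };

typedef struct {
    uint8_t exception_flags;   /* sticky (accumulated) guest flags */
    uint8_t rounding_mode;     /* guest rounding mode */
} sketch_status;

/*
 * The host FPU is only considered when the guest already has inexact set
 * (so a host operation cannot add a new inexact) and the guest rounds to
 * nearest even (assumed here to match the host's default mode).
 */
static bool can_use_fpu_sketch(const sketch_status *s)
{
    return (s->exception_flags & SKETCH_FLAG_INEXACT) &&
           s->rounding_mode == SKETCH_ROUND_NEAREST_EVEN;
}
```

Nothing in this predicate looks at corner-case behaviour such as NaN 
handling, which is why I don't see how point (2) connects to it.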

>> I think CPUs can also raise exceptions when they detect the condition in
>> hardware, so maybe we should install our own FPU exception handler and set
>> guest flags from that; then we don't need to check and won't have problems
>> with these bits either. Why is that not possible or isn't done?
>
> If we have to enable and disable host fpu exceptions going in and out of
> softfloat routines, we're back to modifying the host fpu control word, which as
> described above, is *slow*.
>
>> That handler could only
>> set a global flag on each exception that can be checked by targets to
>> handle differences. This global flag then can include non-sticky versions if
>> needed because clearing a global should be less expensive than clearing FPU
>> status reg. But I don't really know, just guessing, someone who knows more about
>> FPUs probably knows a better way.
>
> I don't know if anyone has tried that variant, where we simply leave the
> exceptions enabled, leave the signal handler enabled, and use a global.
>
> Feel free to try it and benchmark it.

I probably won't try any time soon. I have several other half-finished 
things to hack on, so I'd rather not take up another one I likely can't 
finish, but I hope this discussion inspires someone to try it. I'm also 
interested in the results. If nobody tries in the next two years, maybe 
I'll get there eventually.
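
In case it helps whoever picks this up, the shape of the handler-plus-global 
variant could be roughly as below. This is only a sketch with made-up names 
(everything prefixed sketch_ and the fp_event_seen global are hypothetical). 
The trap is simulated with raise(), because actually unmasking host FP traps 
is platform specific (glibc offers feenableexcept() as an extension), and a 
real handler would also have to work out which exception fired and resume 
safely, none of which is shown here.

```c
#include <signal.h>

/* Hypothetical global that a target could poll instead of the host FPU
 * status register. */
static volatile sig_atomic_t fp_event_seen;

/* Handler does the minimum: record that something happened. */
static void sketch_fpe_handler(int signo)
{
    (void)signo;
    fp_event_seen = 1;
}

/* Install the handler once; returns 0 on success, -1 on failure. */
static int sketch_install_handler(void)
{
    return signal(SIGFPE, sketch_fpe_handler) == SIG_ERR ? -1 : 0;
}

/* Read and clear the global: a plain load and store, with no access to
 * the host FPU control or status word. */
static int sketch_take_event(void)
{
    int seen = fp_event_seen;
    fp_event_seen = 0;
    return seen;
}

/* Stand-in for a trapping host operation: delivers SIGFPE synchronously. */
static void sketch_simulate_trap(void)
{
    raise(SIGFPE);
}
```

Whether this ends up cheaper than the current approach is exactly what 
would need benchmarking.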

Regards,
BALATON Zoltan