All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
@ 2014-06-24  4:34 Al Viro
  2014-06-24 16:52 ` Al Viro
  2014-06-24 18:23 ` Richard Henderson
  0 siblings, 2 replies; 68+ messages in thread
From: Al Viro @ 2014-06-24  4:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

	First of all, kudos - with current qemu tree qemu-alpha-system is
working pretty well - debian install and a *lot* of builds work just fine.
As in, getting from lenny to pretty complete squeeze toolchain, including gcj,
openjdk6 and a lot of crap needed to satisfy build-deps of those, plus all
priority:required and most of priority:important ones.  It's a _lot_ of
beating and the damn thing survives - the problems had been with debian
packages themselves (fstatat() bug in lenny libc, epically buggered build-deps
in gcc-defaults, etc.).  As it is, one core of 6-way 3.3GHz phenom II is quite
capable of running a home-grown autobuilder.  Feels like ~250-300MHz alpha
with a very fast disk...

	Remaining problems, AFAICS, are around floating point traps.
I've found one in glibc testsuite (math/tests-misc.c; overflow in
ADDS/SU ends up with wrong results from fetestexcept() - only FE_OVERFLOW is
set, while the sucker expects FE_INEXACT as well and actual hardware sets both)
and another in gcc one (with -funsafe-math-optimizations CVTST/S on denorms
triggers SIGFPE/FPE_FLTINV).

	The libc one is a bug in gen_fp_exc_raise_ignore() - the difference
between ADDS/SU and ADDS/SUI is only in trapping, not storing results in
FPCR.INE and friends.  Both will have the same effect on those and
    if (ignore) {
        tcg_gen_andi_i32(exc, exc, ~ignore);
    }
in gen_fp_exc_raise_ignore() leads to exc & ignore not reaching the
update of env->fpcr_exc_status in helper_fp_exc_raise_s().  See 4.7.8:
[quote]
	In addition, the FPCR gives a summary of each exception type for the 
	exception conditions detected by all IEEE floating-point operates thus
	far, as well as an overall summary bit that indicates whether any of
	these exception conditions has been detected. The indiividual exception
	bits match exactly in purpose and order the exception bits found in the
	exception summary quadword that is pushed for arithmetic traps. However,
	for each instruction, these exception bits arse set independent of the
	trapping mode specified for the instruction. Therefore, even though
	trapping may be disabled for a certain exceptional condition, the fact
	that the exceptional condition was encountered by an instruction is
	still recorded in the FPCR.
[end quote]
And yes, on actual hardware both ADDS/SU and ADDS/SUI set FPCR.INE the same
way - verified by direct experiment.

While we are at it, I'm not quite sure what plain ADDS will leave in FPCR.INE
if it traps on overflow.  It's probably entirely academical, but it might be
worth checking if ELF ABI for Alpha has anything to say about the state seen
by fetestexcept() in SIGFPE handler...

Not sure what's the decent way to fix that - we could, of course, follow that
    tcg_gen_ld8u_i32(exc, cpu_env,
                     offsetof(CPUAlphaState, fp_status.float_exception_flags));
with generating code that would do |= into ->fpcr_exc_status, but I don't
know if we'd blow the footprint to hell by doing so.  Alternatively, we could
do that in helpers called before we start raising exceptions, and I really
wonder what happens with plain CVTQS - do we get FPCR.INE set there anyway?
If so, we really have to do it at least in that helper...  Comments?

	The gcc one comes from the fact that we never set EXC_M_SWC,
whether we come from helper_fp_exc_raise() or from helper_fp_exc_raise_s(),
so kernel-side do_entArith() skips
        if (summary & 1) {
                /* Software-completion summary bit is set, so try to
                   emulate the instruction.  If the processor supports
                   precise exceptions, we don't have to search.  */
                if (!amask(AMASK_PRECISE_TRAP))
                        si_code = alpha_fp_emul(regs->pc - 4);
                else
                        si_code = alpha_fp_emul_imprecise(regs, write_mask);
                if (si_code == 0)
                        return;
        }  
and buggers off to raise SIGFPE.  That's easy to fix, but... we also
get trap PC pointing to offending instruction, not the next one after it,
as 4.7.7.3 would require and as the kernel expects.

	I'm not sure what to do with trap PC - bumping env->pc by 4 after
cpu_restore_state() in dynamic_excp() seems to work, but I'm not at all sure
it's correct; I don't know qemu well enough to tell.

	Anyway, delta that seems to fix the gcc one (gcc.dg/pr28796-2.c from
gcc-4.3 and later) follows.  Again, I'm not at all sure if handling of
env->pc in there is safe from qemu POV and I'd like like to get comments on
that from somebody more familiar with qemu guts.

diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
index d2d776c..a24535b 100644
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -45,10 +45,10 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env)
 }
 
 static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr,
-                                       uint32_t exc, uint32_t regno)
+                                       uint32_t exc, uint32_t regno, uint32_t sw)
 {
     if (exc) {
-        uint32_t hw_exc = 0;
+        uint32_t hw_exc = sw;
 
         if (exc & float_flag_invalid) {
             hw_exc |= EXC_M_INV;
@@ -75,7 +75,7 @@ static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr,
    doesn't apply.  */
 void helper_fp_exc_raise(CPUAlphaState *env, uint32_t exc, uint32_t regno)
 {
-    inline_fp_exc_raise(env, GETPC(), exc, regno);
+    inline_fp_exc_raise(env, GETPC(), exc, regno, 0);
 }
 
 /* Raise exceptions for ieee fp insns with software completion.  */
@@ -84,7 +84,7 @@ void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t exc, uint32_t regno)
     if (exc) {
         env->fpcr_exc_status |= exc;
         exc &= ~env->fpcr_exc_mask;
-        inline_fp_exc_raise(env, GETPC(), exc, regno);
+        inline_fp_exc_raise(env, GETPC(), exc, regno, EXC_M_SWC);
     }
 }
 
diff --git a/target-alpha/helper.c b/target-alpha/helper.c
index 7c053a3..538c6b2 100644
--- a/target-alpha/helper.c
+++ b/target-alpha/helper.c
@@ -527,6 +527,7 @@ void QEMU_NORETURN dynamic_excp(CPUAlphaState *env, uintptr_t retaddr,
     env->error_code = error;
     if (retaddr) {
         cpu_restore_state(cs, retaddr);
+	env->pc += 4;
     }
     cpu_loop_exit(cs);
 }

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-24  4:34 [Qemu-devel] [RFC] alpha qemu arithmetic exceptions Al Viro
@ 2014-06-24 16:52 ` Al Viro
  2014-06-24 18:33   ` Richard Henderson
  2014-06-24 18:23 ` Richard Henderson
  1 sibling, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-06-24 16:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Jun 24, 2014 at 05:34:23AM +0100, Al Viro wrote:
> 	First of all, kudos - with current qemu tree qemu-alpha-system is
> working pretty well - debian install and a *lot* of builds work just fine.
> As in, getting from lenny to pretty complete squeeze toolchain, including gcj,
> openjdk6 and a lot of crap needed to satisfy build-deps of those, plus all
> priority:required and most of priority:important ones.  It's a _lot_ of
> beating and the damn thing survives - the problems had been with debian
> packages themselves (fstatat() bug in lenny libc, epically buggered build-deps
> in gcc-defaults, etc.).  As it is, one core of 6-way 3.3GHz phenom II is quite
> capable of running a home-grown autobuilder.  Feels like ~250-300MHz alpha
> with a very fast disk...
> 
> 	Remaining problems, AFAICS, are around floating point traps.
> I've found one in glibc testsuite (math/tests-misc.c; overflow in
> ADDS/SU ends up with wrong results from fetestexcept() - only FE_OVERFLOW is
> set, while the sucker expects FE_INEXACT as well and actual hardware sets both)
> and another in gcc one (with -funsafe-math-optimizations CVTST/S on denorms
> triggers SIGFPE/FPE_FLTINV).
> 
> 	The libc one is a bug in gen_fp_exc_raise_ignore() - the difference
> between ADDS/SU and ADDS/SUI is only in trapping, not storing results in
> FPCR.INE and friends.  Both will have the same effect on those and
>     if (ignore) {
>         tcg_gen_andi_i32(exc, exc, ~ignore);
>     }
> in gen_fp_exc_raise_ignore() leads to exc & ignore not reaching the
> update of env->fpcr_exc_status in helper_fp_exc_raise_s().  See 4.7.8:
> [quote]
> 	In addition, the FPCR gives a summary of each exception type for the 
> 	exception conditions detected by all IEEE floating-point operates thus
> 	far, as well as an overall summary bit that indicates whether any of
> 	these exception conditions has been detected. The indiividual exception
> 	bits match exactly in purpose and order the exception bits found in the
> 	exception summary quadword that is pushed for arithmetic traps. However,
> 	for each instruction, these exception bits arse set independent of the
> 	trapping mode specified for the instruction. Therefore, even though
> 	trapping may be disabled for a certain exceptional condition, the fact
> 	that the exceptional condition was encountered by an instruction is
> 	still recorded in the FPCR.
> [end quote]
> And yes, on actual hardware both ADDS/SU and ADDS/SUI set FPCR.INE the same
> way - verified by direct experiment.

BTW, here's another testcase:
nclude <stdio.h>

unsigned long __attribute__((noinline)) f(double x)
{
        return (unsigned long)x;        // SVCTQ/SVC
}

main()
{
        unsigned long x;
        extern unsigned long __ieee_get_fp_control(void);
        printf("before:%lx\n", __ieee_get_fp_control());
        x = f(1ULL<<63);
        printf("after:%lx\n", __ieee_get_fp_control());
        printf("result:%lx\n", x);
}

On actual hardware:
before:0
after:20000
result:8000000000000000

On qemu:
before:0
after:0
result:8000000000000000


IOW, gen_fcvttq() is also affected, not only gen_fp_exc_raise().

Can't we simply have separate helpers for various trap suffices, with
all this work on getting exc, etc. taken inside them?  It's not as if
we distinguished many variants, after all...  Right now we have:
	plain, /U, /V
	/S, /SU
	/SUI
	/SV
	/SVI
and /SU should probably be separated from /S - we do want to suppress underflow
traps on those (again, FPCR.UND should be set regardless).  That's what, 5 or 6
helpers?  Might want to separate /V and /U from plain - AFAICS, we get it
wrong with things like ADDS/U vs. ADDS (it's just that normally underflow
traps are disabled by FPCR.DUND).  I hadn't experimented with those yet, but
even if it turns out that they *are* different - 8 helpers instead of the 2 we
currently have, sharing most of the actual source...

Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV,
rather than EXC_M_UNF?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-24  4:34 [Qemu-devel] [RFC] alpha qemu arithmetic exceptions Al Viro
  2014-06-24 16:52 ` Al Viro
@ 2014-06-24 18:23 ` Richard Henderson
  2014-06-24 20:47   ` Al Viro
  1 sibling, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-06-24 18:23 UTC (permalink / raw)
  To: Al Viro; +Cc: qemu-devel

On 06/23/2014 09:34 PM, Al Viro wrote:
> 	Anyway, delta that seems to fix the gcc one (gcc.dg/pr28796-2.c from
> gcc-4.3 and later) follows.  Again, I'm not at all sure if handling of
> env->pc in there is safe from qemu POV and I'd like like to get comments on
> that from somebody more familiar with qemu guts.

Thanks for the diagnosis on the gcc test case.  I've been meaning to
investigate some of these edge cases for quite a while and never quite
got there.

>  static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr,
> -                                       uint32_t exc, uint32_t regno)
> +                                       uint32_t exc, uint32_t regno, uint32_t sw)
>  {
>      if (exc) {
> -        uint32_t hw_exc = 0;
> +        uint32_t hw_exc = sw;
>  
>          if (exc & float_flag_invalid) {
>              hw_exc |= EXC_M_INV;
> @@ -75,7 +75,7 @@ static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr,
>     doesn't apply.  */
>  void helper_fp_exc_raise(CPUAlphaState *env, uint32_t exc, uint32_t regno)
>  {
> -    inline_fp_exc_raise(env, GETPC(), exc, regno);
> +    inline_fp_exc_raise(env, GETPC(), exc, regno, 0);
>  }
>  
>  /* Raise exceptions for ieee fp insns with software completion.  */
> @@ -84,7 +84,7 @@ void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t exc, uint32_t regno)
>      if (exc) {
>          env->fpcr_exc_status |= exc;
>          exc &= ~env->fpcr_exc_mask;
> -        inline_fp_exc_raise(env, GETPC(), exc, regno);
> +        inline_fp_exc_raise(env, GETPC(), exc, regno, EXC_M_SWC);
>      }
>  }

This part looks good.

> diff --git a/target-alpha/helper.c b/target-alpha/helper.c
> index 7c053a3..538c6b2 100644
> --- a/target-alpha/helper.c
> +++ b/target-alpha/helper.c
> @@ -527,6 +527,7 @@ void QEMU_NORETURN dynamic_excp(CPUAlphaState *env, uintptr_t retaddr,
>      env->error_code = error;
>      if (retaddr) {
>          cpu_restore_state(cs, retaddr);
> +	env->pc += 4;

This one needs a different fix, since dynamic_excp is also used from
alpha_cpu_unassigned_access, and I'm pretty sure the mchk should have the
address of the memory insn.  But that should be easy to fix up.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-24 16:52 ` Al Viro
@ 2014-06-24 18:33   ` Richard Henderson
  2014-06-24 20:32     ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-06-24 18:33 UTC (permalink / raw)
  To: Al Viro; +Cc: qemu-devel

On 06/24/2014 09:52 AM, Al Viro wrote:
> unsigned long __attribute__((noinline)) f(double x)
> {
>         return (unsigned long)x;        // SVCTQ/SVC
> }
> 
> main()
> {
>         unsigned long x;
>         extern unsigned long __ieee_get_fp_control(void);
>         printf("before:%lx\n", __ieee_get_fp_control());
>         x = f(1ULL<<63);
>         printf("after:%lx\n", __ieee_get_fp_control());
>         printf("result:%lx\n", x);
> }
> 
> On actual hardware:
> before:0
> after:20000
> result:8000000000000000
> 
> On qemu:
> before:0
> after:0
> result:8000000000000000
> 
> 
> IOW, gen_fcvttq() is also affected, not only gen_fp_exc_raise().

Clearly a gross misunderstanding of what bits are actually computed, never mind
what gets signaled.

Thanks for the test.  I've not had working hardware for a couple of years to
validate what's supposed to get set and what isn't.

> Can't we simply have separate helpers for various trap suffices, with
> all this work on getting exc, etc. taken inside them?  It's not as if
> we distinguished many variants, after all...  Right now we have:
> 	plain, /U, /V
> 	/S, /SU
> 	/SUI
> 	/SV
> 	/SVI

We used to have separate helpers... at least for the modes that had been
implemented at the time.  The combinatorial explosion ugly though -- 4
different versions of add, sub, etc.  I thought the partial inlining was a
decent solution, as far as maintainability, but it's not unreasonable to disagree.

> Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV,
> rather than EXC_M_UNF?

No idea.  Should they?


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-24 18:33   ` Richard Henderson
@ 2014-06-24 20:32     ` Al Viro
  2014-06-24 20:57       ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-06-24 20:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Jun 24, 2014 at 11:33:32AM -0700, Richard Henderson wrote:
> >         return (unsigned long)x;        // SVCTQ/SVC
					       CVTTQ/SVC, of course...
> Clearly a gross misunderstanding of what bits are actually computed, never mind
> what gets signaled.
> 
> Thanks for the test.  I've not had working hardware for a couple of years to
> validate what's supposed to get set and what isn't.

If you have any ideas for testing, I do have working hw (the box that is
currently alive is ev45, though; I _can_ try to boot a UP1000 one, but
I make no promises regarding its fans, both in PSU and in CPU module ;-/)

> > Can't we simply have separate helpers for various trap suffices, with
> > all this work on getting exc, etc. taken inside them?  It's not as if
> > we distinguished many variants, after all...  Right now we have:
> > 	plain, /U, /V
> > 	/S, /SU
> > 	/SUI
> > 	/SV
> > 	/SVI
> 
> We used to have separate helpers... at least for the modes that had been
> implemented at the time.  The combinatorial explosion ugly though -- 4
> different versions of add, sub, etc.  I thought the partial inlining was a
> decent solution, as far as maintainability, but it's not unreasonable to disagree.

Um?  No, I mean having gen_fp_exc_raise() generate a call of one of the 8
helpers; gen_ieee_arith3() and friends would remain as-is.  It's just that
instead of generating load to exc, andi, call of helper_fp_exc_raise_s or
call of helper_fp_exc_raise we would generate a call of one of the
helper_fp_exc_raise{,_u,_v,_s,_su,_sui,_sv,_svi} and let that sucker deal
with loading exc, updating ->fpcr_exc_status and generating traps.

> > Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV,
> > rather than EXC_M_UNF?
> 
> No idea.  Should they?

They seem to - both from the arch.manual and from direct experiment...  And
they do set FPCR.INV at the same time, not just trigger the trap.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-24 18:23 ` Richard Henderson
@ 2014-06-24 20:47   ` Al Viro
  0 siblings, 0 replies; 68+ messages in thread
From: Al Viro @ 2014-06-24 20:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Jun 24, 2014 at 11:23:01AM -0700, Richard Henderson wrote:
> >      env->error_code = error;
> >      if (retaddr) {
> >          cpu_restore_state(cs, retaddr);
> > +	env->pc += 4;
> 
> This one needs a different fix, since dynamic_excp is also used from
> alpha_cpu_unassigned_access, and I'm pretty sure the mchk should have the
> address of the memory insn.  But that should be easy to fix up.

That's not a problem, actually - there we have
    dynamic_excp(env, 0, EXCP_MCHK, 0);
so retaddr is going to be 0 and that env->pc += 4 won't be reached at all...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-24 20:32     ` Al Viro
@ 2014-06-24 20:57       ` Richard Henderson
  2014-06-24 21:24         ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-06-24 20:57 UTC (permalink / raw)
  To: Al Viro; +Cc: qemu-devel

On 06/24/2014 01:32 PM, Al Viro wrote:
> If you have any ideas for testing, I do have working hw (the box that is
> currently alive is ev45, though; I _can_ try to boot a UP1000 one, but
> I make no promises regarding its fans, both in PSU and in CPU module ;-/)

Ah.  Gotta be careful with ev4/45... half of the fpu is unimplemented, and so
if you're not careful all you're testing is the kernel emulation behaviour.

> Um?  No, I mean having gen_fp_exc_raise() generate a call of one of the 8
> helpers; gen_ieee_arith3() and friends would remain as-is.  It's just that
> instead of generating load to exc, andi, call of helper_fp_exc_raise_s or
> call of helper_fp_exc_raise we would generate a call of one of the
> helper_fp_exc_raise{,_u,_v,_s,_su,_sui,_sv,_svi} and let that sucker deal
> with loading exc, updating ->fpcr_exc_status and generating traps.

Ah, I getcha.  Yes, that makes sense.

>>> Another thing: shouldn't arithmetics on denorms without /S raise EXC_M_INV,
>>> rather than EXC_M_UNF?
>>
>> No idea.  Should they?
> 
> They seem to - both from the arch.manual and from direct experiment...  And
> they do set FPCR.INV at the same time, not just trigger the trap.

Ok.  I'll try to make time to fix up some of this stuff this weekend.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-24 20:57       ` Richard Henderson
@ 2014-06-24 21:24         ` Al Viro
  2014-06-24 21:32           ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-06-24 21:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Jun 24, 2014 at 01:57:52PM -0700, Richard Henderson wrote:
> On 06/24/2014 01:32 PM, Al Viro wrote:
> > If you have any ideas for testing, I do have working hw (the box that is
> > currently alive is ev45, though; I _can_ try to boot a UP1000 one, but
> > I make no promises regarding its fans, both in PSU and in CPU module ;-/)
> 
> Ah.  Gotta be careful with ev4/45... half of the fpu is unimplemented, and so
> if you're not careful all you're testing is the kernel emulation behaviour.

*nod*

> > Um?  No, I mean having gen_fp_exc_raise() generate a call of one of the 8
> > helpers; gen_ieee_arith3() and friends would remain as-is.  It's just that
> > instead of generating load to exc, andi, call of helper_fp_exc_raise_s or
> > call of helper_fp_exc_raise we would generate a call of one of the
> > helper_fp_exc_raise{,_u,_v,_s,_su,_sui,_sv,_svi} and let that sucker deal
> > with loading exc, updating ->fpcr_exc_status and generating traps.
> 
> Ah, I getcha.  Yes, that makes sense.

FWIW, the crudest version of that is simply
+    env->fpcr_exc_status |= (uint8_t)env->fp_status.float_exception_flags;
in the very beginning of helper_fp_exc_raise_s().  And yes, it recovers
math/tests-misc.c, even though it's obviously not a good final fix.

Al, off to figure out the black magic TCG is using to generate calls...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-24 21:24         ` Al Viro
@ 2014-06-24 21:32           ` Richard Henderson
  2014-06-25  7:01             ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-06-24 21:32 UTC (permalink / raw)
  To: Al Viro; +Cc: qemu-devel

On 06/24/2014 02:24 PM, Al Viro wrote:
> Al, off to figure out the black magic TCG is using to generate calls...

If you've a helper

DEF_HELPER_1(halt, void, i64)

then

  gen_helper_halt(...)

will generate the tcg ops that result in the call.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-24 21:32           ` Richard Henderson
@ 2014-06-25  7:01             ` Al Viro
  2014-06-25  9:27               ` Peter Maydell
  2014-06-26  5:55               ` Al Viro
  0 siblings, 2 replies; 68+ messages in thread
From: Al Viro @ 2014-06-25  7:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Jun 24, 2014 at 02:32:46PM -0700, Richard Henderson wrote:
> On 06/24/2014 02:24 PM, Al Viro wrote:
> > Al, off to figure out the black magic TCG is using to generate calls...
> 
> If you've a helper
> 
> DEF_HELPER_1(halt, void, i64)
> 
> then
> 
>   gen_helper_halt(...)
> 
> will generate the tcg ops that result in the call.

Another fun issue:

CVTTQ is both underreporting the overflow *AND* reports the wrong kind - FOV
instead of IOV.

	* it misses reporting overflows for case when it *knows* that
	  overflow will happen - the need to shift up by more than 63 bits.
	  Trivially fixed, of course.  There overflow cases leave wrong
	  result as well - should be 0.
	* it also misses reporting overflows for case when value is in
	  ranges 2^63..2^64-1 and -2^64+1..-2^63-1.  And yes, it's
	  asymmetric - 2^63 is an overflow, -2^63 isn't.
	* overflow is reported by float_raise(float_flag_overflow, &FP_STATUS).
	  Wrong flag - it should be IOV, not FOV.  And it should be set
	  in FPCR regardless of the trap modifier (IOV, this VI thing is
	  wrong - we should deal with that only when we generate a trap).
	* All overflow cases should raise INE as well.

Could we steal bit 1 in float_exception_flags for IOV?  It is (currently?)
unused -
enum {
    float_flag_invalid   =  1,
    float_flag_divbyzero =  4,
    float_flag_overflow  =  8,
    float_flag_underflow = 16,
    float_flag_inexact   = 32,
    float_flag_input_denormal = 64,
    float_flag_output_denormal = 128
};

That would allow to deal with that crap nicely - we could have it raise
the new flag, then have helper_fp_exc_raise_... for default trap mode
mask it out (and yes, we need to set FPCR flags in default mode, as well
as /U and /V - confirmed by direct experiment *and* by TFM).

If we can't... well, we could put that flag separately, but it would be
more unpleasant.  Folks?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-25  7:01             ` Al Viro
@ 2014-06-25  9:27               ` Peter Maydell
  2014-06-25 14:26                 ` Al Viro
  2014-06-26  5:55               ` Al Viro
  1 sibling, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2014-06-25  9:27 UTC (permalink / raw)
  To: Al Viro; +Cc: QEMU Developers, Richard Henderson

On 25 June 2014 08:01, Al Viro <viro@zeniv.linux.org.uk> wrote:
> Could we steal bit 1 in float_exception_flags for IOV?  It is (currently?)
> unused -
> enum {
>     float_flag_invalid   =  1,
>     float_flag_divbyzero =  4,
>     float_flag_overflow  =  8,
>     float_flag_underflow = 16,
>     float_flag_inexact   = 32,
>     float_flag_input_denormal = 64,
>     float_flag_output_denormal = 128
> };
>
> That would allow to deal with that crap nicely - we could have it raise
> the new flag, then have helper_fp_exc_raise_... for default trap mode
> mask it out (and yes, we need to set FPCR flags in default mode, as well
> as /U and /V - confirmed by direct experiment *and* by TFM).
>
> If we can't... well, we could put that flag separately, but it would be
> more unpleasant.  Folks?

I think it's OK to put extra float_flags in, provided you can define
their semantics in terms that make sense for more than one
architecture (even if only one arch actually happens to need them).
The input_denormal/output_denormal flags only get used for ARM,
for instance. However if you wanted to split overflow from integer
overflow you'd need to fix up all the other targets which expect
them to generate just one exception flag...

(Note that any patch touching softfloat files needs to come with
a statement that you're happy to license it under either the
softfloat-2a or softfloat-2b licenses, because we're currently
midway through the tedious process of trying to relicense it.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-25  9:27               ` Peter Maydell
@ 2014-06-25 14:26                 ` Al Viro
  2014-06-25 17:41                   ` Peter Maydell
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-06-25 14:26 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Richard Henderson

On Wed, Jun 25, 2014 at 10:27:11AM +0100, Peter Maydell wrote:

> I think it's OK to put extra float_flags in, provided you can define
> their semantics in terms that make sense for more than one
> architecture (even if only one arch actually happens to need them).
> The input_denormal/output_denormal flags only get used for ARM,
> for instance. However if you wanted to split overflow from integer
> overflow you'd need to fix up all the other targets which expect
> them to generate just one exception flag...

Hmm...  On alpha it's generated only by the following: CVTTQ, CVTGQ,
CVTQL.  I.e. conversions to integer formats that can be held in FPU
registers (double -> s64, VAX double -> s64 and s64 -> s32).  Does
softfloat even have anything similar?  As it is, it's all in alpha-specific
code; double -> s64 might have a chance to be generic (semantics:
	* denorms -> 0, raise "inexact", provided that they survived to
that point and hadn't buggered off with "invalid"
	* exact integers in range -2^63 .. 2^63-1 -> equivalent 64bit
integer
	* values outside of that range (all with zero fractional part,
since the weight of LSB of significand will be considerably greater than 1
by that point) -> lower 64 bits of value, raise "integer overflow" and
"inexact".
	* values with non-zero fractional part -> rounded according to
rounding mode, raise "inexact".
), but existing float64_to_int64() isn't it - very different behaviour
on overflows.  Incidentally, VAX double to s64 is buggered in that area -
it *does* try to use float64_to_int64() and, on top of getting INV instead
of IOV, gets the wrong result in case of overflow (MAX_LONG/MIN_LONG instead
of value in -2^63..2^63-1 comparable modulo 2^64 with exact value taken
as element of $\Bbb Z$).

And s64->s32 is just plain weird - not in the part that has IOV raised on
values outside of -2^31..2^31-1, but in the bit shuffling it's doing if
the test passes; alpha FPU stores s32 value in bits 63-62/58-29, with the
rest filled with zeroes.

In any case, it's not splitting float_overflow_flag; similar cases in
softfloat.c raise float_invalid_flag.  I don't know if it would make
sense to try and teach float64_to_int64() about this kind of return
value on overflow...

> (Note that any patch touching softfloat files needs to come with
> a statement that you're happy to license it under either the
> softfloat-2a or softfloat-2b licenses, because we're currently
> midway through the tedious process of trying to relicense it.)

Wouldn't be a problem, but I doubt that it would be particulary useful to touch
softfloat.c due to the reasons above, anyway.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-25 14:26                 ` Al Viro
@ 2014-06-25 17:41                   ` Peter Maydell
  0 siblings, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2014-06-25 17:41 UTC (permalink / raw)
  To: Al Viro; +Cc: QEMU Developers, Richard Henderson

On 25 June 2014 15:26, Al Viro <viro@zeniv.linux.org.uk> wrote:
> Hmm...  On alpha it's generated only by the following: CVTTQ, CVTGQ,
> CVTQL.  I.e. conversions to integer formats that can be held in FPU
> registers (double -> s64, VAX double -> s64 and s64 -> s32).  Does
> softfloat even have anything similar?

Well, VAX doubles are a bit out of scope for an IEEE emulation
library :-)

>  As it is, it's all in alpha-specific code;

It does sound like that's the best place for it. In that case, you
don't want to add a flag to the softfloat float_flags -- they are
specifically for indicating softfloat's status/exceptions. Flags
handled purely in CPU-specific code should be stored in the
CPU specific state struct somewhere.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-25  7:01             ` Al Viro
  2014-06-25  9:27               ` Peter Maydell
@ 2014-06-26  5:55               ` Al Viro
  2014-06-30 18:39                 ` Richard Henderson
  1 sibling, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-06-26  5:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Wed, Jun 25, 2014 at 08:01:17AM +0100, Al Viro wrote:
> On Tue, Jun 24, 2014 at 02:32:46PM -0700, Richard Henderson wrote:
> > On 06/24/2014 02:24 PM, Al Viro wrote:
> > > Al, off to figure out the black magic TCG is using to generate calls...
> > 
> > If you've a helper
> > 
> > DEF_HELPER_1(halt, void, i64)
> > 
> > then
> > 
> >   gen_helper_halt(...)
> > 
> > will generate the tcg ops that result in the call.
> 
> Another fun issue:
> 
> CVTTQ is both underreporting the overflow *AND* reports the wrong kind - FOV
> instead of IOV.
> 
> 	* it misses reporting overflows for case when it *knows* that
> 	  overflow will happen - the need to shift up by more than 63 bits.
> 	  Trivially fixed, of course.  There overflow cases leave wrong
> 	  result as well - should be 0.
> 	* it also misses reporting overflows for case when value is in
> 	  ranges 2^63..2^64-1 and -2^64+1..-2^63-1.  And yes, it's
> 	  asymmetric - 2^63 is an overflow, -2^63 isn't.
> 	* overflow is reported by float_raise(float_flag_overflow, &FP_STATUS).
> 	  Wrong flag - it should be IOV, not FOV.  And it should be set
> 	  in FPCR regardless of the trap modifier (IOV, this VI thing is
> 	  wrong - we should deal with that only when we generate a trap).
> 	* All overflow cases should raise INE as well.
> 
> Could we steal bit 1 in float_exception_flags for IOV?  It is (currently?)
> unused -
> enum {
>     float_flag_invalid   =  1,
>     float_flag_divbyzero =  4,
>     float_flag_overflow  =  8,
>     float_flag_underflow = 16,
>     float_flag_inexact   = 32,
>     float_flag_input_denormal = 64,
>     float_flag_output_denormal = 128
> };
> 
> That would allow to deal with that crap nicely - we could have it raise
> the new flag, then have helper_fp_exc_raise_... for default trap mode
> mask it out (and yes, we need to set FPCR flags in default mode, as well
> as /U and /V - confirmed by direct experiment *and* by TFM).

OK, I've managed to resurrect UP1000 box (FSVO resurrect - the southbridge
DMA controller has always been buggered, with intermittent noise on one of
the data lines; fans in CPU module are FUBAR as well - 17 and 20 RPM resp.,
so I don't risk keeping it running for long, etc.)

Still, that allows to test EV67 and I hope to resurrect a DS10 box as well,
which will allow for saner testing environment.

Current delta follows, fixing gcc and libc testcases *and* AFAICS getting
CVTTQ handling in line with what actual EV67 is doing.  It's a dirty hack
wrt float_raise() - relies on bit 1 never being raised by softfpu.c.  I'll
look into separating that bit, but it'll probably have non-zero costs ;-/
We need two flags - "has IOV been raised during this insn" (in this patch
it's bit 1 of fp_status.float_exception_flags, cleaned along with those)
and something to keep FPCR.IOV in (in this patch - bit 1 of fpcr_exc_status).
Sure, we can add another uint8_t or two in struct CPUAlphaState, but that'll
mean extra PITA in code and extra memory accesses...

Review would be welcome.

diff --git a/target-alpha/cpu.h b/target-alpha/cpu.h
index d9b861f..047b9a2 100644
--- a/target-alpha/cpu.h
+++ b/target-alpha/cpu.h
@@ -152,6 +152,10 @@ enum {
     FP_ROUND_DYNAMIC = 0x3,
 };
 
+enum {
+    float_flag_IOV = 2,
+};
+
 /* FPCR bits */
 #define FPCR_SUM		(1ULL << 63)
 #define FPCR_INED		(1ULL << 62)
diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
index d2d776c..2b39ea4 100644
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -45,10 +45,11 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env)
 }
 
 static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr,
-                                       uint32_t exc, uint32_t regno)
+                                       uint32_t exc, uint32_t regno, uint32_t hw_exc)
 {
     if (exc) {
-        uint32_t hw_exc = 0;
+	if (hw_exc)
+            exc &= ~env->fpcr_exc_mask;
 
         if (exc & float_flag_invalid) {
             hw_exc |= EXC_M_INV;
@@ -65,6 +66,9 @@ static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr,
         if (exc & float_flag_inexact) {
             hw_exc |= EXC_M_INE;
         }
+        if (exc & float_flag_IOV) {
+            hw_exc |= EXC_M_IOV;
+        }
 
         arith_excp(env, retaddr, hw_exc, 1ull << regno);
     }
@@ -73,18 +77,21 @@ static inline void inline_fp_exc_raise(CPUAlphaState *env, uintptr_t retaddr,
 /* Raise exceptions for ieee fp insns without software completion.
    In that case there are no exceptions that don't trap; the mask
    doesn't apply.  */
-void helper_fp_exc_raise(CPUAlphaState *env, uint32_t exc, uint32_t regno)
+void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-    inline_fp_exc_raise(env, GETPC(), exc, regno);
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+    if (exc) {
+	env->fpcr_exc_status |= exc;
+	inline_fp_exc_raise(env, GETPC(), exc & ~ignore, regno, 0);
+    }
 }
 
-/* Raise exceptions for ieee fp insns with software completion.  */
-void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t exc, uint32_t regno)
+void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
     if (exc) {
-        env->fpcr_exc_status |= exc;
-        exc &= ~env->fpcr_exc_mask;
-        inline_fp_exc_raise(env, GETPC(), exc, regno);
+	env->fpcr_exc_status |= exc;
+        inline_fp_exc_raise(env, GETPC(), exc & ~ignore, regno, EXC_M_SWC);
     }
 }
 
@@ -682,14 +689,12 @@ uint64_t helper_cvtqs(CPUAlphaState *env, uint64_t a)
     return float32_to_s(fr);
 }
 
-/* Implement float64 to uint64 conversion without saturation -- we must
+/* Implement float64 to int64 conversion without saturation -- we must
    supply the truncated result.  This behaviour is used by the compiler
-   to get unsigned conversion for free with the same instruction.
-
-   The VI flag is set when overflow or inexact exceptions should be raised.  */
+   to get unsigned conversion for free with the same instruction. */
 
 static inline uint64_t inline_cvttq(CPUAlphaState *env, uint64_t a,
-                                    int roundmode, int VI)
+                                    int roundmode)
 {
     uint64_t frac, ret = 0;
     uint32_t exp, sign, exc = 0;
@@ -704,7 +709,7 @@ static inline uint64_t inline_cvttq(CPUAlphaState *env, uint64_t a,
             goto do_underflow;
         }
     } else if (exp == 0x7ff) {
-        exc = (frac ? float_flag_invalid : VI ? float_flag_overflow : 0);
+        exc = frac ? float_flag_invalid : float_flag_IOV | float_flag_inexact;
     } else {
         /* Restore implicit bit.  */
         frac |= 0x10000000000000ull;
@@ -715,10 +720,12 @@ static inline uint64_t inline_cvttq(CPUAlphaState *env, uint64_t a,
                the fraction left.  There is no rounding to do.  */
             if (shift < 63) {
                 ret = frac << shift;
-                if (VI && (ret >> shift) != frac) {
-                    exc = float_flag_overflow;
+                if ((ret >> shift) != frac || (int64_t)(ret-sign) < 0) {
+                    exc = float_flag_IOV | float_flag_inexact;
                 }
-            }
+            } else {
+		exc = float_flag_IOV | float_flag_inexact;
+	    }
         } else {
             uint64_t round;
 
@@ -739,7 +746,7 @@ static inline uint64_t inline_cvttq(CPUAlphaState *env, uint64_t a,
             }
 
             if (round) {
-                exc = (VI ? float_flag_inexact : 0);
+                exc = float_flag_inexact;
                 switch (roundmode) {
                 case float_round_nearest_even:
                     if (round == (1ull << 63)) {
@@ -773,17 +780,12 @@ static inline uint64_t inline_cvttq(CPUAlphaState *env, uint64_t a,
 
 uint64_t helper_cvttq(CPUAlphaState *env, uint64_t a)
 {
-    return inline_cvttq(env, a, FP_STATUS.float_rounding_mode, 1);
+    return inline_cvttq(env, a, FP_STATUS.float_rounding_mode);
 }
 
 uint64_t helper_cvttq_c(CPUAlphaState *env, uint64_t a)
 {
-    return inline_cvttq(env, a, float_round_to_zero, 0);
-}
-
-uint64_t helper_cvttq_svic(CPUAlphaState *env, uint64_t a)
-{
-    return inline_cvttq(env, a, float_round_to_zero, 1);
+    return inline_cvttq(env, a, float_round_to_zero);
 }
 
 uint64_t helper_cvtqt(CPUAlphaState *env, uint64_t a)
diff --git a/target-alpha/helper.c b/target-alpha/helper.c
index 7c053a3..82b1040 100644
--- a/target-alpha/helper.c
+++ b/target-alpha/helper.c
@@ -36,6 +36,9 @@ uint64_t cpu_alpha_load_fpcr (CPUAlphaState *env)
         if (t & float_flag_invalid) {
             r |= FPCR_INV;
         }
+        if (t & float_flag_IOV) {
+            r |= FPCR_IOV;
+	}
         if (t & float_flag_divbyzero) {
             r |= FPCR_DZE;
         }
@@ -103,6 +106,9 @@ void cpu_alpha_store_fpcr (CPUAlphaState *env, uint64_t val)
     if (val & FPCR_INV) {
         t |= float_flag_invalid;
     }
+    if (val & FPCR_IOV) {
+        t |= float_flag_IOV;
+    }
     if (val & FPCR_DZE) {
         t |= float_flag_divbyzero;
     }
@@ -527,6 +533,7 @@ void QEMU_NORETURN dynamic_excp(CPUAlphaState *env, uintptr_t retaddr,
     env->error_code = error;
     if (retaddr) {
         cpu_restore_state(cs, retaddr);
+	env->pc += 4;
     }
     cpu_loop_exit(cs);
 }
diff --git a/target-alpha/helper.h b/target-alpha/helper.h
index a451cfe..173db01 100644
--- a/target-alpha/helper.h
+++ b/target-alpha/helper.h
@@ -83,7 +83,6 @@ DEF_HELPER_FLAGS_2(cvtqg, TCG_CALL_NO_RWG, i64, env, i64)
 
 DEF_HELPER_FLAGS_2(cvttq, TCG_CALL_NO_RWG, i64, env, i64)
 DEF_HELPER_FLAGS_2(cvttq_c, TCG_CALL_NO_RWG, i64, env, i64)
-DEF_HELPER_FLAGS_2(cvttq_svic, TCG_CALL_NO_RWG, i64, env, i64)
 
 DEF_HELPER_FLAGS_2(setroundmode, TCG_CALL_NO_RWG, void, env, i32)
 DEF_HELPER_FLAGS_2(setflushzero, TCG_CALL_NO_RWG, void, env, i32)
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index cc81e77..e7b0f0d 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -672,26 +672,22 @@ static void gen_fp_exc_clear(void)
 #endif
 }
 
-static void gen_fp_exc_raise_ignore(int rc, int fn11, int ignore)
+static void gen_fp_exc_raise(int rc, int fn11)
 {
     /* ??? We ought to be able to do something with imprecise exceptions.
        E.g. notice we're still in the trap shadow of something within the
        TB and do not generate the code to signal the exception; end the TB
        when an exception is forced to arrive, either by consumption of a
        register value or TRAPB or EXCB.  */
-    TCGv_i32 exc = tcg_temp_new_i32();
     TCGv_i32 reg;
+    TCGv_i32 ign;
+    uint32_t ignore = float_flag_inexact;
+    if (!(fn11 & QUAL_U))
+	ignore |= float_flag_underflow | float_flag_IOV;
+    if (fn11 & QUAL_I)
+	ignore &= ~float_flag_inexact;
 
-#if defined(CONFIG_SOFTFLOAT_INLINE)
-    tcg_gen_ld8u_i32(exc, cpu_env,
-                     offsetof(CPUAlphaState, fp_status.float_exception_flags));
-#else
-    gen_helper_fp_exc_get(exc, cpu_env);
-#endif
-
-    if (ignore) {
-        tcg_gen_andi_i32(exc, exc, ~ignore);
-    }
+    ign = tcg_const_i32(ignore);
 
     /* ??? Pass in the regno of the destination so that the helper can
        set EXC_MASK, which contains a bitmask of destination registers
@@ -701,18 +697,13 @@ static void gen_fp_exc_raise_ignore(int rc, int fn11, int ignore)
     reg = tcg_const_i32(rc + 32);
 
     if (fn11 & QUAL_S) {
-        gen_helper_fp_exc_raise_s(cpu_env, exc, reg);
+        gen_helper_fp_exc_raise_s(cpu_env, ign, reg);
     } else {
-        gen_helper_fp_exc_raise(cpu_env, exc, reg);
+        gen_helper_fp_exc_raise(cpu_env, ign, reg);
     }
 
+    tcg_temp_free_i32(ign);
     tcg_temp_free_i32(reg);
-    tcg_temp_free_i32(exc);
-}
-
-static inline void gen_fp_exc_raise(int rc, int fn11)
-{
-    gen_fp_exc_raise_ignore(rc, fn11, fn11 & QUAL_I ? 0 : float_flag_inexact);
 }
 
 static void gen_fcvtlq(TCGv vc, TCGv vb)
@@ -773,7 +764,6 @@ IEEE_ARITH2(cvtts)
 static void gen_fcvttq(DisasContext *ctx, int rb, int rc, int fn11)
 {
     TCGv vb, vc;
-    int ignore = 0;
 
     /* No need to set flushzero, since we have an integer output.  */
     gen_fp_exc_clear();
@@ -782,26 +772,13 @@ static void gen_fcvttq(DisasContext *ctx, int rb, int rc, int fn11)
 
     /* Almost all integer conversions use cropped rounding, and most
        also do not have integer overflow enabled.  Special case that.  */
-    switch (fn11) {
-    case QUAL_RM_C:
+    if ((fn11 & QUAL_RM_MASK) == QUAL_RM_C) {
         gen_helper_cvttq_c(vc, cpu_env, vb);
-        break;
-    case QUAL_V | QUAL_RM_C:
-    case QUAL_S | QUAL_V | QUAL_RM_C:
-        ignore = float_flag_inexact;
-        /* FALLTHRU */
-    case QUAL_S | QUAL_V | QUAL_I | QUAL_RM_C:
-        gen_helper_cvttq_svic(vc, cpu_env, vb);
-        break;
-    default:
+    } else {
         gen_qual_roundmode(ctx, fn11);
         gen_helper_cvttq(vc, cpu_env, vb);
-        ignore |= (fn11 & QUAL_V ? 0 : float_flag_overflow);
-        ignore |= (fn11 & QUAL_I ? 0 : float_flag_inexact);
-        break;
     }
-
-    gen_fp_exc_raise_ignore(rc, fn11, ignore);
+    gen_fp_exc_raise(rc, fn11);
 }
 
 static void gen_ieee_intcvt(DisasContext *ctx,

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-26  5:55               ` Al Viro
@ 2014-06-30 18:39                 ` Richard Henderson
  2014-06-30 20:56                   ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-06-30 18:39 UTC (permalink / raw)
  To: Al Viro; +Cc: qemu-devel

On 06/25/2014 10:55 PM, Al Viro wrote:
> On Wed, Jun 25, 2014 at 08:01:17AM +0100, Al Viro wrote:
>> On Tue, Jun 24, 2014 at 02:32:46PM -0700, Richard Henderson wrote:
>>> On 06/24/2014 02:24 PM, Al Viro wrote:
>>>> Al, off to figure out the black magic TCG is using to generate calls...
>>>
>>> If you've a helper
>>>
>>> DEF_HELPER_1(halt, void, i64)
>>>
>>> then
>>>
>>>   gen_helper_halt(...)
>>>
>>> will generate the tcg ops that result in the call.
>>
>> Another fun issue:
>>
>> CVTTQ is both underreporting the overflow *AND* reports the wrong kind - FOV
>> instead of IOV.
>>
>> 	* it misses reporting overflows for case when it *knows* that
>> 	  overflow will happen - the need to shift up by more than 63 bits.
>> 	  Trivially fixed, of course.  There overflow cases leave wrong
>> 	  result as well - should be 0.
>> 	* it also misses reporting overflows for case when value is in
>> 	  ranges 2^63..2^64-1 and -2^64+1..-2^63-1.  And yes, it's
>> 	  asymmetric - 2^63 is an overflow, -2^63 isn't.
>> 	* overflow is reported by float_raise(float_flag_overflow, &FP_STATUS).
>> 	  Wrong flag - it should be IOV, not FOV.  And it should be set
>> 	  in FPCR regardless of the trap modifier (IOV, this VI thing is
>> 	  wrong - we should deal with that only when we generate a trap).
>> 	* All overflow cases should raise INE as well.
>>
>> Could we steal bit 1 in float_exception_flags for IOV?  It is (currently?)
>> unused -
>> enum {
>>     float_flag_invalid   =  1,
>>     float_flag_divbyzero =  4,
>>     float_flag_overflow  =  8,
>>     float_flag_underflow = 16,
>>     float_flag_inexact   = 32,
>>     float_flag_input_denormal = 64,
>>     float_flag_output_denormal = 128
>> };
>>
>> That would allow to deal with that crap nicely - we could have it raise
>> the new flag, then have helper_fp_exc_raise_... for default trap mode
>> mask it out (and yes, we need to set FPCR flags in default mode, as well
>> as /U and /V - confirmed by direct experiment *and* by TFM).
> 
> OK, I've managed to resurrect UP1000 box (FSVO resurrect - the southbridge
> DMA controller has always been buggered, with intermittent noise on one of
> the data lines; fans in CPU module are FUBAR as well - 17 and 20 RPM resp.,
> so I don't risk keeping it running for long, etc.)
> 
> Still, that allows to test EV67 and I hope to resurrect a DS10 box as well,
> which will allow for saner testing environment.
> 
> Current delta follows, fixing gcc and libc testcases *and* AFAICS getting
> CVTTQ handling in line with what actual EV67 is doing.  It's a dirty hack
> wrt float_raise() - relies on bit 1 never being raised by softfpu.c.  I'll
> look into separating that bit, but it'll probably have non-zero costs ;-/
> We need two flags - "has IOV been raised during this insn" (in this patch
> it's bit 1 of fp_status.float_exception_flags, cleaned along with those)
> and something to keep FPCR.IOV in (in this patch - bit 1 of fpcr_exc_status).
> Sure, we can add another uint8_t or two in struct CPUAlphaState, but that'll
> mean extra PITA in code and extra memory accesses...
> 
> Review would be welcome.

Looks good.

I've split it up into a couple of smaller patches, made some sylistic tweaks
and pushed it to

  git://github.com/rth7680/qemu.git axp-next

I'm starting to do some testing now, but a glance though would be helpful.
Especially to see if I didn't make some silly mistake in the process.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-30 18:39                 ` Richard Henderson
@ 2014-06-30 20:56                   ` Al Viro
  2014-07-01  4:34                     ` Al Viro
  2014-07-01 17:03                     ` Richard Henderson
  0 siblings, 2 replies; 68+ messages in thread
From: Al Viro @ 2014-06-30 20:56 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Jun 30, 2014 at 11:39:43AM -0700, Richard Henderson wrote:

> Looks good.
> 
> I've split it up into a couple of smaller patches, made some sylistic tweaks
> and pushed it to
> 
>   git://github.com/rth7680/qemu.git axp-next
> 
> I'm starting to do some testing now, but a glance though would be helpful.
> Especially to see if I didn't make some silly mistake in the process.

The only problem I see at a glance is that CVTTQ should raise IOV|INE in
ranges 2^63..2^64-1 and -2^64+1..-2^63-1 as well.  That's what this
	|| ((int64_t)(ret-sign) < 0)
thing there was about and yes, it does match the behaviour of actual hardware
(verified both on EV45 and EV67).

FWIW, it might be better to do what float64_to_int64_round_to_zero() is doing -
i.e.
	if (shift >= 0) {
		if (shift < 64)
			ret = frac << shift;
		if (shift < 11 || a == LIT64(0xC3E0000000000000))
			exc = 0;
	}
since frac is between 1ULL<<52 and (1ULL<<53)-1, i.e. shift greater than 11
is guaranteed to overflow, shift less than 11 is guaranteed not to and shift
exactly 11 won't overflow only in one case - frac == 1ULL<<52, sign = 1 (i.e.
when we have -2^63 there).  BTW, shift == 63 is interesting - we certainly
overflow, but we want the result to be 0 or 2^63 depending on the least
significant bit of mantissa, not "always 0".  IOW, 0x4720000000000000 should
yield IOV|INE, with result being 0 and 0x4720000000000001 - IOV|INE and
result 0x8000000000000000.  Again, verified on actual hardware; the last
patch I posted had been incorrect in the last case (both cases yield 0 with it,
same as in mainline qemu).

Incremental on top of your branch would be

diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -722,12 +722,10 @@ static inline uint64_t inline_cvttq(CPUAlphaState *env, uint64_t a,
             /* In this case the number is so large that we must shift
                the fraction left.  There is no rounding to do.  */
             exc = float_flag_int_overflow | float_flag_inexact;
-            if (shift < 63) {
-                ret = frac << shift;
-                if ((ret >> shift) == frac) {
-                    exc = 0;
-                }
-            }
+	    if (shift < 64)
+		ret = frac << shift;
+            if (shift < 11 || a == LIT64( 0xC3E0000000000000))
+                exc = 0;
         } else {
             uint64_t round;
 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-30 20:56                   ` Al Viro
@ 2014-07-01  4:34                     ` Al Viro
  2014-07-01  5:00                       ` Al Viro
  2014-07-01 14:31                       ` Richard Henderson
  2014-07-01 17:03                     ` Richard Henderson
  1 sibling, 2 replies; 68+ messages in thread
From: Al Viro @ 2014-07-01  4:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Jun 30, 2014 at 09:56:35PM +0100, Al Viro wrote:
> FWIW, it might be better to do what float64_to_int64_round_to_zero() is doing -
> i.e.
> 	if (shift >= 0) {
> 		if (shift < 64)
> 			ret = frac << shift;
> 		if (shift < 11 || a == LIT64(0xC3E0000000000000))
> 			exc = 0;
> 	}
> since frac is between 1ULL<<52 and (1ULL<<53)-1, i.e. shift greater than 11
> is guaranteed to overflow, shift less than 11 is guaranteed not to and shift
> exactly 11 won't overflow only in one case - frac == 1ULL<<52, sign = 1 (i.e.
> when we have -2^63 there).  BTW, shift == 63 is interesting - we certainly
> overflow, but we want the result to be 0 or 2^63 depending on the least
> significant bit of mantissa, not "always 0".  IOW, 0x4720000000000000 should
> yield IOV|INE, with result being 0 and 0x4720000000000001 - IOV|INE and
> result 0x8000000000000000.  Again, verified on actual hardware; the last
> patch I posted had been incorrect in the last case (both cases yield 0 with it,
> same as in mainline qemu).

While we are at it, CVTTQ yields INV on +-infinity, just as it does for NaNs.
IOW, in inline_cvttq()
        exc = (frac ? float_flag_invalid
               : float_flag_int_overflow | float_flag_inexact);
should be simply
	exc = float_flag_invalid;

VAX operations are serious mess, but I'm not sure if we have them actually
used anywhere in Linux kernel or userland.  Always possible, of course, but...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-01  4:34                     ` Al Viro
@ 2014-07-01  5:00                       ` Al Viro
  2014-07-01 14:31                       ` Richard Henderson
  1 sibling, 0 replies; 68+ messages in thread
From: Al Viro @ 2014-07-01  5:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Jul 01, 2014 at 05:34:45AM +0100, Al Viro wrote:

> VAX operations are serious mess, but I'm not sure if we have them actually
> used anywhere in Linux kernel or userland.  Always possible, of course, but...

Grr...  Truncated mail, sorry.  Missing part:

_If_ we decide that we want CVTGQ working correctly, we'll have the following
pile of fun:
	* it needs non-saturating overflow handling, same as cvttq
	* it needs different rounding for CVTGQ and CVTGQ/C
	* CVTGQ/S needs EXC_M_SWC in the word fed to trap in INV case (i.e.
when we see dirty zero or reserved).  I think the right way to do it is to
have it use float_raise() and finish with something similar to
gen_fp_exc_raise(), except that...
	* VAX insns need a slightly different trap handling - fpcr_exc_mask
is IEEE-only.
	* g_to_float64() isn't quite right here - we want e.g. 2^-1023 to
result in 0 *and* we want inexact raised.  As it is, we'll end up with
exact 0.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-01  4:34                     ` Al Viro
  2014-07-01  5:00                       ` Al Viro
@ 2014-07-01 14:31                       ` Richard Henderson
  1 sibling, 0 replies; 68+ messages in thread
From: Richard Henderson @ 2014-07-01 14:31 UTC (permalink / raw)
  To: Al Viro; +Cc: qemu-devel

On 06/30/2014 09:34 PM, Al Viro wrote:
> VAX operations are serious mess, but I'm not sure if we have them actually
> used anywhere in Linux kernel or userland.  Always possible, of course, but...

As far as I know, vax insns aren't used anywhere.  If I were doing this port
from scratch I'd leave them totally stubbed out.

Let's not spend any kind of effort on this at all until other more important
things are improved.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-06-30 20:56                   ` Al Viro
  2014-07-01  4:34                     ` Al Viro
@ 2014-07-01 17:03                     ` Richard Henderson
  2014-07-01 17:50                       ` Al Viro
  1 sibling, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-07-01 17:03 UTC (permalink / raw)
  To: Al Viro; +Cc: qemu-devel

On 06/30/2014 01:56 PM, Al Viro wrote:
> On Mon, Jun 30, 2014 at 11:39:43AM -0700, Richard Henderson wrote:
> 
>> Looks good.
>>
>> I've split it up into a couple of smaller patches, made some sylistic tweaks
>> and pushed it to
>>
>>   git://github.com/rth7680/qemu.git axp-next
>>
>> I'm starting to do some testing now, but a glance though would be helpful.
>> Especially to see if I didn't make some silly mistake in the process.

Hmm.  I've just gotten through glibc testing and there are quite a few failures
of the form

Failure: nearbyint (4.5): Exception "Inexact" set

in math/test-double.out.

Any chance you can run the glibc math tests against real hardware and see if
these pass?  I have a feeling that qemu is now signaling inexact when the
hardware doesn't for "/SU" (but not /SUI) instructions.



r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-01 17:03                     ` Richard Henderson
@ 2014-07-01 17:50                       ` Al Viro
  2014-07-01 18:23                         ` Peter Maydell
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-01 17:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Tue, Jul 01, 2014 at 10:03:06AM -0700, Richard Henderson wrote:
> On 06/30/2014 01:56 PM, Al Viro wrote:
> > On Mon, Jun 30, 2014 at 11:39:43AM -0700, Richard Henderson wrote:
> > 
> >> Looks good.
> >>
> >> I've split it up into a couple of smaller patches, made some sylistic tweaks
> >> and pushed it to
> >>
> >>   git://github.com/rth7680/qemu.git axp-next
> >>
> >> I'm starting to do some testing now, but a glance though would be helpful.
> >> Especially to see if I didn't make some silly mistake in the process.
> 
> Hmm.  I've just gotten through glibc testing and there are quite a few failures
> of the form
> 
> Failure: nearbyint (4.5): Exception "Inexact" set
> 
> in math/test-double.out.
> 
> Any chance you can run the glibc math tests against real hardware and see if
> these pass?  I have a feeling that qemu is now signaling inexact when the
> hardware doesn't for "/SU" (but not /SUI) instructions.

Which glibc version?  Better yet, could you throw preprocessed source
my way?  UP1000 box is not in a good shape and I'd rather avoid trying to run
full glibc builds on it ;-/

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-01 17:50                       ` Al Viro
@ 2014-07-01 18:23                         ` Peter Maydell
  2014-07-01 18:30                           ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2014-07-01 18:23 UTC (permalink / raw)
  To: Al Viro; +Cc: QEMU Developers, Richard Henderson

On 1 July 2014 18:50, Al Viro <viro@zeniv.linux.org.uk> wrote:
> Which glibc version?  Better yet, could you throw preprocessed source
> my way?  UP1000 box is not in a good shape and I'd rather avoid trying to run
> full glibc builds on it ;-/

Would a 164LX be a useful (ie non-duplicate) extra resource
for testing this stuff? That has a 21164 (EV5) in it. I haven't
tried to boot it for some years, but I can have a try at
resurrecting it if it would be helpful...

thanks
-- PMM

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-01 18:23                         ` Peter Maydell
@ 2014-07-01 18:30                           ` Richard Henderson
  2014-07-01 19:08                             ` Peter Maydell
  2014-07-02  4:05                             ` Al Viro
  0 siblings, 2 replies; 68+ messages in thread
From: Richard Henderson @ 2014-07-01 18:30 UTC (permalink / raw)
  To: Peter Maydell, Al Viro; +Cc: QEMU Developers

On 07/01/2014 11:23 AM, Peter Maydell wrote:
> On 1 July 2014 18:50, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> Which glibc version?  Better yet, could you throw preprocessed source
>> my way?  UP1000 box is not in a good shape and I'd rather avoid trying to run
>> full glibc builds on it ;-/
> 
> Would a 164LX be a useful (ie non-duplicate) extra resource
> for testing this stuff? That has a 21164 (EV5) in it. I haven't
> tried to boot it for some years, but I can have a try at
> resurrecting it if it would be helpful...

An ev5 would be a good fallback.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-01 18:30                           ` Richard Henderson
@ 2014-07-01 19:08                             ` Peter Maydell
  2014-07-02  4:05                             ` Al Viro
  1 sibling, 0 replies; 68+ messages in thread
From: Peter Maydell @ 2014-07-01 19:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Al Viro, QEMU Developers

On 1 July 2014 19:30, Richard Henderson <rth@twiddle.net> wrote:
> On 07/01/2014 11:23 AM, Peter Maydell wrote:
>> Would a 164LX be a useful (ie non-duplicate) extra resource
>> for testing this stuff? That has a 21164 (EV5) in it. I haven't
>> tried to boot it for some years, but I can have a try at
>> resurrecting it if it would be helpful...
>
> An ev5 would be a good fallback.

Well, it boots. /proc/cpuinfo claims it's an EV56 variation 7
revision 0. Currently running Debian etch. Let me know if
you have anything you want me to run on it...

thanks
-- PMM

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-01 18:30                           ` Richard Henderson
  2014-07-01 19:08                             ` Peter Maydell
@ 2014-07-02  4:05                             ` Al Viro
  2014-07-02  5:50                               ` Al Viro
  2014-07-02 14:59                               ` Richard Henderson
  1 sibling, 2 replies; 68+ messages in thread
From: Al Viro @ 2014-07-02  4:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Tue, Jul 01, 2014 at 11:30:19AM -0700, Richard Henderson wrote:
> On 07/01/2014 11:23 AM, Peter Maydell wrote:
> > On 1 July 2014 18:50, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >> Which glibc version?  Better yet, could you throw preprocessed source
> >> my way?  UP1000 box is not in a good shape and I'd rather avoid trying to run
> >> full glibc builds on it ;-/
> > 
> > Would a 164LX be a useful (ie non-duplicate) extra resource
> > for testing this stuff? That has a 21164 (EV5) in it. I haven't
> > tried to boot it for some years, but I can have a try at
> > resurrecting it if it would be helpful...
> 
> An ev5 would be a good fallback.

OK, DS10 resurrected and so far seems to be stable (I'll know by tomorrow;
there's a possibility that chipset heatsink is dodgy, but so far it seems
to be doing OK).  That gives us a EV6 box.

Which glibc version it is?  I don't see such failures with your
axp/axp-next (head at 6b38f4e7f); could you post the details on your
reproducer?  I've tried to guess the likely version by glibc.git, but
I don't see nearbyint tests with such argument in any version there,
so I couldn't find it that way...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-02  4:05                             ` Al Viro
@ 2014-07-02  5:50                               ` Al Viro
  2014-07-02  6:17                                 ` Al Viro
  2014-07-02 14:59                               ` Richard Henderson
  1 sibling, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-02  5:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Wed, Jul 02, 2014 at 05:05:08AM +0100, Al Viro wrote:
> OK, DS10 resurrected and so far seems to be stable (I'll know by tomorrow;
> there's a possibility that chipset heatsink is dodgy, but so far it seems
> to be doing OK).  That gives us a EV6 box.
> 
> Which glibc version it is?  I don't see such failures with your
> axp/axp-next (head at 6b38f4e7f); could you post the details on your
> reproducer?  I've tried to guess the likely version by glibc.git, but
> I don't see nearbyint tests with such argument in any version there,
> so I couldn't find it that way...

FWIW,

; cat >a.c <<'EOF'
#include <stdio.h>
#include <fenv.h>

volatile long x;
void __attribute__((noinline)) f(double v)
{
        x = v;
}

main()
{
        unsigned long tmp, ret;
        static char *names[] = {"IOV", "INE", "UNF", "OVF", "DZE", "INV"};
        int i;

        feclearexcept(FE_ALL_EXCEPT);
        f(4.5);

        __asm__ __volatile__ (
                "stt $f0,%0\n\t"
                "trapb\n\t"
                "mf_fpcr $f0\n\t"
                "trapb\n\t"
                "stt $f0,%1\n\t"
                "ldt $f0,%0"
                : "=m"(tmp), "=m"(ret));
        for (i = 0; i < 6; i++)
                printf(" %s", (ret >> (57-i)) & 1 ? names[i] : "   ");
        printf(" %x ", fetestexcept(FE_ALL_EXCEPT));
        printf("FE_INEXACT = %x\n", FE_INEXACT);
}
EOF
; gcc -lm a.c
; ./a.out
     INE                 200000 FE_INEXACT = 200000
; uname -a
Linux wynton 2.6.22-rc7 #1 Thu Aug 30 02:03:17 EDT 2007 alpha GNU/Linux

That's on freshly resurrected DS10, just brought to the last debian/alpha
(i.e. lenny).  Kernel had been locally built back before the box has died.

On miles (3.3.6+, AS200) result is different:
     INE                 0 FE_INEXACT = 200000

On qemu (with debian kernel from lenny - 2.6.26) it's the same as on DS10:
     INE                 200000 FE_INEXACT = 200000

It _might_ be the difference between 3.3 and 2.6.20-somethine, but I doubt
that.  It's definitely not a matter of difference in libc versions - AS200
box has 2.13-38, but static binary built on DS10 (with its 2.7-18) copied on
AS200 behaves there as locally built one (i.e. 0 from fetestexcept(), as
opposed to FE_INEXACT the same static binary produces on DS10 and under
qemu).

AFAICS, it leaves two possibilities - EV45 (AS200) vs. EV6 (DS10) and EV67
(qemu) _or_ some change in the kernel.  I'll build 3.x kernel for DS10 and
post the results; shouldn't take long...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-02  5:50                               ` Al Viro
@ 2014-07-02  6:17                                 ` Al Viro
  2014-07-02 15:26                                   ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-02  6:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Wed, Jul 02, 2014 at 06:50:27AM +0100, Al Viro wrote:

> AFAICS, it leaves two possibilities - EV45 (AS200) vs. EV6 (DS10) and EV67
> (qemu) _or_ some change in the kernel.  I'll build 3.x kernel for DS10 and
> post the results; shouldn't take long...

Actually, it's simpler - note that on *all* systems we end up with FPCR.INE
set.  So this
swcr_update_status(unsigned long swcr, unsigned long fpcr)
{
        /* EV6 implements most of the bits in hardware.  Collect
           the acrued exception bits from the real fpcr.  */
        if (implver() == IMPLVER_EV6) {
                swcr &= ~IEEE_STATUS_MASK;
                swcr |= (fpcr >> 35) & IEEE_STATUS_MASK;
        }
        return swcr;
}
ends up with FE_INEXACT set on everything that has implver() return 2.
Which is what EV6 and EV67 do and which is what qemu does by default.

So no, it's not a kernel version difference; it's all kernel versions ignoring
FPCR.INE when it calculates ieee_state on EV45 and using it on EV6 and friends.

If we don't want FE_INEXACT seen by fetestexcept() after rounding 4.5, we'd
better not use FPCR.INE - *all* variants of actual hardware (at least from
21064A to 21264) set that sucker, and 4.7 in Architecture Reference Manual
very clearly requires such behaviour for any subset that isn't completely
without floating point support.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-02  4:05                             ` Al Viro
  2014-07-02  5:50                               ` Al Viro
@ 2014-07-02 14:59                               ` Richard Henderson
  2014-07-02 15:20                                 ` Al Viro
  1 sibling, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-07-02 14:59 UTC (permalink / raw)
  To: Al Viro; +Cc: Peter Maydell, QEMU Developers

[-- Attachment #1: Type: text/plain, Size: 466 bytes --]

On 07/01/2014 09:05 PM, Al Viro wrote:
> Which glibc version it is?  I don't see such failures with your
> axp/axp-next (head at 6b38f4e7f); could you post the details on your
> reproducer?  I've tried to guess the likely version by glibc.git, but
> I don't see nearbyint tests with such argument in any version there,
> so I couldn't find it that way...

Glibc mainline, then look at math/test-double.out.

I'm interested in the results of the following test.


r~

[-- Attachment #2: test.c --]
[-- Type: text/x-csrc, Size: 1981 bytes --]

#include <stdio.h>
#include <fenv.h>
#include <math.h>
#include <float.h>

static void div_su(void)
{
  asm("divt/su %0,%1,$f0; trapb" : : "f"(1.0), "f"(3.0) : "$f0");
}

static void div_sui(void)
{
  asm("divt/su %0,%1,$f0; trapb" : : "f"(1.0), "f"(3.0) : "$f0");
}

static void mul_su(void)
{
  asm("mult/su %0,%0,$f0; trapb" : : "f"(DBL_MIN) : "$f0");
}

static void mul_sui(void)
{
  asm("mult/sui %0,%0,$f0; trapb" : : "f"(DBL_MIN) : "$f0");
}

static void cvttq_45(void)
{
  asm("cvttq/c %0,$f0; trapb" : : "f"(4.5) : "$f0");
}

static void cvttq_sv_45(void)
{
  asm("cvttq/svc %0,$f0; trapb" : : "f"(4.5) : "$f0");
}

static void cvttq_svi_45(void)
{
  asm("cvttq/svic %0,$f0; trapb" : : "f"(4.5) : "$f0");
}

static void cvttq_max(void)
{
  asm("cvttq/c %0,$f0; trapb" : : "f"(DBL_MAX) : "$f0");
}

static void cvttq_sv_max(void)
{
  asm("cvttq/svc %0,$f0; trapb" : : "f"(DBL_MAX) : "$f0");
}

static void cvttq_svi_max(void)
{
  asm("cvttq/svic %0,$f0; trapb" : : "f"(DBL_MAX) : "$f0");
}

static struct test {
  void (*fn)(void);
  const char *name;
} const tests[] = {
  { div_su,		"/su  : 1/3" },
  { div_sui,		"/sui : 1/3" },
  { mul_su,		"/su  : min*min" },
  { mul_sui,		"/sui : min*min" },
  { cvttq_45,		"/    : (long)4.5" },
  { cvttq_sv_45,	"/sv  : (long)4.5" },
  { cvttq_svi_45,	"/svi : (long)4.5" },
  { cvttq_max,		"/    : (long)max" },
  { cvttq_sv_max,	"/sv  : (long)max" },
  { cvttq_svi_max,	"/svi : (long)max" },
};

int main()
{
  char result[8];
  int i, e;

  for (i = 0; i < sizeof(tests)/sizeof(struct test); ++i)
    {
      feclearexcept(FE_ALL_EXCEPT);
      tests[i].fn();

      e = fetestexcept(FE_ALL_EXCEPT);
      result[0] = e & FE_DIVBYZERO ? 'd' : '-';
      result[1] = e & FE_INEXACT ? 'i' : '-';
      result[2] = e & FE_INVALID ? 'I' : '-';
      result[3] = e & FE_OVERFLOW ? 'o' : '-';
      result[4] = e & FE_UNDERFLOW ? 'u' : '-';
      result[5] = '\0';
      printf("%-20s %s\n", tests[i].name, result);
    }

  return 0;
}

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-02 14:59                               ` Richard Henderson
@ 2014-07-02 15:20                                 ` Al Viro
  2014-07-03  6:51                                   ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-02 15:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

> I'm interested in the results of the following test.


DS10:
/su  : 1/3           -i---
/sui : 1/3           -i---
/su  : min*min       -i--u
/sui : min*min       -i--u
/    : (long)4.5     -i---
/sv  : (long)4.5     -i---
/svi : (long)4.5     -i---
/    : (long)max     -i---
/sv  : (long)max     -iI--
/svi : (long)max     -iI--

AS200:
/su  : 1/3           -----
/sui : 1/3           -----
/su  : min*min       -i--u
/sui : min*min       -i--u
/    : (long)4.5     -----
/sv  : (long)4.5     -----
/svi : (long)4.5     -----
/    : (long)max     -----
/sv  : (long)max     --I--
/svi : (long)max     --I--

qemu:
/su  : 1/3           -i---
/sui : 1/3           -i---
/su  : min*min       -i--u
/sui : min*min       -i--u
/    : (long)4.5     -i---
/sv  : (long)4.5     -i---
/svi : (long)4.5     -i---
/    : (long)max     -i---
/sv  : (long)max     -iI--
/svi : (long)max     -iI--

IOW, same as EV6.  The difference is due to the kernel trusting FPCR.INE
as source for FE_INEXACT when it sees implver() returning 2.  See a bit
upthread for analysis...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-02  6:17                                 ` Al Viro
@ 2014-07-02 15:26                                   ` Richard Henderson
  2014-07-02 15:49                                     ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-07-02 15:26 UTC (permalink / raw)
  To: Al Viro; +Cc: Peter Maydell, QEMU Developers

On 07/01/2014 11:17 PM, Al Viro wrote:
> If we don't want FE_INEXACT seen by fetestexcept() after rounding 4.5, we'd
> better not use FPCR.INE - *all* variants of actual hardware (at least from
> 21064A to 21264) set that sucker, and 4.7 in Architecture Reference Manual
> very clearly requires such behaviour for any subset that isn't completely
> without floating point support.

Um, where do you see that?  I see:

# 4.7.6.4 IEEE-Compliant Arithmetic Without Inexact Exception
# This model is similar to the model in Section 4.7.6.3, except this
# model does not signal inexact results either by the inexact status
# flag or by trapping. [...] This model is implemented by using IEEE
# floating-point instructions with the /SU or /SV trap qualifiers.

The important words to me being "does not signal" and "inexact status flag".

Thus in sysdeps/alpha/fpu/s_nearbyint.c I explicitly use cvttq/svd and not
cvttq/svid.  By my reading that means no inexact shall be raised.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-02 15:26                                   ` Richard Henderson
@ 2014-07-02 15:49                                     ` Al Viro
  0 siblings, 0 replies; 68+ messages in thread
From: Al Viro @ 2014-07-02 15:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Wed, Jul 02, 2014 at 08:26:53AM -0700, Richard Henderson wrote:
> On 07/01/2014 11:17 PM, Al Viro wrote:
> > If we don't want FE_INEXACT seen by fetestexcept() after rounding 4.5, we'd
> > better not use FPCR.INE - *all* variants of actual hardware (at least from
> > 21064A to 21264) set that sucker, and 4.7 in Architecture Reference Manual
> > very clearly requires such behaviour for any subset that isn't completely
> > without floating point support.
> 
> Um, where do you see that?  I see:
> 
> # 4.7.6.4 IEEE-Compliant Arithmetic Without Inexact Exception
> # This model is similar to the model in Section 4.7.6.3, except this
> # model does not signal inexact results either by the inexact status
> # flag or by trapping. [...] This model is implemented by using IEEE
> # floating-point instructions with the /SU or /SV trap qualifiers.
> 
> The important words to me being "does not signal" and "inexact status flag".
> 
> Thus in sysdeps/alpha/fpu/s_nearbyint.c I explicitly use cvttq/svd and not
> cvttq/svid.  By my reading that means no inexact shall be raised.

What does that have to do with exceptions?  cvttq/svd is not going to raise
one; it *does* set that bit in FPCR, though.  What happens afterwards is
that fetestexcept() calls osf_getsysinfo(2) with GSI_IEEE_FP_CONTROL for op.
Which does
                w = current_thread_info()->ieee_state & IEEE_SW_MASK;
                w = swcr_update_status(w, rdfpcr());
and hands the value of w to caller.  Now, look at swcr_update_status()
(in arch/alpha/include/uapi/asm/fpu.h these days) and note that on 21264
it will throw away the status bits of ->ieee_state and use 6 bits from
FPCR instead.

Note, BTW, that appendix B (IEEE conformance) claims (in B.1) conversions as
hardware-implemented, with "Software routines support remainder, round to
integer in floating-point format, and convert binary to/from decimal" right
next to it.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-02 15:20                                 ` Al Viro
@ 2014-07-03  6:51                                   ` Al Viro
  2014-07-03 18:25                                     ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-03  6:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

	More bugs: addl/v should sign-extend the result, as addl does.
As it is, we have
uint64_t helper_addlv(CPUAlphaState *env, uint64_t op1, uint64_t op2)
{
    uint64_t tmp = op1;
    op1 = (uint32_t)(op1 + op2);
    if (unlikely((tmp ^ op2 ^ (-1UL)) & (tmp ^ op1) & (1UL << 31))) {
        arith_excp(env, GETPC(), EXC_M_IOV, 0);
    }
    return op1;
}

IOW,
#include <stdio.h>

long r;
void __attribute__((noinline)) f(void)
{
        asm __volatile(
        "subl   $31, 1, $0\n\t"
        "addl   $0, $0, $1\n\t"
        "addl/v $0, $0, $0\n\t"
        "subq   $0, $1, $0\n\t"
        "stq    $0, %0\n\t"
                : "=m"(r): :"$0", "$1");
}

main()
{
        f();
        printf("%ld\n", r);
}

ends up printing 0 on actual hardware (all variants) and 4294967296 on
qemu.  Similar problem with subl/v - 

#include <stdio.h>

long r;
void __attribute__((noinline)) f(void)
{
        asm __volatile(
        "subl   $31, 1, $0\n\t"
        "subl/v $31, 1, $1\n\t"
        "subq   $0, $1, $0\n\t"
        "stq    $0, %0\n\t"
                : "=m"(r): :"$0", "$1");
}

main()
{
        f();
        printf("%ld\n", r);
}

prints 0 on actual hw and -4294967296 on qemu.  What constraints do we have
on qemu host, anyway?  Two's-complement, (int32_t)(uint32_t)x == x for any
int32_t x?  helper_mullv() seems to assume that...

Oh, crap - our mull/v is sensitive to upper 32 bits of multiplicands.
If you put 1UL<<32 into one register, 1 into another and say mull/v,
result will be 0 and no overflow.  qemu does
    int64_t res = (int64_t)op1 * (int64_t)op2;

    if (unlikely((int32_t)res != res)) {
        arith_excp(env, GETPC(), EXC_M_IOV, 0);
    }
    return (int64_t)((int32_t)res);
which leads to overflow trap triggered for no good reason...

Incidentally, all those guys ({add,sub,mul}[lq]/v) *do* assign the result
(same as the variant without /v would) before entering the trap.  So
arith_excp() is wrong here.

FWIW, why not just generate
	trunc_i64_i32 tmp, va
	trunc_i64_i32 tmp2, vb
	muls2_i32 tmp2, tmp, tmp, tmp2
	ext32s_i64 vc, tmp2
	maybe_overflow_32 tmp
where maybe_overflow throws IOV unless tmp is 0 or -1?  That would appear
to suffice for mull/v.  mulq/v would be
	muls2_i64 vc, tmp, va, vb
	maybe_overflow_64 tmp
addl/v:
	trunc_i64_i32 tmp, va
	trunc_i64_i32 tmp2, vb
	add2_i32 tmp2, tmp, tmp, zero, tmp2, zero
	ext32s_i64 vc, tmp2
	maybe_overflow_32 tmp
etc.

We'd need two helpers, differing only in argument type.  Simple
	if (unlikely(arg && ~arg))
           arith_excp(env, GETPC(), EXC_M_IOV, 0);
would do.  Not sure what flags would be needed in DEFINE_HELPER_... for
those, though.  Comments?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-03  6:51                                   ` Al Viro
@ 2014-07-03 18:25                                     ` Al Viro
  2014-07-03 20:19                                       ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-03 18:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Thu, Jul 03, 2014 at 07:51:04AM +0100, Al Viro wrote:

> FWIW, why not just generate
> 	trunc_i64_i32 tmp, va
> 	trunc_i64_i32 tmp2, vb
> 	muls2_i32 tmp2, tmp, tmp, tmp2
> 	ext32s_i64 vc, tmp2
> 	maybe_overflow_32 tmp
> where maybe_overflow throws IOV unless tmp is 0 or -1?
> to suffice for mull/v.  mulq/v would be
> 	muls2_i64 vc, tmp, va, vb
> 	maybe_overflow_64 tmp
> addl/v:
> 	trunc_i64_i32 tmp, va
> 	trunc_i64_i32 tmp2, vb
> 	add2_i32 tmp2, tmp, tmp, zero, tmp2, zero
> 	ext32s_i64 vc, tmp2
> 	maybe_overflow_32 tmp
> etc.

Grr...  Wrong check, obviously - we want to check that tmp + MSB(tmp2) is 0.
Something like
	setcond_32	tmp2, tmp2, zero, TCG_COND_LT
	add_i32		tmp, tmp2, tmp
	call		helper_IOV_if_not_zero tmp
for 32bit ones and
	setcond_64	tmp2, vc, zero, TCG_COND_LT
	add_i64		tmp, tmp2, tmp
	call		helper_IOV_if_not_zero tmp
for 64bit ones, or would it be better just to pass both arguments to helper
and let it deal with the check?  I'm not familiar enough with TCG, sorry...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-03 18:25                                     ` Al Viro
@ 2014-07-03 20:19                                       ` Richard Henderson
  2014-07-03 22:47                                         ` Al Viro
  2014-07-04  0:50                                         ` Al Viro
  0 siblings, 2 replies; 68+ messages in thread
From: Richard Henderson @ 2014-07-03 20:19 UTC (permalink / raw)
  To: Al Viro; +Cc: Peter Maydell, QEMU Developers

On 07/03/2014 11:25 AM, Al Viro wrote:
> On Thu, Jul 03, 2014 at 07:51:04AM +0100, Al Viro wrote:
> 
>> FWIW, why not just generate
>> 	trunc_i64_i32 tmp, va
>> 	trunc_i64_i32 tmp2, vb
>> 	muls2_i32 tmp2, tmp, tmp, tmp2
>> 	ext32s_i64 vc, tmp2
>> 	maybe_overflow_32 tmp
>> where maybe_overflow throws IOV unless tmp is 0 or -1?
>> to suffice for mull/v.  mulq/v would be
>> 	muls2_i64 vc, tmp, va, vb
>> 	maybe_overflow_64 tmp
>> addl/v:
>> 	trunc_i64_i32 tmp, va
>> 	trunc_i64_i32 tmp2, vb
>> 	add2_i32 tmp2, tmp, tmp, zero, tmp2, zero
>> 	ext32s_i64 vc, tmp2
>> 	maybe_overflow_32 tmp
>> etc.
> 
> Grr...  Wrong check, obviously - we want to check that tmp + MSB(tmp2) is 0.
> Something like
> 	setcond_32	tmp2, tmp2, zero, TCG_COND_LT
> 	add_i32		tmp, tmp2, tmp
> 	call		helper_IOV_if_not_zero tmp
> for 32bit ones and
> 	setcond_64	tmp2, vc, zero, TCG_COND_LT
> 	add_i64		tmp, tmp2, tmp
> 	call		helper_IOV_if_not_zero tmp
> for 64bit ones, or would it be better just to pass both arguments to helper
> and let it deal with the check?  I'm not familiar enough with TCG, sorry...
> 

I believe I have a tidy solution to these /v insns.  New patch set shortly.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-03 20:19                                       ` Richard Henderson
@ 2014-07-03 22:47                                         ` Al Viro
  2014-07-03 23:05                                           ` Peter Maydell
  2014-07-04  0:50                                         ` Al Viro
  1 sibling, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-03 22:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Thu, Jul 03, 2014 at 01:19:19PM -0700, Richard Henderson wrote:
> > Grr...  Wrong check, obviously - we want to check that tmp + MSB(tmp2) is 0.
> > Something like
> > 	setcond_32	tmp2, tmp2, zero, TCG_COND_LT
> > 	add_i32		tmp, tmp2, tmp
> > 	call		helper_IOV_if_not_zero tmp
> > for 32bit ones and
> > 	setcond_64	tmp2, vc, zero, TCG_COND_LT
> > 	add_i64		tmp, tmp2, tmp
> > 	call		helper_IOV_if_not_zero tmp
> > for 64bit ones, or would it be better just to pass both arguments to helper
> > and let it deal with the check?  I'm not familiar enough with TCG, sorry...
> > 
> 
> I believe I have a tidy solution to these /v insns.  New patch set shortly.

Hmm...
+            tcg_gen_eqv_i64(tmp, va, vb);
+            tcg_gen_mov_i64(tmp2, va);
+            tcg_gen_add_i64(vc, va, vb);
+            tcg_gen_xor_i64(tmp2, tmp2, vc);
+            tcg_gen_and_i64(tmp, tmp, tmp2);
+            tcg_gen_shri_i64(tmp, tmp, 63);
+            tcg_gen_movi_i64(tmp2, 0);
+            gen_helper_check_overflow(cpu_env, tmp, tmp2);

How can that be correct?  Suppose a = b = 0.  We get
tcg_gen_eqv_i64(tmp, va, vb);	->	tmp = -1
tcg_gen_mov_i64(tmp2, va);	->	tmp2 = 0
tcg_gen_add_i64(vc, va, vb);	->	c = 0
tcg_gen_xor_i64(tmp2, tmp2, vc);->	tmp2 = 0
tcg_gen_and_i64(tmp, tmp, tmp2);->	tmp = -1
tcg_gen_shri_i64(tmp, tmp, 63);	->	tmp = 1
tcg_gen_movi_i64(tmp2, 0);	->	tmp2 = 0
gen_helper_check_overflow(cpu_env, tmp, tmp2);	-> not equal, overflow.

What am I missing here?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-03 22:47                                         ` Al Viro
@ 2014-07-03 23:05                                           ` Peter Maydell
  2014-07-03 23:22                                             ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2014-07-03 23:05 UTC (permalink / raw)
  To: Al Viro; +Cc: QEMU Developers, Richard Henderson

On 3 July 2014 23:47, Al Viro <viro@zeniv.linux.org.uk> wrote:
> How can that be correct?  Suppose a = b = 0.  We get
> tcg_gen_eqv_i64(tmp, va, vb);   ->      tmp = -1
> tcg_gen_mov_i64(tmp2, va);      ->      tmp2 = 0
> tcg_gen_add_i64(vc, va, vb);    ->      c = 0
> tcg_gen_xor_i64(tmp2, tmp2, vc);->      tmp2 = 0
> tcg_gen_and_i64(tmp, tmp, tmp2);->      tmp = -1

tmp2 here is 0, so the result of this AND is 0, not -1...

> tcg_gen_shri_i64(tmp, tmp, 63); ->      tmp = 1

so tmp = 0

> tcg_gen_movi_i64(tmp2, 0);      ->      tmp2 = 0
> gen_helper_check_overflow(cpu_env, tmp, tmp2);  -> not equal, overflow.

and tmp == tmp2, no overflow.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-03 23:05                                           ` Peter Maydell
@ 2014-07-03 23:22                                             ` Al Viro
  0 siblings, 0 replies; 68+ messages in thread
From: Al Viro @ 2014-07-03 23:22 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Richard Henderson

On Fri, Jul 04, 2014 at 12:05:37AM +0100, Peter Maydell wrote:
> On 3 July 2014 23:47, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > How can that be correct?  Suppose a = b = 0.  We get
> > tcg_gen_eqv_i64(tmp, va, vb);   ->      tmp = -1
> > tcg_gen_mov_i64(tmp2, va);      ->      tmp2 = 0
> > tcg_gen_add_i64(vc, va, vb);    ->      c = 0
> > tcg_gen_xor_i64(tmp2, tmp2, vc);->      tmp2 = 0
> > tcg_gen_and_i64(tmp, tmp, tmp2);->      tmp = -1
> 
> tmp2 here is 0, so the result of this AND is 0, not -1...

Doh.  Misread it as tcg_gen_add_i64, sorry.

Hmm...  So it's ((a ^ ~b) & (a ^ c) < 0 as overflow condition, IOW
MSB(a) == MSB(b) && MSB(c) != MSB(a).  OK, that works; might deserve
a comment, though...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-03 20:19                                       ` Richard Henderson
  2014-07-03 22:47                                         ` Al Viro
@ 2014-07-04  0:50                                         ` Al Viro
  2014-07-04  4:30                                           ` Richard Henderson
  1 sibling, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-04  0:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Thu, Jul 03, 2014 at 01:19:19PM -0700, Richard Henderson wrote:

> I believe I have a tidy solution to these /v insns.  New patch set shortly.

OK, looks sane.  Next (trivial) bug: in translate_one()
        case 0xF800:
            /* WH64 */
            /* No-op */
            break;
should be followed by
        case 0xFC00:
            /* WH64EN */
            /* No-op */
            break;

As it is,
        asm __volatile( "lda    $0,%0\n\t"
			"wh64en ($0)\n\t" :: "m"(r));
ends sending SIGILL.

Another one is probably not worth bothering - PERR, CTPOP, CTLZ, UNPKBx and PKxB
don't accept literal argument.  For one thing, as(1) won't let you generate
those, so it would have to be explicit
	.long 0x70001620
instead of
	perr $0,0,$0
On DS10 it gives SIGILL; under qemu it succeeds.  Trivial to fix, anyway,
if we care about that (if (islit) goto invalid_opc; in 1C.030..1C.037).

Another interesting bit I _really_ don't want to touch right now is LDx_L/STx_C;
what we get there is closer to compare-and-swap than to what the real
hardware is doing.  OTOH, considering the constraints on what can go between
LDx_L and STx_C, I'm not sure whether it can lead to any real problems with
the current qemu behaviour...

Hell knows; could a long linear piece of code with LDL_L near the point where
it runs out of space in block end up with QEMU switching to different cpu
before we reach the matching STL_C?  If so, there might be problems; on actual
hardware

CPU1: LDL_L reads 0
CPU2: store 1
...
CPU2: store 0
CPU1: STL_C
would have STL_C fail.  qemu implementation of those suckers will succeed.
I'm not sure if anything in the kernel is sensitive to that, but analysis
won't be fun...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-04  0:50                                         ` Al Viro
@ 2014-07-04  4:30                                           ` Richard Henderson
  2014-07-04  7:29                                             ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-07-04  4:30 UTC (permalink / raw)
  To: Al Viro; +Cc: Peter Maydell, QEMU Developers

On 07/03/2014 05:50 PM, Al Viro wrote:
> OK, looks sane.  Next (trivial) bug: in translate_one()
>         case 0xF800:
>             /* WH64 */
>             /* No-op */
>             break;
> should be followed by
>         case 0xFC00:
>             /* WH64EN */
>             /* No-op */
>             break;

Huh.  I don't have any documentation for EV7.  Added.

> Another one is probably not worth bothering - PERR, CTPOP, CTLZ, UNPKBx and PKxB
> don't accept literal argument.  For one thing, as(1) won't let you generate
> those, so it would have to be explicit
> 	.long 0x70001620
> instead of
> 	perr $0,0,$0
> On DS10 it gives SIGILL; under qemu it succeeds.  Trivial to fix, anyway,
> if we care about that (if (islit) goto invalid_opc; in 1C.030..1C.037).

Is it just 030..037, or everything under opcode 1C?

Sadly, V4 of the handbook doesn't mention *anything* about not actually
allowing literals for any of these insns.

For now, I've updated insns in the range you describe, because it's easy.

> CPU1: LDL_L reads 0
> CPU2: store 1
> ...
> CPU2: store 0
> CPU1: STL_C
> would have STL_C fail.  qemu implementation of those suckers will succeed.
> I'm not sure if anything in the kernel is sensitive to that, but analysis
> won't be fun...

I'm aware that lock/cond can be used in ways that we don't support, including
STL_C to a different address on the same cacheline as the LDL_L.

I'm also aware that if we actually did implement SMP, we would be vulnerable to
the ABA error you describe above.

That said, it's all moot until the PALcode grows actual SMP support for booting
and signalling secondary cpus.  Given that qemu implements SMP by multiplexing
the guest cpus on a single host thread, and so we can't actually speed up the
guest by implementing SMP, it's not seemed like a priority.

The next thing I'd work on given oodles of time is to add block device support
to the PALcode, so that the console could boot from disk like a real machine.
In theory, most of this code can be stolen from SeaBIOS.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-04  4:30                                           ` Richard Henderson
@ 2014-07-04  7:29                                             ` Al Viro
  2014-07-05  1:40                                               ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-04  7:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Thu, Jul 03, 2014 at 09:30:05PM -0700, Richard Henderson wrote:

> > Another one is probably not worth bothering - PERR, CTPOP, CTLZ, UNPKBx and PKxB
> > don't accept literal argument.  For one thing, as(1) won't let you generate
> > those, so it would have to be explicit
> > 	.long 0x70001620
> > instead of
> > 	perr $0,0,$0
> > On DS10 it gives SIGILL; under qemu it succeeds.  Trivial to fix, anyway,
> > if we care about that (if (islit) goto invalid_opc; in 1C.030..1C.037).
> 
> Is it just 030..037, or everything under opcode 1C?

No, just those.

> Sadly, V4 of the handbook doesn't mention *anything* about not actually
> allowing literals for any of these insns.

It does - compare 4.13.1 (4-155, page 225) with 4.13.2 (two pages later).
The former has
	MINxxx		Ra.rq,Rb.rq,Rc.wq		! Operate format
			Ra.rq,#b.ib,Rc.wq
	MINxxx		Ra.rq,Rb.rq,Rc.wq		! Operate format
			Ra.rq,#b.ib,Rc.wq
The latter -
	PERR		Ra.rq,Rb.rq,Rc.wq		! Operate format

And yes, PERR with bit 12 set will give invalid instruction trap, while e.g.
MINSB8 won't.  The Real Intrudprize Kwality(tm) of technical writing, that,
but the information is, indeed, there.  Verified on UP1000, which has 0x307
for feature bits, so all this stuff is really in hardware, not emulated.

OPC1C is a mess - that's one place on alpha where decoder needs more than
upper 6 bits to determine the format of instruction.  Most of those guys
are Operate (with an extra twist being that some don't take literals),
but FTOIS and FTOIT are F-P, and only approximately so (its source refers
to integer register, destination to floating point one).  Note that
function field is in bits 5--11 for Operate and 5--15 for F-P ;-/  Bit
11 allows to discriminate between those, since FTOIS and FTOIT have function
0x70 and 0x78 resp, while everything else has it lower than 0x40.  Hell
knows how that mess had come to be...

Anyway, the situation with literals in OPC1C:
0, 1 (SEXT[BW]).  Work, rejected by as(1).
0x30--0x37.  Invalid instruction trap, as(1) (correctly) refuses to produce
those.
0x38--0x3f.  Work, accepted by as(1).
0x70, 0x78.  Those are F-P, no literals for them.

SEXTB/SEXTW are missing ARG_OPRL form in binutils opcodes/alpha-opc.c; probably
not worth bothering...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-04  7:29                                             ` Al Viro
@ 2014-07-05  1:40                                               ` Al Viro
  2014-07-05  5:26                                                 ` Al Viro
  2014-07-05 21:09                                                 ` Al Viro
  0 siblings, 2 replies; 68+ messages in thread
From: Al Viro @ 2014-07-05  1:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

	Denorms fun:

a) softfloat.c raises flags we don't care about.  So checking that
FP_STATUS.float_exception_flags is non-zero is *not* good - we catch
false positives that way.

b) DNZ has effect *only* for /S insns.  Without /S denorm means INV and
that's it.  FPCR.INV isn't set, at that.  FPCR.INVD is ignored (it affects
only insns with /S).

c) without DNZ or DNOD denorms trip INV even with /S.  Again, FPCR.INV is
not set *and* FPCR.INVD is ignored.  It does stop INV from SQRTT/SU on
-1, but not on DBL_MIN/2 (and on SQRTT/SU(-1) FPCR.INV is set).  Looks like
this sucker is a separate kind of trap, the only similarity with INV being
that it sets the same bit in trap summary word.

d) at least on EV6 and EV67 DNOD *still* trips INV.  According to the
manual suppression of INV by DNOD is optional.  And while their text
might be interpreted as "INV is suppressed if operation with denorm
wouldn't result in something unpleasant" (which would apply to
sqrt(DBL_MIN/2)), the same behaviour happens on DBL_MIN/2 + DBL_MIN/2,
where the result is a good finite value, so it really looks like DNOD
doesn't suppress INV at all on these processors.

Does anybody have 21364 to run some tests on?

FWIW, hw testing had been done by direct printk from do_entArith(); it's
before anything alpha_fpu_emu() does.

Right now I have duplicate of 21264 SQRTT behaviour on everything except
infinities; hadn't looked into those yet.  I'm going to massage it a bit
and see if the result causes any regressions for corner cases of MULT
and friends.  Hopefully I'll have something usable by tomorrow...

Al, wondering if the original regression testsuite still exists somewhere in
the bowels of Intel - DEC/Compaq/HP had to have one for testing the hardware
back then...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-05  1:40                                               ` Al Viro
@ 2014-07-05  5:26                                                 ` Al Viro
  2014-07-05 21:09                                                 ` Al Viro
  1 sibling, 0 replies; 68+ messages in thread
From: Al Viro @ 2014-07-05  5:26 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Sat, Jul 05, 2014 at 02:40:55AM +0100, Al Viro wrote:
> d) at least on EV6 and EV67 DNOD *still* trips INV.  According to the
> manual suppression of INV by DNOD is optional.  And while their text
> might be interpreted as "INV is suppressed if operation with denorm
> wouldn't result in something unpleasant" (which would apply to
> sqrt(DBL_MIN/2)), the same behaviour happens on DBL_MIN/2 + DBL_MIN/2,
> where the result is a good finite value, so it really looks like DNOD
> doesn't suppress INV at all on these processors.
> 
> Does anybody have 21364 to run some tests on?

In fact, DNOD is simply not implemented on those guys - if you try to set it,
the bit still reads zero.  Worse, according to Compiler Writer's Guide for the
21264/21364, "Alpha architecture FPCR bit 47 (DNOD) is not implemented
by the 21264 or 21364".

In other words, it looks like FPCR.DNOD is something from (never-produced)
21464.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-05  1:40                                               ` Al Viro
  2014-07-05  5:26                                                 ` Al Viro
@ 2014-07-05 21:09                                                 ` Al Viro
  2014-07-05 22:55                                                   ` Al Viro
  1 sibling, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-05 21:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Sat, Jul 05, 2014 at 02:40:55AM +0100, Al Viro wrote:
> a) softfloat.c raises flags we don't care about.  So checking that
> FP_STATUS.float_exception_flags is non-zero is *not* good - we catch
> false positives that way.
> 
> b) DNZ has effect *only* for /S insns.  Without /S denorm means INV and
> that's it.  FPCR.INV isn't set, at that.  FPCR.INVD is ignored (it affects
> only insns with /S).
> 
> c) without DNZ or DNOD denorms trip INV even with /S.  Again, FPCR.INV is
> not set *and* FPCR.INVD is ignored.  It does stop INV from SQRTT/SU on
> -1, but not on DBL_MIN/2 (and on SQRTT/SU(-1) FPCR.INV is set).  Looks like
> this sucker is a separate kind of trap, the only similarity with INV being
> that it sets the same bit in trap summary word.

BTW, CVTTQ/SVI on denorms with DNZ shouldn't set Inexact.

> Right now I have duplicate of 21264 SQRTT behaviour on everything except
> infinities; hadn't looked into those yet.  I'm going to massage it a bit
> and see if the result causes any regressions for corner cases of MULT
> and friends.  Hopefully I'll have something usable by tomorrow...

Situation with infinities/NaNs: without /S we should trap on those guys
for arithmetics and conversions; in trap summary we get EXC_M_INV (regardless
of the argument) *and* (unlike the treatment of denorms there) we should
set FPCR.INV.  With /S they are passed to operation, which is responsible
for raising whatever it wants to raise (so far they all seem to be doing
the right thing in that area).

With comparisons, denorm handling is the same as for arithemtics; i.e.
with /S they trigger INV unless DNZ is set (and, presumably, working DNOD
would have the same effect on them).

Anyway, the current delta (on top of 26f86) follows; seems to get IEEE
insns behave on non-finite arguments as they do on 21264.  The main
exception is that register bitmask supplied to trap isn't calculated in
a bunch of cases; since its main purpose is to help locating the trapping
insn and we report precise traps (amask feature bit 9), it's probably not
an interesting problem.  Current Linux kernel definitely won't look at that
thing under qemu; an old one might, but it would have to be something
older than 2.3... <checks the history> than 2.2.8, actually.  And the impact
is that insns with /S getting a denorm argument won't be properly emulated
and you'll get SIGFPE.  Again, it has to be a really old kernel (older than
May 1999) to be affected at all.

diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
index 9b297de..637d95e 100644
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -44,6 +44,12 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env)
     return get_float_exception_flags(&FP_STATUS);
 }
 
+enum {
+	Exc_Mask = float_flag_invalid | float_flag_int_overflow |
+		   float_flag_divbyzero | float_flag_overflow |
+		   float_flag_underflow | float_flag_inexact
+};
+
 static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
                                  uint32_t exc, uint32_t regno, uint32_t hw_exc)
 {
@@ -73,7 +79,7 @@ static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
    doesn't apply.  */
 void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
     if (exc) {
         env->fpcr_exc_status |= exc;
         exc &= ~ignore;
@@ -86,7 +92,7 @@ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 /* Raise exceptions for ieee fp insns with software completion.  */
 void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
     if (exc) {
         env->fpcr_exc_status |= exc;
         exc &= ~ignore;
@@ -105,16 +111,14 @@ void helper_ieee_input(CPUAlphaState *env, uint64_t val)
     uint64_t frac = val & 0xfffffffffffffull;
 
     if (exp == 0) {
-        /* Denormals without DNZ set raise an exception.  */
-        if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-            arith_excp(env, GETPC(), EXC_M_UNF, 0);
+        /* Denormals without /S raise an exception.  */
+        if (frac != 0) {
+            arith_excp(env, GETPC(), EXC_M_INV, 0);
         }
     } else if (exp == 0x7ff) {
-        /* Infinity or NaN.  */
-        /* ??? I'm not sure these exception bit flags are correct.  I do
-           know that the Linux kernel, at least, doesn't rely on them and
-           just emulates the insn to figure out what exception to use.  */
-        arith_excp(env, GETPC(), frac ? EXC_M_INV : EXC_M_FOV, 0);
+        /* Infinity or NaN */
+        env->fpcr_exc_status |= float_flag_invalid;
+        arith_excp(env, GETPC(), EXC_M_INV, 0);
     }
 }
 
@@ -125,16 +129,34 @@ void helper_ieee_input_cmp(CPUAlphaState *env, uint64_t val)
     uint64_t frac = val & 0xfffffffffffffull;
 
     if (exp == 0) {
-        /* Denormals without DNZ set raise an exception.  */
-        if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-            arith_excp(env, GETPC(), EXC_M_UNF, 0);
+        /* Denormals raise an exception.  */
+        if (frac != 0) {
+            arith_excp(env, GETPC(), EXC_M_INV, 0);
         }
     } else if (exp == 0x7ff && frac) {
         /* NaN.  */
+        env->fpcr_exc_status |= float_flag_invalid;
         arith_excp(env, GETPC(), EXC_M_INV, 0);
     }
 }
 
+/* Input handing with software completion.  Trap for denorms,
+   unless DNZ is set.  *IF* we try to support DNOD (which
+   none of the produced hardware did, AFAICS), we'll need
+   to suppress the trap when FPCR.DNOD is set; then the
+   code downstream of that will need to cope with denorms
+   sans flush_input_to_zero.  Most of it should work sanely,
+   but there's nothing to compare with...
+*/
+void helper_ieee_input_s(CPUAlphaState *env, uint64_t val)
+{
+    if (unlikely(2 * val - 1 < 0x1fffffffffffff)) {
+	if (!FP_STATUS.flush_inputs_to_zero) {
+	    arith_excp(env, GETPC(), EXC_M_INV | EXC_M_SWC, 0);
+	}
+    }
+}
+
 /* F floating (VAX) */
 static uint64_t float32_to_f(float32 fa)
 {
@@ -707,7 +729,8 @@ static inline uint64_t inline_cvttq(CPUAlphaState *env, uint64_t a,
     frac = a & 0xfffffffffffffull;
 
     if (exp == 0) {
-        if (unlikely(frac != 0)) {
+        if (unlikely(frac != 0) && !FP_STATUS.flush_inputs_to_zero) {
+	    /* not going to happen without working DNOD; ifdef out, perhaps? */
             goto do_underflow;
         }
     } else if (exp == 0x7ff) {
diff --git a/target-alpha/helper.h b/target-alpha/helper.h
index 2cc100b..596f24d 100644
--- a/target-alpha/helper.h
+++ b/target-alpha/helper.h
@@ -88,6 +88,7 @@ DEF_HELPER_FLAGS_3(fp_exc_raise_s, TCG_CALL_NO_WG, void, env, i32, i32)
 
 DEF_HELPER_FLAGS_2(ieee_input, TCG_CALL_NO_WG, void, env, i64)
 DEF_HELPER_FLAGS_2(ieee_input_cmp, TCG_CALL_NO_WG, void, env, i64)
+DEF_HELPER_FLAGS_2(ieee_input_s, TCG_CALL_NO_WG, void, env, i64)
 DEF_HELPER_FLAGS_2(fcvtql_v_input, TCG_CALL_NO_WG, void, env, i64)
 
 #if !defined (CONFIG_USER_ONLY)
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 6ea33f3..4f71807 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -655,7 +655,8 @@ static TCGv gen_ieee_input(DisasContext *ctx, int reg, int fn11, int is_cmp)
             } else {
                 gen_helper_ieee_input(cpu_env, val);
             }
-        }
+        } else
+            gen_helper_ieee_input_s(cpu_env, val);
     }
     return val;
 }

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-05 21:09                                                 ` Al Viro
@ 2014-07-05 22:55                                                   ` Al Viro
  2014-07-07 14:11                                                     ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-05 22:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Sat, Jul 05, 2014 at 10:09:51PM +0100, Al Viro wrote:

> Anyway, the current delta (on top of 26f86) follows; seems to get IEEE
> insns behave on non-finite arguments as they do on 21264.  The main
> exception is that register bitmask supplied to trap isn't calculated in
> a bunch of cases; since its main purpose is to help locating the trapping
> insn and we report precise traps (amask feature bit 9), it's probably not
> an interesting problem.  Current Linux kernel definitely won't look at that
> thing under qemu; an old one might, but it would have to be something
> older than 2.3... <checks the history> than 2.2.8, actually.  And the impact
> is that insns with /S getting a denorm argument won't be properly emulated
> and you'll get SIGFPE.  Again, it has to be a really old kernel (older than
> May 1999) to be affected at all.

... and a followup (and the last part of exception handling for non-VAX
insn inputs, AFAICS) - CVTQL.

* whether it triggers a trap or not, it sets IOV and INE on overflow.
* in case of trap it does *not* bugger off immediately - result is
calculated, stored and only then we trap.
* trap summary word is different from cvtql/v and cvtql/sv.  IOW, it's yet
another case of "we think that IEEE semantics is stupid and if you need it,
you'd damn better ask for it explicitly".  Note that cvtql/v sets IOV|INE and
hits SIGFPE no matter what, while cvtql/sv set INV instead and triggers SIGFPE
only if FP_INVALID is enabled.  All difference is kernel-side and it's
triggered by EXC_M_SWC in summary word.

AFAICS, that should be it for IEEE and shared insns, as far as exceptions
on inputs are concerned.

Combined delta follows:

diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
index 9b297de..25c83b5 100644
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -44,6 +44,12 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env)
     return get_float_exception_flags(&FP_STATUS);
 }
 
+enum {
+	Exc_Mask = float_flag_invalid | float_flag_int_overflow |
+		   float_flag_divbyzero | float_flag_overflow |
+		   float_flag_underflow | float_flag_inexact
+};
+
 static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
                                  uint32_t exc, uint32_t regno, uint32_t hw_exc)
 {
@@ -73,7 +79,7 @@ static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
    doesn't apply.  */
 void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
     if (exc) {
         env->fpcr_exc_status |= exc;
         exc &= ~ignore;
@@ -86,7 +92,7 @@ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 /* Raise exceptions for ieee fp insns with software completion.  */
 void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
     if (exc) {
         env->fpcr_exc_status |= exc;
         exc &= ~ignore;
@@ -105,16 +111,14 @@ void helper_ieee_input(CPUAlphaState *env, uint64_t val)
     uint64_t frac = val & 0xfffffffffffffull;
 
     if (exp == 0) {
-        /* Denormals without DNZ set raise an exception.  */
-        if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-            arith_excp(env, GETPC(), EXC_M_UNF, 0);
+        /* Denormals without /S raise an exception.  */
+        if (frac != 0) {
+            arith_excp(env, GETPC(), EXC_M_INV, 0);
         }
     } else if (exp == 0x7ff) {
         /* Infinity or NaN.  */
-        /* ??? I'm not sure these exception bit flags are correct.  I do
-           know that the Linux kernel, at least, doesn't rely on them and
-           just emulates the insn to figure out what exception to use.  */
-        arith_excp(env, GETPC(), frac ? EXC_M_INV : EXC_M_FOV, 0);
+        env->fpcr_exc_status |= float_flag_invalid;
+        arith_excp(env, GETPC(), EXC_M_INV, 0);
     }
 }
 
@@ -125,16 +129,34 @@ void helper_ieee_input_cmp(CPUAlphaState *env, uint64_t val)
     uint64_t frac = val & 0xfffffffffffffull;
 
     if (exp == 0) {
-        /* Denormals without DNZ set raise an exception.  */
-        if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-            arith_excp(env, GETPC(), EXC_M_UNF, 0);
+        /* Denormals raise an exception.  */
+        if (frac != 0) {
+            arith_excp(env, GETPC(), EXC_M_INV, 0);
         }
     } else if (exp == 0x7ff && frac) {
         /* NaN.  */
+        env->fpcr_exc_status |= float_flag_invalid;
         arith_excp(env, GETPC(), EXC_M_INV, 0);
     }
 }
 
+/* Input handing with software completion.  Trap for denorms,
+   unless DNZ is set.  *IF* we try to support DNOD (which
+   none of the produced hardware did, AFAICS), we'll need
+   to suppress the trap when FPCR.DNOD is set; then the
+   code downstream of that will need to cope with denorms
+   sans flush_input_to_zero.  Most of it should work sanely,
+   but there's nothing to compare with...
+*/
+void helper_ieee_input_s(CPUAlphaState *env, uint64_t val)
+{
+    if (unlikely(2 * val - 1 < 0x1fffffffffffff)) {
+	if (!FP_STATUS.flush_inputs_to_zero) {
+	    arith_excp(env, GETPC(), EXC_M_INV | EXC_M_SWC, 0);
+	}
+    }
+}
+
 /* F floating (VAX) */
 static uint64_t float32_to_f(float32 fa)
 {
@@ -707,7 +729,8 @@ static inline uint64_t inline_cvttq(CPUAlphaState *env, uint64_t a,
     frac = a & 0xfffffffffffffull;
 
     if (exp == 0) {
-        if (unlikely(frac != 0)) {
+        if (unlikely(frac != 0) && !FP_STATUS.flush_inputs_to_zero) {
+	    /* not going to happen without working DNOD; ifdef out, perhaps? */
             goto do_underflow;
         }
     } else if (exp == 0x7ff) {
@@ -826,7 +849,9 @@ uint64_t helper_cvtqg(CPUAlphaState *env, uint64_t a)
 
 void helper_fcvtql_v_input(CPUAlphaState *env, uint64_t val)
 {
+    set_float_exception_flags(0, &FP_STATUS);
     if (val != (int32_t)val) {
-        arith_excp(env, GETPC(), EXC_M_IOV, 0);
+        float_raise(float_flag_int_overflow, &FP_STATUS);
+	env->fpcr_exc_status |= float_flag_inexact;
     }
 }
diff --git a/target-alpha/helper.h b/target-alpha/helper.h
index 2cc100b..596f24d 100644
--- a/target-alpha/helper.h
+++ b/target-alpha/helper.h
@@ -88,6 +88,7 @@ DEF_HELPER_FLAGS_3(fp_exc_raise_s, TCG_CALL_NO_WG, void, env, i32, i32)
 
 DEF_HELPER_FLAGS_2(ieee_input, TCG_CALL_NO_WG, void, env, i64)
 DEF_HELPER_FLAGS_2(ieee_input_cmp, TCG_CALL_NO_WG, void, env, i64)
+DEF_HELPER_FLAGS_2(ieee_input_s, TCG_CALL_NO_WG, void, env, i64)
 DEF_HELPER_FLAGS_2(fcvtql_v_input, TCG_CALL_NO_WG, void, env, i64)
 
 #if !defined (CONFIG_USER_ONLY)
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 6ea33f3..611b293 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -655,7 +655,8 @@ static TCGv gen_ieee_input(DisasContext *ctx, int reg, int fn11, int is_cmp)
             } else {
                 gen_helper_ieee_input(cpu_env, val);
             }
-        }
+        } else
+            gen_helper_ieee_input_s(cpu_env, val);
     }
     return val;
 }
@@ -2256,24 +2257,15 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
             gen_fcmov(ctx, TCG_COND_GT, ra, rb, rc);
             break;
         case 0x030:
-            /* CVTQL */
-            REQUIRE_REG_31(ra);
-            vc = dest_fpr(ctx, rc);
-            vb = load_fpr(ctx, rb);
-            gen_fcvtql(vc, vb);
-            break;
         case 0x130:
-            /* CVTQL/V */
         case 0x530:
-            /* CVTQL/SV */
+            /* CVTQL, CVTQL/V, CVTQL/SV */
             REQUIRE_REG_31(ra);
-            /* ??? I'm pretty sure there's nothing that /sv needs to do that
-               /v doesn't do.  The only thing I can think is that /sv is a
-               valid instruction merely for completeness in the ISA.  */
             vc = dest_fpr(ctx, rc);
             vb = load_fpr(ctx, rb);
             gen_helper_fcvtql_v_input(cpu_env, vb);
             gen_fcvtql(vc, vb);
+            gen_fp_exc_raise(rc, fn11);
             break;
         default:
             goto invalid_opc;

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-05 22:55                                                   ` Al Viro
@ 2014-07-07 14:11                                                     ` Richard Henderson
  2014-07-07 15:06                                                       ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-07-07 14:11 UTC (permalink / raw)
  To: Al Viro; +Cc: Peter Maydell, QEMU Developers

On 07/05/2014 03:55 PM, Al Viro wrote:
> +/* Input handing with software completion.  Trap for denorms,
> +   unless DNZ is set.  *IF* we try to support DNOD (which
> +   none of the produced hardware did, AFAICS), we'll need
> +   to suppress the trap when FPCR.DNOD is set; then the
> +   code downstream of that will need to cope with denorms
> +   sans flush_input_to_zero.  Most of it should work sanely,
> +   but there's nothing to compare with...
> +*/
> +void helper_ieee_input_s(CPUAlphaState *env, uint64_t val)
> +{
> +    if (unlikely(2 * val - 1 < 0x1fffffffffffff)) {
> +	if (!FP_STATUS.flush_inputs_to_zero) {
> +	    arith_excp(env, GETPC(), EXC_M_INV | EXC_M_SWC, 0);
> +	}
> +    }
> +}
> +

A couple of points here:

1) We should never raise this in user-only mode.  In that mode, we emulate the
whole fpu stack, all the way through from HW to the OS completion handler.

2) Because of that, we have the capability of doing the same thing in system
mode.  This lets us do more of the computation in the host, and less in the
guest, which is faster.  The only thing this makes more difficult is debugging
the OS completion handlers within the kernel, since they'll only get invoked
when SIGFPE needs to be sent.

3) If we do want to implement a mode where we faithfully send SWC for all of
the bits of IEEE that real HW didn't implement, do we really need to avoid a
store to the output register when signalling this?  I.e. can we notice this
condition after the fact with float_flag_input_denormal, rather than having
another function call to prep the inputs?


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-07 14:11                                                     ` Richard Henderson
@ 2014-07-07 15:06                                                       ` Al Viro
  2014-07-07 16:20                                                         ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-07 15:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Mon, Jul 07, 2014 at 07:11:58AM -0700, Richard Henderson wrote:
> A couple of points here:
> 
> 1) We should never raise this in user-only mode.  In that mode, we emulate the
> whole fpu stack, all the way through from HW to the OS completion handler.

How is that different from other cases where we have an exception raised
by an fp operation?

> 2) Because of that, we have the capability of doing the same thing in system
> mode.  This lets us do more of the computation in the host, and less in the
> guest, which is faster.  The only thing this makes more difficult is debugging
> the OS completion handlers within the kernel, since they'll only get invoked
> when SIGFPE needs to be sent.

Umm...  The effect of software completion depends on current->ieee_state;
how would you keep track of that outside of guest kernel?

> 3) If we do want to implement a mode where we faithfully send SWC for all of
> the bits of IEEE that real HW didn't implement, do we really need to avoid a
> store to the output register when signalling this?  I.e. can we notice this
> condition after the fact with float_flag_input_denormal, rather than having
> another function call to prep the inputs?

But flag_input_denormal is raised only when we do have DNZ set.  Which is
an entirely different case, where we should not (and do not) get an exception
at all...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-07 15:06                                                       ` Al Viro
@ 2014-07-07 16:20                                                         ` Richard Henderson
  2014-07-08  4:20                                                           ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-07-07 16:20 UTC (permalink / raw)
  To: Al Viro; +Cc: Peter Maydell, QEMU Developers

On 07/07/2014 08:06 AM, Al Viro wrote:
> On Mon, Jul 07, 2014 at 07:11:58AM -0700, Richard Henderson wrote:
>> A couple of points here:
>>
>> 1) We should never raise this in user-only mode.  In that mode, we emulate the
>> whole fpu stack, all the way through from HW to the OS completion handler.
> 
> How is that different from other cases where we have an exception raised
> by an fp operation?

In all other cases we know we're going to send SIGFPE.  That's either through a
non /S insn which the kernel wouldn't touch, or by having computed the true
IEEE result and examined the exceptions to be raised.

>> 2) Because of that, we have the capability of doing the same thing in system
>> mode.  This lets us do more of the computation in the host, and less in the
>> guest, which is faster.  The only thing this makes more difficult is debugging
>> the OS completion handlers within the kernel, since they'll only get invoked
>> when SIGFPE needs to be sent.
> 
> Umm...  The effect of software completion depends on current->ieee_state;
> how would you keep track of that outside of guest kernel?

The kernel essentially keeps a copy of IEEE_STATE in the FPCR.  I don't see any
missing bits in ieee_swcr_to_fpcr, do you?

While real hardware might ignore some of those bits once stored, qemu doesn't.

While in real hardware one could force the FPCR and IEEE_STATE to differ,
honestly that'd be a bug.  (Although a silly one; I wish the kernel took the
EV6 FPCR as gospel for everything, not just the status flags.  That could make
certain libm.so computations much faster.)

> 
>> 3) If we do want to implement a mode where we faithfully send SWC for all of
>> the bits of IEEE that real HW didn't implement, do we really need to avoid a
>> store to the output register when signalling this?  I.e. can we notice this
>> condition after the fact with float_flag_input_denormal, rather than having
>> another function call to prep the inputs?
> 
> But flag_input_denormal is raised only when we do have DNZ set.  Which is
> an entirely different case, where we should not (and do not) get an exception
> at all...

Ah, you're right about that.  I'd mis-remembered the implementation.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-07 16:20                                                         ` Richard Henderson
@ 2014-07-08  4:20                                                           ` Al Viro
  2014-07-08  6:03                                                             ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-08  4:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Mon, Jul 07, 2014 at 09:20:28AM -0700, Richard Henderson wrote:
> > How is that different from other cases where we have an exception raised
> > by an fp operation?
> 
> In all other cases we know we're going to send SIGFPE.  That's either through a
> non /S insn which the kernel wouldn't touch, or by having computed the true
> IEEE result and examined the exceptions to be raised.

Umm...  Not quite.  Consider e.g. CVTQL case.  There we have the following
picture in case of overflow (real hw with Linux on top of it):
	no suffix:		IOV INE
	/v:			IOV INE SIGFPE
	/sv, no IEEE INVE:	IOV INE INV
	/sv, IEEE INVE:		IOV INE INV SIGFPE
This is after the completion had a chance to run.  From the hw POV it's
	no suffix		IOV INE no trap
	/v			IOV INE trap<IOV>
	/sv			IOV INE trap<SWC,IOV>
and it's alpha_fp_emul() that does the rest in /sv case.  Actually, it's even
simpler:
	if overflow
		FPCR.INE = 1
		raise IOV
	do usual trap suffix handling
and I'm reasonably sure that this is what they did internally.  You are
proposing to do 4 cases in all their messy glory in qemu itself...

And I wouldn't bet a dime on not having similar turds in other insns; after
all, "it's hard, let's offload it to software" was only a part of motivation
for software completions.  "We really don't like this part of IEEE standard
and we'd love to tell you to see figure 1, but we need conformance, so you
can mark an insn with /s and have the kernel do what IEEE requires" is also
very visible in their manual.

Result is a mess - if you try to fold the kernel-side stuff into "hardware",
you end up with a pile of inconsistent behaviours.  In principle, it's
doable, especially since we are not really constrained by actual hw in terms
of what we do in case of FPCR.DNOD being true - no actual hw could set it.
So we want
	* hw behaviour without /s (denorms trap)
	* hw behaviour with /s without denorms
	* hw behaviour with /s with denorms with FPCR.DNZ (same as with 0) 
	* kernel completion behaviour with /s with denorm
and it might even be what they intended for DNOD to do.  But it's going
to be messy as hell.

And that's not even going into generating the right si_code for that SIGFPE.
What produces those TARGET_GEN_FLTINE and friends?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08  4:20                                                           ` Al Viro
@ 2014-07-08  6:03                                                             ` Richard Henderson
  2014-07-08  6:54                                                               ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-07-08  6:03 UTC (permalink / raw)
  To: Al Viro; +Cc: Peter Maydell, QEMU Developers

On 07/07/2014 09:20 PM, Al Viro wrote:
> and I'm reasonably sure that this is what they did internally.  You are
> proposing to do 4 cases in all their messy glory in qemu itself...

Yes.  Primarily because we *have* to do so for the linux-user case.

> And that's not even going into generating the right si_code for that SIGFPE.
> What produces those TARGET_GEN_FLTINE and friends?

linux-user/main.c, cpu_loop.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08  6:03                                                             ` Richard Henderson
@ 2014-07-08  6:54                                                               ` Al Viro
  2014-07-08  7:13                                                                 ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-08  6:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Mon, Jul 07, 2014 at 11:03:08PM -0700, Richard Henderson wrote:
> On 07/07/2014 09:20 PM, Al Viro wrote:
> > and I'm reasonably sure that this is what they did internally.  You are
> > proposing to do 4 cases in all their messy glory in qemu itself...
> 
> Yes.  Primarily because we *have* to do so for the linux-user case.
> 
> > And that's not even going into generating the right si_code for that SIGFPE.
> > What produces those TARGET_GEN_FLTINE and friends?
> 
> linux-user/main.c, cpu_loop.

That's where we consume it; where is it produced?  Sure, explicit
gentrap in alpha code will lead there, with whatever we have in
$16 deciding what'll go into si_code, but where does that happen on
fp exception codepaths?  IOW, what sets si_code on those?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08  6:54                                                               ` Al Viro
@ 2014-07-08  7:13                                                                 ` Al Viro
  2014-07-08  8:05                                                                   ` Peter Maydell
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-08  7:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Tue, Jul 08, 2014 at 07:54:36AM +0100, Al Viro wrote:
> On Mon, Jul 07, 2014 at 11:03:08PM -0700, Richard Henderson wrote:
> > On 07/07/2014 09:20 PM, Al Viro wrote:
> > > and I'm reasonably sure that this is what they did internally.  You are
> > > proposing to do 4 cases in all their messy glory in qemu itself...
> > 
> > Yes.  Primarily because we *have* to do so for the linux-user case.
> > 
> > > And that's not even going into generating the right si_code for that SIGFPE.
> > > What produces those TARGET_GEN_FLTINE and friends?
> > 
> > linux-user/main.c, cpu_loop.
> 
> That's where we consume it; where is it produced?  Sure, explicit
> gentrap in alpha code will lead there, with whatever we have in
> $16 deciding what'll go into si_code, but where does that happen on
> fp exception codepaths?  IOW, what sets si_code on those?

Actually, that's badly worded; what codepath ends up setting si_code on
e.g. fp addition overflows?  In system mode it's done by completion code
in the kernel, but AFAICS in user mode there are only two places where it
might happen - one is gentrap handling and another - osf_setsysinfo(2)
emulation for TARGET_SSI_IEEE_FP_CONTROL.  What I don't understand is how
do we get from float_raise(&FP_STATUS, float_flag_overflow) in fpu/softfloat.c
to either of those.

IOW, suppose I do
	x = DBL_MAX;
	feenableexcept(FE_ALL_EXCEPT);
	x *= x;
I understand how I'll get SIGFPE, but what will set correct si_code in
siginfo I'll see in the hanler?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08  7:13                                                                 ` Al Viro
@ 2014-07-08  8:05                                                                   ` Peter Maydell
  2014-07-08 14:53                                                                     ` Richard Henderson
  2014-07-08 16:13                                                                     ` Al Viro
  0 siblings, 2 replies; 68+ messages in thread
From: Peter Maydell @ 2014-07-08  8:05 UTC (permalink / raw)
  To: Al Viro; +Cc: QEMU Developers, Richard Henderson

On 8 July 2014 08:13, Al Viro <viro@zeniv.linux.org.uk> wrote:
> Actually, that's badly worded; what codepath ends up setting si_code on
> e.g. fp addition overflows?  In system mode it's done by completion code
> in the kernel, but AFAICS in user mode there are only two places where it
> might happen - one is gentrap handling and another - osf_setsysinfo(2)
> emulation for TARGET_SSI_IEEE_FP_CONTROL.  What I don't understand is how
> do we get from float_raise(&FP_STATUS, float_flag_overflow) in fpu/softfloat.c
> to either of those.
>
> IOW, suppose I do
>         x = DBL_MAX;
>         feenableexcept(FE_ALL_EXCEPT);
>         x *= x;
> I understand how I'll get SIGFPE, but what will set correct si_code in
> siginfo I'll see in the hanler?

The code we have currently may well be buggy, but the correct
place to set si_code is (as Richard says) the Alpha cpu_loop() in
linux-user/main.c, which has access to the trap type that just
caused us to stop executing code, plus the CPUState, which
should be enough information to set si_code correctly. In
particular the GENTRAP case seems to be setting a variety
of different si_code values for SIGFPE.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08  8:05                                                                   ` Peter Maydell
@ 2014-07-08 14:53                                                                     ` Richard Henderson
  2014-07-08 16:13                                                                     ` Al Viro
  1 sibling, 0 replies; 68+ messages in thread
From: Richard Henderson @ 2014-07-08 14:53 UTC (permalink / raw)
  To: Peter Maydell, Al Viro; +Cc: QEMU Developers

On 07/08/2014 01:05 AM, Peter Maydell wrote:
> On 8 July 2014 08:13, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> Actually, that's badly worded; what codepath ends up setting si_code on
>> e.g. fp addition overflows?  In system mode it's done by completion code
>> in the kernel, but AFAICS in user mode there are only two places where it
>> might happen - one is gentrap handling and another - osf_setsysinfo(2)
>> emulation for TARGET_SSI_IEEE_FP_CONTROL.  What I don't understand is how
>> do we get from float_raise(&FP_STATUS, float_flag_overflow) in fpu/softfloat.c
>> to either of those.
>>
>> IOW, suppose I do
>>         x = DBL_MAX;
>>         feenableexcept(FE_ALL_EXCEPT);
>>         x *= x;
>> I understand how I'll get SIGFPE, but what will set correct si_code in
>> siginfo I'll see in the hanler?
> 
> The code we have currently may well be buggy, but the correct
> place to set si_code is (as Richard says) the Alpha cpu_loop() in
> linux-user/main.c, which has access to the trap type that just
> caused us to stop executing code, plus the CPUState, which
> should be enough information to set si_code correctly. In
> particular the GENTRAP case seems to be setting a variety
> of different si_code values for SIGFPE.

The gentrap case is a red-herring.

The case you're looking for is EXC_ARITH.  The path is from

    arith_excp
      dynamic_excp
        cpu_loop_exit
          longjmp
  cpu_exec
cpu_loop

It's also true that we don't install the correct si_code there, but we could.
Mostly the gcc/glibc test cases really only care that SIGFPE gets raised, not
what the codes are, so I haven't bothered.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08  8:05                                                                   ` Peter Maydell
  2014-07-08 14:53                                                                     ` Richard Henderson
@ 2014-07-08 16:13                                                                     ` Al Viro
  2014-07-08 16:33                                                                       ` Peter Maydell
  2014-07-08 18:12                                                                       ` Richard Henderson
  1 sibling, 2 replies; 68+ messages in thread
From: Al Viro @ 2014-07-08 16:13 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Richard Henderson

On Tue, Jul 08, 2014 at 09:05:10AM +0100, Peter Maydell wrote:

> The code we have currently may well be buggy, but the correct

It is ;-/  We set TARGET_FPE_FLTINV unconditionally there.  BTW, what's
the reason why all these cpu_loop() instances can't go into
linux-user/<arch>/something?  Is that just because you have
static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
static int pending_cpus;
and a bunch of inlines using them?  As it is, about three quarters of
linux-user/main.c consist of code under series of arch ifdefs...

BTW, are there any more or less uptodate docs on qemu profiling?  I mean,
things like perf/oprofile on the host obviously end up lumping all tcg
output together.  Is there any way to get information beyond "~40% of time
is spent in generated code, ~15% - in tb_find_fast(), and the rest is very
much colder"?

Incidentally, combination of --enable-gprof and (default) --enable-pie
won't build - it dies with ld(1) complaining about relocs in gcrt1.o.
With --disable-pie it builds, but gprof of course has the same problem
as perf and friends - generated code is transient, so we get no details ;-/

> place to set si_code is (as Richard says) the Alpha cpu_loop() in
> linux-user/main.c, which has access to the trap type that just
> caused us to stop executing code, plus the CPUState, which
> should be enough information to set si_code correctly. In
> particular the GENTRAP case seems to be setting a variety
> of different si_code values for SIGFPE.

Sigh...  Well, having read through alpha_fp_emul() and the stuff it calls,
I understand why they hadn't implemented DNOD in any released hardware.
It's a bloody mess, with tons of interesting special cases.  E.g. adding
denorm to very large finite can push into overflow, with further effects
depending on whether we have overflow and/or denorm IEEE traps disabled,
etc.

Frankly, I suspect that it's better to have qemu-system-alpha behave like
the actual hardware does (including "FPCR.DNOD can't be set") and keep the
linux-user behaviour as is, for somebody brave and masochistic enough to
fight that one.  And no, it's nowhere near "just let denorms ride through
the normal softfloat code and play a bit with the flags it might raise".
And then there's netbsd/alpha and openbsd/alpha, so in theory somebody might
want to play with their software completion semantics (not identical to Linux
one) for the sake of yet-to-be-written bsd-user alpha support...

Anyway, how about the following delta?  AFAICS, it gets qemu-system-alpha
behaviour in sync with actual hardware without screwing qemu-alpha up.

diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
index 9b297de..30cbf02 100644
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -44,6 +44,12 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env)
     return get_float_exception_flags(&FP_STATUS);
 }
 
+enum {
+	Exc_Mask = float_flag_invalid | float_flag_int_overflow |
+		   float_flag_divbyzero | float_flag_overflow |
+		   float_flag_underflow | float_flag_inexact
+};
+
 static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
                                  uint32_t exc, uint32_t regno, uint32_t hw_exc)
 {
@@ -73,7 +79,7 @@ static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
    doesn't apply.  */
 void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
     if (exc) {
         env->fpcr_exc_status |= exc;
         exc &= ~ignore;
@@ -86,7 +92,7 @@ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 /* Raise exceptions for ieee fp insns with software completion.  */
 void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+    uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
     if (exc) {
         env->fpcr_exc_status |= exc;
         exc &= ~ignore;
@@ -105,16 +111,14 @@ void helper_ieee_input(CPUAlphaState *env, uint64_t val)
     uint64_t frac = val & 0xfffffffffffffull;
 
     if (exp == 0) {
-        /* Denormals without DNZ set raise an exception.  */
-        if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-            arith_excp(env, GETPC(), EXC_M_UNF, 0);
+        /* Denormals without /S raise an exception.  */
+        if (frac != 0) {
+            arith_excp(env, GETPC(), EXC_M_INV, 0);
         }
     } else if (exp == 0x7ff) {
         /* Infinity or NaN.  */
-        /* ??? I'm not sure these exception bit flags are correct.  I do
-           know that the Linux kernel, at least, doesn't rely on them and
-           just emulates the insn to figure out what exception to use.  */
-        arith_excp(env, GETPC(), frac ? EXC_M_INV : EXC_M_FOV, 0);
+        env->fpcr_exc_status |= float_flag_invalid;
+        arith_excp(env, GETPC(), EXC_M_INV, 0);
     }
 }
 
@@ -125,16 +129,34 @@ void helper_ieee_input_cmp(CPUAlphaState *env, uint64_t val)
     uint64_t frac = val & 0xfffffffffffffull;
 
     if (exp == 0) {
-        /* Denormals without DNZ set raise an exception.  */
-        if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-            arith_excp(env, GETPC(), EXC_M_UNF, 0);
+        /* Denormals raise an exception.  */
+        if (frac != 0) {
+            arith_excp(env, GETPC(), EXC_M_INV, 0);
         }
     } else if (exp == 0x7ff && frac) {
         /* NaN.  */
+        env->fpcr_exc_status |= float_flag_invalid;
         arith_excp(env, GETPC(), EXC_M_INV, 0);
     }
 }
 
+/* Input handing with software completion.  Trap for denorms,
+   unless DNZ is set.  *IF* we try to support DNOD (which
+   none of the produced hardware did, AFAICS), we'll need
+   to suppress the trap when FPCR.DNOD is set; then the
+   code downstream of that will need to cope with denorms
+   sans flush_input_to_zero.  Most of it should work sanely,
+   but there's nothing to compare with...
+*/
+void helper_ieee_input_s(CPUAlphaState *env, uint64_t val)
+{
+    if (unlikely(2 * val - 1 < 0x1fffffffffffff)) {
+	if (!FP_STATUS.flush_inputs_to_zero) {
+	    arith_excp(env, GETPC(), EXC_M_INV | EXC_M_SWC, 0);
+	}
+    }
+}
+
 /* F floating (VAX) */
 static uint64_t float32_to_f(float32 fa)
 {
@@ -707,7 +729,8 @@ static inline uint64_t inline_cvttq(CPUAlphaState *env, uint64_t a,
     frac = a & 0xfffffffffffffull;
 
     if (exp == 0) {
-        if (unlikely(frac != 0)) {
+        if (unlikely(frac != 0) && !FP_STATUS.flush_inputs_to_zero) {
+	    /* not going to happen without working DNOD; ifdef out, perhaps? */
             goto do_underflow;
         }
     } else if (exp == 0x7ff) {
@@ -826,7 +849,9 @@ uint64_t helper_cvtqg(CPUAlphaState *env, uint64_t a)
 
 void helper_fcvtql_v_input(CPUAlphaState *env, uint64_t val)
 {
+    set_float_exception_flags(0, &FP_STATUS);
     if (val != (int32_t)val) {
-        arith_excp(env, GETPC(), EXC_M_IOV, 0);
+        float_raise(float_flag_int_overflow, &FP_STATUS);
+        env->fpcr_exc_status |= float_flag_inexact;
     }
 }
diff --git a/target-alpha/helper.c b/target-alpha/helper.c
index 6bcde21..21af702 100644
--- a/target-alpha/helper.c
+++ b/target-alpha/helper.c
@@ -157,7 +157,9 @@ void cpu_alpha_store_fpcr (CPUAlphaState *env, uint64_t val)
     }
     env->fpcr_dyn_round = t;
 
+#ifdef CONFIG_USER_ONLY
     env->fpcr_dnod = (val & FPCR_DNOD) != 0;
+#endif
     env->fpcr_undz = (val & FPCR_UNDZ) != 0;
     env->fpcr_flush_to_zero = env->fpcr_dnod & env->fpcr_undz;
     env->fp_status.flush_inputs_to_zero = (val & FPCR_DNZ) != 0;
diff --git a/target-alpha/helper.h b/target-alpha/helper.h
index 2cc100b..596f24d 100644
--- a/target-alpha/helper.h
+++ b/target-alpha/helper.h
@@ -88,6 +88,7 @@ DEF_HELPER_FLAGS_3(fp_exc_raise_s, TCG_CALL_NO_WG, void, env, i32, i32)
 
 DEF_HELPER_FLAGS_2(ieee_input, TCG_CALL_NO_WG, void, env, i64)
 DEF_HELPER_FLAGS_2(ieee_input_cmp, TCG_CALL_NO_WG, void, env, i64)
+DEF_HELPER_FLAGS_2(ieee_input_s, TCG_CALL_NO_WG, void, env, i64)
 DEF_HELPER_FLAGS_2(fcvtql_v_input, TCG_CALL_NO_WG, void, env, i64)
 
 #if !defined (CONFIG_USER_ONLY)
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 6ea33f3..ad041b0 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -655,6 +655,10 @@ static TCGv gen_ieee_input(DisasContext *ctx, int reg, int fn11, int is_cmp)
             } else {
                 gen_helper_ieee_input(cpu_env, val);
             }
+        } else {
+#ifndef CONFIG_USER_ONLY
+            gen_helper_ieee_input_s(cpu_env, val);
+#endif
         }
     }
     return val;
@@ -2256,24 +2260,15 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
             gen_fcmov(ctx, TCG_COND_GT, ra, rb, rc);
             break;
         case 0x030:
-            /* CVTQL */
-            REQUIRE_REG_31(ra);
-            vc = dest_fpr(ctx, rc);
-            vb = load_fpr(ctx, rb);
-            gen_fcvtql(vc, vb);
-            break;
         case 0x130:
-            /* CVTQL/V */
         case 0x530:
-            /* CVTQL/SV */
+            /* CVTQL, CVTQL/V, CVTQL/SV */
             REQUIRE_REG_31(ra);
-            /* ??? I'm pretty sure there's nothing that /sv needs to do that
-               /v doesn't do.  The only thing I can think is that /sv is a
-               valid instruction merely for completeness in the ISA.  */
             vc = dest_fpr(ctx, rc);
             vb = load_fpr(ctx, rb);
             gen_helper_fcvtql_v_input(cpu_env, vb);
             gen_fcvtql(vc, vb);
+            gen_fp_exc_raise(rc, fn11);
             break;
         default:
             goto invalid_opc;

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08 16:13                                                                     ` Al Viro
@ 2014-07-08 16:33                                                                       ` Peter Maydell
  2014-07-08 17:20                                                                         ` Al Viro
  2014-07-09  9:04                                                                         ` Alex Bennée
  2014-07-08 18:12                                                                       ` Richard Henderson
  1 sibling, 2 replies; 68+ messages in thread
From: Peter Maydell @ 2014-07-08 16:33 UTC (permalink / raw)
  To: Al Viro; +Cc: Alex Bennée, QEMU Developers, Richard Henderson

On 8 July 2014 17:13, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Jul 08, 2014 at 09:05:10AM +0100, Peter Maydell wrote:
>
>> The code we have currently may well be buggy, but the correct
>
> It is ;-/  We set TARGET_FPE_FLTINV unconditionally there.  BTW, what's
> the reason why all these cpu_loop() instances can't go into
> linux-user/<arch>/something?

It's just ancient code nobody's cleaned up yet. I do have
"move all this stuff into arch directories" on my "would like
to do" list, but I just haven't got round to it yet, since it's
not actually actively broken (unlike many other areas of
our codebase :-/).

> BTW, are there any more or less uptodate docs on qemu profiling?  I mean,
> things like perf/oprofile on the host obviously end up lumping all tcg
> output together.  Is there any way to get information beyond "~40% of time
> is spent in generated code, ~15% - in tb_find_fast(), and the rest is very
> much colder"?

Alex, did you say you'd done something with profiling recently?

> Incidentally, combination of --enable-gprof and (default) --enable-pie
> won't build - it dies with ld(1) complaining about relocs in gcrt1.o.

This sounds like a toolchain bug to me :-)

-- PMM

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08 16:33                                                                       ` Peter Maydell
@ 2014-07-08 17:20                                                                         ` Al Viro
  2014-07-08 19:32                                                                           ` Peter Maydell
  2014-07-09  9:04                                                                         ` Alex Bennée
  1 sibling, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-08 17:20 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Alex Bennée, QEMU Developers, Richard Henderson

On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote:

> > Incidentally, combination of --enable-gprof and (default) --enable-pie
> > won't build - it dies with ld(1) complaining about relocs in gcrt1.o.
> 
> This sounds like a toolchain bug to me :-)

Debian stable/amd64, gcc 4.7.2, binutils 2.22.  And google search finds
this, for example: http://osdir.com/ml/qemu-devel/2013-05/msg00710.html.
That one has gcc 4.4.3.

Anyway, adding --disable-pie to --enable-gprof gets it to build, but
as I said, gprof is no better than perf and oprofile - same problem.

Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and
going through their kernel package build.  I have perf report in front of
me right now; the top ones are
 41.77%  qemu-system-alp  perf-24701.map           [.] 0x7fbbee558930
 11.78%  qemu-system-alp  qemu-system-alpha        [.] cpu_alpha_exec
  4.95%  qemu-system-alp  [vdso]                   [.] 0x7fffdd7ff8de
  2.40%  qemu-system-alp  qemu-system-alpha        [.] phys_page_find
  1.49%  qemu-system-alp  qemu-system-alpha        [.] address_space_translate_internal
  1.34%  qemu-system-alp  [kernel.kallsyms]        [k] read_hpet
  1.26%  qemu-system-alp  qemu-system-alpha        [.] tlb_set_page
  1.23%  qemu-system-alp  qemu-system-alpha        [.] find_next_bit
  1.04%  qemu-system-alp  qemu-system-alpha        [.] get_page_addr_code
  1.01%  qemu-system-alp  libpthread-2.13.so       [.] pthread_mutex_lock
  0.88%  qemu-system-alp  qemu-system-alpha        [.] helper_cmpbge
  0.80%  qemu-system-alp  libc-2.13.so             [.] __memset_sse2
  0.72%  qemu-system-alp  libpthread-2.13.so       [.] __pthread_mutex_unlock_usercnt
  0.70%  qemu-system-alp  qemu-system-alpha        [.] get_physical_address
  0.69%  qemu-system-alp  qemu-system-alpha        [.] address_space_translate
  0.68%  qemu-system-alp  qemu-system-alpha        [.] tcg_optimize
  0.67%  qemu-system-alp  qemu-system-alpha        [.] ldq_phys
  0.63%  qemu-system-alp  qemu-system-alpha        [.] qemu_get_ram_ptr
  0.62%  qemu-system-alp  qemu-system-alpha        [.] helper_le_ldq_mmu
  0.57%  qemu-system-alp  qemu-system-alpha        [.] memory_region_is_ram

and cpu_alpha_exec() spends most of the time in inlined tb_find_fast().
It might be worth checking the actual distribution of the hash of virt
address used by that sucker - I wonder if dividing its argument by 4
wouldn't improve the things, but I don't have stats on actual frequency
of conflicts, etc.  In any case, the first lump (42%) seems to be tastier ;-)
There are all kinds of microoptimizations possible (e.g. helper_cmpbge() could
be done by a couple of MMX insns on amd64 host[1]), but it would be nice to
have some details on what we spend the time on in tcg output...

[1] The reason why helper_cmpbge() shows up is that string functions on alpha
use that insn a lot; it _might_ be worth optimizing.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08 16:13                                                                     ` Al Viro
  2014-07-08 16:33                                                                       ` Peter Maydell
@ 2014-07-08 18:12                                                                       ` Richard Henderson
  2014-07-08 19:02                                                                         ` Al Viro
  1 sibling, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-07-08 18:12 UTC (permalink / raw)
  To: Al Viro, Peter Maydell; +Cc: QEMU Developers

On 07/08/2014 09:13 AM, Al Viro wrote:
> Frankly, I suspect that it's better to have qemu-system-alpha behave like
> the actual hardware does (including "FPCR.DNOD can't be set") and keep the
> linux-user behaviour as is, for somebody brave and masochistic enough to
> fight that one.  And no, it's nowhere near "just let denorms ride through
> the normal softfloat code and play a bit with the flags it might raise".
> And then there's netbsd/alpha and openbsd/alpha, so in theory somebody might
> want to play with their software completion semantics (not identical to Linux
> one) for the sake of yet-to-be-written bsd-user alpha support...

You're probably right there.

I've pushed a couple more patches to the branch, split out from your patch
here.  I believe I've got it all, and havn't mucked things up in the process.
I'll run some tests later today when I've got time.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08 18:12                                                                       ` Richard Henderson
@ 2014-07-08 19:02                                                                         ` Al Viro
  2014-07-08 19:04                                                                           ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-08 19:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Tue, Jul 08, 2014 at 11:12:20AM -0700, Richard Henderson wrote:
> On 07/08/2014 09:13 AM, Al Viro wrote:
> > Frankly, I suspect that it's better to have qemu-system-alpha behave like
> > the actual hardware does (including "FPCR.DNOD can't be set") and keep the
> > linux-user behaviour as is, for somebody brave and masochistic enough to
> > fight that one.  And no, it's nowhere near "just let denorms ride through
> > the normal softfloat code and play a bit with the flags it might raise".
> > And then there's netbsd/alpha and openbsd/alpha, so in theory somebody might
> > want to play with their software completion semantics (not identical to Linux
> > one) for the sake of yet-to-be-written bsd-user alpha support...
> 
> You're probably right there.
> 
> I've pushed a couple more patches to the branch, split out from your patch
> here.  I believe I've got it all, and havn't mucked things up in the process.
> I'll run some tests later today when I've got time.

Just one thing - 0x1fffffffffffff will make 32bit hosts whine about integer
constant being too large.  So will 0x1ffffffffffffful, unfortunately - it
really ought to be ull.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08 19:02                                                                         ` Al Viro
@ 2014-07-08 19:04                                                                           ` Richard Henderson
  2014-07-08 20:20                                                                             ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-07-08 19:04 UTC (permalink / raw)
  To: Al Viro; +Cc: Peter Maydell, QEMU Developers

On 07/08/2014 12:02 PM, Al Viro wrote:
> On Tue, Jul 08, 2014 at 11:12:20AM -0700, Richard Henderson wrote:
>> On 07/08/2014 09:13 AM, Al Viro wrote:
>>> Frankly, I suspect that it's better to have qemu-system-alpha behave like
>>> the actual hardware does (including "FPCR.DNOD can't be set") and keep the
>>> linux-user behaviour as is, for somebody brave and masochistic enough to
>>> fight that one.  And no, it's nowhere near "just let denorms ride through
>>> the normal softfloat code and play a bit with the flags it might raise".
>>> And then there's netbsd/alpha and openbsd/alpha, so in theory somebody might
>>> want to play with their software completion semantics (not identical to Linux
>>> one) for the sake of yet-to-be-written bsd-user alpha support...
>>
>> You're probably right there.
>>
>> I've pushed a couple more patches to the branch, split out from your patch
>> here.  I believe I've got it all, and havn't mucked things up in the process.
>> I'll run some tests later today when I've got time.
> 
> Just one thing - 0x1fffffffffffff will make 32bit hosts whine about integer
> constant being too large.  So will 0x1ffffffffffffful, unfortunately - it
> really ought to be ull.
> 

I did use ull on the branch.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08 17:20                                                                         ` Al Viro
@ 2014-07-08 19:32                                                                           ` Peter Maydell
  2014-07-08 20:12                                                                             ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2014-07-08 19:32 UTC (permalink / raw)
  To: Al Viro; +Cc: Alex Bennée, QEMU Developers, Richard Henderson

On 8 July 2014 18:20, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote:
>
>> > Incidentally, combination of --enable-gprof and (default) --enable-pie
>> > won't build - it dies with ld(1) complaining about relocs in gcrt1.o.
>>
>> This sounds like a toolchain bug to me :-)
>
> Debian stable/amd64, gcc 4.7.2, binutils 2.22.  And google search finds
> this, for example: http://osdir.com/ml/qemu-devel/2013-05/msg00710.html.
> That one has gcc 4.4.3.

That just makes it a long-standing toolchain bug. I don't see any
reason why PIE + gprof shouldn't work, it just looks like gprof
doesn't ship and link a PIE runtime.

> Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and
> going through their kernel package build.  I have perf report in front of
> me right now; the top ones are
>  41.77%  qemu-system-alp  perf-24701.map           [.] 0x7fbbee558930
>  11.78%  qemu-system-alp  qemu-system-alpha        [.] cpu_alpha_exec

> and cpu_alpha_exec() spends most of the time in inlined tb_find_fast().
> It might be worth checking the actual distribution of the hash of virt
> address used by that sucker - I wonder if dividing its argument by 4
> wouldn't improve the things, but I don't have stats on actual frequency
> of conflicts, etc.  In any case, the first lump (42%) seems to be tastier ;-)

Depends on your point of view -- arguably we ought to be spending *more*
time executing translated guest code... (As you say, the problem is that
we don't have any breakdown of what things might turn out to be hotspots
in the translated code.)

-- PMM

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08 19:32                                                                           ` Peter Maydell
@ 2014-07-08 20:12                                                                             ` Al Viro
  2014-07-09  9:19                                                                               ` Alex Bennée
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-08 20:12 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Alex Bennée, QEMU Developers, Richard Henderson

On Tue, Jul 08, 2014 at 08:32:55PM +0100, Peter Maydell wrote:
 On 8 July 2014 18:20, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote:
> >
> >> > Incidentally, combination of --enable-gprof and (default) --enable-pie
> >> > won't build - it dies with ld(1) complaining about relocs in gcrt1.o.
> >>
> >> This sounds like a toolchain bug to me :-)
> >
> > Debian stable/amd64, gcc 4.7.2, binutils 2.22.  And google search finds
> > this, for example: http://osdir.com/ml/qemu-devel/2013-05/msg00710.html.
> > That one has gcc 4.4.3.
> 
> That just makes it a long-standing toolchain bug. I don't see any
> reason why PIE + gprof shouldn't work, it just looks like gprof
> doesn't ship and link a PIE runtime.

*nod*

It's not a huge itch to scratch for me, and I'm not even sure whether the
bug should be filed for gcc or for libc (probably the latter).  In any case,
having that information findable in list archives would probably be a good
thing.

Again, gprof isn't particulary useful - kernel-side profilers are at least as
good.  So I suspect that most of the people running into that simply shrug and
use those instead.  Narrowing it down to -pie didn't take long and I can
confirm that this is the root cause of that breakage.  Should make debugging
said toolchain bug a bit easier, if anybody cares to do that...

> > Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and
> > going through their kernel package build.  I have perf report in front of
> > me right now; the top ones are
> >  41.77%  qemu-system-alp  perf-24701.map           [.] 0x7fbbee558930
> >  11.78%  qemu-system-alp  qemu-system-alpha        [.] cpu_alpha_exec
> 
> > and cpu_alpha_exec() spends most of the time in inlined tb_find_fast().
> > It might be worth checking the actual distribution of the hash of virt
> > address used by that sucker - I wonder if dividing its argument by 4
> > wouldn't improve the things, but I don't have stats on actual frequency
> > of conflicts, etc.  In any case, the first lump (42%) seems to be tastier ;-)
> 
> Depends on your point of view -- arguably we ought to be spending *more*
> time executing translated guest code... (As you say, the problem is that
> we don't have any breakdown of what things might turn out to be hotspots
> in the translated code.)

Might be a fun project to teach perf that hits in such-and-such page should
lead to lookup in a table annotating it.  As in "offsets 42..69 should be
recorded as (<this address> + offset - 42).  Then tcg could generate
such tables and we'd get information like "that much time is spent in
the second host insn of instances of that code pattern generated by
tcg_gen_shr_i64", etc.

No idea if anything of that sort exists - qemu is not the only possible user
for that; looks like it might be useful for any JIT profiling, so somebody
could've done that already...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08 19:04                                                                           ` Richard Henderson
@ 2014-07-08 20:20                                                                             ` Al Viro
  2014-07-09  4:59                                                                               ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-08 20:20 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Tue, Jul 08, 2014 at 12:04:10PM -0700, Richard Henderson wrote:

> > Just one thing - 0x1fffffffffffff will make 32bit hosts whine about integer
> > constant being too large.  So will 0x1ffffffffffffful, unfortunately - it
> > really ought to be ull.
> > 
> 
> I did use ull on the branch.

Aha...  So you've caught that one already...  I've looked at your branch;
AFAICS, the only thing missing there is treating stores to FPCR.DNOD in
system mode as "not implemented" (which it is in the code as well as in
21[0-3]64 hardware).  Other than that everything seems to be fine; you are
right about cvtql treatment - since that sucker doesn't have /i in any
allowed trap suffices, we might as well just raise Inexact and let it be
masked out - float_flag_inexact will be present in 'ignore'.  And yes,
folding the calculation itself in there obviously makes sense.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08 20:20                                                                             ` Al Viro
@ 2014-07-09  4:59                                                                               ` Richard Henderson
  2014-07-09  5:47                                                                                 ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-07-09  4:59 UTC (permalink / raw)
  To: Al Viro; +Cc: Peter Maydell, QEMU Developers

On 07/08/2014 01:20 PM, Al Viro wrote:
> Aha...  So you've caught that one already...  I've looked at your branch;
> AFAICS, the only thing missing there is treating stores to FPCR.DNOD in
> system mode as "not implemented" (which it is in the code as well as in
> 21[0-3]64 hardware).

Is it loaded and stored on 21264, or it is read-as-zero/write-ignore?
Is UNDZ not required to be paired with DNOD?


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-09  4:59                                                                               ` Richard Henderson
@ 2014-07-09  5:47                                                                                 ` Al Viro
  2014-07-09 15:14                                                                                   ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Al Viro @ 2014-07-09  5:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Tue, Jul 08, 2014 at 09:59:33PM -0700, Richard Henderson wrote:
> On 07/08/2014 01:20 PM, Al Viro wrote:
> > Aha...  So you've caught that one already...  I've looked at your branch;
> > AFAICS, the only thing missing there is treating stores to FPCR.DNOD in
> > system mode as "not implemented" (which it is in the code as well as in
> > 21[0-3]64 hardware).
> 
> Is it loaded and stored on 21264, or it is read-as-zero/write-ignore?

RAZ, and the same on 21364 if Compaq manual for compiler-writers is to be
believed.

On 21264 bits 48..62 are writable, bit 63 is disjunction of bits 52..57
(stores are ignored), bits 0..47 are RAZ.  AARM requires RAZ bits 0..46
and RAZ on everything optional that is unimplemented.  IOW, DNOD is
unimplemented there, all other optional ones are implemented.  And
according to https://archive.org/details/dec-comp_guide_v2 21364 doesn't
implement DNOD either...

> Is UNDZ not required to be paired with DNOD?

There are 4 bits having some relation to handling of denorms.  DNZ and DNOD
are about denorm inputs; UNDZ and UNFD - about denorm output.  All of
them have effect only for IEEE insns with /S in trap suffix.

Rules:
	* if DNZ, denorm inputs are silently replaced with zero.
	* if !DNZ && !DNOD, denorm inputs trigger trap (invalid).  Same
as what would happen without /S.
	* if !DNZ && DNOD, perform operation on denorm(s).  And I would like
to play with whatever you are using to bring hardware from alternative
universes.
	* if !UNFD, denorm output triggers trap (underflow).  Same as what
would happen without /S.
	* if UNFD && UNDZ, denorm output is replaced with zero.
	* if UNFD && !UNDZ, denorm output remains as is.

So env->fpcr_flush_to_zero = env->fpcr_dnod & env->fpcr_undz; is another
bug - needs s/dnod/unfd/ there...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08 16:33                                                                       ` Peter Maydell
  2014-07-08 17:20                                                                         ` Al Viro
@ 2014-07-09  9:04                                                                         ` Alex Bennée
  1 sibling, 0 replies; 68+ messages in thread
From: Alex Bennée @ 2014-07-09  9:04 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Al Viro, Richard Henderson


Peter Maydell writes:

> On 8 July 2014 17:13, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> On Tue, Jul 08, 2014 at 09:05:10AM +0100, Peter Maydell wrote:
>>
<snip>
>> BTW, are there any more or less uptodate docs on qemu profiling?  I mean,
>> things like perf/oprofile on the host obviously end up lumping all tcg
>> output together.  Is there any way to get information beyond "~40% of time
>> is spent in generated code, ~15% - in tb_find_fast(), and the rest is very
>> much colder"?
>
> Alex, did you say you'd done something with profiling recently?

I posted some RFC patches up a while back that spit out the perf
/tmp/perf.<pid> JIT maps that helps with breaking down which TCG TBs are
the most executed.

There is another set of patches which allow you to selectively dump
translation blocks so you don't end up with multi-gigabyte log files.

I'm going to be doing some profiling myself over the next few days so
I'll clean-up the patches and re-submit to the list soon.

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-08 20:12                                                                             ` Al Viro
@ 2014-07-09  9:19                                                                               ` Alex Bennée
  0 siblings, 0 replies; 68+ messages in thread
From: Alex Bennée @ 2014-07-09  9:19 UTC (permalink / raw)
  To: Al Viro; +Cc: Peter Maydell, QEMU Developers, Richard Henderson


Al Viro writes:

> On Tue, Jul 08, 2014 at 08:32:55PM +0100, Peter Maydell wrote:
>  On 8 July 2014 18:20, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> > On Tue, Jul 08, 2014 at 05:33:16PM +0100, Peter Maydell wrote:
<snip>
> Again, gprof isn't particulary useful - kernel-side profilers are at least as
> good.  So I suspect that most of the people running into that simply shrug and
> use those instead.  Narrowing it down to -pie didn't take long and I can
> confirm that this is the root cause of that breakage.  Should make debugging
> said toolchain bug a bit easier, if anybody cares to do that...
>
>> > Stats I quoted were from qemu-system-alpha booting debian/lenny (5.10) and
>> > going through their kernel package build.  I have perf report in front of
>> > me right now; the top ones are
>> >  41.77%  qemu-system-alp  perf-24701.map           [.] 0x7fbbee558930
>> >  11.78%  qemu-system-alp  qemu-system-alpha        [.] cpu_alpha_exec
>> 
>> > and cpu_alpha_exec() spends most of the time in inlined tb_find_fast().
>> > It might be worth checking the actual distribution of the hash of virt
>> > address used by that sucker - I wonder if dividing its argument by 4
>> > wouldn't improve the things, but I don't have stats on actual frequency
>> > of conflicts, etc.  In any case, the first lump (42%) seems to be tastier ;-)
>> 
>> Depends on your point of view -- arguably we ought to be spending *more*
>> time executing translated guest code... (As you say, the problem is that
>> we don't have any breakdown of what things might turn out to be hotspots
>> in the translated code.)
>
> Might be a fun project to teach perf that hits in such-and-such page should
> lead to lookup in a table annotating it.  As in "offsets 42..69 should be
> recorded as (<this address> + offset - 42).  Then tcg could generate
> such tables and we'd get information like "that much time is spent in
> the second host insn of instances of that code pattern generated by
> tcg_gen_shr_i64", etc.
>
> No idea if anything of that sort exists - qemu is not the only possible user
> for that; looks like it might be useful for any JIT profiling, so somebody
> could've done that already...

Handily our patch tracker has remembered what I couldn't find ;-)

https://patches.linaro.org/27229/

As I mentioned previously I plan to clean these up over the next week.

-- 
Alex Bennée

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-09  5:47                                                                                 ` Al Viro
@ 2014-07-09 15:14                                                                                   ` Richard Henderson
  2014-07-09 16:41                                                                                     ` Al Viro
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 2014-07-09 15:14 UTC (permalink / raw)
  To: Al Viro; +Cc: Peter Maydell, QEMU Developers

On 07/08/2014 10:47 PM, Al Viro wrote:
> So env->fpcr_flush_to_zero = env->fpcr_dnod & env->fpcr_undz; is another
> bug - needs s/dnod/unfd/ there...

That's exactly what I was looking at, thanks.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions
  2014-07-09 15:14                                                                                   ` Richard Henderson
@ 2014-07-09 16:41                                                                                     ` Al Viro
  0 siblings, 0 replies; 68+ messages in thread
From: Al Viro @ 2014-07-09 16:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On Wed, Jul 09, 2014 at 08:14:12AM -0700, Richard Henderson wrote:
> On 07/08/2014 10:47 PM, Al Viro wrote:
> > So env->fpcr_flush_to_zero = env->fpcr_dnod & env->fpcr_undz; is another
> > bug - needs s/dnod/unfd/ there...
> 
> That's exactly what I was looking at, thanks.

BTW, that (unimplementeds being RAZ) is why AARM insists on having FP_C in
software - FPCR isn't guaranteed to have the "trap disable" bits and, in fact,
doesn't have anywhere to store IEEE_TRAP_ENABLE_DNO on actual hw.  The
software completion is where it has to be dealt with; note that both
swcr_update_status() and ieee_swcr_to_fpcr() treat ->ieee_state (i.e. our FP_C)
as authoritative wrt trap enable bits, 21264 or not.  Trap _status_ bits are
different - there (on 21264) FPCR is considered authoritative, but that's it.

Unimplemented trap bits are treated as "trap enabled", so the completion gets
to decide what it wants to do.  If you want to keep FPCR authoritative for
all those bits in linux-user case, we have to treat FPCR.DNOD as writable
bit for that mode, which is why my variant slapped an ifdef CONFIG_USER_ONLY
around 
     env->fpcr_dnod = (val & FPCR_DNOD) != 0;
instead of ripping it out completely...

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2014-07-09 16:41 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-24  4:34 [Qemu-devel] [RFC] alpha qemu arithmetic exceptions Al Viro
2014-06-24 16:52 ` Al Viro
2014-06-24 18:33   ` Richard Henderson
2014-06-24 20:32     ` Al Viro
2014-06-24 20:57       ` Richard Henderson
2014-06-24 21:24         ` Al Viro
2014-06-24 21:32           ` Richard Henderson
2014-06-25  7:01             ` Al Viro
2014-06-25  9:27               ` Peter Maydell
2014-06-25 14:26                 ` Al Viro
2014-06-25 17:41                   ` Peter Maydell
2014-06-26  5:55               ` Al Viro
2014-06-30 18:39                 ` Richard Henderson
2014-06-30 20:56                   ` Al Viro
2014-07-01  4:34                     ` Al Viro
2014-07-01  5:00                       ` Al Viro
2014-07-01 14:31                       ` Richard Henderson
2014-07-01 17:03                     ` Richard Henderson
2014-07-01 17:50                       ` Al Viro
2014-07-01 18:23                         ` Peter Maydell
2014-07-01 18:30                           ` Richard Henderson
2014-07-01 19:08                             ` Peter Maydell
2014-07-02  4:05                             ` Al Viro
2014-07-02  5:50                               ` Al Viro
2014-07-02  6:17                                 ` Al Viro
2014-07-02 15:26                                   ` Richard Henderson
2014-07-02 15:49                                     ` Al Viro
2014-07-02 14:59                               ` Richard Henderson
2014-07-02 15:20                                 ` Al Viro
2014-07-03  6:51                                   ` Al Viro
2014-07-03 18:25                                     ` Al Viro
2014-07-03 20:19                                       ` Richard Henderson
2014-07-03 22:47                                         ` Al Viro
2014-07-03 23:05                                           ` Peter Maydell
2014-07-03 23:22                                             ` Al Viro
2014-07-04  0:50                                         ` Al Viro
2014-07-04  4:30                                           ` Richard Henderson
2014-07-04  7:29                                             ` Al Viro
2014-07-05  1:40                                               ` Al Viro
2014-07-05  5:26                                                 ` Al Viro
2014-07-05 21:09                                                 ` Al Viro
2014-07-05 22:55                                                   ` Al Viro
2014-07-07 14:11                                                     ` Richard Henderson
2014-07-07 15:06                                                       ` Al Viro
2014-07-07 16:20                                                         ` Richard Henderson
2014-07-08  4:20                                                           ` Al Viro
2014-07-08  6:03                                                             ` Richard Henderson
2014-07-08  6:54                                                               ` Al Viro
2014-07-08  7:13                                                                 ` Al Viro
2014-07-08  8:05                                                                   ` Peter Maydell
2014-07-08 14:53                                                                     ` Richard Henderson
2014-07-08 16:13                                                                     ` Al Viro
2014-07-08 16:33                                                                       ` Peter Maydell
2014-07-08 17:20                                                                         ` Al Viro
2014-07-08 19:32                                                                           ` Peter Maydell
2014-07-08 20:12                                                                             ` Al Viro
2014-07-09  9:19                                                                               ` Alex Bennée
2014-07-09  9:04                                                                         ` Alex Bennée
2014-07-08 18:12                                                                       ` Richard Henderson
2014-07-08 19:02                                                                         ` Al Viro
2014-07-08 19:04                                                                           ` Richard Henderson
2014-07-08 20:20                                                                             ` Al Viro
2014-07-09  4:59                                                                               ` Richard Henderson
2014-07-09  5:47                                                                                 ` Al Viro
2014-07-09 15:14                                                                                   ` Richard Henderson
2014-07-09 16:41                                                                                     ` Al Viro
2014-06-24 18:23 ` Richard Henderson
2014-06-24 20:47   ` Al Viro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.