All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] target/arm: Fix SMLAD incorrect setting of Q bit
@ 2020-10-09 14:47 Peter Maydell
  2020-10-09 17:57 ` Richard Henderson
  2020-10-09 18:48 ` Peter Maydell
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Maydell @ 2020-10-09 14:47 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

The SMLAD instruction is supposed to:
 * signed multiply Rn[15:0] * Rm[15:0]
 * signed multiply Rn[31:16] * Rm[31:16]
 * perform a signed addition of the products and Ra
 * set Rd to the low 32 bits of the theoretical
   infinite-precision result
 * set the Q flag if the sign-extension of Rd
   would differ from the infinite-precision result
   (ie on overflow)

Our current implementation doesn't quite do this, though: it performs
an addition of the products setting Q on overflow, and then it adds
Ra, again possibly setting Q.  This sometimes incorrectly sets Q when
the architecturally mandated only-check-for-overflow-once algorithm
does not. For instance:
 r1 = 0x80008000; r2 = 0x80008000; r3 = 0xffffffff
 smlad r0, r1, r2, r3
This is (-32768 * -32768) + (-32768 * -32768) - 1

The products are both 0x4000_0000, so when added together as 32-bit
signed numbers they overflow (and QEMU sets Q), but because the
addition of Ra == -1 brings the total back down to 0x7fff_ffff
there is no overflow for the complete operation and setting Q is
incorrect.

Fix this edge case by resorting to 64-bit arithmetic for the
case where we need to add three values together.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 58 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 48 insertions(+), 10 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index d34c1d351a6..d8729e42c48 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7401,21 +7401,59 @@ static bool op_smlad(DisasContext *s, arg_rrrr *a, bool m_swap, bool sub)
     gen_smul_dual(t1, t2);
 
     if (sub) {
-        /* This subtraction cannot overflow. */
+        /*
+         * This subtraction cannot overflow, so we can do a simple
+         * 32-bit subtraction and then a possible 32-bit saturating
+         * addition of Ra.
+         */
         tcg_gen_sub_i32(t1, t1, t2);
+        tcg_temp_free_i32(t2);
+
+        if (a->ra != 15) {
+            t2 = load_reg(s, a->ra);
+            gen_helper_add_setq(t1, cpu_env, t1, t2);
+            tcg_temp_free_i32(t2);
+        }
+    } else if (a->ra == 15) {
+        /* Single saturation-checking addition */
+        gen_helper_add_setq(t1, cpu_env, t1, t2);
+        tcg_temp_free_i32(t2);
     } else {
         /*
-         * This addition cannot overflow 32 bits; however it may
-         * overflow considered as a signed operation, in which case
-         * we must set the Q flag.
+         * We need to add the products and Ra together and then
+         * determine whether the final result overflowed. Doing
+         * this as two separate add-and-check-overflow steps incorrectly
+         * sets Q for cases like (-32768 * -32768) + (-32768 * -32768) + -1.
+         * Do all the arithmetic at 64-bits and then check for overflow.
          */
-        gen_helper_add_setq(t1, cpu_env, t1, t2);
-    }
-    tcg_temp_free_i32(t2);
+        TCGv_i64 p64, q64;
+        TCGv_i32 t3, qf, one;
 
-    if (a->ra != 15) {
-        t2 = load_reg(s, a->ra);
-        gen_helper_add_setq(t1, cpu_env, t1, t2);
+        p64 = tcg_temp_new_i64();
+        q64 = tcg_temp_new_i64();
+        tcg_gen_ext_i32_i64(p64, t1);
+        tcg_gen_ext_i32_i64(q64, t2);
+        tcg_gen_add_i64(p64, p64, q64);
+        load_reg_var(s, t2, a->ra);
+        tcg_gen_ext_i32_i64(q64, t2);
+        tcg_gen_add_i64(p64, p64, q64);
+        tcg_temp_free_i64(q64);
+
+        tcg_gen_extr_i64_i32(t1, t2, p64);
+        tcg_temp_free_i64(p64);
+        /*
+         * t1 is the low half of the result which goes into Rd.
+         * We have overflow and must set Q if the high half (t2)
+         * is different from the sign-extension of t1.
+         */
+        t3 = tcg_temp_new_i32();
+        tcg_gen_sari_i32(t3, t1, 31);
+        qf = load_cpu_field(QF);
+        one = tcg_const_i32(1);
+        tcg_gen_movcond_i32(TCG_COND_NE, qf, t2, t3, one, qf);
+        store_cpu_field(qf, QF);
+        tcg_temp_free_i32(one);
+        tcg_temp_free_i32(t3);
         tcg_temp_free_i32(t2);
     }
     store_reg(s, a->rd, t1);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] target/arm: Fix SMLAD incorrect setting of Q bit
  2020-10-09 14:47 [PATCH] target/arm: Fix SMLAD incorrect setting of Q bit Peter Maydell
@ 2020-10-09 17:57 ` Richard Henderson
  2020-10-09 18:47   ` Peter Maydell
  2020-10-09 18:48 ` Peter Maydell
  1 sibling, 1 reply; 7+ messages in thread
From: Richard Henderson @ 2020-10-09 17:57 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, qemu-devel

On 10/9/20 9:47 AM, Peter Maydell wrote:
> +        /*
> +         * t1 is the low half of the result which goes into Rd.
> +         * We have overflow and must set Q if the high half (t2)
> +         * is different from the sign-extension of t1.
> +         */
> +        t3 = tcg_temp_new_i32();
> +        tcg_gen_sari_i32(t3, t1, 31);
> +        qf = load_cpu_field(QF);
> +        one = tcg_const_i32(1);
> +        tcg_gen_movcond_i32(TCG_COND_NE, qf, t2, t3, one, qf);
> +        store_cpu_field(qf, QF);

This works, however, QF is not restricted to {0,1}.

Perhaps this is simpler?

	qf = load_cpu_field(QF);
	/* t1 is the low half; extract the sign bit */
	tcg_gen_shri_i32(t3, t1, 31);
	/* t2 is the high half; must be 0 or -1,
	   and the extension of the sign bit.  adding them
	   must equal 0 (0 + 0 = 0; -1 + 1 = 0). */
	tcg_gen_add_i32(t3, t3, t2);
	/* Any non-zero result sets QF */
	tcg_gen_or_i32(qf, qf, t3);
	store_cpu_field(qf, QF);

Either way,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] target/arm: Fix SMLAD incorrect setting of Q bit
  2020-10-09 17:57 ` Richard Henderson
@ 2020-10-09 18:47   ` Peter Maydell
  2020-10-09 22:34     ` Richard Henderson
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2020-10-09 18:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Fri, 9 Oct 2020 at 18:57, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 10/9/20 9:47 AM, Peter Maydell wrote:
> > +        /*
> > +         * t1 is the low half of the result which goes into Rd.
> > +         * We have overflow and must set Q if the high half (t2)
> > +         * is different from the sign-extension of t1.
> > +         */
> > +        t3 = tcg_temp_new_i32();
> > +        tcg_gen_sari_i32(t3, t1, 31);
> > +        qf = load_cpu_field(QF);
> > +        one = tcg_const_i32(1);
> > +        tcg_gen_movcond_i32(TCG_COND_NE, qf, t2, t3, one, qf);
> > +        store_cpu_field(qf, QF);
>
> This works, however, QF is not restricted to {0,1}.

I'm not sure if you mean "this code doesn't maintain that
invariant" or "there is no such invariant". If the former,
the declaration of the field in cpu.h disagrees:
    uint32_t QF; /* 0 or 1 */

If the latter, I think the code does preserve the invariant.
Either we set QF to 'one', or we write back the value it had
to start with (which must have been {0,1}, but we don't really
care, because we don't want to touch it.)

> Perhaps this is simpler?
>
>         qf = load_cpu_field(QF);
>         /* t1 is the low half; extract the sign bit */
>         tcg_gen_shri_i32(t3, t1, 31);
>         /* t2 is the high half; must be 0 or -1,
>            and the extension of the sign bit.  adding them
>            must equal 0 (0 + 0 = 0; -1 + 1 = 0). */
>         tcg_gen_add_i32(t3, t3, t2);
>         /* Any non-zero result sets QF */
>         tcg_gen_or_i32(qf, qf, t3);
>         store_cpu_field(qf, QF);

This looks like it can write something to QF that isn't 0 or 1.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] target/arm: Fix SMLAD incorrect setting of Q bit
  2020-10-09 14:47 [PATCH] target/arm: Fix SMLAD incorrect setting of Q bit Peter Maydell
  2020-10-09 17:57 ` Richard Henderson
@ 2020-10-09 18:48 ` Peter Maydell
  2020-10-09 22:36   ` Richard Henderson
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2020-10-09 18:48 UTC (permalink / raw)
  To: qemu-arm, QEMU Developers, Richard Henderson

On Fri, 9 Oct 2020 at 15:47, Peter Maydell <peter.maydell@linaro.org> wrote:
> +        tcg_gen_extr_i64_i32(t1, t2, p64);

Oh, I forgot to mention, but it looks like extr_i64_i32
isn't documented in tcg/README. Is that because it isn't
really a TCG IR op, or just an omission?

thanks
-- PMM


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] target/arm: Fix SMLAD incorrect setting of Q bit
  2020-10-09 18:47   ` Peter Maydell
@ 2020-10-09 22:34     ` Richard Henderson
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Henderson @ 2020-10-09 22:34 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers

On 10/9/20 1:47 PM, Peter Maydell wrote:
> On Fri, 9 Oct 2020 at 18:57, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 10/9/20 9:47 AM, Peter Maydell wrote:
>>> +        /*
>>> +         * t1 is the low half of the result which goes into Rd.
>>> +         * We have overflow and must set Q if the high half (t2)
>>> +         * is different from the sign-extension of t1.
>>> +         */
>>> +        t3 = tcg_temp_new_i32();
>>> +        tcg_gen_sari_i32(t3, t1, 31);
>>> +        qf = load_cpu_field(QF);
>>> +        one = tcg_const_i32(1);
>>> +        tcg_gen_movcond_i32(TCG_COND_NE, qf, t2, t3, one, qf);
>>> +        store_cpu_field(qf, QF);
>>
>> This works, however, QF is not restricted to {0,1}.
> 
> I'm not sure if you mean "this code doesn't maintain that
> invariant" or "there is no such invariant". If the former,
> the declaration of the field in cpu.h disagrees:
>     uint32_t QF; /* 0 or 1 */

Oops, I confused QF with QC, the neon saturation bit.


r~


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] target/arm: Fix SMLAD incorrect setting of Q bit
  2020-10-09 18:48 ` Peter Maydell
@ 2020-10-09 22:36   ` Richard Henderson
  2020-10-10 12:48     ` Peter Maydell
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Henderson @ 2020-10-09 22:36 UTC (permalink / raw)
  To: Peter Maydell, qemu-arm, QEMU Developers

On 10/9/20 1:48 PM, Peter Maydell wrote:
> On Fri, 9 Oct 2020 at 15:47, Peter Maydell <peter.maydell@linaro.org> wrote:
>> +        tcg_gen_extr_i64_i32(t1, t2, p64);
> 
> Oh, I forgot to mention, but it looks like extr_i64_i32
> isn't documented in tcg/README. Is that because it isn't
> really a TCG IR op, or just an omission?

Because it's not an IR op.  It's the combo of extrl and extrh.


r~


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] target/arm: Fix SMLAD incorrect setting of Q bit
  2020-10-09 22:36   ` Richard Henderson
@ 2020-10-10 12:48     ` Peter Maydell
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Maydell @ 2020-10-10 12:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Fri, 9 Oct 2020 at 23:36, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 10/9/20 1:48 PM, Peter Maydell wrote:
> > On Fri, 9 Oct 2020 at 15:47, Peter Maydell <peter.maydell@linaro.org> wrote:
> >> +        tcg_gen_extr_i64_i32(t1, t2, p64);
> >
> > Oh, I forgot to mention, but it looks like extr_i64_i32
> > isn't documented in tcg/README. Is that because it isn't
> > really a TCG IR op, or just an omission?
>
> Because it's not an IR op.  It's the combo of extrl and extrh.

We really should figure out somewhere to document the
interface and operations that frontends can use. (Among
other important things, extr_i64_i32 is usable generically,
but extrl/extrh are 64-bit hosts only according to tcg/README,
so if you trusted the README docs then you'd end up
trying to synthesize this out of shifts and trunc.)

thanks
-- PMM


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-10-10 12:50 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-09 14:47 [PATCH] target/arm: Fix SMLAD incorrect setting of Q bit Peter Maydell
2020-10-09 17:57 ` Richard Henderson
2020-10-09 18:47   ` Peter Maydell
2020-10-09 22:34     ` Richard Henderson
2020-10-09 18:48 ` Peter Maydell
2020-10-09 22:36   ` Richard Henderson
2020-10-10 12:48     ` Peter Maydell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.