qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups
@ 2019-08-08 20:26 Richard Henderson
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 1/7] target/arm: Use tcg_gen_extract_i32 for shifter_out_im Richard Henderson
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Some of these were cleanups that I was making simultaneous
with the decodetree split.  Let's do those beforehand to
make the split easier to read.

Some of these are new, noticed while I was in the area.


r~


Richard Henderson (7):
  target/arm: Use tcg_gen_extract_i32 for shifter_out_im
  target/arm: Use tcg_gen_deposit_i32 for PKHBT, PKHTB
  target/arm: Remove redundant shift tests
  target/arm: Use ror32 instead of open-coding the operation
  target/arm: Use tcg_gen_rotri_i32 for gen_swap_half
  target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR
  target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word

 target/arm/translate.c | 157 ++++++++++++++---------------------------
 1 file changed, 53 insertions(+), 104 deletions(-)

-- 
2.17.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 1/7] target/arm: Use tcg_gen_extract_i32 for shifter_out_im
  2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 2/7] target/arm: Use tcg_gen_deposit_i32 for PKHBT, PKHTB Richard Henderson
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Extract is a compact combination of shift + and.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 846052acea..43e005d191 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -620,14 +620,7 @@ static void gen_sar(TCGv_i32 dest, TCGv_i32 t0, TCGv_i32 t1)
 
 static void shifter_out_im(TCGv_i32 var, int shift)
 {
-    if (shift == 0) {
-        tcg_gen_andi_i32(cpu_CF, var, 1);
-    } else {
-        tcg_gen_shri_i32(cpu_CF, var, shift);
-        if (shift != 31) {
-            tcg_gen_andi_i32(cpu_CF, cpu_CF, 1);
-        }
-    }
+    tcg_gen_extract_i32(cpu_CF, var, shift, 1);
 }
 
 /* Shift by immediate.  Includes special handling for shift == 0.  */
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 2/7] target/arm: Use tcg_gen_deposit_i32 for PKHBT, PKHTB
  2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 1/7] target/arm: Use tcg_gen_extract_i32 for shifter_out_im Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 3/7] target/arm: Remove redundant shift tests Richard Henderson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Use deposit as the composit operation to merge the
bits from the two inputs.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 43e005d191..94170af134 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8769,19 +8769,16 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                         shift = (insn >> 7) & 0x1f;
                         if (insn & (1 << 6)) {
                             /* pkhtb */
-                            if (shift == 0)
+                            if (shift == 0) {
                                 shift = 31;
+                            }
                             tcg_gen_sari_i32(tmp2, tmp2, shift);
-                            tcg_gen_andi_i32(tmp, tmp, 0xffff0000);
-                            tcg_gen_ext16u_i32(tmp2, tmp2);
+                            tcg_gen_deposit_i32(tmp, tmp, tmp2, 0, 16);
                         } else {
                             /* pkhbt */
-                            if (shift)
-                                tcg_gen_shli_i32(tmp2, tmp2, shift);
-                            tcg_gen_ext16u_i32(tmp, tmp);
-                            tcg_gen_andi_i32(tmp2, tmp2, 0xffff0000);
+                            tcg_gen_shli_i32(tmp2, tmp2, shift);
+                            tcg_gen_deposit_i32(tmp, tmp2, tmp, 0, 16);
                         }
-                        tcg_gen_or_i32(tmp, tmp, tmp2);
                         tcg_temp_free_i32(tmp2);
                         store_reg(s, rd, tmp);
                     } else if ((insn & 0x00200020) == 0x00200000) {
@@ -9817,19 +9814,16 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
             shift = ((insn >> 10) & 0x1c) | ((insn >> 6) & 0x3);
             if (insn & (1 << 5)) {
                 /* pkhtb */
-                if (shift == 0)
+                if (shift == 0) {
                     shift = 31;
+                }
                 tcg_gen_sari_i32(tmp2, tmp2, shift);
-                tcg_gen_andi_i32(tmp, tmp, 0xffff0000);
-                tcg_gen_ext16u_i32(tmp2, tmp2);
+                tcg_gen_deposit_i32(tmp, tmp, tmp2, 0, 16);
             } else {
                 /* pkhbt */
-                if (shift)
-                    tcg_gen_shli_i32(tmp2, tmp2, shift);
-                tcg_gen_ext16u_i32(tmp, tmp);
-                tcg_gen_andi_i32(tmp2, tmp2, 0xffff0000);
+                tcg_gen_shli_i32(tmp2, tmp2, shift);
+                tcg_gen_deposit_i32(tmp, tmp2, tmp, 0, 16);
             }
-            tcg_gen_or_i32(tmp, tmp, tmp2);
             tcg_temp_free_i32(tmp2);
             store_reg(s, rd, tmp);
         } else {
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 3/7] target/arm: Remove redundant shift tests
  2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 1/7] target/arm: Use tcg_gen_extract_i32 for shifter_out_im Richard Henderson
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 2/7] target/arm: Use tcg_gen_deposit_i32 for PKHBT, PKHTB Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 4/7] target/arm: Use ror32 instead of open-coding the operation Richard Henderson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

The immediate shift generator functions already test for,
and eliminate, the case of a shift by zero.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 94170af134..3ddc404b3b 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8826,8 +8826,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                         shift = (insn >> 10) & 3;
                         /* ??? In many cases it's not necessary to do a
                            rotate, a shift is sufficient.  */
-                        if (shift != 0)
-                            tcg_gen_rotri_i32(tmp, tmp, shift * 8);
+                        tcg_gen_rotri_i32(tmp, tmp, shift * 8);
                         op1 = (insn >> 20) & 7;
                         switch (op1) {
                         case 0: gen_sxtb16(tmp);  break;
@@ -9904,8 +9903,7 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
             shift = (insn >> 4) & 3;
             /* ??? In many cases it's not necessary to do a
                rotate, a shift is sufficient.  */
-            if (shift != 0)
-                tcg_gen_rotri_i32(tmp, tmp, shift * 8);
+            tcg_gen_rotri_i32(tmp, tmp, shift * 8);
             op = (insn >> 20) & 7;
             switch (op) {
             case 0: gen_sxth(tmp);   break;
@@ -10632,11 +10630,10 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
                     case 7:
                         goto illegal_op;
                     default: /* Saturate.  */
-                        if (shift) {
-                            if (op & 1)
-                                tcg_gen_sari_i32(tmp, tmp, shift);
-                            else
-                                tcg_gen_shli_i32(tmp, tmp, shift);
+                        if (op & 1) {
+                            tcg_gen_sari_i32(tmp, tmp, shift);
+                        } else {
+                            tcg_gen_shli_i32(tmp, tmp, shift);
                         }
                         tmp2 = tcg_const_i32(imm);
                         if (op & 4) {
@@ -10827,9 +10824,7 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
                     goto illegal_op;
                 }
                 tmp = load_reg(s, rm);
-                if (shift) {
-                    tcg_gen_shli_i32(tmp, tmp, shift);
-                }
+                tcg_gen_shli_i32(tmp, tmp, shift);
                 tcg_gen_add_i32(addr, addr, tmp);
                 tcg_temp_free_i32(tmp);
                 break;
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 4/7] target/arm: Use ror32 instead of open-coding the operation
  2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
                   ` (2 preceding siblings ...)
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 3/7] target/arm: Remove redundant shift tests Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 5/7] target/arm: Use tcg_gen_rotri_i32 for gen_swap_half Richard Henderson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

The helper function is more documentary, and also already
handles the case of rotate by zero.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 3ddc404b3b..b40f163bab 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7979,8 +7979,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                 /* CPSR = immediate */
                 val = insn & 0xff;
                 shift = ((insn >> 8) & 0xf) * 2;
-                if (shift)
-                    val = (val >> shift) | (val << (32 - shift));
+                val = ror32(val, shift);
                 i = ((insn & (1 << 22)) != 0);
                 if (gen_set_psr_im(s, msr_mask(s, (insn >> 16) & 0xf, i),
                                    i, val)) {
@@ -8243,9 +8242,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
             /* immediate operand */
             val = insn & 0xff;
             shift = ((insn >> 8) & 0xf) * 2;
-            if (shift) {
-                val = (val >> shift) | (val << (32 - shift));
-            }
+            val = ror32(val, shift);
             tmp2 = tcg_temp_new_i32();
             tcg_gen_movi_i32(tmp2, val);
             if (logic_cc && shift) {
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 5/7] target/arm: Use tcg_gen_rotri_i32 for gen_swap_half
  2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
                   ` (3 preceding siblings ...)
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 4/7] target/arm: Use ror32 instead of open-coding the operation Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 6/7] target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR Richard Henderson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Rotate is the more compact and obvious way to swap 16-bit
elements of a 32-bit word.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index b40f163bab..ddc54e77e4 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -459,11 +459,7 @@ static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
 /* Swap low and high halfwords.  */
 static void gen_swap_half(TCGv_i32 var)
 {
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_shri_i32(tmp, var, 16);
-    tcg_gen_shli_i32(var, var, 16);
-    tcg_gen_or_i32(var, var, tmp);
-    tcg_temp_free_i32(tmp);
+    tcg_gen_rotri_i32(var, var, 16);
 }
 
 /* Dual 16-bit add.  Result placed in t0 and t1 is marked as dead.
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 6/7] target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR
  2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
                   ` (4 preceding siblings ...)
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 5/7] target/arm: Use tcg_gen_rotri_i32 for gen_swap_half Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
  2019-08-28  7:22   ` [Qemu-devel] [Qemu-arm] " Laurent Desnogues
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word Richard Henderson
  2019-08-15 10:34 ` [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Peter Maydell
  7 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

All of the inputs to these instructions are 32-bits.  Rather than
extend each input to 64-bits and then extract the high 32-bits of
the output, use tcg_gen_muls2_i32 and other 32-bit generator functions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 72 +++++++++++++++---------------------------
 1 file changed, 26 insertions(+), 46 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index ddc54e77e4..77154be743 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -391,34 +391,6 @@ static void gen_revsh(TCGv_i32 var)
     tcg_gen_ext16s_i32(var, var);
 }
 
-/* Return (b << 32) + a. Mark inputs as dead */
-static TCGv_i64 gen_addq_msw(TCGv_i64 a, TCGv_i32 b)
-{
-    TCGv_i64 tmp64 = tcg_temp_new_i64();
-
-    tcg_gen_extu_i32_i64(tmp64, b);
-    tcg_temp_free_i32(b);
-    tcg_gen_shli_i64(tmp64, tmp64, 32);
-    tcg_gen_add_i64(a, tmp64, a);
-
-    tcg_temp_free_i64(tmp64);
-    return a;
-}
-
-/* Return (b << 32) - a. Mark inputs as dead. */
-static TCGv_i64 gen_subq_msw(TCGv_i64 a, TCGv_i32 b)
-{
-    TCGv_i64 tmp64 = tcg_temp_new_i64();
-
-    tcg_gen_extu_i32_i64(tmp64, b);
-    tcg_temp_free_i32(b);
-    tcg_gen_shli_i64(tmp64, tmp64, 32);
-    tcg_gen_sub_i64(a, tmp64, a);
-
-    tcg_temp_free_i64(tmp64);
-    return a;
-}
-
 /* 32x32->64 multiply.  Marks inputs as dead.  */
 static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
 {
@@ -8872,23 +8844,27 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                            (SMMUL, SMMLA, SMMLS) */
                         tmp = load_reg(s, rm);
                         tmp2 = load_reg(s, rs);
-                        tmp64 = gen_muls_i64_i32(tmp, tmp2);
+                        tcg_gen_muls2_i32(tmp2, tmp, tmp, tmp2);
 
                         if (rd != 15) {
-                            tmp = load_reg(s, rd);
+                            tmp3 = load_reg(s, rd);
                             if (insn & (1 << 6)) {
-                                tmp64 = gen_subq_msw(tmp64, tmp);
+                                tcg_gen_sub_i32(tmp, tmp, tmp3);
                             } else {
-                                tmp64 = gen_addq_msw(tmp64, tmp);
+                                tcg_gen_add_i32(tmp, tmp, tmp3);
                             }
+                            tcg_temp_free_i32(tmp3);
                         }
                         if (insn & (1 << 5)) {
-                            tcg_gen_addi_i64(tmp64, tmp64, 0x80000000u);
+                            /*
+                             * Adding 0x80000000 to the 64-bit quantity
+                             * means that we have carry in to the high
+                             * word when the low word has the high bit set.
+                             */
+                            tcg_gen_shri_i32(tmp2, tmp2, 31);
+                            tcg_gen_add_i32(tmp, tmp, tmp2);
                         }
-                        tcg_gen_shri_i64(tmp64, tmp64, 32);
-                        tmp = tcg_temp_new_i32();
-                        tcg_gen_extrl_i64_i32(tmp, tmp64);
-                        tcg_temp_free_i64(tmp64);
+                        tcg_temp_free_i32(tmp2);
                         store_reg(s, rn, tmp);
                         break;
                     case 0:
@@ -10114,22 +10090,26 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
                   }
                 break;
             case 5: case 6: /* 32 * 32 -> 32msb (SMMUL, SMMLA, SMMLS) */
-                tmp64 = gen_muls_i64_i32(tmp, tmp2);
+                tcg_gen_muls2_i32(tmp2, tmp, tmp, tmp2);
                 if (rs != 15) {
-                    tmp = load_reg(s, rs);
+                    tmp3 = load_reg(s, rs);
                     if (insn & (1 << 20)) {
-                        tmp64 = gen_addq_msw(tmp64, tmp);
+                        tcg_gen_add_i32(tmp, tmp, tmp3);
                     } else {
-                        tmp64 = gen_subq_msw(tmp64, tmp);
+                        tcg_gen_sub_i32(tmp, tmp, tmp3);
                     }
+                    tcg_temp_free_i32(tmp3);
                 }
                 if (insn & (1 << 4)) {
-                    tcg_gen_addi_i64(tmp64, tmp64, 0x80000000u);
+                    /*
+                     * Adding 0x80000000 to the 64-bit quantity
+                     * means that we have carry in to the high
+                     * word when the low word has the high bit set.
+                     */
+                    tcg_gen_shri_i32(tmp2, tmp2, 31);
+                    tcg_gen_add_i32(tmp, tmp, tmp2);
                 }
-                tcg_gen_shri_i64(tmp64, tmp64, 32);
-                tmp = tcg_temp_new_i32();
-                tcg_gen_extrl_i64_i32(tmp, tmp64);
-                tcg_temp_free_i64(tmp64);
+                tcg_temp_free_i32(tmp2);
                 break;
             case 7: /* Unsigned sum of absolute differences.  */
                 gen_helper_usad8(tmp, tmp, tmp2);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word
  2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
                   ` (5 preceding siblings ...)
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 6/7] target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
  2019-08-15 10:16   ` Peter Maydell
  2019-08-15 10:34 ` [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Peter Maydell
  7 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, qemu-arm

Separate shift + extract low will result in one extra insn
for hosts like RISC-V, MIPS, and Sparc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 77154be743..9e103e4fad 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1761,8 +1761,7 @@ static int disas_iwmmxt_insn(DisasContext *s, uint32_t insn)
             if (insn & ARM_CP_RW_BIT) {                         /* TMRRC */
                 iwmmxt_load_reg(cpu_V0, wrd);
                 tcg_gen_extrl_i64_i32(cpu_R[rdlo], cpu_V0);
-                tcg_gen_shri_i64(cpu_V0, cpu_V0, 32);
-                tcg_gen_extrl_i64_i32(cpu_R[rdhi], cpu_V0);
+                tcg_gen_extrh_i64_i32(cpu_R[rdhi], cpu_V0);
             } else {                                    /* TMCRR */
                 tcg_gen_concat_i32_i64(cpu_V0, cpu_R[rdlo], cpu_R[rdhi]);
                 iwmmxt_store_reg(cpu_V0, wrd);
@@ -2807,8 +2806,7 @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
         if (insn & ARM_CP_RW_BIT) {                     /* MRA */
             iwmmxt_load_reg(cpu_V0, acc);
             tcg_gen_extrl_i64_i32(cpu_R[rdlo], cpu_V0);
-            tcg_gen_shri_i64(cpu_V0, cpu_V0, 32);
-            tcg_gen_extrl_i64_i32(cpu_R[rdhi], cpu_V0);
+            tcg_gen_extrh_i64_i32(cpu_R[rdhi], cpu_V0);
             tcg_gen_andi_i32(cpu_R[rdhi], cpu_R[rdhi], (1 << (40 - 32)) - 1);
         } else {                                        /* MAR */
             tcg_gen_concat_i32_i64(cpu_V0, cpu_R[rdlo], cpu_R[rdhi]);
@@ -6005,8 +6003,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                 gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
                                 break;
                             case 2:
-                                tcg_gen_shri_i64(cpu_V0, cpu_V0, 32);
-                                tcg_gen_extrl_i64_i32(tmp, cpu_V0);
+                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
                                 break;
                             default: abort();
                             }
@@ -6020,8 +6017,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                 break;
                             case 2:
                                 tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
-                                tcg_gen_shri_i64(cpu_V0, cpu_V0, 32);
-                                tcg_gen_extrl_i64_i32(tmp, cpu_V0);
+                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
                                 break;
                             default: abort();
                             }
@@ -7254,9 +7250,8 @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
                 tmp = tcg_temp_new_i32();
                 tcg_gen_extrl_i64_i32(tmp, tmp64);
                 store_reg(s, rt, tmp);
-                tcg_gen_shri_i64(tmp64, tmp64, 32);
                 tmp = tcg_temp_new_i32();
-                tcg_gen_extrl_i64_i32(tmp, tmp64);
+                tcg_gen_extrh_i64_i32(tmp, tmp64);
                 tcg_temp_free_i64(tmp64);
                 store_reg(s, rt2, tmp);
             } else {
@@ -7365,8 +7360,7 @@ static void gen_storeq_reg(DisasContext *s, int rlow, int rhigh, TCGv_i64 val)
     tcg_gen_extrl_i64_i32(tmp, val);
     store_reg(s, rlow, tmp);
     tmp = tcg_temp_new_i32();
-    tcg_gen_shri_i64(val, val, 32);
-    tcg_gen_extrl_i64_i32(tmp, val);
+    tcg_gen_extrh_i64_i32(tmp, val);
     store_reg(s, rhigh, tmp);
 }
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word Richard Henderson
@ 2019-08-15 10:16   ` Peter Maydell
  2019-08-15 11:56     ` Richard Henderson
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2019-08-15 10:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Thu, 8 Aug 2019 at 21:26, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Separate shift + extract low will result in one extra insn
> for hosts like RISC-V, MIPS, and Sparc.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate.c | 18 ++++++------------
>  1 file changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 77154be743..9e103e4fad 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -1761,8 +1761,7 @@ static int disas_iwmmxt_insn(DisasContext *s, uint32_t insn)
>              if (insn & ARM_CP_RW_BIT) {                         /* TMRRC */
>                  iwmmxt_load_reg(cpu_V0, wrd);
>                  tcg_gen_extrl_i64_i32(cpu_R[rdlo], cpu_V0);
> -                tcg_gen_shri_i64(cpu_V0, cpu_V0, 32);
> -                tcg_gen_extrl_i64_i32(cpu_R[rdhi], cpu_V0);
> +                tcg_gen_extrh_i64_i32(cpu_R[rdhi], cpu_V0);
>              } else {                                    /* TMCRR */
>                  tcg_gen_concat_i32_i64(cpu_V0, cpu_R[rdlo], cpu_R[rdhi]);
>                  iwmmxt_store_reg(cpu_V0, wrd);

This patch is fine, but I noticed while reviewing it that tcg/README
labels both the extrl_i64_i32 and extrh_i64_i32 operations as
"for 64-bit hosts only". Presumably that's a documentation error,
since we're not guarding the existing uses of the extrl_i64_i32
here with any kind of ifdeffery to restrict them to 64-bit hosts ?

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups
  2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
                   ` (6 preceding siblings ...)
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word Richard Henderson
@ 2019-08-15 10:34 ` Peter Maydell
  7 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2019-08-15 10:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Thu, 8 Aug 2019 at 21:26, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Some of these were cleanups that I was making simultaneous
> with the decodetree split.  Let's do those beforehand to
> make the split easier to read.
>
> Some of these are new, noticed while I was in the area.
>
>
> r~
>
>
> Richard Henderson (7):
>   target/arm: Use tcg_gen_extract_i32 for shifter_out_im
>   target/arm: Use tcg_gen_deposit_i32 for PKHBT, PKHTB
>   target/arm: Remove redundant shift tests
>   target/arm: Use ror32 instead of open-coding the operation
>   target/arm: Use tcg_gen_rotri_i32 for gen_swap_half
>   target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR
>   target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word

Applied to target-arm.next, thanks. (I had a comment on patch 6
but it was about the tcg docs, not the patch itself.)

-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word
  2019-08-15 10:16   ` Peter Maydell
@ 2019-08-15 11:56     ` Richard Henderson
  2019-08-15 12:02       ` Peter Maydell
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2019-08-15 11:56 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, QEMU Developers




>
>This patch is fine, but I noticed while reviewing it that tcg/README
>labels both the extrl_i64_i32 and extrh_i64_i32 operations as
>"for 64-bit hosts only". Presumably that's a documentation error,
>since we're not guarding the existing uses of the extrl_i64_i32
>here with any kind of ifdeffery to restrict them to 64-bit hosts ?
>


A documentation unclarity in that the opcodes are for 64-bit hosts. The tcg_gen_* functions are always available, and expand to INDEX_op_mov_i32 on 32-bit hosts.


r~


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word
  2019-08-15 11:56     ` Richard Henderson
@ 2019-08-15 12:02       ` Peter Maydell
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2019-08-15 12:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-arm, QEMU Developers

On Thu, 15 Aug 2019 at 12:56, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
>
>
>
> >
> >This patch is fine, but I noticed while reviewing it that tcg/README
> >labels both the extrl_i64_i32 and extrh_i64_i32 operations as
> >"for 64-bit hosts only". Presumably that's a documentation error,
> >since we're not guarding the existing uses of the extrl_i64_i32
> >here with any kind of ifdeffery to restrict them to 64-bit hosts ?
> >
>
>
> A documentation unclarity in that the opcodes are for 64-bit hosts. The tcg_gen_* functions are always available, and expand to INDEX_op_mov_i32 on 32-bit hosts.

Oh, I see. We should probably split that document out properly
into a primary "what you need to know to generate TCG code as
a target" (which is the main audience) and "what you need to
implement for a TCG backend (which I guess is relevant to fewer
people).

thanks
-- PMM


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [Qemu-arm] [PATCH 6/7] target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR
  2019-08-08 20:26 ` [Qemu-devel] [PATCH 6/7] target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR Richard Henderson
@ 2019-08-28  7:22   ` Laurent Desnogues
  0 siblings, 0 replies; 13+ messages in thread
From: Laurent Desnogues @ 2019-08-28  7:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, qemu-arm, qemu-devel

Hi Richard,

On Thu, Aug 8, 2019 at 10:28 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> All of the inputs to these instructions are 32-bits.  Rather than
> extend each input to 64-bits and then extract the high 32-bits of
> the output, use tcg_gen_muls2_i32 and other 32-bit generator functions.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate.c | 72 +++++++++++++++---------------------------
>  1 file changed, 26 insertions(+), 46 deletions(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index ddc54e77e4..77154be743 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -391,34 +391,6 @@ static void gen_revsh(TCGv_i32 var)
>      tcg_gen_ext16s_i32(var, var);
>  }
>
> -/* Return (b << 32) + a. Mark inputs as dead */
> -static TCGv_i64 gen_addq_msw(TCGv_i64 a, TCGv_i32 b)
> -{
> -    TCGv_i64 tmp64 = tcg_temp_new_i64();
> -
> -    tcg_gen_extu_i32_i64(tmp64, b);
> -    tcg_temp_free_i32(b);
> -    tcg_gen_shli_i64(tmp64, tmp64, 32);
> -    tcg_gen_add_i64(a, tmp64, a);
> -
> -    tcg_temp_free_i64(tmp64);
> -    return a;
> -}
> -
> -/* Return (b << 32) - a. Mark inputs as dead. */
> -static TCGv_i64 gen_subq_msw(TCGv_i64 a, TCGv_i32 b)
> -{
> -    TCGv_i64 tmp64 = tcg_temp_new_i64();
> -
> -    tcg_gen_extu_i32_i64(tmp64, b);
> -    tcg_temp_free_i32(b);
> -    tcg_gen_shli_i64(tmp64, tmp64, 32);
> -    tcg_gen_sub_i64(a, tmp64, a);
> -
> -    tcg_temp_free_i64(tmp64);
> -    return a;
> -}
> -
>  /* 32x32->64 multiply.  Marks inputs as dead.  */
>  static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
>  {
> @@ -8872,23 +8844,27 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
>                             (SMMUL, SMMLA, SMMLS) */
>                          tmp = load_reg(s, rm);
>                          tmp2 = load_reg(s, rs);
> -                        tmp64 = gen_muls_i64_i32(tmp, tmp2);
> +                        tcg_gen_muls2_i32(tmp2, tmp, tmp, tmp2);
>
>                          if (rd != 15) {
> -                            tmp = load_reg(s, rd);
> +                            tmp3 = load_reg(s, rd);
>                              if (insn & (1 << 6)) {
> -                                tmp64 = gen_subq_msw(tmp64, tmp);
> +                                tcg_gen_sub_i32(tmp, tmp, tmp3);

Shouldn't you subtract tmp from tmp3?

>                              } else {
> -                                tmp64 = gen_addq_msw(tmp64, tmp);
> +                                tcg_gen_add_i32(tmp, tmp, tmp3);
>                              }
> +                            tcg_temp_free_i32(tmp3);
>                          }
>                          if (insn & (1 << 5)) {
> -                            tcg_gen_addi_i64(tmp64, tmp64, 0x80000000u);
> +                            /*
> +                             * Adding 0x80000000 to the 64-bit quantity
> +                             * means that we have carry in to the high
> +                             * word when the low word has the high bit set.
> +                             */
> +                            tcg_gen_shri_i32(tmp2, tmp2, 31);
> +                            tcg_gen_add_i32(tmp, tmp, tmp2);
>                          }
> -                        tcg_gen_shri_i64(tmp64, tmp64, 32);
> -                        tmp = tcg_temp_new_i32();
> -                        tcg_gen_extrl_i64_i32(tmp, tmp64);
> -                        tcg_temp_free_i64(tmp64);
> +                        tcg_temp_free_i32(tmp2);
>                          store_reg(s, rn, tmp);
>                          break;
>                      case 0:
> @@ -10114,22 +10090,26 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
>                    }
>                  break;
>              case 5: case 6: /* 32 * 32 -> 32msb (SMMUL, SMMLA, SMMLS) */
> -                tmp64 = gen_muls_i64_i32(tmp, tmp2);
> +                tcg_gen_muls2_i32(tmp2, tmp, tmp, tmp2);
>                  if (rs != 15) {
> -                    tmp = load_reg(s, rs);
> +                    tmp3 = load_reg(s, rs);
>                      if (insn & (1 << 20)) {
> -                        tmp64 = gen_addq_msw(tmp64, tmp);
> +                        tcg_gen_add_i32(tmp, tmp, tmp3);
>                      } else {
> -                        tmp64 = gen_subq_msw(tmp64, tmp);
> +                        tcg_gen_sub_i32(tmp, tmp, tmp3);

Same here.

Also the way you do the computation means you don't propagate the
borrow from the lower 32-bit of the 64-bit product when doing the
subtraction.

Thanks,

Laurent

>                      }
> +                    tcg_temp_free_i32(tmp3);
>                  }
>                  if (insn & (1 << 4)) {
> -                    tcg_gen_addi_i64(tmp64, tmp64, 0x80000000u);
> +                    /*
> +                     * Adding 0x80000000 to the 64-bit quantity
> +                     * means that we have carry in to the high
> +                     * word when the low word has the high bit set.
> +                     */
> +                    tcg_gen_shri_i32(tmp2, tmp2, 31);
> +                    tcg_gen_add_i32(tmp, tmp, tmp2);
>                  }
> -                tcg_gen_shri_i64(tmp64, tmp64, 32);
> -                tmp = tcg_temp_new_i32();
> -                tcg_gen_extrl_i64_i32(tmp, tmp64);
> -                tcg_temp_free_i64(tmp64);
> +                tcg_temp_free_i32(tmp2);
>                  break;
>              case 7: /* Unsigned sum of absolute differences.  */
>                  gen_helper_usad8(tmp, tmp, tmp2);
> --
> 2.17.1
>
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-08-28  7:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 1/7] target/arm: Use tcg_gen_extract_i32 for shifter_out_im Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 2/7] target/arm: Use tcg_gen_deposit_i32 for PKHBT, PKHTB Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 3/7] target/arm: Remove redundant shift tests Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 4/7] target/arm: Use ror32 instead of open-coding the operation Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 5/7] target/arm: Use tcg_gen_rotri_i32 for gen_swap_half Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 6/7] target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR Richard Henderson
2019-08-28  7:22   ` [Qemu-devel] [Qemu-arm] " Laurent Desnogues
2019-08-08 20:26 ` [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word Richard Henderson
2019-08-15 10:16   ` Peter Maydell
2019-08-15 11:56     ` Richard Henderson
2019-08-15 12:02       ` Peter Maydell
2019-08-15 10:34 ` [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Peter Maydell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).