* [Qemu-devel] [PATCH 1/7] target/arm: Use tcg_gen_extract_i32 for shifter_out_im
2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 2/7] target/arm: Use tcg_gen_deposit_i32 for PKHBT, PKHTB Richard Henderson
` (6 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Extract is a compact combination of shift + and.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 846052acea..43e005d191 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -620,14 +620,7 @@ static void gen_sar(TCGv_i32 dest, TCGv_i32 t0, TCGv_i32 t1)
static void shifter_out_im(TCGv_i32 var, int shift)
{
- if (shift == 0) {
- tcg_gen_andi_i32(cpu_CF, var, 1);
- } else {
- tcg_gen_shri_i32(cpu_CF, var, shift);
- if (shift != 31) {
- tcg_gen_andi_i32(cpu_CF, cpu_CF, 1);
- }
- }
+ tcg_gen_extract_i32(cpu_CF, var, shift, 1);
}
/* Shift by immediate. Includes special handling for shift == 0. */
--
2.17.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [Qemu-devel] [PATCH 2/7] target/arm: Use tcg_gen_deposit_i32 for PKHBT, PKHTB
2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 1/7] target/arm: Use tcg_gen_extract_i32 for shifter_out_im Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 3/7] target/arm: Remove redundant shift tests Richard Henderson
` (5 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Use deposit as the composit operation to merge the
bits from the two inputs.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 26 ++++++++++----------------
1 file changed, 10 insertions(+), 16 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 43e005d191..94170af134 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8769,19 +8769,16 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
shift = (insn >> 7) & 0x1f;
if (insn & (1 << 6)) {
/* pkhtb */
- if (shift == 0)
+ if (shift == 0) {
shift = 31;
+ }
tcg_gen_sari_i32(tmp2, tmp2, shift);
- tcg_gen_andi_i32(tmp, tmp, 0xffff0000);
- tcg_gen_ext16u_i32(tmp2, tmp2);
+ tcg_gen_deposit_i32(tmp, tmp, tmp2, 0, 16);
} else {
/* pkhbt */
- if (shift)
- tcg_gen_shli_i32(tmp2, tmp2, shift);
- tcg_gen_ext16u_i32(tmp, tmp);
- tcg_gen_andi_i32(tmp2, tmp2, 0xffff0000);
+ tcg_gen_shli_i32(tmp2, tmp2, shift);
+ tcg_gen_deposit_i32(tmp, tmp2, tmp, 0, 16);
}
- tcg_gen_or_i32(tmp, tmp, tmp2);
tcg_temp_free_i32(tmp2);
store_reg(s, rd, tmp);
} else if ((insn & 0x00200020) == 0x00200000) {
@@ -9817,19 +9814,16 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
shift = ((insn >> 10) & 0x1c) | ((insn >> 6) & 0x3);
if (insn & (1 << 5)) {
/* pkhtb */
- if (shift == 0)
+ if (shift == 0) {
shift = 31;
+ }
tcg_gen_sari_i32(tmp2, tmp2, shift);
- tcg_gen_andi_i32(tmp, tmp, 0xffff0000);
- tcg_gen_ext16u_i32(tmp2, tmp2);
+ tcg_gen_deposit_i32(tmp, tmp, tmp2, 0, 16);
} else {
/* pkhbt */
- if (shift)
- tcg_gen_shli_i32(tmp2, tmp2, shift);
- tcg_gen_ext16u_i32(tmp, tmp);
- tcg_gen_andi_i32(tmp2, tmp2, 0xffff0000);
+ tcg_gen_shli_i32(tmp2, tmp2, shift);
+ tcg_gen_deposit_i32(tmp, tmp2, tmp, 0, 16);
}
- tcg_gen_or_i32(tmp, tmp, tmp2);
tcg_temp_free_i32(tmp2);
store_reg(s, rd, tmp);
} else {
--
2.17.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [Qemu-devel] [PATCH 3/7] target/arm: Remove redundant shift tests
2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 1/7] target/arm: Use tcg_gen_extract_i32 for shifter_out_im Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 2/7] target/arm: Use tcg_gen_deposit_i32 for PKHBT, PKHTB Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 4/7] target/arm: Use ror32 instead of open-coding the operation Richard Henderson
` (4 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
The immediate shift generator functions already test for,
and eliminate, the case of a shift by zero.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 94170af134..3ddc404b3b 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -8826,8 +8826,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
shift = (insn >> 10) & 3;
/* ??? In many cases it's not necessary to do a
rotate, a shift is sufficient. */
- if (shift != 0)
- tcg_gen_rotri_i32(tmp, tmp, shift * 8);
+ tcg_gen_rotri_i32(tmp, tmp, shift * 8);
op1 = (insn >> 20) & 7;
switch (op1) {
case 0: gen_sxtb16(tmp); break;
@@ -9904,8 +9903,7 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
shift = (insn >> 4) & 3;
/* ??? In many cases it's not necessary to do a
rotate, a shift is sufficient. */
- if (shift != 0)
- tcg_gen_rotri_i32(tmp, tmp, shift * 8);
+ tcg_gen_rotri_i32(tmp, tmp, shift * 8);
op = (insn >> 20) & 7;
switch (op) {
case 0: gen_sxth(tmp); break;
@@ -10632,11 +10630,10 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
case 7:
goto illegal_op;
default: /* Saturate. */
- if (shift) {
- if (op & 1)
- tcg_gen_sari_i32(tmp, tmp, shift);
- else
- tcg_gen_shli_i32(tmp, tmp, shift);
+ if (op & 1) {
+ tcg_gen_sari_i32(tmp, tmp, shift);
+ } else {
+ tcg_gen_shli_i32(tmp, tmp, shift);
}
tmp2 = tcg_const_i32(imm);
if (op & 4) {
@@ -10827,9 +10824,7 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
goto illegal_op;
}
tmp = load_reg(s, rm);
- if (shift) {
- tcg_gen_shli_i32(tmp, tmp, shift);
- }
+ tcg_gen_shli_i32(tmp, tmp, shift);
tcg_gen_add_i32(addr, addr, tmp);
tcg_temp_free_i32(tmp);
break;
--
2.17.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [Qemu-devel] [PATCH 4/7] target/arm: Use ror32 instead of open-coding the operation
2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
` (2 preceding siblings ...)
2019-08-08 20:26 ` [Qemu-devel] [PATCH 3/7] target/arm: Remove redundant shift tests Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 5/7] target/arm: Use tcg_gen_rotri_i32 for gen_swap_half Richard Henderson
` (3 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
The helper function is more documentary, and also already
handles the case of rotate by zero.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 3ddc404b3b..b40f163bab 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7979,8 +7979,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
/* CPSR = immediate */
val = insn & 0xff;
shift = ((insn >> 8) & 0xf) * 2;
- if (shift)
- val = (val >> shift) | (val << (32 - shift));
+ val = ror32(val, shift);
i = ((insn & (1 << 22)) != 0);
if (gen_set_psr_im(s, msr_mask(s, (insn >> 16) & 0xf, i),
i, val)) {
@@ -8243,9 +8242,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
/* immediate operand */
val = insn & 0xff;
shift = ((insn >> 8) & 0xf) * 2;
- if (shift) {
- val = (val >> shift) | (val << (32 - shift));
- }
+ val = ror32(val, shift);
tmp2 = tcg_temp_new_i32();
tcg_gen_movi_i32(tmp2, val);
if (logic_cc && shift) {
--
2.17.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [Qemu-devel] [PATCH 5/7] target/arm: Use tcg_gen_rotri_i32 for gen_swap_half
2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
` (3 preceding siblings ...)
2019-08-08 20:26 ` [Qemu-devel] [PATCH 4/7] target/arm: Use ror32 instead of open-coding the operation Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
2019-08-08 20:26 ` [Qemu-devel] [PATCH 6/7] target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR Richard Henderson
` (2 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Rotate is the more compact and obvious way to swap 16-bit
elements of a 32-bit word.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index b40f163bab..ddc54e77e4 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -459,11 +459,7 @@ static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
/* Swap low and high halfwords. */
static void gen_swap_half(TCGv_i32 var)
{
- TCGv_i32 tmp = tcg_temp_new_i32();
- tcg_gen_shri_i32(tmp, var, 16);
- tcg_gen_shli_i32(var, var, 16);
- tcg_gen_or_i32(var, var, tmp);
- tcg_temp_free_i32(tmp);
+ tcg_gen_rotri_i32(var, var, 16);
}
/* Dual 16-bit add. Result placed in t0 and t1 is marked as dead.
--
2.17.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [Qemu-devel] [PATCH 6/7] target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR
2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
` (4 preceding siblings ...)
2019-08-08 20:26 ` [Qemu-devel] [PATCH 5/7] target/arm: Use tcg_gen_rotri_i32 for gen_swap_half Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
2019-08-28 7:22 ` [Qemu-devel] [Qemu-arm] " Laurent Desnogues
2019-08-08 20:26 ` [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word Richard Henderson
2019-08-15 10:34 ` [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Peter Maydell
7 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
All of the inputs to these instructions are 32-bits. Rather than
extend each input to 64-bits and then extract the high 32-bits of
the output, use tcg_gen_muls2_i32 and other 32-bit generator functions.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 72 +++++++++++++++---------------------------
1 file changed, 26 insertions(+), 46 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index ddc54e77e4..77154be743 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -391,34 +391,6 @@ static void gen_revsh(TCGv_i32 var)
tcg_gen_ext16s_i32(var, var);
}
-/* Return (b << 32) + a. Mark inputs as dead */
-static TCGv_i64 gen_addq_msw(TCGv_i64 a, TCGv_i32 b)
-{
- TCGv_i64 tmp64 = tcg_temp_new_i64();
-
- tcg_gen_extu_i32_i64(tmp64, b);
- tcg_temp_free_i32(b);
- tcg_gen_shli_i64(tmp64, tmp64, 32);
- tcg_gen_add_i64(a, tmp64, a);
-
- tcg_temp_free_i64(tmp64);
- return a;
-}
-
-/* Return (b << 32) - a. Mark inputs as dead. */
-static TCGv_i64 gen_subq_msw(TCGv_i64 a, TCGv_i32 b)
-{
- TCGv_i64 tmp64 = tcg_temp_new_i64();
-
- tcg_gen_extu_i32_i64(tmp64, b);
- tcg_temp_free_i32(b);
- tcg_gen_shli_i64(tmp64, tmp64, 32);
- tcg_gen_sub_i64(a, tmp64, a);
-
- tcg_temp_free_i64(tmp64);
- return a;
-}
-
/* 32x32->64 multiply. Marks inputs as dead. */
static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
{
@@ -8872,23 +8844,27 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
(SMMUL, SMMLA, SMMLS) */
tmp = load_reg(s, rm);
tmp2 = load_reg(s, rs);
- tmp64 = gen_muls_i64_i32(tmp, tmp2);
+ tcg_gen_muls2_i32(tmp2, tmp, tmp, tmp2);
if (rd != 15) {
- tmp = load_reg(s, rd);
+ tmp3 = load_reg(s, rd);
if (insn & (1 << 6)) {
- tmp64 = gen_subq_msw(tmp64, tmp);
+ tcg_gen_sub_i32(tmp, tmp, tmp3);
} else {
- tmp64 = gen_addq_msw(tmp64, tmp);
+ tcg_gen_add_i32(tmp, tmp, tmp3);
}
+ tcg_temp_free_i32(tmp3);
}
if (insn & (1 << 5)) {
- tcg_gen_addi_i64(tmp64, tmp64, 0x80000000u);
+ /*
+ * Adding 0x80000000 to the 64-bit quantity
+ * means that we have carry in to the high
+ * word when the low word has the high bit set.
+ */
+ tcg_gen_shri_i32(tmp2, tmp2, 31);
+ tcg_gen_add_i32(tmp, tmp, tmp2);
}
- tcg_gen_shri_i64(tmp64, tmp64, 32);
- tmp = tcg_temp_new_i32();
- tcg_gen_extrl_i64_i32(tmp, tmp64);
- tcg_temp_free_i64(tmp64);
+ tcg_temp_free_i32(tmp2);
store_reg(s, rn, tmp);
break;
case 0:
@@ -10114,22 +10090,26 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
}
break;
case 5: case 6: /* 32 * 32 -> 32msb (SMMUL, SMMLA, SMMLS) */
- tmp64 = gen_muls_i64_i32(tmp, tmp2);
+ tcg_gen_muls2_i32(tmp2, tmp, tmp, tmp2);
if (rs != 15) {
- tmp = load_reg(s, rs);
+ tmp3 = load_reg(s, rs);
if (insn & (1 << 20)) {
- tmp64 = gen_addq_msw(tmp64, tmp);
+ tcg_gen_add_i32(tmp, tmp, tmp3);
} else {
- tmp64 = gen_subq_msw(tmp64, tmp);
+ tcg_gen_sub_i32(tmp, tmp, tmp3);
}
+ tcg_temp_free_i32(tmp3);
}
if (insn & (1 << 4)) {
- tcg_gen_addi_i64(tmp64, tmp64, 0x80000000u);
+ /*
+ * Adding 0x80000000 to the 64-bit quantity
+ * means that we have carry in to the high
+ * word when the low word has the high bit set.
+ */
+ tcg_gen_shri_i32(tmp2, tmp2, 31);
+ tcg_gen_add_i32(tmp, tmp, tmp2);
}
- tcg_gen_shri_i64(tmp64, tmp64, 32);
- tmp = tcg_temp_new_i32();
- tcg_gen_extrl_i64_i32(tmp, tmp64);
- tcg_temp_free_i64(tmp64);
+ tcg_temp_free_i32(tmp2);
break;
case 7: /* Unsigned sum of absolute differences. */
gen_helper_usad8(tmp, tmp, tmp2);
--
2.17.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [Qemu-arm] [PATCH 6/7] target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR
2019-08-08 20:26 ` [Qemu-devel] [PATCH 6/7] target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR Richard Henderson
@ 2019-08-28 7:22 ` Laurent Desnogues
0 siblings, 0 replies; 13+ messages in thread
From: Laurent Desnogues @ 2019-08-28 7:22 UTC (permalink / raw)
To: Richard Henderson; +Cc: Peter Maydell, qemu-arm, qemu-devel
Hi Richard,
On Thu, Aug 8, 2019 at 10:28 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> All of the inputs to these instructions are 32-bits. Rather than
> extend each input to 64-bits and then extract the high 32-bits of
> the output, use tcg_gen_muls2_i32 and other 32-bit generator functions.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate.c | 72 +++++++++++++++---------------------------
> 1 file changed, 26 insertions(+), 46 deletions(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index ddc54e77e4..77154be743 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -391,34 +391,6 @@ static void gen_revsh(TCGv_i32 var)
> tcg_gen_ext16s_i32(var, var);
> }
>
> -/* Return (b << 32) + a. Mark inputs as dead */
> -static TCGv_i64 gen_addq_msw(TCGv_i64 a, TCGv_i32 b)
> -{
> - TCGv_i64 tmp64 = tcg_temp_new_i64();
> -
> - tcg_gen_extu_i32_i64(tmp64, b);
> - tcg_temp_free_i32(b);
> - tcg_gen_shli_i64(tmp64, tmp64, 32);
> - tcg_gen_add_i64(a, tmp64, a);
> -
> - tcg_temp_free_i64(tmp64);
> - return a;
> -}
> -
> -/* Return (b << 32) - a. Mark inputs as dead. */
> -static TCGv_i64 gen_subq_msw(TCGv_i64 a, TCGv_i32 b)
> -{
> - TCGv_i64 tmp64 = tcg_temp_new_i64();
> -
> - tcg_gen_extu_i32_i64(tmp64, b);
> - tcg_temp_free_i32(b);
> - tcg_gen_shli_i64(tmp64, tmp64, 32);
> - tcg_gen_sub_i64(a, tmp64, a);
> -
> - tcg_temp_free_i64(tmp64);
> - return a;
> -}
> -
> /* 32x32->64 multiply. Marks inputs as dead. */
> static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
> {
> @@ -8872,23 +8844,27 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
> (SMMUL, SMMLA, SMMLS) */
> tmp = load_reg(s, rm);
> tmp2 = load_reg(s, rs);
> - tmp64 = gen_muls_i64_i32(tmp, tmp2);
> + tcg_gen_muls2_i32(tmp2, tmp, tmp, tmp2);
>
> if (rd != 15) {
> - tmp = load_reg(s, rd);
> + tmp3 = load_reg(s, rd);
> if (insn & (1 << 6)) {
> - tmp64 = gen_subq_msw(tmp64, tmp);
> + tcg_gen_sub_i32(tmp, tmp, tmp3);
Shouldn't you subtract tmp from tmp3?
> } else {
> - tmp64 = gen_addq_msw(tmp64, tmp);
> + tcg_gen_add_i32(tmp, tmp, tmp3);
> }
> + tcg_temp_free_i32(tmp3);
> }
> if (insn & (1 << 5)) {
> - tcg_gen_addi_i64(tmp64, tmp64, 0x80000000u);
> + /*
> + * Adding 0x80000000 to the 64-bit quantity
> + * means that we have carry in to the high
> + * word when the low word has the high bit set.
> + */
> + tcg_gen_shri_i32(tmp2, tmp2, 31);
> + tcg_gen_add_i32(tmp, tmp, tmp2);
> }
> - tcg_gen_shri_i64(tmp64, tmp64, 32);
> - tmp = tcg_temp_new_i32();
> - tcg_gen_extrl_i64_i32(tmp, tmp64);
> - tcg_temp_free_i64(tmp64);
> + tcg_temp_free_i32(tmp2);
> store_reg(s, rn, tmp);
> break;
> case 0:
> @@ -10114,22 +10090,26 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
> }
> break;
> case 5: case 6: /* 32 * 32 -> 32msb (SMMUL, SMMLA, SMMLS) */
> - tmp64 = gen_muls_i64_i32(tmp, tmp2);
> + tcg_gen_muls2_i32(tmp2, tmp, tmp, tmp2);
> if (rs != 15) {
> - tmp = load_reg(s, rs);
> + tmp3 = load_reg(s, rs);
> if (insn & (1 << 20)) {
> - tmp64 = gen_addq_msw(tmp64, tmp);
> + tcg_gen_add_i32(tmp, tmp, tmp3);
> } else {
> - tmp64 = gen_subq_msw(tmp64, tmp);
> + tcg_gen_sub_i32(tmp, tmp, tmp3);
Same here.
Also the way you do the computation means you don't propagate the
borrow from the lower 32-bit of the 64-bit product when doing the
subtraction.
Thanks,
Laurent
> }
> + tcg_temp_free_i32(tmp3);
> }
> if (insn & (1 << 4)) {
> - tcg_gen_addi_i64(tmp64, tmp64, 0x80000000u);
> + /*
> + * Adding 0x80000000 to the 64-bit quantity
> + * means that we have carry in to the high
> + * word when the low word has the high bit set.
> + */
> + tcg_gen_shri_i32(tmp2, tmp2, 31);
> + tcg_gen_add_i32(tmp, tmp, tmp2);
> }
> - tcg_gen_shri_i64(tmp64, tmp64, 32);
> - tmp = tcg_temp_new_i32();
> - tcg_gen_extrl_i64_i32(tmp, tmp64);
> - tcg_temp_free_i64(tmp64);
> + tcg_temp_free_i32(tmp2);
> break;
> case 7: /* Unsigned sum of absolute differences. */
> gen_helper_usad8(tmp, tmp, tmp2);
> --
> 2.17.1
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word
2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
` (5 preceding siblings ...)
2019-08-08 20:26 ` [Qemu-devel] [PATCH 6/7] target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR Richard Henderson
@ 2019-08-08 20:26 ` Richard Henderson
2019-08-15 10:16 ` Peter Maydell
2019-08-15 10:34 ` [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Peter Maydell
7 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2019-08-08 20:26 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Separate shift + extract low will result in one extra insn
for hosts like RISC-V, MIPS, and Sparc.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 77154be743..9e103e4fad 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1761,8 +1761,7 @@ static int disas_iwmmxt_insn(DisasContext *s, uint32_t insn)
if (insn & ARM_CP_RW_BIT) { /* TMRRC */
iwmmxt_load_reg(cpu_V0, wrd);
tcg_gen_extrl_i64_i32(cpu_R[rdlo], cpu_V0);
- tcg_gen_shri_i64(cpu_V0, cpu_V0, 32);
- tcg_gen_extrl_i64_i32(cpu_R[rdhi], cpu_V0);
+ tcg_gen_extrh_i64_i32(cpu_R[rdhi], cpu_V0);
} else { /* TMCRR */
tcg_gen_concat_i32_i64(cpu_V0, cpu_R[rdlo], cpu_R[rdhi]);
iwmmxt_store_reg(cpu_V0, wrd);
@@ -2807,8 +2806,7 @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
if (insn & ARM_CP_RW_BIT) { /* MRA */
iwmmxt_load_reg(cpu_V0, acc);
tcg_gen_extrl_i64_i32(cpu_R[rdlo], cpu_V0);
- tcg_gen_shri_i64(cpu_V0, cpu_V0, 32);
- tcg_gen_extrl_i64_i32(cpu_R[rdhi], cpu_V0);
+ tcg_gen_extrh_i64_i32(cpu_R[rdhi], cpu_V0);
tcg_gen_andi_i32(cpu_R[rdhi], cpu_R[rdhi], (1 << (40 - 32)) - 1);
} else { /* MAR */
tcg_gen_concat_i32_i64(cpu_V0, cpu_R[rdlo], cpu_R[rdhi]);
@@ -6005,8 +6003,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
break;
case 2:
- tcg_gen_shri_i64(cpu_V0, cpu_V0, 32);
- tcg_gen_extrl_i64_i32(tmp, cpu_V0);
+ tcg_gen_extrh_i64_i32(tmp, cpu_V0);
break;
default: abort();
}
@@ -6020,8 +6017,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
break;
case 2:
tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
- tcg_gen_shri_i64(cpu_V0, cpu_V0, 32);
- tcg_gen_extrl_i64_i32(tmp, cpu_V0);
+ tcg_gen_extrh_i64_i32(tmp, cpu_V0);
break;
default: abort();
}
@@ -7254,9 +7250,8 @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
tmp = tcg_temp_new_i32();
tcg_gen_extrl_i64_i32(tmp, tmp64);
store_reg(s, rt, tmp);
- tcg_gen_shri_i64(tmp64, tmp64, 32);
tmp = tcg_temp_new_i32();
- tcg_gen_extrl_i64_i32(tmp, tmp64);
+ tcg_gen_extrh_i64_i32(tmp, tmp64);
tcg_temp_free_i64(tmp64);
store_reg(s, rt2, tmp);
} else {
@@ -7365,8 +7360,7 @@ static void gen_storeq_reg(DisasContext *s, int rlow, int rhigh, TCGv_i64 val)
tcg_gen_extrl_i64_i32(tmp, val);
store_reg(s, rlow, tmp);
tmp = tcg_temp_new_i32();
- tcg_gen_shri_i64(val, val, 32);
- tcg_gen_extrl_i64_i32(tmp, val);
+ tcg_gen_extrh_i64_i32(tmp, val);
store_reg(s, rhigh, tmp);
}
--
2.17.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word
2019-08-08 20:26 ` [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word Richard Henderson
@ 2019-08-15 10:16 ` Peter Maydell
2019-08-15 11:56 ` Richard Henderson
0 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2019-08-15 10:16 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-arm, QEMU Developers
On Thu, 8 Aug 2019 at 21:26, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Separate shift + extract low will result in one extra insn
> for hosts like RISC-V, MIPS, and Sparc.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate.c | 18 ++++++------------
> 1 file changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 77154be743..9e103e4fad 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -1761,8 +1761,7 @@ static int disas_iwmmxt_insn(DisasContext *s, uint32_t insn)
> if (insn & ARM_CP_RW_BIT) { /* TMRRC */
> iwmmxt_load_reg(cpu_V0, wrd);
> tcg_gen_extrl_i64_i32(cpu_R[rdlo], cpu_V0);
> - tcg_gen_shri_i64(cpu_V0, cpu_V0, 32);
> - tcg_gen_extrl_i64_i32(cpu_R[rdhi], cpu_V0);
> + tcg_gen_extrh_i64_i32(cpu_R[rdhi], cpu_V0);
> } else { /* TMCRR */
> tcg_gen_concat_i32_i64(cpu_V0, cpu_R[rdlo], cpu_R[rdhi]);
> iwmmxt_store_reg(cpu_V0, wrd);
This patch is fine, but I noticed while reviewing it that tcg/README
labels both the extrl_i64_i32 and extrh_i64_i32 operations as
"for 64-bit hosts only". Presumably that's a documentation error,
since we're not guarding the existing uses of the extrl_i64_i32
here with any kind of ifdeffery to restrict them to 64-bit hosts ?
thanks
-- PMM
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word
2019-08-15 10:16 ` Peter Maydell
@ 2019-08-15 11:56 ` Richard Henderson
2019-08-15 12:02 ` Peter Maydell
0 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2019-08-15 11:56 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-arm, QEMU Developers
>
>This patch is fine, but I noticed while reviewing it that tcg/README
>labels both the extrl_i64_i32 and extrh_i64_i32 operations as
>"for 64-bit hosts only". Presumably that's a documentation error,
>since we're not guarding the existing uses of the extrl_i64_i32
>here with any kind of ifdeffery to restrict them to 64-bit hosts ?
>
A documentation unclarity in that the opcodes are for 64-bit hosts. The tcg_gen_* functions are always available, and expand to INDEX_op_mov_i32 on 32-bit hosts.
r~
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word
2019-08-15 11:56 ` Richard Henderson
@ 2019-08-15 12:02 ` Peter Maydell
0 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2019-08-15 12:02 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-arm, QEMU Developers
On Thu, 15 Aug 2019 at 12:56, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
>
>
>
> >
> >This patch is fine, but I noticed while reviewing it that tcg/README
> >labels both the extrl_i64_i32 and extrh_i64_i32 operations as
> >"for 64-bit hosts only". Presumably that's a documentation error,
> >since we're not guarding the existing uses of the extrl_i64_i32
> >here with any kind of ifdeffery to restrict them to 64-bit hosts ?
> >
>
>
> A documentation unclarity in that the opcodes are for 64-bit hosts. The tcg_gen_* functions are always available, and expand to INDEX_op_mov_i32 on 32-bit hosts.
Oh, I see. We should probably split that document out properly
into a primary "what you need to know to generate TCG code as
a target" (which is the main audience) and "what you need to
implement for a TCG backend (which I guess is relevant to fewer
people).
thanks
-- PMM
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups
2019-08-08 20:26 [Qemu-devel] [PATCH 0/7] target/arm: Misc cleanups Richard Henderson
` (6 preceding siblings ...)
2019-08-08 20:26 ` [Qemu-devel] [PATCH 7/7] target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word Richard Henderson
@ 2019-08-15 10:34 ` Peter Maydell
7 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2019-08-15 10:34 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-arm, QEMU Developers
On Thu, 8 Aug 2019 at 21:26, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Some of these were cleanups that I was making simultaneous
> with the decodetree split. Let's do those beforehand to
> make the split easier to read.
>
> Some of these are new, noticed while I was in the area.
>
>
> r~
>
>
> Richard Henderson (7):
> target/arm: Use tcg_gen_extract_i32 for shifter_out_im
> target/arm: Use tcg_gen_deposit_i32 for PKHBT, PKHTB
> target/arm: Remove redundant shift tests
> target/arm: Use ror32 instead of open-coding the operation
> target/arm: Use tcg_gen_rotri_i32 for gen_swap_half
> target/arm: Simplify SMMLA, SMMLAR, SMMLS, SMMLSR
> target/arm: Use tcg_gen_extrh_i64_i32 to extract the high word
Applied to target-arm.next, thanks. (I had a comment on patch 6
but it was about the tcg docs, not the patch itself.)
-- PMM
^ permalink raw reply [flat|nested] 13+ messages in thread