All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/28] tcg: bswap improvements
@ 2021-06-14  8:37 Richard Henderson
  2021-06-14  8:37 ` [PATCH 01/28] tcg: Add flags argument to bswap opcodes Richard Henderson
                   ` (28 more replies)
  0 siblings, 29 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

This has been on my to-do list for several years, and I've
finally spent a rainy weekend doing something about it.

The current tcg bswap opcode is fairly strict: for swaps smaller
than the TCGv size, it requires zero-extended input and provides
zero-extended output.

This has meant that various tcg/ backends have their own handling
of bswap when it comes to memory, to minimize overhead for stores
(which do not care about zero-extended output) or for signed loads
(which would rather not sign-extend after zero-extending).

Solve this by adding some operation flags to the tcg bswap opcode:

  TCG_BSWAP_IZ  -- Input is Zero extended
  TCG_BSWAP_OZ  -- Output is Zero extended
  TCG_BSWAP_OS  -- Output is Sign extended

For instance, bswap before store would not set any of these flags,
allowing unextended input and producing unextended output.

The patch set can be broken into sections:

  * patches 1 - 16 implement the functionality in the backend,
    but do not provide the interface to use it,

  * patch 17 enables the interface,

  * patches 18 - 25 use the new interface in the front ends

  * patches 26 - 28 remove some tcg backend complexity,
    leaving the bswap handling to the middle-end.


r~


Richard Henderson (28):
  tcg: Add flags argument to bswap opcodes
  tcg/i386: Support bswap flags
  tcg/aarch64: Support bswap flags
  tcg/arm: Support bswap flags
  tcg/ppc: Split out tcg_out_ext{8,16,32}s
  tcg/ppc: Split out tcg_out_sari{32,64}
  tcg/ppc: Split out tcg_out_bswap16
  tcg/ppc: Split out tcg_out_bswap32
  tcg/ppc: Split out tcg_out_bswap64
  tcg/ppc: Support bswap flags
  tcg/ppc: Use power10 byte-reverse instructions
  tcg/s390: Support bswap flags
  tcg/mips: Support bswap flags in tcg_out_bswap16
  tcg/mips: Support bswap flags in tcg_out_bswap32
  tcg/tci: Support bswap flags
  tcg: Handle new bswap flags during optimize
  tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64
  tcg: Make use of bswap flags in tcg_gen_qemu_ld_*
  tcg: Make use of bswap flags in tcg_gen_qemu_st_*
  target/arm: Improve REV32
  target/arm: Improve vector REV
  target/arm: Improve REVSH
  target/i386: Improve bswap translation
  target/sh4: Improve swap.b translation
  target/mips: Fix gen_mxu_s32ldd_s32lddr
  tcg/arm: Unset TCG_TARGET_HAS_MEMORY_BSWAP
  tcg/aarch64: Unset TCG_TARGET_HAS_MEMORY_BSWAP
  tcg/riscv: Remove MO_BSWAP handling

 include/tcg/tcg-op.h            |   8 +-
 include/tcg/tcg-opc.h           |  10 +-
 include/tcg/tcg.h               |  12 ++
 tcg/aarch64/tcg-target.h        |   2 +-
 tcg/arm/tcg-target.h            |   2 +-
 target/arm/translate-a64.c      |  21 +--
 target/arm/translate.c          |   4 +-
 target/i386/tcg/translate.c     |  14 +-
 target/mips/tcg/mxu_translate.c |   6 +-
 target/s390x/translate.c        |   4 +-
 target/sh4/translate.c          |   3 +-
 tcg/optimize.c                  |  56 ++++++-
 tcg/tcg-op.c                    | 143 +++++++++++------
 tcg/tci.c                       |   3 +-
 tcg/aarch64/tcg-target.c.inc    |  99 +++++-------
 tcg/arm/tcg-target.c.inc        | 272 ++++++++++++--------------------
 tcg/i386/tcg-target.c.inc       |  20 ++-
 tcg/mips/tcg-target.c.inc       |  99 ++++++------
 tcg/ppc/tcg-target.c.inc        | 199 ++++++++++++++---------
 tcg/riscv/tcg-target.c.inc      |  64 ++++----
 tcg/s390/tcg-target.c.inc       |  34 +++-
 tcg/tci/tcg-target.c.inc        |  23 ++-
 tcg/README                      |  18 ++-
 23 files changed, 607 insertions(+), 509 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH 01/28] tcg: Add flags argument to bswap opcodes
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-14  9:19   ` Philippe Mathieu-Daudé
                     ` (2 more replies)
  2021-06-14  8:37 ` [PATCH 02/28] tcg/i386: Support bswap flags Richard Henderson
                   ` (27 subsequent siblings)
  28 siblings, 3 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

This will eventually simplify front-end usage, and will allow
backends to unset TCG_TARGET_HAS_MEMORY_BSWAP without loss of
optimization.

The argument is added during expansion, not currently exposed
to the front end translators.  Non-zero values are not yet
supported by any backends.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-opc.h | 10 +++++-----
 include/tcg/tcg.h     | 12 ++++++++++++
 tcg/tcg-op.c          | 13 ++++++++-----
 tcg/README            | 18 ++++++++++--------
 4 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index bbb0884af8..fddcc42cbd 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -96,8 +96,8 @@ DEF(ext8s_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ext8s_i32))
 DEF(ext16s_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ext16s_i32))
 DEF(ext8u_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ext8u_i32))
 DEF(ext16u_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ext16u_i32))
-DEF(bswap16_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_bswap16_i32))
-DEF(bswap32_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_bswap32_i32))
+DEF(bswap16_i32, 1, 1, 1, IMPL(TCG_TARGET_HAS_bswap16_i32))
+DEF(bswap32_i32, 1, 1, 1, IMPL(TCG_TARGET_HAS_bswap32_i32))
 DEF(not_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_not_i32))
 DEF(neg_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_i32))
 DEF(andc_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_andc_i32))
@@ -165,9 +165,9 @@ DEF(ext32s_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext32s_i64))
 DEF(ext8u_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext8u_i64))
 DEF(ext16u_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext16u_i64))
 DEF(ext32u_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext32u_i64))
-DEF(bswap16_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_bswap16_i64))
-DEF(bswap32_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_bswap32_i64))
-DEF(bswap64_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_bswap64_i64))
+DEF(bswap16_i64, 1, 1, 1, IMPL64 | IMPL(TCG_TARGET_HAS_bswap16_i64))
+DEF(bswap32_i64, 1, 1, 1, IMPL64 | IMPL(TCG_TARGET_HAS_bswap32_i64))
+DEF(bswap64_i64, 1, 1, 1, IMPL64 | IMPL(TCG_TARGET_HAS_bswap64_i64))
 DEF(not_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_not_i64))
 DEF(neg_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_neg_i64))
 DEF(andc_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_andc_i64))
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 064dab383b..7a060e532d 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -430,6 +430,18 @@ typedef enum {
     TCG_COND_GTU    = 8 | 4 | 0 | 1,
 } TCGCond;
 
+/*
+ * Flags for the bswap opcodes.
+ * If IZ, the input is zero-extended, otherwise unknown.
+ * If OZ or OS, the output is zero- or sign-extended respectively,
+ * otherwise the high bits are undefined.
+ */
+enum {
+    TCG_BSWAP_IZ = 1,
+    TCG_BSWAP_OZ = 2,
+    TCG_BSWAP_OS = 4,
+};
+
 /* Invert the sense of the comparison.  */
 static inline TCGCond tcg_invert_cond(TCGCond c)
 {
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index dcc2ed0bbc..dc65577e2f 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1005,7 +1005,8 @@ void tcg_gen_ext16u_i32(TCGv_i32 ret, TCGv_i32 arg)
 void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg)
 {
     if (TCG_TARGET_HAS_bswap16_i32) {
-        tcg_gen_op2_i32(INDEX_op_bswap16_i32, ret, arg);
+        tcg_gen_op3i_i32(INDEX_op_bswap16_i32, ret, arg,
+                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
     } else {
         TCGv_i32 t0 = tcg_temp_new_i32();
 
@@ -1020,7 +1021,7 @@ void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg)
 void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
 {
     if (TCG_TARGET_HAS_bswap32_i32) {
-        tcg_gen_op2_i32(INDEX_op_bswap32_i32, ret, arg);
+        tcg_gen_op3i_i32(INDEX_op_bswap32_i32, ret, arg, 0);
     } else {
         TCGv_i32 t0 = tcg_temp_new_i32();
         TCGv_i32 t1 = tcg_temp_new_i32();
@@ -1661,7 +1662,8 @@ void tcg_gen_bswap16_i64(TCGv_i64 ret, TCGv_i64 arg)
         tcg_gen_bswap16_i32(TCGV_LOW(ret), TCGV_LOW(arg));
         tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
     } else if (TCG_TARGET_HAS_bswap16_i64) {
-        tcg_gen_op2_i64(INDEX_op_bswap16_i64, ret, arg);
+        tcg_gen_op3i_i64(INDEX_op_bswap16_i64, ret, arg,
+                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
     } else {
         TCGv_i64 t0 = tcg_temp_new_i64();
 
@@ -1680,7 +1682,8 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
         tcg_gen_bswap32_i32(TCGV_LOW(ret), TCGV_LOW(arg));
         tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
     } else if (TCG_TARGET_HAS_bswap32_i64) {
-        tcg_gen_op2_i64(INDEX_op_bswap32_i64, ret, arg);
+        tcg_gen_op3i_i64(INDEX_op_bswap32_i64, ret, arg,
+                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
     } else {
         TCGv_i64 t0 = tcg_temp_new_i64();
         TCGv_i64 t1 = tcg_temp_new_i64();
@@ -1717,7 +1720,7 @@ void tcg_gen_bswap64_i64(TCGv_i64 ret, TCGv_i64 arg)
         tcg_temp_free_i32(t0);
         tcg_temp_free_i32(t1);
     } else if (TCG_TARGET_HAS_bswap64_i64) {
-        tcg_gen_op2_i64(INDEX_op_bswap64_i64, ret, arg);
+        tcg_gen_op3i_i64(INDEX_op_bswap64_i64, ret, arg, 0);
     } else {
         TCGv_i64 t0 = tcg_temp_new_i64();
         TCGv_i64 t1 = tcg_temp_new_i64();
diff --git a/tcg/README b/tcg/README
index 8510d823e3..19fbf6ca52 100644
--- a/tcg/README
+++ b/tcg/README
@@ -295,19 +295,21 @@ ext32u_i64 t0, t1
 
 8, 16 or 32 bit sign/zero extension (both operands must have the same type)
 
-* bswap16_i32/i64 t0, t1
+* bswap16_i32/i64 t0, t1, flags
 
-16 bit byte swap on a 32/64 bit value. It assumes that the two/six high order
-bytes are set to zero.
+16 bit byte swap on a 32/64 bit value.  The flags values control how
+the input and output sign- or zero-extension is treated.
 
-* bswap32_i32/i64 t0, t1
+* bswap32_i32/i64 t0, t1, flags
 
-32 bit byte swap on a 32/64 bit value. With a 64 bit value, it assumes that
-the four high order bytes are set to zero.
+32 bit byte swap on a 32/64 bit value.  For 32-bit value, the flags
+are ignored; for a 64-bit value the flags values control how the
+input and output sign- or zero-extension is treated.
 
-* bswap64_i64 t0, t1
+* bswap64_i64 t0, t1, flags
 
-64 bit byte swap
+64 bit byte swap.  The flags are ignored -- the argument is present
+for consistency with the smaller bswaps.
 
 * discard_i32/i64 t0
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 02/28] tcg/i386: Support bswap flags
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
  2021-06-14  8:37 ` [PATCH 01/28] tcg: Add flags argument to bswap opcodes Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 13:53   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 03/28] tcg/aarch64: " Richard Henderson
                   ` (26 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

Retain the current rorw bswap16 expansion for the zero-in/zero-out case.
Otherwise, perform a wider bswap plus a right-shift or extend.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.c.inc | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 34113388ef..98d924b91a 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -2421,10 +2421,28 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     OP_32_64(bswap16):
-        tcg_out_rolw_8(s, a0);
+        if (a2 & TCG_BSWAP_OS) {
+            /* Output must be sign-extended. */
+            if (rexw) {
+                tcg_out_bswap64(s, a0);
+                tcg_out_shifti(s, SHIFT_SAR + rexw, a0, 48);
+            } else {
+                tcg_out_bswap32(s, a0);
+                tcg_out_shifti(s, SHIFT_SAR, a0, 16);
+            }
+        } else if ((a2 & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ) {
+            /* Output must be zero-extended, but input isn't. */
+            tcg_out_bswap32(s, a0);
+            tcg_out_shifti(s, SHIFT_SHR, a0, 16);
+        } else {
+            tcg_out_rolw_8(s, a0);
+        }
         break;
     OP_32_64(bswap32):
         tcg_out_bswap32(s, a0);
+        if (rexw && (a2 & TCG_BSWAP_OS)) {
+            tcg_out_ext32s(s, a0, a0);
+        }
         break;
 
     OP_32_64(neg):
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 03/28] tcg/aarch64: Support bswap flags
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
  2021-06-14  8:37 ` [PATCH 01/28] tcg: Add flags argument to bswap opcodes Richard Henderson
  2021-06-14  8:37 ` [PATCH 02/28] tcg/i386: Support bswap flags Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 14:01   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 04/28] tcg/arm: " Richard Henderson
                   ` (25 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.c.inc | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 27cde314a9..f72218b036 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -2187,12 +2187,24 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_rev64(s, a0, a1);
         break;
     case INDEX_op_bswap32_i64:
+        tcg_out_rev32(s, a0, a1);
+        if (a2 & TCG_BSWAP_OS) {
+            tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a0);
+        }
+        break;
     case INDEX_op_bswap32_i32:
         tcg_out_rev32(s, a0, a1);
         break;
     case INDEX_op_bswap16_i64:
     case INDEX_op_bswap16_i32:
         tcg_out_rev16(s, a0, a1);
+        if (a2 & TCG_BSWAP_OS) {
+            /* Output must be sign-extended. */
+            tcg_out_sxt(s, ext, MO_16, a0, a0);
+        } else if ((a2 & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ) {
+            /* Output must be zero-extended, but input isn't. */
+            tcg_out_uxt(s, MO_16, a0, a0);
+        }
         break;
 
     case INDEX_op_ext8s_i64:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 04/28] tcg/arm: Support bswap flags
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 03/28] tcg/aarch64: " Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 14:09   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 05/28] tcg/ppc: Split out tcg_out_ext{8,16,32}s Richard Henderson
                   ` (24 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

Combine the three bswap16 routines, and differentiate via the flags.
Use the correct flags combination from the load/store routines, and
pass along the constant parameter from tcg_out_op.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.c.inc | 78 ++++++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 40 deletions(-)

diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 5157143246..9824e215be 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1013,50 +1013,44 @@ static inline void tcg_out_ext16u(TCGContext *s, int cond,
     }
 }
 
-static inline void tcg_out_bswap16s(TCGContext *s, int cond, int rd, int rn)
+static void tcg_out_bswap16(TCGContext *s, int cond, int rd, int rn, int flags)
 {
     if (use_armv6_instructions) {
-        /* revsh */
-        tcg_out32(s, 0x06ff0fb0 | (cond << 28) | (rd << 12) | rn);
-    } else {
-        tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_TMP, 0, rn, SHIFT_IMM_LSL(24));
-        tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_TMP, 0, TCG_REG_TMP, SHIFT_IMM_ASR(16));
-        tcg_out_dat_reg(s, cond, ARITH_ORR,
-                        rd, TCG_REG_TMP, rn, SHIFT_IMM_LSR(8));
-    }
-}
+        if (flags & TCG_BSWAP_OS) {
+            /* revsh */
+            tcg_out32(s, 0x06ff0fb0 | (cond << 28) | (rd << 12) | rn);
+            return;
+        }
 
-static inline void tcg_out_bswap16(TCGContext *s, int cond, int rd, int rn)
-{
-    if (use_armv6_instructions) {
         /* rev16 */
         tcg_out32(s, 0x06bf0fb0 | (cond << 28) | (rd << 12) | rn);
-    } else {
-        tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_TMP, 0, rn, SHIFT_IMM_LSL(24));
-        tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_TMP, 0, TCG_REG_TMP, SHIFT_IMM_LSR(16));
-        tcg_out_dat_reg(s, cond, ARITH_ORR,
-                        rd, TCG_REG_TMP, rn, SHIFT_IMM_LSR(8));
+        if ((flags & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ) {
+            /* uxth */
+            tcg_out32(s, 0x06ff0070 | (cond << 28) | (rd << 12) | rd);
+        }
+        return;
     }
-}
 
-/* swap the two low bytes assuming that the two high input bytes and the
-   two high output bit can hold any value. */
-static inline void tcg_out_bswap16st(TCGContext *s, int cond, int rd, int rn)
-{
-    if (use_armv6_instructions) {
-        /* rev16 */
-        tcg_out32(s, 0x06bf0fb0 | (cond << 28) | (rd << 12) | rn);
-    } else {
-        tcg_out_dat_reg(s, cond, ARITH_MOV,
-                        TCG_REG_TMP, 0, rn, SHIFT_IMM_LSR(8));
+    /* Move the high byte down and isolate it. */
+    /* rn=xxAB -> tmp=0xxA -> tmp=000A */
+    tcg_out_dat_reg(s, cond, ARITH_MOV, TCG_REG_TMP, 0, rn, SHIFT_IMM_LSR(8));
+    if (!(flags & TCG_BSWAP_IZ)) {
         tcg_out_dat_imm(s, cond, ARITH_AND, TCG_REG_TMP, TCG_REG_TMP, 0xff);
+    }
+
+    /* Move the low byte up and extend it. */
+    if (!(flags & (TCG_BSWAP_OS | TCG_BSWAP_OZ))) {
+        /* No output extension: rd=xABA */
         tcg_out_dat_reg(s, cond, ARITH_ORR,
                         rd, TCG_REG_TMP, rn, SHIFT_IMM_LSL(8));
+        return;
     }
+
+    /* rn=xxAB -> rd=B000 -> rd=ssBA */
+    tcg_out_dat_reg(s, cond, ARITH_MOV, rd, 0, rn, SHIFT_IMM_LSL(24));
+    tcg_out_dat_reg(s, cond, ARITH_ORR, rd, TCG_REG_TMP, rd,
+                    flags & TCG_BSWAP_OS
+                    ? SHIFT_IMM_ASR(16) : SHIFT_IMM_LSR(16));
 }
 
 static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
@@ -1705,13 +1699,15 @@ static inline void tcg_out_qemu_ld_index(TCGContext *s, MemOp opc,
     case MO_UW:
         tcg_out_ld16u_r(s, COND_AL, datalo, addrlo, addend);
         if (bswap) {
-            tcg_out_bswap16(s, COND_AL, datalo, datalo);
+            tcg_out_bswap16(s, COND_AL, datalo, datalo,
+                            TCG_BSWAP_IZ | TCG_BSWAP_OZ);
         }
         break;
     case MO_SW:
         if (bswap) {
             tcg_out_ld16u_r(s, COND_AL, datalo, addrlo, addend);
-            tcg_out_bswap16s(s, COND_AL, datalo, datalo);
+            tcg_out_bswap16(s, COND_AL, datalo, datalo,
+                            TCG_BSWAP_IZ | TCG_BSWAP_OS);
         } else {
             tcg_out_ld16s_r(s, COND_AL, datalo, addrlo, addend);
         }
@@ -1766,13 +1762,15 @@ static inline void tcg_out_qemu_ld_direct(TCGContext *s, MemOp opc,
     case MO_UW:
         tcg_out_ld16u_8(s, COND_AL, datalo, addrlo, 0);
         if (bswap) {
-            tcg_out_bswap16(s, COND_AL, datalo, datalo);
+            tcg_out_bswap16(s, COND_AL, datalo, datalo,
+                            TCG_BSWAP_IZ | TCG_BSWAP_OZ);
         }
         break;
     case MO_SW:
         if (bswap) {
             tcg_out_ld16u_8(s, COND_AL, datalo, addrlo, 0);
-            tcg_out_bswap16s(s, COND_AL, datalo, datalo);
+            tcg_out_bswap16(s, COND_AL, datalo, datalo,
+                            TCG_BSWAP_IZ | TCG_BSWAP_OS);
         } else {
             tcg_out_ld16s_8(s, COND_AL, datalo, addrlo, 0);
         }
@@ -1862,7 +1860,7 @@ static inline void tcg_out_qemu_st_index(TCGContext *s, int cond, MemOp opc,
         break;
     case MO_16:
         if (bswap) {
-            tcg_out_bswap16st(s, cond, TCG_REG_R0, datalo);
+            tcg_out_bswap16(s, cond, TCG_REG_R0, datalo, 0);
             tcg_out_st16_r(s, cond, TCG_REG_R0, addrlo, addend);
         } else {
             tcg_out_st16_r(s, cond, datalo, addrlo, addend);
@@ -1907,7 +1905,7 @@ static inline void tcg_out_qemu_st_direct(TCGContext *s, MemOp opc,
         break;
     case MO_16:
         if (bswap) {
-            tcg_out_bswap16st(s, COND_AL, TCG_REG_R0, datalo);
+            tcg_out_bswap16(s, COND_AL, TCG_REG_R0, datalo, 0);
             tcg_out_st16_8(s, COND_AL, TCG_REG_R0, addrlo, 0);
         } else {
             tcg_out_st16_8(s, COND_AL, datalo, addrlo, 0);
@@ -2245,7 +2243,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_bswap16_i32:
-        tcg_out_bswap16(s, COND_AL, args[0], args[1]);
+        tcg_out_bswap16(s, COND_AL, args[0], args[1], args[2]);
         break;
     case INDEX_op_bswap32_i32:
         tcg_out_bswap32(s, COND_AL, args[0], args[1]);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 05/28] tcg/ppc: Split out tcg_out_ext{8,16,32}s
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (3 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 04/28] tcg/arm: " Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 14:18   ` Peter Maydell
  2021-06-21 20:44   ` Philippe Mathieu-Daudé
  2021-06-14  8:37 ` [PATCH 06/28] tcg/ppc: Split out tcg_out_sari{32,64} Richard Henderson
                   ` (23 subsequent siblings)
  28 siblings, 2 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

We will shortly require these in other context;
make the expansion as clear as possible.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.c.inc | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 795701442b..aa35ff8250 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -738,6 +738,21 @@ static inline void tcg_out_rlw(TCGContext *s, int op, TCGReg ra, TCGReg rs,
     tcg_out32(s, op | RA(ra) | RS(rs) | SH(sh) | MB(mb) | ME(me));
 }
 
+static inline void tcg_out_ext8s(TCGContext *s, TCGReg dst, TCGReg src)
+{
+    tcg_out32(s, EXTSB | RA(dst) | RS(src));
+}
+
+static inline void tcg_out_ext16s(TCGContext *s, TCGReg dst, TCGReg src)
+{
+    tcg_out32(s, EXTSH | RA(dst) | RS(src));
+}
+
+static inline void tcg_out_ext32s(TCGContext *s, TCGReg dst, TCGReg src)
+{
+    tcg_out32(s, EXTSW | RA(dst) | RS(src));
+}
+
 static inline void tcg_out_ext32u(TCGContext *s, TCGReg dst, TCGReg src)
 {
     tcg_out_rld(s, RLDICL, dst, src, 0, 32);
@@ -2322,7 +2337,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                        const int const_args[TCG_MAX_OP_ARGS])
 {
     TCGArg a0, a1, a2;
-    int c;
 
     switch (opc) {
     case INDEX_op_exit_tb:
@@ -2390,7 +2404,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ld8s_i32:
     case INDEX_op_ld8s_i64:
         tcg_out_mem_long(s, LBZ, LBZX, args[0], args[1], args[2]);
-        tcg_out32(s, EXTSB | RS(args[0]) | RA(args[0]));
+        tcg_out_ext8s(s, args[0], args[0]);
         break;
     case INDEX_op_ld16u_i32:
     case INDEX_op_ld16u_i64:
@@ -2728,18 +2742,15 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_ext8s_i32:
     case INDEX_op_ext8s_i64:
-        c = EXTSB;
-        goto gen_ext;
+        tcg_out_ext8s(s, args[0], args[1]);
+        break;
     case INDEX_op_ext16s_i32:
     case INDEX_op_ext16s_i64:
-        c = EXTSH;
-        goto gen_ext;
+        tcg_out_ext16s(s, args[0], args[1]);
+        break;
     case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
-        c = EXTSW;
-        goto gen_ext;
-    gen_ext:
-        tcg_out32(s, c | RS(args[1]) | RA(args[0]));
+        tcg_out_ext32s(s, args[0], args[1]);
         break;
     case INDEX_op_extu_i32_i64:
         tcg_out_ext32u(s, args[0], args[1]);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 06/28] tcg/ppc: Split out tcg_out_sari{32,64}
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (4 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 05/28] tcg/ppc: Split out tcg_out_ext{8,16,32}s Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 14:22   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 07/28] tcg/ppc: Split out tcg_out_bswap16 Richard Henderson
                   ` (22 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

We will shortly require sari in other context;
split out both for cleanliness sake.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.c.inc | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index aa35ff8250..b49204f707 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -768,6 +768,11 @@ static inline void tcg_out_shli64(TCGContext *s, TCGReg dst, TCGReg src, int c)
     tcg_out_rld(s, RLDICR, dst, src, c, 63 - c);
 }
 
+static inline void tcg_out_sari32(TCGContext *s, TCGReg dst, TCGReg src, int c)
+{
+    tcg_out32(s, SRAWI | RA(dst) | RS(src) | SH(c));
+}
+
 static inline void tcg_out_shri32(TCGContext *s, TCGReg dst, TCGReg src, int c)
 {
     tcg_out_rlw(s, RLWINM, dst, src, 32 - c, c, 31);
@@ -778,6 +783,11 @@ static inline void tcg_out_shri64(TCGContext *s, TCGReg dst, TCGReg src, int c)
     tcg_out_rld(s, RLDICL, dst, src, 64 - c, c);
 }
 
+static inline void tcg_out_sari64(TCGContext *s, TCGReg dst, TCGReg src, int c)
+{
+    tcg_out32(s, SRADI | RA(dst) | RS(src) | SH(c & 0x1f) | ((c >> 4) & 2));
+}
+
 /* Emit a move into ret of arg, if it can be done in one insn.  */
 static bool tcg_out_movi_one(TCGContext *s, TCGReg ret, tcg_target_long arg)
 {
@@ -2602,7 +2612,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_sar_i32:
         if (const_args[2]) {
             /* Limit immediate shift count lest we create an illegal insn.  */
-            tcg_out32(s, SRAWI | RS(args[1]) | RA(args[0]) | SH(args[2] & 31));
+            tcg_out_sari32(s, args[0], args[1], args[2] & 31);
         } else {
             tcg_out32(s, SRAW | SAB(args[1], args[0], args[2]));
         }
@@ -2690,8 +2700,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_sar_i64:
         if (const_args[2]) {
-            int sh = SH(args[2] & 0x1f) | (((args[2] >> 5) & 1) << 1);
-            tcg_out32(s, SRADI | RA(args[0]) | RS(args[1]) | sh);
+            tcg_out_sari64(s, args[0], args[1], args[2]);
         } else {
             tcg_out32(s, SRAD | SAB(args[1], args[0], args[2]));
         }
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 07/28] tcg/ppc: Split out tcg_out_bswap16
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (5 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 06/28] tcg/ppc: Split out tcg_out_sari{32,64} Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 14:29   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 08/28] tcg/ppc: Split out tcg_out_bswap32 Richard Henderson
                   ` (21 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

With the use of a suitable temporary, we can use the same
algorithm when src overlaps dst.  The result is the same
number of instructions either way.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.c.inc | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index b49204f707..64c24382a8 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -788,6 +788,16 @@ static inline void tcg_out_sari64(TCGContext *s, TCGReg dst, TCGReg src, int c)
     tcg_out32(s, SRADI | RA(dst) | RS(src) | SH(c & 0x1f) | ((c >> 4) & 2));
 }
 
+static void tcg_out_bswap16(TCGContext *s, TCGReg dst, TCGReg src)
+{
+    TCGReg tmp = dst == src ? TCG_REG_R0 : dst;
+
+                                                   /* src = abcd */
+    tcg_out_rlw(s, RLWINM, tmp, src, 24, 24, 31);  /* tmp = 000c */
+    tcg_out_rlw(s, RLWIMI, tmp, src, 8, 16, 23);   /* tmp = 00dc */
+    tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
+}
+
 /* Emit a move into ret of arg, if it can be done in one insn.  */
 static bool tcg_out_movi_one(TCGContext *s, TCGReg ret, tcg_target_long arg)
 {
@@ -2779,21 +2789,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_bswap16_i32:
     case INDEX_op_bswap16_i64:
-        a0 = args[0], a1 = args[1];
-        /* a1 = abcd */
-        if (a0 != a1) {
-            /* a0 = (a1 r<< 24) & 0xff # 000c */
-            tcg_out_rlw(s, RLWINM, a0, a1, 24, 24, 31);
-            /* a0 = (a0 & ~0xff00) | (a1 r<< 8) & 0xff00 # 00dc */
-            tcg_out_rlw(s, RLWIMI, a0, a1, 8, 16, 23);
-        } else {
-            /* r0 = (a1 r<< 8) & 0xff00 # 00d0 */
-            tcg_out_rlw(s, RLWINM, TCG_REG_R0, a1, 8, 16, 23);
-            /* a0 = (a1 r<< 24) & 0xff # 000c */
-            tcg_out_rlw(s, RLWINM, a0, a1, 24, 24, 31);
-            /* a0 = a0 | r0 # 00dc */
-            tcg_out32(s, OR | SAB(TCG_REG_R0, a0, a0));
-        }
+        tcg_out_bswap16(s, args[0], args[1]);
         break;
 
     case INDEX_op_bswap32_i32:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 08/28] tcg/ppc: Split out tcg_out_bswap32
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (6 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 07/28] tcg/ppc: Split out tcg_out_bswap16 Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 14:30   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 09/28] tcg/ppc: Split out tcg_out_bswap64 Richard Henderson
                   ` (20 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.c.inc | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 64c24382a8..f0e42e4b88 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -798,6 +798,17 @@ static void tcg_out_bswap16(TCGContext *s, TCGReg dst, TCGReg src)
     tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
 }
 
+static void tcg_out_bswap32(TCGContext *s, TCGReg dst, TCGReg src)
+{
+    TCGReg tmp = dst == src ? TCG_REG_R0 : dst;
+
+    /* Stolen from gcc's builtin_bswap32.             src = abcd */
+    tcg_out_rlw(s, RLWINM, tmp, src, 8, 0, 31);    /* tmp = bcda */
+    tcg_out_rlw(s, RLWIMI, tmp, src, 24, 0, 7);    /* tmp = dcda */
+    tcg_out_rlw(s, RLWIMI, tmp, src, 24, 16, 23);  /* tmp = dcba */
+    tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
+}
+
 /* Emit a move into ret of arg, if it can be done in one insn.  */
 static bool tcg_out_movi_one(TCGContext *s, TCGReg ret, tcg_target_long arg)
 {
@@ -2791,24 +2802,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_bswap16_i64:
         tcg_out_bswap16(s, args[0], args[1]);
         break;
-
     case INDEX_op_bswap32_i32:
     case INDEX_op_bswap32_i64:
-        /* Stolen from gcc's builtin_bswap32 */
-        a1 = args[1];
-        a0 = args[0] == a1 ? TCG_REG_R0 : args[0];
-
-        /* a1 = args[1] # abcd */
-        /* a0 = rotate_left (a1, 8) # bcda */
-        tcg_out_rlw(s, RLWINM, a0, a1, 8, 0, 31);
-        /* a0 = (a0 & ~0xff000000) | ((a1 r<< 24) & 0xff000000) # dcda */
-        tcg_out_rlw(s, RLWIMI, a0, a1, 24, 0, 7);
-        /* a0 = (a0 & ~0x0000ff00) | ((a1 r<< 24) & 0x0000ff00) # dcba */
-        tcg_out_rlw(s, RLWIMI, a0, a1, 24, 16, 23);
-
-        if (a0 == TCG_REG_R0) {
-            tcg_out_mov(s, TCG_TYPE_REG, args[0], a0);
-        }
+        tcg_out_bswap32(s, args[0], args[1]);
         break;
 
     case INDEX_op_bswap64_i64:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 09/28] tcg/ppc: Split out tcg_out_bswap64
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (7 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 08/28] tcg/ppc: Split out tcg_out_bswap32 Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 14:35   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 10/28] tcg/ppc: Support bswap flags Richard Henderson
                   ` (19 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.c.inc | 51 +++++++++++++++++-----------------------
 1 file changed, 21 insertions(+), 30 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index f0e42e4b88..690c77b4da 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -809,6 +809,26 @@ static void tcg_out_bswap32(TCGContext *s, TCGReg dst, TCGReg src)
     tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
 }
 
+static void tcg_out_bswap64(TCGContext *s, TCGReg dst, TCGReg src)
+{
+    TCGReg t0 = dst == src ? TCG_REG_R0 : dst;
+    TCGReg t1 = dst == src ? dst : TCG_REG_R0;
+
+                                                   /* src = abcd efgh */
+    tcg_out_rlw(s, RLWINM, t0, src, 8, 0, 31);     /* t0  = 0000 fghe */
+    tcg_out_rlw(s, RLWIMI, t0, src, 24, 0, 7);     /* t0  = 0000 hghe */
+    tcg_out_rlw(s, RLWIMI, t0, src, 24, 16, 23);   /* t0  = 0000 hgfe */
+
+    tcg_out_rld(s, RLDICL, t0, t0, 32, 0);         /* t0  = hgfe 0000 */
+    tcg_out_rld(s, RLDICL, t1, src, 32, 0);        /* t1  = efgh abcd */
+
+    tcg_out_rlw(s, RLWIMI, t0, t1, 8, 0, 31);      /* t0  = hgfe bcda */
+    tcg_out_rlw(s, RLWIMI, t0, t1, 24, 0, 7);      /* t0  = hgfe dcda */
+    tcg_out_rlw(s, RLWIMI, t0, t1, 24, 16, 23);    /* t0  = hgfe dcba */
+
+    tcg_out_mov(s, TCG_TYPE_REG, dst, t0);
+}
+
 /* Emit a move into ret of arg, if it can be done in one insn.  */
 static bool tcg_out_movi_one(TCGContext *s, TCGReg ret, tcg_target_long arg)
 {
@@ -2806,37 +2826,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_bswap32_i64:
         tcg_out_bswap32(s, args[0], args[1]);
         break;
-
     case INDEX_op_bswap64_i64:
-        a0 = args[0], a1 = args[1], a2 = TCG_REG_R0;
-        if (a0 == a1) {
-            a0 = TCG_REG_R0;
-            a2 = a1;
-        }
-
-        /* a1 = # abcd efgh */
-        /* a0 = rl32(a1, 8) # 0000 fghe */
-        tcg_out_rlw(s, RLWINM, a0, a1, 8, 0, 31);
-        /* a0 = dep(a0, rl32(a1, 24), 0xff000000) # 0000 hghe */
-        tcg_out_rlw(s, RLWIMI, a0, a1, 24, 0, 7);
-        /* a0 = dep(a0, rl32(a1, 24), 0x0000ff00) # 0000 hgfe */
-        tcg_out_rlw(s, RLWIMI, a0, a1, 24, 16, 23);
-
-        /* a0 = rl64(a0, 32) # hgfe 0000 */
-        /* a2 = rl64(a1, 32) # efgh abcd */
-        tcg_out_rld(s, RLDICL, a0, a0, 32, 0);
-        tcg_out_rld(s, RLDICL, a2, a1, 32, 0);
-
-        /* a0 = dep(a0, rl32(a2, 8), 0xffffffff)  # hgfe bcda */
-        tcg_out_rlw(s, RLWIMI, a0, a2, 8, 0, 31);
-        /* a0 = dep(a0, rl32(a2, 24), 0xff000000) # hgfe dcda */
-        tcg_out_rlw(s, RLWIMI, a0, a2, 24, 0, 7);
-        /* a0 = dep(a0, rl32(a2, 24), 0x0000ff00) # hgfe dcba */
-        tcg_out_rlw(s, RLWIMI, a0, a2, 24, 16, 23);
-
-        if (a0 == 0) {
-            tcg_out_mov(s, TCG_TYPE_REG, args[0], a0);
-        }
+        tcg_out_bswap64(s, args[0], args[1]);
         break;
 
     case INDEX_op_deposit_i32:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 10/28] tcg/ppc: Support bswap flags
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (8 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 09/28] tcg/ppc: Split out tcg_out_bswap64 Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 14:38   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 11/28] tcg/ppc: Use power10 byte-reverse instructions Richard Henderson
                   ` (18 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

For INDEX_op_bswap32_i32, pass 0 for flags: input not zero-extended,
output does not need extension within the host 64-bit register.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.c.inc | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 690c77b4da..e868417168 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -788,25 +788,35 @@ static inline void tcg_out_sari64(TCGContext *s, TCGReg dst, TCGReg src, int c)
     tcg_out32(s, SRADI | RA(dst) | RS(src) | SH(c & 0x1f) | ((c >> 4) & 2));
 }
 
-static void tcg_out_bswap16(TCGContext *s, TCGReg dst, TCGReg src)
+static void tcg_out_bswap16(TCGContext *s, TCGReg dst, TCGReg src, int flags)
 {
     TCGReg tmp = dst == src ? TCG_REG_R0 : dst;
 
-                                                   /* src = abcd */
-    tcg_out_rlw(s, RLWINM, tmp, src, 24, 24, 31);  /* tmp = 000c */
-    tcg_out_rlw(s, RLWIMI, tmp, src, 8, 16, 23);   /* tmp = 00dc */
-    tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
+                                                   /* src = xxxx abcd */
+    tcg_out_rlw(s, RLWINM, tmp, src, 24, 24, 31);  /* tmp = 0000 000c */
+    tcg_out_rlw(s, RLWIMI, tmp, src, 8, 16, 23);   /* tmp = 0000 00dc */
+
+    if (flags & TCG_BSWAP_OS) {
+        tcg_out_ext16s(s, dst, tmp);
+    } else {
+        tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
+    }
 }
 
-static void tcg_out_bswap32(TCGContext *s, TCGReg dst, TCGReg src)
+static void tcg_out_bswap32(TCGContext *s, TCGReg dst, TCGReg src, int flags)
 {
     TCGReg tmp = dst == src ? TCG_REG_R0 : dst;
 
-    /* Stolen from gcc's builtin_bswap32.             src = abcd */
-    tcg_out_rlw(s, RLWINM, tmp, src, 8, 0, 31);    /* tmp = bcda */
-    tcg_out_rlw(s, RLWIMI, tmp, src, 24, 0, 7);    /* tmp = dcda */
-    tcg_out_rlw(s, RLWIMI, tmp, src, 24, 16, 23);  /* tmp = dcba */
-    tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
+    /* Stolen from gcc's builtin_bswap32.             src = xxxx abcd */
+    tcg_out_rlw(s, RLWINM, tmp, src, 8, 0, 31);    /* tmp = 0000 bcda */
+    tcg_out_rlw(s, RLWIMI, tmp, src, 24, 0, 7);    /* tmp = 0000 dcda */
+    tcg_out_rlw(s, RLWIMI, tmp, src, 24, 16, 23);  /* tmp = 0000 dcba */
+
+    if (flags & TCG_BSWAP_OS) {
+        tcg_out_ext32s(s, dst, tmp);
+    } else {
+        tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
+    }
 }
 
 static void tcg_out_bswap64(TCGContext *s, TCGReg dst, TCGReg src)
@@ -2820,11 +2830,13 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_bswap16_i32:
     case INDEX_op_bswap16_i64:
-        tcg_out_bswap16(s, args[0], args[1]);
+        tcg_out_bswap16(s, args[0], args[1], args[2]);
         break;
     case INDEX_op_bswap32_i32:
+        tcg_out_bswap32(s, args[0], args[1], 0);
+        break;
     case INDEX_op_bswap32_i64:
-        tcg_out_bswap32(s, args[0], args[1]);
+        tcg_out_bswap32(s, args[0], args[1], args[2]);
         break;
     case INDEX_op_bswap64_i64:
         tcg_out_bswap64(s, args[0], args[1]);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 11/28] tcg/ppc: Use power10 byte-reverse instructions
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (9 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 10/28] tcg/ppc: Support bswap flags Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-14  8:37 ` [PATCH 12/28] tcg/s390: Support bswap flags Richard Henderson
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.c.inc | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index e868417168..af87643f54 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -413,6 +413,10 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 #define SRAD   XO31(794)
 #define SRADI  XO31(413<<1)
 
+#define BRH    XO31(219)
+#define BRW    XO31(155)
+#define BRD    XO31(187)
+
 #define TW     XO31( 4)
 #define TRAP   (TW | TO(31))
 
@@ -748,6 +752,11 @@ static inline void tcg_out_ext16s(TCGContext *s, TCGReg dst, TCGReg src)
     tcg_out32(s, EXTSH | RA(dst) | RS(src));
 }
 
+static inline void tcg_out_ext16u(TCGContext *s, TCGReg dst, TCGReg src)
+{
+    tcg_out32(s, ANDI | SAI(src, dst, 0xffff));
+}
+
 static inline void tcg_out_ext32s(TCGContext *s, TCGReg dst, TCGReg src)
 {
     tcg_out32(s, EXTSW | RA(dst) | RS(src));
@@ -792,6 +801,16 @@ static void tcg_out_bswap16(TCGContext *s, TCGReg dst, TCGReg src, int flags)
 {
     TCGReg tmp = dst == src ? TCG_REG_R0 : dst;
 
+    if (have_isa_3_10) {
+        tcg_out32(s, BRH | RA(dst) | RS(src));
+        if (flags & TCG_BSWAP_OS) {
+            tcg_out_ext16s(s, dst, dst);
+        } else if ((flags & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ) {
+            tcg_out_ext16u(s, dst, dst);
+        }
+        return;
+    }
+
                                                    /* src = xxxx abcd */
     tcg_out_rlw(s, RLWINM, tmp, src, 24, 24, 31);  /* tmp = 0000 000c */
     tcg_out_rlw(s, RLWIMI, tmp, src, 8, 16, 23);   /* tmp = 0000 00dc */
@@ -807,6 +826,16 @@ static void tcg_out_bswap32(TCGContext *s, TCGReg dst, TCGReg src, int flags)
 {
     TCGReg tmp = dst == src ? TCG_REG_R0 : dst;
 
+    if (have_isa_3_10) {
+        tcg_out32(s, BRW | RA(dst) | RS(src));
+        if (flags & TCG_BSWAP_OS) {
+            tcg_out_ext32s(s, dst, src);
+        } else if ((flags & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ) {
+            tcg_out_ext32u(s, dst, dst);
+        }
+        return;
+    }
+
     /* Stolen from gcc's builtin_bswap32.             src = xxxx abcd */
     tcg_out_rlw(s, RLWINM, tmp, src, 8, 0, 31);    /* tmp = 0000 bcda */
     tcg_out_rlw(s, RLWIMI, tmp, src, 24, 0, 7);    /* tmp = 0000 dcda */
@@ -824,6 +853,11 @@ static void tcg_out_bswap64(TCGContext *s, TCGReg dst, TCGReg src)
     TCGReg t0 = dst == src ? TCG_REG_R0 : dst;
     TCGReg t1 = dst == src ? dst : TCG_REG_R0;
 
+    if (have_isa_3_10) {
+        tcg_out32(s, BRD | RA(dst) | RS(src));
+        return;
+    }
+
                                                    /* src = abcd efgh */
     tcg_out_rlw(s, RLWINM, t0, src, 8, 0, 31);     /* t0  = 0000 fghe */
     tcg_out_rlw(s, RLWIMI, t0, src, 24, 0, 7);     /* t0  = 0000 hghe */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 12/28] tcg/s390: Support bswap flags
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (10 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 11/28] tcg/ppc: Use power10 byte-reverse instructions Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-14  8:37 ` [PATCH 13/28] tcg/mips: Support bswap flags in tcg_out_bswap16 Richard Henderson
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

For INDEX_op_bswap16_i64, use 64-bit instructions so that we can
easily provide the extension to 64-bits.  Drop the special case,
previously used, where the input is already zero-extended -- the
minor code size savings is not worth the complication.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/s390/tcg-target.c.inc | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/tcg/s390/tcg-target.c.inc b/tcg/s390/tcg-target.c.inc
index 5fe073f09a..b82cf19f09 100644
--- a/tcg/s390/tcg-target.c.inc
+++ b/tcg/s390/tcg-target.c.inc
@@ -1951,15 +1951,37 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tgen_ext16u(s, TCG_TYPE_I32, args[0], args[1]);
         break;
 
-    OP_32_64(bswap16):
-        /* The TCG bswap definition requires bits 0-47 already be zero.
-           Thus we don't need the G-type insns to implement bswap16_i64.  */
-        tcg_out_insn(s, RRE, LRVR, args[0], args[1]);
-        tcg_out_sh32(s, RS_SRL, args[0], TCG_REG_NONE, 16);
+    case INDEX_op_bswap16_i32:
+        a0 = args[0], a1 = args[1], a2 = args[2];
+        tcg_out_insn(s, RRE, LRVR, a0, a1);
+        if (a2 & TCG_BSWAP_OS) {
+            tcg_out_sh32(s, RS_SRA, a0, TCG_REG_NONE, 16);
+        } else {
+            tcg_out_sh32(s, RS_SRL, a0, TCG_REG_NONE, 16);
+        }
         break;
-    OP_32_64(bswap32):
+    case INDEX_op_bswap16_i64:
+        a0 = args[0], a1 = args[1], a2 = args[2];
+        tcg_out_insn(s, RRE, LRVGR, a0, a1);
+        if (a2 & TCG_BSWAP_OS) {
+            tcg_out_sh64(s, RSY_SRAG, a0, a0, TCG_REG_NONE, 48);
+        } else {
+            tcg_out_sh64(s, RSY_SRLG, a0, a0, TCG_REG_NONE, 48);
+        }
+        break;
+
+    case INDEX_op_bswap32_i32:
         tcg_out_insn(s, RRE, LRVR, args[0], args[1]);
         break;
+    case INDEX_op_bswap32_i64:
+        a0 = args[0], a1 = args[1], a2 = args[2];
+        tcg_out_insn(s, RRE, LRVR, a0, a1);
+        if (a2 & TCG_BSWAP_OS) {
+            tgen_ext32s(s, a0, a0);
+        } else if ((a2 & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ) {
+            tgen_ext32u(s, a0, a0);
+        }
+        break;
 
     case INDEX_op_add2_i32:
         if (const_args[4]) {
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 13/28] tcg/mips: Support bswap flags in tcg_out_bswap16
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (11 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 12/28] tcg/s390: Support bswap flags Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-22  6:36   ` Philippe Mathieu-Daudé
  2021-06-14  8:37 ` [PATCH 14/28] tcg/mips: Support bswap flags in tcg_out_bswap32 Richard Henderson
                   ` (15 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

Merge tcg_out_bswap16 and tcg_out_bswap16s.  Use the flags
in the internal uses for loads and stores.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/mips/tcg-target.c.inc | 60 ++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 32 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 5944448b2a..7a5634419c 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -540,39 +540,36 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
 }
 
-static inline void tcg_out_bswap16(TCGContext *s, TCGReg ret, TCGReg arg)
+static void tcg_out_bswap16(TCGContext *s, TCGReg ret, TCGReg arg, int flags)
 {
+    /* ret and arg can't be register tmp0 */
+    tcg_debug_assert(ret != TCG_TMP0);
+    tcg_debug_assert(arg != TCG_TMP0);
+
     if (use_mips32r2_instructions) {
         tcg_out_opc_reg(s, OPC_WSBH, ret, 0, arg);
-    } else {
-        /* ret and arg can't be register at */
-        if (ret == TCG_TMP0 || arg == TCG_TMP0) {
-            tcg_abort();
+        if (flags & TCG_BSWAP_OS) {
+            tcg_out_opc_reg(s, OPC_SEH, ret, 0, ret);
+        } else if ((flags & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ) {
+            tcg_out_opc_imm(s, OPC_ANDI, ret, ret, 0xffff);
         }
-
-        tcg_out_opc_sa(s, OPC_SRL, TCG_TMP0, arg, 8);
-        tcg_out_opc_sa(s, OPC_SLL, ret, arg, 8);
-        tcg_out_opc_imm(s, OPC_ANDI, ret, ret, 0xff00);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
+        return;
     }
-}
 
-static inline void tcg_out_bswap16s(TCGContext *s, TCGReg ret, TCGReg arg)
-{
-    if (use_mips32r2_instructions) {
-        tcg_out_opc_reg(s, OPC_WSBH, ret, 0, arg);
-        tcg_out_opc_reg(s, OPC_SEH, ret, 0, ret);
-    } else {
-        /* ret and arg can't be register at */
-        if (ret == TCG_TMP0 || arg == TCG_TMP0) {
-            tcg_abort();
-        }
-
-        tcg_out_opc_sa(s, OPC_SRL, TCG_TMP0, arg, 8);
+    tcg_out_opc_sa(s, OPC_SRL, TCG_TMP0, arg, 8);
+    if (!(flags & TCG_BSWAP_IZ)) {
+        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP0, 0x00ff);
+    }
+    if (flags & TCG_BSWAP_OS) {
         tcg_out_opc_sa(s, OPC_SLL, ret, arg, 24);
         tcg_out_opc_sa(s, OPC_SRA, ret, ret, 16);
-        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
+    } else {
+        tcg_out_opc_sa(s, OPC_SLL, ret, arg, 8);
+        if (flags & TCG_BSWAP_OZ) {
+            tcg_out_opc_imm(s, OPC_ANDI, ret, ret, 0xff00);
+        }
     }
+    tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
 }
 
 static void tcg_out_bswap_subr(TCGContext *s, const tcg_insn_unit *sub)
@@ -1367,14 +1364,14 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg lo, TCGReg hi,
         break;
     case MO_UW | MO_BSWAP:
         tcg_out_opc_imm(s, OPC_LHU, TCG_TMP1, base, 0);
-        tcg_out_bswap16(s, lo, TCG_TMP1);
+        tcg_out_bswap16(s, lo, TCG_TMP1, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
         break;
     case MO_UW:
         tcg_out_opc_imm(s, OPC_LHU, lo, base, 0);
         break;
     case MO_SW | MO_BSWAP:
         tcg_out_opc_imm(s, OPC_LHU, TCG_TMP1, base, 0);
-        tcg_out_bswap16s(s, lo, TCG_TMP1);
+        tcg_out_bswap16(s, lo, TCG_TMP1, TCG_BSWAP_IZ | TCG_BSWAP_OS);
         break;
     case MO_SW:
         tcg_out_opc_imm(s, OPC_LH, lo, base, 0);
@@ -1514,8 +1511,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg lo, TCGReg hi,
         break;
 
     case MO_16 | MO_BSWAP:
-        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP1, lo, 0xffff);
-        tcg_out_bswap16(s, TCG_TMP1, TCG_TMP1);
+        tcg_out_bswap16(s, TCG_TMP1, lo, 0);
         lo = TCG_TMP1;
         /* FALLTHRU */
     case MO_16:
@@ -1933,10 +1929,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_not_i64:
         i1 = OPC_NOR;
         goto do_unary;
-    case INDEX_op_bswap16_i32:
-    case INDEX_op_bswap16_i64:
-        i1 = OPC_WSBH;
-        goto do_unary;
     case INDEX_op_ext8s_i32:
     case INDEX_op_ext8s_i64:
         i1 = OPC_SEB;
@@ -1948,6 +1940,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_opc_reg(s, i1, a0, TCG_REG_ZERO, a1);
         break;
 
+    case INDEX_op_bswap16_i32:
+    case INDEX_op_bswap16_i64:
+        tcg_out_bswap16(s, a0, a1, a2);
+        break;
     case INDEX_op_bswap32_i32:
         tcg_out_bswap32(s, a0, a1);
         break;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 14/28] tcg/mips: Support bswap flags in tcg_out_bswap32
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (12 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 13/28] tcg/mips: Support bswap flags in tcg_out_bswap16 Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-14  9:31   ` Philippe Mathieu-Daudé
  2021-06-14  8:37 ` [PATCH 15/28] tcg/tci: Support bswap flags Richard Henderson
                   ` (14 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

Merge tcg_out_bswap32 and tcg_out_bswap32s.  Use the flags
in the internal uses for loads and stores.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/mips/tcg-target.c.inc | 39 ++++++++++++++++-----------------------
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 7a5634419c..e3698274eb 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -578,27 +578,20 @@ static void tcg_out_bswap_subr(TCGContext *s, const tcg_insn_unit *sub)
     tcg_debug_assert(ok);
 }
 
-static void tcg_out_bswap32(TCGContext *s, TCGReg ret, TCGReg arg)
+static void tcg_out_bswap32(TCGContext *s, TCGReg ret, TCGReg arg, int flags)
 {
     if (use_mips32r2_instructions) {
         tcg_out_opc_reg(s, OPC_WSBH, ret, 0, arg);
         tcg_out_opc_sa(s, OPC_ROTR, ret, ret, 16);
+        if (flags & TCG_BSWAP_OZ) {
+            tcg_out_opc_bf(s, OPC_DEXT, ret, ret, 31, 0);
+        }
     } else {
-        tcg_out_bswap_subr(s, bswap32_addr);
-        /* delay slot -- never omit the insn, like tcg_out_mov might.  */
-        tcg_out_opc_reg(s, OPC_OR, TCG_TMP0, arg, TCG_REG_ZERO);
-        tcg_out_mov(s, TCG_TYPE_I32, ret, TCG_TMP3);
-    }
-}
-
-static void tcg_out_bswap32u(TCGContext *s, TCGReg ret, TCGReg arg)
-{
-    if (use_mips32r2_instructions) {
-        tcg_out_opc_reg(s, OPC_DSBH, ret, 0, arg);
-        tcg_out_opc_reg(s, OPC_DSHD, ret, 0, ret);
-        tcg_out_dsrl(s, ret, ret, 32);
-    } else {
-        tcg_out_bswap_subr(s, bswap32u_addr);
+        if (flags & TCG_BSWAP_OZ) {
+            tcg_out_bswap_subr(s, bswap32u_addr);
+        } else {
+            tcg_out_bswap_subr(s, bswap32_addr);
+        }
         /* delay slot -- never omit the insn, like tcg_out_mov might.  */
         tcg_out_opc_reg(s, OPC_OR, TCG_TMP0, arg, TCG_REG_ZERO);
         tcg_out_mov(s, TCG_TYPE_I32, ret, TCG_TMP3);
@@ -1380,7 +1373,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg lo, TCGReg hi,
         if (TCG_TARGET_REG_BITS == 64 && is_64) {
             if (use_mips32r2_instructions) {
                 tcg_out_opc_imm(s, OPC_LWU, lo, base, 0);
-                tcg_out_bswap32u(s, lo, lo);
+                tcg_out_bswap32(s, lo, lo, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
             } else {
                 tcg_out_bswap_subr(s, bswap32u_addr);
                 /* delay slot */
@@ -1393,7 +1386,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg lo, TCGReg hi,
     case MO_SL | MO_BSWAP:
         if (use_mips32r2_instructions) {
             tcg_out_opc_imm(s, OPC_LW, lo, base, 0);
-            tcg_out_bswap32(s, lo, lo);
+            tcg_out_bswap32(s, lo, lo, 0);
         } else {
             tcg_out_bswap_subr(s, bswap32_addr);
             /* delay slot */
@@ -1519,7 +1512,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg lo, TCGReg hi,
         break;
 
     case MO_32 | MO_BSWAP:
-        tcg_out_bswap32(s, TCG_TMP3, lo);
+        tcg_out_bswap32(s, TCG_TMP3, lo, 0);
         lo = TCG_TMP3;
         /* FALLTHRU */
     case MO_32:
@@ -1538,9 +1531,9 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg lo, TCGReg hi,
             tcg_out_opc_imm(s, OPC_SW, TCG_TMP0, base, 0);
             tcg_out_opc_imm(s, OPC_SW, TCG_TMP1, base, 4);
         } else {
-            tcg_out_bswap32(s, TCG_TMP3, MIPS_BE ? lo : hi);
+            tcg_out_bswap32(s, TCG_TMP3, MIPS_BE ? lo : hi, 0);
             tcg_out_opc_imm(s, OPC_SW, TCG_TMP3, base, 0);
-            tcg_out_bswap32(s, TCG_TMP3, MIPS_BE ? hi : lo);
+            tcg_out_bswap32(s, TCG_TMP3, MIPS_BE ? hi : lo, 0);
             tcg_out_opc_imm(s, OPC_SW, TCG_TMP3, base, 4);
         }
         break;
@@ -1945,10 +1938,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_bswap16(s, a0, a1, a2);
         break;
     case INDEX_op_bswap32_i32:
-        tcg_out_bswap32(s, a0, a1);
+        tcg_out_bswap32(s, a0, a1, 0);
         break;
     case INDEX_op_bswap32_i64:
-        tcg_out_bswap32u(s, a0, a1);
+        tcg_out_bswap32(s, a0, a1, a2);
         break;
     case INDEX_op_bswap64_i64:
         tcg_out_bswap64(s, a0, a1);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 15/28] tcg/tci: Support bswap flags
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (13 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 14/28] tcg/mips: Support bswap flags in tcg_out_bswap32 Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-14  9:33   ` Philippe Mathieu-Daudé
  2021-06-14  8:37 ` [PATCH 16/28] tcg: Handle new bswap flags during optimize Richard Henderson
                   ` (13 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

The existing interpreter zero-extends, ignoring high bits.
Simply add a separate sign-extension opcode if required.
Ensure that the interpreter supports ext16s when bswap16 is enabled.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tci.c                |  3 ++-
 tcg/tci/tcg-target.c.inc | 23 ++++++++++++++++++++---
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index d68c5a4e55..109522a865 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -733,7 +733,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
             regs[r0] = (int8_t)regs[r1];
             break;
 #endif
-#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
+#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64 || \
+    TCG_TARGET_HAS_bswap16_i32 || TCG_TARGET_HAS_bswap16_i64
         CASE_32_64(ext16s)
             tci_args_rr(&tb_ptr, &r0, &r1);
             regs[r0] = (int16_t)regs[r1];
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 823ecd5d35..1e92688dca 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -617,6 +617,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                        const TCGArg args[TCG_MAX_OP_ARGS],
                        const int const_args[TCG_MAX_OP_ARGS])
 {
+    TCGOpcode exts;
+
     switch (opc) {
     case INDEX_op_exit_tb:
         tcg_out_op_p(s, opc, (void *)args[0]);
@@ -710,12 +712,27 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     CASE_64(ext32u)      /* Optional (TCG_TARGET_HAS_ext32u_i64). */
     CASE_64(ext_i32)
     CASE_64(extu_i32)
-    CASE_32_64(bswap16)  /* Optional (TCG_TARGET_HAS_bswap16_*). */
-    CASE_32_64(bswap32)  /* Optional (TCG_TARGET_HAS_bswap32_*). */
-    CASE_64(bswap64)     /* Optional (TCG_TARGET_HAS_bswap64_i64). */
+    case INDEX_op_bswap32_i32: /* Optional (TCG_TARGET_HAS_bswap32_i32). */
+    case INDEX_op_bswap64_i64: /* Optional (TCG_TARGET_HAS_bswap64_i64). */
         tcg_out_op_rr(s, opc, args[0], args[1]);
         break;
 
+    case INDEX_op_bswap16_i32: /* Optional (TCG_TARGET_HAS_bswap16_i32). */
+        exts = INDEX_op_ext16s_i32;
+        goto do_bswap;
+    case INDEX_op_bswap16_i64: /* Optional (TCG_TARGET_HAS_bswap16_i64). */
+        exts = INDEX_op_ext16s_i64;
+        goto do_bswap;
+    case INDEX_op_bswap32_i64: /* Optional (TCG_TARGET_HAS_bswap32_i64). */
+        exts = INDEX_op_ext32s_i64;
+    do_bswap:
+        /* The base tci bswaps zero-extend, and ignore high bits. */
+        tcg_out_op_rr(s, opc, args[0], args[1]);
+        if (args[2] & TCG_BSWAP_OS) {
+            tcg_out_op_rr(s, exts, args[0], args[0]);
+        }
+        break;
+
 #if TCG_TARGET_REG_BITS == 32
     case INDEX_op_add2_i32:
     case INDEX_op_sub2_i32:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 16/28] tcg: Handle new bswap flags during optimize
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (14 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 15/28] tcg/tci: Support bswap flags Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 14:47   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 17/28] tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64 Richard Henderson
                   ` (12 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

Notice when the input is known to be zero-extended and force
the TCG_BSWAP_IZ flag on.  Honor the TCG_BSWAP_OS bit during
constant folding.  Propagate the input to the output mask.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 56 +++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 51 insertions(+), 5 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 37c902283e..3b6983fbef 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -354,10 +354,12 @@ static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
         return (uint16_t)x;
 
     CASE_OP_32_64(bswap16):
-        return bswap16(x);
+        x = bswap16(x);
+        return y & TCG_BSWAP_OS ? (int16_t)x : x;
 
     CASE_OP_32_64(bswap32):
-        return bswap32(x);
+        x = bswap32(x);
+        return y & TCG_BSWAP_OS ? (int32_t)x : x;
 
     case INDEX_op_bswap64_i64:
         return bswap64(x);
@@ -1028,6 +1030,42 @@ void tcg_optimize(TCGContext *s)
             }
             break;
 
+        CASE_OP_32_64(bswap16):
+            mask = arg_info(op->args[1])->mask;
+            if (mask <= 0xffff) {
+                op->args[2] |= TCG_BSWAP_IZ;
+            }
+            mask = bswap16(mask);
+            switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
+            case TCG_BSWAP_OZ:
+                break;
+            case TCG_BSWAP_OS:
+                mask = (int16_t)mask;
+                break;
+            default: /* undefined high bits */
+                mask |= MAKE_64BIT_MASK(16, 48);
+                break;
+            }
+            break;
+
+        case INDEX_op_bswap32_i64:
+            mask = arg_info(op->args[1])->mask;
+            if (mask <= 0xffffffffu) {
+                op->args[2] |= TCG_BSWAP_IZ;
+            }
+            mask = bswap32(mask);
+            switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
+            case TCG_BSWAP_OZ:
+                break;
+            case TCG_BSWAP_OS:
+                mask = (int32_t)mask;
+                break;
+            default: /* undefined high bits */
+                mask |= MAKE_64BIT_MASK(32, 32);
+                break;
+            }
+            break;
+
         default:
             break;
         }
@@ -1134,9 +1172,6 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(ext16s):
         CASE_OP_32_64(ext16u):
         CASE_OP_32_64(ctpop):
-        CASE_OP_32_64(bswap16):
-        CASE_OP_32_64(bswap32):
-        case INDEX_op_bswap64_i64:
         case INDEX_op_ext32s_i64:
         case INDEX_op_ext32u_i64:
         case INDEX_op_ext_i32_i64:
@@ -1150,6 +1185,17 @@ void tcg_optimize(TCGContext *s)
             }
             goto do_default;
 
+        CASE_OP_32_64(bswap16):
+        CASE_OP_32_64(bswap32):
+        case INDEX_op_bswap64_i64:
+            if (arg_is_const(op->args[1])) {
+                tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
+                                          op->args[2]);
+                tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
+                break;
+            }
+            goto do_default;
+
         CASE_OP_32_64(add):
         CASE_OP_32_64(sub):
         CASE_OP_32_64(mul):
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 17/28] tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (15 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 16/28] tcg: Handle new bswap flags during optimize Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-14  9:41   ` Philippe Mathieu-Daudé
  2021-06-21 15:01   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 18/28] tcg: Make use of bswap flags in tcg_gen_qemu_ld_* Richard Henderson
                   ` (11 subsequent siblings)
  28 siblings, 2 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

Implement the new semantics in the fallback expansion.
Change all callers to supply the flags that keep the
semantics unchanged locally.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg-op.h            |   8 +--
 target/arm/translate-a64.c      |  12 ++--
 target/arm/translate.c          |   2 +-
 target/i386/tcg/translate.c     |   2 +-
 target/mips/tcg/mxu_translate.c |   2 +-
 target/s390x/translate.c        |   4 +-
 target/sh4/translate.c          |   2 +-
 tcg/tcg-op.c                    | 121 ++++++++++++++++++++++----------
 8 files changed, 99 insertions(+), 54 deletions(-)

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index ef8a008ea7..caf6ba2149 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -330,7 +330,7 @@ void tcg_gen_ext8s_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_ext16s_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_ext8u_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_ext16u_i32(TCGv_i32 ret, TCGv_i32 arg);
-void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg);
+void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg, int flags);
 void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_smin_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_smax_i32(TCGv_i32, TCGv_i32 arg1, TCGv_i32 arg2);
@@ -525,8 +525,8 @@ void tcg_gen_ext32s_i64(TCGv_i64 ret, TCGv_i64 arg);
 void tcg_gen_ext8u_i64(TCGv_i64 ret, TCGv_i64 arg);
 void tcg_gen_ext16u_i64(TCGv_i64 ret, TCGv_i64 arg);
 void tcg_gen_ext32u_i64(TCGv_i64 ret, TCGv_i64 arg);
-void tcg_gen_bswap16_i64(TCGv_i64 ret, TCGv_i64 arg);
-void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg);
+void tcg_gen_bswap16_i64(TCGv_i64 ret, TCGv_i64 arg, int flags);
+void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg, int flags);
 void tcg_gen_bswap64_i64(TCGv_i64 ret, TCGv_i64 arg);
 void tcg_gen_smin_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_smax_i64(TCGv_i64, TCGv_i64 arg1, TCGv_i64 arg2);
@@ -1185,7 +1185,7 @@ void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
 #define tcg_gen_ext32u_tl tcg_gen_mov_i32
 #define tcg_gen_ext32s_tl tcg_gen_mov_i32
 #define tcg_gen_bswap16_tl tcg_gen_bswap16_i32
-#define tcg_gen_bswap32_tl tcg_gen_bswap32_i32
+#define tcg_gen_bswap32_tl(D, S, F) tcg_gen_bswap32_i32(D, S)
 #define tcg_gen_bswap_tl tcg_gen_bswap32_i32
 #define tcg_gen_concat_tl_i64 tcg_gen_concat_i32_i64
 #define tcg_gen_extr_i64_tl tcg_gen_extr_i64_i32
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 8713dfec17..e0785ce859 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5437,15 +5437,15 @@ static void handle_rev32(DisasContext *s, unsigned int sf,
 
         /* bswap32_i64 requires zero high word */
         tcg_gen_ext32u_i64(tcg_tmp, tcg_rn);
-        tcg_gen_bswap32_i64(tcg_rd, tcg_tmp);
+        tcg_gen_bswap32_i64(tcg_rd, tcg_tmp, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
         tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-        tcg_gen_bswap32_i64(tcg_tmp, tcg_tmp);
+        tcg_gen_bswap32_i64(tcg_tmp, tcg_tmp, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
         tcg_gen_concat32_i64(tcg_rd, tcg_rd, tcg_tmp);
 
         tcg_temp_free_i64(tcg_tmp);
     } else {
         tcg_gen_ext32u_i64(tcg_rd, cpu_reg(s, rn));
-        tcg_gen_bswap32_i64(tcg_rd, tcg_rd);
+        tcg_gen_bswap32_i64(tcg_rd, tcg_rd, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
     }
 }
 
@@ -12436,10 +12436,12 @@ static void handle_rev(DisasContext *s, int opcode, bool u,
             read_vec_element(s, tcg_tmp, rn, i, grp_size);
             switch (grp_size) {
             case MO_16:
-                tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
+                tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp,
+                                    TCG_BSWAP_IZ | TCG_BSWAP_OZ);
                 break;
             case MO_32:
-                tcg_gen_bswap32_i64(tcg_tmp, tcg_tmp);
+                tcg_gen_bswap32_i64(tcg_tmp, tcg_tmp,
+                                    TCG_BSWAP_IZ | TCG_BSWAP_OZ);
                 break;
             case MO_64:
                 tcg_gen_bswap64_i64(tcg_tmp, tcg_tmp);
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 8e0e55c1e0..6b88163e3a 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -342,7 +342,7 @@ void gen_rev16(TCGv_i32 dest, TCGv_i32 var)
 static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
 {
     tcg_gen_ext16u_i32(var, var);
-    tcg_gen_bswap16_i32(var, var);
+    tcg_gen_bswap16_i32(var, var, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
     tcg_gen_ext16s_i32(dest, var);
 }
 
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index a7f5c0c8f2..e8a9dcd21a 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -7203,7 +7203,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         {
             gen_op_mov_v_reg(s, MO_32, s->T0, reg);
             tcg_gen_ext32u_tl(s->T0, s->T0);
-            tcg_gen_bswap32_tl(s->T0, s->T0);
+            tcg_gen_bswap32_tl(s->T0, s->T0, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
             gen_op_mov_reg_v(s, MO_32, reg, s->T0);
         }
         break;
diff --git a/target/mips/tcg/mxu_translate.c b/target/mips/tcg/mxu_translate.c
index fb0a811af6..c12cf78df7 100644
--- a/target/mips/tcg/mxu_translate.c
+++ b/target/mips/tcg/mxu_translate.c
@@ -861,7 +861,7 @@ static void gen_mxu_s32ldd_s32lddr(DisasContext *ctx)
 
     if (sel == 1) {
         /* S32LDDR */
-        tcg_gen_bswap32_tl(t1, t1);
+        tcg_gen_bswap32_tl(t1, t1, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
     }
     gen_store_mxu_gpr(t1, XRa);
 
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index e243624d2a..03dab9f350 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -3939,13 +3939,13 @@ static DisasJumpType op_rosbg(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_rev16(DisasContext *s, DisasOps *o)
 {
-    tcg_gen_bswap16_i64(o->out, o->in2);
+    tcg_gen_bswap16_i64(o->out, o->in2, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
     return DISAS_NEXT;
 }
 
 static DisasJumpType op_rev32(DisasContext *s, DisasOps *o)
 {
-    tcg_gen_bswap32_i64(o->out, o->in2);
+    tcg_gen_bswap32_i64(o->out, o->in2, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
     return DISAS_NEXT;
 }
 
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 9312790623..147219759b 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -677,7 +677,7 @@ static void _decode_opc(DisasContext * ctx)
 	{
             TCGv low = tcg_temp_new();
 	    tcg_gen_ext16u_i32(low, REG(B7_4));
-	    tcg_gen_bswap16_i32(low, low);
+	    tcg_gen_bswap16_i32(low, low, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
             tcg_gen_deposit_i32(REG(B11_8), REG(B7_4), low, 0, 16);
 	    tcg_temp_free(low);
 	}
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index dc65577e2f..3763285bb0 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1001,20 +1001,35 @@ void tcg_gen_ext16u_i32(TCGv_i32 ret, TCGv_i32 arg)
     }
 }
 
-/* Note: we assume the two high bytes are set to zero */
-void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg)
+void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg, int flags)
 {
+    /* Only one extension flag may be present. */
+    tcg_debug_assert(!(flags & TCG_BSWAP_OS) || !(flags & TCG_BSWAP_OZ));
+
     if (TCG_TARGET_HAS_bswap16_i32) {
-        tcg_gen_op3i_i32(INDEX_op_bswap16_i32, ret, arg,
-                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
+        tcg_gen_op3i_i32(INDEX_op_bswap16_i32, ret, arg, flags);
     } else {
         TCGv_i32 t0 = tcg_temp_new_i32();
+        TCGv_i32 t1 = tcg_temp_new_i32();
 
-        tcg_gen_ext8u_i32(t0, arg);
-        tcg_gen_shli_i32(t0, t0, 8);
-        tcg_gen_shri_i32(ret, arg, 8);
-        tcg_gen_or_i32(ret, ret, t0);
+        tcg_gen_shri_i32(t0, arg, 8);
+        if (!(flags & TCG_BSWAP_IZ)) {
+            tcg_gen_ext8u_i32(t0, t0);
+        }
+
+        if (flags & TCG_BSWAP_OS) {
+            tcg_gen_shli_i32(t1, t1, 24);
+            tcg_gen_sari_i32(t1, t1, 16);
+        } else if (flags & TCG_BSWAP_OZ) {
+            tcg_gen_ext8u_i32(t1, arg);
+            tcg_gen_shli_i32(t1, t1, 8);
+        } else {
+            tcg_gen_shli_i32(t1, arg, 8);
+        }
+
+        tcg_gen_or_i32(ret, t0, t1);
         tcg_temp_free_i32(t0);
+        tcg_temp_free_i32(t1);
     }
 }
 
@@ -1655,51 +1670,79 @@ void tcg_gen_ext32u_i64(TCGv_i64 ret, TCGv_i64 arg)
     }
 }
 
-/* Note: we assume the six high bytes are set to zero */
-void tcg_gen_bswap16_i64(TCGv_i64 ret, TCGv_i64 arg)
+void tcg_gen_bswap16_i64(TCGv_i64 ret, TCGv_i64 arg, int flags)
 {
+    /* Only one extension flag may be present. */
+    tcg_debug_assert(!(flags & TCG_BSWAP_OS) || !(flags & TCG_BSWAP_OZ));
+
     if (TCG_TARGET_REG_BITS == 32) {
-        tcg_gen_bswap16_i32(TCGV_LOW(ret), TCGV_LOW(arg));
-        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+        tcg_gen_bswap16_i32(TCGV_LOW(ret), TCGV_LOW(arg), flags);
+        if (flags & TCG_BSWAP_OS) {
+            tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
+        } else {
+            tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+        }
     } else if (TCG_TARGET_HAS_bswap16_i64) {
-        tcg_gen_op3i_i64(INDEX_op_bswap16_i64, ret, arg,
-                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
+        tcg_gen_op3i_i64(INDEX_op_bswap16_i64, ret, arg, flags);
     } else {
         TCGv_i64 t0 = tcg_temp_new_i64();
+        TCGv_i64 t1 = tcg_temp_new_i64();
 
-        tcg_gen_ext8u_i64(t0, arg);
-        tcg_gen_shli_i64(t0, t0, 8);
-        tcg_gen_shri_i64(ret, arg, 8);
-        tcg_gen_or_i64(ret, ret, t0);
+        tcg_gen_shri_i64(t0, arg, 8);
+        if (!(flags & TCG_BSWAP_IZ)) {
+            tcg_gen_ext8u_i64(t0, t0);
+        }
+
+        if (flags & TCG_BSWAP_OS) {
+            tcg_gen_shli_i64(t1, t1, 56);
+            tcg_gen_sari_i64(t1, t1, 48);
+        } else if (flags & TCG_BSWAP_OZ) {
+            tcg_gen_ext8u_i64(t1, arg);
+            tcg_gen_shli_i64(t1, t1, 8);
+        } else {
+            tcg_gen_shli_i64(t1, arg, 8);
+        }
+
+        tcg_gen_or_i64(ret, t0, t1);
         tcg_temp_free_i64(t0);
+        tcg_temp_free_i64(t1);
     }
 }
 
-/* Note: we assume the four high bytes are set to zero */
-void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
+void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg, int flags)
 {
+    /* Only one extension flag may be present. */
+    tcg_debug_assert(!(flags & TCG_BSWAP_OS) || !(flags & TCG_BSWAP_OZ));
+
     if (TCG_TARGET_REG_BITS == 32) {
         tcg_gen_bswap32_i32(TCGV_LOW(ret), TCGV_LOW(arg));
-        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+        if (flags & TCG_BSWAP_OS) {
+            tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
+        } else {
+            tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+        }
     } else if (TCG_TARGET_HAS_bswap32_i64) {
-        tcg_gen_op3i_i64(INDEX_op_bswap32_i64, ret, arg,
-                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
+        tcg_gen_op3i_i64(INDEX_op_bswap32_i64, ret, arg, flags);
     } else {
         TCGv_i64 t0 = tcg_temp_new_i64();
         TCGv_i64 t1 = tcg_temp_new_i64();
         TCGv_i64 t2 = tcg_constant_i64(0x00ff00ff);
 
-                                        /* arg = ....abcd */
-        tcg_gen_shri_i64(t0, arg, 8);   /*  t0 = .....abc */
-        tcg_gen_and_i64(t1, arg, t2);   /*  t1 = .....b.d */
-        tcg_gen_and_i64(t0, t0, t2);    /*  t0 = .....a.c */
-        tcg_gen_shli_i64(t1, t1, 8);    /*  t1 = ....b.d. */
-        tcg_gen_or_i64(ret, t0, t1);    /* ret = ....badc */
+                                            /* arg = xxxxabcd */
+        tcg_gen_shri_i64(t0, arg, 8);       /*  t0 = .xxxxabc */
+        tcg_gen_and_i64(t1, arg, t2);       /*  t1 = .....b.d */
+        tcg_gen_and_i64(t0, t0, t2);        /*  t0 = .....a.c */
+        tcg_gen_shli_i64(t1, t1, 8);        /*  t1 = ....b.d. */
+        tcg_gen_or_i64(ret, t0, t1);        /* ret = ....badc */
 
-        tcg_gen_shli_i64(t1, ret, 48);  /*  t1 = dc...... */
-        tcg_gen_shri_i64(t0, ret, 16);  /*  t0 = ......ba */
-        tcg_gen_shri_i64(t1, t1, 32);   /*  t1 = ....dc.. */
-        tcg_gen_or_i64(ret, t0, t1);    /* ret = ....dcba */
+        tcg_gen_shli_i64(t1, ret, 48);      /*  t1 = dc...... */
+        tcg_gen_shri_i64(t0, ret, 16);      /*  t0 = ......ba */
+        if (flags & TCG_BSWAP_OS) {
+            tcg_gen_sari_i64(t1, t1, 32);   /*  t1 = ssssdc.. */
+        } else {
+            tcg_gen_shri_i64(t1, t1, 32);   /*  t1 = ....dc.. */
+        }
+        tcg_gen_or_i64(ret, t0, t1);        /* ret = ssssdcba */
 
         tcg_temp_free_i64(t0);
         tcg_temp_free_i64(t1);
@@ -2846,7 +2889,7 @@ void tcg_gen_qemu_ld_i32(TCGv_i32 val, TCGv addr, TCGArg idx, MemOp memop)
     if ((orig_memop ^ memop) & MO_BSWAP) {
         switch (orig_memop & MO_SIZE) {
         case MO_16:
-            tcg_gen_bswap16_i32(val, val);
+            tcg_gen_bswap16_i32(val, val, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
             if (orig_memop & MO_SIGN) {
                 tcg_gen_ext16s_i32(val, val);
             }
@@ -2874,7 +2917,7 @@ void tcg_gen_qemu_st_i32(TCGv_i32 val, TCGv addr, TCGArg idx, MemOp memop)
         switch (memop & MO_SIZE) {
         case MO_16:
             tcg_gen_ext16u_i32(swap, val);
-            tcg_gen_bswap16_i32(swap, swap);
+            tcg_gen_bswap16_i32(swap, swap, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
             break;
         case MO_32:
             tcg_gen_bswap32_i32(swap, val);
@@ -2935,13 +2978,13 @@ void tcg_gen_qemu_ld_i64(TCGv_i64 val, TCGv addr, TCGArg idx, MemOp memop)
     if ((orig_memop ^ memop) & MO_BSWAP) {
         switch (orig_memop & MO_SIZE) {
         case MO_16:
-            tcg_gen_bswap16_i64(val, val);
+            tcg_gen_bswap16_i64(val, val, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
             if (orig_memop & MO_SIGN) {
                 tcg_gen_ext16s_i64(val, val);
             }
             break;
         case MO_32:
-            tcg_gen_bswap32_i64(val, val);
+            tcg_gen_bswap32_i64(val, val, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
             if (orig_memop & MO_SIGN) {
                 tcg_gen_ext32s_i64(val, val);
             }
@@ -2975,11 +3018,11 @@ void tcg_gen_qemu_st_i64(TCGv_i64 val, TCGv addr, TCGArg idx, MemOp memop)
         switch (memop & MO_SIZE) {
         case MO_16:
             tcg_gen_ext16u_i64(swap, val);
-            tcg_gen_bswap16_i64(swap, swap);
+            tcg_gen_bswap16_i64(swap, swap, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
             break;
         case MO_32:
             tcg_gen_ext32u_i64(swap, val);
-            tcg_gen_bswap32_i64(swap, swap);
+            tcg_gen_bswap32_i64(swap, swap, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
             break;
         case MO_64:
             tcg_gen_bswap64_i64(swap, val);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 18/28] tcg: Make use of bswap flags in tcg_gen_qemu_ld_*
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (16 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 17/28] tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64 Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 15:04   ` Peter Maydell
  2021-06-22  6:41   ` Philippe Mathieu-Daudé
  2021-06-14  8:37 ` [PATCH 19/28] tcg: Make use of bswap flags in tcg_gen_qemu_st_* Richard Henderson
                   ` (10 subsequent siblings)
  28 siblings, 2 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

We can perform any required sign-extension via TCG_BSWAP_OS.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 3763285bb0..702da7afb7 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2876,7 +2876,7 @@ void tcg_gen_qemu_ld_i32(TCGv_i32 val, TCGv addr, TCGArg idx, MemOp memop)
     orig_memop = memop;
     if (!TCG_TARGET_HAS_MEMORY_BSWAP && (memop & MO_BSWAP)) {
         memop &= ~MO_BSWAP;
-        /* The bswap primitive requires zero-extended input.  */
+        /* The bswap primitive benefits from zero-extended input.  */
         if ((memop & MO_SSIZE) == MO_SW) {
             memop &= ~MO_SIGN;
         }
@@ -2889,10 +2889,9 @@ void tcg_gen_qemu_ld_i32(TCGv_i32 val, TCGv addr, TCGArg idx, MemOp memop)
     if ((orig_memop ^ memop) & MO_BSWAP) {
         switch (orig_memop & MO_SIZE) {
         case MO_16:
-            tcg_gen_bswap16_i32(val, val, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
-            if (orig_memop & MO_SIGN) {
-                tcg_gen_ext16s_i32(val, val);
-            }
+            tcg_gen_bswap16_i32(val, val, (orig_memop & MO_SIGN
+                                           ? TCG_BSWAP_IZ | TCG_BSWAP_OS
+                                           : TCG_BSWAP_IZ | TCG_BSWAP_OZ));
             break;
         case MO_32:
             tcg_gen_bswap32_i32(val, val);
@@ -2965,7 +2964,7 @@ void tcg_gen_qemu_ld_i64(TCGv_i64 val, TCGv addr, TCGArg idx, MemOp memop)
     orig_memop = memop;
     if (!TCG_TARGET_HAS_MEMORY_BSWAP && (memop & MO_BSWAP)) {
         memop &= ~MO_BSWAP;
-        /* The bswap primitive requires zero-extended input.  */
+        /* The bswap primitive benefits from zero-extended input.  */
         if ((memop & MO_SIGN) && (memop & MO_SIZE) < MO_64) {
             memop &= ~MO_SIGN;
         }
@@ -2976,18 +2975,15 @@ void tcg_gen_qemu_ld_i64(TCGv_i64 val, TCGv addr, TCGArg idx, MemOp memop)
     plugin_gen_mem_callbacks(addr, info);
 
     if ((orig_memop ^ memop) & MO_BSWAP) {
+        int flags = (orig_memop & MO_SIGN
+                     ? TCG_BSWAP_IZ | TCG_BSWAP_OS
+                     : TCG_BSWAP_IZ | TCG_BSWAP_OZ);
         switch (orig_memop & MO_SIZE) {
         case MO_16:
-            tcg_gen_bswap16_i64(val, val, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
-            if (orig_memop & MO_SIGN) {
-                tcg_gen_ext16s_i64(val, val);
-            }
+            tcg_gen_bswap16_i64(val, val, flags);
             break;
         case MO_32:
-            tcg_gen_bswap32_i64(val, val, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
-            if (orig_memop & MO_SIGN) {
-                tcg_gen_ext32s_i64(val, val);
-            }
+            tcg_gen_bswap32_i64(val, val, flags);
             break;
         case MO_64:
             tcg_gen_bswap64_i64(val, val);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 19/28] tcg: Make use of bswap flags in tcg_gen_qemu_st_*
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (17 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 18/28] tcg: Make use of bswap flags in tcg_gen_qemu_ld_* Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 15:05   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 20/28] target/arm: Improve REV32 Richard Henderson
                   ` (9 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

By removing TCG_BSWAP_IZ we indicate that the input is
not zero-extended, and thus can remove an explicit extend.
By removing TCG_BSWAP_OZ, we allow the implementation to
leave high bits set, which will be ignored by the store.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 702da7afb7..5dbcd1b0cf 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2915,8 +2915,7 @@ void tcg_gen_qemu_st_i32(TCGv_i32 val, TCGv addr, TCGArg idx, MemOp memop)
         swap = tcg_temp_new_i32();
         switch (memop & MO_SIZE) {
         case MO_16:
-            tcg_gen_ext16u_i32(swap, val);
-            tcg_gen_bswap16_i32(swap, swap, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
+            tcg_gen_bswap16_i32(swap, val, 0);
             break;
         case MO_32:
             tcg_gen_bswap32_i32(swap, val);
@@ -3013,12 +3012,10 @@ void tcg_gen_qemu_st_i64(TCGv_i64 val, TCGv addr, TCGArg idx, MemOp memop)
         swap = tcg_temp_new_i64();
         switch (memop & MO_SIZE) {
         case MO_16:
-            tcg_gen_ext16u_i64(swap, val);
-            tcg_gen_bswap16_i64(swap, swap, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
+            tcg_gen_bswap16_i64(swap, val, 0);
             break;
         case MO_32:
-            tcg_gen_ext32u_i64(swap, val);
-            tcg_gen_bswap32_i64(swap, swap, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
+            tcg_gen_bswap32_i64(swap, val, 0);
             break;
         case MO_64:
             tcg_gen_bswap64_i64(swap, val);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 20/28] target/arm: Improve REV32
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (18 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 19/28] tcg: Make use of bswap flags in tcg_gen_qemu_st_* Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-14  9:08   ` Philippe Mathieu-Daudé
  2021-06-21 15:08   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 21/28] target/arm: Improve vector REV Richard Henderson
                   ` (8 subsequent siblings)
  28 siblings, 2 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

For the sf version, we are performing two 32-bit bswaps
in either half of the register.  This is equivalent to
performing one 64-bit bswap followed by a rotate.

For the non-sf version, we can remove TCG_BSWAP_IZ
and the preceding zero-extension.

Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index e0785ce859..ea9d2d4647 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5430,22 +5430,13 @@ static void handle_rev32(DisasContext *s, unsigned int sf,
                          unsigned int rn, unsigned int rd)
 {
     TCGv_i64 tcg_rd = cpu_reg(s, rd);
+    TCGv_i64 tcg_rn = cpu_reg(s, rn);
 
     if (sf) {
-        TCGv_i64 tcg_tmp = tcg_temp_new_i64();
-        TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf);
-
-        /* bswap32_i64 requires zero high word */
-        tcg_gen_ext32u_i64(tcg_tmp, tcg_rn);
-        tcg_gen_bswap32_i64(tcg_rd, tcg_tmp, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
-        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-        tcg_gen_bswap32_i64(tcg_tmp, tcg_tmp, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
-        tcg_gen_concat32_i64(tcg_rd, tcg_rd, tcg_tmp);
-
-        tcg_temp_free_i64(tcg_tmp);
+        tcg_gen_bswap64_i64(tcg_rd, tcg_rn);
+        tcg_gen_rotri_i64(tcg_rd, tcg_rd, 32);
     } else {
-        tcg_gen_ext32u_i64(tcg_rd, cpu_reg(s, rn));
-        tcg_gen_bswap32_i64(tcg_rd, tcg_rd, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
+        tcg_gen_bswap32_i64(tcg_rd, tcg_rn, TCG_BSWAP_OZ);
     }
 }
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 21/28] target/arm: Improve vector REV
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (19 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 20/28] target/arm: Improve REV32 Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 15:10   ` Peter Maydell
  2021-06-22  6:44   ` Philippe Mathieu-Daudé
  2021-06-14  8:37 ` [PATCH 22/28] target/arm: Improve REVSH Richard Henderson
                   ` (7 subsequent siblings)
  28 siblings, 2 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

We can eliminate the requirement for a zero-extended output,
because the following store will ignore any garbage high bits.

Cc: Peter Maydell <peter.maydell@linaro.org> 
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-a64.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index ea9d2d4647..e12387e19c 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -12427,12 +12427,10 @@ static void handle_rev(DisasContext *s, int opcode, bool u,
             read_vec_element(s, tcg_tmp, rn, i, grp_size);
             switch (grp_size) {
             case MO_16:
-                tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp,
-                                    TCG_BSWAP_IZ | TCG_BSWAP_OZ);
+                tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp, TCG_BSWAP_IZ);
                 break;
             case MO_32:
-                tcg_gen_bswap32_i64(tcg_tmp, tcg_tmp,
-                                    TCG_BSWAP_IZ | TCG_BSWAP_OZ);
+                tcg_gen_bswap32_i64(tcg_tmp, tcg_tmp, TCG_BSWAP_IZ);
                 break;
             case MO_64:
                 tcg_gen_bswap64_i64(tcg_tmp, tcg_tmp);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 22/28] target/arm: Improve REVSH
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (20 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 21/28] target/arm: Improve vector REV Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-14  9:42   ` Philippe Mathieu-Daudé
  2021-06-21 15:11   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 23/28] target/i386: Improve bswap translation Richard Henderson
                   ` (6 subsequent siblings)
  28 siblings, 2 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

The new bswap flags can implement the semantics exactly.

Cc: Peter Maydell <peter.maydell@linaro.org> 
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 6b88163e3a..46d95d75ae 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -341,9 +341,7 @@ void gen_rev16(TCGv_i32 dest, TCGv_i32 var)
 /* Byteswap low halfword and sign extend.  */
 static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
 {
-    tcg_gen_ext16u_i32(var, var);
-    tcg_gen_bswap16_i32(var, var, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
-    tcg_gen_ext16s_i32(dest, var);
+    tcg_gen_bswap16_i32(var, var, TCG_BSWAP_OS);
 }
 
 /* Dual 16-bit add.  Result placed in t0 and t1 is marked as dead.
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 23/28] target/i386: Improve bswap translation
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (21 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 22/28] target/arm: Improve REVSH Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 15:15   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 24/28] target/sh4: Improve swap.b translation Richard Henderson
                   ` (5 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Eduardo Habkost

Use a break instead of an ifdefed else.
There's no need to move the values through s->T0.
Remove TCG_BSWAP_IZ and the preceding zero-extension.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/tcg/translate.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index e8a9dcd21a..b21873ed23 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -7195,17 +7195,11 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         reg = (b & 7) | REX_B(s);
 #ifdef TARGET_X86_64
         if (dflag == MO_64) {
-            gen_op_mov_v_reg(s, MO_64, s->T0, reg);
-            tcg_gen_bswap64_i64(s->T0, s->T0);
-            gen_op_mov_reg_v(s, MO_64, reg, s->T0);
-        } else
-#endif
-        {
-            gen_op_mov_v_reg(s, MO_32, s->T0, reg);
-            tcg_gen_ext32u_tl(s->T0, s->T0);
-            tcg_gen_bswap32_tl(s->T0, s->T0, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
-            gen_op_mov_reg_v(s, MO_32, reg, s->T0);
+            tcg_gen_bswap64_i64(cpu_regs[reg], cpu_regs[reg]);
+            break;
         }
+#endif
+        tcg_gen_bswap32_tl(cpu_regs[reg], cpu_regs[reg], TCG_BSWAP_OZ);
         break;
     case 0xd6: /* salc */
         if (CODE64(s))
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 24/28] target/sh4: Improve swap.b translation
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (22 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 23/28] target/i386: Improve bswap translation Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 15:16   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 25/28] target/mips: Fix gen_mxu_s32ldd_s32lddr Richard Henderson
                   ` (4 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: Yoshinori Sato

Remove TCG_BSWAP_IZ and the preceding zero-extension.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/sh4/translate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 147219759b..f45515952f 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -676,8 +676,7 @@ static void _decode_opc(DisasContext * ctx)
     case 0x6008:		/* swap.b Rm,Rn */
 	{
             TCGv low = tcg_temp_new();
-	    tcg_gen_ext16u_i32(low, REG(B7_4));
-	    tcg_gen_bswap16_i32(low, low, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
+	    tcg_gen_bswap16_i32(low, REG(B7_4), 0);
             tcg_gen_deposit_i32(REG(B11_8), REG(B7_4), low, 0, 16);
 	    tcg_temp_free(low);
 	}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 25/28] target/mips: Fix gen_mxu_s32ldd_s32lddr
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (23 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 24/28] target/sh4: Improve swap.b translation Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 20:59   ` Philippe Mathieu-Daudé
  2021-06-14  8:37 ` [PATCH 26/28] tcg/arm: Unset TCG_TARGET_HAS_MEMORY_BSWAP Richard Henderson
                   ` (3 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

There were two bugs here: (1) the required endianness was
not present in the MemOp, and (2) we were not providing a
zero-extended input to the bswap as semantics required.

The best fix is to fold the bswap into the memory operation,
producing the desired result directly.

Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/mips/tcg/mxu_translate.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/mips/tcg/mxu_translate.c b/target/mips/tcg/mxu_translate.c
index c12cf78df7..f4356432c7 100644
--- a/target/mips/tcg/mxu_translate.c
+++ b/target/mips/tcg/mxu_translate.c
@@ -857,12 +857,8 @@ static void gen_mxu_s32ldd_s32lddr(DisasContext *ctx)
         tcg_gen_ori_tl(t1, t1, 0xFFFFF000);
     }
     tcg_gen_add_tl(t1, t0, t1);
-    tcg_gen_qemu_ld_tl(t1, t1, ctx->mem_idx, MO_SL);
+    tcg_gen_qemu_ld_tl(t1, t1, ctx->mem_idx, MO_TESL ^ (sel * MO_BSWAP));
 
-    if (sel == 1) {
-        /* S32LDDR */
-        tcg_gen_bswap32_tl(t1, t1, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
-    }
     gen_store_mxu_gpr(t1, XRa);
 
     tcg_temp_free(t0);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 26/28] tcg/arm: Unset TCG_TARGET_HAS_MEMORY_BSWAP
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (24 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 25/28] target/mips: Fix gen_mxu_s32ldd_s32lddr Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 15:31   ` Peter Maydell
  2021-06-14  8:37 ` [PATCH 27/28] tcg/aarch64: " Richard Henderson
                   ` (2 subsequent siblings)
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

Now that the middle-end can replicate the same tricks as tcg/arm
used for optimizing bswap for signed loads and for stores, do not
pretend to have these memory ops in the backend.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/arm/tcg-target.h     |   2 +-
 tcg/arm/tcg-target.c.inc | 214 ++++++++++++++-------------------------
 2 files changed, 77 insertions(+), 139 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 57fd0c0c74..95fcef33bc 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -174,7 +174,7 @@ extern bool use_neon_instructions;
 #define TCG_TARGET_HAS_cmpsel_vec       0
 
 #define TCG_TARGET_DEFAULT_MO (0)
-#define TCG_TARGET_HAS_MEMORY_BSWAP     1
+#define TCG_TARGET_HAS_MEMORY_BSWAP     0
 
 /* not defined -- call should be eliminated at compile time */
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 9824e215be..35753c9e46 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1366,34 +1366,38 @@ static void tcg_out_vldst(TCGContext *s, ARMInsn insn,
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_ld_helpers[16] = {
+static void * const qemu_ld_helpers[8] = {
     [MO_UB]   = helper_ret_ldub_mmu,
     [MO_SB]   = helper_ret_ldsb_mmu,
-
-    [MO_LEUW] = helper_le_lduw_mmu,
-    [MO_LEUL] = helper_le_ldul_mmu,
-    [MO_LEQ]  = helper_le_ldq_mmu,
-    [MO_LESW] = helper_le_ldsw_mmu,
-    [MO_LESL] = helper_le_ldul_mmu,
-
-    [MO_BEUW] = helper_be_lduw_mmu,
-    [MO_BEUL] = helper_be_ldul_mmu,
-    [MO_BEQ]  = helper_be_ldq_mmu,
-    [MO_BESW] = helper_be_ldsw_mmu,
-    [MO_BESL] = helper_be_ldul_mmu,
+#ifdef HOST_WORDS_BIGENDIAN
+    [MO_UW] = helper_be_lduw_mmu,
+    [MO_UL] = helper_be_ldul_mmu,
+    [MO_Q]  = helper_be_ldq_mmu,
+    [MO_SW] = helper_be_ldsw_mmu,
+    [MO_SL] = helper_be_ldul_mmu,
+#else
+    [MO_UW] = helper_le_lduw_mmu,
+    [MO_UL] = helper_le_ldul_mmu,
+    [MO_Q]  = helper_le_ldq_mmu,
+    [MO_SW] = helper_le_ldsw_mmu,
+    [MO_SL] = helper_le_ldul_mmu,
+#endif
 };
 
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
  *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_st_helpers[16] = {
-    [MO_UB]   = helper_ret_stb_mmu,
-    [MO_LEUW] = helper_le_stw_mmu,
-    [MO_LEUL] = helper_le_stl_mmu,
-    [MO_LEQ]  = helper_le_stq_mmu,
-    [MO_BEUW] = helper_be_stw_mmu,
-    [MO_BEUL] = helper_be_stl_mmu,
-    [MO_BEQ]  = helper_be_stq_mmu,
+static void * const qemu_st_helpers[4] = {
+    [MO_8]   = helper_ret_stb_mmu,
+#ifdef HOST_WORDS_BIGENDIAN
+    [MO_16] = helper_be_stw_mmu,
+    [MO_32] = helper_be_stl_mmu,
+    [MO_64] = helper_be_stq_mmu,
+#else
+    [MO_16] = helper_le_stw_mmu,
+    [MO_32] = helper_le_stl_mmu,
+    [MO_64] = helper_le_stq_mmu,
+#endif
 };
 
 /* Helper routines for marshalling helper function arguments into
@@ -1598,9 +1602,9 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
        icache usage.  For pre-armv6, use the signed helpers since we do
        not have a single insn sign-extend.  */
     if (use_armv6_instructions) {
-        func = qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)];
+        func = qemu_ld_helpers[opc & MO_SIZE];
     } else {
-        func = qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)];
+        func = qemu_ld_helpers[opc & MO_SSIZE];
         if (opc & MO_SIGN) {
             opc = MO_UL;
         }
@@ -1678,7 +1682,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     argreg = tcg_out_arg_reg32(s, argreg, TCG_REG_R14);
 
     /* Tail-call to the helper, which will return to the fast path.  */
-    tcg_out_goto(s, COND_AL, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    tcg_out_goto(s, COND_AL, qemu_st_helpers[opc & MO_SIZE]);
     return true;
 }
 #endif /* SOFTMMU */
@@ -1687,7 +1691,8 @@ static inline void tcg_out_qemu_ld_index(TCGContext *s, MemOp opc,
                                          TCGReg datalo, TCGReg datahi,
                                          TCGReg addrlo, TCGReg addend)
 {
-    MemOp bswap = opc & MO_BSWAP;
+    /* Byte swapping is left to middle-end expansion. */
+    tcg_debug_assert((opc & MO_BSWAP) == 0);
 
     switch (opc & MO_SSIZE) {
     case MO_UB:
@@ -1698,51 +1703,30 @@ static inline void tcg_out_qemu_ld_index(TCGContext *s, MemOp opc,
         break;
     case MO_UW:
         tcg_out_ld16u_r(s, COND_AL, datalo, addrlo, addend);
-        if (bswap) {
-            tcg_out_bswap16(s, COND_AL, datalo, datalo,
-                            TCG_BSWAP_IZ | TCG_BSWAP_OZ);
-        }
         break;
     case MO_SW:
-        if (bswap) {
-            tcg_out_ld16u_r(s, COND_AL, datalo, addrlo, addend);
-            tcg_out_bswap16(s, COND_AL, datalo, datalo,
-                            TCG_BSWAP_IZ | TCG_BSWAP_OS);
-        } else {
-            tcg_out_ld16s_r(s, COND_AL, datalo, addrlo, addend);
-        }
+        tcg_out_ld16s_r(s, COND_AL, datalo, addrlo, addend);
         break;
     case MO_UL:
-    default:
         tcg_out_ld32_r(s, COND_AL, datalo, addrlo, addend);
-        if (bswap) {
-            tcg_out_bswap32(s, COND_AL, datalo, datalo);
-        }
         break;
     case MO_Q:
-        {
-            TCGReg dl = (bswap ? datahi : datalo);
-            TCGReg dh = (bswap ? datalo : datahi);
-
-            /* Avoid ldrd for user-only emulation, to handle unaligned.  */
-            if (USING_SOFTMMU && use_armv6_instructions
-                && (dl & 1) == 0 && dh == dl + 1) {
-                tcg_out_ldrd_r(s, COND_AL, dl, addrlo, addend);
-            } else if (dl != addend) {
-                tcg_out_ld32_rwb(s, COND_AL, dl, addend, addrlo);
-                tcg_out_ld32_12(s, COND_AL, dh, addend, 4);
-            } else {
-                tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_TMP,
-                                addend, addrlo, SHIFT_IMM_LSL(0));
-                tcg_out_ld32_12(s, COND_AL, dl, TCG_REG_TMP, 0);
-                tcg_out_ld32_12(s, COND_AL, dh, TCG_REG_TMP, 4);
-            }
-            if (bswap) {
-                tcg_out_bswap32(s, COND_AL, dl, dl);
-                tcg_out_bswap32(s, COND_AL, dh, dh);
-            }
+        /* Avoid ldrd for user-only emulation, to handle unaligned.  */
+        if (USING_SOFTMMU && use_armv6_instructions
+            && (datalo & 1) == 0 && datahi == datalo + 1) {
+            tcg_out_ldrd_r(s, COND_AL, datalo, addrlo, addend);
+        } else if (datalo != addend) {
+            tcg_out_ld32_rwb(s, COND_AL, datalo, addend, addrlo);
+            tcg_out_ld32_12(s, COND_AL, datahi, addend, 4);
+        } else {
+            tcg_out_dat_reg(s, COND_AL, ARITH_ADD, TCG_REG_TMP,
+                            addend, addrlo, SHIFT_IMM_LSL(0));
+            tcg_out_ld32_12(s, COND_AL, datalo, TCG_REG_TMP, 0);
+            tcg_out_ld32_12(s, COND_AL, datahi, TCG_REG_TMP, 4);
         }
         break;
+    default:
+        g_assert_not_reached();
     }
 }
 
@@ -1750,7 +1734,8 @@ static inline void tcg_out_qemu_ld_direct(TCGContext *s, MemOp opc,
                                           TCGReg datalo, TCGReg datahi,
                                           TCGReg addrlo)
 {
-    MemOp bswap = opc & MO_BSWAP;
+    /* Byte swapping is left to middle-end expansion. */
+    tcg_debug_assert((opc & MO_BSWAP) == 0);
 
     switch (opc & MO_SSIZE) {
     case MO_UB:
@@ -1761,49 +1746,28 @@ static inline void tcg_out_qemu_ld_direct(TCGContext *s, MemOp opc,
         break;
     case MO_UW:
         tcg_out_ld16u_8(s, COND_AL, datalo, addrlo, 0);
-        if (bswap) {
-            tcg_out_bswap16(s, COND_AL, datalo, datalo,
-                            TCG_BSWAP_IZ | TCG_BSWAP_OZ);
-        }
         break;
     case MO_SW:
-        if (bswap) {
-            tcg_out_ld16u_8(s, COND_AL, datalo, addrlo, 0);
-            tcg_out_bswap16(s, COND_AL, datalo, datalo,
-                            TCG_BSWAP_IZ | TCG_BSWAP_OS);
-        } else {
-            tcg_out_ld16s_8(s, COND_AL, datalo, addrlo, 0);
-        }
+        tcg_out_ld16s_8(s, COND_AL, datalo, addrlo, 0);
         break;
     case MO_UL:
-    default:
         tcg_out_ld32_12(s, COND_AL, datalo, addrlo, 0);
-        if (bswap) {
-            tcg_out_bswap32(s, COND_AL, datalo, datalo);
-        }
         break;
     case MO_Q:
-        {
-            TCGReg dl = (bswap ? datahi : datalo);
-            TCGReg dh = (bswap ? datalo : datahi);
-
-            /* Avoid ldrd for user-only emulation, to handle unaligned.  */
-            if (USING_SOFTMMU && use_armv6_instructions
-                && (dl & 1) == 0 && dh == dl + 1) {
-                tcg_out_ldrd_8(s, COND_AL, dl, addrlo, 0);
-            } else if (dl == addrlo) {
-                tcg_out_ld32_12(s, COND_AL, dh, addrlo, bswap ? 0 : 4);
-                tcg_out_ld32_12(s, COND_AL, dl, addrlo, bswap ? 4 : 0);
-            } else {
-                tcg_out_ld32_12(s, COND_AL, dl, addrlo, bswap ? 4 : 0);
-                tcg_out_ld32_12(s, COND_AL, dh, addrlo, bswap ? 0 : 4);
-            }
-            if (bswap) {
-                tcg_out_bswap32(s, COND_AL, dl, dl);
-                tcg_out_bswap32(s, COND_AL, dh, dh);
-            }
+        /* Avoid ldrd for user-only emulation, to handle unaligned.  */
+        if (USING_SOFTMMU && use_armv6_instructions
+            && (datalo & 1) == 0 && datahi == datalo + 1) {
+            tcg_out_ldrd_8(s, COND_AL, datalo, addrlo, 0);
+        } else if (datalo == addrlo) {
+            tcg_out_ld32_12(s, COND_AL, datahi, addrlo, 4);
+            tcg_out_ld32_12(s, COND_AL, datalo, addrlo, 0);
+        } else {
+            tcg_out_ld32_12(s, COND_AL, datalo, addrlo, 0);
+            tcg_out_ld32_12(s, COND_AL, datahi, addrlo, 4);
         }
         break;
+    default:
+        g_assert_not_reached();
     }
 }
 
@@ -1852,44 +1816,31 @@ static inline void tcg_out_qemu_st_index(TCGContext *s, int cond, MemOp opc,
                                          TCGReg datalo, TCGReg datahi,
                                          TCGReg addrlo, TCGReg addend)
 {
-    MemOp bswap = opc & MO_BSWAP;
+    /* Byte swapping is left to middle-end expansion. */
+    tcg_debug_assert((opc & MO_BSWAP) == 0);
 
     switch (opc & MO_SIZE) {
     case MO_8:
         tcg_out_st8_r(s, cond, datalo, addrlo, addend);
         break;
     case MO_16:
-        if (bswap) {
-            tcg_out_bswap16(s, cond, TCG_REG_R0, datalo, 0);
-            tcg_out_st16_r(s, cond, TCG_REG_R0, addrlo, addend);
-        } else {
-            tcg_out_st16_r(s, cond, datalo, addrlo, addend);
-        }
+        tcg_out_st16_r(s, cond, datalo, addrlo, addend);
         break;
     case MO_32:
-    default:
-        if (bswap) {
-            tcg_out_bswap32(s, cond, TCG_REG_R0, datalo);
-            tcg_out_st32_r(s, cond, TCG_REG_R0, addrlo, addend);
-        } else {
-            tcg_out_st32_r(s, cond, datalo, addrlo, addend);
-        }
+        tcg_out_st32_r(s, cond, datalo, addrlo, addend);
         break;
     case MO_64:
         /* Avoid strd for user-only emulation, to handle unaligned.  */
-        if (bswap) {
-            tcg_out_bswap32(s, cond, TCG_REG_R0, datahi);
-            tcg_out_st32_rwb(s, cond, TCG_REG_R0, addend, addrlo);
-            tcg_out_bswap32(s, cond, TCG_REG_R0, datalo);
-            tcg_out_st32_12(s, cond, TCG_REG_R0, addend, 4);
-        } else if (USING_SOFTMMU && use_armv6_instructions
-                   && (datalo & 1) == 0 && datahi == datalo + 1) {
+        if (USING_SOFTMMU && use_armv6_instructions
+            && (datalo & 1) == 0 && datahi == datalo + 1) {
             tcg_out_strd_r(s, cond, datalo, addrlo, addend);
         } else {
             tcg_out_st32_rwb(s, cond, datalo, addend, addrlo);
             tcg_out_st32_12(s, cond, datahi, addend, 4);
         }
         break;
+    default:
+        g_assert_not_reached();
     }
 }
 
@@ -1897,44 +1848,31 @@ static inline void tcg_out_qemu_st_direct(TCGContext *s, MemOp opc,
                                           TCGReg datalo, TCGReg datahi,
                                           TCGReg addrlo)
 {
-    MemOp bswap = opc & MO_BSWAP;
+    /* Byte swapping is left to middle-end expansion. */
+    tcg_debug_assert((opc & MO_BSWAP) == 0);
 
     switch (opc & MO_SIZE) {
     case MO_8:
         tcg_out_st8_12(s, COND_AL, datalo, addrlo, 0);
         break;
     case MO_16:
-        if (bswap) {
-            tcg_out_bswap16(s, COND_AL, TCG_REG_R0, datalo, 0);
-            tcg_out_st16_8(s, COND_AL, TCG_REG_R0, addrlo, 0);
-        } else {
-            tcg_out_st16_8(s, COND_AL, datalo, addrlo, 0);
-        }
+        tcg_out_st16_8(s, COND_AL, datalo, addrlo, 0);
         break;
     case MO_32:
-    default:
-        if (bswap) {
-            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, datalo);
-            tcg_out_st32_12(s, COND_AL, TCG_REG_R0, addrlo, 0);
-        } else {
-            tcg_out_st32_12(s, COND_AL, datalo, addrlo, 0);
-        }
+        tcg_out_st32_12(s, COND_AL, datalo, addrlo, 0);
         break;
     case MO_64:
         /* Avoid strd for user-only emulation, to handle unaligned.  */
-        if (bswap) {
-            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, datahi);
-            tcg_out_st32_12(s, COND_AL, TCG_REG_R0, addrlo, 0);
-            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, datalo);
-            tcg_out_st32_12(s, COND_AL, TCG_REG_R0, addrlo, 4);
-        } else if (USING_SOFTMMU && use_armv6_instructions
-                   && (datalo & 1) == 0 && datahi == datalo + 1) {
+        if (USING_SOFTMMU && use_armv6_instructions
+            && (datalo & 1) == 0 && datahi == datalo + 1) {
             tcg_out_strd_8(s, COND_AL, datalo, addrlo, 0);
         } else {
             tcg_out_st32_12(s, COND_AL, datalo, addrlo, 0);
             tcg_out_st32_12(s, COND_AL, datahi, addrlo, 4);
         }
         break;
+    default:
+        g_assert_not_reached();
     }
 }
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 27/28] tcg/aarch64: Unset TCG_TARGET_HAS_MEMORY_BSWAP
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (25 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 26/28] tcg/arm: Unset TCG_TARGET_HAS_MEMORY_BSWAP Richard Henderson
@ 2021-06-14  8:37 ` Richard Henderson
  2021-06-21 15:33   ` Peter Maydell
  2021-06-14  8:38 ` [PATCH 28/28] tcg/riscv: Remove MO_BSWAP handling Richard Henderson
  2021-06-14 22:41 ` [PATCH 00/28] tcg: bswap improvements no-reply
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:37 UTC (permalink / raw)
  To: qemu-devel

The memory bswap support in the aarch64 backend merely dates from
a time when it was required.  There is nothing special about the
backend support that could not have been provided by the middle-end
even prior to the introduction of the bswap flags.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h     |  2 +-
 tcg/aarch64/tcg-target.c.inc | 87 +++++++++++++-----------------------
 2 files changed, 32 insertions(+), 57 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index ef55f7c185..551baf8da3 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -148,7 +148,7 @@ typedef enum {
 #define TCG_TARGET_HAS_cmpsel_vec       0
 
 #define TCG_TARGET_DEFAULT_MO (0)
-#define TCG_TARGET_HAS_MEMORY_BSWAP     1
+#define TCG_TARGET_HAS_MEMORY_BSWAP     0
 
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index f72218b036..55eb98d0b1 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1557,28 +1557,34 @@ static void tcg_out_cltz(TCGContext *s, TCGType ext, TCGReg d,
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     TCGMemOpIdx oi, uintptr_t ra)
  */
-static void * const qemu_ld_helpers[16] = {
-    [MO_UB]   = helper_ret_ldub_mmu,
-    [MO_LEUW] = helper_le_lduw_mmu,
-    [MO_LEUL] = helper_le_ldul_mmu,
-    [MO_LEQ]  = helper_le_ldq_mmu,
-    [MO_BEUW] = helper_be_lduw_mmu,
-    [MO_BEUL] = helper_be_ldul_mmu,
-    [MO_BEQ]  = helper_be_ldq_mmu,
+static void * const qemu_ld_helpers[4] = {
+    [MO_8]  = helper_ret_ldub_mmu,
+#ifdef HOST_WORDS_BIGENDIAN
+    [MO_16] = helper_be_lduw_mmu,
+    [MO_32] = helper_be_ldul_mmu,
+    [MO_64] = helper_be_ldq_mmu,
+#else
+    [MO_16] = helper_le_lduw_mmu,
+    [MO_32] = helper_le_ldul_mmu,
+    [MO_64] = helper_le_ldq_mmu,
+#endif
 };
 
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
  *                                     uintxx_t val, TCGMemOpIdx oi,
  *                                     uintptr_t ra)
  */
-static void * const qemu_st_helpers[16] = {
-    [MO_UB]   = helper_ret_stb_mmu,
-    [MO_LEUW] = helper_le_stw_mmu,
-    [MO_LEUL] = helper_le_stl_mmu,
-    [MO_LEQ]  = helper_le_stq_mmu,
-    [MO_BEUW] = helper_be_stw_mmu,
-    [MO_BEUL] = helper_be_stl_mmu,
-    [MO_BEQ]  = helper_be_stq_mmu,
+static void * const qemu_st_helpers[4] = {
+    [MO_8]  = helper_ret_stb_mmu,
+#ifdef HOST_WORDS_BIGENDIAN
+    [MO_16] = helper_be_stw_mmu,
+    [MO_32] = helper_be_stl_mmu,
+    [MO_64] = helper_be_stq_mmu,
+#else
+    [MO_16] = helper_le_stw_mmu,
+    [MO_32] = helper_le_stl_mmu,
+    [MO_64] = helper_le_stq_mmu,
+#endif
 };
 
 static inline void tcg_out_adr(TCGContext *s, TCGReg rd, const void *target)
@@ -1602,7 +1608,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, oi);
     tcg_out_adr(s, TCG_REG_X3, lb->raddr);
-    tcg_out_call(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    tcg_out_call(s, qemu_ld_helpers[opc & MO_SIZE]);
     if (opc & MO_SIGN) {
         tcg_out_sxt(s, lb->type, size, lb->datalo_reg, TCG_REG_X0);
     } else {
@@ -1628,7 +1634,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_mov(s, size == MO_64, TCG_REG_X2, lb->datalo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, oi);
     tcg_out_adr(s, TCG_REG_X4, lb->raddr);
-    tcg_out_call(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
+    tcg_out_call(s, qemu_st_helpers[opc & MO_SIZE]);
     tcg_out_goto(s, lb->raddr);
     return true;
 }
@@ -1724,7 +1730,8 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, MemOp memop, TCGType ext,
                                    TCGReg data_r, TCGReg addr_r,
                                    TCGType otype, TCGReg off_r)
 {
-    const MemOp bswap = memop & MO_BSWAP;
+    /* Byte swapping is left to middle-end expansion. */
+    tcg_debug_assert((memop & MO_BSWAP) == 0);
 
     switch (memop & MO_SSIZE) {
     case MO_UB:
@@ -1736,40 +1743,19 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, MemOp memop, TCGType ext,
         break;
     case MO_UW:
         tcg_out_ldst_r(s, I3312_LDRH, data_r, addr_r, otype, off_r);
-        if (bswap) {
-            tcg_out_rev16(s, data_r, data_r);
-        }
         break;
     case MO_SW:
-        if (bswap) {
-            tcg_out_ldst_r(s, I3312_LDRH, data_r, addr_r, otype, off_r);
-            tcg_out_rev16(s, data_r, data_r);
-            tcg_out_sxt(s, ext, MO_16, data_r, data_r);
-        } else {
-            tcg_out_ldst_r(s, (ext ? I3312_LDRSHX : I3312_LDRSHW),
-                           data_r, addr_r, otype, off_r);
-        }
+        tcg_out_ldst_r(s, (ext ? I3312_LDRSHX : I3312_LDRSHW),
+                       data_r, addr_r, otype, off_r);
         break;
     case MO_UL:
         tcg_out_ldst_r(s, I3312_LDRW, data_r, addr_r, otype, off_r);
-        if (bswap) {
-            tcg_out_rev32(s, data_r, data_r);
-        }
         break;
     case MO_SL:
-        if (bswap) {
-            tcg_out_ldst_r(s, I3312_LDRW, data_r, addr_r, otype, off_r);
-            tcg_out_rev32(s, data_r, data_r);
-            tcg_out_sxt(s, TCG_TYPE_I64, MO_32, data_r, data_r);
-        } else {
-            tcg_out_ldst_r(s, I3312_LDRSWX, data_r, addr_r, otype, off_r);
-        }
+        tcg_out_ldst_r(s, I3312_LDRSWX, data_r, addr_r, otype, off_r);
         break;
     case MO_Q:
         tcg_out_ldst_r(s, I3312_LDRX, data_r, addr_r, otype, off_r);
-        if (bswap) {
-            tcg_out_rev64(s, data_r, data_r);
-        }
         break;
     default:
         tcg_abort();
@@ -1780,31 +1766,20 @@ static void tcg_out_qemu_st_direct(TCGContext *s, MemOp memop,
                                    TCGReg data_r, TCGReg addr_r,
                                    TCGType otype, TCGReg off_r)
 {
-    const MemOp bswap = memop & MO_BSWAP;
+    /* Byte swapping is left to middle-end expansion. */
+    tcg_debug_assert((memop & MO_BSWAP) == 0);
 
     switch (memop & MO_SIZE) {
     case MO_8:
         tcg_out_ldst_r(s, I3312_STRB, data_r, addr_r, otype, off_r);
         break;
     case MO_16:
-        if (bswap && data_r != TCG_REG_XZR) {
-            tcg_out_rev16(s, TCG_REG_TMP, data_r);
-            data_r = TCG_REG_TMP;
-        }
         tcg_out_ldst_r(s, I3312_STRH, data_r, addr_r, otype, off_r);
         break;
     case MO_32:
-        if (bswap && data_r != TCG_REG_XZR) {
-            tcg_out_rev32(s, TCG_REG_TMP, data_r);
-            data_r = TCG_REG_TMP;
-        }
         tcg_out_ldst_r(s, I3312_STRW, data_r, addr_r, otype, off_r);
         break;
     case MO_64:
-        if (bswap && data_r != TCG_REG_XZR) {
-            tcg_out_rev64(s, TCG_REG_TMP, data_r);
-            data_r = TCG_REG_TMP;
-        }
         tcg_out_ldst_r(s, I3312_STRX, data_r, addr_r, otype, off_r);
         break;
     default:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 28/28] tcg/riscv: Remove MO_BSWAP handling
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (26 preceding siblings ...)
  2021-06-14  8:37 ` [PATCH 27/28] tcg/aarch64: " Richard Henderson
@ 2021-06-14  8:38 ` Richard Henderson
  2021-06-18  6:30   ` Alistair Francis
  2021-06-14 22:41 ` [PATCH 00/28] tcg: bswap improvements no-reply
  28 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14  8:38 UTC (permalink / raw)
  To: qemu-devel

TCG_TARGET_HAS_MEMORY_BSWAP is already unset for this backend,
which means that MO_BSWAP be handled by the middle-end and
will never be seen by the backend.  Thus the indexes used with
qemu_{ld,st}_helpers will always be zero.

Tidy the comments and asserts in tcg_out_qemu_{ld,st}_direct.
It is not that we do not handle bswap "yet", but never will.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/riscv/tcg-target.c.inc | 64 ++++++++++++++++++++------------------
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index da7eecafc5..c16f96b401 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -852,37 +852,43 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     TCGMemOpIdx oi, uintptr_t ra)
  */
-static void * const qemu_ld_helpers[16] = {
-    [MO_UB]   = helper_ret_ldub_mmu,
-    [MO_SB]   = helper_ret_ldsb_mmu,
-    [MO_LEUW] = helper_le_lduw_mmu,
-    [MO_LESW] = helper_le_ldsw_mmu,
-    [MO_LEUL] = helper_le_ldul_mmu,
+static void * const qemu_ld_helpers[8] = {
+    [MO_UB] = helper_ret_ldub_mmu,
+    [MO_SB] = helper_ret_ldsb_mmu,
+#ifdef HOST_WORDS_BIGENDIAN
+    [MO_UW] = helper_be_lduw_mmu,
+    [MO_SW] = helper_be_ldsw_mmu,
+    [MO_UL] = helper_be_ldul_mmu,
 #if TCG_TARGET_REG_BITS == 64
-    [MO_LESL] = helper_le_ldsl_mmu,
+    [MO_SL] = helper_be_ldsl_mmu,
 #endif
-    [MO_LEQ]  = helper_le_ldq_mmu,
-    [MO_BEUW] = helper_be_lduw_mmu,
-    [MO_BESW] = helper_be_ldsw_mmu,
-    [MO_BEUL] = helper_be_ldul_mmu,
+    [MO_Q]  = helper_be_ldq_mmu,
+#else
+    [MO_UW] = helper_le_lduw_mmu,
+    [MO_SW] = helper_le_ldsw_mmu,
+    [MO_UL] = helper_le_ldul_mmu,
 #if TCG_TARGET_REG_BITS == 64
-    [MO_BESL] = helper_be_ldsl_mmu,
+    [MO_SL] = helper_le_ldsl_mmu,
+#endif
+    [MO_Q]  = helper_le_ldq_mmu,
 #endif
-    [MO_BEQ]  = helper_be_ldq_mmu,
 };
 
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
  *                                     uintxx_t val, TCGMemOpIdx oi,
  *                                     uintptr_t ra)
  */
-static void * const qemu_st_helpers[16] = {
-    [MO_UB]   = helper_ret_stb_mmu,
-    [MO_LEUW] = helper_le_stw_mmu,
-    [MO_LEUL] = helper_le_stl_mmu,
-    [MO_LEQ]  = helper_le_stq_mmu,
-    [MO_BEUW] = helper_be_stw_mmu,
-    [MO_BEUL] = helper_be_stl_mmu,
-    [MO_BEQ]  = helper_be_stq_mmu,
+static void * const qemu_st_helpers[4] = {
+    [MO_8]   = helper_ret_stb_mmu,
+#ifdef HOST_WORDS_BIGENDIAN
+    [MO_16] = helper_be_stw_mmu,
+    [MO_32] = helper_be_stl_mmu,
+    [MO_64] = helper_be_stq_mmu,
+#else
+    [MO_16] = helper_le_stw_mmu,
+    [MO_32] = helper_le_stl_mmu,
+    [MO_64] = helper_le_stq_mmu,
+#endif
 };
 
 /* We don't support oversize guests */
@@ -997,7 +1003,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
     tcg_out_movi(s, TCG_TYPE_PTR, a2, oi);
     tcg_out_movi(s, TCG_TYPE_PTR, a3, (tcg_target_long)l->raddr);
 
-    tcg_out_call(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)]);
+    tcg_out_call(s, qemu_ld_helpers[opc & MO_SSIZE]);
     tcg_out_mov(s, (opc & MO_SIZE) == MO_64, l->datalo_reg, a0);
 
     tcg_out_goto(s, l->raddr);
@@ -1042,7 +1048,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
     tcg_out_movi(s, TCG_TYPE_PTR, a3, oi);
     tcg_out_movi(s, TCG_TYPE_PTR, a4, (tcg_target_long)l->raddr);
 
-    tcg_out_call(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SSIZE)]);
+    tcg_out_call(s, qemu_st_helpers[opc & MO_SIZE]);
 
     tcg_out_goto(s, l->raddr);
     return true;
@@ -1052,10 +1058,8 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg lo, TCGReg hi,
                                    TCGReg base, MemOp opc, bool is_64)
 {
-    const MemOp bswap = opc & MO_BSWAP;
-
-    /* We don't yet handle byteswapping, assert */
-    g_assert(!bswap);
+    /* Byte swapping is left to middle-end expansion. */
+    tcg_debug_assert((opc & MO_BSWAP) == 0);
 
     switch (opc & (MO_SSIZE)) {
     case MO_UB:
@@ -1139,10 +1143,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
 static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg lo, TCGReg hi,
                                    TCGReg base, MemOp opc)
 {
-    const MemOp bswap = opc & MO_BSWAP;
-
-    /* We don't yet handle byteswapping, assert */
-    g_assert(!bswap);
+    /* Byte swapping is left to middle-end expansion. */
+    tcg_debug_assert((opc & MO_BSWAP) == 0);
 
     switch (opc & (MO_SSIZE)) {
     case MO_8:
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH 20/28] target/arm: Improve REV32
  2021-06-14  8:37 ` [PATCH 20/28] target/arm: Improve REV32 Richard Henderson
@ 2021-06-14  9:08   ` Philippe Mathieu-Daudé
  2021-06-21 15:08   ` Peter Maydell
  1 sibling, 0 replies; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-14  9:08 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Peter Maydell

On 6/14/21 10:37 AM, Richard Henderson wrote:
> For the sf version, we are performing two 32-bit bswaps
> in either half of the register.  This is equivalent to
> performing one 64-bit bswap followed by a rotate.
> 
> For the non-sf version, we can remove TCG_BSWAP_IZ
> and the preceding zero-extension.
> 
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-a64.c | 17 ++++-------------
>  1 file changed, 4 insertions(+), 13 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 01/28] tcg: Add flags argument to bswap opcodes
  2021-06-14  8:37 ` [PATCH 01/28] tcg: Add flags argument to bswap opcodes Richard Henderson
@ 2021-06-14  9:19   ` Philippe Mathieu-Daudé
  2021-06-14 11:49   ` Alex Bennée
  2021-06-21 13:41   ` Peter Maydell
  2 siblings, 0 replies; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-14  9:19 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 6/14/21 10:37 AM, Richard Henderson wrote:
> This will eventually simplify front-end usage, and will allow
> backends to unset TCG_TARGET_HAS_MEMORY_BSWAP without loss of
> optimization.
> 
> The argument is added during expansion, not currently exposed
> to the front end translators.  Non-zero values are not yet
> supported by any backends.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/tcg/tcg-opc.h | 10 +++++-----
>  include/tcg/tcg.h     | 12 ++++++++++++
>  tcg/tcg-op.c          | 13 ++++++++-----
>  tcg/README            | 18 ++++++++++--------
>  4 files changed, 35 insertions(+), 18 deletions(-)

> +/*
> + * Flags for the bswap opcodes.
> + * If IZ, the input is zero-extended, otherwise unknown.
> + * If OZ or OS, the output is zero- or sign-extended respectively,
> + * otherwise the high bits are undefined.
> + */
> +enum {
> +    TCG_BSWAP_IZ = 1,
> +    TCG_BSWAP_OZ = 2,
> +    TCG_BSWAP_OS = 4,

Matter of taste, I find "1 << x" bullet proof for flags.

Otherwise:
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 14/28] tcg/mips: Support bswap flags in tcg_out_bswap32
  2021-06-14  8:37 ` [PATCH 14/28] tcg/mips: Support bswap flags in tcg_out_bswap32 Richard Henderson
@ 2021-06-14  9:31   ` Philippe Mathieu-Daudé
  2021-06-14 15:49     ` Richard Henderson
  0 siblings, 1 reply; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-14  9:31 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 6/14/21 10:37 AM, Richard Henderson wrote:
> Merge tcg_out_bswap32 and tcg_out_bswap32s.  Use the flags
> in the internal uses for loads and stores.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/mips/tcg-target.c.inc | 39 ++++++++++++++++-----------------------
>  1 file changed, 16 insertions(+), 23 deletions(-)

> -static void tcg_out_bswap32(TCGContext *s, TCGReg ret, TCGReg arg)
> +static void tcg_out_bswap32(TCGContext *s, TCGReg ret, TCGReg arg, int flags)
>  {
>      if (use_mips32r2_instructions) {
>          tcg_out_opc_reg(s, OPC_WSBH, ret, 0, arg);
>          tcg_out_opc_sa(s, OPC_ROTR, ret, ret, 16);
> +        if (flags & TCG_BSWAP_OZ) {
> +            tcg_out_opc_bf(s, OPC_DEXT, ret, ret, 31, 0);

Maybe mention the rotr -> ext32u mips32r2 simplification?

Otherwise:
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 15/28] tcg/tci: Support bswap flags
  2021-06-14  8:37 ` [PATCH 15/28] tcg/tci: Support bswap flags Richard Henderson
@ 2021-06-14  9:33   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-14  9:33 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 6/14/21 10:37 AM, Richard Henderson wrote:
> The existing interpreter zero-extends, ignoring high bits.
> Simply add a separate sign-extension opcode if required.
> Ensure that the interpreter supports ext16s when bswap16 is enabled.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/tci.c                |  3 ++-
>  tcg/tci/tcg-target.c.inc | 23 ++++++++++++++++++++---
>  2 files changed, 22 insertions(+), 4 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 17/28] tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64
  2021-06-14  8:37 ` [PATCH 17/28] tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64 Richard Henderson
@ 2021-06-14  9:41   ` Philippe Mathieu-Daudé
  2021-06-14 15:58     ` Richard Henderson
  2021-06-21 15:01   ` Peter Maydell
  1 sibling, 1 reply; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-14  9:41 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 6/14/21 10:37 AM, Richard Henderson wrote:
> Implement the new semantics in the fallback expansion.
> Change all callers to supply the flags that keep the
> semantics unchanged locally.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/tcg/tcg-op.h            |   8 +--
>  target/arm/translate-a64.c      |  12 ++--
>  target/arm/translate.c          |   2 +-
>  target/i386/tcg/translate.c     |   2 +-
>  target/mips/tcg/mxu_translate.c |   2 +-
>  target/s390x/translate.c        |   4 +-
>  target/sh4/translate.c          |   2 +-

Various REV 16/32, would it be useful to have it as a TCG opcode?

>  tcg/tcg-op.c                    | 121 ++++++++++++++++++++++----------
>  8 files changed, 99 insertions(+), 54 deletions(-)

>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
>          TCGv_i64 t1 = tcg_temp_new_i64();
>          TCGv_i64 t2 = tcg_constant_i64(0x00ff00ff);
>  
> -                                        /* arg = ....abcd */
> -        tcg_gen_shri_i64(t0, arg, 8);   /*  t0 = .....abc */
> -        tcg_gen_and_i64(t1, arg, t2);   /*  t1 = .....b.d */
> -        tcg_gen_and_i64(t0, t0, t2);    /*  t0 = .....a.c */
> -        tcg_gen_shli_i64(t1, t1, 8);    /*  t1 = ....b.d. */
> -        tcg_gen_or_i64(ret, t0, t1);    /* ret = ....badc */
> +                                            /* arg = xxxxabcd */
> +        tcg_gen_shri_i64(t0, arg, 8);       /*  t0 = .xxxxabc */
> +        tcg_gen_and_i64(t1, arg, t2);       /*  t1 = .....b.d */
> +        tcg_gen_and_i64(t0, t0, t2);        /*  t0 = .....a.c */
> +        tcg_gen_shli_i64(t1, t1, 8);        /*  t1 = ....b.d. */
> +        tcg_gen_or_i64(ret, t0, t1);        /* ret = ....badc */
>  
> -        tcg_gen_shli_i64(t1, ret, 48);  /*  t1 = dc...... */
> -        tcg_gen_shri_i64(t0, ret, 16);  /*  t0 = ......ba */
> -        tcg_gen_shri_i64(t1, t1, 32);   /*  t1 = ....dc.. */
> -        tcg_gen_or_i64(ret, t0, t1);    /* ret = ....dcba */
> +        tcg_gen_shli_i64(t1, ret, 48);      /*  t1 = dc...... */
> +        tcg_gen_shri_i64(t0, ret, 16);      /*  t0 = ......ba */
> +        if (flags & TCG_BSWAP_OS) {
> +            tcg_gen_sari_i64(t1, t1, 32);   /*  t1 = ssssdc.. */
> +        } else {
> +            tcg_gen_shri_i64(t1, t1, 32);   /*  t1 = ....dc.. */
> +        }
> +        tcg_gen_or_i64(ret, t0, t1);        /* ret = ssssdcba */

Comment update appreciated, thanks.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 22/28] target/arm: Improve REVSH
  2021-06-14  8:37 ` [PATCH 22/28] target/arm: Improve REVSH Richard Henderson
@ 2021-06-14  9:42   ` Philippe Mathieu-Daudé
  2021-06-21 15:11   ` Peter Maydell
  1 sibling, 0 replies; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-14  9:42 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Peter Maydell

On 6/14/21 10:37 AM, Richard Henderson wrote:
> The new bswap flags can implement the semantics exactly.
> 
> Cc: Peter Maydell <peter.maydell@linaro.org> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 01/28] tcg: Add flags argument to bswap opcodes
  2021-06-14  8:37 ` [PATCH 01/28] tcg: Add flags argument to bswap opcodes Richard Henderson
  2021-06-14  9:19   ` Philippe Mathieu-Daudé
@ 2021-06-14 11:49   ` Alex Bennée
  2021-06-14 14:43     ` Richard Henderson
  2021-06-21 13:41   ` Peter Maydell
  2 siblings, 1 reply; 81+ messages in thread
From: Alex Bennée @ 2021-06-14 11:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

Richard Henderson <richard.henderson@linaro.org> writes:

> This will eventually simplify front-end usage, and will allow
> backends to unset TCG_TARGET_HAS_MEMORY_BSWAP without loss of
> optimization.
>
> The argument is added during expansion, not currently exposed
> to the front end translators.  Non-zero values are not yet
> supported by any backends.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/tcg/tcg-opc.h | 10 +++++-----
>  include/tcg/tcg.h     | 12 ++++++++++++
>  tcg/tcg-op.c          | 13 ++++++++-----
>  tcg/README            | 18 ++++++++++--------
>  4 files changed, 35 insertions(+), 18 deletions(-)
>
> diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
> index bbb0884af8..fddcc42cbd 100644
> --- a/include/tcg/tcg-opc.h
> +++ b/include/tcg/tcg-opc.h
> @@ -96,8 +96,8 @@ DEF(ext8s_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ext8s_i32))
>  DEF(ext16s_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ext16s_i32))
>  DEF(ext8u_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ext8u_i32))
>  DEF(ext16u_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ext16u_i32))
> -DEF(bswap16_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_bswap16_i32))
> -DEF(bswap32_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_bswap32_i32))
> +DEF(bswap16_i32, 1, 1, 1, IMPL(TCG_TARGET_HAS_bswap16_i32))
> +DEF(bswap32_i32, 1, 1, 1, IMPL(TCG_TARGET_HAS_bswap32_i32))
>  DEF(not_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_not_i32))
>  DEF(neg_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_neg_i32))
>  DEF(andc_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_andc_i32))
> @@ -165,9 +165,9 @@ DEF(ext32s_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext32s_i64))
>  DEF(ext8u_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext8u_i64))
>  DEF(ext16u_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext16u_i64))
>  DEF(ext32u_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ext32u_i64))
> -DEF(bswap16_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_bswap16_i64))
> -DEF(bswap32_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_bswap32_i64))
> -DEF(bswap64_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_bswap64_i64))
> +DEF(bswap16_i64, 1, 1, 1, IMPL64 | IMPL(TCG_TARGET_HAS_bswap16_i64))
> +DEF(bswap32_i64, 1, 1, 1, IMPL64 | IMPL(TCG_TARGET_HAS_bswap32_i64))
> +DEF(bswap64_i64, 1, 1, 1, IMPL64 | IMPL(TCG_TARGET_HAS_bswap64_i64))
>  DEF(not_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_not_i64))
>  DEF(neg_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_neg_i64))
>  DEF(andc_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_andc_i64))
> diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
> index 064dab383b..7a060e532d 100644
> --- a/include/tcg/tcg.h
> +++ b/include/tcg/tcg.h
> @@ -430,6 +430,18 @@ typedef enum {
>      TCG_COND_GTU    = 8 | 4 | 0 | 1,
>  } TCGCond;
>  
> +/*
> + * Flags for the bswap opcodes.
> + * If IZ, the input is zero-extended, otherwise unknown.
> + * If OZ or OS, the output is zero- or sign-extended respectively,
> + * otherwise the high bits are undefined.
> + */
> +enum {
> +    TCG_BSWAP_IZ = 1,
> +    TCG_BSWAP_OZ = 2,
> +    TCG_BSWAP_OS = 4,
> +};
> +

So is a TCG_BSWAP_IZ only really for cases where we have loaded up a
narrower width value into the "natural" TCG sized register? We seem to
assume this is always the case even though the TCG bswap op doesn't have
visibility of how the arg value was loaded.


>  /* Invert the sense of the comparison.  */
>  static inline TCGCond tcg_invert_cond(TCGCond c)
>  {
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index dcc2ed0bbc..dc65577e2f 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1005,7 +1005,8 @@ void tcg_gen_ext16u_i32(TCGv_i32 ret, TCGv_i32 arg)
>  void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg)
>  {
>      if (TCG_TARGET_HAS_bswap16_i32) {
> -        tcg_gen_op2_i32(INDEX_op_bswap16_i32, ret, arg);
> +        tcg_gen_op3i_i32(INDEX_op_bswap16_i32, ret, arg,
> +                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
>      } else {
>          TCGv_i32 t0 = tcg_temp_new_i32();
>  
> @@ -1020,7 +1021,7 @@ void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg)
>  void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
>  {
>      if (TCG_TARGET_HAS_bswap32_i32) {
> -        tcg_gen_op2_i32(INDEX_op_bswap32_i32, ret, arg);
> +        tcg_gen_op3i_i32(INDEX_op_bswap32_i32, ret, arg, 0);
>      } else {
>          TCGv_i32 t0 = tcg_temp_new_i32();
>          TCGv_i32 t1 = tcg_temp_new_i32();
> @@ -1661,7 +1662,8 @@ void tcg_gen_bswap16_i64(TCGv_i64 ret, TCGv_i64 arg)
>          tcg_gen_bswap16_i32(TCGV_LOW(ret), TCGV_LOW(arg));
>          tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
>      } else if (TCG_TARGET_HAS_bswap16_i64) {
> -        tcg_gen_op2_i64(INDEX_op_bswap16_i64, ret, arg);
> +        tcg_gen_op3i_i64(INDEX_op_bswap16_i64, ret, arg,
> +                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
>  
> @@ -1680,7 +1682,8 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
>          tcg_gen_bswap32_i32(TCGV_LOW(ret), TCGV_LOW(arg));
>          tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
>      } else if (TCG_TARGET_HAS_bswap32_i64) {
> -        tcg_gen_op2_i64(INDEX_op_bswap32_i64, ret, arg);
> +        tcg_gen_op3i_i64(INDEX_op_bswap32_i64, ret, arg,
> +                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
>          TCGv_i64 t1 = tcg_temp_new_i64();
> @@ -1717,7 +1720,7 @@ void tcg_gen_bswap64_i64(TCGv_i64 ret, TCGv_i64 arg)
>          tcg_temp_free_i32(t0);
>          tcg_temp_free_i32(t1);
>      } else if (TCG_TARGET_HAS_bswap64_i64) {
> -        tcg_gen_op2_i64(INDEX_op_bswap64_i64, ret, arg);
> +        tcg_gen_op3i_i64(INDEX_op_bswap64_i64, ret, arg, 0);
>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
>          TCGv_i64 t1 = tcg_temp_new_i64();
> diff --git a/tcg/README b/tcg/README
> index 8510d823e3..19fbf6ca52 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -295,19 +295,21 @@ ext32u_i64 t0, t1
>  
>  8, 16 or 32 bit sign/zero extension (both operands must have the same type)
>  
> -* bswap16_i32/i64 t0, t1
> +* bswap16_i32/i64 t0, t1, flags
>  
> -16 bit byte swap on a 32/64 bit value. It assumes that the two/six high order
> -bytes are set to zero.
> +16 bit byte swap on a 32/64 bit value.  The flags values control how
> +the input and output sign- or zero-extension is treated.
>  
> -* bswap32_i32/i64 t0, t1
> +* bswap32_i32/i64 t0, t1, flags
>  
> -32 bit byte swap on a 32/64 bit value. With a 64 bit value, it assumes that
> -the four high order bytes are set to zero.
> +32 bit byte swap on a 32/64 bit value.  For 32-bit value, the flags
> +are ignored; for a 64-bit value the flags values control how the
> +input and output sign- or zero-extension is treated.
>  
> -* bswap64_i64 t0, t1
> +* bswap64_i64 t0, t1, flags
>  
> -64 bit byte swap
> +64 bit byte swap.  The flags are ignored -- the argument is present
> +for consistency with the smaller bswaps.
>  
>  * discard_i32/i64 t0

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 01/28] tcg: Add flags argument to bswap opcodes
  2021-06-14 11:49   ` Alex Bennée
@ 2021-06-14 14:43     ` Richard Henderson
  0 siblings, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-14 14:43 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 6/14/21 4:49 AM, Alex Bennée wrote:
>> +/*
>> + * Flags for the bswap opcodes.
>> + * If IZ, the input is zero-extended, otherwise unknown.
>> + * If OZ or OS, the output is zero- or sign-extended respectively,
>> + * otherwise the high bits are undefined.
>> + */
>> +enum {
>> +    TCG_BSWAP_IZ = 1,
>> +    TCG_BSWAP_OZ = 2,
>> +    TCG_BSWAP_OS = 4,
>> +};
>> +
> 
> So is a TCG_BSWAP_IZ only really for cases where we have loaded up a
> narrower width value into the "natural" TCG sized register? We seem to
> assume this is always the case even though the TCG bswap op doesn't have
> visibility of how the arg value was loaded.

All of these flags are for narrower width bswap.  When you get to patch 17, you'll see 
that tcg_gen_bswap32_i32 and bswap64_i64 do not present this argument to the target/ front 
ends at all.

IZ is for when we know that we've used a zero-extending load feeding into the operation. 
I've also added code to the optimizer to *set* this bit when it can prove that the value 
feeding the bswap is zero-extended.


r~


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 14/28] tcg/mips: Support bswap flags in tcg_out_bswap32
  2021-06-14  9:31   ` Philippe Mathieu-Daudé
@ 2021-06-14 15:49     ` Richard Henderson
  0 siblings, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-14 15:49 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel

On 6/14/21 2:31 AM, Philippe Mathieu-Daudé wrote:
> On 6/14/21 10:37 AM, Richard Henderson wrote:
>> Merge tcg_out_bswap32 and tcg_out_bswap32s.  Use the flags
>> in the internal uses for loads and stores.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   tcg/mips/tcg-target.c.inc | 39 ++++++++++++++++-----------------------
>>   1 file changed, 16 insertions(+), 23 deletions(-)
> 
>> -static void tcg_out_bswap32(TCGContext *s, TCGReg ret, TCGReg arg)
>> +static void tcg_out_bswap32(TCGContext *s, TCGReg ret, TCGReg arg, int flags)
>>   {
>>       if (use_mips32r2_instructions) {
>>           tcg_out_opc_reg(s, OPC_WSBH, ret, 0, arg);
>>           tcg_out_opc_sa(s, OPC_ROTR, ret, ret, 16);
>> +        if (flags & TCG_BSWAP_OZ) {
>> +            tcg_out_opc_bf(s, OPC_DEXT, ret, ret, 31, 0);
> 
> Maybe mention the rotr -> ext32u mips32r2 simplification?

Oh, from dsbh + dshd + dsrl in the old bswap32u function?
Sure, I can do that.


r~


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 17/28] tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64
  2021-06-14  9:41   ` Philippe Mathieu-Daudé
@ 2021-06-14 15:58     ` Richard Henderson
  2021-06-22 10:20       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-14 15:58 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel

On 6/14/21 2:41 AM, Philippe Mathieu-Daudé wrote:
> On 6/14/21 10:37 AM, Richard Henderson wrote:
>> Implement the new semantics in the fallback expansion.
>> Change all callers to supply the flags that keep the
>> semantics unchanged locally.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   include/tcg/tcg-op.h            |   8 +--
>>   target/arm/translate-a64.c      |  12 ++--
>>   target/arm/translate.c          |   2 +-
>>   target/i386/tcg/translate.c     |   2 +-
>>   target/mips/tcg/mxu_translate.c |   2 +-
>>   target/s390x/translate.c        |   4 +-
>>   target/sh4/translate.c          |   2 +-
> 
> Various REV 16/32, would it be useful to have it as a TCG opcode?

Which operation are you proposing as tcg opcode?  The per-halfword swap akin to mips wsbh? 
  Yes, that operation also appears in arm (rev16) and ppc (brh).  So it's a reasonable 
thing to do.


r~


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 00/28] tcg: bswap improvements
  2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
                   ` (27 preceding siblings ...)
  2021-06-14  8:38 ` [PATCH 28/28] tcg/riscv: Remove MO_BSWAP handling Richard Henderson
@ 2021-06-14 22:41 ` no-reply
  28 siblings, 0 replies; 81+ messages in thread
From: no-reply @ 2021-06-14 22:41 UTC (permalink / raw)
  To: richard.henderson; +Cc: qemu-devel

Patchew URL: https://patchew.org/QEMU/20210614083800.1166166-1-richard.henderson@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210614083800.1166166-1-richard.henderson@linaro.org
Subject: [PATCH 00/28] tcg: bswap improvements

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
fdfe6d3 tcg/riscv: Remove MO_BSWAP handling
fb9c8e1 tcg/aarch64: Unset TCG_TARGET_HAS_MEMORY_BSWAP
75c5607 tcg/arm: Unset TCG_TARGET_HAS_MEMORY_BSWAP
a57e516 target/mips: Fix gen_mxu_s32ldd_s32lddr
e4a99d6 target/sh4: Improve swap.b translation
e8b1814 target/i386: Improve bswap translation
129291d target/arm: Improve REVSH
ff11784 target/arm: Improve vector REV
f23fd8e target/arm: Improve REV32
a53eb56 tcg: Make use of bswap flags in tcg_gen_qemu_st_*
ef89c56 tcg: Make use of bswap flags in tcg_gen_qemu_ld_*
13c8b52 tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64
68e5471 tcg: Handle new bswap flags during optimize
17db675 tcg/tci: Support bswap flags
ff37d58 tcg/mips: Support bswap flags in tcg_out_bswap32
718bb99 tcg/mips: Support bswap flags in tcg_out_bswap16
784afc4 tcg/s390: Support bswap flags
2f03527 tcg/ppc: Use power10 byte-reverse instructions
d446243 tcg/ppc: Support bswap flags
7e02157 tcg/ppc: Split out tcg_out_bswap64
17813eb tcg/ppc: Split out tcg_out_bswap32
3316b19 tcg/ppc: Split out tcg_out_bswap16
b075ca8 tcg/ppc: Split out tcg_out_sari{32,64}
3b8cbf1 tcg/ppc: Split out tcg_out_ext{8,16,32}s
8ddb299 tcg/arm: Support bswap flags
22141b0 tcg/aarch64: Support bswap flags
a88e86e tcg/i386: Support bswap flags
7355a7f tcg: Add flags argument to bswap opcodes

=== OUTPUT BEGIN ===
1/28 Checking commit 7355a7f32661 (tcg: Add flags argument to bswap opcodes)
2/28 Checking commit a88e86e6484d (tcg/i386: Support bswap flags)
3/28 Checking commit 22141b0c1f1d (tcg/aarch64: Support bswap flags)
4/28 Checking commit 8ddb299c393e (tcg/arm: Support bswap flags)
5/28 Checking commit 3b8cbf12df69 (tcg/ppc: Split out tcg_out_ext{8,16,32}s)
6/28 Checking commit b075ca8b4504 (tcg/ppc: Split out tcg_out_sari{32,64})
7/28 Checking commit 3316b1902c92 (tcg/ppc: Split out tcg_out_bswap16)
8/28 Checking commit 17813ebc9f38 (tcg/ppc: Split out tcg_out_bswap32)
9/28 Checking commit 7e02157c16ed (tcg/ppc: Split out tcg_out_bswap64)
10/28 Checking commit d44624327aee (tcg/ppc: Support bswap flags)
11/28 Checking commit 2f03527066a7 (tcg/ppc: Use power10 byte-reverse instructions)
12/28 Checking commit 784afc41c21e (tcg/s390: Support bswap flags)
13/28 Checking commit 718bb99a5724 (tcg/mips: Support bswap flags in tcg_out_bswap16)
14/28 Checking commit ff37d58c7324 (tcg/mips: Support bswap flags in tcg_out_bswap32)
15/28 Checking commit 17db67555db3 (tcg/tci: Support bswap flags)
16/28 Checking commit 68e5471a5b41 (tcg: Handle new bswap flags during optimize)
ERROR: spaces required around that ':' (ctx:VxE)
#40: FILE: tcg/optimize.c:1033:
+        CASE_OP_32_64(bswap16):
                               ^

ERROR: spaces required around that ':' (ctx:VxE)
#93: FILE: tcg/optimize.c:1188:
+        CASE_OP_32_64(bswap16):
                               ^

ERROR: spaces required around that ':' (ctx:VxE)
#94: FILE: tcg/optimize.c:1189:
+        CASE_OP_32_64(bswap32):
                               ^

total: 3 errors, 0 warnings, 82 lines checked

Patch 16/28 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

17/28 Checking commit 13c8b52b1b7a (tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64)
ERROR: code indent should never use tabs
#163: FILE: target/sh4/translate.c:680:
+^I    tcg_gen_bswap16_i32(low, low, TCG_BSWAP_IZ | TCG_BSWAP_OZ);$

total: 1 errors, 0 warnings, 296 lines checked

Patch 17/28 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

18/28 Checking commit ef89c566b6f3 (tcg: Make use of bswap flags in tcg_gen_qemu_ld_*)
19/28 Checking commit a53eb569d43b (tcg: Make use of bswap flags in tcg_gen_qemu_st_*)
20/28 Checking commit f23fd8e08da1 (target/arm: Improve REV32)
21/28 Checking commit ff11784030c8 (target/arm: Improve vector REV)
22/28 Checking commit 129291d55edb (target/arm: Improve REVSH)
23/28 Checking commit e8b18140cc85 (target/i386: Improve bswap translation)
24/28 Checking commit e4a99d6a7db1 (target/sh4: Improve swap.b translation)
ERROR: code indent should never use tabs
#26: FILE: target/sh4/translate.c:679:
+^I    tcg_gen_bswap16_i32(low, REG(B7_4), 0);$

total: 1 errors, 0 warnings, 9 lines checked

Patch 24/28 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

25/28 Checking commit a57e516cd8b6 (target/mips: Fix gen_mxu_s32ldd_s32lddr)
26/28 Checking commit 75c5607d9f40 (tcg/arm: Unset TCG_TARGET_HAS_MEMORY_BSWAP)
27/28 Checking commit fb9c8e15e0d3 (tcg/aarch64: Unset TCG_TARGET_HAS_MEMORY_BSWAP)
28/28 Checking commit fdfe6d304c35 (tcg/riscv: Remove MO_BSWAP handling)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210614083800.1166166-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 28/28] tcg/riscv: Remove MO_BSWAP handling
  2021-06-14  8:38 ` [PATCH 28/28] tcg/riscv: Remove MO_BSWAP handling Richard Henderson
@ 2021-06-18  6:30   ` Alistair Francis
  0 siblings, 0 replies; 81+ messages in thread
From: Alistair Francis @ 2021-06-18  6:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel@nongnu.org Developers

On Mon, Jun 14, 2021 at 6:54 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> TCG_TARGET_HAS_MEMORY_BSWAP is already unset for this backend,
> which means that MO_BSWAP be handled by the middle-end and
> will never be seen by the backend.  Thus the indexes used with
> qemu_{ld,st}_helpers will always be zero.
>
> Tidy the comments and asserts in tcg_out_qemu_{ld,st}_direct.
> It is not that we do not handle bswap "yet", but never will.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  tcg/riscv/tcg-target.c.inc | 64 ++++++++++++++++++++------------------
>  1 file changed, 33 insertions(+), 31 deletions(-)
>
> diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
> index da7eecafc5..c16f96b401 100644
> --- a/tcg/riscv/tcg-target.c.inc
> +++ b/tcg/riscv/tcg-target.c.inc
> @@ -852,37 +852,43 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
>  /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
>   *                                     TCGMemOpIdx oi, uintptr_t ra)
>   */
> -static void * const qemu_ld_helpers[16] = {
> -    [MO_UB]   = helper_ret_ldub_mmu,
> -    [MO_SB]   = helper_ret_ldsb_mmu,
> -    [MO_LEUW] = helper_le_lduw_mmu,
> -    [MO_LESW] = helper_le_ldsw_mmu,
> -    [MO_LEUL] = helper_le_ldul_mmu,
> +static void * const qemu_ld_helpers[8] = {
> +    [MO_UB] = helper_ret_ldub_mmu,
> +    [MO_SB] = helper_ret_ldsb_mmu,
> +#ifdef HOST_WORDS_BIGENDIAN
> +    [MO_UW] = helper_be_lduw_mmu,
> +    [MO_SW] = helper_be_ldsw_mmu,
> +    [MO_UL] = helper_be_ldul_mmu,
>  #if TCG_TARGET_REG_BITS == 64
> -    [MO_LESL] = helper_le_ldsl_mmu,
> +    [MO_SL] = helper_be_ldsl_mmu,
>  #endif
> -    [MO_LEQ]  = helper_le_ldq_mmu,
> -    [MO_BEUW] = helper_be_lduw_mmu,
> -    [MO_BESW] = helper_be_ldsw_mmu,
> -    [MO_BEUL] = helper_be_ldul_mmu,
> +    [MO_Q]  = helper_be_ldq_mmu,
> +#else
> +    [MO_UW] = helper_le_lduw_mmu,
> +    [MO_SW] = helper_le_ldsw_mmu,
> +    [MO_UL] = helper_le_ldul_mmu,
>  #if TCG_TARGET_REG_BITS == 64
> -    [MO_BESL] = helper_be_ldsl_mmu,
> +    [MO_SL] = helper_le_ldsl_mmu,
> +#endif
> +    [MO_Q]  = helper_le_ldq_mmu,
>  #endif
> -    [MO_BEQ]  = helper_be_ldq_mmu,
>  };
>
>  /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
>   *                                     uintxx_t val, TCGMemOpIdx oi,
>   *                                     uintptr_t ra)
>   */
> -static void * const qemu_st_helpers[16] = {
> -    [MO_UB]   = helper_ret_stb_mmu,
> -    [MO_LEUW] = helper_le_stw_mmu,
> -    [MO_LEUL] = helper_le_stl_mmu,
> -    [MO_LEQ]  = helper_le_stq_mmu,
> -    [MO_BEUW] = helper_be_stw_mmu,
> -    [MO_BEUL] = helper_be_stl_mmu,
> -    [MO_BEQ]  = helper_be_stq_mmu,
> +static void * const qemu_st_helpers[4] = {
> +    [MO_8]   = helper_ret_stb_mmu,
> +#ifdef HOST_WORDS_BIGENDIAN
> +    [MO_16] = helper_be_stw_mmu,
> +    [MO_32] = helper_be_stl_mmu,
> +    [MO_64] = helper_be_stq_mmu,
> +#else
> +    [MO_16] = helper_le_stw_mmu,
> +    [MO_32] = helper_le_stl_mmu,
> +    [MO_64] = helper_le_stq_mmu,
> +#endif
>  };
>
>  /* We don't support oversize guests */
> @@ -997,7 +1003,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
>      tcg_out_movi(s, TCG_TYPE_PTR, a2, oi);
>      tcg_out_movi(s, TCG_TYPE_PTR, a3, (tcg_target_long)l->raddr);
>
> -    tcg_out_call(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SSIZE)]);
> +    tcg_out_call(s, qemu_ld_helpers[opc & MO_SSIZE]);
>      tcg_out_mov(s, (opc & MO_SIZE) == MO_64, l->datalo_reg, a0);
>
>      tcg_out_goto(s, l->raddr);
> @@ -1042,7 +1048,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
>      tcg_out_movi(s, TCG_TYPE_PTR, a3, oi);
>      tcg_out_movi(s, TCG_TYPE_PTR, a4, (tcg_target_long)l->raddr);
>
> -    tcg_out_call(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SSIZE)]);
> +    tcg_out_call(s, qemu_st_helpers[opc & MO_SIZE]);
>
>      tcg_out_goto(s, l->raddr);
>      return true;
> @@ -1052,10 +1058,8 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
>  static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg lo, TCGReg hi,
>                                     TCGReg base, MemOp opc, bool is_64)
>  {
> -    const MemOp bswap = opc & MO_BSWAP;
> -
> -    /* We don't yet handle byteswapping, assert */
> -    g_assert(!bswap);
> +    /* Byte swapping is left to middle-end expansion. */
> +    tcg_debug_assert((opc & MO_BSWAP) == 0);
>
>      switch (opc & (MO_SSIZE)) {
>      case MO_UB:
> @@ -1139,10 +1143,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
>  static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg lo, TCGReg hi,
>                                     TCGReg base, MemOp opc)
>  {
> -    const MemOp bswap = opc & MO_BSWAP;
> -
> -    /* We don't yet handle byteswapping, assert */
> -    g_assert(!bswap);
> +    /* Byte swapping is left to middle-end expansion. */
> +    tcg_debug_assert((opc & MO_BSWAP) == 0);
>
>      switch (opc & (MO_SSIZE)) {
>      case MO_8:
> --
> 2.25.1
>
>


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 01/28] tcg: Add flags argument to bswap opcodes
  2021-06-14  8:37 ` [PATCH 01/28] tcg: Add flags argument to bswap opcodes Richard Henderson
  2021-06-14  9:19   ` Philippe Mathieu-Daudé
  2021-06-14 11:49   ` Alex Bennée
@ 2021-06-21 13:41   ` Peter Maydell
  2021-06-21 13:51     ` Peter Maydell
  2021-06-21 13:59     ` Richard Henderson
  2 siblings, 2 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 13:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:43, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This will eventually simplify front-end usage, and will allow
> backends to unset TCG_TARGET_HAS_MEMORY_BSWAP without loss of
> optimization.
>
> The argument is added during expansion, not currently exposed
> to the front end translators.  Non-zero values are not yet
> supported by any backends.

Here we say non-zero values are not yet supported by the backend...

> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index dcc2ed0bbc..dc65577e2f 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1005,7 +1005,8 @@ void tcg_gen_ext16u_i32(TCGv_i32 ret, TCGv_i32 arg)
>  void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg)
>  {
>      if (TCG_TARGET_HAS_bswap16_i32) {
> -        tcg_gen_op2_i32(INDEX_op_bswap16_i32, ret, arg);
> +        tcg_gen_op3i_i32(INDEX_op_bswap16_i32, ret, arg,
> +                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
>      } else {
>          TCGv_i32 t0 = tcg_temp_new_i32();

...but here we pass a non-zero value...

> @@ -1661,7 +1662,8 @@ void tcg_gen_bswap16_i64(TCGv_i64 ret, TCGv_i64 arg)
>          tcg_gen_bswap16_i32(TCGV_LOW(ret), TCGV_LOW(arg));
>          tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
>      } else if (TCG_TARGET_HAS_bswap16_i64) {
> -        tcg_gen_op2_i64(INDEX_op_bswap16_i64, ret, arg);
> +        tcg_gen_op3i_i64(INDEX_op_bswap16_i64, ret, arg,
> +                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
>

...and again here...

> @@ -1680,7 +1682,8 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
>          tcg_gen_bswap32_i32(TCGV_LOW(ret), TCGV_LOW(arg));
>          tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
>      } else if (TCG_TARGET_HAS_bswap32_i64) {
> -        tcg_gen_op2_i64(INDEX_op_bswap32_i64, ret, arg);
> +        tcg_gen_op3i_i64(INDEX_op_bswap32_i64, ret, arg,
> +                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
>          TCGv_i64 t1 = tcg_temp_new_i64();

...and here.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 01/28] tcg: Add flags argument to bswap opcodes
  2021-06-21 13:41   ` Peter Maydell
@ 2021-06-21 13:51     ` Peter Maydell
  2021-06-21 14:02       ` Richard Henderson
  2021-06-21 13:59     ` Richard Henderson
  1 sibling, 1 reply; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 13:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 21 Jun 2021 at 14:41, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Mon, 14 Jun 2021 at 09:43, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > This will eventually simplify front-end usage, and will allow
> > backends to unset TCG_TARGET_HAS_MEMORY_BSWAP without loss of
> > optimization.
> >
> > The argument is added during expansion, not currently exposed
> > to the front end translators.  Non-zero values are not yet
> > supported by any backends.
>
> Here we say non-zero values are not yet supported by the backend...

Looking at the tcg/README docs, I think what you mean is that
at the moment all the backends assume/require that the caller passes
one of TCG_BSWAP_IZ or (TCG_BSWAP_IZ | TCG_BSWAP_OZ), since the
pre-flags implementation requires the top bytes to be zero and leaves
them that way. But then the parts of your patch that pass in a zero
flags word would be wrong...

-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 02/28] tcg/i386: Support bswap flags
  2021-06-14  8:37 ` [PATCH 02/28] tcg/i386: Support bswap flags Richard Henderson
@ 2021-06-21 13:53   ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 13:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:46, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Retain the current rorw bswap16 expansion for the zero-in/zero-out case.
> Otherwise, perform a wider bswap plus a right-shift or extend.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 01/28] tcg: Add flags argument to bswap opcodes
  2021-06-21 13:41   ` Peter Maydell
  2021-06-21 13:51     ` Peter Maydell
@ 2021-06-21 13:59     ` Richard Henderson
  1 sibling, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-21 13:59 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On 6/21/21 6:41 AM, Peter Maydell wrote:
> On Mon, 14 Jun 2021 at 09:43, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> This will eventually simplify front-end usage, and will allow
>> backends to unset TCG_TARGET_HAS_MEMORY_BSWAP without loss of
>> optimization.
>>
>> The argument is added during expansion, not currently exposed
>> to the front end translators.  Non-zero values are not yet
>> supported by any backends.
> 
> Here we say non-zero values are not yet supported by the backend...

I meant to say "non-zeroed" wrt the inputs and outputs,
i.e. the previous semantics...

>> +        tcg_gen_op3i_i32(INDEX_op_bswap16_i32, ret, arg,
>> +                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);

which were these.


r~


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/28] tcg/aarch64: Support bswap flags
  2021-06-14  8:37 ` [PATCH 03/28] tcg/aarch64: " Richard Henderson
@ 2021-06-21 14:01   ` Peter Maydell
  2021-06-21 14:04     ` Richard Henderson
  2021-06-21 18:12     ` Richard Henderson
  0 siblings, 2 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 14:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:41, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/aarch64/tcg-target.c.inc | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
> index 27cde314a9..f72218b036 100644
> --- a/tcg/aarch64/tcg-target.c.inc
> +++ b/tcg/aarch64/tcg-target.c.inc
> @@ -2187,12 +2187,24 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          tcg_out_rev64(s, a0, a1);
>          break;
>      case INDEX_op_bswap32_i64:
> +        tcg_out_rev32(s, a0, a1);
> +        if (a2 & TCG_BSWAP_OS) {
> +            tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a0);
> +        }

Side note: it's rather confusing that tcg_out_rev32() doesn't
emit a REV32 insn (it emits REV with sf==0).

> +        break;
>      case INDEX_op_bswap32_i32:
>          tcg_out_rev32(s, a0, a1);
>          break;
>      case INDEX_op_bswap16_i64:
>      case INDEX_op_bswap16_i32:
>          tcg_out_rev16(s, a0, a1);
> +        if (a2 & TCG_BSWAP_OS) {
> +            /* Output must be sign-extended. */
> +            tcg_out_sxt(s, ext, MO_16, a0, a0);
> +        } else if ((a2 & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ) {
> +            /* Output must be zero-extended, but input isn't. */
> +            tcg_out_uxt(s, MO_16, a0, a0);
> +        }
>          break;

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 01/28] tcg: Add flags argument to bswap opcodes
  2021-06-21 13:51     ` Peter Maydell
@ 2021-06-21 14:02       ` Richard Henderson
  2021-06-21 14:15         ` Peter Maydell
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-21 14:02 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On 6/21/21 6:51 AM, Peter Maydell wrote:
> On Mon, 21 Jun 2021 at 14:41, Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> On Mon, 14 Jun 2021 at 09:43, Richard Henderson
>> <richard.henderson@linaro.org> wrote:
>>>
>>> This will eventually simplify front-end usage, and will allow
>>> backends to unset TCG_TARGET_HAS_MEMORY_BSWAP without loss of
>>> optimization.
>>>
>>> The argument is added during expansion, not currently exposed
>>> to the front end translators.  Non-zero values are not yet
>>> supported by any backends.
>>
>> Here we say non-zero values are not yet supported by the backend...
> 
> Looking at the tcg/README docs, I think what you mean is that
> at the moment all the backends assume/require that the caller passes
> one of TCG_BSWAP_IZ or (TCG_BSWAP_IZ | TCG_BSWAP_OZ), since the
> pre-flags implementation requires the top bytes to be zero and leaves
> them that way.

Correct.

> But then the parts of your patch that pass in a zero
> flags word would be wrong...

The parts that pass in a zero flags word are covered by, from the README:

> The flags are ignored -- the argument is present
> for consistency with the smaller bswaps.


r~


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/28] tcg/aarch64: Support bswap flags
  2021-06-21 14:01   ` Peter Maydell
@ 2021-06-21 14:04     ` Richard Henderson
  2021-06-21 18:12     ` Richard Henderson
  1 sibling, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-21 14:04 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On 6/21/21 7:01 AM, Peter Maydell wrote:
> On Mon, 14 Jun 2021 at 09:41, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   tcg/aarch64/tcg-target.c.inc | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
>> index 27cde314a9..f72218b036 100644
>> --- a/tcg/aarch64/tcg-target.c.inc
>> +++ b/tcg/aarch64/tcg-target.c.inc
>> @@ -2187,12 +2187,24 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>>           tcg_out_rev64(s, a0, a1);
>>           break;
>>       case INDEX_op_bswap32_i64:
>> +        tcg_out_rev32(s, a0, a1);
>> +        if (a2 & TCG_BSWAP_OS) {
>> +            tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a0);
>> +        }
> 
> Side note: it's rather confusing that tcg_out_rev32() doesn't
> emit a REV32 insn (it emits REV with sf==0).

Mm, yes.  I'll tidy that up in a separate patch.


r~


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 04/28] tcg/arm: Support bswap flags
  2021-06-14  8:37 ` [PATCH 04/28] tcg/arm: " Richard Henderson
@ 2021-06-21 14:09   ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 14:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:49, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Combine the three bswap16 routines, and differentiate via the flags.
> Use the correct flags combination from the load/store routines, and
> pass along the constant parameter from tcg_out_op.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/arm/tcg-target.c.inc | 78 ++++++++++++++++++++--------------------
>  1 file changed, 38 insertions(+), 40 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 01/28] tcg: Add flags argument to bswap opcodes
  2021-06-21 14:02       ` Richard Henderson
@ 2021-06-21 14:15         ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 14:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 21 Jun 2021 at 15:02, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/21/21 6:51 AM, Peter Maydell wrote:
> > On Mon, 21 Jun 2021 at 14:41, Peter Maydell <peter.maydell@linaro.org> wrote:
> >>
> >> On Mon, 14 Jun 2021 at 09:43, Richard Henderson
> >> <richard.henderson@linaro.org> wrote:
> >>>
> >>> This will eventually simplify front-end usage, and will allow
> >>> backends to unset TCG_TARGET_HAS_MEMORY_BSWAP without loss of
> >>> optimization.
> >>>
> >>> The argument is added during expansion, not currently exposed
> >>> to the front end translators.  Non-zero values are not yet
> >>> supported by any backends.
> >>
> >> Here we say non-zero values are not yet supported by the backend...
> >
> > Looking at the tcg/README docs, I think what you mean is that
> > at the moment all the backends assume/require that the caller passes
> > one of TCG_BSWAP_IZ or (TCG_BSWAP_IZ | TCG_BSWAP_OZ), since the
> > pre-flags implementation requires the top bytes to be zero and leaves
> > them that way.
>
> Correct.
>
> > But then the parts of your patch that pass in a zero
> > flags word would be wrong...
>
> The parts that pass in a zero flags word are covered by, from the README:
>
> > The flags are ignored -- the argument is present
> > for consistency with the smaller bswaps.

Ah, I see. If you fix up the commit message, maybe something like

# The argument is added during expansion, not currently exposed
# to the front end translators. The backends currently only support
# a flags value of either TCG_BSWAP_IZ, or (TCG_BSWAP_IZ | TCG_BSWAP_OZ),
# since they all require zero top bytes and leave them that way.
# At the existing callsites we pass in (TCG_BSWAP_IZ | TCG_BSWAP_OZ),
# except for the flags-ignored cases of a 32-bit swap of a 32-bit
# value and or a 64-bit swap of a 64-bit value, where we pass 0.

then

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

It would be nice to document the actual flag values/names in
the user-facing documentation, too.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 05/28] tcg/ppc: Split out tcg_out_ext{8,16,32}s
  2021-06-14  8:37 ` [PATCH 05/28] tcg/ppc: Split out tcg_out_ext{8,16,32}s Richard Henderson
@ 2021-06-21 14:18   ` Peter Maydell
  2021-06-21 20:44   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 14:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:52, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We will shortly require these in other context;
> make the expansion as clear as possible.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/ppc/tcg-target.c.inc | 31 +++++++++++++++++++++----------
>  1 file changed, 21 insertions(+), 10 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

Much nicer than those gotos...

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 06/28] tcg/ppc: Split out tcg_out_sari{32,64}
  2021-06-14  8:37 ` [PATCH 06/28] tcg/ppc: Split out tcg_out_sari{32,64} Richard Henderson
@ 2021-06-21 14:22   ` Peter Maydell
  2021-06-21 14:23     ` Richard Henderson
  0 siblings, 1 reply; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 14:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:43, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We will shortly require sari in other context;
> split out both for cleanliness sake.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/ppc/tcg-target.c.inc | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)

>              /* Limit immediate shift count lest we create an illegal insn.  */
> -            tcg_out32(s, SRAWI | RS(args[1]) | RA(args[0]) | SH(args[2] & 31));
> +            tcg_out_sari32(s, args[0], args[1], args[2] & 31);

Maybe the "& 31" would be better inside tcg_out_sari32()
rather than outside it? The sari64() implementation already
implicitly ignores high bits of an overlarge shift count, so
having sari32() do the same would be more consistent.

Either way
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 06/28] tcg/ppc: Split out tcg_out_sari{32,64}
  2021-06-21 14:22   ` Peter Maydell
@ 2021-06-21 14:23     ` Richard Henderson
  0 siblings, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-21 14:23 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On 6/21/21 7:22 AM, Peter Maydell wrote:
> On Mon, 14 Jun 2021 at 09:43, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> We will shortly require sari in other context;
>> split out both for cleanliness sake.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   tcg/ppc/tcg-target.c.inc | 15 ++++++++++++---
>>   1 file changed, 12 insertions(+), 3 deletions(-)
> 
>>               /* Limit immediate shift count lest we create an illegal insn.  */
>> -            tcg_out32(s, SRAWI | RS(args[1]) | RA(args[0]) | SH(args[2] & 31));
>> +            tcg_out_sari32(s, args[0], args[1], args[2] & 31);
> 
> Maybe the "& 31" would be better inside tcg_out_sari32()
> rather than outside it? The sari64() implementation already
> implicitly ignores high bits of an overlarge shift count, so
> having sari32() do the same would be more consistent.
> 
> Either way
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

Good idea.

r~


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 07/28] tcg/ppc: Split out tcg_out_bswap16
  2021-06-14  8:37 ` [PATCH 07/28] tcg/ppc: Split out tcg_out_bswap16 Richard Henderson
@ 2021-06-21 14:29   ` Peter Maydell
  2021-06-21 14:41     ` Richard Henderson
  0 siblings, 1 reply; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 14:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:47, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> With the use of a suitable temporary, we can use the same
> algorithm when src overlaps dst.  The result is the same
> number of instructions either way.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/ppc/tcg-target.c.inc | 26 +++++++++++---------------
>  1 file changed, 11 insertions(+), 15 deletions(-)
>
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index b49204f707..64c24382a8 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -788,6 +788,16 @@ static inline void tcg_out_sari64(TCGContext *s, TCGReg dst, TCGReg src, int c)
>      tcg_out32(s, SRADI | RA(dst) | RS(src) | SH(c & 0x1f) | ((c >> 4) & 2));
>  }
>
> +static void tcg_out_bswap16(TCGContext *s, TCGReg dst, TCGReg src)
> +{
> +    TCGReg tmp = dst == src ? TCG_REG_R0 : dst;
> +
> +                                                   /* src = abcd */
> +    tcg_out_rlw(s, RLWINM, tmp, src, 24, 24, 31);  /* tmp = 000c */
> +    tcg_out_rlw(s, RLWIMI, tmp, src, 8, 16, 23);   /* tmp = 00dc */
> +    tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
> +}

TCG_REG_R0 is implicitly available as a temp because it's not
listed in tcg_target_reg_alloc_order[], right? (There's a comment
in the definition of that array that says V0 and V1 are reserved
as temporaries, but not a comment about R0.)

>  /* Emit a move into ret of arg, if it can be done in one insn.  */
>  static bool tcg_out_movi_one(TCGContext *s, TCGReg ret, tcg_target_long arg)
>  {
> @@ -2779,21 +2789,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>
>      case INDEX_op_bswap16_i32:
>      case INDEX_op_bswap16_i64:
> -        a0 = args[0], a1 = args[1];
> -        /* a1 = abcd */
> -        if (a0 != a1) {
> -            /* a0 = (a1 r<< 24) & 0xff # 000c */
> -            tcg_out_rlw(s, RLWINM, a0, a1, 24, 24, 31);
> -            /* a0 = (a0 & ~0xff00) | (a1 r<< 8) & 0xff00 # 00dc */
> -            tcg_out_rlw(s, RLWIMI, a0, a1, 8, 16, 23);

Would be nice to keep these comments about what operations we think
the insns are doing, given that RLWINM and RLWIMI are pretty confusing.

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 08/28] tcg/ppc: Split out tcg_out_bswap32
  2021-06-14  8:37 ` [PATCH 08/28] tcg/ppc: Split out tcg_out_bswap32 Richard Henderson
@ 2021-06-21 14:30   ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 14:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:42, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/ppc/tcg-target.c.inc | 28 ++++++++++++----------------
>  1 file changed, 12 insertions(+), 16 deletions(-)
>
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index 64c24382a8..f0e42e4b88 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -798,6 +798,17 @@ static void tcg_out_bswap16(TCGContext *s, TCGReg dst, TCGReg src)
>      tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
>  }
>
> +static void tcg_out_bswap32(TCGContext *s, TCGReg dst, TCGReg src)
> +{
> +    TCGReg tmp = dst == src ? TCG_REG_R0 : dst;
> +
> +    /* Stolen from gcc's builtin_bswap32.             src = abcd */
> +    tcg_out_rlw(s, RLWINM, tmp, src, 8, 0, 31);    /* tmp = bcda */
> +    tcg_out_rlw(s, RLWIMI, tmp, src, 24, 0, 7);    /* tmp = dcda */
> +    tcg_out_rlw(s, RLWIMI, tmp, src, 24, 16, 23);  /* tmp = dcba */
> +    tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
> +}
> +
>  /* Emit a move into ret of arg, if it can be done in one insn.  */
>  static bool tcg_out_movi_one(TCGContext *s, TCGReg ret, tcg_target_long arg)
>  {
> @@ -2791,24 +2802,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_bswap16_i64:
>          tcg_out_bswap16(s, args[0], args[1]);
>          break;
> -
>      case INDEX_op_bswap32_i32:
>      case INDEX_op_bswap32_i64:
> -        /* Stolen from gcc's builtin_bswap32 */
> -        a1 = args[1];
> -        a0 = args[0] == a1 ? TCG_REG_R0 : args[0];
> -
> -        /* a1 = args[1] # abcd */
> -        /* a0 = rotate_left (a1, 8) # bcda */
> -        tcg_out_rlw(s, RLWINM, a0, a1, 8, 0, 31);
> -        /* a0 = (a0 & ~0xff000000) | ((a1 r<< 24) & 0xff000000) # dcda */
> -        tcg_out_rlw(s, RLWIMI, a0, a1, 24, 0, 7);
> -        /* a0 = (a0 & ~0x0000ff00) | ((a1 r<< 24) & 0x0000ff00) # dcba */
> -        tcg_out_rlw(s, RLWIMI, a0, a1, 24, 16, 23);
> -
> -        if (a0 == TCG_REG_R0) {
> -            tcg_out_mov(s, TCG_TYPE_REG, args[0], a0);
> -        }
> +        tcg_out_bswap32(s, args[0], args[1]);
>          break;
>

Same remark about keeping the comments; otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 09/28] tcg/ppc: Split out tcg_out_bswap64
  2021-06-14  8:37 ` [PATCH 09/28] tcg/ppc: Split out tcg_out_bswap64 Richard Henderson
@ 2021-06-21 14:35   ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 14:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:55, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/ppc/tcg-target.c.inc | 51 +++++++++++++++++-----------------------
>  1 file changed, 21 insertions(+), 30 deletions(-)
>
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index f0e42e4b88..690c77b4da 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -809,6 +809,26 @@ static void tcg_out_bswap32(TCGContext *s, TCGReg dst, TCGReg src)
>      tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
>  }
>
> +static void tcg_out_bswap64(TCGContext *s, TCGReg dst, TCGReg src)
> +{
> +    TCGReg t0 = dst == src ? TCG_REG_R0 : dst;
> +    TCGReg t1 = dst == src ? dst : TCG_REG_R0;
> +
> +                                                   /* src = abcd efgh */
> +    tcg_out_rlw(s, RLWINM, t0, src, 8, 0, 31);     /* t0  = 0000 fghe */
> +    tcg_out_rlw(s, RLWIMI, t0, src, 24, 0, 7);     /* t0  = 0000 hghe */
> +    tcg_out_rlw(s, RLWIMI, t0, src, 24, 16, 23);   /* t0  = 0000 hgfe */
> +
> +    tcg_out_rld(s, RLDICL, t0, t0, 32, 0);         /* t0  = hgfe 0000 */
> +    tcg_out_rld(s, RLDICL, t1, src, 32, 0);        /* t1  = efgh abcd */
> +
> +    tcg_out_rlw(s, RLWIMI, t0, t1, 8, 0, 31);      /* t0  = hgfe bcda */
> +    tcg_out_rlw(s, RLWIMI, t0, t1, 24, 0, 7);      /* t0  = hgfe dcda */
> +    tcg_out_rlw(s, RLWIMI, t0, t1, 24, 16, 23);    /* t0  = hgfe dcba */
> +
> +    tcg_out_mov(s, TCG_TYPE_REG, dst, t0);
> +}
> +
>  /* Emit a move into ret of arg, if it can be done in one insn.  */
>  static bool tcg_out_movi_one(TCGContext *s, TCGReg ret, tcg_target_long arg)
>  {
> @@ -2806,37 +2826,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_bswap32_i64:
>          tcg_out_bswap32(s, args[0], args[1]);
>          break;
> -
>      case INDEX_op_bswap64_i64:
> -        a0 = args[0], a1 = args[1], a2 = TCG_REG_R0;
> -        if (a0 == a1) {
> -            a0 = TCG_REG_R0;
> -            a2 = a1;
> -        }
> -
> -        /* a1 = # abcd efgh */
> -        /* a0 = rl32(a1, 8) # 0000 fghe */
> -        tcg_out_rlw(s, RLWINM, a0, a1, 8, 0, 31);
> -        /* a0 = dep(a0, rl32(a1, 24), 0xff000000) # 0000 hghe */
> -        tcg_out_rlw(s, RLWIMI, a0, a1, 24, 0, 7);
> -        /* a0 = dep(a0, rl32(a1, 24), 0x0000ff00) # 0000 hgfe */
> -        tcg_out_rlw(s, RLWIMI, a0, a1, 24, 16, 23);
> -
> -        /* a0 = rl64(a0, 32) # hgfe 0000 */
> -        /* a2 = rl64(a1, 32) # efgh abcd */
> -        tcg_out_rld(s, RLDICL, a0, a0, 32, 0);
> -        tcg_out_rld(s, RLDICL, a2, a1, 32, 0);
> -
> -        /* a0 = dep(a0, rl32(a2, 8), 0xffffffff)  # hgfe bcda */
> -        tcg_out_rlw(s, RLWIMI, a0, a2, 8, 0, 31);
> -        /* a0 = dep(a0, rl32(a2, 24), 0xff000000) # hgfe dcda */
> -        tcg_out_rlw(s, RLWIMI, a0, a2, 24, 0, 7);
> -        /* a0 = dep(a0, rl32(a2, 24), 0x0000ff00) # hgfe dcba */
> -        tcg_out_rlw(s, RLWIMI, a0, a2, 24, 16, 23);
> -
> -        if (a0 == 0) {
> -            tcg_out_mov(s, TCG_TYPE_REG, args[0], a0);
> -        }
> +        tcg_out_bswap64(s, args[0], args[1]);
>          break;
>

Again, keep the comments, otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 10/28] tcg/ppc: Support bswap flags
  2021-06-14  8:37 ` [PATCH 10/28] tcg/ppc: Support bswap flags Richard Henderson
@ 2021-06-21 14:38   ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 14:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:44, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> For INDEX_op_bswap32_i32, pass 0 for flags: input not zero-extended,
> output does not need extension within the host 64-bit register.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/ppc/tcg-target.c.inc | 38 +++++++++++++++++++++++++-------------
>  1 file changed, 25 insertions(+), 13 deletions(-)
>
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index 690c77b4da..e868417168 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -788,25 +788,35 @@ static inline void tcg_out_sari64(TCGContext *s, TCGReg dst, TCGReg src, int c)
>      tcg_out32(s, SRADI | RA(dst) | RS(src) | SH(c & 0x1f) | ((c >> 4) & 2));
>  }
>
> -static void tcg_out_bswap16(TCGContext *s, TCGReg dst, TCGReg src)
> +static void tcg_out_bswap16(TCGContext *s, TCGReg dst, TCGReg src, int flags)
>  {
>      TCGReg tmp = dst == src ? TCG_REG_R0 : dst;
>
> -                                                   /* src = abcd */
> -    tcg_out_rlw(s, RLWINM, tmp, src, 24, 24, 31);  /* tmp = 000c */
> -    tcg_out_rlw(s, RLWIMI, tmp, src, 8, 16, 23);   /* tmp = 00dc */
> -    tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
> +                                                   /* src = xxxx abcd */
> +    tcg_out_rlw(s, RLWINM, tmp, src, 24, 24, 31);  /* tmp = 0000 000c */
> +    tcg_out_rlw(s, RLWIMI, tmp, src, 8, 16, 23);   /* tmp = 0000 00dc */
> +
> +    if (flags & TCG_BSWAP_OS) {
> +        tcg_out_ext16s(s, dst, tmp);
> +    } else {
> +        tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
> +    }
>  }
>
> -static void tcg_out_bswap32(TCGContext *s, TCGReg dst, TCGReg src)
> +static void tcg_out_bswap32(TCGContext *s, TCGReg dst, TCGReg src, int flags)
>  {
>      TCGReg tmp = dst == src ? TCG_REG_R0 : dst;
>
> -    /* Stolen from gcc's builtin_bswap32.             src = abcd */
> -    tcg_out_rlw(s, RLWINM, tmp, src, 8, 0, 31);    /* tmp = bcda */
> -    tcg_out_rlw(s, RLWIMI, tmp, src, 24, 0, 7);    /* tmp = dcda */
> -    tcg_out_rlw(s, RLWIMI, tmp, src, 24, 16, 23);  /* tmp = dcba */
> -    tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
> +    /* Stolen from gcc's builtin_bswap32.             src = xxxx abcd */
> +    tcg_out_rlw(s, RLWINM, tmp, src, 8, 0, 31);    /* tmp = 0000 bcda */
> +    tcg_out_rlw(s, RLWIMI, tmp, src, 24, 0, 7);    /* tmp = 0000 dcda */
> +    tcg_out_rlw(s, RLWIMI, tmp, src, 24, 16, 23);  /* tmp = 0000 dcba */

I'm going to come back for v2 and review the version of this that has
the comments describing what the insns are doing, so I don't have
to try to cross-reference back to the earlier patch.

-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 07/28] tcg/ppc: Split out tcg_out_bswap16
  2021-06-21 14:29   ` Peter Maydell
@ 2021-06-21 14:41     ` Richard Henderson
  0 siblings, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-21 14:41 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On 6/21/21 7:29 AM, Peter Maydell wrote:
>> +static void tcg_out_bswap16(TCGContext *s, TCGReg dst, TCGReg src)
>> +{
>> +    TCGReg tmp = dst == src ? TCG_REG_R0 : dst;
>> +
>> +                                                   /* src = abcd */
>> +    tcg_out_rlw(s, RLWINM, tmp, src, 24, 24, 31);  /* tmp = 000c */
>> +    tcg_out_rlw(s, RLWIMI, tmp, src, 8, 16, 23);   /* tmp = 00dc */
>> +    tcg_out_mov(s, TCG_TYPE_REG, dst, tmp);
>> +}
> 
> TCG_REG_R0 is implicitly available as a temp because it's not
> listed in tcg_target_reg_alloc_order[], right?

Slightly better than that:

     tcg_regset_set_reg(s->reserved_regs, TCG_REG_R0); /* tcg temp */


> (There's a comment
> in the definition of that array that says V0 and V1 are reserved
> as temporaries, but not a comment about R0.)

Yeah, tcg/ppc/ is due for a bit of cleanup.

> Would be nice to keep these comments about what operations we think
> the insns are doing, given that RLWINM and RLWIMI are pretty confusing.

Hmm, I guess.  I found them intrusive.


r~


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 16/28] tcg: Handle new bswap flags during optimize
  2021-06-14  8:37 ` [PATCH 16/28] tcg: Handle new bswap flags during optimize Richard Henderson
@ 2021-06-21 14:47   ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 14:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:45, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Notice when the input is known to be zero-extended and force
> the TCG_BSWAP_IZ flag on.  Honor the TCG_BSWAP_OS bit during
> constant folding.  Propagate the input to the output mask.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/optimize.c | 56 +++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 51 insertions(+), 5 deletions(-)

Not really familiar with the TCG optimizer internals, so review
from somebody who is would be useful, but this looks plausible:

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 17/28] tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64
  2021-06-14  8:37 ` [PATCH 17/28] tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64 Richard Henderson
  2021-06-14  9:41   ` Philippe Mathieu-Daudé
@ 2021-06-21 15:01   ` Peter Maydell
  1 sibling, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 15:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:52, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Implement the new semantics in the fallback expansion.
> Change all callers to supply the flags that keep the
> semantics unchanged locally.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index dc65577e2f..3763285bb0 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -1001,20 +1001,35 @@ void tcg_gen_ext16u_i32(TCGv_i32 ret, TCGv_i32 arg)
>      }
>  }
>
> -/* Note: we assume the two high bytes are set to zero */
> -void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg)
> +void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg, int flags)
>  {
> +    /* Only one extension flag may be present. */
> +    tcg_debug_assert(!(flags & TCG_BSWAP_OS) || !(flags & TCG_BSWAP_OZ));
> +
>      if (TCG_TARGET_HAS_bswap16_i32) {
> -        tcg_gen_op3i_i32(INDEX_op_bswap16_i32, ret, arg,
> -                         TCG_BSWAP_IZ | TCG_BSWAP_OZ);
> +        tcg_gen_op3i_i32(INDEX_op_bswap16_i32, ret, arg, flags);
>      } else {
>          TCGv_i32 t0 = tcg_temp_new_i32();
> +        TCGv_i32 t1 = tcg_temp_new_i32();
>
> -        tcg_gen_ext8u_i32(t0, arg);
> -        tcg_gen_shli_i32(t0, t0, 8);
> -        tcg_gen_shri_i32(ret, arg, 8);
> -        tcg_gen_or_i32(ret, ret, t0);
> +        tcg_gen_shri_i32(t0, arg, 8);
> +        if (!(flags & TCG_BSWAP_IZ)) {
> +            tcg_gen_ext8u_i32(t0, t0);
> +        }
> +
> +        if (flags & TCG_BSWAP_OS) {
> +            tcg_gen_shli_i32(t1, t1, 24);

t1 hasn't been initialized yet. Should this be "tcg_gen_shli_i32(t1, arg, 24)" ?

> +            tcg_gen_sari_i32(t1, t1, 16);
> +        } else if (flags & TCG_BSWAP_OZ) {
> +            tcg_gen_ext8u_i32(t1, arg);
> +            tcg_gen_shli_i32(t1, t1, 8);
> +        } else {
> +            tcg_gen_shli_i32(t1, arg, 8);
> +        }
> +
> +        tcg_gen_or_i32(ret, t0, t1);
>          tcg_temp_free_i32(t0);
> +        tcg_temp_free_i32(t1);
>      }

>      } else {
>          TCGv_i64 t0 = tcg_temp_new_i64();
> +        TCGv_i64 t1 = tcg_temp_new_i64();
>
> -        tcg_gen_ext8u_i64(t0, arg);
> -        tcg_gen_shli_i64(t0, t0, 8);
> -        tcg_gen_shri_i64(ret, arg, 8);
> -        tcg_gen_or_i64(ret, ret, t0);
> +        tcg_gen_shri_i64(t0, arg, 8);
> +        if (!(flags & TCG_BSWAP_IZ)) {
> +            tcg_gen_ext8u_i64(t0, t0);
> +        }
> +
> +        if (flags & TCG_BSWAP_OS) {
> +            tcg_gen_shli_i64(t1, t1, 56);

Similarly here.

> +            tcg_gen_sari_i64(t1, t1, 48);
> +        } else if (flags & TCG_BSWAP_OZ) {
> +            tcg_gen_ext8u_i64(t1, arg);
> +            tcg_gen_shli_i64(t1, t1, 8);
> +        } else {
> +            tcg_gen_shli_i64(t1, arg, 8);
> +        }
> +
> +        tcg_gen_or_i64(ret, t0, t1);
>          tcg_temp_free_i64(t0);
> +        tcg_temp_free_i64(t1);
>      }
>  }

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 18/28] tcg: Make use of bswap flags in tcg_gen_qemu_ld_*
  2021-06-14  8:37 ` [PATCH 18/28] tcg: Make use of bswap flags in tcg_gen_qemu_ld_* Richard Henderson
@ 2021-06-21 15:04   ` Peter Maydell
  2021-06-22  6:41   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 15:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 10:02, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We can perform any required sign-extension via TCG_BSWAP_OS.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 19/28] tcg: Make use of bswap flags in tcg_gen_qemu_st_*
  2021-06-14  8:37 ` [PATCH 19/28] tcg: Make use of bswap flags in tcg_gen_qemu_st_* Richard Henderson
@ 2021-06-21 15:05   ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 15:05 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:51, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> By removing TCG_BSWAP_IZ we indicate that the input is
> not zero-extended, and thus can remove an explicit extend.
> By removing TCG_BSWAP_OZ, we allow the implementation to
> leave high bits set, which will be ignored by the store.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 20/28] target/arm: Improve REV32
  2021-06-14  8:37 ` [PATCH 20/28] target/arm: Improve REV32 Richard Henderson
  2021-06-14  9:08   ` Philippe Mathieu-Daudé
@ 2021-06-21 15:08   ` Peter Maydell
  1 sibling, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 15:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:38, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> For the sf version, we are performing two 32-bit bswaps
> in either half of the register.  This is equivalent to
> performing one 64-bit bswap followed by a rotate.
>
> For the non-sf version, we can remove TCG_BSWAP_IZ
> and the preceding zero-extension.
>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 21/28] target/arm: Improve vector REV
  2021-06-14  8:37 ` [PATCH 21/28] target/arm: Improve vector REV Richard Henderson
@ 2021-06-21 15:10   ` Peter Maydell
  2021-06-22  6:44   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 15:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:38, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We can eliminate the requirement for a zero-extended output,
> because the following store will ignore any garbage high bits.
>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 22/28] target/arm: Improve REVSH
  2021-06-14  8:37 ` [PATCH 22/28] target/arm: Improve REVSH Richard Henderson
  2021-06-14  9:42   ` Philippe Mathieu-Daudé
@ 2021-06-21 15:11   ` Peter Maydell
  1 sibling, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 15:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:38, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The new bswap flags can implement the semantics exactly.
>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 23/28] target/i386: Improve bswap translation
  2021-06-14  8:37 ` [PATCH 23/28] target/i386: Improve bswap translation Richard Henderson
@ 2021-06-21 15:15   ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 15:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Paolo Bonzini, QEMU Developers, Eduardo Habkost

On Mon, 14 Jun 2021 at 09:48, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Use a break instead of an ifdefed else.
> There's no need to move the values through s->T0.
> Remove TCG_BSWAP_IZ and the preceding zero-extension.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 24/28] target/sh4: Improve swap.b translation
  2021-06-14  8:37 ` [PATCH 24/28] target/sh4: Improve swap.b translation Richard Henderson
@ 2021-06-21 15:16   ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 15:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Yoshinori Sato

On Mon, 14 Jun 2021 at 09:54, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Remove TCG_BSWAP_IZ and the preceding zero-extension.
>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/sh4/translate.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/target/sh4/translate.c b/target/sh4/translate.c
> index 147219759b..f45515952f 100644
> --- a/target/sh4/translate.c
> +++ b/target/sh4/translate.c
> @@ -676,8 +676,7 @@ static void _decode_opc(DisasContext * ctx)
>      case 0x6008:               /* swap.b Rm,Rn */
>         {
>              TCGv low = tcg_temp_new();
> -           tcg_gen_ext16u_i32(low, REG(B7_4));
> -           tcg_gen_bswap16_i32(low, low, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
> +           tcg_gen_bswap16_i32(low, REG(B7_4), 0);
>              tcg_gen_deposit_i32(REG(B11_8), REG(B7_4), low, 0, 16);
>             tcg_temp_free(low);
>         }
> --

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 26/28] tcg/arm: Unset TCG_TARGET_HAS_MEMORY_BSWAP
  2021-06-14  8:37 ` [PATCH 26/28] tcg/arm: Unset TCG_TARGET_HAS_MEMORY_BSWAP Richard Henderson
@ 2021-06-21 15:31   ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 15:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:58, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Now that the middle-end can replicate the same tricks as tcg/arm
> used for optimizing bswap for signed loads and for stores, do not
> pretend to have these memory ops in the backend.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/arm/tcg-target.h     |   2 +-
>  tcg/arm/tcg-target.c.inc | 214 ++++++++++++++-------------------------
>  2 files changed, 77 insertions(+), 139 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 27/28] tcg/aarch64: Unset TCG_TARGET_HAS_MEMORY_BSWAP
  2021-06-14  8:37 ` [PATCH 27/28] tcg/aarch64: " Richard Henderson
@ 2021-06-21 15:33   ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 15:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 14 Jun 2021 at 09:55, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The memory bswap support in the aarch64 backend merely dates from
> a time when it was required.  There is nothing special about the
> backend support that could not have been provided by the middle-end
> even prior to the introduction of the bswap flags.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/28] tcg/aarch64: Support bswap flags
  2021-06-21 14:01   ` Peter Maydell
  2021-06-21 14:04     ` Richard Henderson
@ 2021-06-21 18:12     ` Richard Henderson
  2021-06-21 19:40       ` Peter Maydell
  1 sibling, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-21 18:12 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On 6/21/21 7:01 AM, Peter Maydell wrote:
> Side note: it's rather confusing that tcg_out_rev32() doesn't
> emit a REV32 insn (it emits REV with sf==0).

Which is REV with SF=0 also has OPC=10, which is REV32.

So I think it's a simple case of the assembly mnemonics not matching up with the machine 
instructions as nicely as all that.


r~


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/28] tcg/aarch64: Support bswap flags
  2021-06-21 18:12     ` Richard Henderson
@ 2021-06-21 19:40       ` Peter Maydell
  2021-06-21 19:50         ` Richard Henderson
  0 siblings, 1 reply; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 19:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 21 Jun 2021 at 19:12, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/21/21 7:01 AM, Peter Maydell wrote:
> > Side note: it's rather confusing that tcg_out_rev32() doesn't
> > emit a REV32 insn (it emits REV with sf==0).
>
> Which is REV with SF=0 also has OPC=10, which is REV32.

No, REV32 has SF=1. The two operations are different:

 REV <Wd>, <Wn> -- swaps byte order of the bottom 32 bits
                   (zeroes the top half of Xd, as usual for Wn ops)
 REV32 <Xd>, <Xn> -- swaps byte order of bottom 32 bits and
                     also swaps byte order of top 32 bits
                     (ie it is a 64-bit to 64-bit operation
                      which does does two bswap32()s)

-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/28] tcg/aarch64: Support bswap flags
  2021-06-21 19:40       ` Peter Maydell
@ 2021-06-21 19:50         ` Richard Henderson
  2021-06-21 19:52           ` Peter Maydell
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Henderson @ 2021-06-21 19:50 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On 6/21/21 12:40 PM, Peter Maydell wrote:
> On Mon, 21 Jun 2021 at 19:12, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 6/21/21 7:01 AM, Peter Maydell wrote:
>>> Side note: it's rather confusing that tcg_out_rev32() doesn't
>>> emit a REV32 insn (it emits REV with sf==0).
>>
>> Which is REV with SF=0 also has OPC=10, which is REV32.
> 
> No, REV32 has SF=1. The two operations are different:
> 
>   REV <Wd>, <Wn> -- swaps byte order of the bottom 32 bits
>                     (zeroes the top half of Xd, as usual for Wn ops)
>   REV32 <Xd>, <Xn> -- swaps byte order of bottom 32 bits and
>                       also swaps byte order of top 32 bits
>                       (ie it is a 64-bit to 64-bit operation
>                        which does does two bswap32()s)

Ignore the assembler mnemonic and look at the opcode:

REV   Wd,Wn   = SF=0, OPC=10
REV32 Xd,Xn   = SF=1, OPC=10
REV   Xd,Xn   = SF=1, OPC=11

REV(Wd,Wd) = (uint32_t)REV32(Xd,Xd)
i.e. the usual interpretation of sf=0.


r~


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/28] tcg/aarch64: Support bswap flags
  2021-06-21 19:50         ` Richard Henderson
@ 2021-06-21 19:52           ` Peter Maydell
  0 siblings, 0 replies; 81+ messages in thread
From: Peter Maydell @ 2021-06-21 19:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Mon, 21 Jun 2021 at 20:50, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 6/21/21 12:40 PM, Peter Maydell wrote:
> > On Mon, 21 Jun 2021 at 19:12, Richard Henderson
> > <richard.henderson@linaro.org> wrote:
> >>
> >> On 6/21/21 7:01 AM, Peter Maydell wrote:
> >>> Side note: it's rather confusing that tcg_out_rev32() doesn't
> >>> emit a REV32 insn (it emits REV with sf==0).
> >>
> >> Which is REV with SF=0 also has OPC=10, which is REV32.
> >
> > No, REV32 has SF=1. The two operations are different:
> >
> >   REV <Wd>, <Wn> -- swaps byte order of the bottom 32 bits
> >                     (zeroes the top half of Xd, as usual for Wn ops)
> >   REV32 <Xd>, <Xn> -- swaps byte order of bottom 32 bits and
> >                       also swaps byte order of top 32 bits
> >                       (ie it is a 64-bit to 64-bit operation
> >                        which does does two bswap32()s)
>
> Ignore the assembler mnemonic and look at the opcode:

...but the point is that tcg_out_rev32() is not doing the
thing that the assembler insn REV32 does, so ignoring the
mnemonic is missing the point.

> REV   Wd,Wn   = SF=0, OPC=10
> REV32 Xd,Xn   = SF=1, OPC=10
> REV   Xd,Xn   = SF=1, OPC=11
>
> REV(Wd,Wd) = (uint32_t)REV32(Xd,Xd)
> i.e. the usual interpretation of sf=0.

You could look at it that way, but that's not the way
the insn mnemonics have been defined...

-- PMM


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 05/28] tcg/ppc: Split out tcg_out_ext{8,16,32}s
  2021-06-14  8:37 ` [PATCH 05/28] tcg/ppc: Split out tcg_out_ext{8,16,32}s Richard Henderson
  2021-06-21 14:18   ` Peter Maydell
@ 2021-06-21 20:44   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-21 20:44 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 6/14/21 10:37 AM, Richard Henderson wrote:
> We will shortly require these in other context;
> make the expansion as clear as possible.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/ppc/tcg-target.c.inc | 31 +++++++++++++++++++++----------
>  1 file changed, 21 insertions(+), 10 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 25/28] target/mips: Fix gen_mxu_s32ldd_s32lddr
  2021-06-14  8:37 ` [PATCH 25/28] target/mips: Fix gen_mxu_s32ldd_s32lddr Richard Henderson
@ 2021-06-21 20:59   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-21 20:59 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Craig Janeczek

Cc'ing Craig

On 6/14/21 10:37 AM, Richard Henderson wrote:
> There were two bugs here: (1) the required endianness was
> not present in the MemOp, and (2) we were not providing a
> zero-extended input to the bswap as semantics required.
> 
> The best fix is to fold the bswap into the memory operation,
> producing the desired result directly.
> 
> Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/mips/tcg/mxu_translate.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/target/mips/tcg/mxu_translate.c b/target/mips/tcg/mxu_translate.c
> index c12cf78df7..f4356432c7 100644
> --- a/target/mips/tcg/mxu_translate.c
> +++ b/target/mips/tcg/mxu_translate.c
> @@ -857,12 +857,8 @@ static void gen_mxu_s32ldd_s32lddr(DisasContext *ctx)
>          tcg_gen_ori_tl(t1, t1, 0xFFFFF000);
>      }
>      tcg_gen_add_tl(t1, t0, t1);
> -    tcg_gen_qemu_ld_tl(t1, t1, ctx->mem_idx, MO_SL);
> +    tcg_gen_qemu_ld_tl(t1, t1, ctx->mem_idx, MO_TESL ^ (sel * MO_BSWAP));
>  
> -    if (sel == 1) {
> -        /* S32LDDR */
> -        tcg_gen_bswap32_tl(t1, t1, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
> -    }
>      gen_store_mxu_gpr(t1, XRa);
>  
>      tcg_temp_free(t0);
> 

Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 13/28] tcg/mips: Support bswap flags in tcg_out_bswap16
  2021-06-14  8:37 ` [PATCH 13/28] tcg/mips: Support bswap flags in tcg_out_bswap16 Richard Henderson
@ 2021-06-22  6:36   ` Philippe Mathieu-Daudé
  2021-06-22 13:54     ` Richard Henderson
  0 siblings, 1 reply; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-22  6:36 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi Richard,

On 6/14/21 10:37 AM, Richard Henderson wrote:
> Merge tcg_out_bswap16 and tcg_out_bswap16s.  Use the flags
> in the internal uses for loads and stores.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/mips/tcg-target.c.inc | 60 ++++++++++++++++++---------------------
>  1 file changed, 28 insertions(+), 32 deletions(-)
> 
> diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
> index 5944448b2a..7a5634419c 100644
> --- a/tcg/mips/tcg-target.c.inc
> +++ b/tcg/mips/tcg-target.c.inc
> @@ -540,39 +540,36 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
>      }
>  }
>  
> -static inline void tcg_out_bswap16(TCGContext *s, TCGReg ret, TCGReg arg)
> +static void tcg_out_bswap16(TCGContext *s, TCGReg ret, TCGReg arg, int flags)
>  {
> +    /* ret and arg can't be register tmp0 */
> +    tcg_debug_assert(ret != TCG_TMP0);
> +    tcg_debug_assert(arg != TCG_TMP0);
> +
>      if (use_mips32r2_instructions) {
>          tcg_out_opc_reg(s, OPC_WSBH, ret, 0, arg);
> -    } else {
> -        /* ret and arg can't be register at */
> -        if (ret == TCG_TMP0 || arg == TCG_TMP0) {
> -            tcg_abort();
> +        if (flags & TCG_BSWAP_OS) {
> +            tcg_out_opc_reg(s, OPC_SEH, ret, 0, ret);
> +        } else if ((flags & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ) {
> +            tcg_out_opc_imm(s, OPC_ANDI, ret, ret, 0xffff);
>          }
> -
> -        tcg_out_opc_sa(s, OPC_SRL, TCG_TMP0, arg, 8);
> -        tcg_out_opc_sa(s, OPC_SLL, ret, arg, 8);
> -        tcg_out_opc_imm(s, OPC_ANDI, ret, ret, 0xff00);
> -        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
> +        return;
>      }
> -}
>  
> -static inline void tcg_out_bswap16s(TCGContext *s, TCGReg ret, TCGReg arg)
> -{
> -    if (use_mips32r2_instructions) {
> -        tcg_out_opc_reg(s, OPC_WSBH, ret, 0, arg);
> -        tcg_out_opc_reg(s, OPC_SEH, ret, 0, ret);
> -    } else {
> -        /* ret and arg can't be register at */
> -        if (ret == TCG_TMP0 || arg == TCG_TMP0) {
> -            tcg_abort();
> -        }
> -
> -        tcg_out_opc_sa(s, OPC_SRL, TCG_TMP0, arg, 8);
> +    tcg_out_opc_sa(s, OPC_SRL, TCG_TMP0, arg, 8);
> +    if (!(flags & TCG_BSWAP_IZ)) {
> +        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP0, 0x00ff);
> +    }
> +    if (flags & TCG_BSWAP_OS) {
>          tcg_out_opc_sa(s, OPC_SLL, ret, arg, 24);
>          tcg_out_opc_sa(s, OPC_SRA, ret, ret, 16);
> -        tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
> +    } else {
> +        tcg_out_opc_sa(s, OPC_SLL, ret, arg, 8);
> +        if (flags & TCG_BSWAP_OZ) {
> +            tcg_out_opc_imm(s, OPC_ANDI, ret, ret, 0xff00);
> +        }
>      }
> +    tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0);
>  }

Do you mind including the comments (after reviewing them ;) )?

static void tcg_out_bswap16(TCGContext *s, TCGReg ret, TCGReg arg, int
flags)
{
    /* ret and arg can't be register tmp0 */
    tcg_debug_assert(ret != TCG_TMP0);
    tcg_debug_assert(arg != TCG_TMP0);

                                                        /* src = abcd
efgh */
    if (use_mips32r2_instructions) {
        tcg_out_opc_reg(s, OPC_WSBH, ret, 0, arg);      /* ret = cdab
ghef */
        if (flags & TCG_BSWAP_OS) {
            tcg_out_opc_reg(s, OPC_SEH, ret, 0, ret);   /* ret = ssss
ghef */
        } else if ((flags & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) ==
TCG_BSWAP_OZ) {
            tcg_out_opc_imm(s, OPC_ANDI, ret, ret, 0xffff);
                                                        /* ret = 0000
ghef */
        }
        return;
    }

    tcg_out_opc_sa(s, OPC_SRL, TCG_TMP0, arg, 8);       /* t0  = ssab
cdef */
    if (!(flags & TCG_BSWAP_IZ)) {                      /* t0  = 0000
00ef */
        tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP0, 0x00ff);
    }
    if (flags & TCG_BSWAP_OS) {
        tcg_out_opc_sa(s, OPC_SLL, ret, arg, 24);       /* ret = gh..
.... */
        tcg_out_opc_sa(s, OPC_SRA, ret, ret, 16);       /* ret = ssss
gh.. */
    } else {
        tcg_out_opc_sa(s, OPC_SLL, ret, arg, 8);        /* ret = cdef
gh.. */
        if (flags & TCG_BSWAP_OZ) {                     /* ret = 0000
gh.. */
            tcg_out_opc_imm(s, OPC_ANDI, ret, ret, 0xff00);
        }
    }
    tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0); /* OZ: ret = 0000
ghef */
                                                    /* OS: ret = ssss
ghef */
}

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 18/28] tcg: Make use of bswap flags in tcg_gen_qemu_ld_*
  2021-06-14  8:37 ` [PATCH 18/28] tcg: Make use of bswap flags in tcg_gen_qemu_ld_* Richard Henderson
  2021-06-21 15:04   ` Peter Maydell
@ 2021-06-22  6:41   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-22  6:41 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 6/14/21 10:37 AM, Richard Henderson wrote:
> We can perform any required sign-extension via TCG_BSWAP_OS.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/tcg-op.c | 24 ++++++++++--------------
>  1 file changed, 10 insertions(+), 14 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 21/28] target/arm: Improve vector REV
  2021-06-14  8:37 ` [PATCH 21/28] target/arm: Improve vector REV Richard Henderson
  2021-06-21 15:10   ` Peter Maydell
@ 2021-06-22  6:44   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-22  6:44 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Peter Maydell

On 6/14/21 10:37 AM, Richard Henderson wrote:
> We can eliminate the requirement for a zero-extended output,
> because the following store will ignore any garbage high bits.
> 
> Cc: Peter Maydell <peter.maydell@linaro.org> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-a64.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 17/28] tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64
  2021-06-14 15:58     ` Richard Henderson
@ 2021-06-22 10:20       ` Philippe Mathieu-Daudé
  2021-06-22 13:56         ` Richard Henderson
  0 siblings, 1 reply; 81+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-06-22 10:20 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 6/14/21 5:58 PM, Richard Henderson wrote:
> On 6/14/21 2:41 AM, Philippe Mathieu-Daudé wrote:
>> On 6/14/21 10:37 AM, Richard Henderson wrote:
>>> Implement the new semantics in the fallback expansion.
>>> Change all callers to supply the flags that keep the
>>> semantics unchanged locally.
>>>
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>> ---
>>>   include/tcg/tcg-op.h            |   8 +--
>>>   target/arm/translate-a64.c      |  12 ++--
>>>   target/arm/translate.c          |   2 +-
>>>   target/i386/tcg/translate.c     |   2 +-
>>>   target/mips/tcg/mxu_translate.c |   2 +-
>>>   target/s390x/translate.c        |   4 +-
>>>   target/sh4/translate.c          |   2 +-
>>
>> Various REV 16/32, would it be useful to have it as a TCG opcode?
> 
> Which operation are you proposing as tcg opcode?  The per-halfword swap
> akin to mips wsbh?  Yes, that operation also appears in arm (rev16) and
> ppc (brh).  So it's a reasonable thing to do.

and REV32 for PPC BRW?

Another I noticed is popcnt.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 13/28] tcg/mips: Support bswap flags in tcg_out_bswap16
  2021-06-22  6:36   ` Philippe Mathieu-Daudé
@ 2021-06-22 13:54     ` Richard Henderson
  0 siblings, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-22 13:54 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel

On 6/21/21 11:36 PM, Philippe Mathieu-Daudé wrote:
>                                                          /* src = abcd
> efgh */
>      if (use_mips32r2_instructions) {
>          tcg_out_opc_reg(s, OPC_WSBH, ret, 0, arg);      /* ret = cdab
> ghef */

badc -- bytes swapped in halfwords.  Also, this is a 32-bit insn, so 4 bytes is sufficient.

>          if (flags & TCG_BSWAP_OS) {
>              tcg_out_opc_reg(s, OPC_SEH, ret, 0, ret);   /* ret = ssss
> ghef */

(ssss)ssdc

Again, 32-bit insn, but implicitly sign-extending to 64-bits as per standard mips convention.

>          } else if ((flags & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) ==
> TCG_BSWAP_OZ) {
>              tcg_out_opc_imm(s, OPC_ANDI, ret, ret, 0xffff);
>                                                          /* ret = 0000
> ghef */

(0000)00dc.

>      tcg_out_opc_sa(s, OPC_SRL, TCG_TMP0, arg, 8);       /* t0  = ssab
> cdef */
>      if (!(flags & TCG_BSWAP_IZ)) {                      /* t0  = 0000
> 00ef */
>          tcg_out_opc_imm(s, OPC_ANDI, TCG_TMP0, TCG_TMP0, 0x00ff);
>      }
>      if (flags & TCG_BSWAP_OS) {
>          tcg_out_opc_sa(s, OPC_SLL, ret, arg, 24);       /* ret = gh..
> .... */
>          tcg_out_opc_sa(s, OPC_SRA, ret, ret, 16);       /* ret = ssss
> gh.. */
>      } else {
>          tcg_out_opc_sa(s, OPC_SLL, ret, arg, 8);        /* ret = cdef
> gh.. */
>          if (flags & TCG_BSWAP_OZ) {                     /* ret = 0000
> gh.. */
>              tcg_out_opc_imm(s, OPC_ANDI, ret, ret, 0xff00);
>          }
>      }
>      tcg_out_opc_reg(s, OPC_OR, ret, ret, TCG_TMP0); /* OZ: ret = 0000
> ghef */
>                                                      /* OS: ret = ssss
> ghef */

Something like that, yes.  I'll fix it up.


r~


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 17/28] tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64
  2021-06-22 10:20       ` Philippe Mathieu-Daudé
@ 2021-06-22 13:56         ` Richard Henderson
  0 siblings, 0 replies; 81+ messages in thread
From: Richard Henderson @ 2021-06-22 13:56 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel

On 6/22/21 3:20 AM, Philippe Mathieu-Daudé wrote:
> Another I noticed is popcnt.

Already present as ctpop.  (Which is how we name the operation in host-utils.h too.)


r~


^ permalink raw reply	[flat|nested] 81+ messages in thread

end of thread, other threads:[~2021-06-22 13:57 UTC | newest]

Thread overview: 81+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-14  8:37 [PATCH 00/28] tcg: bswap improvements Richard Henderson
2021-06-14  8:37 ` [PATCH 01/28] tcg: Add flags argument to bswap opcodes Richard Henderson
2021-06-14  9:19   ` Philippe Mathieu-Daudé
2021-06-14 11:49   ` Alex Bennée
2021-06-14 14:43     ` Richard Henderson
2021-06-21 13:41   ` Peter Maydell
2021-06-21 13:51     ` Peter Maydell
2021-06-21 14:02       ` Richard Henderson
2021-06-21 14:15         ` Peter Maydell
2021-06-21 13:59     ` Richard Henderson
2021-06-14  8:37 ` [PATCH 02/28] tcg/i386: Support bswap flags Richard Henderson
2021-06-21 13:53   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 03/28] tcg/aarch64: " Richard Henderson
2021-06-21 14:01   ` Peter Maydell
2021-06-21 14:04     ` Richard Henderson
2021-06-21 18:12     ` Richard Henderson
2021-06-21 19:40       ` Peter Maydell
2021-06-21 19:50         ` Richard Henderson
2021-06-21 19:52           ` Peter Maydell
2021-06-14  8:37 ` [PATCH 04/28] tcg/arm: " Richard Henderson
2021-06-21 14:09   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 05/28] tcg/ppc: Split out tcg_out_ext{8,16,32}s Richard Henderson
2021-06-21 14:18   ` Peter Maydell
2021-06-21 20:44   ` Philippe Mathieu-Daudé
2021-06-14  8:37 ` [PATCH 06/28] tcg/ppc: Split out tcg_out_sari{32,64} Richard Henderson
2021-06-21 14:22   ` Peter Maydell
2021-06-21 14:23     ` Richard Henderson
2021-06-14  8:37 ` [PATCH 07/28] tcg/ppc: Split out tcg_out_bswap16 Richard Henderson
2021-06-21 14:29   ` Peter Maydell
2021-06-21 14:41     ` Richard Henderson
2021-06-14  8:37 ` [PATCH 08/28] tcg/ppc: Split out tcg_out_bswap32 Richard Henderson
2021-06-21 14:30   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 09/28] tcg/ppc: Split out tcg_out_bswap64 Richard Henderson
2021-06-21 14:35   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 10/28] tcg/ppc: Support bswap flags Richard Henderson
2021-06-21 14:38   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 11/28] tcg/ppc: Use power10 byte-reverse instructions Richard Henderson
2021-06-14  8:37 ` [PATCH 12/28] tcg/s390: Support bswap flags Richard Henderson
2021-06-14  8:37 ` [PATCH 13/28] tcg/mips: Support bswap flags in tcg_out_bswap16 Richard Henderson
2021-06-22  6:36   ` Philippe Mathieu-Daudé
2021-06-22 13:54     ` Richard Henderson
2021-06-14  8:37 ` [PATCH 14/28] tcg/mips: Support bswap flags in tcg_out_bswap32 Richard Henderson
2021-06-14  9:31   ` Philippe Mathieu-Daudé
2021-06-14 15:49     ` Richard Henderson
2021-06-14  8:37 ` [PATCH 15/28] tcg/tci: Support bswap flags Richard Henderson
2021-06-14  9:33   ` Philippe Mathieu-Daudé
2021-06-14  8:37 ` [PATCH 16/28] tcg: Handle new bswap flags during optimize Richard Henderson
2021-06-21 14:47   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 17/28] tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64 Richard Henderson
2021-06-14  9:41   ` Philippe Mathieu-Daudé
2021-06-14 15:58     ` Richard Henderson
2021-06-22 10:20       ` Philippe Mathieu-Daudé
2021-06-22 13:56         ` Richard Henderson
2021-06-21 15:01   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 18/28] tcg: Make use of bswap flags in tcg_gen_qemu_ld_* Richard Henderson
2021-06-21 15:04   ` Peter Maydell
2021-06-22  6:41   ` Philippe Mathieu-Daudé
2021-06-14  8:37 ` [PATCH 19/28] tcg: Make use of bswap flags in tcg_gen_qemu_st_* Richard Henderson
2021-06-21 15:05   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 20/28] target/arm: Improve REV32 Richard Henderson
2021-06-14  9:08   ` Philippe Mathieu-Daudé
2021-06-21 15:08   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 21/28] target/arm: Improve vector REV Richard Henderson
2021-06-21 15:10   ` Peter Maydell
2021-06-22  6:44   ` Philippe Mathieu-Daudé
2021-06-14  8:37 ` [PATCH 22/28] target/arm: Improve REVSH Richard Henderson
2021-06-14  9:42   ` Philippe Mathieu-Daudé
2021-06-21 15:11   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 23/28] target/i386: Improve bswap translation Richard Henderson
2021-06-21 15:15   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 24/28] target/sh4: Improve swap.b translation Richard Henderson
2021-06-21 15:16   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 25/28] target/mips: Fix gen_mxu_s32ldd_s32lddr Richard Henderson
2021-06-21 20:59   ` Philippe Mathieu-Daudé
2021-06-14  8:37 ` [PATCH 26/28] tcg/arm: Unset TCG_TARGET_HAS_MEMORY_BSWAP Richard Henderson
2021-06-21 15:31   ` Peter Maydell
2021-06-14  8:37 ` [PATCH 27/28] tcg/aarch64: " Richard Henderson
2021-06-21 15:33   ` Peter Maydell
2021-06-14  8:38 ` [PATCH 28/28] tcg/riscv: Remove MO_BSWAP handling Richard Henderson
2021-06-18  6:30   ` Alistair Francis
2021-06-14 22:41 ` [PATCH 00/28] tcg: bswap improvements no-reply

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.