* [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue
@ 2016-11-23 13:00 Richard Henderson
From: Richard Henderson @ 2016-11-23 13:00 UTC
  To: qemu-devel; +Cc: alex.bennee

This is a combination of two patch sets that have had previous
revisions, as well as some new patches.  I wanted to post this
all together since Alex was having trouble with prerequisites.

The full tree is at

  git://github.com/rth7680/qemu.git tcg-2.9

Changes since v3:
  * PPC host patches have been properly annotated for cpu revision,
    - cnttz[wd] are power9 inventions,
    - cntpop[wd] are power7 inventions.

  * X86 host checks the correct cpuid bit for lzcnt.

  * Generic TCG has significant changes to enable "interesting"
    combinations of constraints for X86 host bsr/bsf and to some
    extent lzcnt/tzcnt.

  * Opcode for ctpop.  I had begun with only the helpers for ctpop,
    but added the opcode after I discovered that power7/8 could use
    that as a better alternative for implementing ctz.

  * Updates to the i386 and ppc disassemblers, to handle the new
    insns that we're emitting.
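
To illustrate the ctpop point above, here is a minimal sketch of how a
trailing-zero count can be derived from a population count.  This is
plain C rather than TCG ops, the function names are hypothetical, and
the exact sequence the series emits may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count: number of set bits in x. */
static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Count trailing zeros using only popcount.
 * ~x & (x - 1) is a mask of exactly the bits below the lowest set bit,
 * so its population count is the number of trailing zeros.
 * For x == 0 the mask is all ones, giving 32.
 */
static unsigned ctz32_via_popcount(uint32_t x)
{
    return popcount32(~x & (x - 1));
}
```

On a host with a fast popcount but no count-trailing-zeros insn, this
costs a subtract, an and-with-complement and one popcount.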


r~


Richard Henderson (64):
  tcg: Add field extraction primitives
  tcg: Minor adjustments to deposit expanders
  tcg: Add deposit_z expander
  tcg/aarch64: Implement field extraction opcodes
  tcg/arm: Move isa detection to tcg-target.h
  tcg/arm: Implement field extraction opcodes
  tcg/i386: Implement field extraction opcodes
  tcg/mips: Implement field extraction opcodes
  tcg/ppc: Implement field extraction opcodes
  tcg/s390: Expose host facilities to tcg-target.h
  tcg/s390: Implement field extraction opcodes
  tcg/s390: Support deposit into zero
  target-alpha: Use deposit and extract ops
  target-arm: Use new deposit and extract ops
  target-i386: Use new deposit and extract ops
  target-mips: Use the new extract op
  target-ppc: Use the new deposit and extract ops
  target-s390x: Use the new deposit and extract ops
  tcg/optimize: Fold movcond 0/1 into setcond
  tcg: Add markup for output requires new register
  tcg: Transition flat op_defs array to a target callback
  tcg: Pass the opcode width to target_parse_constraint
  tcg: Allow an operand to be matching or a constant
  tcg: Add clz and ctz opcodes
  disas/i386.c: Handle tzcnt
  disas/ppc: Handle popcnt and cnttz
  target-alpha: Use the ctz and clz opcodes
  target-cris: Use clz opcode
  target-microblaze: Use clz opcode
  target-mips: Use clz opcode
  target-openrisc: Use clz and ctz opcodes
  target-ppc: Use clz and ctz opcodes
  target-s390x: Use clz opcode
  target-tilegx: Use clz and ctz opcodes
  target-tricore: Use clz opcode
  target-unicore32: Use clz opcode
  target-xtensa: Use clz opcode
  target-arm: Use clz opcode
  target-i386: Use clz and ctz opcodes
  tcg/ppc: Handle ctz and clz opcodes
  tcg/aarch64: Handle ctz and clz opcodes
  tcg/arm: Handle ctz and clz opcodes
  tcg/mips: Handle clz opcode
  tcg/s390: Handle clz opcode
  tcg/i386: Fully convert tcg_target_op_def
  tcg/i386: Hoist common arguments in tcg_out_op
  tcg/i386: Allow bmi2 shiftx to have non-matching operands
  tcg/i386: Handle ctz and clz opcodes
  tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR
  tcg: Add helpers for clrsb
  target-arm: Use clrsb helper
  target-tricore: Use clrsb helper
  target-xtensa: Use clrsb helper
  tcg: Add opcode for ctpop
  target-alpha: Use ctpop helper
  target-ppc: Use ctpop helper
  target-s390x: Avoid a loop for popcnt
  target-sparc: Use ctpop helper
  target-tilegx: Use ctpop helper
  target-i386: Use ctpop helper
  qemu/host-utils.h: Reduce the operation count in the fallback ctpop
  tcg: Use ctpop to generate ctz if needed
  tcg/ppc: Handle ctpop opcode
  tcg/i386: Handle ctpop opcode

 disas/i386.c                  |  12 +-
 disas/ppc.c                   |  10 +
 include/qemu/host-utils.h     |  25 +-
 target-alpha/helper.h         |   4 -
 target-alpha/int_helper.c     |  15 -
 target-alpha/translate.c      |  73 +++--
 target-arm/helper-a64.c       |  20 --
 target-arm/helper-a64.h       |   4 -
 target-arm/helper.c           |   5 -
 target-arm/helper.h           |   1 -
 target-arm/translate-a64.c    |  95 ++----
 target-arm/translate.c        |  43 +--
 target-cris/helper.h          |   1 -
 target-cris/op_helper.c       |   5 -
 target-cris/translate.c       |   2 +-
 target-i386/cc_helper.c       |   3 +
 target-i386/cpu.h             |   1 +
 target-i386/helper.h          |   2 -
 target-i386/int_helper.c      |  11 -
 target-i386/ops_sse.h         |  26 --
 target-i386/ops_sse_header.h  |   1 -
 target-i386/translate.c       |  89 ++---
 target-microblaze/helper.h    |   1 -
 target-microblaze/op_helper.c |   5 -
 target-microblaze/translate.c |   2 +-
 target-mips/helper.h          |   7 -
 target-mips/op_helper.c       |  22 --
 target-mips/translate.c       |  35 +-
 target-openrisc/helper.h      |   2 -
 target-openrisc/int_helper.c  |  19 --
 target-openrisc/translate.c   |   6 +-
 target-ppc/helper.h           |   7 +-
 target-ppc/int_helper.c       |  38 +--
 target-ppc/translate.c        |  61 ++--
 target-s390x/helper.h         |   1 -
 target-s390x/int_helper.c     |  21 +-
 target-s390x/translate.c      |  36 ++-
 target-sparc/helper.c         |   5 -
 target-sparc/helper.h         |   1 -
 target-sparc/translate.c      |   2 +-
 target-tilegx/helper.c        |  15 -
 target-tilegx/helper.h        |   3 -
 target-tilegx/translate.c     |   6 +-
 target-tricore/helper.h       |   3 -
 target-tricore/op_helper.c    |  15 -
 target-tricore/translate.c    |   7 +-
 target-unicore32/helper.c     |  10 -
 target-unicore32/helper.h     |   3 -
 target-unicore32/translate.c  |   6 +-
 target-xtensa/helper.h        |   2 -
 target-xtensa/op_helper.c     |  13 -
 target-xtensa/translate.c     |   4 +-
 tcg-runtime.c                 |  40 +++
 tcg/README                    |  28 +-
 tcg/aarch64/tcg-target.h      |  10 +
 tcg/aarch64/tcg-target.inc.c  |  90 +++++-
 tcg/arm/tcg-target.h          |  41 ++-
 tcg/arm/tcg-target.inc.c      | 119 ++++---
 tcg/i386/tcg-target.h         |  17 +
 tcg/i386/tcg-target.inc.c     | 732 +++++++++++++++++++++++++++---------------
 tcg/ia64/tcg-target.h         |  10 +
 tcg/ia64/tcg-target.inc.c     |  28 +-
 tcg/mips/tcg-target.h         |   5 +
 tcg/mips/tcg-target.inc.c     |  66 +++-
 tcg/optimize.c                |  94 ++++++
 tcg/ppc/tcg-target.h          |  13 +
 tcg/ppc/tcg-target.inc.c      | 117 ++++++-
 tcg/s390/tcg-target.h         | 128 ++++----
 tcg/s390/tcg-target.inc.c     | 173 ++++++----
 tcg/sparc/tcg-target.h        |  10 +
 tcg/sparc/tcg-target.inc.c    |  28 +-
 tcg/tcg-op.c                  | 695 ++++++++++++++++++++++++++++++++++++++-
 tcg/tcg-op.h                  |  42 +++
 tcg/tcg-opc.h                 |  10 +
 tcg/tcg-runtime.h             |   9 +
 tcg/tcg.c                     | 173 +++++-----
 tcg/tcg.h                     |  14 +-
 tcg/tci/tcg-target.h          |  10 +
 tcg/tci/tcg-target.inc.c      |  25 +-
 79 files changed, 2425 insertions(+), 1108 deletions(-)

-- 
2.7.4


* [Qemu-devel] [PATCH v4 01/64] tcg: Add field extraction primitives
From: Richard Henderson @ 2016-11-23 13:00 UTC
  To: qemu-devel; +Cc: alex.bennee

Adds tcg_gen_extract_* and tcg_gen_sextract_* for extraction of
fixed position bitfields, much like we already have for deposit.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/README               |  20 ++-
 tcg/aarch64/tcg-target.h |   4 +
 tcg/arm/tcg-target.h     |   2 +
 tcg/i386/tcg-target.h    |   4 +
 tcg/ia64/tcg-target.h    |   4 +
 tcg/mips/tcg-target.h    |   2 +
 tcg/optimize.c           |  29 +++++
 tcg/ppc/tcg-target.h     |   4 +
 tcg/s390/tcg-target.h    |   4 +
 tcg/sparc/tcg-target.h   |   4 +
 tcg/tcg-op.c             | 323 +++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op.h             |  12 ++
 tcg/tcg-opc.h            |   4 +
 tcg/tcg.h                |   8 ++
 tcg/tci/tcg-target.h     |   4 +
 15 files changed, 426 insertions(+), 2 deletions(-)
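
As a reading aid for the semantics described in the commit message and
in the tcg/README hunk, the following is a plain C reference model of
the 32-bit operations.  It is only a sketch: the ref_* names are
hypothetical and are not part of the patch, and it models the result
values, not the TCG expansion strategy.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extending extract of bits [ofs, ofs + len) of value. */
static uint32_t ref_extract32(uint32_t value, unsigned ofs, unsigned len)
{
    assert(len > 0 && len <= 32 && ofs < 32 && ofs + len <= 32);
    /* Move the field up to the MSB, then logically back down to bit 0. */
    return (value << (32 - len - ofs)) >> (32 - len);
}

/* Sign-extending extract: the field's top bit (ofs + len - 1) is
   replicated to the left. */
static int32_t ref_sextract32(uint32_t value, unsigned ofs, unsigned len)
{
    uint32_t field = ref_extract32(value, ofs, len);
    uint32_t sign = 1u << (len - 1);

    /* XOR-and-subtract sign extension, done in 64 bits so that no step
       depends on implementation-defined signed conversion or shifts. */
    return (int32_t)((int64_t)(field ^ sign) - (int64_t)sign);
}
```

With the README example "sextract_i32 dest, t1, 8, 4" and t1 = 0x0a00,
the 4-bit field is 0xa, so the zero-extended result is 10 and the
sign-extended result is -6.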

diff --git a/tcg/README b/tcg/README
index ae31388..065d9c2 100644
--- a/tcg/README
+++ b/tcg/README
@@ -314,11 +314,27 @@ The bitfield is described by POS/LEN, which are immediate values:
   LEN - the length of the bitfield
   POS - the position of the first bit, counting from the LSB
 
-For example, pos=8, len=4 indicates a 4-bit field at bit 8.
-This operation would be equivalent to
+For example, "deposit_i32 dest, t1, t2, 8, 4" indicates a 4-bit field
+at bit 8.  This operation would be equivalent to
 
   dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
 
+* extract_i32/i64 dest, t1, pos, len
+* sextract_i32/i64 dest, t1, pos, len
+
+Extract a bitfield from T1, placing the result in DEST.
+The bitfield is described by POS/LEN, which are immediate values,
+as above for deposit.  For extract_*, the result will be extended
+to the left with zeros; for sextract_*, the result will be extended
+to the left with copies of the bitfield sign bit at pos + len - 1.
+
+For example, "sextract_i32 dest, t1, 8, 4" indicates a 4-bit field
+at bit 8.  This operation would be equivalent to
+
+  dest = (t1 << 20) >> 28
+
+(using an arithmetic right shift).
+
 * extrl_i64_i32 t0, t1
 
 For 64-bit hosts only, extract the low 32-bits of input T1 and place it
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index a1d101f..410c31b 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -63,6 +63,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
@@ -93,6 +95,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
 #define TCG_TARGET_HAS_sub2_i64         1
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index a0e1acf..8e724be 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -80,6 +80,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_mulu2_i32        1
 #define TCG_TARGET_HAS_muls2_i32        1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 524cfc6..7625188 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -94,6 +94,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
@@ -124,6 +126,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
 #define TCG_TARGET_HAS_sub2_i64         1
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 6dddb7f..8856dc8 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -149,6 +149,10 @@ typedef enum {
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i32     0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_add2_i32         0
 #define TCG_TARGET_HAS_add2_i64         0
 #define TCG_TARGET_HAS_sub2_i32         0
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 3aeac87..1bcea3b 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -123,6 +123,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_bswap16_i32      use_mips32r2_instructions
 #define TCG_TARGET_HAS_bswap32_i32      use_mips32r2_instructions
 #define TCG_TARGET_HAS_deposit_i32      use_mips32r2_instructions
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_ext8s_i32        use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext16s_i32       use_mips32r2_instructions
 #define TCG_TARGET_HAS_rot_i32          use_mips32r2_instructions
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 0f13490..f41ed2c 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -878,6 +878,19 @@ void tcg_optimize(TCGContext *s)
                              temps[args[2]].mask);
             break;
 
+        CASE_OP_32_64(extract):
+            mask = extract64(temps[args[1]].mask, args[2], args[3]);
+            if (args[2] == 0) {
+                affected = temps[args[1]].mask & ~mask;
+            }
+            break;
+        CASE_OP_32_64(sextract):
+            mask = sextract64(temps[args[1]].mask, args[2], args[3]);
+            if (args[2] == 0 && (tcg_target_long)mask >= 0) {
+                affected = temps[args[1]].mask & ~mask;
+            }
+            break;
+
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
             mask = temps[args[1]].mask | temps[args[2]].mask;
@@ -1048,6 +1061,22 @@ void tcg_optimize(TCGContext *s)
             }
             goto do_default;
 
+        CASE_OP_32_64(extract):
+            if (temp_is_const(args[1])) {
+                tmp = extract64(temps[args[1]].val, args[2], args[3]);
+                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                break;
+            }
+            goto do_default;
+
+        CASE_OP_32_64(sextract):
+            if (temp_is_const(args[1])) {
+                tmp = sextract64(temps[args[1]].val, args[2], args[3]);
+                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                break;
+            }
+            goto do_default;
+
         CASE_OP_32_64(setcond):
             tmp = do_constant_folding_cond(opc, args[1], args[2], args[3]);
             if (tmp != 2) {
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index dd032f2..c765d3e 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -69,6 +69,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32         1
 #define TCG_TARGET_HAS_nor_i32          1
 #define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_mulu2_i32        0
 #define TCG_TARGET_HAS_muls2_i32        0
@@ -100,6 +102,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i64         1
 #define TCG_TARGET_HAS_nor_i64          1
 #define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
 #define TCG_TARGET_HAS_sub2_i64         1
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 0c1af24..9583df4 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -66,6 +66,8 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
@@ -95,6 +97,8 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
 #define TCG_TARGET_HAS_sub2_i64         1
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 88f9c90..a212167 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -111,6 +111,8 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      0
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
@@ -141,6 +143,8 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      0
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
 #define TCG_TARGET_HAS_sub2_i64         1
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 6e2fb35..c185b9c 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -560,6 +560,131 @@ void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
     tcg_temp_free_i32(t1);
 }
 
+void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
+                         unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 32);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 32);
+    tcg_debug_assert(ofs + len <= 32);
+
+    /* Canonicalize certain special cases, even if extract is supported.  */
+    if (ofs + len == 32) {
+        tcg_gen_shri_i32(ret, arg, 32 - len);
+        return;
+    }
+    if (ofs == 0) {
+        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
+        return;
+    }
+
+    if (TCG_TARGET_HAS_extract_i32
+        && TCG_TARGET_extract_i32_valid(ofs, len)) {
+        tcg_gen_op4ii_i32(INDEX_op_extract_i32, ret, arg, ofs, len);
+        return;
+    }
+
+    /* Assume that zero-extension, if available, is cheaper than a shift.  */
+    switch (ofs + len) {
+    case 16:
+        if (TCG_TARGET_HAS_ext16u_i32) {
+            tcg_gen_ext16u_i32(ret, arg);
+            tcg_gen_shri_i32(ret, ret, ofs);
+            return;
+        }
+        break;
+    case 8:
+        if (TCG_TARGET_HAS_ext8u_i32) {
+            tcg_gen_ext8u_i32(ret, arg);
+            tcg_gen_shri_i32(ret, ret, ofs);
+            return;
+        }
+        break;
+    }
+
+    /* ??? Ideally we'd know what values are available for immediate AND.
+       Assume that 8 bits are available, plus the special case of 16,
+       so that we get ext8u, ext16u.  */
+    switch (len) {
+    case 1 ... 8: case 16:
+        tcg_gen_shri_i32(ret, arg, ofs);
+        tcg_gen_andi_i32(ret, ret, (1u << len) - 1);
+        break;
+    default:
+        tcg_gen_shli_i32(ret, arg, 32 - len - ofs);
+        tcg_gen_shri_i32(ret, ret, 32 - len);
+        break;
+    }
+}
+
+void tcg_gen_sextract_i32(TCGv_i32 ret, TCGv_i32 arg,
+                          unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 32);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 32);
+    tcg_debug_assert(ofs + len <= 32);
+
+    /* Canonicalize certain special cases, even if sextract is supported.  */
+    if (ofs + len == 32) {
+        tcg_gen_sari_i32(ret, arg, 32 - len);
+        return;
+    }
+    if (ofs == 0) {
+        switch (len) {
+        case 16:
+            tcg_gen_ext16s_i32(ret, arg);
+            return;
+        case 8:
+            tcg_gen_ext8s_i32(ret, arg);
+            return;
+        }
+    }
+
+    if (TCG_TARGET_HAS_sextract_i32
+        && TCG_TARGET_extract_i32_valid(ofs, len)) {
+        tcg_gen_op4ii_i32(INDEX_op_sextract_i32, ret, arg, ofs, len);
+        return;
+    }
+
+    /* Assume that sign-extension, if available, is cheaper than a shift.  */
+    switch (ofs + len) {
+    case 16:
+        if (TCG_TARGET_HAS_ext16s_i32) {
+            tcg_gen_ext16s_i32(ret, arg);
+            tcg_gen_sari_i32(ret, ret, ofs);
+            return;
+        }
+        break;
+    case 8:
+        if (TCG_TARGET_HAS_ext8s_i32) {
+            tcg_gen_ext8s_i32(ret, arg);
+            tcg_gen_sari_i32(ret, ret, ofs);
+            return;
+        }
+        break;
+    }
+    switch (len) {
+    case 16:
+        if (TCG_TARGET_HAS_ext16s_i32) {
+            tcg_gen_shri_i32(ret, arg, ofs);
+            tcg_gen_ext16s_i32(ret, ret);
+            return;
+        }
+        break;
+    case 8:
+        if (TCG_TARGET_HAS_ext8s_i32) {
+            tcg_gen_shri_i32(ret, arg, ofs);
+            tcg_gen_ext8s_i32(ret, ret);
+            return;
+        }
+        break;
+    }
+
+    tcg_gen_shli_i32(ret, arg, 32 - len - ofs);
+    tcg_gen_sari_i32(ret, ret, 32 - len);
+}
+
 void tcg_gen_movcond_i32(TCGCond cond, TCGv_i32 ret, TCGv_i32 c1,
                          TCGv_i32 c2, TCGv_i32 v1, TCGv_i32 v2)
 {
@@ -1635,6 +1760,204 @@ void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
     tcg_temp_free_i64(t1);
 }
 
+void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
+                         unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 64);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 64);
+    tcg_debug_assert(ofs + len <= 64);
+
+    /* Canonicalize certain special cases, even if extract is supported.  */
+    if (ofs + len == 64) {
+        tcg_gen_shri_i64(ret, arg, 64 - len);
+        return;
+    }
+    if (ofs == 0) {
+        tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
+        return;
+    }
+
+    if (TCG_TARGET_REG_BITS == 32) {
+        /* Look for a 32-bit extract within one of the two words.  */
+        if (ofs >= 32) {
+            tcg_gen_extract_i32(TCGV_LOW(ret), TCGV_HIGH(arg), ofs - 32, len);
+            tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+            return;
+        }
+        if (ofs + len <= 32) {
+            tcg_gen_extract_i32(TCGV_LOW(ret), TCGV_LOW(arg), ofs, len);
+            tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+            return;
+        }
+        /* The field is split across two words.  One double-word
+           shift is better than two double-word shifts.  */
+        goto do_shift_and;
+    }
+
+    if (TCG_TARGET_HAS_extract_i64
+        && TCG_TARGET_extract_i64_valid(ofs, len)) {
+        tcg_gen_op4ii_i64(INDEX_op_extract_i64, ret, arg, ofs, len);
+        return;
+    }
+
+    /* Assume that zero-extension, if available, is cheaper than a shift.  */
+    switch (ofs + len) {
+    case 32:
+        if (TCG_TARGET_HAS_ext32u_i64) {
+            tcg_gen_ext32u_i64(ret, arg);
+            tcg_gen_shri_i64(ret, ret, ofs);
+            return;
+        }
+        break;
+    case 16:
+        if (TCG_TARGET_HAS_ext16u_i64) {
+            tcg_gen_ext16u_i64(ret, arg);
+            tcg_gen_shri_i64(ret, ret, ofs);
+            return;
+        }
+        break;
+    case 8:
+        if (TCG_TARGET_HAS_ext8u_i64) {
+            tcg_gen_ext8u_i64(ret, arg);
+            tcg_gen_shri_i64(ret, ret, ofs);
+            return;
+        }
+        break;
+    }
+
+    /* ??? Ideally we'd know what values are available for immediate AND.
+       Assume that 8 bits are available, plus the special cases of 16 and 32,
+       so that we get ext8u, ext16u, and ext32u.  */
+    switch (len) {
+    case 1 ... 8: case 16: case 32:
+    do_shift_and:
+        tcg_gen_shri_i64(ret, arg, ofs);
+        tcg_gen_andi_i64(ret, ret, (1ull << len) - 1);
+        break;
+    default:
+        tcg_gen_shli_i64(ret, arg, 64 - len - ofs);
+        tcg_gen_shri_i64(ret, ret, 64 - len);
+        break;
+    }
+}
+
+void tcg_gen_sextract_i64(TCGv_i64 ret, TCGv_i64 arg,
+                          unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 64);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 64);
+    tcg_debug_assert(ofs + len <= 64);
+
+    /* Canonicalize certain special cases, even if sextract is supported.  */
+    if (ofs + len == 64) {
+        tcg_gen_sari_i64(ret, arg, 64 - len);
+        return;
+    }
+    if (ofs == 0) {
+        switch (len) {
+        case 32:
+            tcg_gen_ext32s_i64(ret, arg);
+            return;
+        case 16:
+            tcg_gen_ext16s_i64(ret, arg);
+            return;
+        case 8:
+            tcg_gen_ext8s_i64(ret, arg);
+            return;
+        }
+    }
+
+    if (TCG_TARGET_REG_BITS == 32) {
+        /* Look for a 32-bit extract within one of the two words.  */
+        if (ofs >= 32) {
+            tcg_gen_sextract_i32(TCGV_LOW(ret), TCGV_HIGH(arg), ofs - 32, len);
+        } else if (ofs + len <= 32) {
+            tcg_gen_sextract_i32(TCGV_LOW(ret), TCGV_LOW(arg), ofs, len);
+        } else if (ofs == 0) {
+            tcg_gen_mov_i32(TCGV_LOW(ret), TCGV_LOW(arg));
+            tcg_gen_sextract_i32(TCGV_HIGH(ret), TCGV_HIGH(arg), 0, len - 32);
+            return;
+        } else if (len > 32) {
+            TCGv_i32 t = tcg_temp_new_i32();
+            /* Extract the bits for the high word normally.  */
+            tcg_gen_sextract_i32(t, TCGV_HIGH(arg), ofs, len - 32);
+            /* Shift the field down for the low part.  */
+            tcg_gen_shri_i64(ret, arg, ofs);
+            /* Overwrite the shift into the high part.  */
+            tcg_gen_mov_i32(TCGV_HIGH(ret), t);
+            tcg_temp_free_i32(t);
+            return;
+        } else {
+            /* Shift the field down for the low part, such that the
+               field sits at the MSB.  */
+            tcg_gen_shri_i64(ret, arg, ofs + len - 32);
+            /* Shift the field down from the MSB, sign extending.  */
+            tcg_gen_sari_i32(TCGV_LOW(ret), TCGV_LOW(ret), 32 - len);
+        }
+        /* Sign-extend the field from 32 bits.  */
+        tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
+        return;
+    }
+
+    if (TCG_TARGET_HAS_sextract_i64
+        && TCG_TARGET_extract_i64_valid(ofs, len)) {
+        tcg_gen_op4ii_i64(INDEX_op_sextract_i64, ret, arg, ofs, len);
+        return;
+    }
+
+    /* Assume that sign-extension, if available, is cheaper than a shift.  */
+    switch (ofs + len) {
+    case 32:
+        if (TCG_TARGET_HAS_ext32s_i64) {
+            tcg_gen_ext32s_i64(ret, arg);
+            tcg_gen_sari_i64(ret, ret, ofs);
+            return;
+        }
+        break;
+    case 16:
+        if (TCG_TARGET_HAS_ext16s_i64) {
+            tcg_gen_ext16s_i64(ret, arg);
+            tcg_gen_sari_i64(ret, ret, ofs);
+            return;
+        }
+        break;
+    case 8:
+        if (TCG_TARGET_HAS_ext8s_i64) {
+            tcg_gen_ext8s_i64(ret, arg);
+            tcg_gen_sari_i64(ret, ret, ofs);
+            return;
+        }
+        break;
+    }
+    switch (len) {
+    case 32:
+        if (TCG_TARGET_HAS_ext32s_i64) {
+            tcg_gen_shri_i64(ret, arg, ofs);
+            tcg_gen_ext32s_i64(ret, ret);
+            return;
+        }
+        break;
+    case 16:
+        if (TCG_TARGET_HAS_ext16s_i64) {
+            tcg_gen_shri_i64(ret, arg, ofs);
+            tcg_gen_ext16s_i64(ret, ret);
+            return;
+        }
+        break;
+    case 8:
+        if (TCG_TARGET_HAS_ext8s_i64) {
+            tcg_gen_shri_i64(ret, arg, ofs);
+            tcg_gen_ext8s_i64(ret, ret);
+            return;
+        }
+        break;
+    }
+    tcg_gen_shli_i64(ret, arg, 64 - len - ofs);
+    tcg_gen_sari_i64(ret, ret, 64 - len);
+}
+
 void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, TCGv_i64 c1,
                          TCGv_i64 c2, TCGv_i64 v1, TCGv_i64 v2)
 {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 6d044b7..b515e6f 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -292,6 +292,10 @@ void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
 void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
                          unsigned int ofs, unsigned int len);
+void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
+                         unsigned int ofs, unsigned int len);
+void tcg_gen_sextract_i32(TCGv_i32 ret, TCGv_i32 arg,
+                          unsigned int ofs, unsigned int len);
 void tcg_gen_brcond_i32(TCGCond cond, TCGv_i32 arg1, TCGv_i32 arg2, TCGLabel *);
 void tcg_gen_brcondi_i32(TCGCond cond, TCGv_i32 arg1, int32_t arg2, TCGLabel *);
 void tcg_gen_setcond_i32(TCGCond cond, TCGv_i32 ret,
@@ -469,6 +473,10 @@ void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
 void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
                          unsigned int ofs, unsigned int len);
+void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
+                         unsigned int ofs, unsigned int len);
+void tcg_gen_sextract_i64(TCGv_i64 ret, TCGv_i64 arg,
+                          unsigned int ofs, unsigned int len);
 void tcg_gen_brcond_i64(TCGCond cond, TCGv_i64 arg1, TCGv_i64 arg2, TCGLabel *);
 void tcg_gen_brcondi_i64(TCGCond cond, TCGv_i64 arg1, int64_t arg2, TCGLabel *);
 void tcg_gen_setcond_i64(TCGCond cond, TCGv_i64 ret,
@@ -951,6 +959,8 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_rotr_tl tcg_gen_rotr_i64
 #define tcg_gen_rotri_tl tcg_gen_rotri_i64
 #define tcg_gen_deposit_tl tcg_gen_deposit_i64
+#define tcg_gen_extract_tl tcg_gen_extract_i64
+#define tcg_gen_sextract_tl tcg_gen_sextract_i64
 #define tcg_const_tl tcg_const_i64
 #define tcg_const_local_tl tcg_const_local_i64
 #define tcg_gen_movcond_tl tcg_gen_movcond_i64
@@ -1039,6 +1049,8 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_rotr_tl tcg_gen_rotr_i32
 #define tcg_gen_rotri_tl tcg_gen_rotri_i32
 #define tcg_gen_deposit_tl tcg_gen_deposit_i32
+#define tcg_gen_extract_tl tcg_gen_extract_i32
+#define tcg_gen_sextract_tl tcg_gen_sextract_i32
 #define tcg_const_tl tcg_const_i32
 #define tcg_const_local_tl tcg_const_local_i32
 #define tcg_gen_movcond_tl tcg_gen_movcond_i32
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 45528d2..11563ac 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -77,6 +77,8 @@ DEF(sar_i32, 1, 2, 0, 0)
 DEF(rotl_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_rot_i32))
 DEF(rotr_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_rot_i32))
 DEF(deposit_i32, 1, 2, 2, IMPL(TCG_TARGET_HAS_deposit_i32))
+DEF(extract_i32, 1, 1, 2, IMPL(TCG_TARGET_HAS_extract_i32))
+DEF(sextract_i32, 1, 1, 2, IMPL(TCG_TARGET_HAS_sextract_i32))
 
 DEF(brcond_i32, 0, 2, 2, TCG_OPF_BB_END)
 
@@ -139,6 +141,8 @@ DEF(sar_i64, 1, 2, 0, IMPL64)
 DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
+DEF(extract_i64, 1, 1, 2, IMPL64 | IMPL(TCG_TARGET_HAS_extract_i64))
+DEF(sextract_i64, 1, 1, 2, IMPL64 | IMPL(TCG_TARGET_HAS_sextract_i64))
 
 /* size changing ops */
 DEF(ext_i32_i64, 1, 1, 0, IMPL64)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index a35e4c4..5fd3733 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -112,6 +112,8 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      0
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      0
 #define TCG_TARGET_HAS_add2_i64         0
 #define TCG_TARGET_HAS_sub2_i64         0
@@ -130,6 +132,12 @@ typedef uint64_t TCGRegSet;
 #ifndef TCG_TARGET_deposit_i64_valid
 #define TCG_TARGET_deposit_i64_valid(ofs, len) 1
 #endif
+#ifndef TCG_TARGET_extract_i32_valid
+#define TCG_TARGET_extract_i32_valid(ofs, len) 1
+#endif
+#ifndef TCG_TARGET_extract_i64_valid
+#define TCG_TARGET_extract_i64_valid(ofs, len) 1
+#endif
 
 /* Only one of DIV or DIV2 should be defined.  */
 #if defined(TCG_TARGET_HAS_div_i32)
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 868228b..2065042 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -69,6 +69,8 @@
 #define TCG_TARGET_HAS_ext16u_i32       1
 #define TCG_TARGET_HAS_andc_i32         0
 #define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
@@ -88,6 +90,8 @@
 #define TCG_TARGET_HAS_bswap32_i64      1
 #define TCG_TARGET_HAS_bswap64_i64      1
 #define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_div_i64          0
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_ext8s_i64        1
-- 
2.7.4

* [Qemu-devel] [PATCH v4 02/64] tcg: Minor adjustments to deposit expanders
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
  2016-11-23 13:00 ` [Qemu-devel] [PATCH v4 01/64] tcg: Add field extraction primitives Richard Henderson
@ 2016-11-23 13:00 ` Richard Henderson
  2016-12-05 13:18   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 03/64] tcg: Add deposit_z expander Richard Henderson
                   ` (63 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Assert that len is nonzero.

Since we have already asserted that ofs + len <= N, the later
check for len == N implies ofs == 0, so the explicit ofs == 0
test is redundant and is dropped.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
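Not part of the commit: the reasoning above can be checked with a small
host-side model.  This is only a sketch of the deposit semantics, not the
TCG expander itself; deposit32_model is a made-up name used for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of a 32-bit deposit (not QEMU code):
   replace bits [ofs, ofs + len) of arg1 with the low len bits of arg2.  */
static uint32_t deposit32_model(uint32_t arg1, uint32_t arg2,
                                unsigned ofs, unsigned len)
{
    uint32_t mask;

    assert(ofs < 32);
    assert(len > 0);
    assert(len <= 32);
    assert(ofs + len <= 32);

    if (len == 32) {
        /* ofs + len <= 32 leaves ofs == 0 as the only possibility.  */
        assert(ofs == 0);
        return arg2;
    }
    /* len < 32 here, so the shift below is well defined.  */
    mask = ((1u << len) - 1) << ofs;
    return (arg1 & ~mask) | ((arg2 << ofs) & mask);
}
```

With the four asserts in place, the inner assert(ofs == 0) can never
fire, which is why testing len alone is enough.
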
 tcg/tcg-op.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index c185b9c..b17f03f 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -533,10 +533,11 @@ void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
     TCGv_i32 t1;
 
     tcg_debug_assert(ofs < 32);
+    tcg_debug_assert(len > 0);
     tcg_debug_assert(len <= 32);
     tcg_debug_assert(ofs + len <= 32);
 
-    if (ofs == 0 && len == 32) {
+    if (len == 32) {
         tcg_gen_mov_i32(ret, arg2);
         return;
     }
@@ -1718,10 +1719,11 @@ void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
     TCGv_i64 t1;
 
     tcg_debug_assert(ofs < 64);
+    tcg_debug_assert(len > 0);
     tcg_debug_assert(len <= 64);
     tcg_debug_assert(ofs + len <= 64);
 
-    if (ofs == 0 && len == 64) {
+    if (len == 64) {
         tcg_gen_mov_i64(ret, arg2);
         return;
     }
-- 
2.7.4

* [Qemu-devel] [PATCH v4 03/64] tcg: Add deposit_z expander
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
  2016-11-23 13:00 ` [Qemu-devel] [PATCH v4 01/64] tcg: Add field extraction primitives Richard Henderson
  2016-11-23 13:00 ` [Qemu-devel] [PATCH v4 02/64] tcg: Minor adjustments to deposit expanders Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 04/64] tcg/aarch64: Implement field extraction opcodes Richard Henderson
                   ` (62 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

While we don't require a new opcode, it is handy to have an expander
that knows the first source is zero.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
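Not part of the commit: a host-side sketch of what "deposit into zero"
computes, for readers following the expander's case analysis.  The name
deposit_z32_model is made up for illustration and the model covers only
the 32-bit case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model (not QEMU code): place the low len bits
   of arg at bit offset ofs, with every other result bit zero.  */
static uint32_t deposit_z32_model(uint32_t arg, unsigned ofs, unsigned len)
{
    assert(ofs < 32);
    assert(len > 0);
    assert(len <= 32);
    assert(ofs + len <= 32);

    if (ofs + len == 32) {
        /* The shift alone discards the bits above the field.  */
        return arg << ofs;
    }
    /* len < 32 here, so the mask computation cannot overflow.  */
    return (arg & ((1u << len) - 1)) << ofs;
}
```

The two fallback orders in the expander are equivalent under this model:
zero-extending first and then shifting (len == 8 or 16) gives the same
value as shifting first and then zero-extending (ofs + len == 8 or 16).
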
 tcg/tcg-op.c | 143 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op.h |   6 +++
 2 files changed, 149 insertions(+)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index b17f03f..1927e53 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -561,6 +561,64 @@ void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
     tcg_temp_free_i32(t1);
 }
 
+void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
+                           unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 32);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 32);
+    tcg_debug_assert(ofs + len <= 32);
+
+    if (ofs + len == 32) {
+        tcg_gen_shli_i32(ret, arg, ofs);
+    } else if (ofs == 0) {
+        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
+    } else if (TCG_TARGET_HAS_deposit_i32
+               && TCG_TARGET_deposit_i32_valid(ofs, len)) {
+        TCGv_i32 zero = tcg_const_i32(0);
+        tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, zero, arg, ofs, len);
+        tcg_temp_free_i32(zero);
+    } else {
+        /* To help two-operand hosts we prefer to zero-extend first,
+           which allows ARG to stay live.  */
+        switch (len) {
+        case 16:
+            if (TCG_TARGET_HAS_ext16u_i32) {
+                tcg_gen_ext16u_i32(ret, arg);
+                tcg_gen_shli_i32(ret, ret, ofs);
+                return;
+            }
+            break;
+        case 8:
+            if (TCG_TARGET_HAS_ext8u_i32) {
+                tcg_gen_ext8u_i32(ret, arg);
+                tcg_gen_shli_i32(ret, ret, ofs);
+                return;
+            }
+            break;
+        }
+        /* Otherwise prefer zero-extension over AND for code size.  */
+        switch (ofs + len) {
+        case 16:
+            if (TCG_TARGET_HAS_ext16u_i32) {
+                tcg_gen_shli_i32(ret, arg, ofs);
+                tcg_gen_ext16u_i32(ret, ret);
+                return;
+            }
+            break;
+        case 8:
+            if (TCG_TARGET_HAS_ext8u_i32) {
+                tcg_gen_shli_i32(ret, arg, ofs);
+                tcg_gen_ext8u_i32(ret, ret);
+                return;
+            }
+            break;
+        }
+        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
+        tcg_gen_shli_i32(ret, ret, ofs);
+    }
+}
+
 void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
                          unsigned int ofs, unsigned int len)
 {
@@ -1762,6 +1820,91 @@ void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
     tcg_temp_free_i64(t1);
 }
 
+void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
+                           unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 64);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 64);
+    tcg_debug_assert(ofs + len <= 64);
+
+    if (ofs + len == 64) {
+        tcg_gen_shli_i64(ret, arg, ofs);
+    } else if (ofs == 0) {
+        tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
+    } else if (TCG_TARGET_HAS_deposit_i64
+               && TCG_TARGET_deposit_i64_valid(ofs, len)) {
+        TCGv_i64 zero = tcg_const_i64(0);
+        tcg_gen_op5ii_i64(INDEX_op_deposit_i64, ret, zero, arg, ofs, len);
+        tcg_temp_free_i64(zero);
+    } else {
+        if (TCG_TARGET_REG_BITS == 32) {
+            if (ofs >= 32) {
+                tcg_gen_deposit_z_i32(TCGV_HIGH(ret), TCGV_LOW(arg),
+                                      ofs - 32, len);
+                tcg_gen_movi_i32(TCGV_LOW(ret), 0);
+                return;
+            }
+            if (ofs + len <= 32) {
+                tcg_gen_deposit_z_i32(TCGV_LOW(ret), TCGV_LOW(arg), ofs, len);
+                tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+                return;
+            }
+        }
+        /* To help two-operand hosts we prefer to zero-extend first,
+           which allows ARG to stay live.  */
+        switch (len) {
+        case 32:
+            if (TCG_TARGET_HAS_ext32u_i64) {
+                tcg_gen_ext32u_i64(ret, arg);
+                tcg_gen_shli_i64(ret, ret, ofs);
+                return;
+            }
+            break;
+        case 16:
+            if (TCG_TARGET_HAS_ext16u_i64) {
+                tcg_gen_ext16u_i64(ret, arg);
+                tcg_gen_shli_i64(ret, ret, ofs);
+                return;
+            }
+            break;
+        case 8:
+            if (TCG_TARGET_HAS_ext8u_i64) {
+                tcg_gen_ext8u_i64(ret, arg);
+                tcg_gen_shli_i64(ret, ret, ofs);
+                return;
+            }
+            break;
+        }
+        /* Otherwise prefer zero-extension over AND for code size.  */
+        switch (ofs + len) {
+        case 32:
+            if (TCG_TARGET_HAS_ext32u_i64) {
+                tcg_gen_shli_i64(ret, arg, ofs);
+                tcg_gen_ext32u_i64(ret, ret);
+                return;
+            }
+            break;
+        case 16:
+            if (TCG_TARGET_HAS_ext16u_i64) {
+                tcg_gen_shli_i64(ret, arg, ofs);
+                tcg_gen_ext16u_i64(ret, ret);
+                return;
+            }
+            break;
+        case 8:
+            if (TCG_TARGET_HAS_ext8u_i64) {
+                tcg_gen_shli_i64(ret, arg, ofs);
+                tcg_gen_ext8u_i64(ret, ret);
+                return;
+            }
+            break;
+        }
+        tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
+        tcg_gen_shli_i64(ret, ret, ofs);
+    }
+}
+
 void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
                          unsigned int ofs, unsigned int len)
 {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index b515e6f..d42fd0d 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -292,6 +292,8 @@ void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
 void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
                          unsigned int ofs, unsigned int len);
+void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
+                           unsigned int ofs, unsigned int len);
 void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
                          unsigned int ofs, unsigned int len);
 void tcg_gen_sextract_i32(TCGv_i32 ret, TCGv_i32 arg,
@@ -473,6 +475,8 @@ void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
 void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
                          unsigned int ofs, unsigned int len);
+void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
+                           unsigned int ofs, unsigned int len);
 void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
                          unsigned int ofs, unsigned int len);
 void tcg_gen_sextract_i64(TCGv_i64 ret, TCGv_i64 arg,
@@ -959,6 +963,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_rotr_tl tcg_gen_rotr_i64
 #define tcg_gen_rotri_tl tcg_gen_rotri_i64
 #define tcg_gen_deposit_tl tcg_gen_deposit_i64
+#define tcg_gen_deposit_z_tl tcg_gen_deposit_z_i64
 #define tcg_gen_extract_tl tcg_gen_extract_i64
 #define tcg_gen_sextract_tl tcg_gen_sextract_i64
 #define tcg_const_tl tcg_const_i64
@@ -1049,6 +1054,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_rotr_tl tcg_gen_rotr_i32
 #define tcg_gen_rotri_tl tcg_gen_rotri_i32
 #define tcg_gen_deposit_tl tcg_gen_deposit_i32
+#define tcg_gen_deposit_z_tl tcg_gen_deposit_z_i32
 #define tcg_gen_extract_tl tcg_gen_extract_i32
 #define tcg_gen_sextract_tl tcg_gen_sextract_i32
 #define tcg_const_tl tcg_const_i32
-- 
2.7.4

* [Qemu-devel] [PATCH v4 04/64] tcg/aarch64: Implement field extraction opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (2 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 03/64] tcg: Add deposit_z expander Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-06 12:24   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 05/64] tcg/arm: Move isa detection to tcg-target.h Richard Henderson
                   ` (61 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
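Not part of the commit: a sketch of the semantics the two new cases are
expected to implement, plus the (ofs, len) to (immr, imms) mapping that
the patch passes to tcg_out_ubfm and tcg_out_sbfm.  All names below are
made up for illustration.  The final conversion to int64_t assumes a
two's complement host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: zero-extended bitfield [ofs, ofs + len) of val.  */
static uint64_t extract64_model(uint64_t val, unsigned ofs, unsigned len)
{
    assert(len > 0 && len <= 64 && ofs + len <= 64);
    return (val >> ofs) & (~0ull >> (64 - len));
}

/* Hypothetical model: the same field, sign-extended from its top bit.
   (x ^ s) - s sign-extends without relying on signed right shifts.  */
static int64_t sextract64_model(uint64_t val, unsigned ofs, unsigned len)
{
    uint64_t field = extract64_model(val, ofs, len);
    uint64_t sign = 1ull << (len - 1);

    return (int64_t)((field ^ sign) - sign);
}

/* The immediates used by the patch: immr = ofs, imms = ofs + len - 1.  */
struct bfm_imm {
    unsigned immr;
    unsigned imms;
};

static struct bfm_imm bfx_to_bfm(unsigned ofs, unsigned len)
{
    struct bfm_imm r = { ofs, ofs + len - 1 };
    return r;
}
```

For example, an 8-bit field at offset 8 maps to immr = 8, imms = 15.
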
 tcg/aarch64/tcg-target.h     |  8 ++++----
 tcg/aarch64/tcg-target.inc.c | 14 ++++++++++++++
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 410c31b..4a74bd8 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -63,8 +63,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
-#define TCG_TARGET_HAS_extract_i32      0
-#define TCG_TARGET_HAS_sextract_i32     0
+#define TCG_TARGET_HAS_extract_i32      1
+#define TCG_TARGET_HAS_sextract_i32     1
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
@@ -95,8 +95,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
-#define TCG_TARGET_HAS_extract_i64      0
-#define TCG_TARGET_HAS_sextract_i64     0
+#define TCG_TARGET_HAS_extract_i64      1
+#define TCG_TARGET_HAS_sextract_i64     1
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
 #define TCG_TARGET_HAS_sub2_i64         1
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1939d35..c0e9890 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1640,6 +1640,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_dep(s, ext, a0, REG0(2), args[3], args[4]);
         break;
 
+    case INDEX_op_extract_i64:
+    case INDEX_op_extract_i32:
+        tcg_out_ubfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
+        break;
+
+    case INDEX_op_sextract_i64:
+    case INDEX_op_sextract_i32:
+        tcg_out_sbfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
+        break;
+
     case INDEX_op_add2_i32:
         tcg_out_addsub2(s, TCG_TYPE_I32, a0, a1, REG0(2), REG0(3),
                         (int32_t)args[4], args[5], const_args[4],
@@ -1785,6 +1795,10 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
 
     { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
     { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
+    { INDEX_op_extract_i32, { "r", "r" } },
+    { INDEX_op_extract_i64, { "r", "r" } },
+    { INDEX_op_sextract_i32, { "r", "r" } },
+    { INDEX_op_sextract_i64, { "r", "r" } },
 
     { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rA", "rMZ" } },
     { INDEX_op_add2_i64, { "r", "r", "rZ", "rZ", "rA", "rMZ" } },
-- 
2.7.4

* [Qemu-devel] [PATCH v4 05/64] tcg/arm: Move isa detection to tcg-target.h
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (3 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 04/64] tcg/aarch64: Implement field extraction opcodes Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-06 12:34   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 06/64] tcg/arm: Implement field extraction opcodes Richard Henderson
                   ` (60 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.h     | 36 ++++++++++++++++++++++++++++++++----
 tcg/arm/tcg-target.inc.c | 41 +----------------------------------------
 2 files changed, 33 insertions(+), 44 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 8e724be..d1fe12b 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -26,6 +26,37 @@
 #ifndef ARM_TCG_TARGET_H
 #define ARM_TCG_TARGET_H
 
+/* The __ARM_ARCH define is provided by gcc 4.8.  Construct it otherwise.  */
+#ifndef __ARM_ARCH
+# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
+     || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
+     || defined(__ARM_ARCH_7EM__)
+#  define __ARM_ARCH 7
+# elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
+       || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
+       || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6T2__)
+#  define __ARM_ARCH 6
+# elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5E__) \
+       || defined(__ARM_ARCH_5T__) || defined(__ARM_ARCH_5TE__) \
+       || defined(__ARM_ARCH_5TEJ__)
+#  define __ARM_ARCH 5
+# else
+#  define __ARM_ARCH 4
+# endif
+#endif
+
+extern int arm_arch;
+
+#if defined(__ARM_ARCH_5T__) \
+    || defined(__ARM_ARCH_5TE__) || defined(__ARM_ARCH_5TEJ__)
+# define use_armv5t_instructions 1
+#else
+# define use_armv5t_instructions use_armv6_instructions
+#endif
+
+#define use_armv6_instructions  (__ARM_ARCH >= 6 || arm_arch >= 6)
+#define use_armv7_instructions  (__ARM_ARCH >= 7 || arm_arch >= 7)
+
 #undef TCG_TARGET_STACK_GROWSUP
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
@@ -79,7 +110,7 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
 #define TCG_TARGET_HAS_extract_i32      0
 #define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
@@ -90,9 +121,6 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_div_i32          use_idiv_instructions
 #define TCG_TARGET_HAS_rem_i32          0
 
-extern bool tcg_target_deposit_valid(int ofs, int len);
-#define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_valid
-
 enum {
     TCG_AREG0 = TCG_REG_R6,
 };
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index ffa0d40..1415c27 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -25,36 +25,7 @@
 #include "elf.h"
 #include "tcg-be-ldst.h"
 
-/* The __ARM_ARCH define is provided by gcc 4.8.  Construct it otherwise.  */
-#ifndef __ARM_ARCH
-# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
-     || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
-     || defined(__ARM_ARCH_7EM__)
-#  define __ARM_ARCH 7
-# elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
-       || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
-       || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6T2__)
-#  define __ARM_ARCH 6
-# elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5E__) \
-       || defined(__ARM_ARCH_5T__) || defined(__ARM_ARCH_5TE__) \
-       || defined(__ARM_ARCH_5TEJ__)
-#  define __ARM_ARCH 5
-# else
-#  define __ARM_ARCH 4
-# endif
-#endif
-
-static int arm_arch = __ARM_ARCH;
-
-#if defined(__ARM_ARCH_5T__) \
-    || defined(__ARM_ARCH_5TE__) || defined(__ARM_ARCH_5TEJ__)
-# define use_armv5t_instructions 1
-#else
-# define use_armv5t_instructions use_armv6_instructions
-#endif
-
-#define use_armv6_instructions  (__ARM_ARCH >= 6 || arm_arch >= 6)
-#define use_armv7_instructions  (__ARM_ARCH >= 7 || arm_arch >= 7)
+int arm_arch = __ARM_ARCH;
 
 #ifndef use_idiv_instructions
 bool use_idiv_instructions;
@@ -730,16 +701,6 @@ static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
     }
 }
 
-bool tcg_target_deposit_valid(int ofs, int len)
-{
-    /* ??? Without bfi, we could improve over generic code by combining
-       the right-shift from a non-zero ofs with the orr.  We do run into
-       problems when rd == rs, and the mask generated from ofs+len doesn't
-       fit into an immediate.  We would have to be careful not to pessimize
-       wrt the optimizations performed on the expanded code.  */
-    return use_armv7_instructions;
-}
-
 static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
                                    TCGArg a1, int ofs, int len, bool const_a1)
 {
-- 
2.7.4

* [Qemu-devel] [PATCH v4 06/64] tcg/arm: Implement field extraction opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (4 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 05/64] tcg/arm: Move isa detection to tcg-target.h Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-06 16:16   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 07/64] tcg/i386: " Richard Henderson
                   ` (59 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
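Not part of the commit: a standalone sketch of the instruction words
that tcg_out_extract and tcg_out_sextract build, using the same
constants and shifts as the patch.  The function names are made up for
illustration; 0xe is the AL (always) condition code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not QEMU code): the UBFX word, with ofs in
   bits [11:7] and len - 1 in bits [20:16], as in the patch.  */
static uint32_t encode_ubfx(unsigned cond, unsigned rd, unsigned rn,
                            unsigned ofs, unsigned len)
{
    assert(cond < 16 && rd < 16 && rn < 16);
    assert(len > 0 && ofs + len <= 32);
    return 0x07e00050u | ((uint32_t)cond << 28) | (rd << 12) | rn
           | (ofs << 7) | ((len - 1) << 16);
}

/* Hypothetical helper: the SBFX word, differing only in the opcode.  */
static uint32_t encode_sbfx(unsigned cond, unsigned rd, unsigned rn,
                            unsigned ofs, unsigned len)
{
    assert(cond < 16 && rd < 16 && rn < 16);
    assert(len > 0 && ofs + len <= 32);
    return 0x07a00050u | ((uint32_t)cond << 28) | (rd << 12) | rn
           | (ofs << 7) | ((len - 1) << 16);
}
```

Unlike the deposit encoding just above it in the file, the width field
here holds len - 1 rather than ofs + len - 1.
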
 tcg/arm/tcg-target.h     |  4 ++--
 tcg/arm/tcg-target.inc.c | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index d1fe12b..4e30728 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -111,8 +111,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
-#define TCG_TARGET_HAS_extract_i32      0
-#define TCG_TARGET_HAS_sextract_i32     0
+#define TCG_TARGET_HAS_extract_i32      use_armv7_instructions
+#define TCG_TARGET_HAS_sextract_i32     use_armv7_instructions
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_mulu2_i32        1
 #define TCG_TARGET_HAS_muls2_i32        1
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 1415c27..6765a9d 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -713,6 +713,20 @@ static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
               | (ofs << 7) | ((ofs + len - 1) << 16));
 }
 
+static inline void tcg_out_extract(TCGContext *s, int cond, TCGReg rd,
+                                   TCGArg a1, int ofs, int len)
+{
+    tcg_out32(s, 0x07e00050 | (cond << 28) | (rd << 12) | a1
+              | (ofs << 7) | ((len - 1) << 16));
+}
+
+static inline void tcg_out_sextract(TCGContext *s, int cond, TCGReg rd,
+                                    TCGArg a1, int ofs, int len)
+{
+    tcg_out32(s, 0x07a00050 | (cond << 28) | (rd << 12) | a1
+              | (ofs << 7) | ((len - 1) << 16));
+}
+
 /* Note that this routine is used for both LDR and LDRH formats, so we do
    not wish to include an immediate shift at this point.  */
 static void tcg_out_memop_r(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
@@ -1894,6 +1908,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_deposit(s, COND_AL, args[0], args[2],
                         args[3], args[4], const_args[2]);
         break;
+    case INDEX_op_extract_i32:
+        tcg_out_extract(s, COND_AL, args[0], args[1], args[2], args[3]);
+        break;
+    case INDEX_op_sextract_i32:
+        tcg_out_sextract(s, COND_AL, args[0], args[1], args[2], args[3]);
+        break;
 
     case INDEX_op_div_i32:
         tcg_out_sdiv(s, COND_AL, args[0], args[1], args[2]);
@@ -1976,6 +1996,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_ext16u_i32, { "r", "r" } },
 
     { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
+    { INDEX_op_extract_i32, { "r", "r" } },
+    { INDEX_op_sextract_i32, { "r", "r" } },
 
     { INDEX_op_div_i32, { "r", "r", "r" } },
     { INDEX_op_divu_i32, { "r", "r", "r" } },
-- 
2.7.4

* [Qemu-devel] [PATCH v4 07/64] tcg/i386: Implement field extraction opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (5 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 06/64] tcg/arm: Implement field extraction opcodes Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-25 11:16   ` Paolo Bonzini
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 08/64] tcg/mips: " Richard Henderson
                   ` (58 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
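Not part of the commit: a host-side sketch of the two special cases the
new validity macros accept, and of what the emitted sequences compute.
All names are made up for illustration; the predicates simply restate
the macros added in tcg-target.h.

```c
#include <assert.h>
#include <stdint.h>

/* Restatement of TCG_TARGET_extract_i32_valid from the patch.  */
static int extract_i32_valid_model(unsigned ofs, unsigned len)
{
    return ofs == 8 && len == 8;
}

/* Restatement of TCG_TARGET_extract_i64_valid from the patch.  */
static int extract_i64_valid_model(unsigned ofs, unsigned len)
{
    return (ofs == 8 && len == 8) || ofs + len == 32;
}

/* Model of the 32-bit mov + 32-bit shr pair: the value is truncated
   to 32 bits, shifted, and implicitly zero-extended to 64 bits.  */
static uint64_t shr32_zext_model(uint64_t val, unsigned ofs)
{
    assert(ofs < 32);
    return (uint32_t)val >> ofs;
}

/* Model of the ext16u + shr 8 fallback for the (8, 8) field.  */
static uint32_t ext16u_shr8_model(uint32_t val)
{
    return (uint16_t)val >> 8;
}
```

When ofs + len == 32, the truncating shift yields exactly bits
[ofs, 32) of the input, which is the requested field.
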
 tcg/i386/tcg-target.h     | 12 +++++++++---
 tcg/i386/tcg-target.inc.c | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 7625188..dc19c47 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -94,8 +94,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
-#define TCG_TARGET_HAS_extract_i32      0
-#define TCG_TARGET_HAS_sextract_i32     0
+#define TCG_TARGET_HAS_extract_i32      1
+#define TCG_TARGET_HAS_sextract_i32     1
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
@@ -126,7 +126,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
-#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
@@ -142,6 +142,12 @@ extern bool have_bmi1;
      ((ofs) == 0 && (len) == 16))
 #define TCG_TARGET_deposit_i64_valid    TCG_TARGET_deposit_i32_valid
 
+/* Check for the possibility of high-byte extraction and, for 64-bit,
+   zero-extending 32-bit right-shift.  */
+#define TCG_TARGET_extract_i32_valid(ofs, len) ((ofs) == 8 && (len) == 8)
+#define TCG_TARGET_extract_i64_valid(ofs, len) \
+    (((ofs) == 8 && (len) == 8) || ((ofs) + (len)) == 32)
+
 #if TCG_TARGET_REG_BITS == 64
 # define TCG_AREG0 TCG_REG_R14
 #else
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index eeb1777..39f62bd 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2143,6 +2143,40 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_extract_i64:
+        if (args[2] + args[3] == 32) {
+            /* This is a 32-bit zero-extending right shift.  */
+            tcg_out_mov(s, TCG_TYPE_I32, args[0], args[1]);
+            tcg_out_shifti(s, SHIFT_SHR, args[0], args[2]);
+            break;
+        }
+        /* FALLTHRU */
+    case INDEX_op_extract_i32:
+        /* On the off-chance that we can use the high-byte registers.
+           Otherwise we emit the same ext16 + shift pattern that we
+           would have gotten from the normal tcg-op.c expansion.  */
+        tcg_debug_assert(args[2] == 8 && args[3] == 8);
+        if (args[1] < 4 && args[0] < 8) {
+            tcg_out_modrm(s, OPC_MOVZBL, args[0], args[1] + 4);
+        } else {
+            tcg_out_ext16u(s, args[0], args[1]);
+            tcg_out_shifti(s, SHIFT_SHR, args[0], 8);
+        }
+        break;
+
+    case INDEX_op_sextract_i32:
+        /* We don't implement sextract_i64, as we cannot sign-extend to
+           64-bits without using the REX prefix that explicitly excludes
+           access to the high-byte registers.  */
+        tcg_debug_assert(args[2] == 8 && args[3] == 8);
+        if (args[1] < 4 && args[0] < 8) {
+            tcg_out_modrm(s, OPC_MOVSBL, args[0], args[1] + 4);
+        } else {
+            tcg_out_ext16s(s, args[0], args[1], 0);
+            tcg_out_shifti(s, SHIFT_SAR, args[0], 8);
+        }
+        break;
+
     case INDEX_op_mb:
         tcg_out_mb(s, args[0]);
         break;
@@ -2204,6 +2238,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_setcond_i32, { "q", "r", "ri" } },
 
     { INDEX_op_deposit_i32, { "Q", "0", "Q" } },
+    { INDEX_op_extract_i32, { "r", "r" } },
+    { INDEX_op_sextract_i32, { "r", "r" } },
+
     { INDEX_op_movcond_i32, { "r", "r", "ri", "r", "0" } },
 
     { INDEX_op_mulu2_i32, { "a", "d", "a", "r" } },
@@ -2265,6 +2302,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_extu_i32_i64, { "r", "r" } },
 
     { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
+    { INDEX_op_extract_i64, { "r", "r" } },
     { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
 
     { INDEX_op_mulu2_i64, { "a", "d", "a", "r" } },
-- 
2.7.4

* [Qemu-devel] [PATCH v4 08/64] tcg/mips: Implement field extraction opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (6 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 07/64] tcg/i386: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 09/64] tcg/ppc: " Richard Henderson
                   ` (57 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.h     | 2 +-
 tcg/mips/tcg-target.inc.c | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 1bcea3b..f1c3137 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -123,7 +123,7 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_bswap16_i32      use_mips32r2_instructions
 #define TCG_TARGET_HAS_bswap32_i32      use_mips32r2_instructions
 #define TCG_TARGET_HAS_deposit_i32      use_mips32r2_instructions
-#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_extract_i32      use_mips32r2_instructions
 #define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_ext8s_i32        use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext16s_i32       use_mips32r2_instructions
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index abce602..1ecae08 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -1637,6 +1637,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_deposit_i32:
         tcg_out_opc_bf(s, OPC_INS, a0, a2, args[3] + args[4] - 1, args[3]);
         break;
+    case INDEX_op_extract_i32:
+        tcg_out_opc_bf(s, OPC_EXT, a0, a1, args[3] - 1, a2);
+        break;
 
     case INDEX_op_brcond_i32:
         tcg_out_brcond(s, a2, a0, a1, arg_label(args[3]));
@@ -1736,6 +1739,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_ext16s_i32, { "r", "rZ" } },
 
     { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
+    { INDEX_op_extract_i32, { "r", "r" } },
 
     { INDEX_op_brcond_i32, { "rZ", "rZ" } },
 #if use_mips32r6_instructions
-- 
2.7.4
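
For readers following the series, here is a minimal reference model of what
an extract_i32 opcode is expected to compute, written as plain C rather than
TCG. The name ref_extract32 and the precondition are assumptions made for
this sketch; it is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model (hypothetical helper, not part of the patch):
   return LEN bits of VALUE starting at bit OFS, zero-extended.
   Assumes 1 <= len and ofs + len <= 32. */
static uint32_t ref_extract32(uint32_t value, unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 32);
    /* Avoid the undefined 32-bit shift when the field is the whole word. */
    uint32_t mask = (len == 32) ? UINT32_MAX : ((UINT32_C(1) << len) - 1);
    return (value >> ofs) & mask;
}
```

Any host backend implementation of the opcode should agree with this model
for every valid (ofs, len) pair.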

* [Qemu-devel] [PATCH v4 09/64] tcg/ppc: Implement field extraction opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (7 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 08/64] tcg/mips: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 10/64] tcg/s390: Expose host facilities to tcg-target.h Richard Henderson
                   ` (56 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc/tcg-target.h     |  4 ++--
 tcg/ppc/tcg-target.inc.c | 10 ++++++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index c765d3e..b42c57a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -69,7 +69,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32         1
 #define TCG_TARGET_HAS_nor_i32          1
 #define TCG_TARGET_HAS_deposit_i32      1
-#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_mulu2_i32        0
@@ -102,7 +102,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i64         1
 #define TCG_TARGET_HAS_nor_i64          1
 #define TCG_TARGET_HAS_deposit_i64      1
-#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index a3262cf..7ec54a2 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -2396,6 +2396,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         break;
 
+    case INDEX_op_extract_i32:
+        tcg_out_rlw(s, RLWINM, args[0], args[1],
+                    32 - args[2], 32 - args[3], 31);
+        break;
+    case INDEX_op_extract_i64:
+        tcg_out_rld(s, RLDICL, args[0], args[1], 64 - args[2], 64 - args[3]);
+        break;
+
     case INDEX_op_movcond_i32:
         tcg_out_movcond(s, TCG_TYPE_I32, args[5], args[0], args[1], args[2],
                         args[3], args[4], const_args[2]);
@@ -2530,6 +2538,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_movcond_i32, { "r", "r", "ri", "rZ", "rZ" } },
 
     { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
+    { INDEX_op_extract_i32, { "r", "r" } },
 
     { INDEX_op_muluh_i32, { "r", "r", "r" } },
     { INDEX_op_mulsh_i32, { "r", "r", "r" } },
@@ -2585,6 +2594,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_movcond_i64, { "r", "r", "ri", "rZ", "rZ" } },
 
     { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
+    { INDEX_op_extract_i64, { "r", "r" } },
 
     { INDEX_op_mulsh_i64, { "r", "r", "r" } },
     { INDEX_op_muluh_i64, { "r", "r", "r" } },
-- 
2.7.4
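
The RLWINM operands above amount to "rotate right by ofs, then keep the low
len bits". A minimal sketch of that arithmetic in plain C, assuming a
simplified model of rlwinm with ME fixed at 31; the function names are made
up for illustration, and the model reduces the rotate count modulo 32.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Simplified model of rlwinm with ME = 31: rotate left by SH, then keep
   big-endian-numbered bits MB..31, i.e. the low (32 - MB) bits. */
static uint32_t model_rlwinm_me31(uint32_t rs, unsigned sh, unsigned mb)
{
    assert(mb <= 31);
    return rotl32(rs, sh) & (UINT32_MAX >> mb);
}

/* Same operand arithmetic as the INDEX_op_extract_i32 case. */
static uint32_t ppc_extract32(uint32_t x, unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 32);
    return model_rlwinm_me31(x, 32 - ofs, 32 - len);
}
```

A rotate left by 32 - ofs is a rotate right by ofs, which brings the field
down to bit 0; the mask starting at MB = 32 - len then clears everything
above it.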

* [Qemu-devel] [PATCH v4 10/64] tcg/s390: Expose host facilities to tcg-target.h
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (8 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 09/64] tcg/ppc: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 11/64] tcg/s390: Implement field extraction opcodes Richard Henderson
                   ` (55 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

This lets us expose facilities to TCG_TARGET_HAS_* defines
directly, rather than hiding behind function calls.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/s390/tcg-target.h     | 126 ++++++++++++++++++++++++----------------------
 tcg/s390/tcg-target.inc.c |  74 +++++++++++----------------
 2 files changed, 96 insertions(+), 104 deletions(-)

diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 9583df4..d650a72 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -49,67 +49,75 @@ typedef enum TCGReg {
 
 #define TCG_TARGET_NB_REGS 16
 
-/* optional instructions */
-#define TCG_TARGET_HAS_div2_i32         1
-#define TCG_TARGET_HAS_rot_i32          1
-#define TCG_TARGET_HAS_ext8s_i32        1
-#define TCG_TARGET_HAS_ext16s_i32       1
-#define TCG_TARGET_HAS_ext8u_i32        1
-#define TCG_TARGET_HAS_ext16u_i32       1
-#define TCG_TARGET_HAS_bswap16_i32      1
-#define TCG_TARGET_HAS_bswap32_i32      1
-#define TCG_TARGET_HAS_not_i32          0
-#define TCG_TARGET_HAS_neg_i32          1
-#define TCG_TARGET_HAS_andc_i32         0
-#define TCG_TARGET_HAS_orc_i32          0
-#define TCG_TARGET_HAS_eqv_i32          0
-#define TCG_TARGET_HAS_nand_i32         0
-#define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_deposit_i32      1
-#define TCG_TARGET_HAS_extract_i32      0
-#define TCG_TARGET_HAS_sextract_i32     0
-#define TCG_TARGET_HAS_movcond_i32      1
-#define TCG_TARGET_HAS_add2_i32         1
-#define TCG_TARGET_HAS_sub2_i32         1
-#define TCG_TARGET_HAS_mulu2_i32        0
-#define TCG_TARGET_HAS_muls2_i32        0
-#define TCG_TARGET_HAS_muluh_i32        0
-#define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_extrl_i64_i32    0
-#define TCG_TARGET_HAS_extrh_i64_i32    0
+/* A list of relevant facilities used by this translator.  Some of these
+   are required for proper operation, and these are checked at startup.  */
+
+#define FACILITY_ZARCH_ACTIVE         (1ULL << (63 - 2))
+#define FACILITY_LONG_DISP            (1ULL << (63 - 18))
+#define FACILITY_EXT_IMM              (1ULL << (63 - 21))
+#define FACILITY_GEN_INST_EXT         (1ULL << (63 - 34))
+#define FACILITY_LOAD_ON_COND         (1ULL << (63 - 45))
+#define FACILITY_FAST_BCR_SER         FACILITY_LOAD_ON_COND
 
-#define TCG_TARGET_HAS_div2_i64         1
-#define TCG_TARGET_HAS_rot_i64          1
-#define TCG_TARGET_HAS_ext8s_i64        1
-#define TCG_TARGET_HAS_ext16s_i64       1
-#define TCG_TARGET_HAS_ext32s_i64       1
-#define TCG_TARGET_HAS_ext8u_i64        1
-#define TCG_TARGET_HAS_ext16u_i64       1
-#define TCG_TARGET_HAS_ext32u_i64       1
-#define TCG_TARGET_HAS_bswap16_i64      1
-#define TCG_TARGET_HAS_bswap32_i64      1
-#define TCG_TARGET_HAS_bswap64_i64      1
-#define TCG_TARGET_HAS_not_i64          0
-#define TCG_TARGET_HAS_neg_i64          1
-#define TCG_TARGET_HAS_andc_i64         0
-#define TCG_TARGET_HAS_orc_i64          0
-#define TCG_TARGET_HAS_eqv_i64          0
-#define TCG_TARGET_HAS_nand_i64         0
-#define TCG_TARGET_HAS_nor_i64          0
-#define TCG_TARGET_HAS_deposit_i64      1
-#define TCG_TARGET_HAS_extract_i64      0
-#define TCG_TARGET_HAS_sextract_i64     0
-#define TCG_TARGET_HAS_movcond_i64      1
-#define TCG_TARGET_HAS_add2_i64         1
-#define TCG_TARGET_HAS_sub2_i64         1
-#define TCG_TARGET_HAS_mulu2_i64        1
-#define TCG_TARGET_HAS_muls2_i64        0
-#define TCG_TARGET_HAS_muluh_i64        0
-#define TCG_TARGET_HAS_mulsh_i64        0
+extern uint64_t s390_facilities;
+
+/* optional instructions */
+#define TCG_TARGET_HAS_div2_i32       1
+#define TCG_TARGET_HAS_rot_i32        1
+#define TCG_TARGET_HAS_ext8s_i32      1
+#define TCG_TARGET_HAS_ext16s_i32     1
+#define TCG_TARGET_HAS_ext8u_i32      1
+#define TCG_TARGET_HAS_ext16u_i32     1
+#define TCG_TARGET_HAS_bswap16_i32    1
+#define TCG_TARGET_HAS_bswap32_i32    1
+#define TCG_TARGET_HAS_not_i32        0
+#define TCG_TARGET_HAS_neg_i32        1
+#define TCG_TARGET_HAS_andc_i32       0
+#define TCG_TARGET_HAS_orc_i32        0
+#define TCG_TARGET_HAS_eqv_i32        0
+#define TCG_TARGET_HAS_nand_i32       0
+#define TCG_TARGET_HAS_nor_i32        0
+#define TCG_TARGET_HAS_deposit_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
+#define TCG_TARGET_HAS_extract_i32    0
+#define TCG_TARGET_HAS_sextract_i32   0
+#define TCG_TARGET_HAS_movcond_i32    1
+#define TCG_TARGET_HAS_add2_i32       1
+#define TCG_TARGET_HAS_sub2_i32       1
+#define TCG_TARGET_HAS_mulu2_i32      0
+#define TCG_TARGET_HAS_muls2_i32      0
+#define TCG_TARGET_HAS_muluh_i32      0
+#define TCG_TARGET_HAS_mulsh_i32      0
+#define TCG_TARGET_HAS_extrl_i64_i32  0
+#define TCG_TARGET_HAS_extrh_i64_i32  0
 
-extern bool tcg_target_deposit_valid(int ofs, int len);
-#define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_valid
-#define TCG_TARGET_deposit_i64_valid  tcg_target_deposit_valid
+#define TCG_TARGET_HAS_div2_i64       1
+#define TCG_TARGET_HAS_rot_i64        1
+#define TCG_TARGET_HAS_ext8s_i64      1
+#define TCG_TARGET_HAS_ext16s_i64     1
+#define TCG_TARGET_HAS_ext32s_i64     1
+#define TCG_TARGET_HAS_ext8u_i64      1
+#define TCG_TARGET_HAS_ext16u_i64     1
+#define TCG_TARGET_HAS_ext32u_i64     1
+#define TCG_TARGET_HAS_bswap16_i64    1
+#define TCG_TARGET_HAS_bswap32_i64    1
+#define TCG_TARGET_HAS_bswap64_i64    1
+#define TCG_TARGET_HAS_not_i64        0
+#define TCG_TARGET_HAS_neg_i64        1
+#define TCG_TARGET_HAS_andc_i64       0
+#define TCG_TARGET_HAS_orc_i64        0
+#define TCG_TARGET_HAS_eqv_i64        0
+#define TCG_TARGET_HAS_nand_i64       0
+#define TCG_TARGET_HAS_nor_i64        0
+#define TCG_TARGET_HAS_deposit_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
+#define TCG_TARGET_HAS_extract_i64    0
+#define TCG_TARGET_HAS_sextract_i64   0
+#define TCG_TARGET_HAS_movcond_i64    1
+#define TCG_TARGET_HAS_add2_i64       1
+#define TCG_TARGET_HAS_sub2_i64       1
+#define TCG_TARGET_HAS_mulu2_i64      1
+#define TCG_TARGET_HAS_muls2_i64      0
+#define TCG_TARGET_HAS_muluh_i64      0
+#define TCG_TARGET_HAS_mulsh_i64      0
 
 /* used for function call generation */
 #define TCG_REG_CALL_STACK		TCG_REG_R15
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 253d4a0..9f51133 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -334,18 +334,7 @@ static void * const qemu_st_helpers[16] = {
 #endif
 
 static tcg_insn_unit *tb_ret_addr;
-
-/* A list of relevant facilities used by this translator.  Some of these
-   are required for proper operation, and these are checked at startup.  */
-
-#define FACILITY_ZARCH_ACTIVE	(1ULL << (63 - 2))
-#define FACILITY_LONG_DISP	(1ULL << (63 - 18))
-#define FACILITY_EXT_IMM	(1ULL << (63 - 21))
-#define FACILITY_GEN_INST_EXT	(1ULL << (63 - 34))
-#define FACILITY_LOAD_ON_COND   (1ULL << (63 - 45))
-#define FACILITY_FAST_BCR_SER   FACILITY_LOAD_ON_COND
-
-static uint64_t facilities;
+uint64_t s390_facilities;
 
 static void patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
@@ -432,7 +421,7 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 
 static int tcg_match_ori(TCGType type, tcg_target_long val)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         if (type == TCG_TYPE_I32) {
             /* All 32-bit ORs can be performed with 1 48-bit insn.  */
             return 1;
@@ -444,7 +433,7 @@ static int tcg_match_ori(TCGType type, tcg_target_long val)
         if (val == (int16_t)val) {
             return 0;
         }
-        if (facilities & FACILITY_EXT_IMM) {
+        if (s390_facilities & FACILITY_EXT_IMM) {
             if (val == (int32_t)val) {
                 return 0;
             }
@@ -461,7 +450,7 @@ static int tcg_match_ori(TCGType type, tcg_target_long val)
 
 static int tcg_match_xori(TCGType type, tcg_target_long val)
 {
-    if ((facilities & FACILITY_EXT_IMM) == 0) {
+    if ((s390_facilities & FACILITY_EXT_IMM) == 0) {
         return 0;
     }
 
@@ -482,7 +471,7 @@ static int tcg_match_xori(TCGType type, tcg_target_long val)
 
 static int tcg_match_cmpi(TCGType type, tcg_target_long val)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         /* The COMPARE IMMEDIATE instruction is available.  */
         if (type == TCG_TYPE_I32) {
             /* We have a 32-bit immediate and can compare against anything.  */
@@ -511,7 +500,7 @@ static int tcg_match_cmpi(TCGType type, tcg_target_long val)
 
 static int tcg_match_add2i(TCGType type, tcg_target_long val)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         if (type == TCG_TYPE_I32) {
             return 1;
         } else if (val >= -0xffffffffll && val <= 0xffffffffll) {
@@ -541,7 +530,7 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
            general-instruction-extensions, then we have MULTIPLY SINGLE
            IMMEDIATE with a signed 32-bit, otherwise we have only
            MULTIPLY HALFWORD IMMEDIATE, with a signed 16-bit.  */
-        if (facilities & FACILITY_GEN_INST_EXT) {
+        if (s390_facilities & FACILITY_GEN_INST_EXT) {
             return val == (int32_t)val;
         } else {
             return val == (int16_t)val;
@@ -668,7 +657,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
 
     /* Try all 48-bit insns that can load it in one go.  */
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         if (sval == (int32_t)sval) {
             tcg_out_insn(s, RIL, LGFI, ret, sval);
             return;
@@ -694,7 +683,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 
     /* If extended immediates are not present, then we may have to issue
        several instructions to load the low 32 bits.  */
-    if (!(facilities & FACILITY_EXT_IMM)) {
+    if (!(s390_facilities & FACILITY_EXT_IMM)) {
         /* A 32-bit unsigned value can be loaded in 2 insns.  And given
            that the lli_insns loop above did not succeed, we know that
            both insns are required.  */
@@ -727,7 +716,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 
     /* Insert data into the high 32-bits.  */
     uval = uval >> 31 >> 1;
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         if (uval < 0x10000) {
             tcg_out_insn(s, RI, IIHL, ret, uval);
         } else if ((uval & 0xffff) == 0) {
@@ -810,7 +799,7 @@ static void tcg_out_ld_abs(TCGContext *s, TCGType type, TCGReg dest, void *abs)
 {
     intptr_t addr = (intptr_t)abs;
 
-    if ((facilities & FACILITY_GEN_INST_EXT) && !(addr & 1)) {
+    if ((s390_facilities & FACILITY_GEN_INST_EXT) && !(addr & 1)) {
         ptrdiff_t disp = tcg_pcrel_diff(s, abs) >> 1;
         if (disp == (int32_t)disp) {
             if (type == TCG_TYPE_I32) {
@@ -837,7 +826,7 @@ static inline void tcg_out_risbg(TCGContext *s, TCGReg dest, TCGReg src,
 
 static void tgen_ext8s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         tcg_out_insn(s, RRE, LGBR, dest, src);
         return;
     }
@@ -857,7 +846,7 @@ static void tgen_ext8s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 
 static void tgen_ext8u(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         tcg_out_insn(s, RRE, LLGCR, dest, src);
         return;
     }
@@ -877,7 +866,7 @@ static void tgen_ext8u(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 
 static void tgen_ext16s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         tcg_out_insn(s, RRE, LGHR, dest, src);
         return;
     }
@@ -897,7 +886,7 @@ static void tgen_ext16s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 
 static void tgen_ext16u(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         tcg_out_insn(s, RRE, LLGHR, dest, src);
         return;
     }
@@ -985,7 +974,7 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
         tgen_ext32u(s, dest, dest);
         return;
     }
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         if ((val & valid) == 0xff) {
             tgen_ext8u(s, TCG_TYPE_I64, dest, dest);
             return;
@@ -1006,7 +995,7 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
     }
 
     /* Try all 48-bit insns that can perform it in one go.  */
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         for (i = 0; i < 2; i++) {
             tcg_target_ulong mask = ~(0xffffffffull << i*32);
             if (((val | ~valid) & mask) == mask) {
@@ -1015,7 +1004,7 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
             }
         }
     }
-    if ((facilities & FACILITY_GEN_INST_EXT) && risbg_mask(val)) {
+    if ((s390_facilities & FACILITY_GEN_INST_EXT) && risbg_mask(val)) {
         tgen_andi_risbg(s, dest, dest, val);
         return;
     }
@@ -1045,7 +1034,7 @@ static void tgen64_ori(TCGContext *s, TCGReg dest, tcg_target_ulong val)
         return;
     }
 
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         /* Try all 32-bit insns that can perform it in one go.  */
         for (i = 0; i < 4; i++) {
             tcg_target_ulong mask = (0xffffull << i*16);
@@ -1220,7 +1209,7 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
     }
 
     cc = tgen_cmp(s, type, cond, c1, c2, c2const);
-    if (facilities & FACILITY_LOAD_ON_COND) {
+    if (s390_facilities & FACILITY_LOAD_ON_COND) {
         /* Emit: d = 0, t = 1, d = (cc ? t : d).  */
         tcg_out_movi(s, TCG_TYPE_I64, dest, 0);
         tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, 1);
@@ -1237,7 +1226,7 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
                          TCGReg c1, TCGArg c2, int c2const, TCGReg r3)
 {
     int cc;
-    if (facilities & FACILITY_LOAD_ON_COND) {
+    if (s390_facilities & FACILITY_LOAD_ON_COND) {
         cc = tgen_cmp(s, type, c, c1, c2, c2const);
         tcg_out_insn(s, RRF, LOCGR, dest, r3, cc);
     } else {
@@ -1250,11 +1239,6 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
     }
 }
 
-bool tcg_target_deposit_valid(int ofs, int len)
-{
-    return (facilities & FACILITY_GEN_INST_EXT) != 0;
-}
-
 static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
                          int ofs, int len)
 {
@@ -1332,7 +1316,7 @@ static void tgen_brcond(TCGContext *s, TCGType type, TCGCond c,
 {
     int cc;
 
-    if (facilities & FACILITY_GEN_INST_EXT) {
+    if (s390_facilities & FACILITY_GEN_INST_EXT) {
         bool is_unsigned = is_unsigned_cond(c);
         bool in_range;
         S390Opcode opc;
@@ -1519,7 +1503,7 @@ static TCGReg tcg_out_tlb_read(TCGContext* s, TCGReg addr_reg, TCGMemOp opc,
     a_off = (a_bits >= s_bits ? 0 : s_mask - a_mask);
     tlb_mask = (uint64_t)TARGET_PAGE_MASK | a_mask;
 
-    if (facilities & FACILITY_GEN_INST_EXT) {
+    if (s390_facilities & FACILITY_GEN_INST_EXT) {
         tcg_out_risbg(s, TCG_REG_R2, addr_reg,
                       64 - CPU_TLB_BITS - CPU_TLB_ENTRY_BITS,
                       63 - CPU_TLB_ENTRY_BITS,
@@ -1790,7 +1774,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                     tcg_out_insn(s, RI, AHI, a0, a2);
                     break;
                 }
-                if (facilities & FACILITY_EXT_IMM) {
+                if (s390_facilities & FACILITY_EXT_IMM) {
                     tcg_out_insn(s, RIL, AFI, a0, a2);
                     break;
                 }
@@ -1986,7 +1970,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                     tcg_out_insn(s, RI, AGHI, a0, a2);
                     break;
                 }
-                if (facilities & FACILITY_EXT_IMM) {
+                if (s390_facilities & FACILITY_EXT_IMM) {
                     if (a2 == (int32_t)a2) {
                         tcg_out_insn(s, RIL, AGFI, a0, a2);
                         break;
@@ -2175,7 +2159,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
            serialize the instruction stream.  */
         if (args[0] & TCG_MO_ST_LD) {
             tcg_out_insn(s, RR, BCR,
-                         facilities & FACILITY_FAST_BCR_SER ? 14 : 15, 0);
+                         s390_facilities & FACILITY_FAST_BCR_SER ? 14 : 15, 0);
         }
         break;
 
@@ -2304,7 +2288,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { -1 },
 };
 
-static void query_facilities(void)
+static void query_s390_facilities(void)
 {
     unsigned long hwcap = qemu_getauxval(AT_HWCAP);
 
@@ -2315,7 +2299,7 @@ static void query_facilities(void)
         register void *r1 __asm__("1");
 
         /* stfle 0(%r1) */
-        r1 = &facilities;
+        r1 = &s390_facilities;
         asm volatile(".word 0xb2b0,0x1000"
                      : "=r"(r0) : "0"(0), "r"(r1) : "memory", "cc");
     }
@@ -2323,7 +2307,7 @@ static void query_facilities(void)
 
 static void tcg_target_init(TCGContext *s)
 {
-    query_facilities();
+    query_s390_facilities();
 
     tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffff);
     tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffff);
-- 
2.7.4
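
The FACILITY_* macros number bits from the most significant end of the
doubleword, so facility N is 1ULL << (63 - N). A small sketch of that
numbering and of the kind of test the TCG_TARGET_HAS_* defines now expand to.
The names facility_bit, demo_facilities and demo_has_facility are
hypothetical, and the facility word is filled by hand here, not by STFLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Facility bit N of the first facility doubleword, numbered from the MSB.
   Same arithmetic as the FACILITY_* macros in the patch. */
static uint64_t facility_bit(unsigned n)
{
    assert(n < 64);
    return 1ULL << (63 - n);
}

/* Stand-in for the global s390_facilities (illustrative only). */
static uint64_t demo_facilities;

/* Mirrors the (s390_facilities & FACILITY_x) tests used in the header. */
static bool demo_has_facility(unsigned n)
{
    return (demo_facilities & facility_bit(n)) != 0;
}
```

With bit 34 (general-instruction-extension) set and bit 21
(extended-immediate) clear, demo_has_facility(34) is true and
demo_has_facility(21) is false.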

* [Qemu-devel] [PATCH v4 11/64] tcg/s390: Implement field extraction opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (9 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 10/64] tcg/s390: Expose host facilities to tcg-target.h Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 12/64] tcg/s390: Support deposit into zero Richard Henderson
                   ` (54 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/s390/tcg-target.h     |  4 ++--
 tcg/s390/tcg-target.inc.c | 11 +++++++++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index d650a72..e9ac12e 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -78,7 +78,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_nand_i32       0
 #define TCG_TARGET_HAS_nor_i32        0
 #define TCG_TARGET_HAS_deposit_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
-#define TCG_TARGET_HAS_extract_i32    0
+#define TCG_TARGET_HAS_extract_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_sextract_i32   0
 #define TCG_TARGET_HAS_movcond_i32    1
 #define TCG_TARGET_HAS_add2_i32       1
@@ -109,7 +109,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_nand_i64       0
 #define TCG_TARGET_HAS_nor_i64        0
 #define TCG_TARGET_HAS_deposit_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
-#define TCG_TARGET_HAS_extract_i64    0
+#define TCG_TARGET_HAS_extract_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_sextract_i64   0
 #define TCG_TARGET_HAS_movcond_i64    1
 #define TCG_TARGET_HAS_add2_i64       1
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 9f51133..083c992 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -1247,6 +1247,12 @@ static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
     tcg_out_risbg(s, dest, src, msb, lsb, ofs, 0);
 }
 
+static void tgen_extract(TCGContext *s, TCGReg dest, TCGReg src,
+                         int ofs, int len)
+{
+    tcg_out_risbg(s, dest, src, 64 - len, 63, 64 - ofs, 1);
+}
+
 static void tgen_gotoi(TCGContext *s, int cc, tcg_insn_unit *dest)
 {
     ptrdiff_t off = dest - s->code_ptr;
@@ -2153,6 +2159,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     OP_32_64(deposit):
         tgen_deposit(s, args[0], args[2], args[3], args[4]);
         break;
+    OP_32_64(extract):
+        tgen_extract(s, args[0], args[1], args[2], args[3]);
+        break;
 
     case INDEX_op_mb:
         /* The host memory model is quite strong, we simply need to
@@ -2222,6 +2231,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_setcond_i32, { "r", "r", "rC" } },
     { INDEX_op_movcond_i32, { "r", "r", "rC", "r", "0" } },
     { INDEX_op_deposit_i32, { "r", "0", "r" } },
+    { INDEX_op_extract_i32, { "r", "r" } },
 
     { INDEX_op_qemu_ld_i32, { "r", "L" } },
     { INDEX_op_qemu_ld_i64, { "r", "L" } },
@@ -2283,6 +2293,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_setcond_i64, { "r", "r", "rC" } },
     { INDEX_op_movcond_i64, { "r", "r", "rC", "r", "0" } },
     { INDEX_op_deposit_i64, { "r", "0", "r" } },
+    { INDEX_op_extract_i64, { "r", "r" } },
 
     { INDEX_op_mb, { } },
     { -1 },
-- 
2.7.4
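
To see why the operands 64 - len, 63, 64 - ofs with the zero flag set give an
unsigned extract, here is a rough C model of RISBG. It is an assumption-laden
sketch: it covers only the non-wrapping case msb <= lsb, uses IBM bit
numbering (bit 0 is the most significant), and the names model_risbg,
ibm_mask and s390_extract are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Mask of IBM-numbered bits msb..lsb (bit 0 = most significant). */
static uint64_t ibm_mask(unsigned msb, unsigned lsb)
{
    assert(msb <= lsb && lsb <= 63);
    return (UINT64_MAX >> msb) & (UINT64_MAX << (63 - lsb));
}

/* Rotate SRC left by ROT and insert the selected bits into DEST.
   With Z set, the unselected bits of the result are zero. */
static uint64_t model_risbg(uint64_t dest, uint64_t src,
                            unsigned msb, unsigned lsb, unsigned rot, int z)
{
    uint64_t mask = ibm_mask(msb, lsb);
    uint64_t keep = z ? 0 : (dest & ~mask);
    return keep | (rotl64(src, rot) & mask);
}

/* Same operand arithmetic as tgen_extract in the patch. */
static uint64_t s390_extract(uint64_t src, unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 64);
    return model_risbg(0, src, 64 - len, 63, 64 - ofs, 1);
}
```

The rotate by 64 - ofs moves the field to the low end, the bit range
64 - len .. 63 selects exactly the low len bits, and the zero flag clears the
rest of the destination.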

* [Qemu-devel] [PATCH v4 12/64] tcg/s390: Support deposit into zero
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (10 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 11/64] tcg/s390: Implement field extraction opcodes Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 13/64] target-alpha: Use deposit and extract ops Richard Henderson
                   ` (53 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Since the first input may now be the constant zero, it can no longer
use a matching constraint, so when it is in a register we must handle
that data movement by hand.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/s390/tcg-target.inc.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 083c992..f4c510e 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -43,6 +43,7 @@
 #define TCG_CT_CONST_XORI  0x400
 #define TCG_CT_CONST_CMPI  0x800
 #define TCG_CT_CONST_ADLI  0x1000
+#define TCG_CT_CONST_ZERO  0x2000
 
 /* Several places within the instruction set 0 means "no register"
    rather than TCG_REG_R0.  */
@@ -404,6 +405,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
     case 'C':
         ct->ct |= TCG_CT_CONST_CMPI;
         break;
+    case 'Z':
+        ct->ct |= TCG_CT_CONST_ZERO;
+        break;
     default:
         return -1;
     }
@@ -543,6 +547,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
         return tcg_match_xori(type, val);
     } else if (ct & TCG_CT_CONST_CMPI) {
         return tcg_match_cmpi(type, val);
+    } else if (ct & TCG_CT_CONST_ZERO) {
+        return val == 0;
     }
 
     return 0;
@@ -1240,11 +1246,11 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
 }
 
 static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
-                         int ofs, int len)
+                         int ofs, int len, int z)
 {
     int lsb = (63 - ofs);
     int msb = lsb - (len - 1);
-    tcg_out_risbg(s, dest, src, msb, lsb, ofs, 0);
+    tcg_out_risbg(s, dest, src, msb, lsb, ofs, z);
 }
 
 static void tgen_extract(TCGContext *s, TCGReg dest, TCGReg src,
@@ -2157,8 +2163,24 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     OP_32_64(deposit):
-        tgen_deposit(s, args[0], args[2], args[3], args[4]);
+        a0 = args[0], a1 = args[1], a2 = args[2];
+        if (const_args[1]) {
+            tgen_deposit(s, a0, a2, args[3], args[4], 1);
+        } else {
+            /* Since we can't support "0Z" as a constraint, we allow a1 in
+               any register.  Fix things up as if a matching constraint.  */
+            if (a0 != a1) {
+                TCGType type = (opc == INDEX_op_deposit_i64);
+                if (a0 == a2) {
+                    tcg_out_mov(s, type, TCG_TMP0, a2);
+                    a2 = TCG_TMP0;
+                }
+                tcg_out_mov(s, type, a0, a1);
+            }
+            tgen_deposit(s, a0, a2, args[3], args[4], 0);
+        }
         break;
+
     OP_32_64(extract):
         tgen_extract(s, args[0], args[1], args[2], args[3]);
         break;
@@ -2230,7 +2252,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_brcond_i32, { "r", "rC" } },
     { INDEX_op_setcond_i32, { "r", "r", "rC" } },
     { INDEX_op_movcond_i32, { "r", "r", "rC", "r", "0" } },
-    { INDEX_op_deposit_i32, { "r", "0", "r" } },
+    { INDEX_op_deposit_i32, { "r", "rZ", "r" } },
     { INDEX_op_extract_i32, { "r", "r" } },
 
     { INDEX_op_qemu_ld_i32, { "r", "L" } },
-- 
2.7.4
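
The ordering of the two moves in the deposit case matters when the output
register is also the register holding the value to insert. A minimal sketch
with registers modelled as an array: the register indices, TMP0 and the
function names are placeholders for illustration, and insn_deposit is only a
rough stand-in for the two-operand insert that tgen_deposit emits.

```c
#include <assert.h>
#include <stdint.h>

enum { NB_REGS = 16, TMP0 = 14 };   /* TMP0 index is a placeholder */

static uint64_t regs[NB_REGS];

static uint64_t field_mask(unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 64);
    uint64_t low = (len == 64) ? UINT64_MAX : ((UINT64_C(1) << len) - 1);
    return low << ofs;
}

/* Two-operand insert: regs[dest] keeps its bits outside the field (or is
   zeroed there when z is set) and takes the field from regs[src]. */
static void insn_deposit(int dest, int src, unsigned ofs, unsigned len, int z)
{
    uint64_t mask = field_mask(ofs, len);
    uint64_t keep = z ? 0 : (regs[dest] & ~mask);
    regs[dest] = keep | ((regs[src] << ofs) & mask);
}

/* a0 = deposit(a1, a2, ofs, len), with a1 allowed in any register. */
static void emit_deposit(int a0, int a1, int a2, unsigned ofs, unsigned len)
{
    if (a0 != a1) {
        if (a0 == a2) {
            /* Copying a1 into a0 would clobber the value to insert. */
            regs[TMP0] = regs[a2];
            a2 = TMP0;
        }
        regs[a0] = regs[a1];
    }
    insn_deposit(a0, a2, ofs, len, 0);
}
```

With regs[1] = 0xffff and regs[2] = 0xab, emit_deposit(2, 1, 2, 4, 8) leaves
0xfabf in regs[2]; without the temporary, the inserted field would be taken
from the already overwritten register.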

* [Qemu-devel] [PATCH v4 13/64] target-alpha: Use deposit and extract ops
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (11 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 12/64] tcg/s390: Support deposit into zero Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 14/64] target-arm: Use new " Richard Henderson
                   ` (52 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-alpha/translate.c | 67 ++++++++++++++++++++++++++++++------------------
 1 file changed, 42 insertions(+), 25 deletions(-)

diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 114927b..5ac2277 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -949,7 +949,13 @@ static void gen_ext_h(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
                       uint8_t lit, uint8_t byte_mask)
 {
     if (islit) {
-        tcg_gen_shli_i64(vc, va, (64 - lit * 8) & 0x3f);
+        int pos = (64 - lit * 8) & 0x3f;
+        int len = cto32(byte_mask) * 8;
+        if (pos < len) {
+            tcg_gen_deposit_z_i64(vc, va, pos, len - pos);
+        } else {
+            tcg_gen_movi_i64(vc, 0);
+        }
     } else {
         TCGv tmp = tcg_temp_new();
         tcg_gen_shli_i64(tmp, load_gpr(ctx, rb), 3);
@@ -966,38 +972,44 @@ static void gen_ext_l(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
                       uint8_t lit, uint8_t byte_mask)
 {
     if (islit) {
-        tcg_gen_shri_i64(vc, va, (lit & 7) * 8);
+        int pos = (lit & 7) * 8;
+        int len = cto32(byte_mask) * 8;
+        if (pos + len >= 64) {
+            len = 64 - pos;
+        }
+        tcg_gen_extract_i64(vc, va, pos, len);
     } else {
         TCGv tmp = tcg_temp_new();
         tcg_gen_andi_i64(tmp, load_gpr(ctx, rb), 7);
         tcg_gen_shli_i64(tmp, tmp, 3);
         tcg_gen_shr_i64(vc, va, tmp);
         tcg_temp_free(tmp);
+        gen_zapnoti(vc, vc, byte_mask);
     }
-    gen_zapnoti(vc, vc, byte_mask);
 }
 
 /* INSWH, INSLH, INSQH */
 static void gen_ins_h(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
                       uint8_t lit, uint8_t byte_mask)
 {
-    TCGv tmp = tcg_temp_new();
-
-    /* The instruction description has us left-shift the byte mask and extract
-       bits <15:8> and apply that zap at the end.  This is equivalent to simply
-       performing the zap first and shifting afterward.  */
-    gen_zapnoti(tmp, va, byte_mask);
-
     if (islit) {
-        lit &= 7;
-        if (unlikely(lit == 0)) {
-            tcg_gen_movi_i64(vc, 0);
+        int pos = 64 - (lit & 7) * 8;
+        int len = cto32(byte_mask) * 8;
+        if (pos < len) {
+            tcg_gen_extract_i64(vc, va, pos, len - pos);
         } else {
-            tcg_gen_shri_i64(vc, tmp, 64 - lit * 8);
+            tcg_gen_movi_i64(vc, 0);
         }
     } else {
+        TCGv tmp = tcg_temp_new();
         TCGv shift = tcg_temp_new();
 
+        /* The instruction description has us left-shift the byte mask
+           and extract bits <15:8> and apply that zap at the end.  This
+           is equivalent to simply performing the zap first and shifting
+           afterward.  */
+        gen_zapnoti(tmp, va, byte_mask);
+
         /* If (B & 7) == 0, we need to shift by 64 and leave a zero.  Do this
            portably by splitting the shift into two parts: shift_count-1 and 1.
            Arrange for the -1 by using ones-complement instead of
@@ -1010,32 +1022,37 @@ static void gen_ins_h(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
         tcg_gen_shr_i64(vc, tmp, shift);
         tcg_gen_shri_i64(vc, vc, 1);
         tcg_temp_free(shift);
+        tcg_temp_free(tmp);
     }
-    tcg_temp_free(tmp);
 }
 
 /* INSBL, INSWL, INSLL, INSQL */
 static void gen_ins_l(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
                       uint8_t lit, uint8_t byte_mask)
 {
-    TCGv tmp = tcg_temp_new();
-
-    /* The instruction description has us left-shift the byte mask
-       the same number of byte slots as the data and apply the zap
-       at the end.  This is equivalent to simply performing the zap
-       first and shifting afterward.  */
-    gen_zapnoti(tmp, va, byte_mask);
-
     if (islit) {
-        tcg_gen_shli_i64(vc, tmp, (lit & 7) * 8);
+        int pos = (lit & 7) * 8;
+        int len = cto32(byte_mask) * 8;
+        if (pos + len > 64) {
+            len = 64 - pos;
+        }
+        tcg_gen_deposit_z_i64(vc, va, pos, len);
     } else {
+        TCGv tmp = tcg_temp_new();
         TCGv shift = tcg_temp_new();
+
+        /* The instruction description has us left-shift the byte mask
+           the same number of byte slots as the data and apply the zap
+           at the end.  This is equivalent to simply performing the zap
+           first and shifting afterward.  */
+        gen_zapnoti(tmp, va, byte_mask);
+
         tcg_gen_andi_i64(shift, load_gpr(ctx, rb), 7);
         tcg_gen_shli_i64(shift, shift, 3);
         tcg_gen_shl_i64(vc, tmp, shift);
         tcg_temp_free(shift);
+        tcg_temp_free(tmp);
     }
-    tcg_temp_free(tmp);
 }
 
 /* MSKWH, MSKLH, MSKQH */
-- 
2.7.4

* [Qemu-devel] [PATCH v4 14/64] target-arm: Use new deposit and extract ops
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (12 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 13/64] target-alpha: Use deposit and extract ops Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-01 17:19   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 15/64] target-i386: " Richard Henderson
                   ` (51 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Use the new primitives for UBFX and SBFX.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
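Note for reviewers: a minimal sketch, assuming plain C models of the 32-bit
extract and sextract operations, that compares them against the arithmetic
of the gen_ubfx/gen_sbfx helpers removed below.  The *_model, old_* and
bfx_mismatches names are hypothetical and exist only for this check; this
is not the TCG implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of the low LEN bits, for LEN in 1..32. */
static uint32_t low_mask32(int len)
{
    return 0xffffffffu >> (32 - len);
}

/* Unsigned field extract: bits [pos, pos+len) moved down to bit 0. */
static uint32_t extract32_model(uint32_t x, int pos, int len)
{
    assert(pos >= 0 && len > 0 && pos + len <= 32);
    return (x >> pos) & low_mask32(len);
}

/* Signed field extract, returned as a 32-bit pattern. */
static uint32_t sextract32_model(uint32_t x, int pos, int len)
{
    uint32_t sign = 1u << (len - 1);
    return (extract32_model(x, pos, len) ^ sign) - sign;
}

/* Arithmetic right shift written with unsigned operations only. */
static uint32_t sar32(uint32_t x, int n)
{
    uint32_t fill = (x & 0x80000000u) ? ~(0xffffffffu >> n) : 0;
    return (x >> n) | fill;
}

/* Arithmetic of the removed gen_ubfx: shift right, then mask. */
static uint32_t old_ubfx(uint32_t var, int shift, uint32_t mask)
{
    return (var >> shift) & mask;
}

/* Arithmetic of the removed gen_sbfx: sar, then mask/xor/sub if needed. */
static uint32_t old_sbfx(uint32_t var, int shift, int width)
{
    var = sar32(var, shift);
    if (shift + width < 32) {
        uint32_t signbit = 1u << (width - 1);
        var &= (1u << width) - 1;
        var ^= signbit;
        var -= signbit;
    }
    return var;
}

/* Count (shift, width) pairs where old and new forms disagree. */
int bfx_mismatches(uint32_t x)
{
    int bad = 0;

    for (int shift = 0; shift < 32; shift++) {
        for (int width = 1; shift + width <= 32; width++) {
            bad += sextract32_model(x, shift, width)
                   != old_sbfx(x, shift, width);
            if (width < 32) {   /* the callers only pass width < 32 */
                bad += extract32_model(x, shift, width)
                       != old_ubfx(x, shift, (1u << width) - 1);
            }
        }
    }
    return bad;
}
```
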
 target-arm/translate-a64.c | 79 +++++++++++++++-------------------------------
 target-arm/translate.c     | 37 +++++-----------------
 2 files changed, 34 insertions(+), 82 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index de48747..e90487b 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -3219,67 +3219,40 @@ static void disas_bitfield(DisasContext *s, uint32_t insn)
        low 32-bits anyway.  */
     tcg_tmp = read_cpu_reg(s, rn, 1);
 
-    /* Recognize the common aliases.  */
-    if (opc == 0) { /* SBFM */
-        if (ri == 0) {
-            if (si == 7) { /* SXTB */
-                tcg_gen_ext8s_i64(tcg_rd, tcg_tmp);
-                goto done;
-            } else if (si == 15) { /* SXTH */
-                tcg_gen_ext16s_i64(tcg_rd, tcg_tmp);
-                goto done;
-            } else if (si == 31) { /* SXTW */
-                tcg_gen_ext32s_i64(tcg_rd, tcg_tmp);
-                goto done;
-            }
-        }
-        if (si == 63 || (si == 31 && ri <= si)) { /* ASR */
-            if (si == 31) {
-                tcg_gen_ext32s_i64(tcg_tmp, tcg_tmp);
-            }
-            tcg_gen_sari_i64(tcg_rd, tcg_tmp, ri);
+    /* Recognize simple(r) extractions.  */
+    if (ri <= si) {
+        int len = (si - ri) + 1;
+        if (opc == 0) { /* SBFM: ASR, SBFX, SXTB, SXTH, SXTW */
+            tcg_gen_sextract_i64(tcg_rd, tcg_tmp, ri, len);
             goto done;
-        }
-    } else if (opc == 2) { /* UBFM */
-        if (ri == 0) { /* UXTB, UXTH, plus non-canonical AND */
-            tcg_gen_andi_i64(tcg_rd, tcg_tmp, bitmask64(si + 1));
-            return;
-        }
-        if (si == 63 || (si == 31 && ri <= si)) { /* LSR */
-            if (si == 31) {
-                tcg_gen_ext32u_i64(tcg_tmp, tcg_tmp);
-            }
-            tcg_gen_shri_i64(tcg_rd, tcg_tmp, ri);
+        } else if (opc == 2) { /* UBFM: UBFX, LSR, UXTB, UXTH */
+            tcg_gen_extract_i64(tcg_rd, tcg_tmp, ri, len);
             return;
         }
-        if (si + 1 == ri && si != bitsize - 1) { /* LSL */
-            int shift = bitsize - 1 - si;
-            tcg_gen_shli_i64(tcg_rd, tcg_tmp, shift);
-            goto done;
-        }
     }
 
-    if (opc != 1) { /* SBFM or UBFM */
-        tcg_gen_movi_i64(tcg_rd, 0);
-    }
+    /* Do the bit move operation.  Note that above we handled ri <= si,
+       Wd<s-r:0> = Wn<s:r>, via tcg_gen_*extract_i64.  Now we handle
+       the ri > si case, Wd<32+s-r,32-r> = Wn<s:0>, via deposit.  */
+    pos = (bitsize - ri) & (bitsize - 1);
+    len = si + 1;
 
-    /* do the bit move operation */
-    if (si >= ri) {
-        /* Wd<s-r:0> = Wn<s:r> */
-        tcg_gen_shri_i64(tcg_tmp, tcg_tmp, ri);
-        pos = 0;
-        len = (si - ri) + 1;
-    } else {
-        /* Wd<32+s-r,32-r> = Wn<s:0> */
-        pos = bitsize - ri;
-        len = si + 1;
+    if (opc == 0 && len < ri) {
+        /* SBFM: sign extend the destination field from len to fill
+           the balance of the word.  Let the deposit below insert all
+           of those sign bits.  */
+        tcg_gen_sextract_i64(tcg_tmp, tcg_tmp, 0, len);
+        len = ri;
     }
 
-    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, pos, len);
-
-    if (opc == 0) { /* SBFM - sign extend the destination field */
-        tcg_gen_shli_i64(tcg_rd, tcg_rd, 64 - (pos + len));
-        tcg_gen_sari_i64(tcg_rd, tcg_rd, 64 - (pos + len));
+    if (opc == 1) { /* BFM */
+        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, pos, len);
+    } else {
+        /* SBFM or UBFM: We start with zero, and we haven't modified
+           any bits outside bitsize, therefore the zero-extension
+           below is unneeded.  */
+        tcg_gen_deposit_z_i64(tcg_rd, tcg_tmp, pos, len);
+        return;
     }
 
  done:
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 0ad9070..08da9ac 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -288,29 +288,6 @@ static void gen_revsh(TCGv_i32 var)
     tcg_gen_ext16s_i32(var, var);
 }
 
-/* Unsigned bitfield extract.  */
-static void gen_ubfx(TCGv_i32 var, int shift, uint32_t mask)
-{
-    if (shift)
-        tcg_gen_shri_i32(var, var, shift);
-    tcg_gen_andi_i32(var, var, mask);
-}
-
-/* Signed bitfield extract.  */
-static void gen_sbfx(TCGv_i32 var, int shift, int width)
-{
-    uint32_t signbit;
-
-    if (shift)
-        tcg_gen_sari_i32(var, var, shift);
-    if (shift + width < 32) {
-        signbit = 1u << (width - 1);
-        tcg_gen_andi_i32(var, var, (1u << width) - 1);
-        tcg_gen_xori_i32(var, var, signbit);
-        tcg_gen_subi_i32(var, var, signbit);
-    }
-}
-
 /* Return (b << 32) + a. Mark inputs as dead */
 static TCGv_i64 gen_addq_msw(TCGv_i64 a, TCGv_i32 b)
 {
@@ -9178,9 +9155,9 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                             goto illegal_op;
                         if (i < 32) {
                             if (op1 & 0x20) {
-                                gen_ubfx(tmp, shift, (1u << i) - 1);
+                                tcg_gen_extract_i32(tmp, tmp, shift, i);
                             } else {
-                                gen_sbfx(tmp, shift, i);
+                                tcg_gen_sextract_i32(tmp, tmp, shift, i);
                             }
                         }
                         store_reg(s, rd, tmp);
@@ -10497,15 +10474,17 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
                         imm++;
                         if (shift + imm > 32)
                             goto illegal_op;
-                        if (imm < 32)
-                            gen_sbfx(tmp, shift, imm);
+                        if (imm < 32) {
+                            tcg_gen_sextract_i32(tmp, tmp, shift, imm);
+                        }
                         break;
                     case 6: /* Unsigned bitfield extract.  */
                         imm++;
                         if (shift + imm > 32)
                             goto illegal_op;
-                        if (imm < 32)
-                            gen_ubfx(tmp, shift, (1u << imm) - 1);
+                        if (imm < 32) {
+                            tcg_gen_extract_i32(tmp, tmp, shift, imm);
+                        }
                         break;
                     case 3: /* Bitfield insert/clear.  */
                         if (imm < shift)
-- 
2.7.4

* [Qemu-devel] [PATCH v4 15/64] target-i386: Use new deposit and extract ops
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (13 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 14/64] target-arm: Use new " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 16/64] target-mips: Use the new extract op Richard Henderson
                   ` (50 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Convert a couple of places where it was easy to identify a right-shift
followed by a zero-extend or and-with-immediate, plus the obvious
sign-extract from a high byte register.

Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 324103c..4d6d36f 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -383,8 +383,7 @@ static void gen_op_mov_reg_v(TCGMemOp ot, int reg, TCGv t0)
 static inline void gen_op_mov_v_reg(TCGMemOp ot, TCGv t0, int reg)
 {
     if (ot == MO_8 && byte_reg_is_xH(reg)) {
-        tcg_gen_shri_tl(t0, cpu_regs[reg - 4], 8);
-        tcg_gen_ext8u_tl(t0, t0);
+        tcg_gen_extract_tl(t0, cpu_regs[reg - 4], 8, 8);
     } else {
         tcg_gen_mov_tl(t0, cpu_regs[reg]);
     }
@@ -3756,8 +3755,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 
                     /* Extract the LEN into a mask.  Lengths larger than
                        operand size get all ones.  */
-                    tcg_gen_shri_tl(cpu_A0, cpu_regs[s->vex_v], 8);
-                    tcg_gen_ext8u_tl(cpu_A0, cpu_A0);
+                    tcg_gen_extract_tl(cpu_A0, cpu_regs[s->vex_v], 8, 8);
                     tcg_gen_movcond_tl(TCG_COND_LEU, cpu_A0, cpu_A0, bound,
                                        cpu_A0, bound);
                     tcg_temp_free(bound);
@@ -3908,9 +3906,8 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                             gen_compute_eflags(s);
                         }
                         carry_in = cpu_tmp0;
-                        tcg_gen_shri_tl(carry_in, cpu_cc_src,
-                                        ctz32(b == 0x1f6 ? CC_C : CC_O));
-                        tcg_gen_andi_tl(carry_in, carry_in, 1);
+                        tcg_gen_extract_tl(carry_in, cpu_cc_src,
+                                           ctz32(b == 0x1f6 ? CC_C : CC_O), 1);
                     }
 
                     switch (ot) {
@@ -5435,21 +5432,25 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             rm = (modrm & 7) | REX_B(s);
 
             if (mod == 3) {
-                gen_op_mov_v_reg(ot, cpu_T0, rm);
-                switch (s_ot) {
-                case MO_UB:
-                    tcg_gen_ext8u_tl(cpu_T0, cpu_T0);
-                    break;
-                case MO_SB:
-                    tcg_gen_ext8s_tl(cpu_T0, cpu_T0);
-                    break;
-                case MO_UW:
-                    tcg_gen_ext16u_tl(cpu_T0, cpu_T0);
-                    break;
-                default:
-                case MO_SW:
-                    tcg_gen_ext16s_tl(cpu_T0, cpu_T0);
-                    break;
+                if (s_ot == MO_SB && byte_reg_is_xH(rm)) {
+                    tcg_gen_sextract_tl(cpu_T0, cpu_regs[rm - 4], 8, 8);
+                } else {
+                    gen_op_mov_v_reg(ot, cpu_T0, rm);
+                    switch (s_ot) {
+                    case MO_UB:
+                        tcg_gen_ext8u_tl(cpu_T0, cpu_T0);
+                        break;
+                    case MO_SB:
+                        tcg_gen_ext8s_tl(cpu_T0, cpu_T0);
+                        break;
+                    case MO_UW:
+                        tcg_gen_ext16u_tl(cpu_T0, cpu_T0);
+                        break;
+                    default:
+                    case MO_SW:
+                        tcg_gen_ext16s_tl(cpu_T0, cpu_T0);
+                        break;
+                    }
                 }
                 gen_op_mov_reg_v(d_ot, reg, cpu_T0);
             } else {
-- 
2.7.4

* [Qemu-devel] [PATCH v4 16/64] target-mips: Use the new extract op
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (14 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 15/64] target-i386: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 17/64] target-ppc: Use the new deposit and extract ops Richard Henderson
                   ` (49 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Use extract for EXT and DEXT.

Reviewed-by: Yongbok Kim <yongbok.kim@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-mips/translate.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index d8dde7a..cf79aa4 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -4484,11 +4484,12 @@ static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
         if (lsb + msb > 31) {
             goto fail;
         }
-        tcg_gen_shri_tl(t0, t1, lsb);
         if (msb != 31) {
-            tcg_gen_andi_tl(t0, t0, (1U << (msb + 1)) - 1);
+            tcg_gen_extract_tl(t0, t1, lsb, msb + 1);
         } else {
-            tcg_gen_ext32s_tl(t0, t0);
+            /* The two checks together imply that lsb == 0,
+               so this is a simple sign-extension.  */
+            tcg_gen_ext32s_tl(t0, t1);
         }
         break;
 #if defined(TARGET_MIPS64)
@@ -4503,10 +4504,7 @@ static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
         if (lsb + msb > 63) {
             goto fail;
         }
-        tcg_gen_shri_tl(t0, t1, lsb);
-        if (msb != 63) {
-            tcg_gen_andi_tl(t0, t0, (1ULL << (msb + 1)) - 1);
-        }
+        tcg_gen_extract_tl(t0, t1, lsb, msb + 1);
         break;
 #endif
     case OPC_INS:
-- 
2.7.4

* [Qemu-devel] [PATCH v4 17/64] target-ppc: Use the new deposit and extract ops
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (15 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 16/64] target-mips: Use the new extract op Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 18/64] target-s390x: " Richard Henderson
                   ` (48 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Use the new primitives for RLWINM and RLDICL.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
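Note for reviewers: a minimal sketch, assuming a 32-bit C model of rlwinm
(rotate left, then AND with MASK(mb, me) in PowerPC big-endian bit
numbering), that checks the two shortcut conditions used in gen_rlwinm
below.  All function names here are hypothetical models written for this
note; only the conditions on sh, mb, me, len and rsh come from the patch.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Bits mb..me set, bit 0 being the MSB; wraps around when mb > me. */
static uint32_t ppc_mask32(int mb, int me)
{
    uint32_t from_mb = 0xffffffffu >> mb;         /* bits mb..31 */
    uint32_t upto_me = 0xffffffffu << (31 - me);  /* bits 0..me  */
    return mb <= me ? (from_mb & upto_me) : (from_mb | upto_me);
}

static uint32_t low_mask32(int len)
{
    return 0xffffffffu >> (32 - len);
}

static uint32_t extract32_model(uint32_t x, int pos, int len)
{
    assert(pos >= 0 && len > 0 && pos + len <= 32);
    return (x >> pos) & low_mask32(len);
}

static uint32_t deposit_z32_model(uint32_t x, int pos, int len)
{
    assert(pos >= 0 && len > 0 && pos + len <= 32);
    return (x & low_mask32(len)) << pos;
}

/* Count (sh, mb, me) triples where a shortcut disagrees with rotate+mask. */
int rlwinm_mismatches(uint32_t rs)
{
    int bad = 0;

    for (int sh = 0; sh < 32; sh++) {
        for (int mb = 0; mb < 32; mb++) {
            for (int me = 0; me < 32; me++) {
                uint32_t ref = rotl32(rs, sh) & ppc_mask32(mb, me);
                int len = me - mb + 1;
                int rsh = (32 - sh) & 31;

                if (sh != 0 && len > 0 && me == 31 - sh) {
                    bad += deposit_z32_model(rs, sh, len) != ref;
                } else if (me == 31 && rsh + len <= 32) {
                    bad += extract32_model(rs, rsh, len) != ref;
                }
                /* Otherwise the general rotate-and-mask path is used. */
            }
        }
    }
    return bad;
}
```

The rsh + len <= 32 test is what keeps the extracted field from wrapping
around the rotate; without it the shortcut would not match the reference.
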
 target-ppc/translate.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 59e9552..435c6f0 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1975,16 +1975,16 @@ static void gen_rlwinm(DisasContext *ctx)
 {
     TCGv t_ra = cpu_gpr[rA(ctx->opcode)];
     TCGv t_rs = cpu_gpr[rS(ctx->opcode)];
-    uint32_t sh = SH(ctx->opcode);
-    uint32_t mb = MB(ctx->opcode);
-    uint32_t me = ME(ctx->opcode);
-
-    if (mb == 0 && me == (31 - sh)) {
-        tcg_gen_shli_tl(t_ra, t_rs, sh);
-        tcg_gen_ext32u_tl(t_ra, t_ra);
-    } else if (sh != 0 && me == 31 && sh == (32 - mb)) {
-        tcg_gen_ext32u_tl(t_ra, t_rs);
-        tcg_gen_shri_tl(t_ra, t_ra, mb);
+    int sh = SH(ctx->opcode);
+    int mb = MB(ctx->opcode);
+    int me = ME(ctx->opcode);
+    int len = me - mb + 1;
+    int rsh = (32 - sh) & 31;
+
+    if (sh != 0 && len > 0 && me == (31 - sh)) {
+        tcg_gen_deposit_z_tl(t_ra, t_rs, sh, len);
+    } else if (me == 31 && rsh + len <= 32) {
+        tcg_gen_extract_tl(t_ra, t_rs, rsh, len);
     } else {
         target_ulong mask;
 #if defined(TARGET_PPC64)
@@ -1992,8 +1992,9 @@ static void gen_rlwinm(DisasContext *ctx)
         me += 32;
 #endif
         mask = MASK(mb, me);
-
-        if (mask <= 0xffffffffu) {
+        if (sh == 0) {
+            tcg_gen_andi_tl(t_ra, t_rs, mask);
+        } else if (mask <= 0xffffffffu) {
             TCGv_i32 t0 = tcg_temp_new_i32();
             tcg_gen_trunc_tl_i32(t0, t_rs);
             tcg_gen_rotli_i32(t0, t0, sh);
@@ -2096,11 +2097,13 @@ static void gen_rldinm(DisasContext *ctx, int mb, int me, int sh)
 {
     TCGv t_ra = cpu_gpr[rA(ctx->opcode)];
     TCGv t_rs = cpu_gpr[rS(ctx->opcode)];
+    int len = me - mb + 1;
+    int rsh = (64 - sh) & 63;
 
-    if (sh != 0 && mb == 0 && me == (63 - sh)) {
-        tcg_gen_shli_tl(t_ra, t_rs, sh);
-    } else if (sh != 0 && me == 63 && sh == (64 - mb)) {
-        tcg_gen_shri_tl(t_ra, t_rs, mb);
+    if (sh != 0 && len > 0 && me == (63 - sh)) {
+        tcg_gen_deposit_z_tl(t_ra, t_rs, sh, len);
+    } else if (me == 63 && rsh + len <= 64) {
+        tcg_gen_extract_tl(t_ra, t_rs, rsh, len);
     } else {
         tcg_gen_rotli_tl(t_ra, t_rs, sh);
         tcg_gen_andi_tl(t_ra, t_ra, MASK(mb, me));
-- 
2.7.4

* [Qemu-devel] [PATCH v4 18/64] target-s390x: Use the new deposit and extract ops
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (16 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 17/64] target-ppc: Use the new deposit and extract ops Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 19/64] tcg/optimize: Fold movcond 0/1 into setcond Richard Henderson
                   ` (47 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Use the new primitives for RISBG.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-s390x/translate.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/target-s390x/translate.c b/target-s390x/translate.c
index 02bc705..6cebb7e 100644
--- a/target-s390x/translate.c
+++ b/target-s390x/translate.c
@@ -3134,20 +3134,26 @@ static ExitStatus op_risbg(DisasContext *s, DisasOps *o)
         }
     }
 
-    /* In some cases we can implement this with deposit, which can be more
-       efficient on some hosts.  */
-    if (~mask == imask && i3 <= i4) {
-        if (s->fields->op2 == 0x5d) {
-            i3 += 32, i4 += 32;
-        }
+    len = i4 - i3 + 1;
+    pos = 63 - i4;
+    rot = i5 & 63;
+    if (s->fields->op2 == 0x5d) {
+        pos += 32;
+    }
+
+    /* In some cases we can implement this with extract.  */
+    if (imask == 0 && pos == 0 && len > 0 && rot + len <= 64) {
+        tcg_gen_extract_i64(o->out, o->in2, rot, len);
+        return NO_EXIT;
+    }
+
+    /* In some cases we can implement this with deposit.  */
+    if (len > 0 && (imask == 0 || ~mask == imask)) {
         /* Note that we rotate the bits to be inserted to the lsb, not to
            the position as described in the PoO.  */
-        len = i4 - i3 + 1;
-        pos = 63 - i4;
-        rot = (i5 - pos) & 63;
+        rot = (rot - pos) & 63;
     } else {
-        pos = len = -1;
-        rot = i5 & 63;
+        pos = -1;
     }
 
     /* Rotate the input as necessary.  */
@@ -3155,7 +3161,11 @@ static ExitStatus op_risbg(DisasContext *s, DisasOps *o)
 
     /* Insert the selected bits into the output.  */
     if (pos >= 0) {
-        tcg_gen_deposit_i64(o->out, o->out, o->in2, pos, len);
+        if (imask == 0) {
+            tcg_gen_deposit_z_i64(o->out, o->in2, pos, len);
+        } else {
+            tcg_gen_deposit_i64(o->out, o->out, o->in2, pos, len);
+        }
     } else if (imask == 0) {
         tcg_gen_andi_i64(o->out, o->in2, mask);
     } else {
-- 
2.7.4

* [Qemu-devel] [PATCH v4 19/64] tcg/optimize: Fold movcond 0/1 into setcond
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (17 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 18/64] target-s390x: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-06 16:22   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 20/64] tcg: Add markup for output requires new register Richard Henderson
                   ` (46 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
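Note for reviewers: the fold relies on movcond(cond, a, b, 1, 0) being
setcond(cond, a, b), and movcond(cond, a, b, 0, 1) being setcond of the
inverted condition.  A minimal sketch of that identity follows, assuming a
reduced four-condition model; cond_t and every function here are
hypothetical stand-ins for this note, not the TCG types.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { COND_EQ, COND_NE, COND_LTU, COND_GEU } cond_t;

static cond_t invert_cond(cond_t c)
{
    switch (c) {
    case COND_EQ:  return COND_NE;
    case COND_NE:  return COND_EQ;
    case COND_LTU: return COND_GEU;
    default:       return COND_LTU;
    }
}

static int eval_cond(cond_t c, uint64_t a, uint64_t b)
{
    switch (c) {
    case COND_EQ:  return a == b;
    case COND_NE:  return a != b;
    case COND_LTU: return a < b;
    default:       return a >= b;
    }
}

/* movcond: select tv when the comparison holds, fv otherwise. */
static uint64_t movcond(cond_t c, uint64_t a, uint64_t b,
                        uint64_t tv, uint64_t fv)
{
    return eval_cond(c, a, b) ? tv : fv;
}

/* setcond: produce 1 when the comparison holds, 0 otherwise. */
static uint64_t setcond(cond_t c, uint64_t a, uint64_t b)
{
    return eval_cond(c, a, b);
}

/* Return 1 and set *out when constants (tv, fv) permit the fold. */
int fold_movcond(cond_t c, uint64_t tv, uint64_t fv, cond_t *out)
{
    if (tv == 1 && fv == 0) {
        *out = c;
        return 1;
    }
    if (tv == 0 && fv == 1) {
        *out = invert_cond(c);
        return 1;
    }
    return 0;
}

/* Count cases where a folded setcond differs from the original movcond. */
int fold_mismatches(void)
{
    static const uint64_t vals[] = { 0, 1, 2, ~0ull };
    static const uint64_t consts[] = { 0, 1, 5 };
    int bad = 0;

    for (int c = COND_EQ; c <= COND_GEU; c++) {
        for (int t = 0; t < 3; t++) {
            for (int f = 0; f < 3; f++) {
                cond_t folded;
                if (!fold_movcond((cond_t)c, consts[t], consts[f], &folded)) {
                    continue;   /* not a 0/1 pair: left as movcond */
                }
                for (int i = 0; i < 4; i++) {
                    for (int j = 0; j < 4; j++) {
                        bad += movcond((cond_t)c, vals[i], vals[j],
                                       consts[t], consts[f])
                               != setcond(folded, vals[i], vals[j]);
                    }
                }
            }
        }
    }
    return bad;
}
```
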
 tcg/optimize.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index f41ed2c..9e26bb7 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1105,6 +1105,21 @@ void tcg_optimize(TCGContext *s)
                 tcg_opt_gen_mov(s, op, args, args[0], args[4-tmp]);
                 break;
             }
+            if (temp_is_const(args[3]) && temp_is_const(args[4])) {
+                tcg_target_ulong tv = temps[args[3]].val;
+                tcg_target_ulong fv = temps[args[4]].val;
+                TCGCond cond = args[5];
+                if (fv == 1 && tv == 0) {
+                    cond = tcg_invert_cond(cond);
+                } else if (!(tv == 1 && fv == 0)) {
+                    goto do_default;
+                }
+                args[3] = cond;
+                op->opc = opc = (opc == INDEX_op_movcond_i32
+                                 ? INDEX_op_setcond_i32
+                                 : INDEX_op_setcond_i64);
+                nb_iargs = 2;
+            }
             goto do_default;
 
         case INDEX_op_add2_i32:
-- 
2.7.4

* [Qemu-devel] [PATCH v4 20/64] tcg: Add markup for output requires new register
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (18 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 19/64] tcg/optimize: Fold movcond 0/1 into setcond Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-06 16:34   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 21/64] tcg: Transition flat op_defs array to a target callback Richard Henderson
                   ` (45 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

This uses the same concept, and the same '&' markup, as the
early-clobber constraint modifier in gcc.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
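Note for reviewers: a deliberately simplified sketch of the effect of the
new flag on output allocation.  An ordinary output is chosen while avoiding
only the registers already given to other outputs, so it may reuse an input
register; an output marked '&' must also avoid the input set.  regset_t,
reg_alloc and alloc_output are hypothetical models for this note; the real
allocator also deals with allocation order and spilling, which this ignores.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t regset_t;

/* Pick the lowest-numbered register that is allowed and not busy. */
static int reg_alloc(regset_t allowed, regset_t busy)
{
    for (int r = 0; r < 32; r++) {
        regset_t bit = (regset_t)1 << r;
        if ((allowed & bit) && !(busy & bit)) {
            return r;
        }
    }
    return -1;   /* model only: no spilling here */
}

/*
 * Choose an output register.  With newreg set, registers holding inputs
 * are excluded as well as those holding earlier outputs.
 */
int alloc_output(regset_t allowed, regset_t i_allocated,
                 regset_t o_allocated, int newreg)
{
    regset_t busy = newreg ? (i_allocated | o_allocated) : o_allocated;
    return reg_alloc(allowed, busy);
}
```

With inputs in r0 and r1 (i_allocated = 0x03) and no outputs assigned yet,
the plain case may hand back r0, while the early-clobber case moves on to
r2.
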
 tcg/tcg.c | 34 ++++++++++++++++++++++------------
 tcg/tcg.h |  1 +
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index aabf94f..27913f0 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1263,6 +1263,10 @@ void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs)
                     if (*ct_str == '\0')
                         break;
                     switch(*ct_str) {
+                    case '&':
+                        def->args_ct[i].ct |= TCG_CT_NEWREG;
+                        ct_str++;
+                        break;
                     case 'i':
                         def->args_ct[i].ct |= TCG_CT_CONST;
                         ct_str++;
@@ -2208,7 +2212,8 @@ static void tcg_reg_alloc_op(TCGContext *s,
                              const TCGOpDef *def, TCGOpcode opc,
                              const TCGArg *args, TCGLifeData arg_life)
 {
-    TCGRegSet allocated_regs;
+    TCGRegSet i_allocated_regs;
+    TCGRegSet o_allocated_regs;
     int i, k, nb_iargs, nb_oargs;
     TCGReg reg;
     TCGArg arg;
@@ -2225,8 +2230,10 @@ static void tcg_reg_alloc_op(TCGContext *s,
            args + nb_oargs + nb_iargs, 
            sizeof(TCGArg) * def->nb_cargs);
 
+    tcg_regset_set(i_allocated_regs, s->reserved_regs);
+    tcg_regset_set(o_allocated_regs, s->reserved_regs);
+
     /* satisfy input constraints */ 
-    tcg_regset_set(allocated_regs, s->reserved_regs);
     for(k = 0; k < nb_iargs; k++) {
         i = def->sorted_args[nb_oargs + k];
         arg = args[i];
@@ -2241,7 +2248,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
             goto iarg_end;
         }
 
-        temp_load(s, ts, arg_ct->u.regs, allocated_regs);
+        temp_load(s, ts, arg_ct->u.regs, i_allocated_regs);
 
         if (arg_ct->ct & TCG_CT_IALIAS) {
             if (ts->fixed_reg) {
@@ -2275,13 +2282,13 @@ static void tcg_reg_alloc_op(TCGContext *s,
         allocate_in_reg:
             /* allocate a new register matching the constraint 
                and move the temporary register into it */
-            reg = tcg_reg_alloc(s, arg_ct->u.regs, allocated_regs,
+            reg = tcg_reg_alloc(s, arg_ct->u.regs, i_allocated_regs,
                                 ts->indirect_base);
             tcg_out_mov(s, ts->type, reg, ts->reg);
         }
         new_args[i] = reg;
         const_args[i] = 0;
-        tcg_regset_set_reg(allocated_regs, reg);
+        tcg_regset_set_reg(i_allocated_regs, reg);
     iarg_end: ;
     }
     
@@ -2293,24 +2300,23 @@ static void tcg_reg_alloc_op(TCGContext *s,
     }
 
     if (def->flags & TCG_OPF_BB_END) {
-        tcg_reg_alloc_bb_end(s, allocated_regs);
+        tcg_reg_alloc_bb_end(s, i_allocated_regs);
     } else {
         if (def->flags & TCG_OPF_CALL_CLOBBER) {
             /* XXX: permit generic clobber register list ? */ 
             for (i = 0; i < TCG_TARGET_NB_REGS; i++) {
                 if (tcg_regset_test_reg(tcg_target_call_clobber_regs, i)) {
-                    tcg_reg_free(s, i, allocated_regs);
+                    tcg_reg_free(s, i, i_allocated_regs);
                 }
             }
         }
         if (def->flags & TCG_OPF_SIDE_EFFECTS) {
             /* sync globals if the op has side effects and might trigger
                an exception. */
-            sync_globals(s, allocated_regs);
+            sync_globals(s, i_allocated_regs);
         }
         
         /* satisfy the output constraints */
-        tcg_regset_set(allocated_regs, s->reserved_regs);
         for(k = 0; k < nb_oargs; k++) {
             i = def->sorted_args[k];
             arg = args[i];
@@ -2318,6 +2324,10 @@ static void tcg_reg_alloc_op(TCGContext *s,
             ts = &s->temps[arg];
             if (arg_ct->ct & TCG_CT_ALIAS) {
                 reg = new_args[arg_ct->alias_index];
+            } else if (arg_ct->ct & TCG_CT_NEWREG) {
+                reg = tcg_reg_alloc(s, arg_ct->u.regs,
+                                    i_allocated_regs | o_allocated_regs,
+                                    ts->indirect_base);
             } else {
                 /* if fixed register, we try to use it */
                 reg = ts->reg;
@@ -2325,10 +2335,10 @@ static void tcg_reg_alloc_op(TCGContext *s,
                     tcg_regset_test_reg(arg_ct->u.regs, reg)) {
                     goto oarg_end;
                 }
-                reg = tcg_reg_alloc(s, arg_ct->u.regs, allocated_regs,
+                reg = tcg_reg_alloc(s, arg_ct->u.regs, o_allocated_regs,
                                     ts->indirect_base);
             }
-            tcg_regset_set_reg(allocated_regs, reg);
+            tcg_regset_set_reg(o_allocated_regs, reg);
             /* if a fixed register is used, then a move will be done afterwards */
             if (!ts->fixed_reg) {
                 if (ts->val_type == TEMP_VAL_REG) {
@@ -2357,7 +2367,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
             tcg_out_mov(s, ts->type, ts->reg, reg);
         }
         if (NEED_SYNC_ARG(i)) {
-            temp_sync(s, ts, allocated_regs, IS_DEAD_ARG(i));
+            temp_sync(s, ts, o_allocated_regs, IS_DEAD_ARG(i));
         } else if (IS_DEAD_ARG(i)) {
             temp_dead(s, ts);
         }
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 5fd3733..ebfcefd 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -851,6 +851,7 @@ void tcg_dump_op_count(FILE *f, fprintf_function cpu_fprintf);
 
 #define TCG_CT_ALIAS  0x80
 #define TCG_CT_IALIAS 0x40
+#define TCG_CT_NEWREG 0x20 /* output requires a new register */
 #define TCG_CT_REG    0x01
 #define TCG_CT_CONST  0x02 /* any constant of register size */
 
-- 
2.7.4

* [Qemu-devel] [PATCH v4 21/64] tcg: Transition flat op_defs array to a target callback
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (19 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 20/64] tcg: Add markup for output requires new register Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-06 16:38   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 22/64] tcg: Pass the opcode width to target_parse_constraint Richard Henderson
                   ` (44 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

This will allow the target to tailor the constraints to the
auto-detected ISA extensions.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
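As a rough illustration of why a per-opcode callback helps, here is a
minimal standalone sketch, not code from this series.  The types Opcode
and TargetOpDef, the flag have_fast_ctz and the constraint strings are
all hypothetical stand-ins; only the shape (the generic code asks the
backend for one definition per opcode, and the backend may answer
differently depending on what it detected at startup) follows the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for TCGOpcode and TCGTargetOpDef. */
typedef enum { OP_ADD, OP_CTZ, OP_NOP, NB_OPS } Opcode;

typedef struct {
    Opcode op;
    const char *args_ct_str[4];
} TargetOpDef;

/* Hypothetical flag; a real backend would set it while probing the host. */
static bool have_fast_ctz;

/* Backend callback: the constraints for one opcode, or NULL if unknown. */
static const TargetOpDef *target_op_def(Opcode op)
{
    /* Invented constraint sets, chosen only to show two alternatives. */
    static const TargetOpDef add      = { OP_ADD, { "r", "r", "ri" } };
    static const TargetOpDef ctz_fast = { OP_CTZ, { "r", "r", "ri" } };
    static const TargetOpDef ctz_slow = { OP_CTZ, { "&r", "r", "r" } };

    switch (op) {
    case OP_ADD:
        return &add;
    case OP_CTZ:
        /* The answer depends on a run-time detected ISA extension. */
        return have_fast_ctz ? &ctz_fast : &ctz_slow;
    default:
        return NULL;
    }
}

/* Generic side: query every opcode once; returns how many were defined. */
static int process_op_defs(void)
{
    int found = 0;

    for (int op = 0; op < NB_OPS; op++) {
        const TargetOpDef *d = target_op_def((Opcode)op);
        if (d != NULL) {
            assert(d->op == (Opcode)op);
            found++;
        }
    }
    return found;
}
```

With a flat table registered once, the choice between the two ctz
entries above would have to be made before registration; with the
callback it is made at lookup time, after the host has been probed.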
 tcg/aarch64/tcg-target.inc.c | 14 ++++++--
 tcg/arm/tcg-target.inc.c     | 14 ++++++--
 tcg/i386/tcg-target.inc.c    | 14 ++++++--
 tcg/ia64/tcg-target.inc.c    | 14 ++++++--
 tcg/mips/tcg-target.inc.c    | 14 ++++++--
 tcg/ppc/tcg-target.inc.c     | 14 ++++++--
 tcg/s390/tcg-target.inc.c    | 14 ++++++--
 tcg/sparc/tcg-target.inc.c   | 14 ++++++--
 tcg/tcg.c                    | 86 +++++++++++++++-----------------------------
 tcg/tcg.h                    |  2 --
 tcg/tci/tcg-target.inc.c     | 13 ++++++-
 11 files changed, 136 insertions(+), 77 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index c0e9890..416db45 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1812,6 +1812,18 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+    int i, n = ARRAY_SIZE(aarch64_op_defs);
+
+    for (i = 0; i < n; ++i) {
+        if (aarch64_op_defs[i].op == op) {
+            return &aarch64_op_defs[i];
+        }
+    }
+    return NULL;
+}
+
 static void tcg_target_init(TCGContext *s)
 {
     tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffffffff);
@@ -1834,8 +1846,6 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_FP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_X18); /* platform register */
-
-    tcg_add_target_add_op_defs(aarch64_op_defs);
 }
 
 /* Saving pairs: (X19, X20) .. (X27, X28), (X29(fp), X30(lr)).  */
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 6765a9d..4500ca7 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -2006,6 +2006,18 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+    int i, n = ARRAY_SIZE(arm_op_defs);
+
+    for (i = 0; i < n; ++i) {
+        if (arm_op_defs[i].op == op) {
+            return &arm_op_defs[i];
+        }
+    }
+    return NULL;
+}
+
 static void tcg_target_init(TCGContext *s)
 {
     /* Only probe for the platform and capabilities if we havn't already
@@ -2036,8 +2048,6 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_PC);
-
-    tcg_add_target_add_op_defs(arm_op_defs);
 }
 
 static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 39f62bd..595c399 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2330,6 +2330,18 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+    int i, n = ARRAY_SIZE(x86_op_defs);
+
+    for (i = 0; i < n; ++i) {
+        if (x86_op_defs[i].op == op) {
+            return &x86_op_defs[i];
+        }
+    }
+    return NULL;
+}
+
 static int tcg_target_callee_save_regs[] = {
 #if TCG_TARGET_REG_BITS == 64
     TCG_REG_RBP,
@@ -2471,8 +2483,6 @@ static void tcg_target_init(TCGContext *s)
 
     tcg_regset_clear(s->reserved_regs);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
-
-    tcg_add_target_add_op_defs(x86_op_defs);
 }
 
 typedef struct {
diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index b04d716..e4d419d 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -2352,6 +2352,18 @@ static const TCGTargetOpDef ia64_op_defs[] = {
     { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+    int i, n = ARRAY_SIZE(ia64_op_defs);
+
+    for (i = 0; i < n; ++i) {
+        if (ia64_op_defs[i].op == op) {
+            return &ia64_op_defs[i];
+        }
+    }
+    return NULL;
+}
+
 /* Generate global QEMU prologue and epilogue code */
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
@@ -2471,6 +2483,4 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_R5);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_R6);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_R7);
-
-    tcg_add_target_add_op_defs(ia64_op_defs);
 }
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 1ecae08..7758b6d 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -1770,6 +1770,18 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+    int i, n = ARRAY_SIZE(mips_op_defs);
+
+    for (i = 0; i < n; ++i) {
+        if (mips_op_defs[i].op == op) {
+            return &mips_op_defs[i];
+        }
+    }
+    return NULL;
+}
+
 static int tcg_target_callee_save_regs[] = {
     TCG_REG_S0,       /* used for the global env (TCG_AREG0) */
     TCG_REG_S1,
@@ -1930,8 +1942,6 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_RA);   /* return address */
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);   /* stack pointer */
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_GP);   /* global pointer */
-
-    tcg_add_target_add_op_defs(mips_op_defs);
 }
 
 void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 7ec54a2..a1b7412 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -2634,6 +2634,18 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+    int i, n = ARRAY_SIZE(ppc_op_defs);
+
+    for (i = 0; i < n; ++i) {
+        if (ppc_op_defs[i].op == op) {
+            return &ppc_op_defs[i];
+        }
+    }
+    return NULL;
+}
+
 static void tcg_target_init(TCGContext *s)
 {
     unsigned long hwcap = qemu_getauxval(AT_HWCAP);
@@ -2670,8 +2682,6 @@ static void tcg_target_init(TCGContext *s)
     if (USE_REG_RA) {
         tcg_regset_set_reg(s->reserved_regs, TCG_REG_RA);  /* return addr */
     }
-
-    tcg_add_target_add_op_defs(ppc_op_defs);
 }
 
 #ifdef __ELF__
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index f4c510e..3cb34eb 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -2321,6 +2321,18 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+    int i, n = ARRAY_SIZE(s390_op_defs);
+
+    for (i = 0; i < n; ++i) {
+        if (s390_op_defs[i].op == op) {
+            return &s390_op_defs[i];
+        }
+    }
+    return NULL;
+}
+
 static void query_s390_facilities(void)
 {
     unsigned long hwcap = qemu_getauxval(AT_HWCAP);
@@ -2363,8 +2375,6 @@ static void tcg_target_init(TCGContext *s)
     /* XXX many insns can't be used with R0, so we better avoid it for now */
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_R0);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
-
-    tcg_add_target_add_op_defs(s390_op_defs);
 }
 
 #define FRAME_SIZE  ((int)(TCG_TARGET_CALL_STACK_OFFSET          \
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 700c434..f2cbf50 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -1583,6 +1583,18 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+    int i, n = ARRAY_SIZE(sparc_op_defs);
+
+    for (i = 0; i < n; ++i) {
+        if (sparc_op_defs[i].op == op) {
+            return &sparc_op_defs[i];
+        }
+    }
+    return NULL;
+}
+
 static void tcg_target_init(TCGContext *s)
 {
     /* Only probe for the platform and capabilities if we havn't already
@@ -1622,8 +1634,6 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_O6); /* stack pointer */
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_T1); /* for internal use */
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_T2); /* for internal use */
-
-    tcg_add_target_add_op_defs(sparc_op_defs);
 }
 
 #if SPARC64
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 27913f0..5792c1e 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -62,6 +62,7 @@
 /* Forward declarations for functions declared in tcg-target.inc.c and
    used here. */
 static void tcg_target_init(TCGContext *s);
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode);
 static void tcg_target_qemu_prologue(TCGContext *s);
 static void patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend);
@@ -319,6 +320,7 @@ static const TCGHelperInfo all_helpers[] = {
 };
 
 static int indirect_reg_alloc_order[ARRAY_SIZE(tcg_target_reg_alloc_order)];
+static void process_op_defs(TCGContext *s);
 
 void tcg_context_init(TCGContext *s)
 {
@@ -362,6 +364,7 @@ void tcg_context_init(TCGContext *s)
     }
 
     tcg_target_init(s);
+    process_op_defs(s);
 
     /* Reverse the order of the saved registers, assuming they're all at
        the start of tcg_target_reg_alloc_order.  */
@@ -1221,29 +1224,33 @@ static void sort_constraints(TCGOpDef *def, int start, int n)
     }
 }
 
-void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs)
+static void process_op_defs(TCGContext *s)
 {
     TCGOpcode op;
-    TCGOpDef *def;
-    const char *ct_str;
-    int i, nb_args;
 
-    for(;;) {
-        if (tdefs->op == (TCGOpcode)-1)
-            break;
-        op = tdefs->op;
-        tcg_debug_assert((unsigned)op < NB_OPS);
-        def = &tcg_op_defs[op];
-#if defined(CONFIG_DEBUG_TCG)
-        /* Duplicate entry in op definitions? */
-        tcg_debug_assert(!def->used);
-        def->used = 1;
-#endif
+    for (op = 0; op < NB_OPS; op++) {
+        TCGOpDef *def = &tcg_op_defs[op];
+        const TCGTargetOpDef *tdefs;
+        int i, nb_args, ok;
+
+        if (def->flags & TCG_OPF_NOT_PRESENT) {
+            continue;
+        }
+
         nb_args = def->nb_iargs + def->nb_oargs;
-        for(i = 0; i < nb_args; i++) {
-            ct_str = tdefs->args_ct_str[i];
-            /* Incomplete TCGTargetOpDef entry? */
+        if (nb_args == 0) {
+            continue;
+        }
+
+        tdefs = tcg_target_op_def(op);
+        /* Missing TCGTargetOpDef entry. */
+        tcg_debug_assert(tdefs != NULL);
+
+        for (i = 0; i < nb_args; i++) {
+            const char *ct_str = tdefs->args_ct_str[i];
+            /* Incomplete TCGTargetOpDef entry. */
             tcg_debug_assert(ct_str != NULL);
+
             tcg_regset_clear(def->args_ct[i].u.regs);
             def->args_ct[i].ct = 0;
             if (ct_str[0] >= '0' && ct_str[0] <= '9') {
@@ -1272,11 +1279,9 @@ void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs)
                         ct_str++;
                         break;
                     default:
-                        if (target_parse_constraint(&def->args_ct[i], &ct_str) < 0) {
-                            fprintf(stderr, "Invalid constraint '%s' for arg %d of operation '%s'\n",
-                                    ct_str, i, def->name);
-                            exit(1);
-                        }
+                        ok = target_parse_constraint(&def->args_ct[i], &ct_str);
+                        /* Typo in TCGTargetOpDef constraint. */
+                        tcg_debug_assert(ok == 0);
                     }
                 }
             }
@@ -1288,42 +1293,7 @@ void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs)
         /* sort the constraints (XXX: this is just an heuristic) */
         sort_constraints(def, 0, def->nb_oargs);
         sort_constraints(def, def->nb_oargs, def->nb_iargs);
-
-#if 0
-        {
-            int i;
-
-            printf("%s: sorted=", def->name);
-            for(i = 0; i < def->nb_oargs + def->nb_iargs; i++)
-                printf(" %d", def->sorted_args[i]);
-            printf("\n");
-        }
-#endif
-        tdefs++;
-    }
-
-#if defined(CONFIG_DEBUG_TCG)
-    i = 0;
-    for (op = 0; op < tcg_op_defs_max; op++) {
-        const TCGOpDef *def = &tcg_op_defs[op];
-        if (def->flags & TCG_OPF_NOT_PRESENT) {
-            /* Wrong entry in op definitions? */
-            if (def->used) {
-                fprintf(stderr, "Invalid op definition for %s\n", def->name);
-                i = 1;
-            }
-        } else {
-            /* Missing entry in op definitions? */
-            if (!def->used) {
-                fprintf(stderr, "Missing op definition for %s\n", def->name);
-                i = 1;
-            }
-        }
-    }
-    if (i == 1) {
-        tcg_abort();
     }
-#endif
 }
 
 void tcg_op_remove(TCGContext *s, TCGOp *op)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index ebfcefd..144bdab 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -906,8 +906,6 @@ do {\
     abort();\
 } while (0)
 
-void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs);
-
 #if UINTPTR_MAX == UINT32_MAX
 #define TCGV_NAT_TO_PTR(n) MAKE_TCGV_PTR(GET_TCGV_I32(n))
 #define TCGV_PTR_TO_NAT(n) MAKE_TCGV_I32(GET_TCGV_PTR(n))
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 9dbf4d5..42d4bd6 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -259,6 +259,18 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
     { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+    int i, n = ARRAY_SIZE(tcg_target_op_defs);
+
+    for (i = 0; i < n; ++i) {
+        if (tcg_target_op_defs[i].op == op) {
+            return &tcg_target_op_defs[i];
+        }
+    }
+    return NULL;
+}
+
 static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_R0,
     TCG_REG_R1,
@@ -875,7 +887,6 @@ static void tcg_target_init(TCGContext *s)
 
     tcg_regset_clear(s->reserved_regs);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
-    tcg_add_target_add_op_defs(tcg_target_op_defs);
 
     /* We use negative offsets from "sp" so that we can distinguish
        stores that might pretend to be call arguments.  */
-- 
2.7.4


* [Qemu-devel] [PATCH v4 22/64] tcg: Pass the opcode width to target_parse_constraint
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (20 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 21/64] tcg: Transition flat op_defs array to a target callback Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-06 16:43   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 23/64] tcg: Allow an operand to be matching or a constant Richard Henderson
                   ` (43 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

This will let us choose how to interpret a given constraint
depending on whether the opcode is 32- or 64-bit, which in turn
will let us share more constraint combinations between opcodes.

At the same time, change the interface to return the advanced
pointer instead of passing it in/out by reference.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
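The new calling convention can be sketched in isolation as follows.
This is an illustrative example under stated assumptions, not code from
the series: Type, ArgConstraint, the const_mask field and the 'W'
letter are hypothetical; only the signature shape (take the string
pointer by value plus the opcode type, return the advanced pointer or
NULL) mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TYPE_I32, TYPE_I64 } Type;   /* stand-in for TCGType */

#define CT_REG    0x01   /* same value as TCG_CT_REG */
#define CT_CONST  0x02   /* same value as TCG_CT_CONST */

typedef struct {
    unsigned ct;
    unsigned long long const_mask;  /* hypothetical: accepted constant bits */
} ArgConstraint;

/* Returns the pointer past the consumed letter, or NULL if it is unknown. */
static const char *parse_constraint(ArgConstraint *ct, const char *ct_str,
                                    Type type)
{
    switch (*ct_str++) {
    case 'r':
        ct->ct |= CT_REG;
        break;
    case 'W':   /* hypothetical letter: a constant as wide as the opcode */
        ct->ct |= CT_CONST;
        ct->const_mask = (type == TYPE_I32 ? 0xffffffffull : ~0ull);
        break;
    default:
        return NULL;
    }
    return ct_str;
}

/* Caller side: consume the whole string; 0 on success, -1 on a typo. */
static int parse_all(ArgConstraint *ct, const char *ct_str, Type type)
{
    ct->ct = 0;
    ct->const_mask = 0;
    while (*ct_str != '\0') {
        ct_str = parse_constraint(ct, ct_str, type);
        if (ct_str == NULL) {
            return -1;
        }
    }
    return 0;
}
```

One consequence of switching on *ct_str++ is that, inside a case body,
ct_str[0] is the character after the one just matched; the matched
letter itself is ct_str[-1].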
 tcg/aarch64/tcg-target.inc.c | 15 +++++----------
 tcg/arm/tcg-target.inc.c     | 15 +++++----------
 tcg/i386/tcg-target.inc.c    | 14 +++++---------
 tcg/ia64/tcg-target.inc.c    | 14 +++++---------
 tcg/mips/tcg-target.inc.c    | 14 +++++---------
 tcg/ppc/tcg-target.inc.c     | 14 +++++---------
 tcg/s390/tcg-target.inc.c    | 14 +++++---------
 tcg/sparc/tcg-target.inc.c   | 14 +++++---------
 tcg/tcg.c                    | 12 ++++++++----
 tcg/tci/tcg-target.inc.c     | 12 +++++-------
 10 files changed, 53 insertions(+), 85 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 416db45..17c0b20 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -115,12 +115,10 @@ static inline void patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_MONE 0x800
 
 /* parse target specific constraints */
-static int target_parse_constraint(TCGArgConstraint *ct,
-                                   const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+                                           const char *ct_str, TCGType type)
 {
-    const char *ct_str = *pct_str;
-
-    switch (ct_str[0]) {
+    switch (*ct_str++) {
     case 'r':
         ct->ct |= TCG_CT_REG;
         tcg_regset_set32(ct->u.regs, 0, (1ULL << TCG_TARGET_NB_REGS) - 1);
@@ -150,12 +148,9 @@ static int target_parse_constraint(TCGArgConstraint *ct,
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
     default:
-        return -1;
+        return NULL;
     }
-
-    ct_str++;
-    *pct_str = ct_str;
-    return 0;
+    return ct_str;
 }
 
 static inline bool is_aimm(uint64_t val)
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 4500ca7..473c170 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -114,12 +114,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_ZERO 0x800
 
 /* parse target specific constraints */
-static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+                                           const char *ct_str, TCGType type)
 {
-    const char *ct_str;
-
-    ct_str = *pct_str;
-    switch (ct_str[0]) {
+    switch (*ct_str++) {
     case 'I':
         ct->ct |= TCG_CT_CONST_ARM;
         break;
@@ -172,12 +170,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         break;
 
     default:
-        return -1;
+        return NULL;
     }
-    ct_str++;
-    *pct_str = ct_str;
-
-    return 0;
+    return ct_str;
 }
 
 static inline uint32_t rotl(uint32_t val, int n)
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 595c399..aa5a248 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -166,12 +166,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 }
 
 /* parse target specific constraints */
-static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+                                           const char *ct_str, TCGType type)
 {
-    const char *ct_str;
-
-    ct_str = *pct_str;
-    switch(ct_str[0]) {
+    switch(*ct_str++) {
     case 'a':
         ct->ct |= TCG_CT_REG;
         tcg_regset_set_reg(ct->u.regs, TCG_REG_EAX);
@@ -249,11 +247,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         break;
 
     default:
-        return -1;
+        return NULL;
     }
-    ct_str++;
-    *pct_str = ct_str;
-    return 0;
+    return ct_str;
 }
 
 /* test if a constant matches the constraint */
diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index e4d419d..bf9a97d 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -721,12 +721,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
  */
 
 /* parse target specific constraints */
-static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+                                           const char *ct_str, TCGType type)
 {
-    const char *ct_str;
-
-    ct_str = *pct_str;
-    switch(ct_str[0]) {
+    switch(*ct_str++) {
     case 'r':
         ct->ct |= TCG_CT_REG;
         tcg_regset_set(ct->u.regs, 0xffffffffffffffffull);
@@ -750,11 +748,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
     default:
-        return -1;
+        return NULL;
     }
-    ct_str++;
-    *pct_str = ct_str;
-    return 0;
+    return ct_str;
 }
 
 /* test if a constant matches the constraint */
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 7758b6d..4341ea2 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -167,12 +167,10 @@ static inline bool is_p2m1(tcg_target_long val)
 }
 
 /* parse target specific constraints */
-static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+                                           const char *ct_str, TCGType type)
 {
-    const char *ct_str;
-
-    ct_str = *pct_str;
-    switch(ct_str[0]) {
+    switch(*ct_str++) {
     case 'r':
         ct->ct |= TCG_CT_REG;
         tcg_regset_set(ct->u.regs, 0xffffffff);
@@ -224,11 +222,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
     default:
-        return -1;
+        return NULL;
     }
-    ct_str++;
-    *pct_str = ct_str;
-    return 0;
+    return ct_str;
 }
 
 /* test if a constant matches the constraint */
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index a1b7412..bf17161 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -259,12 +259,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 }
 
 /* parse target specific constraints */
-static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+                                           const char *ct_str, TCGType type)
 {
-    const char *ct_str;
-
-    ct_str = *pct_str;
-    switch (ct_str[0]) {
+    switch (*ct_str++) {
     case 'A': case 'B': case 'C': case 'D':
         ct->ct |= TCG_CT_REG;
         tcg_regset_set_reg(ct->u.regs, 3 + ct_str[0] - 'A');
@@ -311,11 +309,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
     default:
-        return -1;
+        return NULL;
     }
-    ct_str++;
-    *pct_str = ct_str;
-    return 0;
+    return ct_str;
 }
 
 /* test if a constant matches the constraint */
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 3cb34eb..5275297 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -359,11 +359,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 }
 
 /* parse target specific constraints */
-static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+                                           const char *ct_str, TCGType type)
 {
-    const char *ct_str = *pct_str;
-
-    switch (ct_str[0]) {
+    switch (*ct_str++) {
     case 'r':                  /* all registers */
         ct->ct |= TCG_CT_REG;
         tcg_regset_set32(ct->u.regs, 0, 0xffff);
@@ -409,12 +408,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
     default:
-        return -1;
+        return NULL;
     }
-    ct_str++;
-    *pct_str = ct_str;
-
-    return 0;
+    return ct_str;
 }
 
 /* Immediates to be used with logical OR.  This is an optimization only,
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index f2cbf50..d1f4c0d 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -319,12 +319,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 }
 
 /* parse target specific constraints */
-static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+                                           const char *ct_str, TCGType type)
 {
-    const char *ct_str;
-
-    ct_str = *pct_str;
-    switch (ct_str[0]) {
+    switch (*ct_str++) {
     case 'r':
         ct->ct |= TCG_CT_REG;
         tcg_regset_set32(ct->u.regs, 0, 0xffffffff);
@@ -360,11 +358,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
     default:
-        return -1;
+        return NULL;
     }
-    ct_str++;
-    *pct_str = ct_str;
-    return 0;
+    return ct_str;
 }
 
 /* test if a constant matches the constraint */
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5792c1e..8b4dce7 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -96,7 +96,8 @@ static void tcg_register_jit_int(void *buf, size_t size,
     __attribute__((unused));
 
 /* Forward declarations for functions declared and used in tcg-target.inc.c. */
-static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str);
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+                                           const char *ct_str, TCGType type);
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
                        intptr_t arg2);
 static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
@@ -1231,7 +1232,8 @@ static void process_op_defs(TCGContext *s)
     for (op = 0; op < NB_OPS; op++) {
         TCGOpDef *def = &tcg_op_defs[op];
         const TCGTargetOpDef *tdefs;
-        int i, nb_args, ok;
+        TCGType type;
+        int i, nb_args;
 
         if (def->flags & TCG_OPF_NOT_PRESENT) {
             continue;
@@ -1246,6 +1248,7 @@ static void process_op_defs(TCGContext *s)
         /* Missing TCGTargetOpDef entry. */
         tcg_debug_assert(tdefs != NULL);
 
+        type = (def->flags & TCG_OPF_64BIT ? TCG_TYPE_I64 : TCG_TYPE_I32);
         for (i = 0; i < nb_args; i++) {
             const char *ct_str = tdefs->args_ct_str[i];
             /* Incomplete TCGTargetOpDef entry. */
@@ -1279,9 +1282,10 @@ static void process_op_defs(TCGContext *s)
                         ct_str++;
                         break;
                     default:
-                        ok = target_parse_constraint(&def->args_ct[i], &ct_str);
+                        ct_str = target_parse_constraint(&def->args_ct[i],
+                                                         ct_str, type);
                         /* Typo in TCGTargetOpDef constraint. */
-                        tcg_debug_assert(ok == 0);
+                        tcg_debug_assert(ct_str != NULL);
                     }
                 }
             }
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 42d4bd6..26ee9b1 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -384,10 +384,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 }
 
 /* Parse target specific constraints. */
-static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+                                           const char *ct_str, TCGType type)
 {
-    const char *ct_str = *pct_str;
-    switch (ct_str[0]) {
+    switch (*ct_str++) {
     case 'r':
     case 'L':                   /* qemu_ld constraint */
     case 'S':                   /* qemu_st constraint */
@@ -395,11 +395,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         tcg_regset_set32(ct->u.regs, 0, BIT(TCG_TARGET_NB_REGS) - 1);
         break;
     default:
-        return -1;
+        return NULL;
     }
-    ct_str++;
-    *pct_str = ct_str;
-    return 0;
+    return ct_str;
 }
 
 #if defined(CONFIG_DEBUG_TCG_INTERPRETER)
-- 
2.7.4


* [Qemu-devel] [PATCH v4 23/64] tcg: Allow an operand to be matching or a constant
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (21 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 22/64] tcg: Pass the opcode width to target_parse_constraint Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-08 17:19   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 24/64] tcg: Add clz and ctz opcodes Richard Henderson
                   ` (42 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

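This allows an output operand to match an input operand
only when the input operand needs a register.  When the input
is satisfied by a constant instead, the output register is
allocated without reference to that input.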

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
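The intended behaviour can be sketched on its own like this.  It is a
simplified model, assuming a hypothetical ArgConstraint type and the
helper names parse_input and output_reuses_input; the flag values match
those in tcg.h, and the parsing and the aliasing test follow the logic
of the hunks in this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define CT_ALIAS   0x80   /* output is tied to an input */
#define CT_IALIAS  0x40   /* input is tied to an output */
#define CT_CONST   0x02
#define CT_REG     0x01

typedef struct {
    unsigned ct;
    int alias_index;
} ArgConstraint;

/* Parse the constraint string of input args[i], e.g. "0", "0i" or "ri". */
static void parse_input(ArgConstraint *args, int i, const char *str)
{
    args[i].ct = 0;
    for (const char *s = str; *s != '\0'; s++) {
        if (*s >= '0' && *s <= '9') {
            int oarg = *s - '0';
            /* The digit must come first: the copy below overwrites ct. */
            assert(s == str);
            assert(args[oarg].ct & CT_REG);
            args[i] = args[oarg];
            args[oarg].ct |= CT_ALIAS;
            args[oarg].alias_index = i;
            args[i].ct |= CT_IALIAS;
            args[i].alias_index = oarg;
        } else if (*s == 'i') {
            args[i].ct |= CT_CONST;
        } else if (*s == 'r') {
            args[i].ct |= CT_REG;
        } else {
            assert(!"unknown constraint letter");
        }
    }
}

/* Does output o take over the register of its aliased input? */
static bool output_reuses_input(const ArgConstraint *args, int o,
                                const bool *is_const)
{
    return (args[o].ct & CT_ALIAS) && !is_const[args[o].alias_index];
}
```

With an input constraint of "0i", the output shares the input's
register when the input lives in a register, and is allocated as an
ordinary output when the input was folded to a constant.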
 tcg/tcg.c | 63 ++++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 32 insertions(+), 31 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8b4dce7..cb898f1 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1256,37 +1256,37 @@ static void process_op_defs(TCGContext *s)
 
             tcg_regset_clear(def->args_ct[i].u.regs);
             def->args_ct[i].ct = 0;
-            if (ct_str[0] >= '0' && ct_str[0] <= '9') {
-                int oarg;
-                oarg = ct_str[0] - '0';
-                tcg_debug_assert(oarg < def->nb_oargs);
-                tcg_debug_assert(def->args_ct[oarg].ct & TCG_CT_REG);
-                /* TCG_CT_ALIAS is for the output arguments. The input
-                   argument is tagged with TCG_CT_IALIAS. */
-                def->args_ct[i] = def->args_ct[oarg];
-                def->args_ct[oarg].ct = TCG_CT_ALIAS;
-                def->args_ct[oarg].alias_index = i;
-                def->args_ct[i].ct |= TCG_CT_IALIAS;
-                def->args_ct[i].alias_index = oarg;
-            } else {
-                for(;;) {
-                    if (*ct_str == '\0')
-                        break;
-                    switch(*ct_str) {
-                    case '&':
-                        def->args_ct[i].ct |= TCG_CT_NEWREG;
-                        ct_str++;
-                        break;
-                    case 'i':
-                        def->args_ct[i].ct |= TCG_CT_CONST;
-                        ct_str++;
-                        break;
-                    default:
-                        ct_str = target_parse_constraint(&def->args_ct[i],
-                                                         ct_str, type);
-                        /* Typo in TCGTargetOpDef constraint. */
-                        tcg_debug_assert(ct_str != NULL);
+            while (*ct_str != '\0') {
+                switch(*ct_str) {
+                case '0' ... '9':
+                    {
+                        int oarg = *ct_str - '0';
+                        tcg_debug_assert(ct_str == tdefs->args_ct_str[i]);
+                        tcg_debug_assert(oarg < def->nb_oargs);
+                        tcg_debug_assert(def->args_ct[oarg].ct & TCG_CT_REG);
+                        /* TCG_CT_ALIAS is for the output arguments.
+                           The input is tagged with TCG_CT_IALIAS. */
+                        def->args_ct[i] = def->args_ct[oarg];
+                        def->args_ct[oarg].ct |= TCG_CT_ALIAS;
+                        def->args_ct[oarg].alias_index = i;
+                        def->args_ct[i].ct |= TCG_CT_IALIAS;
+                        def->args_ct[i].alias_index = oarg;
                     }
+                    ct_str++;
+                    break;
+                case '&':
+                    def->args_ct[i].ct |= TCG_CT_NEWREG;
+                    ct_str++;
+                    break;
+                case 'i':
+                    def->args_ct[i].ct |= TCG_CT_CONST;
+                    ct_str++;
+                    break;
+                default:
+                    ct_str = target_parse_constraint(&def->args_ct[i],
+                                                     ct_str, type);
+                    /* Typo in TCGTargetOpDef constraint. */
+                    tcg_debug_assert(ct_str != NULL);
                 }
             }
         }
@@ -2296,7 +2296,8 @@ static void tcg_reg_alloc_op(TCGContext *s,
             arg = args[i];
             arg_ct = &def->args_ct[i];
             ts = &s->temps[arg];
-            if (arg_ct->ct & TCG_CT_ALIAS) {
+            if ((arg_ct->ct & TCG_CT_ALIAS)
+                && !const_args[arg_ct->alias_index]) {
                 reg = new_args[arg_ct->alias_index];
             } else if (arg_ct->ct & TCG_CT_NEWREG) {
                 reg = tcg_reg_alloc(s, arg_ct->u.regs,
-- 
2.7.4


* [Qemu-devel] [PATCH v4 24/64] tcg: Add clz and ctz opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (22 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 23/64] tcg: Allow an operand to be matching or a constant Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-08 17:44   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 25/64] disas/i386.c: Handle tzcnt Richard Henderson
                   ` (41 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg-runtime.c            |  20 +++++++
 tcg/README               |   8 +++
 tcg/aarch64/tcg-target.h |   4 ++
 tcg/arm/tcg-target.h     |   2 +
 tcg/i386/tcg-target.h    |   4 ++
 tcg/ia64/tcg-target.h    |   4 ++
 tcg/mips/tcg-target.h    |   2 +
 tcg/optimize.c           |  36 ++++++++++++
 tcg/ppc/tcg-target.h     |   4 ++
 tcg/s390/tcg-target.h    |   4 ++
 tcg/sparc/tcg-target.h   |   4 ++
 tcg/tcg-op.c             | 143 +++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op.h             |  16 ++++++
 tcg/tcg-opc.h            |   4 ++
 tcg/tcg-runtime.h        |   5 ++
 tcg/tcg.h                |   2 +
 tcg/tci/tcg-target.h     |   4 ++
 17 files changed, 266 insertions(+)

diff --git a/tcg-runtime.c b/tcg-runtime.c
index 9327b6f..eb3bade 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -101,6 +101,26 @@ int64_t HELPER(mulsh_i64)(int64_t arg1, int64_t arg2)
     return h;
 }
 
+uint32_t HELPER(clz_i32)(uint32_t arg, uint32_t zero_val)
+{
+    return arg ? clz32(arg) : zero_val;
+}
+
+uint32_t HELPER(ctz_i32)(uint32_t arg, uint32_t zero_val)
+{
+    return arg ? ctz32(arg) : zero_val;
+}
+
+uint64_t HELPER(clz_i64)(uint64_t arg, uint64_t zero_val)
+{
+    return arg ? clz64(arg) : zero_val;
+}
+
+uint64_t HELPER(ctz_i64)(uint64_t arg, uint64_t zero_val)
+{
+    return arg ? ctz64(arg) : zero_val;
+}
+
 void HELPER(exit_atomic)(CPUArchState *env)
 {
     cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
diff --git a/tcg/README b/tcg/README
index 065d9c2..f5ccf04 100644
--- a/tcg/README
+++ b/tcg/README
@@ -246,6 +246,14 @@ t0=~(t1|t2)
 
 t0=t1|~t2
 
+* clz_i32/i64 t0, t1, t2
+
+t0 = t1 ? clz(t1) : t2
+
+* ctz_i32/i64 t0, t1, t2
+
+t0 = t1 ? ctz(t1) : t2
+
 ********* Shifts/Rotates
 
 * shl_i32/i64 t0, t1, t2
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 4a74bd8..976f493 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -62,6 +62,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32          1
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     1
@@ -94,6 +96,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i64          1
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     1
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 4e30728..02cc242 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -110,6 +110,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
 #define TCG_TARGET_HAS_extract_i32      use_armv7_instructions
 #define TCG_TARGET_HAS_sextract_i32     use_armv7_instructions
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index dc19c47..f2d9955 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -93,6 +93,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     1
@@ -125,6 +127,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_eqv_i64          0
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 8856dc8..9a829ae 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -140,6 +140,10 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32         1
 #define TCG_TARGET_HAS_nand_i64         1
 #define TCG_TARGET_HAS_nor_i32          1
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_nor_i64          1
 #define TCG_TARGET_HAS_orc_i32          1
 #define TCG_TARGET_HAS_orc_i64          1
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index f1c3137..f133684 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -109,6 +109,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_rem_i32          1
 #define TCG_TARGET_HAS_not_i32          1
 #define TCG_TARGET_HAS_nor_i32          1
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_andc_i32         0
 #define TCG_TARGET_HAS_orc_i32          0
 #define TCG_TARGET_HAS_eqv_i32          0
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 9e26bb7..e7ecce4 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -296,6 +296,18 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     CASE_OP_32_64(nor):
         return ~(x | y);
 
+    case INDEX_op_clz_i32:
+        return (uint32_t)x ? clz32(x) : y;
+
+    case INDEX_op_clz_i64:
+        return x ? clz64(x) : y;
+
+    case INDEX_op_ctz_i32:
+        return (uint32_t)x ? ctz32(x) : y;
+
+    case INDEX_op_ctz_i64:
+        return x ? ctz64(x) : y;
+
     CASE_OP_32_64(ext8s):
         return (int8_t)x;
 
@@ -896,6 +908,16 @@ void tcg_optimize(TCGContext *s)
             mask = temps[args[1]].mask | temps[args[2]].mask;
             break;
 
+        case INDEX_op_clz_i32:
+        case INDEX_op_ctz_i32:
+            mask = temps[args[2]].mask | 31;
+            break;
+
+        case INDEX_op_clz_i64:
+        case INDEX_op_ctz_i64:
+            mask = temps[args[2]].mask | 63;
+            break;
+
         CASE_OP_32_64(setcond):
         case INDEX_op_setcond2_i32:
             mask = 1;
@@ -1052,6 +1074,20 @@ void tcg_optimize(TCGContext *s)
             }
             goto do_default;
 
+        CASE_OP_32_64(clz):
+        CASE_OP_32_64(ctz):
+            if (temp_is_const(args[1])) {
+                TCGArg v = temps[args[1]].val;
+                if (v != 0) {
+                    tmp = do_constant_folding(opc, v, 0);
+                    tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                } else {
+                    tcg_opt_gen_mov(s, op, args, args[0], args[2]);
+                }
+                break;
+            }
+            goto do_default;
+
         CASE_OP_32_64(deposit):
             if (temp_is_const(args[1]) && temp_is_const(args[2])) {
                 tmp = deposit64(temps[args[1]].val, args[3], args[4],
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index b42c57a..698a599 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -68,6 +68,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32          1
 #define TCG_TARGET_HAS_nand_i32         1
 #define TCG_TARGET_HAS_nor_i32          1
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     0
@@ -101,6 +103,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i64          1
 #define TCG_TARGET_HAS_nand_i64         1
 #define TCG_TARGET_HAS_nor_i64          1
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index e9ac12e..3ac2dc9 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -77,6 +77,8 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_eqv_i32        0
 #define TCG_TARGET_HAS_nand_i32       0
 #define TCG_TARGET_HAS_nor_i32        0
+#define TCG_TARGET_HAS_clz_i32        0
+#define TCG_TARGET_HAS_ctz_i32        0
 #define TCG_TARGET_HAS_deposit_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_extract_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_sextract_i32   0
@@ -108,6 +110,8 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_eqv_i64        0
 #define TCG_TARGET_HAS_nand_i64       0
 #define TCG_TARGET_HAS_nor_i64        0
+#define TCG_TARGET_HAS_clz_i64        0
+#define TCG_TARGET_HAS_ctz_i64        0
 #define TCG_TARGET_HAS_deposit_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_extract_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_sextract_i64   0
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index a212167..340837a 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -110,6 +110,8 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_deposit_i32      0
 #define TCG_TARGET_HAS_extract_i32      0
 #define TCG_TARGET_HAS_sextract_i32     0
@@ -142,6 +144,8 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_eqv_i64          0
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_deposit_i64      0
 #define TCG_TARGET_HAS_extract_i64      0
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 1927e53..2b520c1 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -457,6 +457,85 @@ void tcg_gen_orc_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
     }
 }
 
+void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
+{
+    if (TCG_TARGET_HAS_clz_i32) {
+        tcg_gen_op3_i32(INDEX_op_clz_i32, ret, arg1, arg2);
+    } else if (TCG_TARGET_HAS_clz_i64) {
+        TCGv_i64 t1 = tcg_temp_new_i64();
+        TCGv_i64 t2 = tcg_temp_new_i64();
+        tcg_gen_extu_i32_i64(t1, arg1);
+        tcg_gen_extu_i32_i64(t2, arg2);
+        tcg_gen_addi_i64(t2, t2, 32);
+        tcg_gen_clz_i64(t1, t1, t2);
+        tcg_gen_extrl_i64_i32(ret, t1);
+        tcg_temp_free_i64(t1);
+        tcg_temp_free_i64(t2);
+        tcg_gen_subi_i32(ret, ret, 32);
+    } else {
+        gen_helper_clz_i32(ret, arg1, arg2);
+    }
+}
+
+void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
+{
+    TCGv_i32 t = tcg_const_i32(arg2);
+    tcg_gen_clz_i32(ret, arg1, t);
+    tcg_temp_free_i32(t);
+}
+
+void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
+{
+    if (TCG_TARGET_HAS_ctz_i32) {
+        tcg_gen_op3_i32(INDEX_op_ctz_i32, ret, arg1, arg2);
+    } else if (TCG_TARGET_HAS_ctz_i64) {
+        TCGv_i64 t1 = tcg_temp_new_i64();
+        TCGv_i64 t2 = tcg_temp_new_i64();
+        tcg_gen_extu_i32_i64(t1, arg1);
+        tcg_gen_extu_i32_i64(t2, arg2);
+        tcg_gen_ctz_i64(t1, t1, t2);
+        tcg_gen_extrl_i64_i32(ret, t1);
+        tcg_temp_free_i64(t1);
+        tcg_temp_free_i64(t2);
+    } else if (TCG_TARGET_HAS_clz_i32) {
+        TCGv_i32 t1 = tcg_temp_new_i32();
+        TCGv_i32 t2 = tcg_temp_new_i32();
+        tcg_gen_neg_i32(t1, arg1);
+        tcg_gen_xori_i32(t2, arg2, 31);
+        tcg_gen_and_i32(t1, t1, arg1);
+        tcg_gen_clz_i32(ret, t1, t2);
+        tcg_temp_free_i32(t1);
+        tcg_temp_free_i32(t2);
+        tcg_gen_xori_i32(ret, ret, 31);
+    } else if (TCG_TARGET_HAS_clz_i64) {
+        TCGv_i32 t1 = tcg_temp_new_i32();
+        TCGv_i32 t2 = tcg_temp_new_i32();
+        TCGv_i64 x1 = tcg_temp_new_i64();
+        TCGv_i64 x2 = tcg_temp_new_i64();
+        tcg_gen_neg_i32(t1, arg1);
+        tcg_gen_xori_i32(t2, arg2, 63);
+        tcg_gen_and_i32(t1, t1, arg1);
+        tcg_gen_extu_i32_i64(x1, t1);
+        tcg_gen_extu_i32_i64(x2, t2);
+        tcg_temp_free_i32(t1);
+        tcg_temp_free_i32(t2);
+        tcg_gen_clz_i64(x1, x1, x2);
+        tcg_gen_extrl_i64_i32(ret, x1);
+        tcg_temp_free_i64(x1);
+        tcg_temp_free_i64(x2);
+        tcg_gen_xori_i32(ret, ret, 63);
+    } else {
+        gen_helper_ctz_i32(ret, arg1, arg2);
+    }
+}
+
+void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
+{
+    TCGv_i32 t = tcg_const_i32(arg2);
+    tcg_gen_ctz_i32(ret, arg1, t);
+    tcg_temp_free_i32(t);
+}
+
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
 {
     if (TCG_TARGET_HAS_rot_i32) {
@@ -1703,6 +1782,70 @@ void tcg_gen_orc_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
     }
 }
 
+void tcg_gen_clz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
+{
+    if (TCG_TARGET_HAS_clz_i64) {
+        tcg_gen_op3_i64(INDEX_op_clz_i64, ret, arg1, arg2);
+    } else {
+        gen_helper_clz_i64(ret, arg1, arg2);
+    }
+}
+
+void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
+{
+    if (TCG_TARGET_REG_BITS == 32
+        && TCG_TARGET_HAS_clz_i32
+        && arg2 <= 0xffffffffu) {
+        TCGv_i32 t = tcg_const_i32((uint32_t)arg2 - 32);
+        tcg_gen_clz_i32(t, TCGV_LOW(arg1), t);
+        tcg_gen_addi_i32(t, t, 32);
+        tcg_gen_clz_i32(TCGV_LOW(ret), TCGV_HIGH(arg1), t);
+        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+        tcg_temp_free_i32(t);
+    } else {
+        TCGv_i64 t = tcg_const_i64(arg2);
+        tcg_gen_clz_i64(ret, arg1, t);
+        tcg_temp_free_i64(t);
+    }
+}
+
+void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
+{
+    if (TCG_TARGET_HAS_ctz_i64) {
+        tcg_gen_op3_i64(INDEX_op_ctz_i64, ret, arg1, arg2);
+    } else if (TCG_TARGET_HAS_clz_i64) {
+        TCGv_i64 t1 = tcg_temp_new_i64();
+        TCGv_i64 t2 = tcg_temp_new_i64();
+        tcg_gen_neg_i64(t1, arg1);
+        tcg_gen_xori_i64(t2, arg2, 63);
+        tcg_gen_and_i64(t1, t1, arg1);
+        tcg_gen_clz_i64(ret, t1, t2);
+        tcg_temp_free_i64(t1);
+        tcg_temp_free_i64(t2);
+        tcg_gen_xori_i64(ret, ret, 63);
+    } else {
+        gen_helper_ctz_i64(ret, arg1, arg2);
+    }
+}
+
+void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
+{
+    if (TCG_TARGET_REG_BITS == 32
+        && TCG_TARGET_HAS_ctz_i32
+        && arg2 <= 0xffffffffu) {
+        TCGv_i32 t32 = tcg_const_i32((uint32_t)arg2 - 32);
+        tcg_gen_ctz_i32(t32, TCGV_HIGH(arg1), t32);
+        tcg_gen_addi_i32(t32, t32, 32);
+        tcg_gen_ctz_i32(TCGV_LOW(ret), TCGV_LOW(arg1), t32);
+        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+        tcg_temp_free_i32(t32);
+    } else {
+        TCGv_i64 t64 = tcg_const_i64(arg2);
+        tcg_gen_ctz_i64(ret, arg1, t64);
+        tcg_temp_free_i64(t64);
+    }
+}
+
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
 {
     if (TCG_TARGET_HAS_rot_i64) {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index d42fd0d..7a24e84 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -286,6 +286,10 @@ void tcg_gen_eqv_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_nand_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_nor_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_orc_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
+void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
+void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
+void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
+void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
 void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
@@ -469,6 +473,10 @@ void tcg_gen_eqv_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_nand_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_nor_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_orc_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
+void tcg_gen_clz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
+void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
+void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
+void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
 void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
@@ -958,6 +966,10 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_nand_tl tcg_gen_nand_i64
 #define tcg_gen_nor_tl tcg_gen_nor_i64
 #define tcg_gen_orc_tl tcg_gen_orc_i64
+#define tcg_gen_clz_tl tcg_gen_clz_i64
+#define tcg_gen_ctz_tl tcg_gen_ctz_i64
+#define tcg_gen_clzi_tl tcg_gen_clzi_i64
+#define tcg_gen_ctzi_tl tcg_gen_ctzi_i64
 #define tcg_gen_rotl_tl tcg_gen_rotl_i64
 #define tcg_gen_rotli_tl tcg_gen_rotli_i64
 #define tcg_gen_rotr_tl tcg_gen_rotr_i64
@@ -1049,6 +1061,10 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_nand_tl tcg_gen_nand_i32
 #define tcg_gen_nor_tl tcg_gen_nor_i32
 #define tcg_gen_orc_tl tcg_gen_orc_i32
+#define tcg_gen_clz_tl tcg_gen_clz_i32
+#define tcg_gen_ctz_tl tcg_gen_ctz_i32
+#define tcg_gen_clzi_tl tcg_gen_clzi_i32
+#define tcg_gen_ctzi_tl tcg_gen_ctzi_i32
 #define tcg_gen_rotl_tl tcg_gen_rotl_i32
 #define tcg_gen_rotli_tl tcg_gen_rotli_i32
 #define tcg_gen_rotr_tl tcg_gen_rotr_i32
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 11563ac..d00db4f 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -104,6 +104,8 @@ DEF(orc_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_orc_i32))
 DEF(eqv_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_eqv_i32))
 DEF(nand_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_nand_i32))
 DEF(nor_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_nor_i32))
+DEF(clz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_clz_i32))
+DEF(ctz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_ctz_i32))
 
 DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
 DEF(movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
@@ -171,6 +173,8 @@ DEF(orc_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_orc_i64))
 DEF(eqv_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_eqv_i64))
 DEF(nand_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_nand_i64))
 DEF(nor_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_nor_i64))
+DEF(clz_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_clz_i64))
+DEF(ctz_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ctz_i64))
 
 DEF(add2_i64, 2, 4, 0, IMPL64 | IMPL(TCG_TARGET_HAS_add2_i64))
 DEF(sub2_i64, 2, 4, 0, IMPL64 | IMPL(TCG_TARGET_HAS_sub2_i64))
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index 1deb86a..eb1cd76 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -15,6 +15,11 @@ DEF_HELPER_FLAGS_2(sar_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_2(mulsh_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_2(muluh_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
+DEF_HELPER_FLAGS_2(clz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
+DEF_HELPER_FLAGS_2(ctz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
+DEF_HELPER_FLAGS_2(clz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(ctz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
 #ifdef CONFIG_SOFTMMU
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 144bdab..e026282 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -111,6 +111,8 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_eqv_i64          0
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_deposit_i64      0
 #define TCG_TARGET_HAS_extract_i64      0
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 2065042..0646444 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -74,6 +74,8 @@
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_neg_i32          1
 #define TCG_TARGET_HAS_not_i32          1
 #define TCG_TARGET_HAS_orc_i32          0
@@ -104,6 +106,8 @@
 #define TCG_TARGET_HAS_eqv_i64          0
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_neg_i64          1
 #define TCG_TARGET_HAS_not_i64          1
 #define TCG_TARGET_HAS_orc_i64          0
-- 
2.7.4
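
Reader's note on the tcg/tcg-op.c hunk above: the TCG_TARGET_HAS_clz_i32
branch of tcg_gen_ctz_i32 builds ctz out of clz. It isolates the lowest set
bit with (-x & x), counts leading zeros of that, and XORs the result with 31.
The fallback value is XORed with 31 beforehand so that the final XOR restores
it when the input is zero. The following is a standalone sketch of that
identity in plain C, not part of the patch. The names ref_clz32, op_clz_i32,
op_ctz_i32, ctz_via_clz_i32 and check_ctz_via_clz are hypothetical stand-ins
for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for a 32-bit count-leading-zeros; only called with x != 0. */
static uint32_t ref_clz32(uint32_t x)
{
    uint32_t n = 0;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Opcode semantics as documented in tcg/README: t0 = t1 ? clz(t1) : t2. */
static uint32_t op_clz_i32(uint32_t arg, uint32_t zero_val)
{
    return arg ? ref_clz32(arg) : zero_val;
}

/* Reference semantics for ctz: t0 = t1 ? ctz(t1) : t2. */
static uint32_t op_ctz_i32(uint32_t arg, uint32_t zero_val)
{
    uint32_t n = 0;
    if (arg == 0) {
        return zero_val;
    }
    while (!(arg & 1u)) {
        arg >>= 1;
        n++;
    }
    return n;
}

/* ctz expressed through clz, following the same steps as the patch:
   neg, xori 31 on the fallback, and, clz, xori 31 on the result. */
static uint32_t ctz_via_clz_i32(uint32_t arg, uint32_t zero_val)
{
    uint32_t lowest = (0u - arg) & arg;   /* isolate the lowest set bit */
    uint32_t fallback = zero_val ^ 31;    /* pre-flipped; undone below */
    return op_clz_i32(lowest, fallback) ^ 31;
}

/* Compare the two forms for every trailing-zero count and a few fallbacks. */
static void check_ctz_via_clz(void)
{
    static const uint32_t zero_vals[] = { 0, 32, 0xdeadbeefu };
    for (unsigned z = 0; z < 3; z++) {
        uint32_t zv = zero_vals[z];
        assert(ctz_via_clz_i32(0, zv) == op_ctz_i32(0, zv));
        for (unsigned bit = 0; bit < 32; bit++) {
            uint32_t x = 0xffffffffu << bit;
            assert(ctz_via_clz_i32(x, zv) == op_ctz_i32(x, zv));
        }
    }
}
```

The XOR works as a subtraction here because n ^ 31 equals 31 - n for any n in
the range 0..31. The TCG_TARGET_HAS_clz_i64 branch uses the same trick with 63.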


* [Qemu-devel] [PATCH v4 25/64] disas/i386.c: Handle tzcnt
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (23 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 24/64] tcg: Add clz and ctz opcodes Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 26/64] disas/ppc: Handle popcnt and cnttz Richard Henderson
                   ` (40 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 disas/i386.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/disas/i386.c b/disas/i386.c
index 57145d0..07f871f 100644
--- a/disas/i386.c
+++ b/disas/i386.c
@@ -682,6 +682,7 @@ fetch_data(struct disassemble_info *info, bfd_byte *addr)
 #define PREGRP104 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 104 } }
 #define PREGRP105 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 105 } }
 #define PREGRP106 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 106 } }
+#define PREGRP107 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 107 } }
 
 #define X86_64_0  NULL, { { NULL, X86_64_SPECIAL }, { NULL, 0 } }
 #define X86_64_1  NULL, { { NULL, X86_64_SPECIAL }, { NULL, 1 } }
@@ -1247,7 +1248,7 @@ static const struct dis386 dis386_twobyte[] = {
   { "ud2b",		{ XX } },
   { GRP8 },
   { "btcS",		{ Ev, Gv } },
-  { "bsfS",		{ Gv, Ev } },
+  { PREGRP107 },
   { PREGRP36 },
   { "movs{bR|x|bR|x}",	{ Gv, Eb } },
   { "movs{wR|x|wR|x}",	{ Gv, Ew } }, /* yes, there really is movsww ! */
@@ -1431,7 +1432,7 @@ static const unsigned char twobyte_uses_REPZ_prefix[256] = {
   /* 80 */ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 8f */
   /* 90 */ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 9f */
   /* a0 */ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* af */
-  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0, /* bf */
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
   /* c0 */ 0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0, /* cf */
   /* d0 */ 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0, /* df */
   /* e0 */ 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0, /* ef */
@@ -2800,6 +2801,13 @@ static const struct dis386 prefix_user_table[][4] = {
     { "shrxS",	{ Gv, Ev, Bv } },
   },
 
+  /* PREGRP107 */
+  {
+    { "bsfS",	{ Gv, Ev } },
+    { "tzcntS",	{ Gv, Ev } },
+    { "bsfS",	{ Gv, Ev } },
+    { "(bad)",	{ XX } },
+  },
 };
 
 static const struct dis386 x86_64_table[][2] = {
-- 
2.7.4


* [Qemu-devel] [PATCH v4 26/64] disas/ppc: Handle popcnt and cnttz
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (24 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 25/64] disas/i386.c: Handle tzcnt Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 27/64] target-alpha: Use the ctz and clz opcodes Richard Henderson
                   ` (39 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 disas/ppc.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/disas/ppc.c b/disas/ppc.c
index bd05623..ed7e0d0 100644
--- a/disas/ppc.c
+++ b/disas/ppc.c
@@ -1955,6 +1955,9 @@ extract_tbr (unsigned long insn,
 #define POWER4	PPC_OPCODE_POWER4
 #define POWER5	PPC_OPCODE_POWER5
 #define POWER6	PPC_OPCODE_POWER6
+/* Documentation purposes only; we don't actually check the isa for disas.  */
+#define POWER7  PPC_OPCODE_POWER6
+#define POWER9  PPC_OPCODE_POWER6
 #define CELL	PPC_OPCODE_CELL
 #define PPC32   PPC_OPCODE_32 | PPC_OPCODE_PPC
 #define PPC64   PPC_OPCODE_64 | PPC_OPCODE_PPC
@@ -3589,6 +3592,13 @@ const struct powerpc_opcode powerpc_opcodes[] = {
 { "lbzux",   X(31,119),	X_MASK,		COM,		{ RT, RAL, RB } },
 
 { "popcntb", X(31,122), XRB_MASK,	POWER5,		{ RA, RS } },
+{ "popcntw", X(31,378), XRB_MASK,       POWER7,         { RA, RS } },
+{ "popcntd", X(31,506), XRB_MASK,       POWER7,         { RA, RS } },
+
+{ "cnttzw",  XRC(31,538,0), XRB_MASK,   POWER9,         { RA, RS } },
+{ "cnttzw.", XRC(31,538,1), XRB_MASK,   POWER9,         { RA, RS } },
+{ "cnttzd",  XRC(31,570,0), XRB_MASK,   POWER9,         { RA, RS } },
+{ "cnttzd.", XRC(31,570,1), XRB_MASK,   POWER9,         { RA, RS } },
 
 { "not",     XRC(31,124,0), X_MASK,	COM,		{ RA, RS, RBS } },
 { "nor",     XRC(31,124,0), X_MASK,	COM,		{ RA, RS, RB } },
-- 
2.7.4
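
Reader's note on the new powerpc_opcodes[] entries above: the X(31,378) and
XRC(31,538,0) forms pack a 6-bit primary opcode, a 10-bit extended opcode and
an Rc bit into the instruction word. The sketch below is not part of the
patch. It assumes the usual binutils-style definitions of these macros
(primary opcode shifted left by 26, extended opcode shifted left by 1, Rc in
bit 0), reproduced under the hypothetical SKETCH_ prefix. encode_x_form is
likewise a made-up helper that fills in the RS and RA register fields.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed equivalents of the OP/X/XRC macros in disas/ppc.c. */
#define SKETCH_OP(x)            ((((uint32_t)(x)) & 0x3f) << 26)
#define SKETCH_X(op, xop)       (SKETCH_OP(op) | ((((uint32_t)(xop)) & 0x3ff) << 1))
#define SKETCH_XRC(op, xop, rc) (SKETCH_X(op, xop) | ((uint32_t)(rc) & 1))

/* Hypothetical helper: place RS (shift 21) and RA (shift 16) into an X-form word. */
static uint32_t encode_x_form(uint32_t base, unsigned ra, unsigned rs)
{
    return base | ((uint32_t)(rs & 31) << 21) | ((uint32_t)(ra & 31) << 16);
}

/* Check the base words produced for the mnemonics added by the patch. */
static void check_new_ppc_opcodes(void)
{
    assert(SKETCH_X(31, 378) == 0x7C0002F4u);      /* popcntw */
    assert(SKETCH_X(31, 506) == 0x7C0003F4u);      /* popcntd */
    assert(SKETCH_XRC(31, 538, 0) == 0x7C000434u); /* cnttzw  */
    assert(SKETCH_XRC(31, 538, 1) == 0x7C000435u); /* cnttzw. */
    assert(SKETCH_XRC(31, 570, 0) == 0x7C000474u); /* cnttzd  */
    assert(SKETCH_XRC(31, 570, 1) == 0x7C000475u); /* cnttzd. */
}
```

Under these assumptions, "cnttzw r3, r4" would encode as
encode_x_form(SKETCH_XRC(31, 538, 0), 3, 4), that is 0x7C830434.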


* [Qemu-devel] [PATCH v4 27/64] target-alpha: Use the ctz and clz opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (25 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 26/64] disas/ppc: Handle popcnt and cnttz Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 28/64] target-cris: Use clz opcode Richard Henderson
                   ` (38 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-alpha/helper.h     |  2 --
 target-alpha/int_helper.c | 10 ----------
 target-alpha/translate.c  |  4 ++--
 3 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/target-alpha/helper.h b/target-alpha/helper.h
index 004221d..eed3906 100644
--- a/target-alpha/helper.h
+++ b/target-alpha/helper.h
@@ -4,8 +4,6 @@ DEF_HELPER_FLAGS_1(load_pcc, TCG_CALL_NO_RWG_SE, i64, env)
 DEF_HELPER_FLAGS_3(check_overflow, TCG_CALL_NO_WG, void, env, i64, i64)
 
 DEF_HELPER_FLAGS_1(ctpop, TCG_CALL_NO_RWG_SE, i64, i64)
-DEF_HELPER_FLAGS_1(ctlz, TCG_CALL_NO_RWG_SE, i64, i64)
-DEF_HELPER_FLAGS_1(cttz, TCG_CALL_NO_RWG_SE, i64, i64)
 
 DEF_HELPER_FLAGS_2(zap, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(zapnot, TCG_CALL_NO_RWG_SE, i64, i64, i64)
diff --git a/target-alpha/int_helper.c b/target-alpha/int_helper.c
index 19bebfe..3c303bd 100644
--- a/target-alpha/int_helper.c
+++ b/target-alpha/int_helper.c
@@ -29,16 +29,6 @@ uint64_t helper_ctpop(uint64_t arg)
     return ctpop64(arg);
 }
 
-uint64_t helper_ctlz(uint64_t arg)
-{
-    return clz64(arg);
-}
-
-uint64_t helper_cttz(uint64_t arg)
-{
-    return ctz64(arg);
-}
-
 uint64_t helper_zapnot(uint64_t val, uint64_t mskb)
 {
     uint64_t mask;
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 5ac2277..6e2e563 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2555,14 +2555,14 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
             REQUIRE_TB_FLAG(TB_FLAGS_AMASK_CIX);
             REQUIRE_REG_31(ra);
             REQUIRE_NO_LIT;
-            gen_helper_ctlz(vc, vb);
+            tcg_gen_clzi_i64(vc, vb, 64);
             break;
         case 0x33:
             /* CTTZ */
             REQUIRE_TB_FLAG(TB_FLAGS_AMASK_CIX);
             REQUIRE_REG_31(ra);
             REQUIRE_NO_LIT;
-            gen_helper_cttz(vc, vb);
+            tcg_gen_ctzi_i64(vc, vb, 64);
             break;
         case 0x34:
             /* UNPKBW */
-- 
2.7.4
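
Reader's note on tcg_gen_clzi_i64(vc, vb, 64) and tcg_gen_ctzi_i64(vc, vb, 64)
above: on a 32-bit host that has the 32-bit opcode, patch 24 expands these
calls into two chained 32-bit operations, with the result of the first serving
as the zero-input fallback of the second. The sketch below models that
chaining in plain C. It is not part of the patch, and op_clz_i32, op_ctz_i32,
clzi_i64_from_halves, ctzi_i64_from_halves and check_split_counts are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode semantics from tcg/README: t0 = t1 ? clz(t1) : t2. */
static uint32_t op_clz_i32(uint32_t arg, uint32_t zero_val)
{
    uint32_t n = 0;
    if (arg == 0) {
        return zero_val;
    }
    while (!(arg & 0x80000000u)) {
        arg <<= 1;
        n++;
    }
    return n;
}

/* Opcode semantics from tcg/README: t0 = t1 ? ctz(t1) : t2. */
static uint32_t op_ctz_i32(uint32_t arg, uint32_t zero_val)
{
    uint32_t n = 0;
    if (arg == 0) {
        return zero_val;
    }
    while (!(arg & 1u)) {
        arg >>= 1;
        n++;
    }
    return n;
}

/* 64-bit clz with a constant fallback, built from the two 32-bit halves.
   The "- 32" wraps modulo 2^32 and is cancelled by the "+ 32" that follows. */
static uint32_t clzi_i64_from_halves(uint32_t hi, uint32_t lo, uint32_t zero_val)
{
    uint32_t t = zero_val - 32;
    t = op_clz_i32(lo, t);      /* count within the low half, or keep fallback */
    t += 32;                    /* low-half counts sit below 32 leading zeros */
    return op_clz_i32(hi, t);   /* the high half wins whenever it is non-zero */
}

/* Mirror image for ctz: the low half wins, the high half is offset by 32. */
static uint32_t ctzi_i64_from_halves(uint32_t hi, uint32_t lo, uint32_t zero_val)
{
    uint32_t t = zero_val - 32;
    t = op_ctz_i32(hi, t);
    t += 32;
    return op_ctz_i32(lo, t);
}

/* Spot checks, including the all-zero input with the fallback of 64
   that the Alpha CTLZ/CTTZ translation passes. */
static void check_split_counts(void)
{
    assert(clzi_i64_from_halves(0, 0, 64) == 64);
    assert(clzi_i64_from_halves(0, 1, 64) == 63);
    assert(clzi_i64_from_halves(1, 0, 64) == 31);
    assert(ctzi_i64_from_halves(0, 0, 64) == 64);
    assert(ctzi_i64_from_halves(1, 0, 64) == 32);
    assert(ctzi_i64_from_halves(0x80000000u, 0, 64) == 63);
}
```

The arg2 <= 0xffffffffu test in the patch is what makes this safe: the
fallback has to fit in the low 32-bit half, since the high half of the result
is unconditionally set to zero.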


* [Qemu-devel] [PATCH v4 28/64] target-cris: Use clz opcode
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (26 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 27/64] target-alpha: Use the ctz and clz opcodes Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 29/64] target-microblaze: " Richard Henderson
                   ` (37 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-cris/helper.h    | 1 -
 target-cris/op_helper.c | 5 -----
 target-cris/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target-cris/helper.h b/target-cris/helper.h
index ff35956..20d21c4 100644
--- a/target-cris/helper.h
+++ b/target-cris/helper.h
@@ -7,7 +7,6 @@ DEF_HELPER_1(rfn, void, env)
 DEF_HELPER_3(movl_sreg_reg, void, env, i32, i32)
 DEF_HELPER_3(movl_reg_sreg, void, env, i32, i32)
 
-DEF_HELPER_FLAGS_1(lz, TCG_CALL_NO_SE, i32, i32)
 DEF_HELPER_FLAGS_4(btst, TCG_CALL_NO_SE, i32, env, i32, i32, i32)
 
 DEF_HELPER_FLAGS_4(evaluate_flags_muls, TCG_CALL_NO_SE, i32, env, i32, i32, i32)
diff --git a/target-cris/op_helper.c b/target-cris/op_helper.c
index 5043039..e92505c 100644
--- a/target-cris/op_helper.c
+++ b/target-cris/op_helper.c
@@ -230,11 +230,6 @@ void helper_rfn(CPUCRISState *env)
 	env->pregs[PR_CCS] |= M_FLAG_V32;
 }
 
-uint32_t helper_lz(uint32_t t0)
-{
-	return clz32(t0);
-}
-
 uint32_t helper_btst(CPUCRISState *env, uint32_t t0, uint32_t t1, uint32_t ccs)
 {
 	/* FIXME: clean this up.  */
diff --git a/target-cris/translate.c b/target-cris/translate.c
index b910427..0ee05ca 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -767,7 +767,7 @@ static void cris_alu_op_exec(DisasContext *dc, int op,
         t_gen_subx_carry(dc, dst);
         break;
     case CC_OP_LZ:
-        gen_helper_lz(dst, b);
+        tcg_gen_clzi_tl(dst, b, TARGET_LONG_BITS);
         break;
     case CC_OP_MULS:
         tcg_gen_muls2_tl(dst, cpu_PR[PR_MOF], a, b);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [PATCH v4 29/64] target-microblaze: Use clz opcode
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (27 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 28/64] target-cris: Use clz opcode Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 30/64] target-mips: " Richard Henderson
                   ` (36 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-microblaze/helper.h    | 1 -
 target-microblaze/op_helper.c | 5 -----
 target-microblaze/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target-microblaze/helper.h b/target-microblaze/helper.h
index bd13826..71a6c08 100644
--- a/target-microblaze/helper.h
+++ b/target-microblaze/helper.h
@@ -3,7 +3,6 @@ DEF_HELPER_1(debug, void, env)
 DEF_HELPER_FLAGS_3(carry, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
 DEF_HELPER_2(cmp, i32, i32, i32)
 DEF_HELPER_2(cmpu, i32, i32, i32)
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i32, i32)
 
 DEF_HELPER_3(divs, i32, env, i32, i32)
 DEF_HELPER_3(divu, i32, env, i32, i32)
diff --git a/target-microblaze/op_helper.c b/target-microblaze/op_helper.c
index 4a856e6..1e07e21 100644
--- a/target-microblaze/op_helper.c
+++ b/target-microblaze/op_helper.c
@@ -145,11 +145,6 @@ uint32_t helper_cmpu(uint32_t a, uint32_t b)
     return t;
 }
 
-uint32_t helper_clz(uint32_t t0)
-{
-    return clz32(t0);
-}
-
 uint32_t helper_carry(uint32_t a, uint32_t b, uint32_t cf)
 {
     return compute_carry(a, b, cf);
diff --git a/target-microblaze/translate.c b/target-microblaze/translate.c
index de2090a..0bb6095 100644
--- a/target-microblaze/translate.c
+++ b/target-microblaze/translate.c
@@ -768,7 +768,7 @@ static void dec_bit(DisasContext *dc)
                 t_gen_raise_exception(dc, EXCP_HW_EXCP);
             }
             if (dc->cpu->env.pvr.regs[2] & PVR2_USE_PCMP_INSTR) {
-                gen_helper_clz(cpu_R[dc->rd], cpu_R[dc->ra]);
+                tcg_gen_clzi_i32(cpu_R[dc->rd], cpu_R[dc->ra], 32);
             }
             break;
         case 0x1e0:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [PATCH v4 30/64] target-mips: Use clz opcode
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (28 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 29/64] target-microblaze: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 31/64] target-openrisc: Use clz and ctz opcodes Richard Henderson
                   ` (35 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-mips/helper.h    |  7 -------
 target-mips/op_helper.c | 22 ----------------------
 target-mips/translate.c | 23 ++++++++++++++++-------
 3 files changed, 16 insertions(+), 36 deletions(-)

diff --git a/target-mips/helper.h b/target-mips/helper.h
index 666936c..60efa01 100644
--- a/target-mips/helper.h
+++ b/target-mips/helper.h
@@ -20,13 +20,6 @@ DEF_HELPER_4(scd, tl, env, tl, tl, int)
 #endif
 #endif
 
-DEF_HELPER_FLAGS_1(clo, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, tl, tl)
-#ifdef TARGET_MIPS64
-DEF_HELPER_FLAGS_1(dclo, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(dclz, TCG_CALL_NO_RWG_SE, tl, tl)
-#endif
-
 DEF_HELPER_3(muls, tl, env, tl, tl)
 DEF_HELPER_3(mulsu, tl, env, tl, tl)
 DEF_HELPER_3(macc, tl, env, tl, tl)
diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index 7af4c2f..11d781f 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -103,28 +103,6 @@ HELPER_ST(sd, stq, uint64_t)
 #endif
 #undef HELPER_ST
 
-target_ulong helper_clo (target_ulong arg1)
-{
-    return clo32(arg1);
-}
-
-target_ulong helper_clz (target_ulong arg1)
-{
-    return clz32(arg1);
-}
-
-#if defined(TARGET_MIPS64)
-target_ulong helper_dclo (target_ulong arg1)
-{
-    return clo64(arg1);
-}
-
-target_ulong helper_dclz (target_ulong arg1)
-{
-    return clz64(arg1);
-}
-#endif /* TARGET_MIPS64 */
-
 /* 64 bits arithmetic for 32 bits hosts */
 static inline uint64_t get_HILO(CPUMIPSState *env)
 {
diff --git a/target-mips/translate.c b/target-mips/translate.c
index cf79aa4..24d7657 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -3626,29 +3626,38 @@ static void gen_cl (DisasContext *ctx, uint32_t opc,
         /* Treat as NOP. */
         return;
     }
-    t0 = tcg_temp_new();
+    t0 = cpu_gpr[rd];
     gen_load_gpr(t0, rs);
+
     switch (opc) {
     case OPC_CLO:
     case R6_OPC_CLO:
-        gen_helper_clo(cpu_gpr[rd], t0);
+#if defined(TARGET_MIPS64)
+    case OPC_DCLO:
+    case R6_OPC_DCLO:
+#endif
+        tcg_gen_not_tl(t0, t0);
         break;
+    }
+
+    switch (opc) {
+    case OPC_CLO:
+    case R6_OPC_CLO:
     case OPC_CLZ:
     case R6_OPC_CLZ:
-        gen_helper_clz(cpu_gpr[rd], t0);
+        tcg_gen_ext32u_tl(t0, t0);
+        tcg_gen_clzi_tl(t0, t0, TARGET_LONG_BITS);
+        tcg_gen_subi_tl(t0, t0, TARGET_LONG_BITS - 32);
         break;
 #if defined(TARGET_MIPS64)
     case OPC_DCLO:
     case R6_OPC_DCLO:
-        gen_helper_dclo(cpu_gpr[rd], t0);
-        break;
     case OPC_DCLZ:
     case R6_OPC_DCLZ:
-        gen_helper_dclz(cpu_gpr[rd], t0);
+        tcg_gen_clzi_i64(t0, t0, 64);
         break;
 #endif
     }
-    tcg_temp_free(t0);
 }
 
 /* Godson integer instructions */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread
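Editorial note on patch 30/64: the new gen_cl() lowering relies on two identities. CLO is CLZ of the complemented input, and a 32-bit CLZ can be computed on a wider register by zero-extending, counting over the full width, and subtracting the extra bits. The sketch below models this in plain C for a 64-bit target (TARGET_LONG_BITS assumed to be 64). The names clz64_or, mips_clz_model and mips_clo_model are hypothetical and are not part of QEMU. The second argument of clz64_or stands in for the third argument of tcg_gen_clzi_tl, which is the result produced for a zero input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: count leading zeros of a 64-bit value,
 * returning if_zero when x == 0 (models the clzi "default" operand). */
static inline uint64_t clz64_or(uint64_t x, uint64_t if_zero)
{
    uint64_t n = 0;

    if (x == 0) {
        return if_zero;
    }
    while (!(x >> 63)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Models the OPC_CLZ path, assuming a 64-bit target register. */
static inline uint64_t mips_clz_model(uint64_t reg)
{
    uint64_t t0 = (uint32_t)reg;   /* tcg_gen_ext32u_tl(t0, t0) */

    t0 = clz64_or(t0, 64);         /* tcg_gen_clzi_tl(t0, t0, TARGET_LONG_BITS) */
    return t0 - (64 - 32);         /* tcg_gen_subi_tl(t0, t0, TARGET_LONG_BITS - 32) */
}

/* Models the OPC_CLO path: complement first, then share the CLZ path. */
static inline uint64_t mips_clo_model(uint64_t reg)
{
    return mips_clz_model(~reg);   /* tcg_gen_not_tl(t0, t0) */
}

/* Small self-check of the boundary cases. */
static inline void mips_cl_selfcheck(void)
{
    assert(mips_clz_model(0) == 32);
    assert(mips_clo_model(0xffffffffu) == 32);
}
```

On a 32-bit target the subtraction constant is zero and the zero-extension has no effect, so the same sequence reduces to a plain clz.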

* [Qemu-devel] [PATCH v4 31/64] target-openrisc: Use clz and ctz opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (29 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 30/64] target-mips: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 32/64] target-ppc: " Richard Henderson
                   ` (34 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-openrisc/helper.h     |  2 --
 target-openrisc/int_helper.c | 19 -------------------
 target-openrisc/translate.c  |  6 ++++--
 3 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/target-openrisc/helper.h b/target-openrisc/helper.h
index f53fa21..bcc7245 100644
--- a/target-openrisc/helper.h
+++ b/target-openrisc/helper.h
@@ -54,8 +54,6 @@ FOP_CMP(ge)
 #undef FOP_CMP
 
 /* int */
-DEF_HELPER_FLAGS_1(ff1, 0, tl, tl)
-DEF_HELPER_FLAGS_1(fl1, 0, tl, tl)
 DEF_HELPER_FLAGS_3(mul32, 0, i32, env, i32, i32)
 
 /* interrupt */
diff --git a/target-openrisc/int_helper.c b/target-openrisc/int_helper.c
index 4d1f958..ba0fd27 100644
--- a/target-openrisc/int_helper.c
+++ b/target-openrisc/int_helper.c
@@ -24,25 +24,6 @@
 #include "exception.h"
 #include "qemu/host-utils.h"
 
-target_ulong HELPER(ff1)(target_ulong x)
-{
-/*#ifdef TARGET_OPENRISC64
-    return x ? ctz64(x) + 1 : 0;
-#else*/
-    return x ? ctz32(x) + 1 : 0;
-/*#endif*/
-}
-
-target_ulong HELPER(fl1)(target_ulong x)
-{
-/* not used yet, open it when we need or64.  */
-/*#ifdef TARGET_OPENRISC64
-    return 64 - clz64(x);
-#else*/
-    return 32 - clz32(x);
-/*#endif*/
-}
-
 uint32_t HELPER(mul32)(CPUOpenRISCState *env,
                        uint32_t ra, uint32_t rb)
 {
diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
index 229361a..03fa7db 100644
--- a/target-openrisc/translate.c
+++ b/target-openrisc/translate.c
@@ -602,11 +602,13 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
         switch (op1) {
         case 0x00:    /* l.ff1 */
             LOG_DIS("l.ff1 r%d, r%d, r%d\n", rd, ra, rb);
-            gen_helper_ff1(cpu_R[rd], cpu_R[ra]);
+            tcg_gen_ctzi_tl(cpu_R[rd], cpu_R[ra], -1);
+            tcg_gen_addi_tl(cpu_R[rd], cpu_R[rd], 1);
             break;
         case 0x01:    /* l.fl1 */
             LOG_DIS("l.fl1 r%d, r%d, r%d\n", rd, ra, rb);
-            gen_helper_fl1(cpu_R[rd], cpu_R[ra]);
+            tcg_gen_clzi_tl(cpu_R[rd], cpu_R[ra], TARGET_LONG_BITS);
+            tcg_gen_subfi_tl(cpu_R[rd], TARGET_LONG_BITS, cpu_R[rd]);
             break;
 
         default:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread
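Editorial note on patch 31/64: the zero-input result passed to ctzi/clzi is what lets both removed helpers become straight-line code. For l.ff1, a default of -1 followed by +1 gives 0 for a zero input. For l.fl1, a default of 32 followed by a reverse subtract gives 0 as well. The sketch below models both in C, assuming a 32-bit target (TARGET_LONG_BITS == 32). The names ctz32_or, clz32_or, ff1_model and fl1_model are hypothetical and are not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: count trailing zeros, if_zero when x == 0. */
static inline uint32_t ctz32_or(uint32_t x, uint32_t if_zero)
{
    uint32_t n = 0;

    if (x == 0) {
        return if_zero;
    }
    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical reference: count leading zeros, if_zero when x == 0. */
static inline uint32_t clz32_or(uint32_t x, uint32_t if_zero)
{
    uint32_t n = 0;

    if (x == 0) {
        return if_zero;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* l.ff1: 1-based index of the lowest set bit, 0 if none. */
static inline uint32_t ff1_model(uint32_t ra)
{
    uint32_t rd = ctz32_or(ra, (uint32_t)-1);  /* tcg_gen_ctzi_tl(rd, ra, -1) */

    return rd + 1;                             /* tcg_gen_addi_tl: -1 + 1 wraps to 0 */
}

/* l.fl1: 1-based index of the highest set bit, 0 if none. */
static inline uint32_t fl1_model(uint32_t ra)
{
    uint32_t rd = clz32_or(ra, 32);            /* tcg_gen_clzi_tl(rd, ra, TARGET_LONG_BITS) */

    return 32 - rd;                            /* tcg_gen_subfi_tl(rd, TARGET_LONG_BITS, rd) */
}

/* Small self-check of the zero-input cases. */
static inline void or1k_ff_selfcheck(void)
{
    assert(ff1_model(0) == 0);
    assert(fl1_model(0) == 0);
}
```

These results match the removed helpers, `x ? ctz32(x) + 1 : 0` and `32 - clz32(x)`, for the values checked here.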

* [Qemu-devel] [PATCH v4 32/64] target-ppc: Use clz and ctz opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (30 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 31/64] target-openrisc: Use clz and ctz opcodes Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 33/64] target-s390x: Use clz opcode Richard Henderson
                   ` (33 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/helper.h     |  4 ----
 target-ppc/int_helper.c | 20 --------------------
 target-ppc/translate.c  | 20 ++++++++++++++++----
 3 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index da00f0a..1ed1d2c 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -38,16 +38,12 @@ DEF_HELPER_4(divde, i64, env, i64, i64, i32)
 DEF_HELPER_4(divweu, tl, env, tl, tl, i32)
 DEF_HELPER_4(divwe, tl, env, tl, tl, i32)
 
-DEF_HELPER_FLAGS_1(cntlzw, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(cnttzw, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntb, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntw, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(cmpb, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_3(sraw, tl, env, tl, tl)
 #if defined(TARGET_PPC64)
 DEF_HELPER_FLAGS_2(cmpeqb, TCG_CALL_NO_RWG_SE, i32, tl, tl)
-DEF_HELPER_FLAGS_1(cntlzd, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(cnttzd, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(bpermd, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_3(srad, tl, env, tl, tl)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 9ac204a..a6486ce 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -141,16 +141,6 @@ uint64_t helper_divde(CPUPPCState *env, uint64_t rau, uint64_t rbu, uint32_t oe)
 #endif
 
 
-target_ulong helper_cntlzw(target_ulong t)
-{
-    return clz32(t);
-}
-
-target_ulong helper_cnttzw(target_ulong t)
-{
-    return ctz32(t);
-}
-
 #if defined(TARGET_PPC64)
 /* if x = 0xab, returns 0xababababababababa */
 #define pattern(x) (((x) & 0xff) * (~(target_ulong)0 / 0xff))
@@ -174,16 +164,6 @@ uint32_t helper_cmpeqb(target_ulong ra, target_ulong rb)
 #undef haszero
 #undef hasvalue
 
-target_ulong helper_cntlzd(target_ulong t)
-{
-    return clz64(t);
-}
-
-target_ulong helper_cnttzd(target_ulong t)
-{
-    return ctz64(t);
-}
-
 /* Return invalid random number.
  *
  * FIXME: Add rng backend or other mechanism to get cryptographically suitable
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 435c6f0..1224f56 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1641,7 +1641,13 @@ static void gen_andis_(DisasContext *ctx)
 /* cntlzw */
 static void gen_cntlzw(DisasContext *ctx)
 {
-    gen_helper_cntlzw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+    TCGv_i32 t = tcg_temp_new_i32();
+
+    tcg_gen_trunc_tl_i32(t, cpu_gpr[rS(ctx->opcode)]);
+    tcg_gen_clzi_i32(t, t, 32);
+    tcg_gen_extu_i32_tl(cpu_gpr[rA(ctx->opcode)], t);
+    tcg_temp_free_i32(t);
+
     if (unlikely(Rc(ctx->opcode) != 0))
         gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
 }
@@ -1649,7 +1655,13 @@ static void gen_cntlzw(DisasContext *ctx)
 /* cnttzw */
 static void gen_cnttzw(DisasContext *ctx)
 {
-    gen_helper_cnttzw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+    TCGv_i32 t = tcg_temp_new_i32();
+
+    tcg_gen_trunc_tl_i32(t, cpu_gpr[rS(ctx->opcode)]);
+    tcg_gen_ctzi_i32(t, t, 32);
+    tcg_gen_extu_i32_tl(cpu_gpr[rA(ctx->opcode)], t);
+    tcg_temp_free_i32(t);
+
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
     }
@@ -1891,7 +1903,7 @@ GEN_LOGICAL1(extsw, tcg_gen_ext32s_tl, 0x1E, PPC_64B);
 /* cntlzd */
 static void gen_cntlzd(DisasContext *ctx)
 {
-    gen_helper_cntlzd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+    tcg_gen_clzi_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)], 64);
     if (unlikely(Rc(ctx->opcode) != 0))
         gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
 }
@@ -1899,7 +1911,7 @@ static void gen_cntlzd(DisasContext *ctx)
 /* cnttzd */
 static void gen_cnttzd(DisasContext *ctx)
 {
-    gen_helper_cnttzd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+    tcg_gen_ctzi_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)], 64);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [PATCH v4 33/64] target-s390x: Use clz opcode
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (31 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 32/64] target-ppc: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 34/64] target-tilegx: Use clz and ctz opcodes Richard Henderson
                   ` (32 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-s390x/helper.h     | 1 -
 target-s390x/int_helper.c | 6 ------
 target-s390x/translate.c  | 2 +-
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/target-s390x/helper.h b/target-s390x/helper.h
index 207a6e7..9102071 100644
--- a/target-s390x/helper.h
+++ b/target-s390x/helper.h
@@ -70,7 +70,6 @@ DEF_HELPER_FLAGS_4(msdb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_3(tceb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_3(tcdb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_4(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64, i64)
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i64, env, i64, i64)
diff --git a/target-s390x/int_helper.c b/target-s390x/int_helper.c
index 370c94d..5bc470b 100644
--- a/target-s390x/int_helper.c
+++ b/target-s390x/int_helper.c
@@ -117,12 +117,6 @@ uint64_t HELPER(divu64)(CPUS390XState *env, uint64_t ah, uint64_t al,
     return ret;
 }
 
-/* count leading zeros, for find leftmost one */
-uint64_t HELPER(clz)(uint64_t v)
-{
-    return clz64(v);
-}
-
 uint64_t HELPER(cvd)(int32_t reg)
 {
     /* positive 0 */
diff --git a/target-s390x/translate.c b/target-s390x/translate.c
index 6cebb7e..01c6217 100644
--- a/target-s390x/translate.c
+++ b/target-s390x/translate.c
@@ -2249,7 +2249,7 @@ static ExitStatus op_flogr(DisasContext *s, DisasOps *o)
     gen_op_update1_cc_i64(s, CC_OP_FLOGR, o->in2);
 
     /* R1 = IN ? CLZ(IN) : 64.  */
-    gen_helper_clz(o->out, o->in2);
+    tcg_gen_clzi_i64(o->out, o->in2, 64);
 
     /* R1+1 = IN & ~(found bit).  Note that we may attempt to shift this
        value by 64, which is undefined.  But since the shift is 64 iff the
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread
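Editorial note on patch 33/64: the comment in op_flogr states the contract, "R1 = IN ? CLZ(IN) : 64", and the immediate 64 passed to tcg_gen_clzi_i64 supplies the zero-input case that the removed helper obtained from clz64(). A minimal C model of just that first result follows. The name flogr_r1_model is hypothetical, and the second FLOGR result (R1+1) is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Models R1 of FLOGR: leading-zero count of a 64-bit input,
 * with 64 as the result when no bit is set. */
static inline uint64_t flogr_r1_model(uint64_t in)
{
    uint64_t n = 0;

    if (in == 0) {
        return 64;              /* the immediate in tcg_gen_clzi_i64(out, in2, 64) */
    }
    while (!(in >> 63)) {
        in <<= 1;
        n++;
    }
    return n;
}

/* Small self-check of the two extremes. */
static inline void flogr_selfcheck(void)
{
    assert(flogr_r1_model(0) == 64);
    assert(flogr_r1_model(1ULL << 63) == 0);
}
```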

* [Qemu-devel] [PATCH v4 34/64] target-tilegx: Use clz and ctz opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (32 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 33/64] target-s390x: Use clz opcode Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 35/64] target-tricore: Use clz opcode Richard Henderson
                   ` (31 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-tilegx/helper.c    | 10 ----------
 target-tilegx/helper.h    |  2 --
 target-tilegx/translate.c |  4 ++--
 3 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/target-tilegx/helper.c b/target-tilegx/helper.c
index b4fba9c..b6f5e29 100644
--- a/target-tilegx/helper.c
+++ b/target-tilegx/helper.c
@@ -55,16 +55,6 @@ void helper_ext01_ics(CPUTLGState *env)
     }
 }
 
-uint64_t helper_cntlz(uint64_t arg)
-{
-    return clz64(arg);
-}
-
-uint64_t helper_cnttz(uint64_t arg)
-{
-    return ctz64(arg);
-}
-
 uint64_t helper_pcnt(uint64_t arg)
 {
     return ctpop64(arg);
diff --git a/target-tilegx/helper.h b/target-tilegx/helper.h
index 9281d0f..bab303a 100644
--- a/target-tilegx/helper.h
+++ b/target-tilegx/helper.h
@@ -1,7 +1,5 @@
 DEF_HELPER_2(exception, noreturn, env, i32)
 DEF_HELPER_1(ext01_ics, void, env)
-DEF_HELPER_FLAGS_1(cntlz, TCG_CALL_NO_RWG_SE, i64, i64)
-DEF_HELPER_FLAGS_1(cnttz, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(pcnt, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(revbits, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_3(shufflebytes, TCG_CALL_NO_RWG_SE, i64, i64, i64, i64)
diff --git a/target-tilegx/translate.c b/target-tilegx/translate.c
index 9c734ee..8a2df1b 100644
--- a/target-tilegx/translate.c
+++ b/target-tilegx/translate.c
@@ -608,12 +608,12 @@ static TileExcp gen_rr_opcode(DisasContext *dc, unsigned opext,
     switch (opext) {
     case OE_RR_X0(CNTLZ):
     case OE_RR_Y0(CNTLZ):
-        gen_helper_cntlz(tdest, tsrca);
+        tcg_gen_clzi_tl(tdest, tsrca, TARGET_LONG_BITS);
         mnemonic = "cntlz";
         break;
     case OE_RR_X0(CNTTZ):
     case OE_RR_Y0(CNTTZ):
-        gen_helper_cnttz(tdest, tsrca);
+        tcg_gen_ctzi_tl(tdest, tsrca, TARGET_LONG_BITS);
         mnemonic = "cnttz";
         break;
     case OE_RR_X0(FSINGLE_PACK1):
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [PATCH v4 35/64] target-tricore: Use clz opcode
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (33 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 34/64] target-tilegx: Use clz and ctz opcodes Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 14:58   ` Bastian Koppelmann
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 36/64] target-unicore32: " Richard Henderson
                   ` (30 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-tricore/helper.h    |  2 --
 target-tricore/op_helper.c | 10 ----------
 target-tricore/translate.c |  5 +++--
 3 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/target-tricore/helper.h b/target-tricore/helper.h
index 9333e16..2cf04e1 100644
--- a/target-tricore/helper.h
+++ b/target-tricore/helper.h
@@ -87,9 +87,7 @@ DEF_HELPER_FLAGS_2(min_hu, TCG_CALL_NO_RWG_SE, i32, i32, i32)
 DEF_HELPER_FLAGS_2(ixmin, TCG_CALL_NO_RWG_SE, i64, i64, i32)
 DEF_HELPER_FLAGS_2(ixmin_u, TCG_CALL_NO_RWG_SE, i64, i64, i32)
 /* count leading ... */
-DEF_HELPER_FLAGS_1(clo, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(clo_h, TCG_CALL_NO_RWG_SE, i32, i32)
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(clz_h, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(cls, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(cls_h, TCG_CALL_NO_RWG_SE, i32, i32)
diff --git a/target-tricore/op_helper.c b/target-tricore/op_helper.c
index ac02e0a..3731d5e 100644
--- a/target-tricore/op_helper.c
+++ b/target-tricore/op_helper.c
@@ -1733,11 +1733,6 @@ EXTREMA_H_B(min, <)
 
 #undef EXTREMA_H_B
 
-uint32_t helper_clo(target_ulong r1)
-{
-    return clo32(r1);
-}
-
 uint32_t helper_clo_h(target_ulong r1)
 {
     uint32_t ret_hw0 = extract32(r1, 0, 16);
@@ -1756,11 +1751,6 @@ uint32_t helper_clo_h(target_ulong r1)
     return ret_hw0 | (ret_hw1 << 16);
 }
 
-uint32_t helper_clz(target_ulong r1)
-{
-    return clz32(r1);
-}
-
 uint32_t helper_clz_h(target_ulong r1)
 {
     uint32_t ret_hw0 = extract32(r1, 0, 16);
diff --git a/target-tricore/translate.c b/target-tricore/translate.c
index 36f734a..69cdfb9 100644
--- a/target-tricore/translate.c
+++ b/target-tricore/translate.c
@@ -6367,7 +6367,8 @@ static void decode_rr_logical_shift(CPUTriCoreState *env, DisasContext *ctx)
         tcg_gen_andc_tl(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
         break;
     case OPC2_32_RR_CLO:
-        gen_helper_clo(cpu_gpr_d[r3], cpu_gpr_d[r1]);
+        tcg_gen_not_tl(cpu_gpr_d[r3], cpu_gpr_d[r1]);
+        tcg_gen_clzi_tl(cpu_gpr_d[r3], cpu_gpr_d[r3], TARGET_LONG_BITS);
         break;
     case OPC2_32_RR_CLO_H:
         gen_helper_clo_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
@@ -6379,7 +6380,7 @@ static void decode_rr_logical_shift(CPUTriCoreState *env, DisasContext *ctx)
         gen_helper_cls_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
         break;
     case OPC2_32_RR_CLZ:
-        gen_helper_clz(cpu_gpr_d[r3], cpu_gpr_d[r1]);
+        tcg_gen_clzi_tl(cpu_gpr_d[r3], cpu_gpr_d[r1], TARGET_LONG_BITS);
         break;
     case OPC2_32_RR_CLZ_H:
         gen_helper_clz_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [PATCH v4 36/64] target-unicore32: Use clz opcode
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (34 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 35/64] target-tricore: Use clz opcode Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 37/64] target-xtensa: " Richard Henderson
                   ` (29 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-unicore32/helper.c    | 10 ----------
 target-unicore32/helper.h    |  3 ---
 target-unicore32/translate.c |  6 +++---
 3 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/target-unicore32/helper.c b/target-unicore32/helper.c
index d603bde..7a5613e 100644
--- a/target-unicore32/helper.c
+++ b/target-unicore32/helper.c
@@ -32,16 +32,6 @@ UniCore32CPU *uc32_cpu_init(const char *cpu_model)
     return UNICORE32_CPU(cpu_generic_init(TYPE_UNICORE32_CPU, cpu_model));
 }
 
-uint32_t HELPER(clo)(uint32_t x)
-{
-    return clo32(x);
-}
-
-uint32_t HELPER(clz)(uint32_t x)
-{
-    return clz32(x);
-}
-
 #ifndef CONFIG_USER_ONLY
 void helper_cp0_set(CPUUniCore32State *env, uint32_t val, uint32_t creg,
         uint32_t cop)
diff --git a/target-unicore32/helper.h b/target-unicore32/helper.h
index 9418137..a4a5d45 100644
--- a/target-unicore32/helper.h
+++ b/target-unicore32/helper.h
@@ -13,9 +13,6 @@ DEF_HELPER_3(cp0_get, i32, env, i32, i32)
 DEF_HELPER_1(cp1_putc, void, i32)
 #endif
 
-DEF_HELPER_1(clz, i32, i32)
-DEF_HELPER_1(clo, i32, i32)
-
 DEF_HELPER_2(exception, void, env, i32)
 
 DEF_HELPER_3(asr_write, void, env, i32, i32)
diff --git a/target-unicore32/translate.c b/target-unicore32/translate.c
index 514d460..666a201 100644
--- a/target-unicore32/translate.c
+++ b/target-unicore32/translate.c
@@ -1479,10 +1479,10 @@ static void do_misc(CPUUniCore32State *env, DisasContext *s, uint32_t insn)
         /* clz */
         tmp = load_reg(s, UCOP_REG_M);
         if (UCOP_SET(26)) {
-            gen_helper_clo(tmp, tmp);
-        } else {
-            gen_helper_clz(tmp, tmp);
+            /* clo */
+            tcg_gen_not_i32(tmp, tmp);
         }
+        tcg_gen_clzi_i32(tmp, tmp, 32);
         store_reg(s, UCOP_REG_D, tmp);
         return;
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [Qemu-devel] [PATCH v4 37/64] target-xtensa: Use clz opcode
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (35 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 36/64] target-unicore32: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 38/64] target-arm: " Richard Henderson
                   ` (28 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-xtensa/helper.h    |  2 --
 target-xtensa/op_helper.c | 13 -------------
 target-xtensa/translate.c | 13 +++++++++++--
 3 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/target-xtensa/helper.h b/target-xtensa/helper.h
index 5ea9c5b..0c8adae 100644
--- a/target-xtensa/helper.h
+++ b/target-xtensa/helper.h
@@ -3,8 +3,6 @@ DEF_HELPER_3(exception_cause, noreturn, env, i32, i32)
 DEF_HELPER_4(exception_cause_vaddr, noreturn, env, i32, i32, i32)
 DEF_HELPER_3(debug_exception, noreturn, env, i32, i32)
 
-DEF_HELPER_FLAGS_1(nsa, TCG_CALL_NO_RWG_SE, i32, i32)
-DEF_HELPER_FLAGS_1(nsau, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_2(wsr_windowbase, void, env, i32)
 DEF_HELPER_4(entry, void, env, i32, i32, i32)
 DEF_HELPER_2(retw, i32, env, i32)
diff --git a/target-xtensa/op_helper.c b/target-xtensa/op_helper.c
index 0a4b214..dc25625 100644
--- a/target-xtensa/op_helper.c
+++ b/target-xtensa/op_helper.c
@@ -161,19 +161,6 @@ void HELPER(debug_exception)(CPUXtensaState *env, uint32_t pc, uint32_t cause)
     HELPER(exception)(env, EXC_DEBUG);
 }
 
-uint32_t HELPER(nsa)(uint32_t v)
-{
-    if (v & 0x80000000) {
-        v = ~v;
-    }
-    return v ? clz32(v) - 1 : 31;
-}
-
-uint32_t HELPER(nsau)(uint32_t v)
-{
-    return v ? clz32(v) : 32;
-}
-
 static void copy_window_from_phys(CPUXtensaState *env,
         uint32_t window, uint32_t phys, uint32_t n)
 {
diff --git a/target-xtensa/translate.c b/target-xtensa/translate.c
index 0858c29..5c719a4 100644
--- a/target-xtensa/translate.c
+++ b/target-xtensa/translate.c
@@ -1372,14 +1372,23 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
                 case 14: /*NSAu*/
                     HAS_OPTION(XTENSA_OPTION_MISC_OP_NSA);
                     if (gen_window_check2(dc, RRR_S, RRR_T)) {
-                        gen_helper_nsa(cpu_R[RRR_T], cpu_R[RRR_S]);
+                        TCGv_i32 t0 = tcg_temp_new_i32();
+
+                        /* if (v & 0x80000000) v = ~v; */
+                        tcg_gen_sari_i32(t0, cpu_R[RRR_S], 31);
+                        tcg_gen_xor_i32(t0, t0, cpu_R[RRR_S]);
+
+                        /* r = (v ? clz(v) : 32) - 1; */
+                        tcg_gen_clzi_i32(t0, t0, 32);
+                        tcg_gen_subi_i32(cpu_R[RRR_T], t0, 1);
+                        tcg_temp_free_i32(t0);
                     }
                     break;
 
                 case 15: /*NSAUu*/
                     HAS_OPTION(XTENSA_OPTION_MISC_OP_NSA);
                     if (gen_window_check2(dc, RRR_S, RRR_T)) {
-                        gen_helper_nsau(cpu_R[RRR_T], cpu_R[RRR_S]);
+                        tcg_gen_clzi_i32(cpu_R[RRR_T], cpu_R[RRR_S], 32);
                     }
                     break;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 102+ messages in thread
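Editorial note on patch 37/64: the NSA lowering replaces the helper's conditional with a branch-free sequence. XOR with the sign mask (the input arithmetically shifted right by 31) inverts the value only when bit 31 is set, and clz with a zero-input result of 32, minus 1, reproduces the helper's `v ? clz32(v) - 1 : 31`. The C sketch below places the old and new forms side by side for comparison. The names clz32_or, nsa_old and nsa_new are hypothetical. The sign mask is built with a conditional rather than a signed shift because right-shifting a negative value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: count leading zeros, if_zero when x == 0. */
static inline uint32_t clz32_or(uint32_t x, uint32_t if_zero)
{
    uint32_t n = 0;

    if (x == 0) {
        return if_zero;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Transcription of the removed HELPER(nsa). */
static inline uint32_t nsa_old(uint32_t v)
{
    if (v & 0x80000000u) {
        v = ~v;
    }
    return v ? clz32_or(v, 32) - 1 : 31;
}

/* Model of the new TCG sequence. */
static inline uint32_t nsa_new(uint32_t v)
{
    /* tcg_gen_sari_i32(t0, v, 31): all ones if bit 31 is set, else zero */
    uint32_t t0 = (v & 0x80000000u) ? 0xffffffffu : 0;

    t0 ^= v;                    /* tcg_gen_xor_i32: conditional invert */
    t0 = clz32_or(t0, 32);      /* tcg_gen_clzi_i32(t0, t0, 32) */
    return t0 - 1;              /* tcg_gen_subi_i32(..., t0, 1) */
}

/* Small self-check that both forms agree on the edge cases. */
static inline void nsa_selfcheck(void)
{
    assert(nsa_old(0) == nsa_new(0));
    assert(nsa_old(0xffffffffu) == nsa_new(0xffffffffu));
}
```

NSAU needs no such wrapper: it is clz with a zero-input result of 32, which is what the single tcg_gen_clzi_i32 call in the patch expresses.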

* [Qemu-devel] [PATCH v4 38/64] target-arm: Use clz opcode
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (36 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 37/64] target-xtensa: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-08 17:47   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 39/64] target-i386: Use clz and ctz opcodes Richard Henderson
                   ` (27 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/helper-a64.c    | 10 ----------
 target-arm/helper-a64.h    |  2 --
 target-arm/helper.c        |  5 -----
 target-arm/helper.h        |  1 -
 target-arm/translate-a64.c |  8 ++++----
 target-arm/translate.c     |  6 +++---
 6 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 98b97df..77999ff 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -54,11 +54,6 @@ int64_t HELPER(sdiv64)(int64_t num, int64_t den)
     return num / den;
 }
 
-uint64_t HELPER(clz64)(uint64_t x)
-{
-    return clz64(x);
-}
-
 uint64_t HELPER(cls64)(uint64_t x)
 {
     return clrsb64(x);
@@ -69,11 +64,6 @@ uint32_t HELPER(cls32)(uint32_t x)
     return clrsb32(x);
 }
 
-uint32_t HELPER(clz32)(uint32_t x)
-{
-    return clz32(x);
-}
-
 uint64_t HELPER(rbit64)(uint64_t x)
 {
     return revbit64(x);
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index dd32000..d320f96 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -18,10 +18,8 @@
  */
 DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
-DEF_HELPER_FLAGS_1(clz64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(cls64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(cls32, TCG_CALL_NO_RWG_SE, i32, i32)
-DEF_HELPER_FLAGS_1(clz32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
diff --git a/target-arm/helper.c b/target-arm/helper.c
index b5b65ca..0cafdbc 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -5718,11 +5718,6 @@ uint32_t HELPER(uxtb16)(uint32_t x)
     return res;
 }
 
-uint32_t HELPER(clz)(uint32_t x)
-{
-    return clz32(x);
-}
-
 int32_t HELPER(sdiv)(int32_t num, int32_t den)
 {
     if (den == 0)
diff --git a/target-arm/helper.h b/target-arm/helper.h
index 84aa637..df86bf7 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -1,4 +1,3 @@
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(sxtb16, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(uxtb16, TCG_CALL_NO_RWG_SE, i32, i32)
 
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index e90487b..12621ff 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -3953,11 +3953,11 @@ static void handle_clz(DisasContext *s, unsigned int sf,
     tcg_rn = cpu_reg(s, rn);
 
     if (sf) {
-        gen_helper_clz64(tcg_rd, tcg_rn);
+        tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64);
     } else {
         TCGv_i32 tcg_tmp32 = tcg_temp_new_i32();
         tcg_gen_extrl_i64_i32(tcg_tmp32, tcg_rn);
-        gen_helper_clz(tcg_tmp32, tcg_tmp32);
+        tcg_gen_clzi_i32(tcg_tmp32, tcg_tmp32, 32);
         tcg_gen_extu_i32_i64(tcg_rd, tcg_tmp32);
         tcg_temp_free_i32(tcg_tmp32);
     }
@@ -7590,7 +7590,7 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
     switch (opcode) {
     case 0x4: /* CLS, CLZ */
         if (u) {
-            gen_helper_clz64(tcg_rd, tcg_rn);
+            tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64);
         } else {
             gen_helper_cls64(tcg_rd, tcg_rn);
         }
@@ -10260,7 +10260,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
                     goto do_cmop;
                 case 0x4: /* CLS */
                     if (u) {
-                        gen_helper_clz32(tcg_res, tcg_op);
+                        tcg_gen_clzi_i32(tcg_res, tcg_op, 32);
                     } else {
                         gen_helper_cls32(tcg_res, tcg_op);
                     }
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 08da9ac..c9186b6 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7037,7 +7037,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             switch (size) {
                             case 0: gen_helper_neon_clz_u8(tmp, tmp); break;
                             case 1: gen_helper_neon_clz_u16(tmp, tmp); break;
-                            case 2: gen_helper_clz(tmp, tmp); break;
+                            case 2: tcg_gen_clzi_i32(tmp, tmp, 32); break;
                             default: abort();
                             }
                             break;
@@ -8219,7 +8219,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                 ARCH(5);
                 rd = (insn >> 12) & 0xf;
                 tmp = load_reg(s, rm);
-                gen_helper_clz(tmp, tmp);
+                tcg_gen_clzi_i32(tmp, tmp, 32);
                 store_reg(s, rd, tmp);
             } else {
                 goto illegal_op;
@@ -9992,7 +9992,7 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
                     tcg_temp_free_i32(tmp2);
                     break;
                 case 0x18: /* clz */
-                    gen_helper_clz(tmp, tmp);
+                    tcg_gen_clzi_i32(tmp, tmp, 32);
                     break;
                 case 0x20:
                 case 0x21:
-- 
2.7.4

* [Qemu-devel] [PATCH v4 39/64] target-i386: Use clz and ctz opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (37 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 38/64] target-arm: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 40/64] tcg/ppc: Handle ctz and clz opcodes Richard Henderson
                   ` (26 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
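Reviewer aid, not part of the patch: a minimal reference model of how
the translator changes below map the four x86 bit-scan instructions
onto clz/ctz operations that take an explicit "value for zero input".
It assumes TARGET_LONG_BITS == 64 and an operand already zero-extended
to 64 bits; every ref_* name is hypothetical and exists only for this
sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model, not QEMU code: clz/ctz with an explicit value to
   return for a zero input, like the three-operand opcodes.  */
static uint64_t ref_clz64(uint64_t x, uint64_t zero_val)
{
    uint64_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x >> 63)) {
        x <<= 1;
        n++;
    }
    return n;
}

static uint64_t ref_ctz64(uint64_t x, uint64_t zero_val)
{
    uint64_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* lzcnt: count in 64 bits, then drop the zeros above the operand.  */
static uint64_t ref_lzcnt(uint64_t x, int size)
{
    assert(size == 16 || size == 32 || size == 64);
    return ref_clz64(x, 64) - (64 - size);
}

/* tzcnt: a zero input yields the operand size directly.  */
static uint64_t ref_tzcnt(uint64_t x, int size)
{
    return ref_ctz64(x, size);
}

/* bsr: the xor with 63 turns a leading-zero count into a bit index.
   Pre-xoring the old destination makes the two xors cancel when the
   input is zero, so the destination is left unchanged.  */
static uint64_t ref_bsr(uint64_t x, uint64_t old_dest)
{
    return ref_clz64(x, old_dest ^ 63) ^ 63;
}

/* bsf: the old destination is simply the value for a zero input.  */
static uint64_t ref_bsf(uint64_t x, uint64_t old_dest)
{
    return ref_ctz64(x, old_dest);
}
```

In this model ref_bsr(0, v) returns v for any v, which is the
"destination unchanged" behaviour that the removed movcond sequence
used to provide.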
 target-i386/helper.h     |  2 --
 target-i386/int_helper.c | 11 -----------
 target-i386/translate.c  | 31 ++++++++++++++-----------------
 3 files changed, 14 insertions(+), 30 deletions(-)

diff --git a/target-i386/helper.h b/target-i386/helper.h
index 4e859eb..1e76b09 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -201,8 +201,6 @@ DEF_HELPER_FLAGS_3(xsetbv, TCG_CALL_NO_WG, void, env, i32, i64)
 DEF_HELPER_FLAGS_2(rdpkru, TCG_CALL_NO_WG, i64, env, i32)
 DEF_HELPER_FLAGS_3(wrpkru, TCG_CALL_NO_WG, void, env, i32, i64)
 
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(ctz, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(pdep, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(pext, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 
diff --git a/target-i386/int_helper.c b/target-i386/int_helper.c
index 9e873ac..4dc5c65 100644
--- a/target-i386/int_helper.c
+++ b/target-i386/int_helper.c
@@ -417,17 +417,6 @@ void helper_idivq_EAX(CPUX86State *env, target_ulong t0)
 # define clztl  clz64
 #endif
 
-/* bit operations */
-target_ulong helper_ctz(target_ulong t0)
-{
-    return ctztl(t0);
-}
-
-target_ulong helper_clz(target_ulong t0)
-{
-    return clztl(t0);
-}
-
 target_ulong helper_pdep(target_ulong src, target_ulong mask)
 {
     target_ulong dest = 0;
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 4d6d36f..0eac334 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -6792,21 +6792,18 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 ? s->cpuid_ext3_features & CPUID_EXT3_ABM
                 : s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI1)) {
             int size = 8 << ot;
+            /* For lzcnt/tzcnt, the C bit is defined in terms of the input.  */
             tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
             if (b & 1) {
                 /* For lzcnt, reduce the target_ulong result by the
                    number of zeros that we expect to find at the top.  */
-                gen_helper_clz(cpu_T0, cpu_T0);
+                tcg_gen_clzi_tl(cpu_T0, cpu_T0, TARGET_LONG_BITS);
                 tcg_gen_subi_tl(cpu_T0, cpu_T0, TARGET_LONG_BITS - size);
             } else {
-                /* For tzcnt, a zero input must return the operand size:
-                   force all bits outside the operand size to 1.  */
-                target_ulong mask = (target_ulong)-2 << (size - 1);
-                tcg_gen_ori_tl(cpu_T0, cpu_T0, mask);
-                gen_helper_ctz(cpu_T0, cpu_T0);
-            }
-            /* For lzcnt/tzcnt, C and Z bits are defined and are
-               related to the result.  */
+                /* For tzcnt, a zero input must return the operand size.  */
+                tcg_gen_ctzi_tl(cpu_T0, cpu_T0, size);
+            }
+            /* For lzcnt/tzcnt, the Z bit is defined in terms of the result.  */
             gen_op_update1_cc();
             set_cc_op(s, CC_OP_BMILGB + ot);
         } else {
@@ -6814,20 +6811,20 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                to the input and not the result.  */
             tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
             set_cc_op(s, CC_OP_LOGICB + ot);
+
+            /* ??? The manual says that the output is undefined when the
+               input is zero, but real hardware leaves it unchanged, and
+               real programs appear to depend on that.  Accomplish this
+               by passing the output as the value to return upon zero.  */
             if (b & 1) {
                 /* For bsr, return the bit index of the first 1 bit,
                    not the count of leading zeros.  */
-                gen_helper_clz(cpu_T0, cpu_T0);
+                tcg_gen_xori_tl(cpu_T1, cpu_regs[reg], TARGET_LONG_BITS - 1);
+                tcg_gen_clz_tl(cpu_T0, cpu_T0, cpu_T1);
                 tcg_gen_xori_tl(cpu_T0, cpu_T0, TARGET_LONG_BITS - 1);
             } else {
-                gen_helper_ctz(cpu_T0, cpu_T0);
+                tcg_gen_ctz_tl(cpu_T0, cpu_T0, cpu_regs[reg]);
             }
-            /* ??? The manual says that the output is undefined when the
-               input is zero, but real hardware leaves it unchanged, and
-               real programs appear to depend on that.  */
-            tcg_gen_movi_tl(cpu_tmp0, 0);
-            tcg_gen_movcond_tl(TCG_COND_EQ, cpu_T0, cpu_cc_dst, cpu_tmp0,
-                               cpu_regs[reg], cpu_T0);
         }
         gen_op_mov_reg_v(ot, reg, cpu_T0);
         break;
-- 
2.7.4

* [Qemu-devel] [PATCH v4 40/64] tcg/ppc: Handle ctz and clz opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (38 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 39/64] target-i386: Use clz and ctz opcodes Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 41/64] tcg/aarch64: " Richard Henderson
                   ` (25 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
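Reviewer aid, not part of the patch: a sketch of the case analysis in
tcg_out_cntxz below, written as a pure function so the four emission
strategies can be read side by side.  The enum and plan_cntxz are
hypothetical names; registers are modelled as plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum cntxz_plan {
    PLAN_BARE,              /* count insn alone */
    PLAN_ISEL,              /* cmp; count into r0; isel */
    PLAN_BRANCH_OVER_COUNT, /* cmp; beq +8; count */
    PLAN_COUNT_THEN_FIXUP   /* cmp; count; bne +8; load fallback */
};

/* bits is 32 or 64; a0 is the destination register number; a2 is
   either a register number or a constant, as flagged by const_a2.  */
static enum cntxz_plan plan_cntxz(int bits, bool have_isel,
                                  int a0, int64_t a2, bool const_a2)
{
    assert(bits == 32 || bits == 64);

    /* The count insns already return the word size for a zero
       input, so this fallback needs no compare at all.  */
    if (const_a2 && a2 == bits) {
        return PLAN_BARE;
    }
    if (have_isel) {
        return PLAN_ISEL;
    }
    /* Destination already holds the fallback: skip the count.  */
    if (!const_a2 && a0 == a2) {
        return PLAN_BRANCH_OVER_COUNT;
    }
    return PLAN_COUNT_THEN_FIXUP;
}
```

The "rZW" constraint restricts constants to 0 or the word size, which
is why the fixup path only ever has to materialise 0.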
 tcg/ppc/tcg-target.h     | 10 +++++---
 tcg/ppc/tcg-target.inc.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 698a599..c798c9c 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -49,6 +49,8 @@ typedef enum {
     TCG_AREG0 = TCG_REG_R27
 } TCGReg;
 
+extern bool have_isa_3_00;
+
 /* optional instructions automatically implemented */
 #define TCG_TARGET_HAS_ext8u_i32        0 /* andi */
 #define TCG_TARGET_HAS_ext16u_i32       0
@@ -68,8 +70,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32          1
 #define TCG_TARGET_HAS_nand_i32         1
 #define TCG_TARGET_HAS_nor_i32          1
-#define TCG_TARGET_HAS_clz_i32          0
-#define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_clz_i32          1
+#define TCG_TARGET_HAS_ctz_i32          have_isa_3_00
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     0
@@ -103,8 +105,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i64          1
 #define TCG_TARGET_HAS_nand_i64         1
 #define TCG_TARGET_HAS_nor_i64          1
-#define TCG_TARGET_HAS_clz_i64          0
-#define TCG_TARGET_HAS_ctz_i64          0
+#define TCG_TARGET_HAS_clz_i64          1
+#define TCG_TARGET_HAS_ctz_i64          have_isa_3_00
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index bf17161..766bc1a 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -77,11 +77,15 @@
 #define TCG_CT_CONST_U32  0x800
 #define TCG_CT_CONST_ZERO 0x1000
 #define TCG_CT_CONST_MONE 0x2000
+#define TCG_CT_CONST_WSZ  0x4000
 
 static tcg_insn_unit *tb_ret_addr;
 
 #include "elf.h"
+
 static bool have_isa_2_06;
+bool have_isa_3_00;
+
 #define HAVE_ISA_2_06  have_isa_2_06
 #define HAVE_ISEL      have_isa_2_06
 
@@ -305,6 +309,9 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
     case 'U':
         ct->ct |= TCG_CT_CONST_U32;
         break;
+    case 'W':
+        ct->ct |= TCG_CT_CONST_WSZ;
+        break;
     case 'Z':
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
@@ -341,6 +348,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
         return 1;
     } else if ((ct & TCG_CT_CONST_MONE) && val == -1) {
         return 1;
+    } else if ((ct & TCG_CT_CONST_WSZ)
+               && val == (type == TCG_TYPE_I32 ? 32 : 64)) {
+        return 1;
     }
     return 0;
 }
@@ -445,6 +455,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define NOR    XO31(124)
 #define CNTLZW XO31( 26)
 #define CNTLZD XO31( 58)
+#define CNTTZW XO31(538)
+#define CNTTZD XO31(570)
 #define ANDC   XO31( 60)
 #define ORC    XO31(412)
 #define EQV    XO31(284)
@@ -1166,6 +1178,32 @@ static void tcg_out_movcond(TCGContext *s, TCGType type, TCGCond cond,
     }
 }
 
+static void tcg_out_cntxz(TCGContext *s, TCGType type, uint32_t opc,
+                          TCGArg a0, TCGArg a1, TCGArg a2, bool const_a2)
+{
+    if (const_a2 && a2 == (type == TCG_TYPE_I32 ? 32 : 64)) {
+        tcg_out32(s, opc | RA(a0) | RS(a1));
+    } else {
+        tcg_out_cmp(s, TCG_COND_EQ, a1, 0, 1, 7, type);
+        /* Note that the only other valid constant for a2 is 0.  */
+        if (HAVE_ISEL) {
+            tcg_out32(s, opc | RA(TCG_REG_R0) | RS(a1));
+            tcg_out32(s, tcg_to_isel[TCG_COND_EQ] | TAB(a0, a2, TCG_REG_R0));
+        } else if (!const_a2 && a0 == a2) {
+            tcg_out32(s, tcg_to_bc[TCG_COND_EQ] | 8);
+            tcg_out32(s, opc | RA(a0) | RS(a1));
+        } else {
+            tcg_out32(s, opc | RA(a0) | RS(a1));
+            tcg_out32(s, tcg_to_bc[TCG_COND_NE] | 8);
+            if (const_a2) {
+                tcg_out_movi(s, type, a0, 0);
+            } else {
+                tcg_out_mov(s, type, a0, a2);
+            }
+        }
+    }
+}
+
 static void tcg_out_cmp2(TCGContext *s, const TCGArg *args,
                          const int *const_args)
 {
@@ -2103,6 +2141,24 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out32(s, NOR | SAB(args[1], args[0], args[2]));
         break;
 
+    case INDEX_op_clz_i32:
+        tcg_out_cntxz(s, TCG_TYPE_I32, CNTLZW, args[0], args[1],
+                      args[2], const_args[2]);
+        break;
+    case INDEX_op_ctz_i32:
+        tcg_out_cntxz(s, TCG_TYPE_I32, CNTTZW, args[0], args[1],
+                      args[2], const_args[2]);
+        break;
+
+    case INDEX_op_clz_i64:
+        tcg_out_cntxz(s, TCG_TYPE_I64, CNTLZD, args[0], args[1],
+                      args[2], const_args[2]);
+        break;
+    case INDEX_op_ctz_i64:
+        tcg_out_cntxz(s, TCG_TYPE_I64, CNTTZD, args[0], args[1],
+                      args[2], const_args[2]);
+        break;
+
     case INDEX_op_mul_i32:
         a0 = args[0], a1 = args[1], a2 = args[2];
         if (const_args[2]) {
@@ -2515,6 +2571,8 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_eqv_i32, { "r", "r", "ri" } },
     { INDEX_op_nand_i32, { "r", "r", "r" } },
     { INDEX_op_nor_i32, { "r", "r", "r" } },
+    { INDEX_op_clz_i32, { "r", "r", "rZW" } },
+    { INDEX_op_ctz_i32, { "r", "r", "rZW" } },
 
     { INDEX_op_shl_i32, { "r", "r", "ri" } },
     { INDEX_op_shr_i32, { "r", "r", "ri" } },
@@ -2563,6 +2621,8 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_eqv_i64, { "r", "r", "r" } },
     { INDEX_op_nand_i64, { "r", "r", "r" } },
     { INDEX_op_nor_i64, { "r", "r", "r" } },
+    { INDEX_op_clz_i64, { "r", "r", "rZW" } },
+    { INDEX_op_ctz_i64, { "r", "r", "rZW" } },
 
     { INDEX_op_shl_i64, { "r", "r", "ri" } },
     { INDEX_op_shr_i64, { "r", "r", "ri" } },
@@ -2645,9 +2705,16 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 static void tcg_target_init(TCGContext *s)
 {
     unsigned long hwcap = qemu_getauxval(AT_HWCAP);
+    unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2);
+
     if (hwcap & PPC_FEATURE_ARCH_2_06) {
         have_isa_2_06 = true;
     }
+#ifdef PPC_FEATURE2_ARCH_3_00
+    if (hwcap2 & PPC_FEATURE2_ARCH_3_00) {
+        have_isa_3_00 = true;
+    }
+#endif
 
     tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffffffff);
     tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffffffff);
-- 
2.7.4

* [Qemu-devel] [PATCH v4 41/64] tcg/aarch64: Handle ctz and clz opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (39 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 40/64] tcg/ppc: Handle ctz and clz opcodes Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-01 18:36   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 42/64] tcg/arm: " Richard Henderson
                   ` (24 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
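Reviewer aid, not part of the patch: a small model of the identity the
ctz cases below rely on, namely ctz(x) == clz(rbit(x)), followed by a
conditional select for the zero-input case.  The model_* names are
hypothetical; only the 32-bit form is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order of a 32-bit word, like RBIT.  */
static uint32_t model_rbit32(uint32_t x)
{
    uint32_t r = 0;
    int i;

    for (i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Count leading zeros, returning 32 for zero input, like CLZ.  */
static uint32_t model_clz32(uint32_t x)
{
    uint32_t n = 0;

    while (n < 32 && !(x & (UINT32_C(0x80000000) >> n))) {
        n++;
    }
    return n;
}

/* ctz with an explicit value for zero input.  */
static uint32_t model_ctz32(uint32_t x, uint32_t zero_val)
{
    uint32_t r = model_rbit32(x);       /* rbit tmp, a1 */
    uint32_t n = model_clz32(r);        /* clz  tmp, tmp */

    /* Bit reversal preserves zero-ness, so comparing the reversed
       value against zero is as good as comparing the original.  */
    assert((r == 0) == (x == 0));
    return r != 0 ? n : zero_val;       /* csel d, tmp, b, ne */
}
```

When the fallback is the word size the select is unnecessary, since
the count instruction already produces that value for a zero input.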
 tcg/aarch64/tcg-target.h     |  8 ++++----
 tcg/aarch64/tcg-target.inc.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 976f493..9d6b00f 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -62,8 +62,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32          1
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_clz_i32          0
-#define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_clz_i32          1
+#define TCG_TARGET_HAS_ctz_i32          1
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     1
@@ -96,8 +96,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i64          1
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
-#define TCG_TARGET_HAS_clz_i64          0
-#define TCG_TARGET_HAS_ctz_i64          0
+#define TCG_TARGET_HAS_clz_i64          1
+#define TCG_TARGET_HAS_ctz_i64          1
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     1
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 17c0b20..91345fc 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -201,6 +201,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
     if ((ct & TCG_CT_CONST_MONE) && val == -1) {
         return 1;
     }
+    if ((ct & TCG_CT_CONST_WSZ) && val == (type ? 64 : 32)) {
+        return 1;
+    }
 
     return 0;
 }
@@ -339,8 +342,12 @@ typedef enum {
     /* Conditional select instructions.  */
     I3506_CSEL      = 0x1a800000,
     I3506_CSINC     = 0x1a800400,
+    I3506_CSINV     = 0x5a800000,
+    I3506_CSNEG     = 0x5a800400,
 
     /* Data-processing (1 source) instructions.  */
+    I3507_CLZ       = 0x5ac01000,
+    I3507_RBIT      = 0x5ac00000,
     I3507_REV16     = 0x5ac00400,
     I3507_REV32     = 0x5ac00800,
     I3507_REV64     = 0x5ac00c00,
@@ -993,6 +1000,32 @@ static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
     tcg_out32(s, sync[a0 & TCG_MO_ALL]);
 }
 
+static void tcg_out_clz(TCGContext *s, TCGType ext, TCGReg d,
+                        TCGReg a, TCGArg b, bool const_b)
+{
+    if (const_b && b == (ext ? 64 : 32)) {
+        tcg_out_insn(s, 3507, CLZ, ext, d, a);
+    } else {
+        AArch64Insn sel = I3506_CSEL;
+
+        tcg_out_cmp(s, ext, a, 0, 1);
+        tcg_out_insn(s, 3507, CLZ, ext, TCG_REG_TMP, a);
+
+        if (const_b) {
+            if (b == -1) {
+                b = TCG_REG_XZR;
+                sel = I3506_CSINV;
+            } else if (b == 0) {
+                b = TCG_REG_XZR;
+            } else {
+                tcg_out_movi(s, ext, d, b);
+                b = d;
+            }
+        }
+        tcg_out_insn_3506(s, sel, ext, d, TCG_REG_TMP, b, TCG_COND_NE);
+    }
+}
+
 #ifdef CONFIG_SOFTMMU
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     TCGMemOpIdx oi, uintptr_t ra)
@@ -1559,6 +1592,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_clz_i64:
+    case INDEX_op_clz_i32:
+        tcg_out_clz(s, ext, a0, a1, a2, c2);
+        break;
+    case INDEX_op_ctz_i64:
+    case INDEX_op_ctz_i32:
+        tcg_out_insn(s, 3507, RBIT, ext, TCG_REG_TMP, a1);
+        tcg_out_clz(s, ext, a0, TCG_REG_TMP, a2, c2);
+        break;
+
     case INDEX_op_brcond_i32:
         a1 = (int32_t)a1;
         /* FALLTHRU */
@@ -1750,11 +1793,15 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_sar_i32, { "r", "r", "ri" } },
     { INDEX_op_rotl_i32, { "r", "r", "ri" } },
     { INDEX_op_rotr_i32, { "r", "r", "ri" } },
+    { INDEX_op_clz_i32, { "r", "r", "rAL" } },
+    { INDEX_op_ctz_i32, { "r", "r", "rAL" } },
     { INDEX_op_shl_i64, { "r", "r", "ri" } },
     { INDEX_op_shr_i64, { "r", "r", "ri" } },
     { INDEX_op_sar_i64, { "r", "r", "ri" } },
     { INDEX_op_rotl_i64, { "r", "r", "ri" } },
     { INDEX_op_rotr_i64, { "r", "r", "ri" } },
+    { INDEX_op_clz_i64, { "r", "r", "rAL" } },
+    { INDEX_op_ctz_i64, { "r", "r", "rAL" } },
 
     { INDEX_op_brcond_i32, { "r", "rA" } },
     { INDEX_op_brcond_i64, { "r", "rA" } },
-- 
2.7.4

* [Qemu-devel] [PATCH v4 42/64] tcg/arm: Handle ctz and clz opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (40 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 41/64] tcg/aarch64: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-08 17:56   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 43/64] tcg/mips: Handle clz opcode Richard Henderson
                   ` (23 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
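Reviewer aid, not part of the patch: a model of the conditionally
executed sequence emitted below for a non-word-size fallback (cmp,
clzne, moveq), showing why the final move can be dropped when the
destination register already is the fallback register.  The register
file and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t model_clz32(uint32_t x)
{
    uint32_t n = 0;

    while (n < 32 && !(x & (UINT32_C(0x80000000) >> n))) {
        n++;
    }
    return n;
}

/* a0, a1, a2 index a three-entry register file.  */
static void model_arm_clz(uint32_t regs[3], int a0, int a1, int a2)
{
    bool ne;

    assert(a0 >= 0 && a0 < 3 && a1 >= 0 && a1 < 3 && a2 >= 0 && a2 < 3);

    ne = regs[a1] != 0;                     /* cmp   a1, #0 */
    if (ne) {
        regs[a0] = model_clz32(regs[a1]);   /* clzne a0, a1 */
    }
    if (a0 != a2 && !ne) {
        regs[a0] = regs[a2];                /* moveq a0, a2 */
    }
}

/* Run the model with the input in r1 and the fallback in r2; the
   destination is r2 when dest_is_fallback, else r0.  */
static uint32_t run_arm_clz(uint32_t input, uint32_t fallback,
                            bool dest_is_fallback)
{
    uint32_t regs[3] = { 0xdeadbeef, input, fallback };
    int a0 = dest_is_fallback ? 2 : 0;

    model_arm_clz(regs, a0, 1, 2);
    return regs[a0];
}
```

Because the count is itself predicated on NE, a zero input never
overwrites a destination that already holds the fallback.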
 tcg/arm/tcg-target.h     |  4 ++--
 tcg/arm/tcg-target.inc.c | 27 +++++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 02cc242..4cb94dc 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -110,8 +110,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_clz_i32          0
-#define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_clz_i32          use_armv5t_instructions
+#define TCG_TARGET_HAS_ctz_i32          use_armv7_instructions
 #define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
 #define TCG_TARGET_HAS_extract_i32      use_armv7_instructions
 #define TCG_TARGET_HAS_sextract_i32     use_armv7_instructions
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 473c170..2242d21 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -256,6 +256,9 @@ typedef enum {
     ARITH_BIC = 0xe << 21,
     ARITH_MVN = 0xf << 21,
 
+    INSN_CLZ       = 0x016f0f10,
+    INSN_RBIT      = 0x06ff0f30,
+
     INSN_LDR_IMM   = 0x04100000,
     INSN_LDR_REG   = 0x06100000,
     INSN_STR_IMM   = 0x04000000,
@@ -1827,6 +1830,28 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_ctz_i32:
+        tcg_out_dat_reg(s, COND_AL, INSN_RBIT, TCG_REG_TMP, 0, args[1], 0);
+        a1 = TCG_REG_TMP;
+        goto do_clz;
+
+    case INDEX_op_clz_i32:
+        a1 = args[1];
+    do_clz:
+        a0 = args[0];
+        a2 = args[2];
+        c = const_args[2];
+        if (c && a2 == 32) {
+            tcg_out_dat_reg(s, COND_AL, INSN_CLZ, a0, 0, a1, 0);
+            break;
+        }
+        tcg_out_dat_imm(s, COND_AL, ARITH_CMP, 0, a1, 0);
+        tcg_out_dat_reg(s, COND_NE, INSN_CLZ, a0, 0, a1, 0);
+        if (c || a0 != a2) {
+            tcg_out_dat_rIK(s, COND_EQ, ARITH_MOV, ARITH_MVN, a0, 0, a2, c);
+        }
+        break;
+
     case INDEX_op_brcond_i32:
         tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
                        args[0], args[1], const_args[1]);
@@ -1961,6 +1986,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_sar_i32, { "r", "r", "ri" } },
     { INDEX_op_rotl_i32, { "r", "r", "ri" } },
     { INDEX_op_rotr_i32, { "r", "r", "ri" } },
+    { INDEX_op_clz_i32, { "r", "r", "rIK" } },
+    { INDEX_op_ctz_i32, { "r", "r", "rIK" } },
 
     { INDEX_op_brcond_i32, { "r", "rIN" } },
     { INDEX_op_setcond_i32, { "r", "r", "rIN" } },
-- 
2.7.4

* [Qemu-devel] [PATCH v4 43/64] tcg/mips: Handle clz opcode
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (41 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 42/64] tcg/arm: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 44/64] tcg/s390: " Richard Henderson
                   ` (22 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
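Reviewer aid, not part of the patch: a model of the three pre-R6
register-allocation cases handled below with MOVZ/MOVN, to make the
aliasing hazards explicit.  The four-entry register file (entry 3
standing in for TCG_TMP0) and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t model_clz32(uint32_t x)
{
    uint32_t n = 0;

    while (n < 32 && !(x & (UINT32_C(0x80000000) >> n))) {
        n++;
    }
    return n;
}

static void model_mips_clz(uint32_t regs[4], int a0, int a1, int a2)
{
    enum { TMP = 3 };

    assert(a0 >= 0 && a0 < 3 && a1 >= 0 && a1 < 3 && a2 >= 0 && a2 < 3);

    if (a0 == a2) {
        /* Destination already holds the fallback: overwrite it
           only for a nonzero input.  */
        regs[TMP] = model_clz32(regs[a1]);      /* clz  tmp, a1 */
        if (regs[a1] != 0) {
            regs[a0] = regs[TMP];               /* movn a0, tmp, a1 */
        }
    } else if (a0 != a1) {
        /* Input survives the count, so it can still be tested.  */
        regs[a0] = model_clz32(regs[a1]);       /* clz  a0, a1 */
        if (regs[a1] == 0) {
            regs[a0] = regs[a2];                /* movz a0, a2, a1 */
        }
    } else {
        /* a0 == a1: counting in place would destroy the value
           that the conditional move needs to test.  */
        regs[TMP] = model_clz32(regs[a1]);      /* clz  tmp, a1 */
        if (regs[a1] == 0) {
            regs[TMP] = regs[a2];               /* movz tmp, a2, a1 */
        }
        regs[a0] = regs[TMP];                   /* move a0, tmp */
    }
}

/* Input in r1, fallback in r2, destination chosen by a0.  */
static uint32_t run_mips_clz(uint32_t input, uint32_t fallback, int a0)
{
    uint32_t regs[4] = { 0, input, fallback, 0 };

    model_mips_clz(regs, a0, 1, 2);
    return regs[a0];
}
```

All three destinations give the same answer for a given input, which
is the property the case split has to preserve.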
 tcg/mips/tcg-target.h     |  4 ++--
 tcg/mips/tcg-target.inc.c | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index f133684..0526018 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -109,8 +109,6 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_rem_i32          1
 #define TCG_TARGET_HAS_not_i32          1
 #define TCG_TARGET_HAS_nor_i32          1
-#define TCG_TARGET_HAS_clz_i32          0
-#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_andc_i32         0
 #define TCG_TARGET_HAS_orc_i32          0
 #define TCG_TARGET_HAS_eqv_i32          0
@@ -130,6 +128,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_ext8s_i32        use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext16s_i32       use_mips32r2_instructions
 #define TCG_TARGET_HAS_rot_i32          use_mips32r2_instructions
+#define TCG_TARGET_HAS_clz_i32          use_mips32r2_instructions
+#define TCG_TARGET_HAS_ctz_i32          0
 
 /* optional instructions automatically implemented */
 #define TCG_TARGET_HAS_neg_i32          0 /* sub  rd, zero, rt   */
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 4341ea2..732246a 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -160,6 +160,7 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_S16  0x400    /* Signed 16-bit: -32768 - 32767 */
 #define TCG_CT_CONST_P2M1 0x800    /* Power of 2 minus 1.  */
 #define TCG_CT_CONST_N16  0x1000   /* "Negatable" 16-bit: -32767 - 32767 */
+#define TCG_CT_CONST_WSZ  0x2000   /* word size */
 
 static inline bool is_p2m1(tcg_target_long val)
 {
@@ -215,6 +216,9 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
     case 'N':
         ct->ct |= TCG_CT_CONST_N16;
         break;
+    case 'W':
+        ct->ct |= TCG_CT_CONST_WSZ;
+        break;
     case 'Z':
         /* We are cheating a bit here, using the fact that the register
            ZERO is also the register number 0. Hence there is no need
@@ -246,6 +250,8 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
     } else if ((ct & TCG_CT_CONST_P2M1)
                && use_mips32r2_instructions && is_p2m1(val)) {
         return 1;
+    } else if ((ct & TCG_CT_CONST_WSZ) && val == 32) {
+        return 1;
     }
     return 0;
 }
@@ -313,6 +319,7 @@ typedef enum {
     OPC_SLTU     = OPC_SPECIAL | 0x2B,
     OPC_SELEQZ   = OPC_SPECIAL | 0x35,
     OPC_SELNEZ   = OPC_SPECIAL | 0x37,
+    OPC_CLZ_R6   = OPC_SPECIAL | 0120,
 
     OPC_REGIMM   = 0x01 << 26,
     OPC_BLTZ     = OPC_REGIMM | (0x00 << 16),
@@ -320,6 +327,7 @@ typedef enum {
 
     OPC_SPECIAL2 = 0x1c << 26,
     OPC_MUL_R5   = OPC_SPECIAL2 | 0x002,
+    OPC_CLZ      = OPC_SPECIAL2 | 040,
 
     OPC_SPECIAL3 = 0x1f << 26,
     OPC_EXT      = OPC_SPECIAL3 | 0x000,
@@ -1625,6 +1633,31 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_clz_i32:
+        if (use_mips32r6_instructions) {
+            if (a2 == 32) {
+                tcg_out_opc_reg(s, OPC_CLZ_R6, a0, a1, 0);
+            } else {
+                tcg_out_opc_reg(s, OPC_CLZ_R6, TCG_TMP0, a1, 0);
+                tcg_out_movcond(s, TCG_COND_EQ, a0, a1, 0, a2, TCG_TMP0);
+            }
+        } else {
+            if (a2 == 32) {
+                tcg_out_opc_reg(s, OPC_CLZ, a0, a1, a1);
+            } else if (a0 == a2) {
+                tcg_out_opc_reg(s, OPC_CLZ, TCG_TMP0, a1, a1);
+                tcg_out_opc_reg(s, OPC_MOVN, a0, TCG_TMP0, a1);
+            } else if (a0 != a1) {
+                tcg_out_opc_reg(s, OPC_CLZ, a0, a1, a1);
+                tcg_out_opc_reg(s, OPC_MOVZ, a0, a2, a1);
+            } else {
+                tcg_out_opc_reg(s, OPC_CLZ, TCG_TMP0, a1, a1);
+                tcg_out_opc_reg(s, OPC_MOVZ, TCG_TMP0, a2, a1);
+                tcg_out_mov(s, TCG_TYPE_REG, a0, TCG_TMP0);
+            }
+        }
+        break;
+
     case INDEX_op_bswap32_i32:
         tcg_out_opc_reg(s, OPC_WSBH, a0, 0, a1);
         tcg_out_opc_sa(s, OPC_ROTR, a0, a0, 16);
@@ -1727,6 +1760,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_sar_i32, { "r", "rZ", "ri" } },
     { INDEX_op_rotr_i32, { "r", "rZ", "ri" } },
     { INDEX_op_rotl_i32, { "r", "rZ", "ri" } },
+    { INDEX_op_clz_i32,  { "r", "r", "rWZ" } },
 
     { INDEX_op_bswap16_i32, { "r", "r" } },
     { INDEX_op_bswap32_i32, { "r", "r" } },
-- 
2.7.4

* [Qemu-devel] [PATCH v4 44/64] tcg/s390: Handle clz opcode
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (42 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 43/64] tcg/mips: Handle clz opcode Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 45/64] tcg/i386: Fully convert tcg_target_op_def Richard Henderson
                   ` (21 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
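Reviewer aid, not part of the patch: a model of FLOGR as used by
tgen_clz below.  The instruction writes an even/odd register pair and
sets the condition code, which is why the patch moves TCG_TMP0 to R1
and selects the result with a CC mask of 2.  The struct and function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct flogr_result {
    uint64_t count;   /* even register: leading zeros, 64 if none */
    uint64_t rest;    /* odd register: input minus its top one bit */
    int cc;           /* 0: no one bit found, 2: one bit found */
};

static struct flogr_result model_flogr(uint64_t x)
{
    struct flogr_result r = { 64, 0, 0 };
    uint64_t n = 0;

    if (x == 0) {
        return r;
    }
    while (!((x << n) >> 63)) {
        n++;
    }
    r.count = n;
    r.rest = x & ~(UINT64_C(1) << (63 - n));
    r.cc = 2;
    return r;
}

/* clz with an explicit value for zero input, following the
   load-on-condition path of the patch.  */
static uint64_t model_s390_clz(uint64_t x, uint64_t zero_val)
{
    struct flogr_result r = model_flogr(x);   /* flogr %r0, a1 */
    uint64_t dest = zero_val;                 /* load fallback */

    assert(r.cc == 0 || r.cc == 2);
    if (r.cc == 2) {
        dest = r.count;                       /* locgr dest, %r0, 2 */
    }
    return dest;
}
```

The branch variant in the patch is equivalent: BRC with mask 8 skips
the register copy when the condition code is 0.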
 tcg/s390/tcg-target.h     |  2 +-
 tcg/s390/tcg-target.inc.c | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 3ac2dc9..22500ba 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -110,7 +110,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_eqv_i64        0
 #define TCG_TARGET_HAS_nand_i64       0
 #define TCG_TARGET_HAS_nor_i64        0
-#define TCG_TARGET_HAS_clz_i64        0
+#define TCG_TARGET_HAS_clz_i64        (s390_facilities & FACILITY_EXT_IMM)
 #define TCG_TARGET_HAS_ctz_i64        0
 #define TCG_TARGET_HAS_deposit_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_extract_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 5275297..e8d56a0 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -50,7 +50,7 @@
 #define TCG_REG_NONE    0
 
 /* A scratch register that may be be used throughout the backend.  */
-#define TCG_TMP0        TCG_REG_R14
+#define TCG_TMP0        TCG_REG_R1
 
 #ifndef CONFIG_SOFTMMU
 #define TCG_GUEST_BASE_REG TCG_REG_R13
@@ -133,6 +133,7 @@ typedef enum S390Opcode {
     RRE_DLR     = 0xb997,
     RRE_DSGFR   = 0xb91d,
     RRE_DSGR    = 0xb90d,
+    RRE_FLOGR   = 0xb983,
     RRE_LGBR    = 0xb906,
     RRE_LCGR    = 0xb903,
     RRE_LGFR    = 0xb914,
@@ -1241,6 +1242,33 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
     }
 }
 
+static void tgen_clz(TCGContext *s, TCGReg dest, TCGReg a1,
+                     TCGArg a2, int a2const)
+{
+    /* Since this sets both R and R+1, we have no choice but to store the
+       result into R0, allowing R1 == TCG_TMP0 to be clobbered as well.  */
+    QEMU_BUILD_BUG_ON(TCG_TMP0 != TCG_REG_R1);
+    tcg_out_insn(s, RRE, FLOGR, TCG_REG_R0, a1);
+
+    if (a2const && a2 == 64) {
+        tcg_out_mov(s, TCG_TYPE_I64, dest, TCG_REG_R0);
+    } else {
+        if (a2const) {
+            tcg_out_movi(s, TCG_TYPE_I64, dest, a2);
+        } else {
+            tcg_out_mov(s, TCG_TYPE_I64, dest, a2);
+        }
+        if (s390_facilities & FACILITY_LOAD_ON_COND) {
+            /* Emit: if (one bit found) dest = r0.  */
+            tcg_out_insn(s, RRF, LOCGR, dest, TCG_REG_R0, 2);
+        } else {
+            /* Emit: if (no one bit found) goto over; dest = r0; over:  */
+            tcg_out_insn(s, RI, BRC, 8, (4 + 4) >> 1);
+            tcg_out_insn(s, RRE, LGR, dest, TCG_REG_R0);
+        }
+    }
+}
+
 static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
                          int ofs, int len, int z)
 {
@@ -2181,6 +2209,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tgen_extract(s, args[0], args[1], args[2], args[3]);
         break;
 
+    case INDEX_op_clz_i64:
+        tgen_clz(s, args[0], args[1], args[2], const_args[2]);
+        break;
+
     case INDEX_op_mb:
         /* The host memory model is quite strong, we simply need to
            serialize the instruction stream.  */
@@ -2304,6 +2336,8 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_bswap32_i64, { "r", "r" } },
     { INDEX_op_bswap64_i64, { "r", "r" } },
 
+    { INDEX_op_clz_i64, { "r", "r", "ri" } },
+
     { INDEX_op_add2_i64, { "r", "r", "0", "1", "rA", "r" } },
     { INDEX_op_sub2_i64, { "r", "r", "0", "1", "rA", "r" } },
 
-- 
2.7.4

* [Qemu-devel] [PATCH v4 45/64] tcg/i386: Fully convert tcg_target_op_def
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (43 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 44/64] tcg/s390: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 46/64] tcg/i386: Hoist common arguments in tcg_out_op Richard Henderson
                   ` (20 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Use a switch instead of searching a table.  Share constraints between
32-bit and 64-bit, when at all possible.
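
As a rough illustration of the pattern (not the code from this patch),
the lookup can be sketched as below.  The Sketch* types and SK_op_*
names are hypothetical stand-ins for TCGTargetOpDef and TCGOpcode, and
only two constraint sets are shown.  Each set is a function-local
static object, and the _i32 and _i64 cases of an opcode return the
same pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for TCGOpcode and TCGTargetOpDef. */
typedef enum {
    SK_op_ld_i32, SK_op_ld_i64,
    SK_op_sub_i32, SK_op_sub_i64,
    SK_op_nb_ops    /* count of known opcodes; never a valid opcode */
} SketchOpcode;

typedef struct {
    const char *args_ct_str[4];
} SketchOpDef;

/* Return the constraint set for OP, or NULL if OP is not handled. */
static const SketchOpDef *sketch_op_def(SketchOpcode op)
{
    /* One static object per distinct constraint set. */
    static const SketchOpDef r_r = { .args_ct_str = { "r", "r" } };
    static const SketchOpDef r_0_re = { .args_ct_str = { "r", "0", "re" } };

    switch (op) {
    case SK_op_ld_i32:
    case SK_op_ld_i64:
        return &r_r;
    case SK_op_sub_i32:
    case SK_op_sub_i64:
        return &r_0_re;
    default:
        break;
    }
    return NULL;
}
```

Sharing "re" between the two widths relies on the constraint parser
treating 'e' as a plain constant when the operand type is 32-bit, as
the target_parse_constraint hunk in this patch does.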

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.inc.c | 340 +++++++++++++++++++++++++++-------------------
 1 file changed, 198 insertions(+), 142 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index aa5a248..e497bef 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -237,13 +237,13 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         break;
 
     case 'e':
-        ct->ct |= TCG_CT_CONST_S32;
+        ct->ct |= (type == TCG_TYPE_I32 ? TCG_CT_CONST : TCG_CT_CONST_S32);
         break;
     case 'Z':
-        ct->ct |= TCG_CT_CONST_U32;
+        ct->ct |= (type == TCG_TYPE_I32 ? TCG_CT_CONST : TCG_CT_CONST_U32);
         break;
     case 'I':
-        ct->ct |= TCG_CT_CONST_I32;
+        ct->ct |= (type == TCG_TYPE_I32 ? TCG_CT_CONST : TCG_CT_CONST_I32);
         break;
 
     default:
@@ -2188,152 +2188,208 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 #undef OP_32_64
 }
 
-static const TCGTargetOpDef x86_op_defs[] = {
-    { INDEX_op_exit_tb, { } },
-    { INDEX_op_goto_tb, { } },
-    { INDEX_op_br, { } },
-    { INDEX_op_ld8u_i32, { "r", "r" } },
-    { INDEX_op_ld8s_i32, { "r", "r" } },
-    { INDEX_op_ld16u_i32, { "r", "r" } },
-    { INDEX_op_ld16s_i32, { "r", "r" } },
-    { INDEX_op_ld_i32, { "r", "r" } },
-    { INDEX_op_st8_i32, { "qi", "r" } },
-    { INDEX_op_st16_i32, { "ri", "r" } },
-    { INDEX_op_st_i32, { "ri", "r" } },
-
-    { INDEX_op_add_i32, { "r", "r", "ri" } },
-    { INDEX_op_sub_i32, { "r", "0", "ri" } },
-    { INDEX_op_mul_i32, { "r", "0", "ri" } },
-    { INDEX_op_div2_i32, { "a", "d", "0", "1", "r" } },
-    { INDEX_op_divu2_i32, { "a", "d", "0", "1", "r" } },
-    { INDEX_op_and_i32, { "r", "0", "ri" } },
-    { INDEX_op_or_i32, { "r", "0", "ri" } },
-    { INDEX_op_xor_i32, { "r", "0", "ri" } },
-    { INDEX_op_andc_i32, { "r", "r", "ri" } },
-
-    { INDEX_op_shl_i32, { "r", "0", "Ci" } },
-    { INDEX_op_shr_i32, { "r", "0", "Ci" } },
-    { INDEX_op_sar_i32, { "r", "0", "Ci" } },
-    { INDEX_op_rotl_i32, { "r", "0", "ci" } },
-    { INDEX_op_rotr_i32, { "r", "0", "ci" } },
-
-    { INDEX_op_brcond_i32, { "r", "ri" } },
-
-    { INDEX_op_bswap16_i32, { "r", "0" } },
-    { INDEX_op_bswap32_i32, { "r", "0" } },
-
-    { INDEX_op_neg_i32, { "r", "0" } },
-
-    { INDEX_op_not_i32, { "r", "0" } },
-
-    { INDEX_op_ext8s_i32, { "r", "q" } },
-    { INDEX_op_ext16s_i32, { "r", "r" } },
-    { INDEX_op_ext8u_i32, { "r", "q" } },
-    { INDEX_op_ext16u_i32, { "r", "r" } },
-
-    { INDEX_op_setcond_i32, { "q", "r", "ri" } },
-
-    { INDEX_op_deposit_i32, { "Q", "0", "Q" } },
-    { INDEX_op_extract_i32, { "r", "r" } },
-    { INDEX_op_sextract_i32, { "r", "r" } },
-
-    { INDEX_op_movcond_i32, { "r", "r", "ri", "r", "0" } },
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+    static const TCGTargetOpDef ri_r = { .args_ct_str = { "ri", "r" } };
+    static const TCGTargetOpDef re_r = { .args_ct_str = { "re", "r" } };
+    static const TCGTargetOpDef qi_r = { .args_ct_str = { "qi", "r" } };
+    static const TCGTargetOpDef r_r = { .args_ct_str = { "r", "r" } };
+    static const TCGTargetOpDef r_q = { .args_ct_str = { "r", "q" } };
+    static const TCGTargetOpDef r_re = { .args_ct_str = { "r", "re" } };
+    static const TCGTargetOpDef r_0 = { .args_ct_str = { "r", "0" } };
+    static const TCGTargetOpDef r_r_re = { .args_ct_str = { "r", "r", "re" } };
+    static const TCGTargetOpDef r_0_re = { .args_ct_str = { "r", "0", "re" } };
+    static const TCGTargetOpDef r_0_Ci = { .args_ct_str = { "r", "0", "Ci" } };
+    static const TCGTargetOpDef r_0_ci = { .args_ct_str = { "r", "0", "ci" } };
+    static const TCGTargetOpDef r_L = { .args_ct_str = { "r", "L" } };
+    static const TCGTargetOpDef L_L = { .args_ct_str = { "L", "L" } };
+    static const TCGTargetOpDef r_L_L = { .args_ct_str = { "r", "L", "L" } };
+    static const TCGTargetOpDef r_r_L = { .args_ct_str = { "r", "r", "L" } };
+    static const TCGTargetOpDef L_L_L = { .args_ct_str = { "L", "L", "L" } };
+    static const TCGTargetOpDef r_r_L_L
+        = { .args_ct_str = { "r", "r", "L", "L" } };
+    static const TCGTargetOpDef L_L_L_L
+        = { .args_ct_str = { "L", "L", "L", "L" } };
+
+    switch (op) {
+    case INDEX_op_ld8u_i32:
+    case INDEX_op_ld8u_i64:
+    case INDEX_op_ld8s_i32:
+    case INDEX_op_ld8s_i64:
+    case INDEX_op_ld16u_i32:
+    case INDEX_op_ld16u_i64:
+    case INDEX_op_ld16s_i32:
+    case INDEX_op_ld16s_i64:
+    case INDEX_op_ld_i32:
+    case INDEX_op_ld32u_i64:
+    case INDEX_op_ld32s_i64:
+    case INDEX_op_ld_i64:
+        return &r_r;
 
-    { INDEX_op_mulu2_i32, { "a", "d", "a", "r" } },
-    { INDEX_op_muls2_i32, { "a", "d", "a", "r" } },
-    { INDEX_op_add2_i32, { "r", "r", "0", "1", "ri", "ri" } },
-    { INDEX_op_sub2_i32, { "r", "r", "0", "1", "ri", "ri" } },
+    case INDEX_op_st8_i32:
+    case INDEX_op_st8_i64:
+        return &qi_r;
+    case INDEX_op_st16_i32:
+    case INDEX_op_st16_i64:
+    case INDEX_op_st_i32:
+    case INDEX_op_st32_i64:
+        return &ri_r;
+    case INDEX_op_st_i64:
+        return &re_r;
+
+    case INDEX_op_add_i32:
+    case INDEX_op_add_i64:
+        return &r_r_re;
+    case INDEX_op_sub_i32:
+    case INDEX_op_sub_i64:
+    case INDEX_op_mul_i32:
+    case INDEX_op_mul_i64:
+    case INDEX_op_or_i32:
+    case INDEX_op_or_i64:
+    case INDEX_op_xor_i32:
+    case INDEX_op_xor_i64:
+        return &r_0_re;
+
+    case INDEX_op_and_i32:
+    case INDEX_op_and_i64:
+        {
+            static const TCGTargetOpDef and
+                = { .args_ct_str = { "r", "0", "reZ" } };
+            return &and;
+        }
+        break;
+    case INDEX_op_andc_i32:
+    case INDEX_op_andc_i64:
+        {
+            static const TCGTargetOpDef andc
+                = { .args_ct_str = { "r", "r", "rI" } };
+            return &andc;
+        }
+        break;
 
-    { INDEX_op_mb, { } },
+    case INDEX_op_shl_i32:
+    case INDEX_op_shl_i64:
+    case INDEX_op_shr_i32:
+    case INDEX_op_shr_i64:
+    case INDEX_op_sar_i32:
+    case INDEX_op_sar_i64:
+        return &r_0_Ci;
+    case INDEX_op_rotl_i32:
+    case INDEX_op_rotl_i64:
+    case INDEX_op_rotr_i32:
+    case INDEX_op_rotr_i64:
+        return &r_0_ci;
 
-#if TCG_TARGET_REG_BITS == 32
-    { INDEX_op_brcond2_i32, { "r", "r", "ri", "ri" } },
-    { INDEX_op_setcond2_i32, { "r", "r", "r", "ri", "ri" } },
-#else
-    { INDEX_op_ld8u_i64, { "r", "r" } },
-    { INDEX_op_ld8s_i64, { "r", "r" } },
-    { INDEX_op_ld16u_i64, { "r", "r" } },
-    { INDEX_op_ld16s_i64, { "r", "r" } },
-    { INDEX_op_ld32u_i64, { "r", "r" } },
-    { INDEX_op_ld32s_i64, { "r", "r" } },
-    { INDEX_op_ld_i64, { "r", "r" } },
-    { INDEX_op_st8_i64, { "ri", "r" } },
-    { INDEX_op_st16_i64, { "ri", "r" } },
-    { INDEX_op_st32_i64, { "ri", "r" } },
-    { INDEX_op_st_i64, { "re", "r" } },
-
-    { INDEX_op_add_i64, { "r", "r", "re" } },
-    { INDEX_op_mul_i64, { "r", "0", "re" } },
-    { INDEX_op_div2_i64, { "a", "d", "0", "1", "r" } },
-    { INDEX_op_divu2_i64, { "a", "d", "0", "1", "r" } },
-    { INDEX_op_sub_i64, { "r", "0", "re" } },
-    { INDEX_op_and_i64, { "r", "0", "reZ" } },
-    { INDEX_op_or_i64, { "r", "0", "re" } },
-    { INDEX_op_xor_i64, { "r", "0", "re" } },
-    { INDEX_op_andc_i64, { "r", "r", "rI" } },
-
-    { INDEX_op_shl_i64, { "r", "0", "Ci" } },
-    { INDEX_op_shr_i64, { "r", "0", "Ci" } },
-    { INDEX_op_sar_i64, { "r", "0", "Ci" } },
-    { INDEX_op_rotl_i64, { "r", "0", "ci" } },
-    { INDEX_op_rotr_i64, { "r", "0", "ci" } },
-
-    { INDEX_op_brcond_i64, { "r", "re" } },
-    { INDEX_op_setcond_i64, { "r", "r", "re" } },
-
-    { INDEX_op_bswap16_i64, { "r", "0" } },
-    { INDEX_op_bswap32_i64, { "r", "0" } },
-    { INDEX_op_bswap64_i64, { "r", "0" } },
-    { INDEX_op_neg_i64, { "r", "0" } },
-    { INDEX_op_not_i64, { "r", "0" } },
-
-    { INDEX_op_ext8s_i64, { "r", "r" } },
-    { INDEX_op_ext16s_i64, { "r", "r" } },
-    { INDEX_op_ext32s_i64, { "r", "r" } },
-    { INDEX_op_ext8u_i64, { "r", "r" } },
-    { INDEX_op_ext16u_i64, { "r", "r" } },
-    { INDEX_op_ext32u_i64, { "r", "r" } },
-
-    { INDEX_op_ext_i32_i64, { "r", "r" } },
-    { INDEX_op_extu_i32_i64, { "r", "r" } },
-
-    { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
-    { INDEX_op_extract_i64, { "r", "r" } },
-    { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
-
-    { INDEX_op_mulu2_i64, { "a", "d", "a", "r" } },
-    { INDEX_op_muls2_i64, { "a", "d", "a", "r" } },
-    { INDEX_op_add2_i64, { "r", "r", "0", "1", "re", "re" } },
-    { INDEX_op_sub2_i64, { "r", "r", "0", "1", "re", "re" } },
-#endif
+    case INDEX_op_brcond_i32:
+    case INDEX_op_brcond_i64:
+        return &r_re;
 
-#if TCG_TARGET_REG_BITS == 64
-    { INDEX_op_qemu_ld_i32, { "r", "L" } },
-    { INDEX_op_qemu_st_i32, { "L", "L" } },
-    { INDEX_op_qemu_ld_i64, { "r", "L" } },
-    { INDEX_op_qemu_st_i64, { "L", "L" } },
-#elif TARGET_LONG_BITS <= TCG_TARGET_REG_BITS
-    { INDEX_op_qemu_ld_i32, { "r", "L" } },
-    { INDEX_op_qemu_st_i32, { "L", "L" } },
-    { INDEX_op_qemu_ld_i64, { "r", "r", "L" } },
-    { INDEX_op_qemu_st_i64, { "L", "L", "L" } },
-#else
-    { INDEX_op_qemu_ld_i32, { "r", "L", "L" } },
-    { INDEX_op_qemu_st_i32, { "L", "L", "L" } },
-    { INDEX_op_qemu_ld_i64, { "r", "r", "L", "L" } },
-    { INDEX_op_qemu_st_i64, { "L", "L", "L", "L" } },
-#endif
-    { -1 },
-};
+    case INDEX_op_bswap16_i32:
+    case INDEX_op_bswap16_i64:
+    case INDEX_op_bswap32_i32:
+    case INDEX_op_bswap32_i64:
+    case INDEX_op_bswap64_i64:
+    case INDEX_op_neg_i32:
+    case INDEX_op_neg_i64:
+    case INDEX_op_not_i32:
+    case INDEX_op_not_i64:
+        return &r_0;
+
+    case INDEX_op_ext8s_i32:
+    case INDEX_op_ext8s_i64:
+    case INDEX_op_ext8u_i32:
+    case INDEX_op_ext8u_i64:
+        return &r_q;
+    case INDEX_op_ext16s_i32:
+    case INDEX_op_ext16s_i64:
+    case INDEX_op_ext16u_i32:
+    case INDEX_op_ext16u_i64:
+    case INDEX_op_ext32s_i64:
+    case INDEX_op_ext32u_i64:
+    case INDEX_op_ext_i32_i64:
+    case INDEX_op_extu_i32_i64:
+    case INDEX_op_extract_i32:
+    case INDEX_op_extract_i64:
+    case INDEX_op_sextract_i32:
+        return &r_r;
+
+    case INDEX_op_deposit_i32:
+    case INDEX_op_deposit_i64:
+        {
+            static const TCGTargetOpDef dep
+                = { .args_ct_str = { "Q", "0", "Q" } };
+            return &dep;
+        }
+    case INDEX_op_setcond_i32:
+    case INDEX_op_setcond_i64:
+        {
+            static const TCGTargetOpDef setc
+                = { .args_ct_str = { "q", "r", "re" } };
+            return &setc;
+        }
+    case INDEX_op_movcond_i32:
+    case INDEX_op_movcond_i64:
+        {
+            static const TCGTargetOpDef movc
+                = { .args_ct_str = { "r", "r", "re", "r", "0" } };
+            return &movc;
+        }
+    case INDEX_op_div2_i32:
+    case INDEX_op_div2_i64:
+    case INDEX_op_divu2_i32:
+    case INDEX_op_divu2_i64:
+        {
+            static const TCGTargetOpDef div2
+                = { .args_ct_str = { "a", "d", "0", "1", "r" } };
+            return &div2;
+        }
+    case INDEX_op_mulu2_i32:
+    case INDEX_op_mulu2_i64:
+    case INDEX_op_muls2_i32:
+    case INDEX_op_muls2_i64:
+        {
+            static const TCGTargetOpDef mul2
+                = { .args_ct_str = { "a", "d", "a", "r" } };
+            return &mul2;
+        }
+    case INDEX_op_add2_i32:
+    case INDEX_op_add2_i64:
+    case INDEX_op_sub2_i32:
+    case INDEX_op_sub2_i64:
+        {
+            static const TCGTargetOpDef arith2
+                = { .args_ct_str = { "r", "r", "0", "1", "re", "re" } };
+            return &arith2;
+        }
 
-static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
-{
-    int i, n = ARRAY_SIZE(x86_op_defs);
+    case INDEX_op_qemu_ld_i32:
+        return TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &r_L : &r_L_L;
+    case INDEX_op_qemu_st_i32:
+        return TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &L_L : &L_L_L;
+    case INDEX_op_qemu_ld_i64:
+        return (TCG_TARGET_REG_BITS == 64 ? &r_L
+                : TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &r_r_L
+                : &r_r_L_L);
+    case INDEX_op_qemu_st_i64:
+        return (TCG_TARGET_REG_BITS == 64 ? &L_L
+                : TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &L_L_L
+                : &L_L_L_L);
 
-    for (i = 0; i < n; ++i) {
-        if (x86_op_defs[i].op == op) {
-            return &x86_op_defs[i];
+    case INDEX_op_brcond2_i32:
+        {
+            static const TCGTargetOpDef b2
+                = { .args_ct_str = { "r", "r", "ri", "ri" } };
+            return &b2;
+        }
+    case INDEX_op_setcond2_i32:
+        {
+            static const TCGTargetOpDef s2
+                = { .args_ct_str = { "r", "r", "r", "ri", "ri" } };
+            return &s2;
         }
+
+    default:
+        break;
     }
     return NULL;
 }
-- 
2.7.4

* [Qemu-devel] [PATCH v4 46/64] tcg/i386: Hoist common arguments in tcg_out_op
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (44 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 45/64] tcg/i386: Fully convert tcg_target_op_def Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 47/64] tcg/i386: Allow bmi2 shiftx to have non-matching operands Richard Henderson
                   ` (19 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.inc.c | 197 ++++++++++++++++++++++------------------------
 1 file changed, 95 insertions(+), 102 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index e497bef..83572ac 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1791,7 +1791,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                               const TCGArg *args, const int *const_args)
 {
-    int c, vexop, rexw = 0;
+    TCGArg a0, a1, a2;
+    int c, const_a2, vexop, rexw = 0;
 
 #if TCG_TARGET_REG_BITS == 64
 # define OP_32_64(x) \
@@ -1803,9 +1804,15 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         case glue(glue(INDEX_op_, x), _i32)
 #endif
 
-    switch(opc) {
+    /* Hoist the loads of the most common arguments.  */
+    a0 = args[0];
+    a1 = args[1];
+    a2 = args[2];
+    const_a2 = const_args[2];
+
+    switch (opc) {
     case INDEX_op_exit_tb:
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_EAX, args[0]);
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_EAX, a0);
         tcg_out_jmp(s, tb_ret_addr);
         break;
     case INDEX_op_goto_tb:
@@ -1820,57 +1827,53 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                 tcg_out_nopn(s, gap - 1);
             }
             tcg_out8(s, OPC_JMP_long); /* jmp im */
-            s->tb_jmp_insn_offset[args[0]] = tcg_current_code_size(s);
+            s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
             tcg_out32(s, 0);
         } else {
             /* indirect jump method */
             tcg_out_modrm_offset(s, OPC_GRP5, EXT5_JMPN_Ev, -1,
-                                 (intptr_t)(s->tb_jmp_target_addr + args[0]));
+                                 (intptr_t)(s->tb_jmp_target_addr + a0));
         }
-        s->tb_jmp_reset_offset[args[0]] = tcg_current_code_size(s);
+        s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
         break;
     case INDEX_op_br:
-        tcg_out_jxx(s, JCC_JMP, arg_label(args[0]), 0);
+        tcg_out_jxx(s, JCC_JMP, arg_label(a0), 0);
         break;
     OP_32_64(ld8u):
         /* Note that we can ignore REXW for the zero-extend to 64-bit.  */
-        tcg_out_modrm_offset(s, OPC_MOVZBL, args[0], args[1], args[2]);
+        tcg_out_modrm_offset(s, OPC_MOVZBL, a0, a1, a2);
         break;
     OP_32_64(ld8s):
-        tcg_out_modrm_offset(s, OPC_MOVSBL + rexw, args[0], args[1], args[2]);
+        tcg_out_modrm_offset(s, OPC_MOVSBL + rexw, a0, a1, a2);
         break;
     OP_32_64(ld16u):
         /* Note that we can ignore REXW for the zero-extend to 64-bit.  */
-        tcg_out_modrm_offset(s, OPC_MOVZWL, args[0], args[1], args[2]);
+        tcg_out_modrm_offset(s, OPC_MOVZWL, a0, a1, a2);
         break;
     OP_32_64(ld16s):
-        tcg_out_modrm_offset(s, OPC_MOVSWL + rexw, args[0], args[1], args[2]);
+        tcg_out_modrm_offset(s, OPC_MOVSWL + rexw, a0, a1, a2);
         break;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_ld32u_i64:
 #endif
     case INDEX_op_ld_i32:
-        tcg_out_ld(s, TCG_TYPE_I32, args[0], args[1], args[2]);
+        tcg_out_ld(s, TCG_TYPE_I32, a0, a1, a2);
         break;
 
     OP_32_64(st8):
         if (const_args[0]) {
-            tcg_out_modrm_offset(s, OPC_MOVB_EvIz,
-                                 0, args[1], args[2]);
-            tcg_out8(s, args[0]);
+            tcg_out_modrm_offset(s, OPC_MOVB_EvIz, 0, a1, a2);
+            tcg_out8(s, a0);
         } else {
-            tcg_out_modrm_offset(s, OPC_MOVB_EvGv | P_REXB_R,
-                                 args[0], args[1], args[2]);
+            tcg_out_modrm_offset(s, OPC_MOVB_EvGv | P_REXB_R, a0, a1, a2);
         }
         break;
     OP_32_64(st16):
         if (const_args[0]) {
-            tcg_out_modrm_offset(s, OPC_MOVL_EvIz | P_DATA16,
-                                 0, args[1], args[2]);
-            tcg_out16(s, args[0]);
+            tcg_out_modrm_offset(s, OPC_MOVL_EvIz | P_DATA16, 0, a1, a2);
+            tcg_out16(s, a0);
         } else {
-            tcg_out_modrm_offset(s, OPC_MOVL_EvGv | P_DATA16,
-                                 args[0], args[1], args[2]);
+            tcg_out_modrm_offset(s, OPC_MOVL_EvGv | P_DATA16, a0, a1, a2);
         }
         break;
 #if TCG_TARGET_REG_BITS == 64
@@ -1878,19 +1881,18 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 #endif
     case INDEX_op_st_i32:
         if (const_args[0]) {
-            tcg_out_modrm_offset(s, OPC_MOVL_EvIz, 0, args[1], args[2]);
-            tcg_out32(s, args[0]);
+            tcg_out_modrm_offset(s, OPC_MOVL_EvIz, 0, a1, a2);
+            tcg_out32(s, a0);
         } else {
-            tcg_out_st(s, TCG_TYPE_I32, args[0], args[1], args[2]);
+            tcg_out_st(s, TCG_TYPE_I32, a0, a1, a2);
         }
         break;
 
     OP_32_64(add):
         /* For 3-operand addition, use LEA.  */
-        if (args[0] != args[1]) {
-            TCGArg a0 = args[0], a1 = args[1], a2 = args[2], c3 = 0;
-
-            if (const_args[2]) {
+        if (a0 != a1) {
+            TCGArg c3 = 0;
+            if (const_a2) {
                 c3 = a2, a2 = -1;
             } else if (a0 == a2) {
                 /* Watch out for dest = src + dest, since we've removed
@@ -1917,36 +1919,35 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         c = ARITH_XOR;
         goto gen_arith;
     gen_arith:
-        if (const_args[2]) {
-            tgen_arithi(s, c + rexw, args[0], args[2], 0);
+        if (const_a2) {
+            tgen_arithi(s, c + rexw, a0, a2, 0);
         } else {
-            tgen_arithr(s, c + rexw, args[0], args[2]);
+            tgen_arithr(s, c + rexw, a0, a2);
         }
         break;
 
     OP_32_64(andc):
-        if (const_args[2]) {
-            tcg_out_mov(s, rexw ? TCG_TYPE_I64 : TCG_TYPE_I32,
-                        args[0], args[1]);
-            tgen_arithi(s, ARITH_AND + rexw, args[0], ~args[2], 0);
+        if (const_a2) {
+            tcg_out_mov(s, rexw ? TCG_TYPE_I64 : TCG_TYPE_I32, a0, a1);
+            tgen_arithi(s, ARITH_AND + rexw, a0, ~a2, 0);
         } else {
-            tcg_out_vex_modrm(s, OPC_ANDN + rexw, args[0], args[2], args[1]);
+            tcg_out_vex_modrm(s, OPC_ANDN + rexw, a0, a2, a1);
         }
         break;
 
     OP_32_64(mul):
-        if (const_args[2]) {
+        if (const_a2) {
             int32_t val;
-            val = args[2];
+            val = a2;
             if (val == (int8_t)val) {
-                tcg_out_modrm(s, OPC_IMUL_GvEvIb + rexw, args[0], args[0]);
+                tcg_out_modrm(s, OPC_IMUL_GvEvIb + rexw, a0, a0);
                 tcg_out8(s, val);
             } else {
-                tcg_out_modrm(s, OPC_IMUL_GvEvIz + rexw, args[0], args[0]);
+                tcg_out_modrm(s, OPC_IMUL_GvEvIz + rexw, a0, a0);
                 tcg_out32(s, val);
             }
         } else {
-            tcg_out_modrm(s, OPC_IMUL_GvEv + rexw, args[0], args[2]);
+            tcg_out_modrm(s, OPC_IMUL_GvEv + rexw, a0, a2);
         }
         break;
 
@@ -1976,57 +1977,54 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         c = SHIFT_ROR;
         goto gen_shift;
     gen_shift_maybe_vex:
-        if (have_bmi2 && !const_args[2]) {
-            tcg_out_vex_modrm(s, vexop + rexw, args[0], args[2], args[1]);
+        if (have_bmi2 && !const_a2) {
+            tcg_out_vex_modrm(s, vexop + rexw, a0, a2, a1);
             break;
         }
         /* FALLTHRU */
     gen_shift:
-        if (const_args[2]) {
-            tcg_out_shifti(s, c + rexw, args[0], args[2]);
+        if (const_a2) {
+            tcg_out_shifti(s, c + rexw, a0, a2);
         } else {
-            tcg_out_modrm(s, OPC_SHIFT_cl + rexw, c, args[0]);
+            tcg_out_modrm(s, OPC_SHIFT_cl + rexw, c, a0);
         }
         break;
 
     case INDEX_op_brcond_i32:
-        tcg_out_brcond32(s, args[2], args[0], args[1], const_args[1],
-                         arg_label(args[3]), 0);
+        tcg_out_brcond32(s, a2, a0, a1, const_args[1], arg_label(args[3]), 0);
         break;
     case INDEX_op_setcond_i32:
-        tcg_out_setcond32(s, args[3], args[0], args[1],
-                          args[2], const_args[2]);
+        tcg_out_setcond32(s, args[3], a0, a1, a2, const_a2);
         break;
     case INDEX_op_movcond_i32:
-        tcg_out_movcond32(s, args[5], args[0], args[1],
-                          args[2], const_args[2], args[3]);
+        tcg_out_movcond32(s, args[5], a0, a1, a2, const_a2, args[3]);
         break;
 
     OP_32_64(bswap16):
-        tcg_out_rolw_8(s, args[0]);
+        tcg_out_rolw_8(s, a0);
         break;
     OP_32_64(bswap32):
-        tcg_out_bswap32(s, args[0]);
+        tcg_out_bswap32(s, a0);
         break;
 
     OP_32_64(neg):
-        tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NEG, args[0]);
+        tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NEG, a0);
         break;
     OP_32_64(not):
-        tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NOT, args[0]);
+        tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NOT, a0);
         break;
 
     OP_32_64(ext8s):
-        tcg_out_ext8s(s, args[0], args[1], rexw);
+        tcg_out_ext8s(s, a0, a1, rexw);
         break;
     OP_32_64(ext16s):
-        tcg_out_ext16s(s, args[0], args[1], rexw);
+        tcg_out_ext16s(s, a0, a1, rexw);
         break;
     OP_32_64(ext8u):
-        tcg_out_ext8u(s, args[0], args[1]);
+        tcg_out_ext8u(s, a0, a1);
         break;
     OP_32_64(ext16u):
-        tcg_out_ext16u(s, args[0], args[1]);
+        tcg_out_ext16u(s, a0, a1);
         break;
 
     case INDEX_op_qemu_ld_i32:
@@ -2050,26 +2048,26 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     OP_32_64(add2):
         if (const_args[4]) {
-            tgen_arithi(s, ARITH_ADD + rexw, args[0], args[4], 1);
+            tgen_arithi(s, ARITH_ADD + rexw, a0, args[4], 1);
         } else {
-            tgen_arithr(s, ARITH_ADD + rexw, args[0], args[4]);
+            tgen_arithr(s, ARITH_ADD + rexw, a0, args[4]);
         }
         if (const_args[5]) {
-            tgen_arithi(s, ARITH_ADC + rexw, args[1], args[5], 1);
+            tgen_arithi(s, ARITH_ADC + rexw, a1, args[5], 1);
         } else {
-            tgen_arithr(s, ARITH_ADC + rexw, args[1], args[5]);
+            tgen_arithr(s, ARITH_ADC + rexw, a1, args[5]);
         }
         break;
     OP_32_64(sub2):
         if (const_args[4]) {
-            tgen_arithi(s, ARITH_SUB + rexw, args[0], args[4], 1);
+            tgen_arithi(s, ARITH_SUB + rexw, a0, args[4], 1);
         } else {
-            tgen_arithr(s, ARITH_SUB + rexw, args[0], args[4]);
+            tgen_arithr(s, ARITH_SUB + rexw, a0, args[4]);
         }
         if (const_args[5]) {
-            tgen_arithi(s, ARITH_SBB + rexw, args[1], args[5], 1);
+            tgen_arithi(s, ARITH_SBB + rexw, a1, args[5], 1);
         } else {
-            tgen_arithr(s, ARITH_SBB + rexw, args[1], args[5]);
+            tgen_arithr(s, ARITH_SBB + rexw, a1, args[5]);
         }
         break;
 
@@ -2082,68 +2080,63 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 #else /* TCG_TARGET_REG_BITS == 64 */
     case INDEX_op_ld32s_i64:
-        tcg_out_modrm_offset(s, OPC_MOVSLQ, args[0], args[1], args[2]);
+        tcg_out_modrm_offset(s, OPC_MOVSLQ, a0, a1, a2);
         break;
     case INDEX_op_ld_i64:
-        tcg_out_ld(s, TCG_TYPE_I64, args[0], args[1], args[2]);
+        tcg_out_ld(s, TCG_TYPE_I64, a0, a1, a2);
         break;
     case INDEX_op_st_i64:
         if (const_args[0]) {
-            tcg_out_modrm_offset(s, OPC_MOVL_EvIz | P_REXW,
-                                 0, args[1], args[2]);
-            tcg_out32(s, args[0]);
+            tcg_out_modrm_offset(s, OPC_MOVL_EvIz | P_REXW, 0, a1, a2);
+            tcg_out32(s, a0);
         } else {
-            tcg_out_st(s, TCG_TYPE_I64, args[0], args[1], args[2]);
+            tcg_out_st(s, TCG_TYPE_I64, a0, a1, a2);
         }
         break;
 
     case INDEX_op_brcond_i64:
-        tcg_out_brcond64(s, args[2], args[0], args[1], const_args[1],
-                         arg_label(args[3]), 0);
+        tcg_out_brcond64(s, a2, a0, a1, const_args[1], arg_label(args[3]), 0);
         break;
     case INDEX_op_setcond_i64:
-        tcg_out_setcond64(s, args[3], args[0], args[1],
-                          args[2], const_args[2]);
+        tcg_out_setcond64(s, args[3], a0, a1, a2, const_a2);
         break;
     case INDEX_op_movcond_i64:
-        tcg_out_movcond64(s, args[5], args[0], args[1],
-                          args[2], const_args[2], args[3]);
+        tcg_out_movcond64(s, args[5], a0, a1, a2, const_a2, args[3]);
         break;
 
     case INDEX_op_bswap64_i64:
-        tcg_out_bswap64(s, args[0]);
+        tcg_out_bswap64(s, a0);
         break;
     case INDEX_op_extu_i32_i64:
     case INDEX_op_ext32u_i64:
-        tcg_out_ext32u(s, args[0], args[1]);
+        tcg_out_ext32u(s, a0, a1);
         break;
     case INDEX_op_ext_i32_i64:
     case INDEX_op_ext32s_i64:
-        tcg_out_ext32s(s, args[0], args[1]);
+        tcg_out_ext32s(s, a0, a1);
         break;
 #endif
 
     OP_32_64(deposit):
         if (args[3] == 0 && args[4] == 8) {
             /* load bits 0..7 */
-            tcg_out_modrm(s, OPC_MOVB_EvGv | P_REXB_R | P_REXB_RM,
-                          args[2], args[0]);
+            tcg_out_modrm(s, OPC_MOVB_EvGv | P_REXB_R | P_REXB_RM, a2, a0);
         } else if (args[3] == 8 && args[4] == 8) {
             /* load bits 8..15 */
-            tcg_out_modrm(s, OPC_MOVB_EvGv, args[2], args[0] + 4);
+            tcg_out_modrm(s, OPC_MOVB_EvGv, a2, a0 + 4);
         } else if (args[3] == 0 && args[4] == 16) {
             /* load bits 0..15 */
-            tcg_out_modrm(s, OPC_MOVL_EvGv | P_DATA16, args[2], args[0]);
+            tcg_out_modrm(s, OPC_MOVL_EvGv | P_DATA16, a2, a0);
         } else {
             tcg_abort();
         }
         break;
 
     case INDEX_op_extract_i64:
-        if (args[2] + args[3] == 32) {
+        if (a2 + args[3] == 32) {
             /* This is a 32-bit zero-extending right shift.  */
-            tcg_out_mov(s, TCG_TYPE_I32, args[0], args[1]);
-            tcg_out_shifti(s, SHIFT_SHR, args[0], args[2]);
+            tcg_out_mov(s, TCG_TYPE_I32, a0, a1);
+            tcg_out_shifti(s, SHIFT_SHR, a0, a2);
             break;
         }
         /* FALLTHRU */
@@ -2151,12 +2144,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         /* On the off-chance that we can use the high-byte registers.
            Otherwise we emit the same ext16 + shift pattern that we
            would have gotten from the normal tcg-op.c expansion.  */
-        tcg_debug_assert(args[2] == 8 && args[3] == 8);
-        if (args[1] < 4 && args[0] < 8) {
-            tcg_out_modrm(s, OPC_MOVZBL, args[0], args[1] + 4);
+        tcg_debug_assert(a2 == 8 && args[3] == 8);
+        if (a1 < 4 && a0 < 8) {
+            tcg_out_modrm(s, OPC_MOVZBL, a0, a1 + 4);
         } else {
-            tcg_out_ext16u(s, args[0], args[1]);
-            tcg_out_shifti(s, SHIFT_SHR, args[0], 8);
+            tcg_out_ext16u(s, a0, a1);
+            tcg_out_shifti(s, SHIFT_SHR, a0, 8);
         }
         break;
 
@@ -2164,17 +2157,17 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         /* We don't implement sextract_i64, as we cannot sign-extend to
            64-bits without using the REX prefix that explicitly excludes
            access to the high-byte registers.  */
-        tcg_debug_assert(args[2] == 8 && args[3] == 8);
-        if (args[1] < 4 && args[0] < 8) {
-            tcg_out_modrm(s, OPC_MOVSBL, args[0], args[1] + 4);
+        tcg_debug_assert(a2 == 8 && args[3] == 8);
+        if (a1 < 4 && a0 < 8) {
+            tcg_out_modrm(s, OPC_MOVSBL, a0, a1 + 4);
         } else {
-            tcg_out_ext16s(s, args[0], args[1], 0);
-            tcg_out_shifti(s, SHIFT_SAR, args[0], 8);
+            tcg_out_ext16s(s, a0, a1, 0);
+            tcg_out_shifti(s, SHIFT_SAR, a0, 8);
         }
         break;
 
     case INDEX_op_mb:
-        tcg_out_mb(s, args[0]);
+        tcg_out_mb(s, a0);
         break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
-- 
2.7.4


* [Qemu-devel] [PATCH v4 47/64] tcg/i386: Allow bmi2 shiftx to have non-matching operands
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (45 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 46/64] tcg/i386: Hoist common arguments in tcg_out_op Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 48/64] tcg/i386: Handle ctz and clz opcodes Richard Henderson
                   ` (18 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Previously we could not have different constraints for different ISA levels,
which prevented us from eliding the matching constraint for shifts.

We do now have to make sure that the operands match for constant shifts.
We can also handle some small left shifts via lea.
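
To illustrate the lea case: a constant left shift by 1..3 can be
expressed as an address computation, which is what lets the result
land in a register other than the source.  A minimal sketch in plain
C, assuming a simple model of lea; the names lea_model and
shl_via_lea are hypothetical and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Model of "lea (base,index,scale), dest": dest = base + index * scale.
   Illustrative only; this is not a QEMU function.  */
static uint64_t lea_model(uint64_t base, uint64_t index, unsigned scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale;
}

/* Left shift by 1..3 expressed through the lea model.  */
static uint64_t shl_via_lea(uint64_t x, unsigned n)
{
    assert(n >= 1 && n <= 3);
    if (n == 1) {
        /* shl $1 -> lea (x,x): base plus index, scale 1.  */
        return lea_model(x, x, 1);
    }
    /* shl $2 or $3 -> lea 0(,x,4) or 0(,x,8): no base, scaled index.  */
    return lea_model(0, x, 1u << n);
}
```

For example, shl_via_lea(5, 3) computes 5 * 8.  The patch only takes
this path when the shift count is constant and the output register
differs from the input.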

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.inc.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 83572ac..651d96c 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -179,7 +179,6 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         tcg_regset_set_reg(ct->u.regs, TCG_REG_EBX);
         break;
     case 'c':
-    case_c:
         ct->ct |= TCG_CT_REG;
         tcg_regset_set_reg(ct->u.regs, TCG_REG_ECX);
         break;
@@ -208,7 +207,6 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
         tcg_regset_set32(ct->u.regs, 0, 0xf);
         break;
     case 'r':
-    case_r:
         ct->ct |= TCG_CT_REG;
         if (TCG_TARGET_REG_BITS == 64) {
             tcg_regset_set32(ct->u.regs, 0, 0xffff);
@@ -216,13 +214,6 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
             tcg_regset_set32(ct->u.regs, 0, 0xff);
         }
         break;
-    case 'C':
-        /* With SHRX et al, we need not use ECX as shift count register.  */
-        if (have_bmi2) {
-            goto case_r;
-        } else {
-            goto case_c;
-        }
 
         /* qemu_ld/st address constraint */
     case 'L':
@@ -1959,6 +1950,17 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     OP_32_64(shl):
+        /* For small constant 3-operand shift, use LEA.  */
+        if (const_a2 && a0 != a1 && (a2 - 1) < 3) {
+            if (a2 - 1 == 0) {
+                /* shl $1,a1,a0 -> lea (a1,a1),a0 */
+                tcg_out_modrm_sib_offset(s, OPC_LEA + rexw, a0, a1, a1, 0, 0);
+            } else {
+                /* shl $n,a1,a0 -> lea 0(,a1,1<<n),a0 */
+                tcg_out_modrm_sib_offset(s, OPC_LEA + rexw, a0, -1, a1, a2, 0);
+            }
+            break;
+        }
         c = SHIFT_SHL;
         vexop = OPC_SHLX;
         goto gen_shift_maybe_vex;
@@ -1977,9 +1979,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         c = SHIFT_ROR;
         goto gen_shift;
     gen_shift_maybe_vex:
-        if (have_bmi2 && !const_a2) {
-            tcg_out_vex_modrm(s, vexop + rexw, a0, a2, a1);
-            break;
+        if (have_bmi2) {
+            if (!const_a2) {
+                tcg_out_vex_modrm(s, vexop + rexw, a0, a2, a1);
+                break;
+            }
+            tcg_out_mov(s, rexw ? TCG_TYPE_I64 : TCG_TYPE_I32, a0, a1);
         }
         /* FALLTHRU */
     gen_shift:
@@ -2190,9 +2195,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef r_q = { .args_ct_str = { "r", "q" } };
     static const TCGTargetOpDef r_re = { .args_ct_str = { "r", "re" } };
     static const TCGTargetOpDef r_0 = { .args_ct_str = { "r", "0" } };
+    static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } };
     static const TCGTargetOpDef r_r_re = { .args_ct_str = { "r", "r", "re" } };
     static const TCGTargetOpDef r_0_re = { .args_ct_str = { "r", "0", "re" } };
-    static const TCGTargetOpDef r_0_Ci = { .args_ct_str = { "r", "0", "Ci" } };
     static const TCGTargetOpDef r_0_ci = { .args_ct_str = { "r", "0", "ci" } };
     static const TCGTargetOpDef r_L = { .args_ct_str = { "r", "L" } };
     static const TCGTargetOpDef L_L = { .args_ct_str = { "L", "L" } };
@@ -2266,7 +2271,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_shr_i64:
     case INDEX_op_sar_i32:
     case INDEX_op_sar_i64:
-        return &r_0_Ci;
+        return have_bmi2 ? &r_r_ri : &r_0_ci;
     case INDEX_op_rotl_i32:
     case INDEX_op_rotl_i64:
     case INDEX_op_rotr_i32:
-- 
2.7.4


* [Qemu-devel] [PATCH v4 48/64] tcg/i386: Handle ctz and clz opcodes
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (46 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 47/64] tcg/i386: Allow bmi2 shiftx to have non-matching operands Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 49/64] tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR Richard Henderson
                   ` (17 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.h     |   8 +--
 tcg/i386/tcg-target.inc.c | 125 ++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 120 insertions(+), 13 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index f2d9955..8fff287 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -93,8 +93,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_clz_i32          0
-#define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_clz_i32          1
+#define TCG_TARGET_HAS_ctz_i32          1
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     1
@@ -127,8 +127,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_eqv_i64          0
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
-#define TCG_TARGET_HAS_clz_i64          0
-#define TCG_TARGET_HAS_ctz_i64          0
+#define TCG_TARGET_HAS_clz_i64          1
+#define TCG_TARGET_HAS_ctz_i64          1
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 651d96c..3ed8cd1 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -92,6 +92,7 @@ static const int tcg_target_call_oarg_regs[] = {
 #define TCG_CT_CONST_S32 0x100
 #define TCG_CT_CONST_U32 0x200
 #define TCG_CT_CONST_I32 0x400
+#define TCG_CT_CONST_WSZ 0x800
 
 /* Registers used with L constraint, which are the first argument 
    registers on x86_64, and two random call clobbered registers on
@@ -138,6 +139,11 @@ static bool have_bmi2;
 #else
 # define have_bmi2 0
 #endif
+#if defined(CONFIG_CPUID_H) && defined(bit_LZCNT)
+static bool have_lzcnt;
+#else
+# define have_lzcnt 0
+#endif
 
 static tcg_insn_unit *tb_ret_addr;
 
@@ -214,6 +220,10 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
             tcg_regset_set32(ct->u.regs, 0, 0xff);
         }
         break;
+    case 'W':
+        /* With TZCNT/LZCNT, we can have operand-size as an input.  */
+        ct->ct |= TCG_CT_CONST_WSZ;
+        break;
 
         /* qemu_ld/st address constraint */
     case 'L':
@@ -260,6 +270,9 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
     if ((ct & TCG_CT_CONST_I32) && ~val == (int32_t)~val) {
         return 1;
     }
+    if ((ct & TCG_CT_CONST_WSZ) && val == (type == TCG_TYPE_I32 ? 32 : 64)) {
+        return 1;
+    }
     return 0;
 }
 
@@ -293,6 +306,8 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_ARITH_GvEv	(0x03)		/* ... plus (ARITH_FOO << 3) */
 #define OPC_ANDN        (0xf2 | P_EXT38)
 #define OPC_ADD_GvEv	(OPC_ARITH_GvEv | (ARITH_ADD << 3))
+#define OPC_BSF         (0xbc | P_EXT)
+#define OPC_BSR         (0xbd | P_EXT)
 #define OPC_BSWAP	(0xc8 | P_EXT)
 #define OPC_CALL_Jz	(0xe8)
 #define OPC_CMOVCC      (0x40 | P_EXT)  /* ... plus condition code */
@@ -307,6 +322,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_JMP_long	(0xe9)
 #define OPC_JMP_short	(0xeb)
 #define OPC_LEA         (0x8d)
+#define OPC_LZCNT       (0xbd | P_EXT | P_SIMDF3)
 #define OPC_MOVB_EvGv	(0x88)		/* stores, more or less */
 #define OPC_MOVL_EvGv	(0x89)		/* stores, more or less */
 #define OPC_MOVL_GvEv	(0x8b)		/* loads, more or less */
@@ -333,6 +349,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_SHLX        (0xf7 | P_EXT38 | P_DATA16)
 #define OPC_SHRX        (0xf7 | P_EXT38 | P_SIMDF2)
 #define OPC_TESTL	(0x85)
+#define OPC_TZCNT       (0xbc | P_EXT | P_SIMDF3)
 #define OPC_XCHG_ax_r32	(0x90)
 
 #define OPC_GRP3_Ev	(0xf7)
@@ -418,6 +435,11 @@ static void tcg_out_opc(TCGContext *s, int opc, int r, int rm, int x)
     if (opc & P_ADDR32) {
         tcg_out8(s, 0x67);
     }
+    if (opc & P_SIMDF3) {
+        tcg_out8(s, 0xf3);
+    } else if (opc & P_SIMDF2) {
+        tcg_out8(s, 0xf2);
+    }
 
     rex = 0;
     rex |= (opc & P_REXW) ? 0x8 : 0x0;  /* REX.W */
@@ -452,6 +474,11 @@ static void tcg_out_opc(TCGContext *s, int opc)
     if (opc & P_DATA16) {
         tcg_out8(s, 0x66);
     }
+    if (opc & P_SIMDF3) {
+        tcg_out8(s, 0xf3);
+    } else if (opc & P_SIMDF2) {
+        tcg_out8(s, 0xf2);
+    }
     if (opc & (P_EXT | P_EXT38)) {
         tcg_out8(s, 0x0f);
         if (opc & P_EXT38) {
@@ -1080,13 +1107,11 @@ static void tcg_out_setcond2(TCGContext *s, const TCGArg *args,
 }
 #endif
 
-static void tcg_out_movcond32(TCGContext *s, TCGCond cond, TCGArg dest,
-                              TCGArg c1, TCGArg c2, int const_c2,
-                              TCGArg v1)
+static void tcg_out_cmov(TCGContext *s, TCGCond cond, int rexw,
+                         TCGReg dest, TCGReg v1)
 {
-    tcg_out_cmp(s, c1, c2, const_c2, 0);
     if (have_cmov) {
-        tcg_out_modrm(s, OPC_CMOVCC | tcg_cond_to_jcc[cond], dest, v1);
+        tcg_out_modrm(s, OPC_CMOVCC | tcg_cond_to_jcc[cond] | rexw, dest, v1);
     } else {
         TCGLabel *over = gen_new_label();
         tcg_out_jxx(s, tcg_cond_to_jcc[tcg_invert_cond(cond)], over, 1);
@@ -1095,16 +1120,64 @@ static void tcg_out_movcond32(TCGContext *s, TCGCond cond, TCGArg dest,
     }
 }
 
+static void tcg_out_movcond32(TCGContext *s, TCGCond cond, TCGReg dest,
+                              TCGReg c1, TCGArg c2, int const_c2,
+                              TCGReg v1)
+{
+    tcg_out_cmp(s, c1, c2, const_c2, 0);
+    tcg_out_cmov(s, cond, 0, dest, v1);
+}
+
 #if TCG_TARGET_REG_BITS == 64
-static void tcg_out_movcond64(TCGContext *s, TCGCond cond, TCGArg dest,
-                              TCGArg c1, TCGArg c2, int const_c2,
-                              TCGArg v1)
+static void tcg_out_movcond64(TCGContext *s, TCGCond cond, TCGReg dest,
+                              TCGReg c1, TCGArg c2, int const_c2,
+                              TCGReg v1)
 {
     tcg_out_cmp(s, c1, c2, const_c2, P_REXW);
-    tcg_out_modrm(s, OPC_CMOVCC | tcg_cond_to_jcc[cond] | P_REXW, dest, v1);
+    tcg_out_cmov(s, cond, P_REXW, dest, v1);
 }
 #endif
 
+static void tcg_out_ctz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
+                        TCGArg arg2, bool const_a2)
+{
+    if (const_a2) {
+        tcg_debug_assert(have_bmi1);
+        tcg_debug_assert(arg2 == (rexw ? 64 : 32));
+        tcg_out_modrm(s, OPC_TZCNT + rexw, dest, arg1);
+    } else {
+        tcg_debug_assert(dest != arg2);
+        tcg_out_modrm(s, OPC_BSF + rexw, dest, arg1);
+        tcg_out_cmov(s, TCG_COND_EQ, rexw, dest, arg2);
+    }
+}
+
+static void tcg_out_clz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
+                        TCGArg arg2, bool const_a2)
+{
+    if (have_lzcnt) {
+        tcg_out_modrm(s, OPC_LZCNT + rexw, dest, arg1);
+        if (const_a2) {
+            tcg_debug_assert(arg2 == (rexw ? 64 : 32));
+        } else {
+            tcg_debug_assert(dest != arg2);
+            tcg_out_cmov(s, TCG_COND_LTU, rexw, dest, arg2);
+        }
+    } else {
+        tcg_debug_assert(!const_a2);
+        tcg_debug_assert(dest != arg1);
+        tcg_debug_assert(dest != arg2);
+
+        /* Recall that the output of BSR is the index not the count.  */
+        tcg_out_modrm(s, OPC_BSR + rexw, dest, arg1);
+        tgen_arithi(s, ARITH_XOR + rexw, dest, rexw ? 63 : 31, 0);
+
+        /* Since we have destroyed the flags from BSR, we have to re-test.  */
+        tcg_out_cmp(s, arg1, 0, 1, rexw);
+        tcg_out_cmov(s, TCG_COND_EQ, rexw, dest, arg2);
+    }
+}
+
 static void tcg_out_branch(TCGContext *s, int call, tcg_insn_unit *dest)
 {
     intptr_t disp = tcg_pcrel_diff(s, dest) - 5;
@@ -1995,6 +2068,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    OP_32_64(ctz):
+        tcg_out_ctz(s, rexw, args[0], args[1], args[2], const_args[2]);
+        break;
+    OP_32_64(clz):
+        tcg_out_clz(s, rexw, args[0], args[1], args[2], const_args[2]);
+        break;
+
     case INDEX_op_brcond_i32:
         tcg_out_brcond32(s, a2, a0, a1, const_args[1], arg_label(args[3]), 0);
         break;
@@ -2359,6 +2439,24 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
                 = { .args_ct_str = { "r", "r", "0", "1", "re", "re" } };
             return &arith2;
         }
+    case INDEX_op_ctz_i32:
+    case INDEX_op_ctz_i64:
+        {
+            static const TCGTargetOpDef ctz[2] = {
+                { .args_ct_str = { "&r", "r", "r" } },
+                { .args_ct_str = { "&r", "r", "rW" } },
+            };
+            return &ctz[have_bmi1];
+        }
+    case INDEX_op_clz_i32:
+    case INDEX_op_clz_i64:
+        {
+            static const TCGTargetOpDef clz[2] = {
+                { .args_ct_str = { "&r", "r", "r" } },
+                { .args_ct_str = { "&r", "r", "rW" } },
+            };
+            return &clz[have_lzcnt];
+        }
 
     case INDEX_op_qemu_ld_i32:
         return TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? &r_L : &r_L_L;
@@ -2509,6 +2607,15 @@ static void tcg_target_init(TCGContext *s)
     }
 #endif
 
+#ifndef have_lzcnt
+    max = __get_cpuid_max(0x80000000, 0);
+    if (max >= 1) {
+        __cpuid(0x80000001, a, b, c, d);
+        /* LZCNT was introduced with AMD Barcelona and Intel Haswell CPUs.  */
+        have_lzcnt = (c & bit_LZCNT) != 0;
+    }
+#endif
+
     if (TCG_TARGET_REG_BITS == 64) {
         tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffff);
         tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffff);
-- 
2.7.4


* [Qemu-devel] [PATCH v4 49/64] tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (47 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 48/64] tcg/i386: Handle ctz and clz opcodes Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 50/64] tcg: Add helpers for clrsb Richard Henderson
                   ` (16 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

The ISA manual documents that the output is undefined if the input is zero.

However, we document in target-i386 that the behavior of real silicon
is to preserve the contents of the output register.  We also mention
that there are real applications that depend on this.  That this is
baked into silicon is mentioned as a potential cause for some false
sharing behaviour wrt lzcnt/tzcnt.

Taking advantage of this allows us to save 2 insns in the normal case,
and 4 insns for i686 emulating a 64-bit clz.
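
As a rough model of the BSR-based clz sequence: the destination is
preloaded with the zero-input result XOR (SIZE-1), BSR overwrites it
only for a non-zero input, and a final XOR with (SIZE-1) converts the
bit index into a count.  The sketch below is plain C, not the emitted
code; bsr32_model and clz32_via_bsr are hypothetical names, and the
"unchanged on zero" behaviour is the assumption this patch relies on:

```c
#include <assert.h>
#include <stdint.h>

/* Model of 32-bit BSR, assuming a zero source leaves the destination
   unchanged (observed hardware behaviour, not an ISA guarantee).  */
static uint32_t bsr32_model(uint32_t dest, uint32_t src)
{
    uint32_t index = 31;

    if (src == 0) {
        return dest;
    }
    while (!(src >> index)) {
        index--;
    }
    return index;
}

/* clz with a caller-supplied result for a zero input.  */
static uint32_t clz32_via_bsr(uint32_t arg, uint32_t zero_val)
{
    const uint32_t rev = 31;
    uint32_t dest = zero_val ^ rev;     /* preload; survives if arg == 0 */
    uint32_t result;

    dest = bsr32_model(dest, arg);      /* bit index if arg != 0 */
    result = dest ^ rev;                /* index -> count, or zero_val */
    assert(arg == 0 || result < 32);
    return result;
}
```

With arg == 0 the two XORs cancel and zero_val comes back unchanged,
which is why no compare and cmov are needed afterwards.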

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.inc.c | 35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 3ed8cd1..3650340 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1146,9 +1146,12 @@ static void tcg_out_ctz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
         tcg_debug_assert(arg2 == (rexw ? 64 : 32));
         tcg_out_modrm(s, OPC_TZCNT + rexw, dest, arg1);
     } else {
-        tcg_debug_assert(dest != arg2);
+        /* ??? The manual says that the output is undefined when the
+           input is zero, but real hardware leaves it unchanged.  As
+           noted in target-i386/translate.c, real programs depend on
+           this -- now we are one more of those.  */
+        tcg_debug_assert(dest == arg2);
         tcg_out_modrm(s, OPC_BSF + rexw, dest, arg1);
-        tcg_out_cmov(s, TCG_COND_EQ, rexw, dest, arg2);
     }
 }
 
@@ -1161,20 +1164,26 @@ static void tcg_out_clz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
             tcg_debug_assert(arg2 == (rexw ? 64 : 32));
         } else {
             tcg_debug_assert(dest != arg2);
+            /* LZCNT sets C if the input was zero.  */
             tcg_out_cmov(s, TCG_COND_LTU, rexw, dest, arg2);
         }
     } else {
-        tcg_debug_assert(!const_a2);
-        tcg_debug_assert(dest != arg1);
-        tcg_debug_assert(dest != arg2);
+        TCGType type = rexw ? TCG_TYPE_I64: TCG_TYPE_I32;
+        TCGArg rev = rexw ? 63 : 31;
 
-        /* Recall that the output of BSR is the index not the count.  */
+        /* Recall that the output of BSR is the index not the count.
+           Therefore we must adjust the result by ^ (SIZE-1).  In some
+           cases below, we prefer an extra XOR to a JMP.  */
+        /* ??? See the comment in tcg_out_ctz re BSF.  */
+        if (const_a2) {
+            tcg_debug_assert(dest != arg1);
+            tcg_out_movi(s, type, dest, arg2 ^ rev);
+        } else {
+            tcg_debug_assert(dest == arg2);
+            tgen_arithi(s, ARITH_XOR + rexw, dest, rev, 0);
+        }
         tcg_out_modrm(s, OPC_BSR + rexw, dest, arg1);
-        tgen_arithi(s, ARITH_XOR + rexw, dest, rexw ? 63 : 31, 0);
-
-        /* Since we have destroyed the flags from BSR, we have to re-test.  */
-        tcg_out_cmp(s, arg1, 0, 1, rexw);
-        tcg_out_cmov(s, TCG_COND_EQ, rexw, dest, arg2);
+        tgen_arithi(s, ARITH_XOR + rexw, dest, rev, 0);
     }
 }
 
@@ -2443,7 +2452,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_ctz_i64:
         {
             static const TCGTargetOpDef ctz[2] = {
-                { .args_ct_str = { "&r", "r", "r" } },
+                { .args_ct_str = { "r", "r", "0" } },
                 { .args_ct_str = { "&r", "r", "rW" } },
             };
             return &ctz[have_bmi1];
@@ -2452,7 +2461,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_clz_i64:
         {
             static const TCGTargetOpDef clz[2] = {
-                { .args_ct_str = { "&r", "r", "r" } },
+                { .args_ct_str = { "&r", "r", "0i" } },
                 { .args_ct_str = { "&r", "r", "rW" } },
             };
             return &clz[have_lzcnt];
-- 
2.7.4


* [Qemu-devel] [PATCH v4 50/64] tcg: Add helpers for clrsb
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (48 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 49/64] tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-09  9:51   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 51/64] target-arm: Use clrsb helper Richard Henderson
                   ` (15 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

The number of actual invocations does not warrant an opcode,
or backend support for generating it.  But at least we can
eliminate redundant helpers.
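
For reference, the inline expansion used when clz is available can be
modelled in plain C as below.  This is only a sketch of the
sari/xor/clzi/subi sequence; clz32_or and clrsb32_sketch are
hypothetical names, and the sign mask is computed explicitly to avoid
depending on how the compiler shifts negative values:

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros, returning zero_val for a zero input.  */
static uint32_t clz32_or(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading redundant sign bits of a 32-bit value.  */
static uint32_t clrsb32_sketch(uint32_t arg)
{
    /* Equivalent of sari t, arg, 31: all ones if negative, else zero.  */
    uint32_t sign = (arg & 0x80000000u) ? 0xffffffffu : 0;
    /* XOR with the sign mask turns copies of the sign bit into zeros.  */
    uint32_t t = clz32_or(sign ^ arg, 32);

    /* The top bit is always clear after the XOR, so t is at least 1.  */
    assert(t >= 1 && t <= 32);
    /* The sign bit itself does not count as redundant.  */
    return t - 1;
}
```

Both 0 and 0xffffffff give 31, since every bit below the sign bit is
a copy of it.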

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg-runtime.c     | 10 ++++++++++
 tcg/tcg-op.c      | 28 ++++++++++++++++++++++++++++
 tcg/tcg-op.h      |  4 ++++
 tcg/tcg-runtime.h |  2 ++
 4 files changed, 44 insertions(+)

diff --git a/tcg-runtime.c b/tcg-runtime.c
index eb3bade..c8b98df 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -121,6 +121,16 @@ uint64_t HELPER(ctz_i64)(uint64_t arg, uint64_t zero_val)
     return arg ? ctz64(arg) : zero_val;
 }
 
+uint32_t HELPER(clrsb_i32)(uint32_t arg)
+{
+    return clrsb32(arg);
+}
+
+uint64_t HELPER(clrsb_i64)(uint64_t arg)
+{
+    return clrsb64(arg);
+}
+
 void HELPER(exit_atomic)(CPUArchState *env)
 {
     cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 2b520c1..620e268 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -536,6 +536,20 @@ void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
     tcg_temp_free_i32(t);
 }
 
+void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg)
+{
+    if (TCG_TARGET_HAS_clz_i32) {
+        TCGv_i32 t = tcg_temp_new_i32();
+        tcg_gen_sari_i32(t, arg, 31);
+        tcg_gen_xor_i32(t, t, arg);
+        tcg_gen_clzi_i32(t, t, 32);
+        tcg_gen_subi_i32(ret, t, 1);
+        tcg_temp_free_i32(t);
+    } else {
+        gen_helper_clrsb_i32(ret, arg);
+    }
+}
+
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
 {
     if (TCG_TARGET_HAS_rot_i32) {
@@ -1846,6 +1860,20 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
     }
 }
 
+void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg)
+{
+    if (TCG_TARGET_HAS_clz_i64 || TCG_TARGET_HAS_clz_i32) {
+        TCGv_i64 t = tcg_temp_new_i64();
+        tcg_gen_sari_i64(t, arg, 63);
+        tcg_gen_xor_i64(t, t, arg);
+        tcg_gen_clzi_i64(t, t, 64);
+        tcg_gen_subi_i64(ret, t, 1);
+        tcg_temp_free_i64(t);
+    } else {
+        gen_helper_clrsb_i64(ret, arg);
+    }
+}
+
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
 {
     if (TCG_TARGET_HAS_rot_i64) {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 7a24e84..c2f3db9 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -290,6 +290,7 @@ void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
 void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
+void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
 void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
@@ -477,6 +478,7 @@ void tcg_gen_clz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
 void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
+void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg);
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
 void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
@@ -970,6 +972,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_ctz_tl tcg_gen_ctz_i64
 #define tcg_gen_clzi_tl tcg_gen_clzi_i64
 #define tcg_gen_ctzi_tl tcg_gen_ctzi_i64
+#define tcg_gen_clrsb_tl tcg_gen_clrsb_i64
 #define tcg_gen_rotl_tl tcg_gen_rotl_i64
 #define tcg_gen_rotli_tl tcg_gen_rotli_i64
 #define tcg_gen_rotr_tl tcg_gen_rotr_i64
@@ -1065,6 +1068,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_ctz_tl tcg_gen_ctz_i32
 #define tcg_gen_clzi_tl tcg_gen_clzi_i32
 #define tcg_gen_ctzi_tl tcg_gen_ctzi_i32
+#define tcg_gen_clrsb_tl tcg_gen_clrsb_i32
 #define tcg_gen_rotl_tl tcg_gen_rotl_i32
 #define tcg_gen_rotli_tl tcg_gen_rotli_i32
 #define tcg_gen_rotr_tl tcg_gen_rotr_i32
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index eb1cd76..0d30f1a 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -19,6 +19,8 @@ DEF_HELPER_FLAGS_2(clz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
 DEF_HELPER_FLAGS_2(ctz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
 DEF_HELPER_FLAGS_2(clz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(ctz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+DEF_HELPER_FLAGS_1(clrsb_i32, TCG_CALL_NO_RWG_SE, i32, i32)
+DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
-- 
2.7.4


* [Qemu-devel] [PATCH v4 51/64] target-arm: Use clrsb helper
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (49 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 50/64] tcg: Add helpers for clrsb Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-09  9:52   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 52/64] target-tricore: " Richard Henderson
                   ` (14 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/helper-a64.c    | 10 ----------
 target-arm/helper-a64.h    |  2 --
 target-arm/translate-a64.c |  8 ++++----
 3 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 77999ff..d9df82c 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -54,16 +54,6 @@ int64_t HELPER(sdiv64)(int64_t num, int64_t den)
     return num / den;
 }
 
-uint64_t HELPER(cls64)(uint64_t x)
-{
-    return clrsb64(x);
-}
-
-uint32_t HELPER(cls32)(uint32_t x)
-{
-    return clrsb32(x);
-}
-
 uint64_t HELPER(rbit64)(uint64_t x)
 {
     return revbit64(x);
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index d320f96..6f9eaba 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -18,8 +18,6 @@
  */
 DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
-DEF_HELPER_FLAGS_1(cls64, TCG_CALL_NO_RWG_SE, i64, i64)
-DEF_HELPER_FLAGS_1(cls32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 12621ff..f73d63b 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -3971,11 +3971,11 @@ static void handle_cls(DisasContext *s, unsigned int sf,
     tcg_rn = cpu_reg(s, rn);
 
     if (sf) {
-        gen_helper_cls64(tcg_rd, tcg_rn);
+        tcg_gen_clrsb_i64(tcg_rd, tcg_rn);
     } else {
         TCGv_i32 tcg_tmp32 = tcg_temp_new_i32();
         tcg_gen_extrl_i64_i32(tcg_tmp32, tcg_rn);
-        gen_helper_cls32(tcg_tmp32, tcg_tmp32);
+        tcg_gen_clrsb_i32(tcg_tmp32, tcg_tmp32);
         tcg_gen_extu_i32_i64(tcg_rd, tcg_tmp32);
         tcg_temp_free_i32(tcg_tmp32);
     }
@@ -7592,7 +7592,7 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
         if (u) {
             tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64);
         } else {
-            gen_helper_cls64(tcg_rd, tcg_rn);
+            tcg_gen_clrsb_i64(tcg_rd, tcg_rn);
         }
         break;
     case 0x5: /* NOT */
@@ -10262,7 +10262,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
                     if (u) {
                         tcg_gen_clzi_i32(tcg_res, tcg_op, 32);
                     } else {
-                        gen_helper_cls32(tcg_res, tcg_op);
+                        tcg_gen_clrsb_i32(tcg_res, tcg_op);
                     }
                     break;
                 case 0x7: /* SQABS, SQNEG */
-- 
2.7.4


* [Qemu-devel] [PATCH v4 52/64] target-tricore: Use clrsb helper
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (50 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 51/64] target-arm: Use clrsb helper Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 14:58   ` Bastian Koppelmann
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 53/64] target-xtensa: " Richard Henderson
                   ` (13 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-tricore/helper.h    | 1 -
 target-tricore/op_helper.c | 5 -----
 target-tricore/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target-tricore/helper.h b/target-tricore/helper.h
index 2cf04e1..d215349 100644
--- a/target-tricore/helper.h
+++ b/target-tricore/helper.h
@@ -89,7 +89,6 @@ DEF_HELPER_FLAGS_2(ixmin_u, TCG_CALL_NO_RWG_SE, i64, i64, i32)
 /* count leading ... */
 DEF_HELPER_FLAGS_1(clo_h, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(clz_h, TCG_CALL_NO_RWG_SE, i32, i32)
-DEF_HELPER_FLAGS_1(cls, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(cls_h, TCG_CALL_NO_RWG_SE, i32, i32)
 /* sh */
 DEF_HELPER_FLAGS_2(sh, TCG_CALL_NO_RWG_SE, i32, i32, i32)
diff --git a/target-tricore/op_helper.c b/target-tricore/op_helper.c
index 3731d5e..7af202c 100644
--- a/target-tricore/op_helper.c
+++ b/target-tricore/op_helper.c
@@ -1769,11 +1769,6 @@ uint32_t helper_clz_h(target_ulong r1)
     return ret_hw0 | (ret_hw1 << 16);
 }
 
-uint32_t helper_cls(target_ulong r1)
-{
-    return clrsb32(r1);
-}
-
 uint32_t helper_cls_h(target_ulong r1)
 {
     uint32_t ret_hw0 = extract32(r1, 0, 16);
diff --git a/target-tricore/translate.c b/target-tricore/translate.c
index 69cdfb9..41b1d27 100644
--- a/target-tricore/translate.c
+++ b/target-tricore/translate.c
@@ -6374,7 +6374,7 @@ static void decode_rr_logical_shift(CPUTriCoreState *env, DisasContext *ctx)
         gen_helper_clo_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
         break;
     case OPC2_32_RR_CLS:
-        gen_helper_cls(cpu_gpr_d[r3], cpu_gpr_d[r1]);
+        tcg_gen_clrsb_tl(cpu_gpr_d[r3], cpu_gpr_d[r1]);
         break;
     case OPC2_32_RR_CLS_H:
         gen_helper_cls_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
-- 
2.7.4


* [Qemu-devel] [PATCH v4 53/64] target-xtensa: Use clrsb helper
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (51 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 52/64] target-tricore: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 54/64] tcg: Add opcode for ctpop Richard Henderson
                   ` (12 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-xtensa/translate.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)
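
Note for readers (illustrative, not part of the patch): the open-coded
sequence removed below counts the redundant sign bits of a 32-bit value,
which is what tcg_gen_clrsb_i32 is expected to produce. A minimal C
sketch of that computation, following the two comments in the removed
lines, is given here. The names clz32_ref and clrsb32_ref are
hypothetical and are not QEMU functions.

```c
#include <stdint.h>

/* Hypothetical helper: count leading zeros; returns 32 for a zero input. */
static unsigned clz32_ref(uint32_t v)
{
    unsigned n = 0;

    if (v == 0) {
        return 32;
    }
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical reference: number of leading redundant sign bits. */
static unsigned clrsb32_ref(uint32_t v)
{
    /* if (v & 0x80000000) v = ~v; */
    if (v & 0x80000000u) {
        v = ~v;
    }
    /* r = (v ? clz(v) : 32) - 1; */
    return clz32_ref(v) - 1;
}
```

Both 0 and 0xffffffff give 31 under this definition, since every bit
below the sign bit repeats it.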

diff --git a/target-xtensa/translate.c b/target-xtensa/translate.c
index 5c719a4..5a93705 100644
--- a/target-xtensa/translate.c
+++ b/target-xtensa/translate.c
@@ -1372,16 +1372,7 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
                 case 14: /*NSAu*/
                     HAS_OPTION(XTENSA_OPTION_MISC_OP_NSA);
                     if (gen_window_check2(dc, RRR_S, RRR_T)) {
-                        TCGv_i32 t0 = tcg_temp_new_i32();
-
-                        /* if (v & 0x80000000) v = ~v; */
-                        tcg_gen_sari_i32(t0, cpu_R[RRR_S], 31);
-                        tcg_gen_xor_i32(t0, t0, cpu_R[RRR_S]);
-
-                        /* r = (v ? clz(v) : 32) - 1; */
-                        tcg_gen_clzi_i32(t0, t0, 32);
-                        tcg_gen_subi_i32(cpu_R[RRR_T], t0, 1);
-                        tcg_temp_free_i32(t0);
+                        tcg_gen_clrsb_i32(cpu_R[RRR_T], cpu_R[RRR_S]);
                     }
                     break;
 
-- 
2.7.4


* [Qemu-devel] [PATCH v4 54/64] tcg: Add opcode for ctpop
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (52 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 53/64] target-xtensa: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-09  9:57   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 55/64] target-alpha: Use ctpop helper Richard Henderson
                   ` (11 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

The number of actual invocations of ctpop itself does not warrant
an opcode, but it is very helpful for POWER7 to use in generating
an expansion for ctz.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg-runtime.c            | 10 ++++++++++
 tcg/aarch64/tcg-target.h |  2 ++
 tcg/arm/tcg-target.h     |  1 +
 tcg/i386/tcg-target.h    |  2 ++
 tcg/ia64/tcg-target.h    |  2 ++
 tcg/mips/tcg-target.h    |  1 +
 tcg/optimize.c           | 14 ++++++++++++++
 tcg/ppc/tcg-target.h     |  2 ++
 tcg/s390/tcg-target.h    |  2 ++
 tcg/sparc/tcg-target.h   |  2 ++
 tcg/tcg-op.c             | 29 +++++++++++++++++++++++++++++
 tcg/tcg-op.h             |  4 ++++
 tcg/tcg-opc.h            |  2 ++
 tcg/tcg-runtime.h        |  2 ++
 tcg/tcg.h                |  1 +
 tcg/tci/tcg-target.h     |  2 ++
 16 files changed, 78 insertions(+)
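
Note for readers (illustrative, not part of the patch): the commit
message says ctpop helps when expanding ctz. One well-known identity
that allows this is ctz(x) = ctpop(~x & (x - 1)). Whether the ppc
backend emits exactly this form is not shown in this patch, so the
following is only a sketch. The names ctpop32_ref and ctz32_via_ctpop
are made up for the example.

```c
#include <stdint.h>

/* Hypothetical reference popcount: clears one set bit per iteration. */
static unsigned ctpop32_ref(uint32_t v)
{
    unsigned n = 0;

    while (v) {
        v &= v - 1;     /* drop the lowest set bit */
        n++;
    }
    return n;
}

/* Count trailing zeros using only a population count. */
static unsigned ctz32_via_ctpop(uint32_t x)
{
    /*
     * ~x & (x - 1) has ones in exactly the bit positions below the
     * lowest set bit of x.  For x == 0 it is all ones.
     */
    return ctpop32_ref(~x & (x - 1));
}
```

With this formulation a zero input yields 32, the operand width,
without a separate test.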

diff --git a/tcg-runtime.c b/tcg-runtime.c
index c8b98df..4c60c96 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -131,6 +131,16 @@ uint64_t HELPER(clrsb_i64)(uint64_t arg)
     return clrsb64(arg);
 }
 
+uint32_t HELPER(ctpop_i32)(uint32_t arg)
+{
+    return ctpop32(arg);
+}
+
+uint64_t HELPER(ctpop_i64)(uint64_t arg)
+{
+    return ctpop64(arg);
+}
+
 void HELPER(exit_atomic)(CPUArchState *env)
 {
     cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 9d6b00f..1a5ea23 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -64,6 +64,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_clz_i32          1
 #define TCG_TARGET_HAS_ctz_i32          1
+#define TCG_TARGET_HAS_ctpop_i32        0
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     1
@@ -98,6 +99,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_clz_i64          1
 #define TCG_TARGET_HAS_ctz_i64          1
+#define TCG_TARGET_HAS_ctpop_i64        0
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     1
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 4cb94dc..09a19c6 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -112,6 +112,7 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_clz_i32          use_armv5t_instructions
 #define TCG_TARGET_HAS_ctz_i32          use_armv7_instructions
+#define TCG_TARGET_HAS_ctpop_i32        0
 #define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
 #define TCG_TARGET_HAS_extract_i32      use_armv7_instructions
 #define TCG_TARGET_HAS_sextract_i32     use_armv7_instructions
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 8fff287..b8f73f5 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -95,6 +95,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_clz_i32          1
 #define TCG_TARGET_HAS_ctz_i32          1
+#define TCG_TARGET_HAS_ctpop_i32        0
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     1
@@ -129,6 +130,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_clz_i64          1
 #define TCG_TARGET_HAS_ctz_i64          1
+#define TCG_TARGET_HAS_ctpop_i64        0
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 9a829ae..42aea03 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -144,6 +144,8 @@ typedef enum {
 #define TCG_TARGET_HAS_clz_i64          0
 #define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_ctz_i64          0
+#define TCG_TARGET_HAS_ctpop_i32        0
+#define TCG_TARGET_HAS_ctpop_i64        0
 #define TCG_TARGET_HAS_nor_i64          1
 #define TCG_TARGET_HAS_orc_i32          1
 #define TCG_TARGET_HAS_orc_i64          1
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 0526018..aa7c2b2 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -130,6 +130,7 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_rot_i32          use_mips32r2_instructions
 #define TCG_TARGET_HAS_clz_i32          use_mips32r2_instructions
 #define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_ctpop_i32        0
 
 /* optional instructions automatically implemented */
 #define TCG_TARGET_HAS_neg_i32          0 /* sub  rd, zero, rt   */
diff --git a/tcg/optimize.c b/tcg/optimize.c
index e7ecce4..adfc56c 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -308,6 +308,12 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     case INDEX_op_ctz_i64:
         return x ? ctz64(x) : y;
 
+    case INDEX_op_ctpop_i32:
+        return ctpop32(x);
+
+    case INDEX_op_ctpop_i64:
+        return ctpop64(x);
+
     CASE_OP_32_64(ext8s):
         return (int8_t)x;
 
@@ -918,6 +924,13 @@ void tcg_optimize(TCGContext *s)
             mask = temps[args[2]].mask | 63;
             break;
 
+        case INDEX_op_ctpop_i32:
+            mask = 32 | 31;
+            break;
+        case INDEX_op_ctpop_i64:
+            mask = 64 | 63;
+            break;
+
         CASE_OP_32_64(setcond):
         case INDEX_op_setcond2_i32:
             mask = 1;
@@ -1031,6 +1044,7 @@ void tcg_optimize(TCGContext *s)
         CASE_OP_32_64(ext8u):
         CASE_OP_32_64(ext16s):
         CASE_OP_32_64(ext16u):
+        CASE_OP_32_64(ctpop):
         case INDEX_op_ext32s_i64:
         case INDEX_op_ext32u_i64:
         case INDEX_op_ext_i32_i64:
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index c798c9c..57e66cf 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -72,6 +72,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_nor_i32          1
 #define TCG_TARGET_HAS_clz_i32          1
 #define TCG_TARGET_HAS_ctz_i32          have_isa_3_00
+#define TCG_TARGET_HAS_ctpop_i32        0
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     0
@@ -107,6 +108,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_nor_i64          1
 #define TCG_TARGET_HAS_clz_i64          1
 #define TCG_TARGET_HAS_ctz_i64          have_isa_3_00
+#define TCG_TARGET_HAS_ctpop_i64        0
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 22500ba..cbdd2a6 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -79,6 +79,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_nor_i32        0
 #define TCG_TARGET_HAS_clz_i32        0
 #define TCG_TARGET_HAS_ctz_i32        0
+#define TCG_TARGET_HAS_ctpop_i32      0
 #define TCG_TARGET_HAS_deposit_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_extract_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_sextract_i32   0
@@ -112,6 +113,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_nor_i64        0
 #define TCG_TARGET_HAS_clz_i64        (s390_facilities & FACILITY_EXT_IMM)
 #define TCG_TARGET_HAS_ctz_i64        0
+#define TCG_TARGET_HAS_ctpop_i64      0
 #define TCG_TARGET_HAS_deposit_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_extract_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_sextract_i64   0
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 340837a..b8b74f96f 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -112,6 +112,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_clz_i32          0
 #define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_ctpop_i32        0
 #define TCG_TARGET_HAS_deposit_i32      0
 #define TCG_TARGET_HAS_extract_i32      0
 #define TCG_TARGET_HAS_sextract_i32     0
@@ -146,6 +147,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_clz_i64          0
 #define TCG_TARGET_HAS_ctz_i64          0
+#define TCG_TARGET_HAS_ctpop_i64        0
 #define TCG_TARGET_HAS_deposit_i64      0
 #define TCG_TARGET_HAS_extract_i64      0
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 620e268..6f4b1b6 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -550,6 +550,21 @@ void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg)
     }
 }
 
+void tcg_gen_ctpop_i32(TCGv_i32 ret, TCGv_i32 arg1)
+{
+    if (TCG_TARGET_HAS_ctpop_i32) {
+        tcg_gen_op2_i32(INDEX_op_ctpop_i32, ret, arg1);
+    } else if (TCG_TARGET_HAS_ctpop_i64) {
+        TCGv_i64 t = tcg_temp_new_i64();
+        tcg_gen_extu_i32_i64(t, arg1);
+        tcg_gen_ctpop_i64(t, t);
+        tcg_gen_extrl_i64_i32(ret, t);
+        tcg_temp_free_i64(t);
+    } else {
+        gen_helper_ctpop_i32(ret, arg1);
+    }
+}
+
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
 {
     if (TCG_TARGET_HAS_rot_i32) {
@@ -1874,6 +1889,20 @@ void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg)
     }
 }
 
+void tcg_gen_ctpop_i64(TCGv_i64 ret, TCGv_i64 arg1)
+{
+    if (TCG_TARGET_HAS_ctpop_i64) {
+        tcg_gen_op2_i64(INDEX_op_ctpop_i64, ret, arg1);
+    } else if (TCG_TARGET_REG_BITS == 32 && TCG_TARGET_HAS_ctpop_i32) {
+        tcg_gen_ctpop_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1));
+        tcg_gen_ctpop_i32(TCGV_LOW(ret), TCGV_LOW(arg1));
+        tcg_gen_add_i32(TCGV_LOW(ret), TCGV_LOW(ret), TCGV_HIGH(ret));
+        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+    } else {
+        gen_helper_ctpop_i64(ret, arg1);
+    }
+}
+
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
 {
     if (TCG_TARGET_HAS_rot_i64) {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index c2f3db9..c68e300 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -291,6 +291,7 @@ void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
 void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
 void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg);
+void tcg_gen_ctpop_i32(TCGv_i32 a1, TCGv_i32 a2);
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
 void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
@@ -479,6 +480,7 @@ void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
 void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
 void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg);
+void tcg_gen_ctpop_i64(TCGv_i64 a1, TCGv_i64 a2);
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
 void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
@@ -973,6 +975,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_clzi_tl tcg_gen_clzi_i64
 #define tcg_gen_ctzi_tl tcg_gen_ctzi_i64
 #define tcg_gen_clrsb_tl tcg_gen_clrsb_i64
+#define tcg_gen_ctpop_tl tcg_gen_ctpop_i64
 #define tcg_gen_rotl_tl tcg_gen_rotl_i64
 #define tcg_gen_rotli_tl tcg_gen_rotli_i64
 #define tcg_gen_rotr_tl tcg_gen_rotr_i64
@@ -1069,6 +1072,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_clzi_tl tcg_gen_clzi_i32
 #define tcg_gen_ctzi_tl tcg_gen_ctzi_i32
 #define tcg_gen_clrsb_tl tcg_gen_clrsb_i32
+#define tcg_gen_ctpop_tl tcg_gen_ctpop_i32
 #define tcg_gen_rotl_tl tcg_gen_rotl_i32
 #define tcg_gen_rotli_tl tcg_gen_rotli_i32
 #define tcg_gen_rotr_tl tcg_gen_rotr_i32
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index d00db4f..f06f894 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -106,6 +106,7 @@ DEF(nand_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_nand_i32))
 DEF(nor_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_nor_i32))
 DEF(clz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_clz_i32))
 DEF(ctz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_ctz_i32))
+DEF(ctpop_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ctpop_i32))
 
 DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
 DEF(movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
@@ -175,6 +176,7 @@ DEF(nand_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_nand_i64))
 DEF(nor_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_nor_i64))
 DEF(clz_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_clz_i64))
 DEF(ctz_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ctz_i64))
+DEF(ctpop_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ctpop_i64))
 
 DEF(add2_i64, 2, 4, 0, IMPL64 | IMPL(TCG_TARGET_HAS_add2_i64))
 DEF(sub2_i64, 2, 4, 0, IMPL64 | IMPL(TCG_TARGET_HAS_sub2_i64))
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index 0d30f1a..114ea6f 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -21,6 +21,8 @@ DEF_HELPER_FLAGS_2(clz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(ctz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_1(clrsb_i32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
+DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32)
+DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index e026282..631c6f6 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -113,6 +113,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_clz_i64          0
 #define TCG_TARGET_HAS_ctz_i64          0
+#define TCG_TARGET_HAS_ctpop_i64        0
 #define TCG_TARGET_HAS_deposit_i64      0
 #define TCG_TARGET_HAS_extract_i64      0
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 0646444..838bf3a 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -76,6 +76,7 @@
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_clz_i32          0
 #define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_ctpop_i32        0
 #define TCG_TARGET_HAS_neg_i32          1
 #define TCG_TARGET_HAS_not_i32          1
 #define TCG_TARGET_HAS_orc_i32          0
@@ -108,6 +109,7 @@
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_clz_i64          0
 #define TCG_TARGET_HAS_ctz_i64          0
+#define TCG_TARGET_HAS_ctpop_i64        0
 #define TCG_TARGET_HAS_neg_i64          1
 #define TCG_TARGET_HAS_not_i64          1
 #define TCG_TARGET_HAS_orc_i64          0
-- 
2.7.4


* [Qemu-devel] [PATCH v4 55/64] target-alpha: Use ctpop helper
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (53 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 54/64] tcg: Add opcode for ctpop Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 56/64] target-ppc: " Richard Henderson
                   ` (10 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-alpha/helper.h     | 2 --
 target-alpha/int_helper.c | 5 -----
 target-alpha/translate.c  | 2 +-
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/target-alpha/helper.h b/target-alpha/helper.h
index eed3906..d60f208 100644
--- a/target-alpha/helper.h
+++ b/target-alpha/helper.h
@@ -3,8 +3,6 @@ DEF_HELPER_FLAGS_1(load_pcc, TCG_CALL_NO_RWG_SE, i64, env)
 
 DEF_HELPER_FLAGS_3(check_overflow, TCG_CALL_NO_WG, void, env, i64, i64)
 
-DEF_HELPER_FLAGS_1(ctpop, TCG_CALL_NO_RWG_SE, i64, i64)
-
 DEF_HELPER_FLAGS_2(zap, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(zapnot, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
diff --git a/target-alpha/int_helper.c b/target-alpha/int_helper.c
index 3c303bd..e43b50a 100644
--- a/target-alpha/int_helper.c
+++ b/target-alpha/int_helper.c
@@ -24,11 +24,6 @@
 #include "qemu/host-utils.h"
 
 
-uint64_t helper_ctpop(uint64_t arg)
-{
-    return ctpop64(arg);
-}
-
 uint64_t helper_zapnot(uint64_t val, uint64_t mskb)
 {
     uint64_t mask;
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 6e2e563..055286a 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2541,7 +2541,7 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
             REQUIRE_TB_FLAG(TB_FLAGS_AMASK_CIX);
             REQUIRE_REG_31(ra);
             REQUIRE_NO_LIT;
-            gen_helper_ctpop(vc, vb);
+            tcg_gen_ctpop_i64(vc, vb);
             break;
         case 0x31:
             /* PERR */
-- 
2.7.4


* [Qemu-devel] [PATCH v4 56/64] target-ppc: Use ctpop helper
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (54 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 55/64] target-alpha: Use ctpop helper Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 57/64] target-s390x: Avoid a loop for popcnt Richard Henderson
                   ` (9 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/helper.h     |  3 +--
 target-ppc/int_helper.c | 18 +++---------------
 target-ppc/translate.c  |  6 +++++-
 3 files changed, 9 insertions(+), 18 deletions(-)
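
Note for readers (illustrative, not part of the patch): on a 64-bit
target popcntw produces one count per 32-bit word, whereas popcntd
produces a single count for the whole doubleword. That difference is
presumably why the TARGET_PPC64 path of gen_popcntw keeps its helper
while gen_popcntd moves to tcg_gen_ctpop_i64. A small sketch of the
two semantics, with hypothetical names popcntw64_ref and popcntd64_ref:

```c
#include <stdint.h>

/* Hypothetical reference popcount for one 32-bit word. */
static uint64_t ctpop32_word(uint32_t v)
{
    uint64_t n = 0;

    while (v) {
        v &= v - 1;     /* drop the lowest set bit */
        n++;
    }
    return n;
}

/* One count per word, each left in its own 32-bit half. */
static uint64_t popcntw64_ref(uint64_t val)
{
    uint64_t hi = ctpop32_word((uint32_t)(val >> 32));
    uint64_t lo = ctpop32_word((uint32_t)val);

    return (hi << 32) | lo;
}

/* A single count over all 64 bits. */
static uint64_t popcntd64_ref(uint64_t val)
{
    return ctpop32_word((uint32_t)(val >> 32)) + ctpop32_word((uint32_t)val);
}
```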

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 1ed1d2c..0a8fbba 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -39,12 +39,11 @@ DEF_HELPER_4(divweu, tl, env, tl, tl, i32)
 DEF_HELPER_4(divwe, tl, env, tl, tl, i32)
 
 DEF_HELPER_FLAGS_1(popcntb, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(popcntw, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(cmpb, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_3(sraw, tl, env, tl, tl)
 #if defined(TARGET_PPC64)
 DEF_HELPER_FLAGS_2(cmpeqb, TCG_CALL_NO_RWG_SE, i32, tl, tl)
-DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_NO_RWG_SE, tl, tl)
+DEF_HELPER_FLAGS_1(popcntw, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(bpermd, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_3(srad, tl, env, tl, tl)
 DEF_HELPER_0(darn32, tl)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index a6486ce..dcd5d42 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -272,6 +272,7 @@ target_ulong helper_srad(CPUPPCState *env, target_ulong value,
 #if defined(TARGET_PPC64)
 target_ulong helper_popcntb(target_ulong val)
 {
+    /* Note that we don't fold past bytes */
     val = (val & 0x5555555555555555ULL) + ((val >>  1) &
                                            0x5555555555555555ULL);
     val = (val & 0x3333333333333333ULL) + ((val >>  2) &
@@ -283,6 +284,7 @@ target_ulong helper_popcntb(target_ulong val)
 
 target_ulong helper_popcntw(target_ulong val)
 {
+    /* Note that we don't fold past words.  */
     val = (val & 0x5555555555555555ULL) + ((val >>  1) &
                                            0x5555555555555555ULL);
     val = (val & 0x3333333333333333ULL) + ((val >>  2) &
@@ -295,29 +297,15 @@ target_ulong helper_popcntw(target_ulong val)
                                            0x0000ffff0000ffffULL);
     return val;
 }
-
-target_ulong helper_popcntd(target_ulong val)
-{
-    return ctpop64(val);
-}
 #else
 target_ulong helper_popcntb(target_ulong val)
 {
+    /* Note that we don't fold past bytes */
     val = (val & 0x55555555) + ((val >>  1) & 0x55555555);
     val = (val & 0x33333333) + ((val >>  2) & 0x33333333);
     val = (val & 0x0f0f0f0f) + ((val >>  4) & 0x0f0f0f0f);
     return val;
 }
-
-target_ulong helper_popcntw(target_ulong val)
-{
-    val = (val & 0x55555555) + ((val >>  1) & 0x55555555);
-    val = (val & 0x33333333) + ((val >>  2) & 0x33333333);
-    val = (val & 0x0f0f0f0f) + ((val >>  4) & 0x0f0f0f0f);
-    val = (val & 0x00ff00ff) + ((val >>  8) & 0x00ff00ff);
-    val = (val & 0x0000ffff) + ((val >> 16) & 0x0000ffff);
-    return val;
-}
 #endif
 
 /*****************************************************************************/
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 1224f56..1212180 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1844,14 +1844,18 @@ static void gen_popcntb(DisasContext *ctx)
 
 static void gen_popcntw(DisasContext *ctx)
 {
+#if defined(TARGET_PPC64)
     gen_helper_popcntw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+#else
+    tcg_gen_ctpop_i32(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+#endif
 }
 
 #if defined(TARGET_PPC64)
 /* popcntd: PowerPC 2.06 specification */
 static void gen_popcntd(DisasContext *ctx)
 {
-    gen_helper_popcntd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+    tcg_gen_ctpop_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
 }
 #endif
 
-- 
2.7.4


* [Qemu-devel] [PATCH v4 57/64] target-s390x: Avoid a loop for popcnt
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (55 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 56/64] target-ppc: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 58/64] target-sparc: Use ctpop helper Richard Henderson
                   ` (8 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-s390x/int_helper.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)
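
Note for readers (illustrative, not part of the patch): the removed
loop and the new branch-free body should agree for every input. The
sketch below copies both algorithms into standalone functions so they
can be compared. The names popcnt_loop and popcnt_swar are
illustrative, and ctpop8_ref is a stand-in for calling ctpop32 on a
masked byte.

```c
#include <stdint.h>

/* Stand-in for ctpop32() applied to a single byte. */
static uint64_t ctpop8_ref(uint64_t byte)
{
    uint64_t n = 0;

    while (byte) {
        byte &= byte - 1;   /* drop the lowest set bit */
        n++;
    }
    return n;
}

/* Old form: one population count per byte, placed back in that byte. */
static uint64_t popcnt_loop(uint64_t r2)
{
    uint64_t ret = 0;
    int i;

    for (i = 0; i < 64; i += 8) {
        ret |= ctpop8_ref((r2 >> i) & 0xff) << i;
    }
    return ret;
}

/* New form: parallel sums that stop once each byte holds its count. */
static uint64_t popcnt_swar(uint64_t val)
{
    val = (val & 0x5555555555555555ULL) + ((val >> 1) & 0x5555555555555555ULL);
    val = (val & 0x3333333333333333ULL) + ((val >> 2) & 0x3333333333333333ULL);
    /* Each nibble is at most 4, so the sum of two nibbles fits in 4 bits. */
    val = (val + (val >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return val;
}
```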

diff --git a/target-s390x/int_helper.c b/target-s390x/int_helper.c
index 5bc470b..f26f36a 100644
--- a/target-s390x/int_helper.c
+++ b/target-s390x/int_helper.c
@@ -137,14 +137,11 @@ uint64_t HELPER(cvd)(int32_t reg)
     return dec;
 }
 
-uint64_t HELPER(popcnt)(uint64_t r2)
+uint64_t HELPER(popcnt)(uint64_t val)
 {
-    uint64_t ret = 0;
-    int i;
-
-    for (i = 0; i < 64; i += 8) {
-        uint64_t t = ctpop32((r2 >> i) & 0xff);
-        ret |= t << i;
-    }
-    return ret;
+    /* Note that we don't fold past bytes. */
+    val = (val & 0x5555555555555555ULL) + ((val >> 1) & 0x5555555555555555ULL);
+    val = (val & 0x3333333333333333ULL) + ((val >> 2) & 0x3333333333333333ULL);
+    val = (val + (val >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
+    return val;
 }
-- 
2.7.4


* [Qemu-devel] [PATCH v4 58/64] target-sparc: Use ctpop helper
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (56 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 57/64] target-s390x: Avoid a loop for popcnt Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 59/64] target-tilegx: " Richard Henderson
                   ` (7 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-sparc/helper.c    | 5 -----
 target-sparc/helper.h    | 1 -
 target-sparc/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target-sparc/helper.c b/target-sparc/helper.c
index 359b0b1..1d85489 100644
--- a/target-sparc/helper.c
+++ b/target-sparc/helper.c
@@ -49,11 +49,6 @@ void helper_debug(CPUSPARCState *env)
 }
 
 #ifdef TARGET_SPARC64
-target_ulong helper_popc(target_ulong val)
-{
-    return ctpop64(val);
-}
-
 void helper_tick_set_count(void *opaque, uint64_t count)
 {
 #if !defined(CONFIG_USER_ONLY)
diff --git a/target-sparc/helper.h b/target-sparc/helper.h
index 0cf1bfb..3ef38b9 100644
--- a/target-sparc/helper.h
+++ b/target-sparc/helper.h
@@ -16,7 +16,6 @@ DEF_HELPER_2(wrccr, void, env, tl)
 DEF_HELPER_1(rdcwp, tl, env)
 DEF_HELPER_2(wrcwp, void, env, tl)
 DEF_HELPER_FLAGS_2(array8, TCG_CALL_NO_RWG_SE, tl, tl, tl)
-DEF_HELPER_FLAGS_1(popc, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(set_softint, TCG_CALL_NO_RWG, void, env, i64)
 DEF_HELPER_FLAGS_2(clear_softint, TCG_CALL_NO_RWG, void, env, i64)
 DEF_HELPER_FLAGS_2(write_softint, TCG_CALL_NO_RWG, void, env, i64)
diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 2205f89..ead585e 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -4647,7 +4647,7 @@ static void disas_sparc_insn(DisasContext * dc, unsigned int insn)
                         gen_store_gpr(dc, rd, cpu_dst);
                         break;
                     case 0x2e: /* V9 popc */
-                        gen_helper_popc(cpu_dst, cpu_src2);
+                        tcg_gen_ctpop_tl(cpu_dst, cpu_src2);
                         gen_store_gpr(dc, rd, cpu_dst);
                         break;
                     case 0x2f: /* V9 movr */
-- 
2.7.4


* [Qemu-devel] [PATCH v4 59/64] target-tilegx: Use ctpop helper
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (57 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 58/64] target-sparc: Use ctpop helper Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 60/64] target-i386: " Richard Henderson
                   ` (6 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-tilegx/helper.c    | 5 -----
 target-tilegx/helper.h    | 1 -
 target-tilegx/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target-tilegx/helper.c b/target-tilegx/helper.c
index b6f5e29..4964bb9 100644
--- a/target-tilegx/helper.c
+++ b/target-tilegx/helper.c
@@ -55,11 +55,6 @@ void helper_ext01_ics(CPUTLGState *env)
     }
 }
 
-uint64_t helper_pcnt(uint64_t arg)
-{
-    return ctpop64(arg);
-}
-
 uint64_t helper_revbits(uint64_t arg)
 {
     return revbit64(arg);
diff --git a/target-tilegx/helper.h b/target-tilegx/helper.h
index bab303a..16745c2 100644
--- a/target-tilegx/helper.h
+++ b/target-tilegx/helper.h
@@ -1,6 +1,5 @@
 DEF_HELPER_2(exception, noreturn, env, i32)
 DEF_HELPER_1(ext01_ics, void, env)
-DEF_HELPER_FLAGS_1(pcnt, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(revbits, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_3(shufflebytes, TCG_CALL_NO_RWG_SE, i64, i64, i64, i64)
 DEF_HELPER_FLAGS_2(crc32_8, TCG_CALL_NO_RWG_SE, i64, i64, i64)
diff --git a/target-tilegx/translate.c b/target-tilegx/translate.c
index 8a2df1b..ff2ef7b 100644
--- a/target-tilegx/translate.c
+++ b/target-tilegx/translate.c
@@ -697,7 +697,7 @@ static TileExcp gen_rr_opcode(DisasContext *dc, unsigned opext,
         break;
     case OE_RR_X0(PCNT):
     case OE_RR_Y0(PCNT):
-        gen_helper_pcnt(tdest, tsrca);
+        tcg_gen_ctpop_tl(tdest, tsrca);
         mnemonic = "pcnt";
         break;
     case OE_RR_X0(REVBITS):
-- 
2.7.4


* [Qemu-devel] [PATCH v4 60/64] target-i386: Use ctpop helper
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (58 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 59/64] target-tilegx: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 61/64] qemu/host-utils.h: Reduce the operation count in the fallback ctpop Richard Henderson
                   ` (5 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/cc_helper.c      |  3 +++
 target-i386/cpu.h            |  1 +
 target-i386/ops_sse.h        | 26 --------------------------
 target-i386/ops_sse_header.h |  1 -
 target-i386/translate.c      | 13 +++++++++++--
 5 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/target-i386/cc_helper.c b/target-i386/cc_helper.c
index 83af223..c9c90e1 100644
--- a/target-i386/cc_helper.c
+++ b/target-i386/cc_helper.c
@@ -105,6 +105,8 @@ target_ulong helper_cc_compute_all(target_ulong dst, target_ulong src1,
         return src1;
     case CC_OP_CLR:
         return CC_Z | CC_P;
+    case CC_OP_POPCNT:
+        return src1 ? 0 : CC_Z;
 
     case CC_OP_MULB:
         return compute_all_mulb(dst, src1);
@@ -232,6 +234,7 @@ target_ulong helper_cc_compute_c(target_ulong dst, target_ulong src1,
     case CC_OP_LOGICL:
     case CC_OP_LOGICQ:
     case CC_OP_CLR:
+    case CC_OP_POPCNT:
         return 0;
 
     case CC_OP_EFLAGS:
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index c605724..041d201 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -774,6 +774,7 @@ typedef enum {
     CC_OP_ADCOX, /* CC_DST = C, CC_SRC2 = O, CC_SRC = rest.  */
 
     CC_OP_CLR, /* Z set, all other flags clear.  */
+    CC_OP_POPCNT, /* Z via CC_SRC, all other flags clear.  */
 
     CC_OP_NB,
 } CCOp;
diff --git a/target-i386/ops_sse.h b/target-i386/ops_sse.h
index 7a98f53..16509d0 100644
--- a/target-i386/ops_sse.h
+++ b/target-i386/ops_sse.h
@@ -2157,32 +2157,6 @@ target_ulong helper_crc32(uint32_t crc1, target_ulong msg, uint32_t len)
     return crc;
 }
 
-#define POPMASK(i)     ((target_ulong) -1 / ((1LL << (1 << i)) + 1))
-#define POPCOUNT(n, i) ((n & POPMASK(i)) + ((n >> (1 << i)) & POPMASK(i)))
-target_ulong helper_popcnt(CPUX86State *env, target_ulong n, uint32_t type)
-{
-    CC_SRC = n ? 0 : CC_Z;
-
-    n = POPCOUNT(n, 0);
-    n = POPCOUNT(n, 1);
-    n = POPCOUNT(n, 2);
-    n = POPCOUNT(n, 3);
-    if (type == 1) {
-        return n & 0xff;
-    }
-
-    n = POPCOUNT(n, 4);
-#ifndef TARGET_X86_64
-    return n;
-#else
-    if (type == 2) {
-        return n & 0xff;
-    }
-
-    return POPCOUNT(n, 5);
-#endif
-}
-
 void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
                                     uint32_t ctrl)
 {
diff --git a/target-i386/ops_sse_header.h b/target-i386/ops_sse_header.h
index 64c5857..094aafc 100644
--- a/target-i386/ops_sse_header.h
+++ b/target-i386/ops_sse_header.h
@@ -333,7 +333,6 @@ DEF_HELPER_4(glue(pcmpestrm, SUFFIX), void, env, Reg, Reg, i32)
 DEF_HELPER_4(glue(pcmpistri, SUFFIX), void, env, Reg, Reg, i32)
 DEF_HELPER_4(glue(pcmpistrm, SUFFIX), void, env, Reg, Reg, i32)
 DEF_HELPER_3(crc32, tl, i32, tl, i32)
-DEF_HELPER_3(popcnt, tl, env, tl, i32)
 #endif
 
 /* AES-NI op helpers */
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 0eac334..bf88a00 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -222,6 +222,7 @@ static const uint8_t cc_op_live[CC_OP_NB] = {
     [CC_OP_ADOX] = USES_CC_SRC | USES_CC_SRC2,
     [CC_OP_ADCOX] = USES_CC_DST | USES_CC_SRC | USES_CC_SRC2,
     [CC_OP_CLR] = 0,
+    [CC_OP_POPCNT] = USES_CC_SRC,
 };
 
 static void set_cc_op(DisasContext *s, CCOp op)
@@ -757,6 +758,7 @@ static CCPrepare gen_prepare_eflags_c(DisasContext *s, TCGv reg)
 
     case CC_OP_LOGICB ... CC_OP_LOGICQ:
     case CC_OP_CLR:
+    case CC_OP_POPCNT:
         return (CCPrepare) { .cond = TCG_COND_NEVER, .mask = -1 };
 
     case CC_OP_INCB ... CC_OP_INCQ:
@@ -824,6 +826,7 @@ static CCPrepare gen_prepare_eflags_s(DisasContext *s, TCGv reg)
         return (CCPrepare) { .cond = TCG_COND_NE, .reg = cpu_cc_src,
                              .mask = CC_S };
     case CC_OP_CLR:
+    case CC_OP_POPCNT:
         return (CCPrepare) { .cond = TCG_COND_NEVER, .mask = -1 };
     default:
         {
@@ -843,6 +846,7 @@ static CCPrepare gen_prepare_eflags_o(DisasContext *s, TCGv reg)
         return (CCPrepare) { .cond = TCG_COND_NE, .reg = cpu_cc_src2,
                              .mask = -1, .no_setcond = true };
     case CC_OP_CLR:
+    case CC_OP_POPCNT:
         return (CCPrepare) { .cond = TCG_COND_NEVER, .mask = -1 };
     default:
         gen_compute_eflags(s);
@@ -866,6 +870,9 @@ static CCPrepare gen_prepare_eflags_z(DisasContext *s, TCGv reg)
                              .mask = CC_Z };
     case CC_OP_CLR:
         return (CCPrepare) { .cond = TCG_COND_ALWAYS, .mask = -1 };
+    case CC_OP_POPCNT:
+        return (CCPrepare) { .cond = TCG_COND_EQ, .reg = cpu_cc_src,
+                             .mask = -1 };
     default:
         {
             TCGMemOp size = (s->cc_op - CC_OP_ADDB) & 3;
@@ -8186,10 +8193,12 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         }
 
         gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
-        gen_helper_popcnt(cpu_T0, cpu_env, cpu_T0, tcg_const_i32(ot));
+        gen_extu(ot, cpu_T0);
+        tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
+        tcg_gen_ctpop_tl(cpu_T0, cpu_T0);
         gen_op_mov_reg_v(ot, reg, cpu_T0);
 
-        set_cc_op(s, CC_OP_EFLAGS);
+        set_cc_op(s, CC_OP_POPCNT);
         break;
     case 0x10e ... 0x10f:
         /* 3DNow! instructions, ignore prefixes */
-- 
2.7.4
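
A minimal sketch, not part of the patch, of the lazy-flags idea behind
CC_OP_POPCNT: the zero-extended source operand is saved in CC_SRC, and ZF is
derived from it only when something reads the flags.  The names PopcntOut,
zero_extend, popcnt_emul and popcnt_eflags are hypothetical; CC_Z is assumed
to be the EFLAGS ZF bit (0x40).

```c
#include <assert.h>
#include <stdint.h>

#define CC_Z 0x0040u   /* assumed: ZF is bit 6 of EFLAGS */

/* Hypothetical container for what the translated code leaves behind. */
typedef struct {
    uint64_t result;   /* value written to the destination register */
    uint64_t cc_src;   /* saved operand, consumed lazily for flags */
} PopcntOut;

/* Plays the role of gen_extu(): drop bits above the operand size. */
static uint64_t zero_extend(uint64_t v, unsigned bits)
{
    return bits >= 64 ? v : v & ((1ULL << bits) - 1);
}

/* Plays the role of the three emitted ops: extu, mov to cc_src, ctpop. */
static PopcntOut popcnt_emul(uint64_t reg, unsigned bits)
{
    PopcntOut out;
    uint64_t v = zero_extend(reg, bits);
    unsigned n = 0;

    out.cc_src = v;
    for (; v != 0; v &= v - 1) {   /* clear the lowest set bit each pass */
        n++;
    }
    out.result = n;
    return out;
}

/* Mirrors "return src1 ? 0 : CC_Z": Z from the source, all else clear. */
static uint32_t popcnt_eflags(uint64_t cc_src)
{
    return cc_src ? 0 : CC_Z;
}
```

With a 16-bit operand size, a register holding 0x10000 counts as zero, so
the result is 0 and ZF is set; that is why the zero-extension has to happen
before the value is copied to CC_SRC.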


* [Qemu-devel] [PATCH v4 61/64] qemu/host-utils.h: Reduce the operation count in the fallback ctpop
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (59 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 60/64] target-i386: " Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-09 14:41   ` Alex Bennée
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 62/64] tcg: Use ctpop to generate ctz if needed Richard Henderson
                   ` (4 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/qemu/host-utils.h | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index 46187bb..96288d0 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -327,7 +327,7 @@ static inline int ctpop8(uint8_t val)
 #else
     val = (val & 0x55) + ((val >> 1) & 0x55);
     val = (val & 0x33) + ((val >> 2) & 0x33);
-    val = (val & 0x0f) + ((val >> 4) & 0x0f);
+    val = (val + (val >> 4)) & 0x0f;
 
     return val;
 #endif
@@ -344,8 +344,8 @@ static inline int ctpop16(uint16_t val)
 #else
     val = (val & 0x5555) + ((val >> 1) & 0x5555);
     val = (val & 0x3333) + ((val >> 2) & 0x3333);
-    val = (val & 0x0f0f) + ((val >> 4) & 0x0f0f);
-    val = (val & 0x00ff) + ((val >> 8) & 0x00ff);
+    val = (val + (val >> 4)) & 0x0f0f;
+    val = (val + (val >> 8)) & 0x00ff;
 
     return val;
 #endif
@@ -360,11 +360,10 @@ static inline int ctpop32(uint32_t val)
 #if QEMU_GNUC_PREREQ(3, 4)
     return __builtin_popcount(val);
 #else
-    val = (val & 0x55555555) + ((val >>  1) & 0x55555555);
-    val = (val & 0x33333333) + ((val >>  2) & 0x33333333);
-    val = (val & 0x0f0f0f0f) + ((val >>  4) & 0x0f0f0f0f);
-    val = (val & 0x00ff00ff) + ((val >>  8) & 0x00ff00ff);
-    val = (val & 0x0000ffff) + ((val >> 16) & 0x0000ffff);
+    val = (val & 0x55555555) + ((val >> 1) & 0x55555555);
+    val = (val & 0x33333333) + ((val >> 2) & 0x33333333);
+    val = (val + (val >> 4)) & 0x0f0f0f0f;
+    val = (val * 0x01010101) >> 24;
 
     return val;
 #endif
@@ -379,12 +378,10 @@ static inline int ctpop64(uint64_t val)
 #if QEMU_GNUC_PREREQ(3, 4)
     return __builtin_popcountll(val);
 #else
-    val = (val & 0x5555555555555555ULL) + ((val >>  1) & 0x5555555555555555ULL);
-    val = (val & 0x3333333333333333ULL) + ((val >>  2) & 0x3333333333333333ULL);
-    val = (val & 0x0f0f0f0f0f0f0f0fULL) + ((val >>  4) & 0x0f0f0f0f0f0f0f0fULL);
-    val = (val & 0x00ff00ff00ff00ffULL) + ((val >>  8) & 0x00ff00ff00ff00ffULL);
-    val = (val & 0x0000ffff0000ffffULL) + ((val >> 16) & 0x0000ffff0000ffffULL);
-    val = (val & 0x00000000ffffffffULL) + ((val >> 32) & 0x00000000ffffffffULL);
+    val = (val & 0x5555555555555555ULL) + ((val >> 1) & 0x5555555555555555ULL);
+    val = (val & 0x3333333333333333ULL) + ((val >> 2) & 0x3333333333333333ULL);
+    val = (val + (val >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
+    val = (val * 0x0101010101010101ULL) >> 56;
 
     return val;
 #endif
-- 
2.7.4
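
A minimal sketch, not part of the patch, comparing the old and new fallback
expansions for the 32-bit and 64-bit cases.  After the third step every byte
holds a count of at most 8, so multiplying by 0x01...01 sums all bytes into
the top byte without any carry between partial sums.  The function names and
the self-check helper are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Old form: five mask-and-add steps. */
static int ctpop32_masks(uint32_t val)
{
    val = (val & 0x55555555) + ((val >>  1) & 0x55555555);
    val = (val & 0x33333333) + ((val >>  2) & 0x33333333);
    val = (val & 0x0f0f0f0f) + ((val >>  4) & 0x0f0f0f0f);
    val = (val & 0x00ff00ff) + ((val >>  8) & 0x00ff00ff);
    val = (val & 0x0000ffff) + ((val >> 16) & 0x0000ffff);
    return val;
}

/* New form: one mask saved in step three, then a multiply folds the bytes. */
static int ctpop32_mul(uint32_t val)
{
    val = (val & 0x55555555) + ((val >> 1) & 0x55555555);
    val = (val & 0x33333333) + ((val >> 2) & 0x33333333);
    val = (val + (val >> 4)) & 0x0f0f0f0f;  /* per-byte counts, each <= 8 */
    val = (val * 0x01010101) >> 24;         /* top byte = sum of the bytes */
    return val;
}

static int ctpop64_mul(uint64_t val)
{
    val = (val & 0x5555555555555555ULL) + ((val >> 1) & 0x5555555555555555ULL);
    val = (val & 0x3333333333333333ULL) + ((val >> 2) & 0x3333333333333333ULL);
    val = (val + (val >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    val = (val * 0x0101010101010101ULL) >> 56;
    return val;
}

/* Hypothetical self-check: both forms agree on a pseudo-random sample. */
static void check_ctpop_forms(void)
{
    uint32_t x = 1;
    for (int i = 0; i < 100000; i++) {
        assert(ctpop32_masks(x) == ctpop32_mul(x));
        uint64_t wide = ((uint64_t)x << 32) | ~x;
        assert(ctpop64_mul(wide) == ctpop32_mul(x) + ctpop32_mul(~x));
        x = x * 1664525u + 1013904223u;   /* simple LCG step */
    }
}
```

The multiply form trades three mask-and-add steps for one multiplication,
which is the operation-count reduction the subject line refers to; whether
it is faster on a given host depends on that host's multiplier.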


* [Qemu-devel] [PATCH v4 62/64] tcg: Use ctpop to generate ctz if needed
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (60 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 61/64] qemu/host-utils.h: Reduce the operation count in the fallback ctpop Richard Henderson
@ 2016-11-23 13:01 ` Richard Henderson
  2016-12-09 16:07   ` Alex Bennée
  2016-11-23 13:02 ` [Qemu-devel] [PATCH v4 63/64] tcg/ppc: Handle ctpop opcode Richard Henderson
                   ` (3 subsequent siblings)
  65 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

When the host has ctpop but no ctz, compute ctz(x) as ctpop((x - 1) & ~x).
Particularly when andc is also available, this is two insns shorter than
using clz to compute ctz.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/tcg-op.c | 107 ++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 65 insertions(+), 42 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 6f4b1b6..d1debde 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -497,43 +497,46 @@ void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
         tcg_gen_extrl_i64_i32(ret, t1);
         tcg_temp_free_i64(t1);
         tcg_temp_free_i64(t2);
-    } else if (TCG_TARGET_HAS_clz_i32) {
-        TCGv_i32 t1 = tcg_temp_new_i32();
-        TCGv_i32 t2 = tcg_temp_new_i32();
-        tcg_gen_neg_i32(t1, arg1);
-        tcg_gen_xori_i32(t2, arg2, 31);
-        tcg_gen_and_i32(t1, t1, arg1);
-        tcg_gen_clz_i32(ret, t1, t2);
-        tcg_temp_free_i32(t1);
-        tcg_temp_free_i32(t2);
-        tcg_gen_xori_i32(ret, ret, 31);
-    } else if (TCG_TARGET_HAS_clz_i64) {
-        TCGv_i32 t1 = tcg_temp_new_i32();
-        TCGv_i32 t2 = tcg_temp_new_i32();
-        TCGv_i64 x1 = tcg_temp_new_i64();
-        TCGv_i64 x2 = tcg_temp_new_i64();
-        tcg_gen_neg_i32(t1, arg1);
-        tcg_gen_xori_i32(t2, arg2, 63);
-        tcg_gen_and_i32(t1, t1, arg1);
-        tcg_gen_extu_i32_i64(x1, t1);
-        tcg_gen_extu_i32_i64(x2, t2);
-        tcg_temp_free_i32(t1);
-        tcg_temp_free_i32(t2);
-        tcg_gen_clz_i64(x1, x1, x2);
-        tcg_gen_extrl_i64_i32(ret, x1);
-        tcg_temp_free_i64(x1);
-        tcg_temp_free_i64(x2);
-        tcg_gen_xori_i32(ret, ret, 63);
     } else {
-        gen_helper_ctz_i32(ret, arg1, arg2);
+        TCGv_i32 z, t;
+        if (TCG_TARGET_HAS_ctpop_i32 && TCG_TARGET_HAS_andc_i32) {
+            t = tcg_temp_new_i32();
+            tcg_gen_subi_i32(t, arg1, 1);
+            tcg_gen_andc_i32(t, t, arg1);
+            tcg_gen_ctpop_i32(t, t);
+        do_movc:
+            z = tcg_const_i32(0);
+            tcg_gen_movcond_i32(TCG_COND_EQ, ret, arg1, z, arg2, t);
+            tcg_temp_free_i32(t);
+            tcg_temp_free_i32(z);
+        } else if (TCG_TARGET_HAS_clz_i32 || TCG_TARGET_HAS_clz_i64) {
+            /* Since all non-x86 hosts have clz(0) == 32, don't fight it.  */
+            t = tcg_temp_new_i32();
+            tcg_gen_neg_i32(t, arg1);
+            tcg_gen_and_i32(t, t, arg1);
+            tcg_gen_clzi_i32(t, t, 32);
+            tcg_gen_xori_i32(t, t, 31);
+            goto do_movc;
+        } else {
+            gen_helper_ctz_i32(ret, arg1, arg2);
+        }
     }
 }
 
 void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
 {
-    TCGv_i32 t = tcg_const_i32(arg2);
-    tcg_gen_ctz_i32(ret, arg1, t);
-    tcg_temp_free_i32(t);
+    if (!TCG_TARGET_HAS_ctz_i32 && TCG_TARGET_HAS_ctpop_i32 && arg2 == 32) {
+        /* This equivalence has the advantage of not requiring a fixup.  */
+        TCGv_i32 t = tcg_temp_new_i32();
+        tcg_gen_subi_i32(t, arg1, 1);
+        tcg_gen_andc_i32(t, t, arg1);
+        tcg_gen_ctpop_i32(ret, t);
+        tcg_temp_free_i32(t);
+    } else {
+        TCGv_i32 t = tcg_const_i32(arg2);
+        tcg_gen_ctz_i32(ret, arg1, t);
+        tcg_temp_free_i32(t);
+    }
 }
 
 void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg)
@@ -1842,18 +1845,29 @@ void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
 {
     if (TCG_TARGET_HAS_ctz_i64) {
         tcg_gen_op3_i64(INDEX_op_ctz_i64, ret, arg1, arg2);
-    } else if (TCG_TARGET_HAS_clz_i64) {
-        TCGv_i64 t1 = tcg_temp_new_i64();
-        TCGv_i64 t2 = tcg_temp_new_i64();
-        tcg_gen_neg_i64(t1, arg1);
-        tcg_gen_xori_i64(t2, arg2, 63);
-        tcg_gen_and_i64(t1, t1, arg1);
-        tcg_gen_clz_i64(ret, t1, t2);
-        tcg_temp_free_i64(t1);
-        tcg_temp_free_i64(t2);
-        tcg_gen_xori_i64(ret, ret, 63);
     } else {
-        gen_helper_ctz_i64(ret, arg1, arg2);
+        TCGv_i64 z, t;
+        if (TCG_TARGET_HAS_ctpop_i64 && TCG_TARGET_HAS_andc_i64) {
+            t = tcg_temp_new_i64();
+            tcg_gen_subi_i64(t, arg1, 1);
+            tcg_gen_andc_i64(t, t, arg1);
+            tcg_gen_ctpop_i64(t, t);
+        do_movc:
+            z = tcg_const_i64(0);
+            tcg_gen_movcond_i64(TCG_COND_EQ, ret, arg1, z, arg2, t);
+            tcg_temp_free_i64(t);
+            tcg_temp_free_i64(z);
+        } else if (TCG_TARGET_HAS_clz_i64) {
+            /* Since all non-x86 hosts have clz(0) == 64, don't fight it.  */
+            t = tcg_temp_new_i64();
+            tcg_gen_neg_i64(t, arg1);
+            tcg_gen_and_i64(t, t, arg1);
+            tcg_gen_clzi_i64(t, t, 64);
+            tcg_gen_xori_i64(t, t, 63);
+            goto do_movc;
+        } else {
+            gen_helper_ctz_i64(ret, arg1, arg2);
+        }
     }
 }
 
@@ -1868,6 +1882,15 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
         tcg_gen_ctz_i32(TCGV_LOW(ret), TCGV_LOW(arg1), t32);
         tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
         tcg_temp_free_i32(t32);
+    } else if (!TCG_TARGET_HAS_ctz_i64
+               && TCG_TARGET_HAS_ctpop_i64
+               && arg2 == 64) {
+        /* This equivalence has the advantage of not requiring a fixup.  */
+        TCGv_i64 t = tcg_temp_new_i64();
+        tcg_gen_subi_i64(t, arg1, 1);
+        tcg_gen_andc_i64(t, t, arg1);
+        tcg_gen_ctpop_i64(ret, t);
+        tcg_temp_free_i64(t);
     } else {
         TCGv_i64 t64 = tcg_const_i64(arg2);
         tcg_gen_ctz_i64(ret, arg1, t64);
-- 
2.7.4
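
A minimal sketch, not part of the patch, of the two expansions in plain C.
The mask (x - 1) & ~x has a one in every position below the lowest set bit
of x, so its population count is the trailing-zero count; for x == 0 the
mask is all ones, which yields 32 and explains why the ctzi case with
arg2 == 32 needs no fixup.  The helper names are hypothetical, and the
select at the end stands in for the movcond.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-ins for the host ctpop and clz instructions. */
static uint32_t popcount32(uint32_t v)
{
    uint32_t n = 0;
    for (; v != 0; v &= v - 1) {
        n++;
    }
    return n;
}

static uint32_t clz32(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0) {
        return 32;              /* the clz(0) == 32 convention */
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctpop path: subi, andc, ctpop, then movcond for the zero input. */
static uint32_t ctz32_via_ctpop(uint32_t x, uint32_t zero_val)
{
    uint32_t t = popcount32((x - 1) & ~x);
    return x == 0 ? zero_val : t;
}

/* Same mask, but relying on ctpop(all ones) == 32 instead of a movcond. */
static uint32_t ctz32_ctpop_nofixup(uint32_t x)
{
    return popcount32((x - 1) & ~x);
}

/* clz path: neg, and, clz, xor 31, then movcond for the zero input. */
static uint32_t ctz32_via_clz(uint32_t x, uint32_t zero_val)
{
    uint32_t t = x & (0u - x);  /* isolate the lowest set bit */
    t = clz32(t) ^ 31;          /* 31 - clz for a single set bit */
    return x == 0 ? zero_val : t;
}
```

Both paths agree for every non-zero input; they differ only in how many
operations are needed before the final select.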


* [Qemu-devel] [PATCH v4 63/64] tcg/ppc: Handle ctpop opcode
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (61 preceding siblings ...)
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 62/64] tcg: Use ctpop to generate ctz if needed Richard Henderson
@ 2016-11-23 13:02 ` Richard Henderson
  2016-11-23 13:02 ` [Qemu-devel] [PATCH v4 64/64] tcg/i386: " Richard Henderson
                   ` (2 subsequent siblings)
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc/tcg-target.h     |  5 +++--
 tcg/ppc/tcg-target.inc.c | 12 +++++++++++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 57e66cf..abd8b3d 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -49,6 +49,7 @@ typedef enum {
     TCG_AREG0 = TCG_REG_R27
 } TCGReg;
 
+extern bool have_isa_2_06;
 extern bool have_isa_3_00;
 
 /* optional instructions automatically implemented */
@@ -72,7 +73,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_nor_i32          1
 #define TCG_TARGET_HAS_clz_i32          1
 #define TCG_TARGET_HAS_ctz_i32          have_isa_3_00
-#define TCG_TARGET_HAS_ctpop_i32        0
+#define TCG_TARGET_HAS_ctpop_i32        have_isa_2_06
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     0
@@ -108,7 +109,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_nor_i64          1
 #define TCG_TARGET_HAS_clz_i64          1
 #define TCG_TARGET_HAS_ctz_i64          have_isa_3_00
-#define TCG_TARGET_HAS_ctpop_i64        0
+#define TCG_TARGET_HAS_ctpop_i64        have_isa_2_06
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 766bc1a..64f67d2 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -83,7 +83,7 @@ static tcg_insn_unit *tb_ret_addr;
 
 #include "elf.h"
 
-static bool have_isa_2_06;
+bool have_isa_2_06;
 bool have_isa_3_00;
 
 #define HAVE_ISA_2_06  have_isa_2_06
@@ -457,6 +457,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define CNTLZD XO31( 58)
 #define CNTTZW XO31(538)
 #define CNTTZD XO31(570)
+#define CNTPOPW XO31(378)
+#define CNTPOPD XO31(506)
 #define ANDC   XO31( 60)
 #define ORC    XO31(412)
 #define EQV    XO31(284)
@@ -2149,6 +2151,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_cntxz(s, TCG_TYPE_I32, CNTTZW, args[0], args[1],
                       args[2], const_args[2]);
         break;
+    case INDEX_op_ctpop_i32:
+        tcg_out32(s, CNTPOPW | SAB(args[1], args[0], 0));
+        break;
 
     case INDEX_op_clz_i64:
         tcg_out_cntxz(s, TCG_TYPE_I64, CNTLZD, args[0], args[1],
@@ -2158,6 +2163,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_cntxz(s, TCG_TYPE_I64, CNTTZD, args[0], args[1],
                       args[2], const_args[2]);
         break;
+    case INDEX_op_ctpop_i64:
+        tcg_out32(s, CNTPOPD | SAB(args[1], args[0], 0));
+        break;
 
     case INDEX_op_mul_i32:
         a0 = args[0], a1 = args[1], a2 = args[2];
@@ -2573,6 +2581,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_nor_i32, { "r", "r", "r" } },
     { INDEX_op_clz_i32, { "r", "r", "rZW" } },
     { INDEX_op_ctz_i32, { "r", "r", "rZW" } },
+    { INDEX_op_ctpop_i32, { "r", "r" } },
 
     { INDEX_op_shl_i32, { "r", "r", "ri" } },
     { INDEX_op_shr_i32, { "r", "r", "ri" } },
@@ -2623,6 +2632,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_nor_i64, { "r", "r", "r" } },
     { INDEX_op_clz_i64, { "r", "r", "rZW" } },
     { INDEX_op_ctz_i64, { "r", "r", "rZW" } },
+    { INDEX_op_ctpop_i64, { "r", "r" } },
 
     { INDEX_op_shl_i64, { "r", "r", "ri" } },
     { INDEX_op_shr_i64, { "r", "r", "ri" } },
-- 
2.7.4


* [Qemu-devel] [PATCH v4 64/64] tcg/i386: Handle ctpop opcode
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (62 preceding siblings ...)
  2016-11-23 13:02 ` [Qemu-devel] [PATCH v4 63/64] tcg/ppc: Handle ctpop opcode Richard Henderson
@ 2016-11-23 13:02 ` Richard Henderson
  2016-11-29 13:33 ` [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue no-reply
  2016-12-09 16:08 ` Alex Bennée
  65 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-23 13:02 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.bennee

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.h     |  5 +++--
 tcg/i386/tcg-target.inc.c | 12 +++++++++++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index b8f73f5..21d96ec 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -76,6 +76,7 @@ typedef enum {
 #endif
 
 extern bool have_bmi1;
+extern bool have_popcnt;
 
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32         1
@@ -95,7 +96,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_clz_i32          1
 #define TCG_TARGET_HAS_ctz_i32          1
-#define TCG_TARGET_HAS_ctpop_i32        0
+#define TCG_TARGET_HAS_ctpop_i32        have_popcnt
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     1
@@ -130,7 +131,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_clz_i64          1
 #define TCG_TARGET_HAS_ctz_i64          1
-#define TCG_TARGET_HAS_ctpop_i64        0
+#define TCG_TARGET_HAS_ctpop_i64        have_popcnt
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 3650340..01177a9 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -130,9 +130,10 @@ static bool have_movbe;
 # define have_movbe 0
 #endif
 
-/* We need this symbol in tcg-target.h, and we can't properly conditionalize
+/* We need these symbols in tcg-target.h, and we can't properly conditionalize
    it there.  Therefore we always define the variable.  */
 bool have_bmi1;
+bool have_popcnt;
 
 #if defined(CONFIG_CPUID_H) && defined(bit_BMI2)
 static bool have_bmi2;
@@ -337,6 +338,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_MOVZBL	(0xb6 | P_EXT)
 #define OPC_MOVZWL	(0xb7 | P_EXT)
 #define OPC_POP_r32	(0x58)
+#define OPC_POPCNT      (0xb8 | P_EXT | P_SIMDF3)
 #define OPC_PUSH_r32	(0x50)
 #define OPC_PUSH_Iv	(0x68)
 #define OPC_PUSH_Ib	(0x6a)
@@ -2083,6 +2085,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     OP_32_64(clz):
         tcg_out_clz(s, rexw, args[0], args[1], args[2], const_args[2]);
         break;
+    OP_32_64(ctpop):
+        tcg_out_modrm(s, OPC_POPCNT + rexw, a0, a1);
+        break;
 
     case INDEX_op_brcond_i32:
         tcg_out_brcond32(s, a2, a0, a1, const_args[1], arg_label(args[3]), 0);
@@ -2398,6 +2403,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_extract_i32:
     case INDEX_op_extract_i64:
     case INDEX_op_sextract_i32:
+    case INDEX_op_ctpop_i32:
+    case INDEX_op_ctpop_i64:
         return &r_r;
 
     case INDEX_op_deposit_i32:
@@ -2602,6 +2609,9 @@ static void tcg_target_init(TCGContext *s)
            need to probe for it.  */
         have_movbe = (c & bit_MOVBE) != 0;
 #endif
+#ifdef bit_POPCNT
+        have_popcnt = (c & bit_POPCNT) != 0;
+#endif
     }
 
     if (max >= 7) {
-- 
2.7.4


* Re: [Qemu-devel] [PATCH v4 35/64] target-tricore: Use clz opcode
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 35/64] target-tricore: Use clz opcode Richard Henderson
@ 2016-11-23 14:58   ` Bastian Koppelmann
  0 siblings, 0 replies; 102+ messages in thread
From: Bastian Koppelmann @ 2016-11-23 14:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 11/23/2016 02:01 PM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-tricore/helper.h    |  2 --
>  target-tricore/op_helper.c | 10 ----------
>  target-tricore/translate.c |  5 +++--
>  3 files changed, 3 insertions(+), 14 deletions(-)
> 

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Cheers,
    Bastian


* Re: [Qemu-devel] [PATCH v4 52/64] target-tricore: Use clrsb helper
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 52/64] target-tricore: " Richard Henderson
@ 2016-11-23 14:58   ` Bastian Koppelmann
  0 siblings, 0 replies; 102+ messages in thread
From: Bastian Koppelmann @ 2016-11-23 14:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee

On 11/23/2016 02:01 PM, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-tricore/helper.h    | 1 -
>  target-tricore/op_helper.c | 5 -----
>  target-tricore/translate.c | 2 +-
>  3 files changed, 1 insertion(+), 7 deletions(-)
> 

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Cheers,
    Bastian


* Re: [Qemu-devel] [PATCH v4 07/64] tcg/i386: Implement field extraction opcodes
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 07/64] tcg/i386: " Richard Henderson
@ 2016-11-25 11:16   ` Paolo Bonzini
  2016-11-25 11:21     ` Richard Henderson
  0 siblings, 1 reply; 102+ messages in thread
From: Paolo Bonzini @ 2016-11-25 11:16 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: alex.bennee



On 23/11/2016 14:01, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/i386/tcg-target.h     | 12 +++++++++---
>  tcg/i386/tcg-target.inc.c | 38 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+), 3 deletions(-)
> 
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 7625188..dc19c47 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -94,8 +94,8 @@ extern bool have_bmi1;
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
> -#define TCG_TARGET_HAS_extract_i32      0
> -#define TCG_TARGET_HAS_sextract_i32     0
> +#define TCG_TARGET_HAS_extract_i32      1
> +#define TCG_TARGET_HAS_sextract_i32     1
>  #define TCG_TARGET_HAS_movcond_i32      1
>  #define TCG_TARGET_HAS_add2_i32         1
>  #define TCG_TARGET_HAS_sub2_i32         1
> @@ -126,7 +126,7 @@ extern bool have_bmi1;
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      1
> -#define TCG_TARGET_HAS_extract_i64      0
> +#define TCG_TARGET_HAS_extract_i64      1
>  #define TCG_TARGET_HAS_sextract_i64     0
>  #define TCG_TARGET_HAS_movcond_i64      1
>  #define TCG_TARGET_HAS_add2_i64         1
> @@ -142,6 +142,12 @@ extern bool have_bmi1;
>       ((ofs) == 0 && (len) == 16))
>  #define TCG_TARGET_deposit_i64_valid    TCG_TARGET_deposit_i32_valid
>  
> +/* Check for the possibility of high-byte extraction and, for 64-bit,
> +   zero-extending 32-bit right-shift.  */
> +#define TCG_TARGET_extract_i32_valid(ofs, len) ((ofs) == 8 && (len) == 8)
> +#define TCG_TARGET_extract_i64_valid(ofs, len) \
> +    (((ofs) == 8 && (len) == 8) || ((ofs) + (len)) == 32)

32-bit x86 can do an "extract2" with shld or shrd when the length is 32.
 I wonder if other architectures have a similar instruction, or if it
would be a useful addition.  With the length limited as in x86, it would
be a rehash of the trunc_shr_i32 instruction that was removed last year.

Paolo

>  #if TCG_TARGET_REG_BITS == 64
>  # define TCG_AREG0 TCG_REG_R14
>  #else
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index eeb1777..39f62bd 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -2143,6 +2143,40 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          }
>          break;
>  
> +    case INDEX_op_extract_i64:
> +        if (args[2] + args[3] == 32) {
> +            /* This is a 32-bit zero-extending right shift.  */
> +            tcg_out_mov(s, TCG_TYPE_I32, args[0], args[1]);
> +            tcg_out_shifti(s, SHIFT_SHR, args[0], args[2]);
> +            break;
> +        }
> +        /* FALLTHRU */
> +    case INDEX_op_extract_i32:
> +        /* On the off-chance that we can use the high-byte registers.
> +           Otherwise we emit the same ext16 + shift pattern that we
> +           would have gotten from the normal tcg-op.c expansion.  */
> +        tcg_debug_assert(args[2] == 8 && args[3] == 8);
> +        if (args[1] < 4 && args[0] < 8) {
> +            tcg_out_modrm(s, OPC_MOVZBL, args[0], args[1] + 4);
> +        } else {
> +            tcg_out_ext16u(s, args[0], args[1]);
> +            tcg_out_shifti(s, SHIFT_SHR, args[0], 8);
> +        }
> +        break;
> +
> +    case INDEX_op_sextract_i32:
> +        /* We don't implement sextract_i64, as we cannot sign-extend to
> +           64-bits without using the REX prefix that explicitly excludes
> +           access to the high-byte registers.  */
> +        tcg_debug_assert(args[2] == 8 && args[3] == 8);
> +        if (args[1] < 4 && args[0] < 8) {
> +            tcg_out_modrm(s, OPC_MOVSBL, args[0], args[1] + 4);
> +        } else {
> +            tcg_out_ext16s(s, args[0], args[1], 0);
> +            tcg_out_shifti(s, SHIFT_SAR, args[0], 8);
> +        }
> +        break;
> +
>      case INDEX_op_mb:
>          tcg_out_mb(s, args[0]);
>          break;
> @@ -2204,6 +2238,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
>      { INDEX_op_setcond_i32, { "q", "r", "ri" } },
>  
>      { INDEX_op_deposit_i32, { "Q", "0", "Q" } },
> +    { INDEX_op_extract_i32, { "r", "r" } },
> +    { INDEX_op_sextract_i32, { "r", "r" } },
> +
>      { INDEX_op_movcond_i32, { "r", "r", "ri", "r", "0" } },
>  
>      { INDEX_op_mulu2_i32, { "a", "d", "a", "r" } },
> @@ -2265,6 +2302,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
>      { INDEX_op_extu_i32_i64, { "r", "r" } },
>  
>      { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
> +    { INDEX_op_extract_i64, { "r", "r" } },
>      { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
>  
>      { INDEX_op_mulu2_i64, { "a", "d", "a", "r" } },
> 
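
A minimal sketch, not part of the quoted patch, of why the
(ofs) + (len) == 32 case in TCG_TARGET_extract_i64_valid reduces to a
32-bit zero-extending right shift: the field ends exactly at bit 31, so
truncating to 32 bits and shifting right leaves the same bits.  The
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Generic unsigned field extraction: len bits starting at bit ofs. */
static uint64_t extract64_generic(uint64_t value, unsigned ofs, unsigned len)
{
    assert(ofs < 64 && len > 0 && len <= 64 - ofs);
    return (value >> ofs) & (~0ULL >> (64 - len));
}

/* Special case ofs + len == 32: a 32-bit mov followed by a 32-bit shr. */
static uint64_t extract64_as_shr32(uint64_t value, unsigned ofs)
{
    assert(ofs < 32);
    return (uint32_t)value >> ofs;   /* result is zero-extended to 64 bits */
}
```

The other accepted shape, ofs == 8 and len == 8, is the high-byte case that
the patch maps to movzbl from %ah, %bh, %ch or %dh when the register
numbers allow it.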


* Re: [Qemu-devel] [PATCH v4 07/64] tcg/i386: Implement field extraction opcodes
  2016-11-25 11:16   ` Paolo Bonzini
@ 2016-11-25 11:21     ` Richard Henderson
  0 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-11-25 11:21 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: alex.bennee

On 11/25/2016 12:16 PM, Paolo Bonzini wrote:
>
>
> On 23/11/2016 14:01, Richard Henderson wrote:
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>  tcg/i386/tcg-target.h     | 12 +++++++++---
>>  tcg/i386/tcg-target.inc.c | 38 ++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 47 insertions(+), 3 deletions(-)
>>
>> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
>> index 7625188..dc19c47 100644
>> --- a/tcg/i386/tcg-target.h
>> +++ b/tcg/i386/tcg-target.h
>> @@ -94,8 +94,8 @@ extern bool have_bmi1;
>>  #define TCG_TARGET_HAS_nand_i32         0
>>  #define TCG_TARGET_HAS_nor_i32          0
>>  #define TCG_TARGET_HAS_deposit_i32      1
>> -#define TCG_TARGET_HAS_extract_i32      0
>> -#define TCG_TARGET_HAS_sextract_i32     0
>> +#define TCG_TARGET_HAS_extract_i32      1
>> +#define TCG_TARGET_HAS_sextract_i32     1
>>  #define TCG_TARGET_HAS_movcond_i32      1
>>  #define TCG_TARGET_HAS_add2_i32         1
>>  #define TCG_TARGET_HAS_sub2_i32         1
>> @@ -126,7 +126,7 @@ extern bool have_bmi1;
>>  #define TCG_TARGET_HAS_nand_i64         0
>>  #define TCG_TARGET_HAS_nor_i64          0
>>  #define TCG_TARGET_HAS_deposit_i64      1
>> -#define TCG_TARGET_HAS_extract_i64      0
>> +#define TCG_TARGET_HAS_extract_i64      1
>>  #define TCG_TARGET_HAS_sextract_i64     0
>>  #define TCG_TARGET_HAS_movcond_i64      1
>>  #define TCG_TARGET_HAS_add2_i64         1
>> @@ -142,6 +142,12 @@ extern bool have_bmi1;
>>       ((ofs) == 0 && (len) == 16))
>>  #define TCG_TARGET_deposit_i64_valid    TCG_TARGET_deposit_i32_valid
>>
>> +/* Check for the possibility of high-byte extraction and, for 64-bit,
>> +   zero-extending 32-bit right-shift.  */
>> +#define TCG_TARGET_extract_i32_valid(ofs, len) ((ofs) == 8 && (len) == 8)
>> +#define TCG_TARGET_extract_i64_valid(ofs, len) \
>> +    (((ofs) == 8 && (len) == 8) || ((ofs) + (len)) == 32)
>
> 32-bit x86 can do an "extract2" with shld or shrd when the length is 32.
>  I wonder if other architectures have a similar instruction, or if it
> would be a useful addition.  With the length limited as in x86, it would
> be a rehash of the trunc_shr_i32 instruction that was removed last year.

Lots of architectures can do a double-word shift like shrd.  On x86, it turns 
out to be slow (for whatever silly architectural reason), so I've never pursued 
that.

As for 32-bit x86, in this context, it will never be presented with a 64-bit 
extract.  That's 64-bit only.


r~
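
For readers following the shrd discussion, a minimal sketch of what a
32-bit "extract2" would compute: 32 bits taken from the 64-bit
concatenation hi:lo, starting at bit ofs.  This only illustrates the
semantics under discussion; no such opcode is added by this series, and
the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Double-word shift as shrd would do it, using only 32-bit operations. */
static uint32_t extract2_32(uint32_t lo, uint32_t hi, unsigned ofs)
{
    assert(ofs < 32);
    if (ofs == 0) {
        return lo;               /* avoid the undefined shift by 32 below */
    }
    return (lo >> ofs) | (hi << (32 - ofs));
}

/* Reference form using a 64-bit intermediate. */
static uint32_t extract2_32_ref(uint32_t lo, uint32_t hi, unsigned ofs)
{
    uint64_t both = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(both >> ofs);
}
```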


* Re: [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (63 preceding siblings ...)
  2016-11-23 13:02 ` [Qemu-devel] [PATCH v4 64/64] tcg/i386: " Richard Henderson
@ 2016-11-29 13:33 ` no-reply
  2016-12-09 16:08 ` Alex Bennée
  65 siblings, 0 replies; 102+ messages in thread
From: no-reply @ 2016-11-29 13:33 UTC (permalink / raw)
  To: rth; +Cc: famz, qemu-devel, alex.bennee

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue
Type: series
Message-id: 1479906121-12211-1-git-send-email-rth@twiddle.net

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
d9cf44b tcg/i386: Handle ctpop opcode
2135740 tcg/ppc: Handle ctpop opcode
01fe85d tcg: Use ctpop to generate ctz if needed
50c5255 qemu/host-utils.h: Reduce the operation count in the fallback ctpop
d3a1e47 target-i386: Use ctpop helper
bfb17b0 target-tilegx: Use ctpop helper
62f5c57 target-sparc: Use ctpop helper
235a776 target-s390x: Avoid a loop for popcnt
f06956a target-ppc: Use ctpop helper
a34a8e2 target-alpha: Use ctpop helper
da33f5a tcg: Add opcode for ctpop
55e9496 target-xtensa: Use clrsb helper
af0a1ac target-tricore: Use clrsb helper
f5ef084 target-arm: Use clrsb helper
30810b8 tcg: Add helpers for clrsb
3789563 tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR
d619f8b tcg/i386: Handle ctz and clz opcodes
e933f33 tcg/i386: Allow bmi2 shiftx to have non-matching operands
6ec3b09 tcg/i386: Hoist common arguments in tcg_out_op
1aeb285 tcg/i386: Fuly convert tcg_target_op_def
79d13c2 tcg/s390: Handle clz opcode
9dde0af tcg/mips: Handle clz opcode
2a5b955 tcg/arm: Handle ctz and clz opcodes
3bdff99 tcg/aarch64: Handle ctz and clz opcodes
328af62 tcg/ppc: Handle ctz and clz opcodes
e87ef0c target-i386: Use clz and ctz opcodes
d655fb7 target-arm: Use clz opcode
ececb9b target-xtensa: Use clz opcode
cd2de8a target-unicore32: Use clz opcode
a642147 target-tricore: Use clz opcode
ef908df target-tilegx: Use clz and ctz opcodes
9befbbf target-s390x: Use clz opcode
08c3fec target-ppc: Use clz and ctz opcodes
02399d7 target-openrisc: Use clz and ctz opcodes
1c3d6ee target-mips: Use clz opcode
02d3b77 target-microblaze: Use clz opcode
a529119 target-cris: Use clz opcode
b3fcec6 target-alpha: Use the ctz and clz opcodes
fec845f disas/ppc: Handle popcnt and cnttz
56b7fce disas/i386.c: Handle tzcnt
fb98cd3 tcg: Add clz and ctz opcodes
6714ebd tcg: Allow an operand to be matching or a constant
5e21b56 tcg: Pass the opcode width to target_parse_constraint
8027172 tcg: Transition flat op_defs array to a target callback
f2ca9f9 tcg: Add markup for output requires new register
edd62f8 tcg/optimize: Fold movcond 0/1 into setcond
73c88ab target-s390x: Use the new deposit and extract ops
34a03f9 target-ppc: Use the new deposit and extract ops
9ce784b target-mips: Use the new extract op
b1a7872 target-i386: Use new deposit and extract ops
af37128 target-arm: Use new deposit and extract ops
7b44604 target-alpha: Use deposit and extract ops
941d1b3 tcg/s390: Support deposit into zero
2817b7a tcg/s390: Implement field extraction opcodes
d96c84d tcg/s390: Expose host facilities to tcg-target.h
0b19a3e tcg/ppc: Implement field extraction opcodes
ee17a4a tcg/mips: Implement field extraction opcodes
1c241a4 tcg/i386: Implement field extraction opcodes
8e64b7a tcg/arm: Implement field extraction opcodes
cf7b4c2 tcg/arm: Move isa detection to tcg-target.h
2580ec1 tcg/aarch64: Implement field extraction opcodes
1d8166c tcg: Add deposit_z expander
0850885 tcg: Minor adjustments to deposit expanders
886686d tcg: Add field extraction primitives

=== OUTPUT BEGIN ===
Checking PATCH 1/64: tcg: Add field extraction primitives...
ERROR: spaces required around that ':' (ctx:VxE)
#139: FILE: tcg/optimize.c:881:
+        CASE_OP_32_64(extract):
                               ^

ERROR: spaces required around that ':' (ctx:VxE)
#145: FILE: tcg/optimize.c:887:
+        CASE_OP_32_64(sextract):
                                ^

ERROR: spaces required around that ':' (ctx:VxE)
#159: FILE: tcg/optimize.c:1064:
+        CASE_OP_32_64(extract):
                               ^

ERROR: spaces required around that ':' (ctx:VxE)
#167: FILE: tcg/optimize.c:1072:
+        CASE_OP_32_64(sextract):
                                ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#271: FILE: tcg/tcg-op.c:582:
+        && TCG_TARGET_extract_i32_valid(ofs, len)) {
         ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#334: FILE: tcg/tcg-op.c:645:
+        && TCG_TARGET_extract_i32_valid(ofs, len)) {
         ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#420: FILE: tcg/tcg-op.c:1799:
+        && TCG_TARGET_extract_i64_valid(ofs, len)) {
         ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#526: FILE: tcg/tcg-op.c:1905:
+        && TCG_TARGET_extract_i64_valid(ofs, len)) {
         ^

total: 8 errors, 0 warnings, 599 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/64: tcg: Minor adjustments to deposit expanders...
Checking PATCH 3/64: tcg: Add deposit_z expander...
ERROR: space prohibited after that '&&' (ctx:ExW)
#33: FILE: tcg/tcg-op.c:577:
+               && TCG_TARGET_deposit_i32_valid(ofs, len)) {
                ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#98: FILE: tcg/tcg-op.c:1836:
+               && TCG_TARGET_deposit_i64_valid(ofs, len)) {
                ^

total: 2 errors, 0 warnings, 185 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 4/64: tcg/aarch64: Implement field extraction opcodes...
Checking PATCH 5/64: tcg/arm: Move isa detection to tcg-target.h...
WARNING: architecture specific defines should be avoided
#18: FILE: tcg/arm/tcg-target.h:30:
+#ifndef __ARM_ARCH

WARNING: architecture specific defines should be avoided
#19: FILE: tcg/arm/tcg-target.h:31:
+# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \

WARNING: architecture specific defines should be avoided
#38: FILE: tcg/arm/tcg-target.h:50:
+#if defined(__ARM_ARCH_5T__) \

total: 0 errors, 3 warnings, 107 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 6/64: tcg/arm: Implement field extraction opcodes...
Checking PATCH 7/64: tcg/i386: Implement field extraction opcodes...
Checking PATCH 8/64: tcg/mips: Implement field extraction opcodes...
Checking PATCH 9/64: tcg/ppc: Implement field extraction opcodes...
Checking PATCH 10/64: tcg/s390: Expose host facilities to tcg-target.h...
Checking PATCH 11/64: tcg/s390: Implement field extraction opcodes...
ERROR: spaces required around that ':' (ctx:VxE)
#52: FILE: tcg/s390/tcg-target.inc.c:2162:
+    OP_32_64(extract):
                      ^

total: 1 errors, 0 warnings, 51 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 12/64: tcg/s390: Support deposit into zero...
Checking PATCH 13/64: target-alpha: Use deposit and extract ops...
Checking PATCH 14/64: target-arm: Use new deposit and extract ops...
Checking PATCH 15/64: target-i386: Use new deposit and extract ops...
Checking PATCH 16/64: target-mips: Use the new extract op...
Checking PATCH 17/64: target-ppc: Use the new deposit and extract ops...
Checking PATCH 18/64: target-s390x: Use the new deposit and extract ops...
Checking PATCH 19/64: tcg/optimize: Fold movcond 0/1 into setcond...
Checking PATCH 20/64: tcg: Add markup for output requires new register...
Checking PATCH 21/64: tcg: Transition flat op_defs array to a target callback...
Checking PATCH 22/64: tcg: Pass the opcode width to target_parse_constraint...
ERROR: space required before the open parenthesis '('
#102: FILE: tcg/i386/tcg-target.inc.c:172:
+    switch(*ct_str++) {

ERROR: space required before the open parenthesis '('
#136: FILE: tcg/ia64/tcg-target.inc.c:727:
+    switch(*ct_str++) {

ERROR: space required before the open parenthesis '('
#170: FILE: tcg/mips/tcg-target.inc.c:173:
+    switch(*ct_str++) {

total: 3 errors, 0 warnings, 289 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 23/64: tcg: Allow an operand to be matching or a constant...
ERROR: space required before the open parenthesis '('
#51: FILE: tcg/tcg.c:1260:
+                switch(*ct_str) {

total: 1 errors, 0 warnings, 76 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 24/64: tcg: Add clz and ctz opcodes...
ERROR: spaces required around that ':' (ctx:VxE)
#188: FILE: tcg/optimize.c:1077:
+        CASE_OP_32_64(clz):
                           ^

ERROR: spaces required around that ':' (ctx:VxE)
#189: FILE: tcg/optimize.c:1078:
+        CASE_OP_32_64(ctz):
                           ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#378: FILE: tcg/tcg-op.c:1798:
+        && arg2 <= 0xffffffffu) {
         ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#415: FILE: tcg/tcg-op.c:1835:
+        && arg2 <= 0xffffffffu) {
         ^

total: 4 errors, 0 warnings, 446 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 25/64: disas/i386.c: Handle tzcnt...
ERROR: Macros with complex values should be enclosed in parenthesis
#17: FILE: disas/i386.c:685:
+#define PREGRP107 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 107 } }

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
             ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
               ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                 ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                   ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                     ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                       ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                         ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                           ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                             ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                               ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                                 ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                                   ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                                     ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                                       ^

ERROR: space required after that ',' (ctx:VxV)
#35: FILE: disas/i386.c:1435:
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
                                         ^

ERROR: code indent should never use tabs
#45: FILE: disas/i386.c:2806:
+    { "bsfS",^I{ Gv, Ev } },$

ERROR: code indent should never use tabs
#46: FILE: disas/i386.c:2807:
+    { "tzcntS",^I{ Gv, Ev } },$

ERROR: code indent should never use tabs
#47: FILE: disas/i386.c:2808:
+    { "bsfS",^I{ Gv, Ev } },$

ERROR: code indent should never use tabs
#48: FILE: disas/i386.c:2809:
+    { "(bad)",^I{ XX } },$

total: 20 errors, 0 warnings, 36 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 26/64: disas/ppc: Handle popcnt and cnttz...
ERROR: space required after that ',' (ctx:VxV)
#27: FILE: disas/ppc.c:3595:
+{ "popcntw", X(31,378), XRB_MASK,       POWER7,         { RA, RS } },
                  ^

ERROR: space required after that ',' (ctx:VxV)
#28: FILE: disas/ppc.c:3596:
+{ "popcntd", X(31,506), XRB_MASK,       POWER7,         { RA, RS } },
                  ^

ERROR: space required after that ',' (ctx:VxV)
#30: FILE: disas/ppc.c:3598:
+{ "cnttzw",  XRC(31,538,0), XRB_MASK,   POWER9,         { RA, RS } },
                    ^

ERROR: space required after that ',' (ctx:VxV)
#30: FILE: disas/ppc.c:3598:
+{ "cnttzw",  XRC(31,538,0), XRB_MASK,   POWER9,         { RA, RS } },
                        ^

ERROR: space required after that ',' (ctx:VxV)
#31: FILE: disas/ppc.c:3599:
+{ "cnttzw.", XRC(31,538,1), XRB_MASK,   POWER9,         { RA, RS } },
                    ^

ERROR: space required after that ',' (ctx:VxV)
#31: FILE: disas/ppc.c:3599:
+{ "cnttzw.", XRC(31,538,1), XRB_MASK,   POWER9,         { RA, RS } },
                        ^

ERROR: space required after that ',' (ctx:VxV)
#32: FILE: disas/ppc.c:3600:
+{ "cnttzd",  XRC(31,570,0), XRB_MASK,   POWER9,         { RA, RS } },
                    ^

ERROR: space required after that ',' (ctx:VxV)
#32: FILE: disas/ppc.c:3600:
+{ "cnttzd",  XRC(31,570,0), XRB_MASK,   POWER9,         { RA, RS } },
                        ^

ERROR: space required after that ',' (ctx:VxV)
#33: FILE: disas/ppc.c:3601:
+{ "cnttzd.", XRC(31,570,1), XRB_MASK,   POWER9,         { RA, RS } },
                    ^

ERROR: space required after that ',' (ctx:VxV)
#33: FILE: disas/ppc.c:3601:
+{ "cnttzd.", XRC(31,570,1), XRB_MASK,   POWER9,         { RA, RS } },
                        ^

total: 10 errors, 0 warnings, 22 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 27/64: target-alpha: Use the ctz and clz opcodes...
Checking PATCH 28/64: target-cris: Use clz opcode...
Checking PATCH 29/64: target-microblaze: Use clz opcode...
Checking PATCH 30/64: target-mips: Use clz opcode...
Checking PATCH 31/64: target-openrisc: Use clz and ctz opcodes...
Checking PATCH 32/64: target-ppc: Use clz and ctz opcodes...
Checking PATCH 33/64: target-s390x: Use clz opcode...
Checking PATCH 34/64: target-tilegx: Use clz and ctz opcodes...
Checking PATCH 35/64: target-tricore: Use clz opcode...
Checking PATCH 36/64: target-unicore32: Use clz opcode...
Checking PATCH 37/64: target-xtensa: Use clz opcode...
Checking PATCH 38/64: target-arm: Use clz opcode...
ERROR: trailing statements should be on next line
#122: FILE: target-arm/translate.c:7040:
+                            case 2: tcg_gen_clzi_i32(tmp, tmp, 32); break;

total: 1 errors, 0 warnings, 100 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 39/64: target-i386: Use clz and ctz opcodes...
Checking PATCH 40/64: tcg/ppc: Handle ctz and clz opcodes...
Checking PATCH 41/64: tcg/aarch64: Handle ctz and clz opcodes...
Checking PATCH 42/64: tcg/arm: Handle ctz and clz opcodes...
Checking PATCH 43/64: tcg/mips: Handle clz opcode...
Checking PATCH 44/64: tcg/s390: Handle clz opcode...
Checking PATCH 45/64: tcg/i386: Fuly convert tcg_target_op_def...
Checking PATCH 46/64: tcg/i386: Hoist common arguments in tcg_out_op...
Checking PATCH 47/64: tcg/i386: Allow bmi2 shiftx to have non-matching operands...
Checking PATCH 48/64: tcg/i386: Handle ctz and clz opcodes...
ERROR: spaces required around that ':' (ctx:VxE)
#219: FILE: tcg/i386/tcg-target.inc.c:2071:
+    OP_32_64(ctz):
                  ^

ERROR: spaces required around that ':' (ctx:VxE)
#222: FILE: tcg/i386/tcg-target.inc.c:2074:
+    OP_32_64(clz):
                  ^

total: 2 errors, 0 warnings, 237 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 49/64: tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR...
ERROR: spaces required around that ':' (ctx:VxW)
#50: FILE: tcg/i386/tcg-target.inc.c:1171:
+        TCGType type = rexw ? TCG_TYPE_I64: TCG_TYPE_I32;
                                           ^

total: 1 errors, 0 warnings, 65 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 50/64: tcg: Add helpers for clrsb...
Checking PATCH 51/64: target-arm: Use clrsb helper...
Checking PATCH 52/64: target-tricore: Use clrsb helper...
Checking PATCH 53/64: target-xtensa: Use clrsb helper...
Checking PATCH 54/64: tcg: Add opcode for ctpop...
ERROR: spaces required around that ':' (ctx:VxE)
#146: FILE: tcg/optimize.c:1047:
+        CASE_OP_32_64(ctpop):
                             ^

total: 1 errors, 0 warnings, 252 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 55/64: target-alpha: Use ctpop helper...
Checking PATCH 56/64: target-ppc: Use ctpop helper...
Checking PATCH 57/64: target-s390x: Avoid a loop for popcnt...
Checking PATCH 58/64: target-sparc: Use ctpop helper...
Checking PATCH 59/64: target-tilegx: Use ctpop helper...
Checking PATCH 60/64: target-i386: Use ctpop helper...
Checking PATCH 61/64: qemu/host-utils.h: Reduce the operation count in the fallback ctpop...
Checking PATCH 62/64: tcg: Use ctpop to generate ctz if needed...
ERROR: space prohibited after that '&&' (ctx:ExW)
#140: FILE: tcg/tcg-op.c:1886:
+               && TCG_TARGET_HAS_ctpop_i64
                ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#141: FILE: tcg/tcg-op.c:1887:
+               && arg2 == 64) {
                ^

total: 2 errors, 0 warnings, 132 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 63/64: tcg/ppc: Handle ctpop opcode...
Checking PATCH 64/64: tcg/i386: Handle ctpop opcode...
ERROR: spaces required around that ':' (ctx:VxE)
#67: FILE: tcg/i386/tcg-target.inc.c:2088:
+    OP_32_64(ctpop):
                    ^

total: 1 errors, 0 warnings, 67 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org


* Re: [Qemu-devel] [PATCH v4 14/64] target-arm: Use new deposit and extract ops
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 14/64] target-arm: Use new " Richard Henderson
@ 2016-12-01 17:19   ` Alex Bennée
  2016-12-03 21:01     ` Richard Henderson
  0 siblings, 1 reply; 102+ messages in thread
From: Alex Bennée @ 2016-12-01 17:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Use the new primitives for UBFX and SBFX.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-arm/translate-a64.c | 79 +++++++++++++++-------------------------------
>  target-arm/translate.c     | 37 +++++-----------------
>  2 files changed, 34 insertions(+), 82 deletions(-)
>
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index de48747..e90487b 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -3219,67 +3219,40 @@ static void disas_bitfield(DisasContext *s, uint32_t insn)
>         low 32-bits anyway.  */
>      tcg_tmp = read_cpu_reg(s, rn, 1);
>
> -    /* Recognize the common aliases.  */
> -    if (opc == 0) { /* SBFM */
> -        if (ri == 0) {
> -            if (si == 7) { /* SXTB */
> -                tcg_gen_ext8s_i64(tcg_rd, tcg_tmp);
> -                goto done;
> -            } else if (si == 15) { /* SXTH */
> -                tcg_gen_ext16s_i64(tcg_rd, tcg_tmp);
> -                goto done;
> -            } else if (si == 31) { /* SXTW */
> -                tcg_gen_ext32s_i64(tcg_rd, tcg_tmp);
> -                goto done;
> -            }
> -        }
> -        if (si == 63 || (si == 31 && ri <= si)) { /* ASR */
> -            if (si == 31) {
> -                tcg_gen_ext32s_i64(tcg_tmp, tcg_tmp);
> -            }
> -            tcg_gen_sari_i64(tcg_rd, tcg_tmp, ri);
> +    /* Recognize simple(r) extractions.  */
> +    if (ri <= si) {
> +        int len = (si - ri) + 1;

This is confusing, as the new declaration now shadows the len declared above.

> +        if (opc == 0) { /* SBFM: ASR, SBFX, SXTB, SXTH, SXTW */
> +            tcg_gen_sextract_i64(tcg_rd, tcg_tmp, ri, len);
>              goto done;
> -        }
> -    } else if (opc == 2) { /* UBFM */
> -        if (ri == 0) { /* UXTB, UXTH, plus non-canonical AND */
> -            tcg_gen_andi_i64(tcg_rd, tcg_tmp, bitmask64(si + 1));
> -            return;
> -        }
> -        if (si == 63 || (si == 31 && ri <= si)) { /* LSR */
> -            if (si == 31) {
> -                tcg_gen_ext32u_i64(tcg_tmp, tcg_tmp);
> -            }
> -            tcg_gen_shri_i64(tcg_rd, tcg_tmp, ri);
> +        } else if (opc == 2) { /* UBFM: UBFX, LSR, UXTB, UXTH */
> +            tcg_gen_extract_i64(tcg_rd, tcg_tmp, ri, len);
>              return;
>          }
> -        if (si + 1 == ri && si != bitsize - 1) { /* LSL */
> -            int shift = bitsize - 1 - si;
> -            tcg_gen_shli_i64(tcg_rd, tcg_tmp, shift);
> -            goto done;
> -        }
>      }
>
> -    if (opc != 1) { /* SBFM or UBFM */
> -        tcg_gen_movi_i64(tcg_rd, 0);
> -    }
> +    /* Do the bit move operation.  Note that above we handled ri <= si,
> +       Wd<s-r:0> = Wn<s:r>, via tcg_gen_*extract_i64.  Now we handle
> +       the ri > si case, Wd<32+s-r,32-r> = Wn<s:0>, via deposit.  */
> +    pos = (bitsize - ri) & (bitsize - 1);
> +    len = si + 1;

The comment implies this is for the ri > si case but you'll still catch
ri <= si for opc = 1, e.g.:

  0x331168a7      bfxil w7, w5, #17, #10

>
> -    /* do the bit move operation */
> -    if (si >= ri) {

In fact we seem to have subtly reversed the test here but ri <= si is
not exactly equivalent to si >= ri.

My version is as follows:

    /* Recognize simple(r) extractions.  */
    if (si >= ri) {
        /* Wd<s-r:0> = Wn<s:r> */
        len = (si - ri) + 1;
        if (opc == 0) { /* SBFM: ASR, SBFX, SXTB, SXTH, SXTW */
            tcg_gen_sextract_i64(tcg_rd, tcg_tmp, ri, len);
            goto done;
        } else if (opc == 2) { /* UBFM: UBFX, LSR, UXTB, UXTH */
            tcg_gen_extract_i64(tcg_rd, tcg_tmp, ri, len);
            return;
        }
        /* opc == 1, BFXIL fall through to deposit */
        tcg_gen_extract_i64(tcg_tmp, tcg_tmp, ri, len);
        pos = 0;
    } else {
        /* Handle the ri > si case with a deposit
         * Wd<32+s-r,32-r> = Wn<s:0>
         */
        len = si + 1;
        pos = (bitsize - ri) & (bitsize - 1);
    }
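
To make the two cases concrete, here is a small reference model of the 32-bit BFM operation, written as a sketch rather than as the QEMU implementation. The function names (extract32_ref, deposit32_ref, bfm32_ref) are hypothetical. For the example encoding above, immr = 17 and imms = 26, so len = 10 and the field must land at bit 0 of Wd; the pos/len formulas from the ri > si comment would instead give pos = 15 and len = 27.

```c
#include <assert.h>
#include <stdint.h>

/* Low-order mask of len bits, 1 <= len <= 32. */
static uint32_t mask32(unsigned len)
{
    return len == 32 ? ~0u : (1u << len) - 1;
}

/* Unsigned extract of bits [ofs, ofs+len) from v. */
static uint32_t extract32_ref(uint32_t v, unsigned ofs, unsigned len)
{
    return (v >> ofs) & mask32(len);
}

/* Insert the low len bits of field into base at bit position pos. */
static uint32_t deposit32_ref(uint32_t base, uint32_t field,
                              unsigned pos, unsigned len)
{
    uint32_t mask = mask32(len) << pos;
    return (base & ~mask) | ((field << pos) & mask);
}

/* BFM Wd, Wn, #immr, #imms: bits of Wd outside the field are preserved. */
static uint32_t bfm32_ref(uint32_t rd, uint32_t rn,
                          unsigned immr, unsigned imms)
{
    if (imms >= immr) {
        /* BFXIL alias: Wd<s-r:0> = Wn<s:r> */
        unsigned len = imms - immr + 1;
        return deposit32_ref(rd, extract32_ref(rn, immr, len), 0, len);
    } else {
        /* BFI alias: Wd<32+s-r:32-r> = Wn<s:0> */
        unsigned len = imms + 1;
        unsigned pos = (32 - immr) & 31;
        return deposit32_ref(rd, rn, pos, len);
    }
}
```

A model like this can be compared against the translated result for a given immr/imms pair, in the same spirit as the risu runs mentioned below.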

I've tested that with risu and all the bitfield tests seem ok. The full
patch on top of your commit was:

target-arm: fix bfxil case

1 file changed, 13 insertions(+), 9 deletions(-)
target-arm/translate-a64.c | 22 +++++++++++++---------

modified   target-arm/translate-a64.c
@@ -3220,8 +3220,9 @@ static void disas_bitfield(DisasContext *s, uint32_t insn)
     tcg_tmp = read_cpu_reg(s, rn, 1);

     /* Recognize simple(r) extractions.  */
-    if (ri <= si) {
-        int len = (si - ri) + 1;
+    if (si >= ri) {
+        /* Wd<s-r:0> = Wn<s:r> */
+        len = (si - ri) + 1;
         if (opc == 0) { /* SBFM: ASR, SBFX, SXTB, SXTH, SXTW */
             tcg_gen_sextract_i64(tcg_rd, tcg_tmp, ri, len);
             goto done;
@@ -3229,14 +3230,17 @@ static void disas_bitfield(DisasContext *s, uint32_t insn)
             tcg_gen_extract_i64(tcg_rd, tcg_tmp, ri, len);
             return;
         }
+        /* opc == 1, BFXIL fall through to deposit */
+        tcg_gen_extract_i64(tcg_tmp, tcg_tmp, ri, len);
+        pos = 0;
+    } else {
+        /* Handle the ri > si case with a deposit
+         * Wd<32+s-r,32-r> = Wn<s:0>
+         */
+        len = si + 1;
+        pos = (bitsize - ri) & (bitsize - 1);
     }

-    /* Do the bit move operation.  Note that above we handled ri <= si,
-       Wd<s-r:0> = Wn<s:r>, via tcg_gen_*extract_i64.  Now we handle
-       the ri > si case, Wd<32+s-r,32-r> = Wn<s:0>, via deposit.  */
-    pos = (bitsize - ri) & (bitsize - 1);
-    len = si + 1;
-
     if (opc == 0 && len < ri) {
         /* SBFM: sign extend the destination field from len to fill
            the balance of the word.  Let the deposit below insert all
@@ -3245,7 +3249,7 @@ static void disas_bitfield(DisasContext *s, uint32_t insn)
         len = ri;
     }

-    if (opc == 1) { /* BFM */
+    if (opc == 1) { /* BFM, BFXIL */
         tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, pos, len);
     } else {
         /* SBFM or UBFM: We start with zero, and we haven't modified
--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 41/64] tcg/aarch64: Handle ctz and clz opcodes
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 41/64] tcg/aarch64: " Richard Henderson
@ 2016-12-01 18:36   ` Alex Bennée
  2016-12-01 18:44     ` Richard Henderson
  0 siblings, 1 reply; 102+ messages in thread
From: Alex Bennée @ 2016-12-01 18:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.h     |  8 ++++----
>  tcg/aarch64/tcg-target.inc.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 51 insertions(+), 4 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 976f493..9d6b00f 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -62,8 +62,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_eqv_i32          1
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> -#define TCG_TARGET_HAS_clz_i32          0
> -#define TCG_TARGET_HAS_ctz_i32          0
> +#define TCG_TARGET_HAS_clz_i32          1
> +#define TCG_TARGET_HAS_ctz_i32          1
>  #define TCG_TARGET_HAS_deposit_i32      1
>  #define TCG_TARGET_HAS_extract_i32      1
>  #define TCG_TARGET_HAS_sextract_i32     1
> @@ -96,8 +96,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_eqv_i64          1
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
> -#define TCG_TARGET_HAS_clz_i64          0
> -#define TCG_TARGET_HAS_ctz_i64          0
> +#define TCG_TARGET_HAS_clz_i64          1
> +#define TCG_TARGET_HAS_ctz_i64          1
>  #define TCG_TARGET_HAS_deposit_i64      1
>  #define TCG_TARGET_HAS_extract_i64      1
>  #define TCG_TARGET_HAS_sextract_i64     1
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 17c0b20..91345fc 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -201,6 +201,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
>      if ((ct & TCG_CT_CONST_MONE) && val == -1) {
>          return 1;
>      }
> +    if ((ct & TCG_CT_CONST_WSZ) && val == (type ? 64 : 32)) {
> +        return 1;
> +    }
>

Did this sneak in again? This breaks the aarch64 build due to the missing
constant.

>      return 0;
>  }
> @@ -339,8 +342,12 @@ typedef enum {
>      /* Conditional select instructions.  */
>      I3506_CSEL      = 0x1a800000,
>      I3506_CSINC     = 0x1a800400,
> +    I3506_CSINV     = 0x5a800000,
> +    I3506_CSNEG     = 0x5a800400,
>
>      /* Data-processing (1 source) instructions.  */
> +    I3507_CLZ       = 0x5ac01000,
> +    I3507_RBIT      = 0x5ac00000,
>      I3507_REV16     = 0x5ac00400,
>      I3507_REV32     = 0x5ac00800,
>      I3507_REV64     = 0x5ac00c00,
> @@ -993,6 +1000,32 @@ static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
>      tcg_out32(s, sync[a0 & TCG_MO_ALL]);
>  }
>
> +static void tcg_out_clz(TCGContext *s, TCGType ext, TCGReg d,
> +                        TCGReg a, TCGArg b, bool const_b)
> +{
> +    if (const_b && b == (ext ? 64 : 32)) {
> +        tcg_out_insn(s, 3507, CLZ, ext, d, a);
> +    } else {
> +        AArch64Insn sel = I3506_CSEL;
> +
> +        tcg_out_cmp(s, ext, a, 0, 1);
> +        tcg_out_insn(s, 3507, CLZ, ext, TCG_REG_TMP, a);
> +
> +        if (const_b) {
> +            if (b == -1) {
> +                b = TCG_REG_XZR;
> +                sel = I3506_CSINV;
> +            } else if (b == 0) {
> +                b = TCG_REG_XZR;
> +            } else {
> +                tcg_out_movi(s, ext, d, b);
> +                b = d;
> +            }
> +        }
> +        tcg_out_insn_3506(s, sel, ext, d, TCG_REG_TMP, b, TCG_COND_NE);
> +    }
> +}
> +
>  #ifdef CONFIG_SOFTMMU
>  /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
>   *                                     TCGMemOpIdx oi, uintptr_t ra)
> @@ -1559,6 +1592,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          }
>          break;
>
> +    case INDEX_op_clz_i64:
> +    case INDEX_op_clz_i32:
> +        tcg_out_clz(s, ext, a0, a1, a2, c2);
> +        break;
> +    case INDEX_op_ctz_i64:
> +    case INDEX_op_ctz_i32:
> +        tcg_out_insn(s, 3507, RBIT, ext, TCG_REG_TMP, a1);
> +        tcg_out_clz(s, ext, a0, TCG_REG_TMP, a2, c2);
> +        break;
> +
>      case INDEX_op_brcond_i32:
>          a1 = (int32_t)a1;
>          /* FALLTHRU */
> @@ -1750,11 +1793,15 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
>      { INDEX_op_sar_i32, { "r", "r", "ri" } },
>      { INDEX_op_rotl_i32, { "r", "r", "ri" } },
>      { INDEX_op_rotr_i32, { "r", "r", "ri" } },
> +    { INDEX_op_clz_i32, { "r", "r", "rAL" } },
> +    { INDEX_op_ctz_i32, { "r", "r", "rAL" } },
>      { INDEX_op_shl_i64, { "r", "r", "ri" } },
>      { INDEX_op_shr_i64, { "r", "r", "ri" } },
>      { INDEX_op_sar_i64, { "r", "r", "ri" } },
>      { INDEX_op_rotl_i64, { "r", "r", "ri" } },
>      { INDEX_op_rotr_i64, { "r", "r", "ri" } },
> +    { INDEX_op_clz_i64, { "r", "r", "rAL" } },
> +    { INDEX_op_ctz_i64, { "r", "r", "rAL" } },
>
>      { INDEX_op_brcond_i32, { "r", "rA" } },
>      { INDEX_op_brcond_i64, { "r", "rA" } },
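
For anyone following the emitter above, here is a minimal C model of what the CMP/CLZ/CSEL sequence appears to compute. It assumes the TCG clz/ctz ops take a third operand that supplies the result when the input is zero. The model_* names are made up for illustration; they are not QEMU functions.

```c
#include <stdint.h>

/* Hypothetical model of AArch64 CLZ on a 32-bit register:
 * counts leading zero bits and yields 32 for a zero input. */
static uint32_t model_clz32(uint32_t a)
{
    uint32_t n = 0;

    if (a == 0) {
        return 32;
    }
    while ((a & 0x80000000u) == 0) {
        a <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical model of AArch64 RBIT: reverse the bit order. */
static uint32_t model_rbit32(uint32_t a)
{
    uint32_t r = 0;

    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (a & 1);
        a >>= 1;
    }
    return r;
}

/* clz with fallback b, mirroring: CMP a, #0; CLZ tmp, a; CSEL d, tmp, b, NE */
static uint32_t model_tcg_clz32(uint32_t a, uint32_t b)
{
    uint32_t tmp = model_clz32(a);

    return a != 0 ? tmp : b;
}

/* ctz with fallback b, mirroring: RBIT tmp, a; then the clz sequence on tmp.
 * The reversed value is zero exactly when a is zero. */
static uint32_t model_tcg_ctz32(uint32_t a, uint32_t b)
{
    return model_tcg_clz32(model_rbit32(a), b);
}
```

When b equals the register width the select is redundant, because the bare CLZ already returns that value for a zero input. That is the case the first branch of tcg_out_clz handles.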


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 41/64] tcg/aarch64: Handle ctz and clz opcodes
  2016-12-01 18:36   ` Alex Bennée
@ 2016-12-01 18:44     ` Richard Henderson
  0 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-12-01 18:44 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 12/01/2016 10:36 AM, Alex Bennée wrote:
> Did this sneak in again? This breaks the aarch64 build due to the missing
> constant.

Gah, it may have.  I probably forgot to push the change from the rh pool
machine that I had checked out.


r~


* Re: [Qemu-devel] [PATCH v4 14/64] target-arm: Use new deposit and extract ops
  2016-12-01 17:19   ` Alex Bennée
@ 2016-12-03 21:01     ` Richard Henderson
  0 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-12-03 21:01 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 12/01/2016 09:19 AM, Alex Bennée wrote:
> In fact we seem to have subtly reversed the test here but ri <= si is
> not exactly equivalent to si >= ri.
>
> My version is as follows:
>
>     /* Recognize simple(r) extractions.  */
>     if (si >= ri) {
>         /* Wd<s-r:0> = Wn<s:r> */
>         len = (si - ri) + 1;
>         if (opc == 0) { /* SBFM: ASR, SBFX, SXTB, SXTH, SXTW */
>             tcg_gen_sextract_i64(tcg_rd, tcg_tmp, ri, len);
>             goto done;
>         } else if (opc == 2) { /* UBFM: UBFX, LSR, UXTB, UXTH */
>             tcg_gen_extract_i64(tcg_rd, tcg_tmp, ri, len);
>             return;
>         }
>         /* opc == 1, BFXIL fall through to deposit */
>         tcg_gen_extract_i64(tcg_tmp, tcg_tmp, ri, len);
>         pos = 0;
>     } else {
>         /* Handle the ri > si case with a deposit
>          * Wd<32+s-r,32-r> = Wn<s:0>
>          */
>         len = si + 1;
>         pos = (bitsize - ri) & (bitsize - 1);
>     }
>
> I've tested that with risu and all the bitfield tests seem ok.

Thanks.  Merged.
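
For reference, a minimal C model of the two cases in the quoted code, restricted to the unsigned form (UBFM) at 64 bits. It is only a sketch of the extract/deposit split discussed above, assuming ri and si are both in the range 0..63. The names model_ubfm64 and low_mask64 are hypothetical and do not exist in QEMU.

```c
#include <stdint.h>

/* Hypothetical helper: mask with the low `len` bits set, len in 1..64. */
static uint64_t low_mask64(unsigned len)
{
    return len >= 64 ? UINT64_MAX : (1ull << len) - 1;
}

/* Hypothetical model of UBFM Xd, Xn, #ri, #si with ri, si in 0..63. */
static uint64_t model_ubfm64(uint64_t src, unsigned ri, unsigned si)
{
    if (si >= ri) {
        /* Extraction: Xd<s-r:0> = Xn<s:r> */
        unsigned len = si - ri + 1;

        return (src >> ri) & low_mask64(len);
    } else {
        /* Deposit into zero: the low si+1 bits of Xn land at bit 64-ri.
         * Here ri > si >= 0, so pos is in 1..63 and pos + len <= 64. */
        unsigned len = si + 1;
        unsigned pos = (64 - ri) & 63;

        return (src & low_mask64(len)) << pos;
    }
}
```

With ri=4, si=11 this picks out bits 11:4 of the source; with ri=60, si=7 it places the low 8 bits of the source at bit 4.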


r~


* Re: [Qemu-devel] [PATCH v4 01/64] tcg: Add field extraction primitives
  2016-11-23 13:00 ` [Qemu-devel] [PATCH v4 01/64] tcg: Add field extraction primitives Richard Henderson
@ 2016-12-05 13:17   ` Alex Bennée
  2016-12-05 15:14     ` Richard Henderson
  0 siblings, 1 reply; 102+ messages in thread
From: Alex Bennée @ 2016-12-05 13:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Adds tcg_gen_extract_* and tcg_gen_sextract_* for extraction of
> fixed position bitfields, much like we already have for deposit.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/README               |  20 ++-
>  tcg/aarch64/tcg-target.h |   4 +
>  tcg/arm/tcg-target.h     |   2 +
>  tcg/i386/tcg-target.h    |   4 +
>  tcg/ia64/tcg-target.h    |   4 +
>  tcg/mips/tcg-target.h    |   2 +
>  tcg/optimize.c           |  29 +++++
>  tcg/ppc/tcg-target.h     |   4 +
>  tcg/s390/tcg-target.h    |   4 +
>  tcg/sparc/tcg-target.h   |   4 +
>  tcg/tcg-op.c             | 323 +++++++++++++++++++++++++++++++++++++++++++++++
>  tcg/tcg-op.h             |  12 ++
>  tcg/tcg-opc.h            |   4 +
>  tcg/tcg.h                |   8 ++
>  tcg/tci/tcg-target.h     |   4 +
>  15 files changed, 426 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/README b/tcg/README
> index ae31388..065d9c2 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -314,11 +314,27 @@ The bitfield is described by POS/LEN, which are immediate values:
>    LEN - the length of the bitfield
>    POS - the position of the first bit, counting from the LSB
>
> -For example, pos=8, len=4 indicates a 4-bit field at bit 8.
> -This operation would be equivalent to
> +For example, "deposit_i32 dest, t1, t2, 8, 4" indicates a 4-bit field
> +at bit 8.  This operation would be equivalent to
>
>    dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
>
> +* extract_i32/i64 dest, t1, pos, len
> +* sextract_i32/i64 dest, t1, pos, len
> +
> +Extract a bitfield from T1, placing the result in DEST.
> +The bitfield is described by POS/LEN, which are immediate values,
> +as above for deposit.  For extract_*, the result will be extended
> +to the left with zeros; for sextract_*, the result will be extended
> +to the left with copies of the bitfield sign bit at pos + len - 1.
> +
> +For example, "sextract_i32 dest, t1, 8, 4" indicates a 4-bit field
> +at bit 8.  This operation would be equivalent to
> +
> +  dest = (t1 << 20) >> 28
> +
> +(using an arithmetic right shift).
> +
>  * extrl_i64_i32 t0, t1
>
>  For 64-bit hosts only, extract the low 32-bits of input T1 and place it
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index a1d101f..410c31b 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -63,6 +63,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
> +#define TCG_TARGET_HAS_extract_i32      0
> +#define TCG_TARGET_HAS_sextract_i32     0
>  #define TCG_TARGET_HAS_movcond_i32      1
>  #define TCG_TARGET_HAS_add2_i32         1
>  #define TCG_TARGET_HAS_sub2_i32         1
> @@ -93,6 +95,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      1
> +#define TCG_TARGET_HAS_extract_i64      0
> +#define TCG_TARGET_HAS_sextract_i64     0
>  #define TCG_TARGET_HAS_movcond_i64      1
>  #define TCG_TARGET_HAS_add2_i64         1
>  #define TCG_TARGET_HAS_sub2_i64         1
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index a0e1acf..8e724be 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -80,6 +80,8 @@ extern bool use_idiv_instructions;
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
> +#define TCG_TARGET_HAS_extract_i32      0
> +#define TCG_TARGET_HAS_sextract_i32     0
>  #define TCG_TARGET_HAS_movcond_i32      1
>  #define TCG_TARGET_HAS_mulu2_i32        1
>  #define TCG_TARGET_HAS_muls2_i32        1
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 524cfc6..7625188 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -94,6 +94,8 @@ extern bool have_bmi1;
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
> +#define TCG_TARGET_HAS_extract_i32      0
> +#define TCG_TARGET_HAS_sextract_i32     0
>  #define TCG_TARGET_HAS_movcond_i32      1
>  #define TCG_TARGET_HAS_add2_i32         1
>  #define TCG_TARGET_HAS_sub2_i32         1
> @@ -124,6 +126,8 @@ extern bool have_bmi1;
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      1
> +#define TCG_TARGET_HAS_extract_i64      0
> +#define TCG_TARGET_HAS_sextract_i64     0
>  #define TCG_TARGET_HAS_movcond_i64      1
>  #define TCG_TARGET_HAS_add2_i64         1
>  #define TCG_TARGET_HAS_sub2_i64         1
> diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
> index 6dddb7f..8856dc8 100644
> --- a/tcg/ia64/tcg-target.h
> +++ b/tcg/ia64/tcg-target.h
> @@ -149,6 +149,10 @@ typedef enum {
>  #define TCG_TARGET_HAS_movcond_i64      1
>  #define TCG_TARGET_HAS_deposit_i32      1
>  #define TCG_TARGET_HAS_deposit_i64      1
> +#define TCG_TARGET_HAS_extract_i32      0
> +#define TCG_TARGET_HAS_extract_i64      0
> +#define TCG_TARGET_HAS_sextract_i32     0
> +#define TCG_TARGET_HAS_sextract_i64     0
>  #define TCG_TARGET_HAS_add2_i32         0
>  #define TCG_TARGET_HAS_add2_i64         0
>  #define TCG_TARGET_HAS_sub2_i32         0
> diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
> index 3aeac87..1bcea3b 100644
> --- a/tcg/mips/tcg-target.h
> +++ b/tcg/mips/tcg-target.h
> @@ -123,6 +123,8 @@ extern bool use_mips32r2_instructions;
>  #define TCG_TARGET_HAS_bswap16_i32      use_mips32r2_instructions
>  #define TCG_TARGET_HAS_bswap32_i32      use_mips32r2_instructions
>  #define TCG_TARGET_HAS_deposit_i32      use_mips32r2_instructions
> +#define TCG_TARGET_HAS_extract_i32      0
> +#define TCG_TARGET_HAS_sextract_i32     0
>  #define TCG_TARGET_HAS_ext8s_i32        use_mips32r2_instructions
>  #define TCG_TARGET_HAS_ext16s_i32       use_mips32r2_instructions
>  #define TCG_TARGET_HAS_rot_i32          use_mips32r2_instructions
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 0f13490..f41ed2c 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -878,6 +878,19 @@ void tcg_optimize(TCGContext *s)
>                               temps[args[2]].mask);
>              break;
>
> +        CASE_OP_32_64(extract):
> +            mask = extract64(temps[args[1]].mask, args[2], args[3]);
> +            if (args[2] == 0) {
> +                affected = temps[args[1]].mask & ~mask;
> +            }
> +            break;
> +        CASE_OP_32_64(sextract):
> +            mask = sextract64(temps[args[1]].mask, args[2], args[3]);
> +            if (args[2] == 0 && (tcg_target_long)mask >= 0) {
> +                affected = temps[args[1]].mask & ~mask;
> +            }
> +            break;
> +
>          CASE_OP_32_64(or):
>          CASE_OP_32_64(xor):
>              mask = temps[args[1]].mask | temps[args[2]].mask;
> @@ -1048,6 +1061,22 @@ void tcg_optimize(TCGContext *s)
>              }
>              goto do_default;
>
> +        CASE_OP_32_64(extract):
> +            if (temp_is_const(args[1])) {
> +                tmp = extract64(temps[args[1]].val, args[2], args[3]);
> +                tcg_opt_gen_movi(s, op, args, args[0], tmp);
> +                break;
> +            }
> +            goto do_default;
> +
> +        CASE_OP_32_64(sextract):
> +            if (temp_is_const(args[1])) {
> +                tmp = sextract64(temps[args[1]].val, args[2], args[3]);
> +                tcg_opt_gen_movi(s, op, args, args[0], tmp);
> +                break;
> +            }
> +            goto do_default;
> +
>          CASE_OP_32_64(setcond):
>              tmp = do_constant_folding_cond(opc, args[1], args[2], args[3]);
>              if (tmp != 2) {
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index dd032f2..c765d3e 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -69,6 +69,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i32         1
>  #define TCG_TARGET_HAS_nor_i32          1
>  #define TCG_TARGET_HAS_deposit_i32      1
> +#define TCG_TARGET_HAS_extract_i32      0
> +#define TCG_TARGET_HAS_sextract_i32     0
>  #define TCG_TARGET_HAS_movcond_i32      1
>  #define TCG_TARGET_HAS_mulu2_i32        0
>  #define TCG_TARGET_HAS_muls2_i32        0
> @@ -100,6 +102,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i64         1
>  #define TCG_TARGET_HAS_nor_i64          1
>  #define TCG_TARGET_HAS_deposit_i64      1
> +#define TCG_TARGET_HAS_extract_i64      0
> +#define TCG_TARGET_HAS_sextract_i64     0
>  #define TCG_TARGET_HAS_movcond_i64      1
>  #define TCG_TARGET_HAS_add2_i64         1
>  #define TCG_TARGET_HAS_sub2_i64         1
> diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
> index 0c1af24..9583df4 100644
> --- a/tcg/s390/tcg-target.h
> +++ b/tcg/s390/tcg-target.h
> @@ -66,6 +66,8 @@ typedef enum TCGReg {
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
> +#define TCG_TARGET_HAS_extract_i32      0
> +#define TCG_TARGET_HAS_sextract_i32     0
>  #define TCG_TARGET_HAS_movcond_i32      1
>  #define TCG_TARGET_HAS_add2_i32         1
>  #define TCG_TARGET_HAS_sub2_i32         1
> @@ -95,6 +97,8 @@ typedef enum TCGReg {
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      1
> +#define TCG_TARGET_HAS_extract_i64      0
> +#define TCG_TARGET_HAS_sextract_i64     0
>  #define TCG_TARGET_HAS_movcond_i64      1
>  #define TCG_TARGET_HAS_add2_i64         1
>  #define TCG_TARGET_HAS_sub2_i64         1
> diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
> index 88f9c90..a212167 100644
> --- a/tcg/sparc/tcg-target.h
> +++ b/tcg/sparc/tcg-target.h
> @@ -111,6 +111,8 @@ extern bool use_vis3_instructions;
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      0
> +#define TCG_TARGET_HAS_extract_i32      0
> +#define TCG_TARGET_HAS_sextract_i32     0
>  #define TCG_TARGET_HAS_movcond_i32      1
>  #define TCG_TARGET_HAS_add2_i32         1
>  #define TCG_TARGET_HAS_sub2_i32         1
> @@ -141,6 +143,8 @@ extern bool use_vis3_instructions;
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      0
> +#define TCG_TARGET_HAS_extract_i64      0
> +#define TCG_TARGET_HAS_sextract_i64     0
>  #define TCG_TARGET_HAS_movcond_i64      1
>  #define TCG_TARGET_HAS_add2_i64         1
>  #define TCG_TARGET_HAS_sub2_i64         1
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 6e2fb35..c185b9c 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -560,6 +560,131 @@ void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
>      tcg_temp_free_i32(t1);
>  }
>
> +void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
> +                         unsigned int ofs, unsigned int len)
> +{
> +    tcg_debug_assert(ofs < 32);
> +    tcg_debug_assert(len > 0);
> +    tcg_debug_assert(len <= 32);
> +    tcg_debug_assert(ofs + len <= 32);
> +
> +    /* Canonicalize certain special cases, even if extract is supported.  */
> +    if (ofs + len == 32) {
> +        tcg_gen_shri_i32(ret, arg, 32 - len);
> +        return;
> +    }
> +    if (ofs == 0) {
> +        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
> +        return;
> +    }
> +
> +    if (TCG_TARGET_HAS_extract_i32
> +        && TCG_TARGET_extract_i32_valid(ofs, len)) {
> +        tcg_gen_op4ii_i32(INDEX_op_extract_i32, ret, arg, ofs, len);
> +        return;
> +    }
> +
> +    /* Assume that zero-extension, if available, is cheaper than a shift.  */
> +    switch (ofs + len) {
> +    case 16:
> +        if (TCG_TARGET_HAS_ext16u_i32) {
> +            tcg_gen_ext16u_i32(ret, arg);
> +            tcg_gen_shri_i32(ret, ret, ofs);
> +            return;
> +        }
> +        break;
> +    case 8:
> +        if (TCG_TARGET_HAS_ext8u_i32) {
> +            tcg_gen_ext8u_i32(ret, arg);
> +            tcg_gen_shri_i32(ret, ret, ofs);
> +            return;
> +        }
> +        break;
> +    }
> +
> +    /* ??? Ideally we'd know what values are available for immediate AND.
> +       Assume that 8 bits are available, plus the special case of 16,
> +       so that we get ext8u, ext16u.  */
> +    switch (len) {
> +    case 1 ... 8: case 16:
> +        tcg_gen_shri_i32(ret, arg, ofs);
> +        tcg_gen_andi_i32(ret, ret, (1u << len) - 1);
> +        break;
> +    default:
> +        tcg_gen_shli_i32(ret, arg, 32 - len - ofs);
> +        tcg_gen_shri_i32(ret, ret, 32 - len);
> +        break;
> +    }

Hmm, is this starting to make a case for backend-specific optimisation
passes, which have a better idea of the code that can be generated, or
for exposing a TCG_TARGET_HAS_8IMM_BITS or some such from the backend to
the generators?

> +}
> +
> +void tcg_gen_sextract_i32(TCGv_i32 ret, TCGv_i32 arg,
> +                          unsigned int ofs, unsigned int len)
> +{
> +    tcg_debug_assert(ofs < 32);
> +    tcg_debug_assert(len > 0);
> +    tcg_debug_assert(len <= 32);
> +    tcg_debug_assert(ofs + len <= 32);
> +
> +    /* Canonicalize certain special cases, even if extract is supported.  */
> +    if (ofs + len == 32) {
> +        tcg_gen_sari_i32(ret, arg, 32 - len);
> +        return;
> +    }
> +    if (ofs == 0) {
> +        switch (len) {
> +        case 16:
> +            tcg_gen_ext16s_i32(ret, arg);
> +            return;
> +        case 8:
> +            tcg_gen_ext8s_i32(ret, arg);
> +            return;
> +        }
> +    }
> +
> +    if (TCG_TARGET_HAS_sextract_i32
> +        && TCG_TARGET_extract_i32_valid(ofs, len)) {
> +        tcg_gen_op4ii_i32(INDEX_op_sextract_i32, ret, arg, ofs, len);
> +        return;
> +    }
> +
> +    /* Assume that sign-extension, if available, is cheaper than a shift.  */
> +    switch (ofs + len) {
> +    case 16:
> +        if (TCG_TARGET_HAS_ext16s_i32) {
> +            tcg_gen_ext16s_i32(ret, arg);
> +            tcg_gen_sari_i32(ret, ret, ofs);
> +            return;
> +        }
> +        break;
> +    case 8:
> +        if (TCG_TARGET_HAS_ext8s_i32) {
> +            tcg_gen_ext8s_i32(ret, arg);
> +            tcg_gen_sari_i32(ret, ret, ofs);
> +            return;
> +        }
> +        break;
> +    }
> +    switch (len) {
> +    case 16:
> +        if (TCG_TARGET_HAS_ext16s_i32) {
> +            tcg_gen_shri_i32(ret, arg, ofs);
> +            tcg_gen_ext16s_i32(ret, ret);
> +            return;
> +        }
> +        break;
> +    case 8:
> +        if (TCG_TARGET_HAS_ext8s_i32) {
> +            tcg_gen_shri_i32(ret, arg, ofs);
> +            tcg_gen_ext8s_i32(ret, ret);
> +            return;
> +        }
> +        break;
> +    }
> +
> +    tcg_gen_shli_i32(ret, arg, 32 - len - ofs);
> +    tcg_gen_sari_i32(ret, ret, 32 - len);
> +}
> +
>  void tcg_gen_movcond_i32(TCGCond cond, TCGv_i32 ret, TCGv_i32 c1,
>                           TCGv_i32 c2, TCGv_i32 v1, TCGv_i32 v2)
>  {
> @@ -1635,6 +1760,204 @@ void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
>      tcg_temp_free_i64(t1);
>  }
>
> +void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
> +                         unsigned int ofs, unsigned int len)
> +{
> +    tcg_debug_assert(ofs < 64);
> +    tcg_debug_assert(len > 0);
> +    tcg_debug_assert(len <= 64);
> +    tcg_debug_assert(ofs + len <= 64);
> +
> +    /* Canonicalize certain special cases, even if extract is supported.  */
> +    if (ofs + len == 64) {
> +        tcg_gen_shri_i64(ret, arg, 64 - len);
> +        return;
> +    }
> +    if (ofs == 0) {
> +        tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
> +        return;
> +    }
> +
> +    if (TCG_TARGET_REG_BITS == 32) {
> +        /* Look for a 32-bit extract within one of the two words.  */
> +        if (ofs >= 32) {
> +            tcg_gen_extract_i32(TCGV_LOW(ret), TCGV_HIGH(arg), ofs - 32, len);
> +            tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
> +            return;
> +        }
> +        if (ofs + len <= 32) {
> +            tcg_gen_extract_i32(TCGV_LOW(ret), TCGV_LOW(arg), ofs, len);
> +            tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
> +            return;
> +        }
> +        /* The field is split across two words.  One double-word
> +           shift is better than two double-word shifts.  */
> +        goto do_shift_and;
> +    }
> +
> +    if (TCG_TARGET_HAS_extract_i64
> +        && TCG_TARGET_extract_i64_valid(ofs, len)) {
> +        tcg_gen_op4ii_i64(INDEX_op_extract_i64, ret, arg, ofs, len);
> +        return;
> +    }
> +
> +    /* Assume that zero-extension, if available, is cheaper than a shift.  */
> +    switch (ofs + len) {
> +    case 32:
> +        if (TCG_TARGET_HAS_ext32u_i64) {
> +            tcg_gen_ext32u_i64(ret, arg);
> +            tcg_gen_shri_i64(ret, ret, ofs);
> +            return;
> +        }
> +        break;
> +    case 16:
> +        if (TCG_TARGET_HAS_ext16u_i64) {
> +            tcg_gen_ext16u_i64(ret, arg);
> +            tcg_gen_shri_i64(ret, ret, ofs);
> +            return;
> +        }
> +        break;
> +    case 8:
> +        if (TCG_TARGET_HAS_ext8u_i64) {
> +            tcg_gen_ext8u_i64(ret, arg);
> +            tcg_gen_shri_i64(ret, ret, ofs);
> +            return;
> +        }
> +        break;
> +    }
> +
> +    /* ??? Ideally we'd know what values are available for immediate AND.
> +       Assume that 8 bits are available, plus the special cases of 16 and 32,
> +       so that we get ext8u, ext16u, and ext32u.  */
> +    switch (len) {
> +    case 1 ... 8: case 16: case 32:
> +    do_shift_and:
> +        tcg_gen_shri_i64(ret, arg, ofs);
> +        tcg_gen_andi_i64(ret, ret, (1ull << len) - 1);
> +        break;
> +    default:
> +        tcg_gen_shli_i64(ret, arg, 64 - len - ofs);
> +        tcg_gen_shri_i64(ret, ret, 64 - len);
> +        break;
> +    }
> +}
> +
> +void tcg_gen_sextract_i64(TCGv_i64 ret, TCGv_i64 arg,
> +                          unsigned int ofs, unsigned int len)
> +{
> +    tcg_debug_assert(ofs < 64);
> +    tcg_debug_assert(len > 0);
> +    tcg_debug_assert(len <= 64);
> +    tcg_debug_assert(ofs + len <= 64);
> +
> +    /* Canonicalize certain special cases, even if sextract is supported.  */
> +    if (ofs + len == 64) {
> +        tcg_gen_sari_i64(ret, arg, 64 - len);
> +        return;
> +    }
> +    if (ofs == 0) {
> +        switch (len) {
> +        case 32:
> +            tcg_gen_ext32s_i64(ret, arg);
> +            return;
> +        case 16:
> +            tcg_gen_ext16s_i64(ret, arg);
> +            return;
> +        case 8:
> +            tcg_gen_ext8s_i64(ret, arg);
> +            return;
> +        }
> +    }
> +
> +    if (TCG_TARGET_REG_BITS == 32) {
> +        /* Look for a 32-bit extract within one of the two words.  */
> +        if (ofs >= 32) {
> +            tcg_gen_sextract_i32(TCGV_LOW(ret), TCGV_HIGH(arg), ofs - 32, len);
> +        } else if (ofs + len <= 32) {
> +            tcg_gen_sextract_i32(TCGV_LOW(ret), TCGV_LOW(arg), ofs, len);
> +        } else if (ofs == 0) {
> +            tcg_gen_mov_i32(TCGV_LOW(ret), TCGV_LOW(arg));
> +            tcg_gen_sextract_i32(TCGV_HIGH(ret), TCGV_HIGH(arg), 0, len - 32);
> +            return;
> +        } else if (len > 32) {
> +            TCGv_i32 t = tcg_temp_new_i32();
> +            /* Extract the bits for the high word normally.  */
> +            tcg_gen_sextract_i32(t, TCGV_HIGH(arg), ofs + 32, len - 32);
> +            /* Shift the field down for the low part.  */
> +            tcg_gen_shri_i64(ret, arg, ofs);
> +            /* Overwrite the shift into the high part.  */
> +            tcg_gen_mov_i32(TCGV_HIGH(ret), t);
> +            tcg_temp_free_i32(t);
> +            return;
> +        } else {
> +            /* Shift the field down for the low part, such that the
> +               field sits at the MSB.  */
> +            tcg_gen_shri_i64(ret, arg, ofs + len - 32);
> +            /* Shift the field down from the MSB, sign extending.  */
> +            tcg_gen_sari_i32(TCGV_LOW(ret), TCGV_LOW(ret), 32 - len);
> +        }
> +        /* Sign-extend the field from 32 bits.  */
> +        tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
> +        return;
> +    }
> +
> +    if (TCG_TARGET_HAS_sextract_i64
> +        && TCG_TARGET_extract_i64_valid(ofs, len)) {
> +        tcg_gen_op4ii_i64(INDEX_op_sextract_i64, ret, arg, ofs, len);
> +        return;
> +    }
> +
> +    /* Assume that sign-extension, if available, is cheaper than a shift.  */
> +    switch (ofs + len) {
> +    case 32:
> +        if (TCG_TARGET_HAS_ext32s_i64) {
> +            tcg_gen_ext32s_i64(ret, arg);
> +            tcg_gen_sari_i64(ret, ret, ofs);
> +            return;
> +        }
> +        break;
> +    case 16:
> +        if (TCG_TARGET_HAS_ext16s_i64) {
> +            tcg_gen_ext16s_i64(ret, arg);
> +            tcg_gen_sari_i64(ret, ret, ofs);
> +            return;
> +        }
> +        break;
> +    case 8:
> +        if (TCG_TARGET_HAS_ext8s_i64) {
> +            tcg_gen_ext8s_i64(ret, arg);
> +            tcg_gen_sari_i64(ret, ret, ofs);
> +            return;
> +        }
> +        break;
> +    }
> +    switch (len) {
> +    case 32:
> +        if (TCG_TARGET_HAS_ext32s_i64) {
> +            tcg_gen_shri_i64(ret, arg, ofs);
> +            tcg_gen_ext32s_i64(ret, ret);
> +            return;
> +        }
> +        break;
> +    case 16:
> +        if (TCG_TARGET_HAS_ext16s_i64) {
> +            tcg_gen_shri_i64(ret, arg, ofs);
> +            tcg_gen_ext16s_i64(ret, ret);
> +            return;
> +        }
> +        break;
> +    case 8:
> +        if (TCG_TARGET_HAS_ext8s_i64) {
> +            tcg_gen_shri_i64(ret, arg, ofs);
> +            tcg_gen_ext8s_i64(ret, ret);
> +            return;
> +        }
> +        break;
> +    }
> +    tcg_gen_shli_i64(ret, arg, 64 - len - ofs);
> +    tcg_gen_sari_i64(ret, ret, 64 - len);
> +}
> +
>  void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, TCGv_i64 c1,
>                           TCGv_i64 c2, TCGv_i64 v1, TCGv_i64 v2)
>  {
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index 6d044b7..b515e6f 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -292,6 +292,10 @@ void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
>  void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
>                           unsigned int ofs, unsigned int len);
> +void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
> +                         unsigned int ofs, unsigned int len);
> +void tcg_gen_sextract_i32(TCGv_i32 ret, TCGv_i32 arg,
> +                          unsigned int ofs, unsigned int len);
>  void tcg_gen_brcond_i32(TCGCond cond, TCGv_i32 arg1, TCGv_i32 arg2, TCGLabel *);
>  void tcg_gen_brcondi_i32(TCGCond cond, TCGv_i32 arg1, int32_t arg2, TCGLabel *);
>  void tcg_gen_setcond_i32(TCGCond cond, TCGv_i32 ret,
> @@ -469,6 +473,10 @@ void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
>  void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
>                           unsigned int ofs, unsigned int len);
> +void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
> +                         unsigned int ofs, unsigned int len);
> +void tcg_gen_sextract_i64(TCGv_i64 ret, TCGv_i64 arg,
> +                          unsigned int ofs, unsigned int len);
>  void tcg_gen_brcond_i64(TCGCond cond, TCGv_i64 arg1, TCGv_i64 arg2, TCGLabel *);
>  void tcg_gen_brcondi_i64(TCGCond cond, TCGv_i64 arg1, int64_t arg2, TCGLabel *);
>  void tcg_gen_setcond_i64(TCGCond cond, TCGv_i64 ret,
> @@ -951,6 +959,8 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i64
>  #define tcg_gen_rotri_tl tcg_gen_rotri_i64
>  #define tcg_gen_deposit_tl tcg_gen_deposit_i64
> +#define tcg_gen_extract_tl tcg_gen_extract_i64
> +#define tcg_gen_sextract_tl tcg_gen_sextract_i64
>  #define tcg_const_tl tcg_const_i64
>  #define tcg_const_local_tl tcg_const_local_i64
>  #define tcg_gen_movcond_tl tcg_gen_movcond_i64
> @@ -1039,6 +1049,8 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i32
>  #define tcg_gen_rotri_tl tcg_gen_rotri_i32
>  #define tcg_gen_deposit_tl tcg_gen_deposit_i32
> +#define tcg_gen_extract_tl tcg_gen_extract_i32
> +#define tcg_gen_sextract_tl tcg_gen_sextract_i32
>  #define tcg_const_tl tcg_const_i32
>  #define tcg_const_local_tl tcg_const_local_i32
>  #define tcg_gen_movcond_tl tcg_gen_movcond_i32
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index 45528d2..11563ac 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -77,6 +77,8 @@ DEF(sar_i32, 1, 2, 0, 0)
>  DEF(rotl_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_rot_i32))
>  DEF(rotr_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_rot_i32))
>  DEF(deposit_i32, 1, 2, 2, IMPL(TCG_TARGET_HAS_deposit_i32))
> +DEF(extract_i32, 1, 1, 2, IMPL(TCG_TARGET_HAS_extract_i32))
> +DEF(sextract_i32, 1, 1, 2, IMPL(TCG_TARGET_HAS_sextract_i32))
>
>  DEF(brcond_i32, 0, 2, 2, TCG_OPF_BB_END)
>
> @@ -139,6 +141,8 @@ DEF(sar_i64, 1, 2, 0, IMPL64)
>  DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
>  DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
>  DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
> +DEF(extract_i64, 1, 1, 2, IMPL64 | IMPL(TCG_TARGET_HAS_extract_i64))
> +DEF(sextract_i64, 1, 1, 2, IMPL64 | IMPL(TCG_TARGET_HAS_sextract_i64))
>
>  /* size changing ops */
>  DEF(ext_i32_i64, 1, 1, 0, IMPL64)
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index a35e4c4..5fd3733 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -112,6 +112,8 @@ typedef uint64_t TCGRegSet;
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      0
> +#define TCG_TARGET_HAS_extract_i64      0
> +#define TCG_TARGET_HAS_sextract_i64     0
>  #define TCG_TARGET_HAS_movcond_i64      0
>  #define TCG_TARGET_HAS_add2_i64         0
>  #define TCG_TARGET_HAS_sub2_i64         0
> @@ -130,6 +132,12 @@ typedef uint64_t TCGRegSet;
>  #ifndef TCG_TARGET_deposit_i64_valid
>  #define TCG_TARGET_deposit_i64_valid(ofs, len) 1
>  #endif
> +#ifndef TCG_TARGET_extract_i32_valid
> +#define TCG_TARGET_extract_i32_valid(ofs, len) 1
> +#endif
> +#ifndef TCG_TARGET_extract_i64_valid
> +#define TCG_TARGET_extract_i64_valid(ofs, len) 1
> +#endif
>
>  /* Only one of DIV or DIV2 should be defined.  */
>  #if defined(TCG_TARGET_HAS_div_i32)
> diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
> index 868228b..2065042 100644
> --- a/tcg/tci/tcg-target.h
> +++ b/tcg/tci/tcg-target.h
> @@ -69,6 +69,8 @@
>  #define TCG_TARGET_HAS_ext16u_i32       1
>  #define TCG_TARGET_HAS_andc_i32         0
>  #define TCG_TARGET_HAS_deposit_i32      1
> +#define TCG_TARGET_HAS_extract_i32      0
> +#define TCG_TARGET_HAS_sextract_i32     0
>  #define TCG_TARGET_HAS_eqv_i32          0
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> @@ -88,6 +90,8 @@
>  #define TCG_TARGET_HAS_bswap32_i64      1
>  #define TCG_TARGET_HAS_bswap64_i64      1
>  #define TCG_TARGET_HAS_deposit_i64      1
> +#define TCG_TARGET_HAS_extract_i64      0
> +#define TCG_TARGET_HAS_sextract_i64     0
>  #define TCG_TARGET_HAS_div_i64          0
>  #define TCG_TARGET_HAS_rem_i64          0
>  #define TCG_TARGET_HAS_ext8s_i64        1

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 02/64] tcg: Minor adjustments to deposit expanders
  2016-11-23 13:00 ` [Qemu-devel] [PATCH v4 02/64] tcg: Minor adjustments to deposit expanders Richard Henderson
@ 2016-12-05 13:18   ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-05 13:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Assert that len is not 0.
>
> Since we have asserted that ofs + len <= N, a later
> check for len == N implies that ofs == 0.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/tcg-op.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index c185b9c..b17f03f 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -533,10 +533,11 @@ void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
>      TCGv_i32 t1;
>
>      tcg_debug_assert(ofs < 32);
> +    tcg_debug_assert(len > 0);
>      tcg_debug_assert(len <= 32);
>      tcg_debug_assert(ofs + len <= 32);
>
> -    if (ofs == 0 && len == 32) {
> +    if (len == 32) {
>          tcg_gen_mov_i32(ret, arg2);
>          return;
>      }
> @@ -1718,10 +1719,11 @@ void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
>      TCGv_i64 t1;
>
>      tcg_debug_assert(ofs < 64);
> +    tcg_debug_assert(len > 0);
>      tcg_debug_assert(len <= 64);
>      tcg_debug_assert(ofs + len <= 64);
>
> -    if (ofs == 0 && len == 64) {
> +    if (len == 64) {
>          tcg_gen_mov_i64(ret, arg2);
>          return;
>      }


Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
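
The reasoning in the commit message can be checked on plain integers. The following is a minimal sketch, not the TCG expander itself: deposit32_model is a hypothetical name, and it operates on uint32_t values rather than TCGv temporaries. It shows why testing len alone suffices once ofs + len <= 32 has been asserted.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-integer model of a 32-bit deposit (hypothetical helper, not a
   TCG function): replace len bits of arg1, starting at bit ofs, with
   the low len bits of arg2.  */
uint32_t deposit32_model(uint32_t arg1, uint32_t arg2,
                         unsigned ofs, unsigned len)
{
    uint32_t mask;

    assert(ofs < 32);
    assert(len > 0);
    assert(len <= 32);
    assert(ofs + len <= 32);

    /* With ofs + len <= 32, len == 32 leaves ofs == 0 as the only
       possibility, so the ofs test is redundant.  Returning early also
       keeps the mask computation below away from a shift by 32.  */
    if (len == 32) {
        return arg2;
    }

    mask = ((1u << len) - 1) << ofs;
    return (arg1 & ~mask) | ((arg2 << ofs) & mask);
}
```

The new len > 0 assertion matters for the same mask computation: a zero-length field would produce an empty mask and silently return arg1 unchanged.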

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 01/64] tcg: Add field extraction primitives
  2016-12-05 13:17   ` Alex Bennée
@ 2016-12-05 15:14     ` Richard Henderson
  0 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-12-05 15:14 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 12/05/2016 05:17 AM, Alex Bennée wrote:
>> +    /* ??? Ideally we'd know what values are available for immediate AND.
>> +       Assume that 8 bits are available, plus the special case of 16,
>> +       so that we get ext8u, ext16u.  */
>> +    switch (len) {
>> +    case 1 ... 8: case 16:
>> +        tcg_gen_shri_i32(ret, arg, ofs);
>> +        tcg_gen_andi_i32(ret, ret, (1u << len) - 1);
>> +        break;
>> +    default:
>> +        tcg_gen_shli_i32(ret, arg, 32 - len - ofs);
>> +        tcg_gen_shri_i32(ret, ret, 32 - len);
>> +        break;
>> +    }
> 
> Hmm is this starting to make a case for backend specific optimisation
> passes which have a better idea of the code that can be generated or
> exposing a TCG_TARGET_HAS_8IMM_BITS or some such from the backend to the
> generators?

Thanks for the prod.  In theory the information is already available.

  tcg_target_const_match((1u << len) - 1, TCG_TYPE_I32,
                         &tcg_op_defs[INDEX_op_and_i32].args_ct[2]);

That's currently static in tcg.c, but that could be fixed.

There could well be a call for backend-specific passes.  I've been thinking of
the problems surrounding constant generation and reverse-endian stores for a
while now, which also sort of fall into this category.
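
For reference, the two fallback expansions in the quoted switch compute the same value; they differ only in whether a mask constant is needed. A minimal sketch on plain integers, assuming hypothetical helper names (these are not TCG functions):

```c
#include <assert.h>
#include <stdint.h>

/* Shift the field down, then mask.  Attractive when the host can
   encode the mask as an immediate operand of AND.  */
uint32_t extract32_shift_mask(uint32_t arg, unsigned ofs, unsigned len)
{
    assert(len > 0 && len < 32 && ofs + len <= 32);
    return (arg >> ofs) & ((1u << len) - 1);
}

/* Shift the field to the top of the word, then back down.  Needs no
   mask constant at all, at the cost of a second shift.  */
uint32_t extract32_two_shifts(uint32_t arg, unsigned ofs, unsigned len)
{
    assert(len > 0 && ofs + len <= 32);
    return (arg << (32 - len - ofs)) >> (32 - len);
}
```

Choosing between them is exactly the question of whether (1u << len) - 1 is a valid immediate for the host's AND, which is what a constraint match against and_i32 would answer.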


r~


* Re: [Qemu-devel] [PATCH v4 04/64] tcg/aarch64: Implement field extraction opcodes
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 04/64] tcg/aarch64: Implement field extraction opcodes Richard Henderson
@ 2016-12-06 12:24   ` Alex Bennée
  2016-12-06 16:36     ` Richard Henderson
  0 siblings, 1 reply; 102+ messages in thread
From: Alex Bennée @ 2016-12-06 12:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Claudio Fontana


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.h     |  8 ++++----
>  tcg/aarch64/tcg-target.inc.c | 14 ++++++++++++++
>  2 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 410c31b..4a74bd8 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -63,8 +63,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
> -#define TCG_TARGET_HAS_extract_i32      0
> -#define TCG_TARGET_HAS_sextract_i32     0
> +#define TCG_TARGET_HAS_extract_i32      1
> +#define TCG_TARGET_HAS_sextract_i32     1
>  #define TCG_TARGET_HAS_movcond_i32      1
>  #define TCG_TARGET_HAS_add2_i32         1
>  #define TCG_TARGET_HAS_sub2_i32         1
> @@ -95,8 +95,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      1
> -#define TCG_TARGET_HAS_extract_i64      0
> -#define TCG_TARGET_HAS_sextract_i64     0
> +#define TCG_TARGET_HAS_extract_i64      1
> +#define TCG_TARGET_HAS_sextract_i64     1
>  #define TCG_TARGET_HAS_movcond_i64      1
>  #define TCG_TARGET_HAS_add2_i64         1
>  #define TCG_TARGET_HAS_sub2_i64         1
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 1939d35..c0e9890 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -1640,6 +1640,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          tcg_out_dep(s, ext, a0, REG0(2), args[3], args[4]);
>          break;
>
> +    case INDEX_op_extract_i64:
> +    case INDEX_op_extract_i32:
> +        tcg_out_ubfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
> +        break;
> +
> +    case INDEX_op_sextract_i64:
> +    case INDEX_op_sextract_i32:
> +        tcg_out_sbfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
> +        break;
> +

This isn't right, is it? As I read it, extract takes a field at
offset+len in the source register and moves it to the low bits of the
destination register. The Bitfield Move instructions are the other way
around, moving from the low-order bits of the source register to an
offset+len in the destination.

>      case INDEX_op_add2_i32:
>          tcg_out_addsub2(s, TCG_TYPE_I32, a0, a1, REG0(2), REG0(3),
>                          (int32_t)args[4], args[5], const_args[4],
> @@ -1785,6 +1795,10 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
>
>      { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
>      { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
> +    { INDEX_op_extract_i32, { "r", "r" } },
> +    { INDEX_op_extract_i64, { "r", "r" } },
> +    { INDEX_op_sextract_i32, { "r", "r" } },
> +    { INDEX_op_sextract_i64, { "r", "r" } },
>
>      { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rA", "rMZ" } },
>      { INDEX_op_add2_i64, { "r", "r", "rZ", "rZ", "rA", "rMZ" } },


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 05/64] tcg/arm: Move isa detection to tcg-target.h
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 05/64] tcg/arm: Move isa detection to tcg-target.h Richard Henderson
@ 2016-12-06 12:34   ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-06 12:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>

A slightly expanded commit message to mention why you are moving it
wouldn't go amiss. Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/arm/tcg-target.h     | 36 ++++++++++++++++++++++++++++++++----
>  tcg/arm/tcg-target.inc.c | 41 +----------------------------------------
>  2 files changed, 33 insertions(+), 44 deletions(-)
>
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index 8e724be..d1fe12b 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -26,6 +26,37 @@
>  #ifndef ARM_TCG_TARGET_H
>  #define ARM_TCG_TARGET_H
>
> +/* The __ARM_ARCH define is provided by gcc 4.8.  Construct it otherwise.  */
> +#ifndef __ARM_ARCH
> +# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
> +     || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
> +     || defined(__ARM_ARCH_7EM__)
> +#  define __ARM_ARCH 7
> +# elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
> +       || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
> +       || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6T2__)
> +#  define __ARM_ARCH 6
> +# elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5E__) \
> +       || defined(__ARM_ARCH_5T__) || defined(__ARM_ARCH_5TE__) \
> +       || defined(__ARM_ARCH_5TEJ__)
> +#  define __ARM_ARCH 5
> +# else
> +#  define __ARM_ARCH 4
> +# endif
> +#endif
> +
> +extern int arm_arch;
> +
> +#if defined(__ARM_ARCH_5T__) \
> +    || defined(__ARM_ARCH_5TE__) || defined(__ARM_ARCH_5TEJ__)
> +# define use_armv5t_instructions 1
> +#else
> +# define use_armv5t_instructions use_armv6_instructions
> +#endif
> +
> +#define use_armv6_instructions  (__ARM_ARCH >= 6 || arm_arch >= 6)
> +#define use_armv7_instructions  (__ARM_ARCH >= 7 || arm_arch >= 7)
> +
>  #undef TCG_TARGET_STACK_GROWSUP
>  #define TCG_TARGET_INSN_UNIT_SIZE 4
>  #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
> @@ -79,7 +110,7 @@ extern bool use_idiv_instructions;
>  #define TCG_TARGET_HAS_eqv_i32          0
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> -#define TCG_TARGET_HAS_deposit_i32      1
> +#define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
>  #define TCG_TARGET_HAS_extract_i32      0
>  #define TCG_TARGET_HAS_sextract_i32     0
>  #define TCG_TARGET_HAS_movcond_i32      1
> @@ -90,9 +121,6 @@ extern bool use_idiv_instructions;
>  #define TCG_TARGET_HAS_div_i32          use_idiv_instructions
>  #define TCG_TARGET_HAS_rem_i32          0
>
> -extern bool tcg_target_deposit_valid(int ofs, int len);
> -#define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_valid
> -
>  enum {
>      TCG_AREG0 = TCG_REG_R6,
>  };
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index ffa0d40..1415c27 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -25,36 +25,7 @@
>  #include "elf.h"
>  #include "tcg-be-ldst.h"
>
> -/* The __ARM_ARCH define is provided by gcc 4.8.  Construct it otherwise.  */
> -#ifndef __ARM_ARCH
> -# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
> -     || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
> -     || defined(__ARM_ARCH_7EM__)
> -#  define __ARM_ARCH 7
> -# elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
> -       || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
> -       || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6T2__)
> -#  define __ARM_ARCH 6
> -# elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5E__) \
> -       || defined(__ARM_ARCH_5T__) || defined(__ARM_ARCH_5TE__) \
> -       || defined(__ARM_ARCH_5TEJ__)
> -#  define __ARM_ARCH 5
> -# else
> -#  define __ARM_ARCH 4
> -# endif
> -#endif
> -
> -static int arm_arch = __ARM_ARCH;
> -
> -#if defined(__ARM_ARCH_5T__) \
> -    || defined(__ARM_ARCH_5TE__) || defined(__ARM_ARCH_5TEJ__)
> -# define use_armv5t_instructions 1
> -#else
> -# define use_armv5t_instructions use_armv6_instructions
> -#endif
> -
> -#define use_armv6_instructions  (__ARM_ARCH >= 6 || arm_arch >= 6)
> -#define use_armv7_instructions  (__ARM_ARCH >= 7 || arm_arch >= 7)
> +int arm_arch = __ARM_ARCH;
>
>  #ifndef use_idiv_instructions
>  bool use_idiv_instructions;
> @@ -730,16 +701,6 @@ static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
>      }
>  }
>
> -bool tcg_target_deposit_valid(int ofs, int len)
> -{
> -    /* ??? Without bfi, we could improve over generic code by combining
> -       the right-shift from a non-zero ofs with the orr.  We do run into
> -       problems when rd == rs, and the mask generated from ofs+len doesn't
> -       fit into an immediate.  We would have to be careful not to pessimize
> -       wrt the optimizations performed on the expanded code.  */
> -    return use_armv7_instructions;
> -}
> -
>  static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
>                                     TCGArg a1, int ofs, int len, bool const_a1)
>  {


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 06/64] tcg/arm: Implement field extraction opcodes
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 06/64] tcg/arm: Implement field extraction opcodes Richard Henderson
@ 2016-12-06 16:16   ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-06 16:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/arm/tcg-target.h     |  4 ++--
>  tcg/arm/tcg-target.inc.c | 22 ++++++++++++++++++++++
>  2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index d1fe12b..4e30728 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -111,8 +111,8 @@ extern bool use_idiv_instructions;
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
> -#define TCG_TARGET_HAS_extract_i32      0
> -#define TCG_TARGET_HAS_sextract_i32     0
> +#define TCG_TARGET_HAS_extract_i32      use_armv7_instructions
> +#define TCG_TARGET_HAS_sextract_i32     use_armv7_instructions
>  #define TCG_TARGET_HAS_movcond_i32      1
>  #define TCG_TARGET_HAS_mulu2_i32        1
>  #define TCG_TARGET_HAS_muls2_i32        1
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index 1415c27..6765a9d 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -713,6 +713,20 @@ static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
>                | (ofs << 7) | ((ofs + len - 1) << 16));
>  }
>
> +static inline void tcg_out_extract(TCGContext *s, int cond, TCGReg rd,
> +                                   TCGArg a1, int ofs, int len)
> +{
> +    tcg_out32(s, 0x07e00050 | (cond << 28) | (rd << 12) | a1
> +              | (ofs << 7) | ((len - 1) << 16));
> +}

It would be nice to mention in a comment that these are ubfx and sbfx,
so the reader doesn't have to hand-disassemble the opcode.

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
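
As a rough illustration of the encoding, the sketch below rebuilds the two instruction words from their fields. The function names and the COND_AL_MODEL constant are hypothetical; the constants 0x07e00050 and 0x07a00050 are taken from the quoted patch, and the field positions follow the A32 ubfx/sbfx layout (cond in bits 31:28, width-1 in 20:16, Rd in 15:12, lsb in 11:7, Rn in 3:0).

```c
#include <assert.h>
#include <stdint.h>

enum { COND_AL_MODEL = 0xe };   /* "always" condition, hypothetical name */

/* ubfx rd, rn, #ofs, #len: unsigned bitfield extract */
uint32_t encode_ubfx_model(unsigned cond, unsigned rd, unsigned rn,
                           unsigned ofs, unsigned len)
{
    assert(cond < 16 && rd < 16 && rn < 16);
    assert(len > 0 && ofs + len <= 32);
    return 0x07e00050u | (cond << 28) | (rd << 12) | rn
           | (ofs << 7) | ((len - 1) << 16);
}

/* sbfx rd, rn, #ofs, #len: signed bitfield extract.  Differs from
   ubfx only in bit 22 of the opcode.  */
uint32_t encode_sbfx_model(unsigned cond, unsigned rd, unsigned rn,
                           unsigned ofs, unsigned len)
{
    assert(cond < 16 && rd < 16 && rn < 16);
    assert(len > 0 && ofs + len <= 32);
    return 0x07a00050u | (cond << 28) | (rd << 12) | rn
           | (ofs << 7) | ((len - 1) << 16);
}
```

With these, "ubfx r0, r1, #8, #4" under the always condition comes out as 0xe7e30451.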

> +
> +static inline void tcg_out_sextract(TCGContext *s, int cond, TCGReg rd,
> +                                    TCGArg a1, int ofs, int len)
> +{
> +    tcg_out32(s, 0x07a00050 | (cond << 28) | (rd << 12) | a1
> +              | (ofs << 7) | ((len - 1) << 16));
> +}
> +
>  /* Note that this routine is used for both LDR and LDRH formats, so we do
>     not wish to include an immediate shift at this point.  */
>  static void tcg_out_memop_r(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
> @@ -1894,6 +1908,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          tcg_out_deposit(s, COND_AL, args[0], args[2],
>                          args[3], args[4], const_args[2]);
>          break;
> +    case INDEX_op_extract_i32:
> +        tcg_out_extract(s, COND_AL, args[0], args[1], args[2], args[3]);
> +        break;
> +    case INDEX_op_sextract_i32:
> +        tcg_out_sextract(s, COND_AL, args[0], args[1], args[2], args[3]);
> +        break;
>
>      case INDEX_op_div_i32:
>          tcg_out_sdiv(s, COND_AL, args[0], args[1], args[2]);
> @@ -1976,6 +1996,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
>      { INDEX_op_ext16u_i32, { "r", "r" } },
>
>      { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
> +    { INDEX_op_extract_i32, { "r", "r" } },
> +    { INDEX_op_sextract_i32, { "r", "r" } },
>
>      { INDEX_op_div_i32, { "r", "r", "r" } },
>      { INDEX_op_divu_i32, { "r", "r", "r" } },


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 19/64] tcg/optimize: Fold movcond 0/1 into setcond
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 19/64] tcg/optimize: Fold movcond 0/1 into setcond Richard Henderson
@ 2016-12-06 16:22   ` Alex Bennée
  2016-12-06 16:33     ` Richard Henderson
  0 siblings, 1 reply; 102+ messages in thread
From: Alex Bennée @ 2016-12-06 16:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/optimize.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index f41ed2c..9e26bb7 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -1105,6 +1105,21 @@ void tcg_optimize(TCGContext *s)
>                  tcg_opt_gen_mov(s, op, args, args[0], args[4-tmp]);
>                  break;
>              }
> +            if (temp_is_const(args[3]) && temp_is_const(args[4])) {
> +                tcg_target_ulong tv = temps[args[3]].val;
> +                tcg_target_ulong fv = temps[args[4]].val;
> +                TCGCond cond = args[5];
> +                if (fv == 1 && tv == 0) {
> +                    cond = tcg_invert_cond(cond);
> +                } else if (!(tv == 1 && fv == 0)) {
> +                    goto do_default;
> +                }

Why the weird early exit here on an inverted test? Couldn't it just be:

                } else if (tv == 1 && fv == 0) {
                    args[3] = cond;
                    op->opc = opc = (opc == INDEX_op_movcond_i32
                                     ? INDEX_op_setcond_i32
                                     : INDEX_op_setcond_i64);
                    nb_iargs = 2;
                }

And fall through to the goto do_default as before?

>              goto do_default;
>
>          case INDEX_op_add2_i32:


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 19/64] tcg/optimize: Fold movcond 0/1 into setcond
  2016-12-06 16:22   ` Alex Bennée
@ 2016-12-06 16:33     ` Richard Henderson
  0 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-12-06 16:33 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 12/06/2016 08:22 AM, Alex Bennée wrote:
>> > +                if (fv == 1 && tv == 0) {
>> > +                    cond = tcg_invert_cond(cond);
>> > +                } else if (!(tv == 1 && fv == 0)) {
>> > +                    goto do_default;
>> > +                }
> Why the weird early exit here on an inverted test? Couldn't it just be:
> 
>                 } else if (tv == 1 && fv == 0) {
>                     args[3] = cond;
>                     op->opc = opc = (opc == INDEX_op_movcond_i32
>                                      ? INDEX_op_setcond_i32
>                                      : INDEX_op_setcond_i64);
>                     nb_iargs = 2;
>                 }
> 
> And fall through to the goto do_default as before?
> 

Not if you want to share the update code with the first case above.
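
The control flow in question can be sketched outside the optimizer. This is a simplified model under stated assumptions: CondModel, invert_cond_model and fold_movcond_model are hypothetical names, only two conditions are modelled, and the return value stands in for "rewrite to setcond" versus "goto do_default".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { COND_EQ_MODEL, COND_NE_MODEL } CondModel;

CondModel invert_cond_model(CondModel c)
{
    return c == COND_EQ_MODEL ? COND_NE_MODEL : COND_EQ_MODEL;
}

/* Returns true when movcond(c, tv, fv) with constant tv/fv can become
   setcond, updating *cond to the condition the setcond should use.  */
bool fold_movcond_model(uint64_t tv, uint64_t fv, CondModel *cond)
{
    if (fv == 1 && tv == 0) {
        /* Selecting 0 on true and 1 on false: setcond of the inverse. */
        *cond = invert_cond_model(*cond);
    } else if (!(tv == 1 && fv == 0)) {
        /* Neither 1/0 nor 0/1: bail out early, nothing to fold. */
        return false;
    }
    /* Both accepted cases fall through to here, so the opcode and
       argument rewrite is written exactly once.  */
    return true;
}
```

Testing tv == 1 && fv == 0 positively would need the rewrite duplicated in both branches, or a flag to carry the decision past the if/else.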


r~


* Re: [Qemu-devel] [PATCH v4 20/64] tcg: Add markup for output requires new register
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 20/64] tcg: Add markup for output requires new register Richard Henderson
@ 2016-12-06 16:34   ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-06 16:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> This is the same concept as, and same markup as, the
> early clobber markup in gcc.

With the proviso that this is deep in the (unfamiliar to me) guts of tcg:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
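
The effect of the '&' markup on allocation can be modelled with bitmask register sets. This is a minimal sketch, assuming hypothetical names and a pick-lowest-free policy; the real tcg_reg_alloc has its own ordering and spill logic, and both sets start out holding the reserved registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pick the lowest-numbered register that is allowed and not busy,
   or -1 if none is free.  Register sets are 32-bit masks.  */
int reg_alloc_model(uint32_t allowed, uint32_t busy)
{
    uint32_t free_regs = allowed & ~busy;
    for (int r = 0; r < 32; r++) {
        if (free_regs & (1u << r)) {
            return r;
        }
    }
    return -1;
}

/* An ordinary output only has to avoid other outputs, so it may land
   on a register that still holds an input.  An output marked as
   requiring a new register must avoid the inputs as well.  */
int alloc_output_model(uint32_t allowed, uint32_t in_regs,
                       uint32_t out_regs, bool newreg)
{
    uint32_t busy = newreg ? (in_regs | out_regs) : out_regs;
    return reg_alloc_model(allowed, busy);
}
```

With inputs in r0 and r1, an ordinary output may be given r0, while an early-clobber output is pushed to r2. That is the property a backend needs when it writes the output before it has finished reading the inputs.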

>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/tcg.c | 34 ++++++++++++++++++++++------------
>  tcg/tcg.h |  1 +
>  2 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index aabf94f..27913f0 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1263,6 +1263,10 @@ void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs)
>                      if (*ct_str == '\0')
>                          break;
>                      switch(*ct_str) {
> +                    case '&':
> +                        def->args_ct[i].ct |= TCG_CT_NEWREG;
> +                        ct_str++;
> +                        break;
>                      case 'i':
>                          def->args_ct[i].ct |= TCG_CT_CONST;
>                          ct_str++;
> @@ -2208,7 +2212,8 @@ static void tcg_reg_alloc_op(TCGContext *s,
>                               const TCGOpDef *def, TCGOpcode opc,
>                               const TCGArg *args, TCGLifeData arg_life)
>  {
> -    TCGRegSet allocated_regs;
> +    TCGRegSet i_allocated_regs;
> +    TCGRegSet o_allocated_regs;
>      int i, k, nb_iargs, nb_oargs;
>      TCGReg reg;
>      TCGArg arg;
> @@ -2225,8 +2230,10 @@ static void tcg_reg_alloc_op(TCGContext *s,
>             args + nb_oargs + nb_iargs,
>             sizeof(TCGArg) * def->nb_cargs);
>
> +    tcg_regset_set(i_allocated_regs, s->reserved_regs);
> +    tcg_regset_set(o_allocated_regs, s->reserved_regs);
> +
>      /* satisfy input constraints */
> -    tcg_regset_set(allocated_regs, s->reserved_regs);
>      for(k = 0; k < nb_iargs; k++) {
>          i = def->sorted_args[nb_oargs + k];
>          arg = args[i];
> @@ -2241,7 +2248,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
>              goto iarg_end;
>          }
>
> -        temp_load(s, ts, arg_ct->u.regs, allocated_regs);
> +        temp_load(s, ts, arg_ct->u.regs, i_allocated_regs);
>
>          if (arg_ct->ct & TCG_CT_IALIAS) {
>              if (ts->fixed_reg) {
> @@ -2275,13 +2282,13 @@ static void tcg_reg_alloc_op(TCGContext *s,
>          allocate_in_reg:
>              /* allocate a new register matching the constraint
>                 and move the temporary register into it */
> -            reg = tcg_reg_alloc(s, arg_ct->u.regs, allocated_regs,
> +            reg = tcg_reg_alloc(s, arg_ct->u.regs, i_allocated_regs,
>                                  ts->indirect_base);
>              tcg_out_mov(s, ts->type, reg, ts->reg);
>          }
>          new_args[i] = reg;
>          const_args[i] = 0;
> -        tcg_regset_set_reg(allocated_regs, reg);
> +        tcg_regset_set_reg(i_allocated_regs, reg);
>      iarg_end: ;
>      }
>
> @@ -2293,24 +2300,23 @@ static void tcg_reg_alloc_op(TCGContext *s,
>      }
>
>      if (def->flags & TCG_OPF_BB_END) {
> -        tcg_reg_alloc_bb_end(s, allocated_regs);
> +        tcg_reg_alloc_bb_end(s, i_allocated_regs);
>      } else {
>          if (def->flags & TCG_OPF_CALL_CLOBBER) {
>              /* XXX: permit generic clobber register list ? */
>              for (i = 0; i < TCG_TARGET_NB_REGS; i++) {
>                  if (tcg_regset_test_reg(tcg_target_call_clobber_regs, i)) {
> -                    tcg_reg_free(s, i, allocated_regs);
> +                    tcg_reg_free(s, i, i_allocated_regs);
>                  }
>              }
>          }
>          if (def->flags & TCG_OPF_SIDE_EFFECTS) {
>              /* sync globals if the op has side effects and might trigger
>                 an exception. */
> -            sync_globals(s, allocated_regs);
> +            sync_globals(s, i_allocated_regs);
>          }
>
>          /* satisfy the output constraints */
> -        tcg_regset_set(allocated_regs, s->reserved_regs);
>          for(k = 0; k < nb_oargs; k++) {
>              i = def->sorted_args[k];
>              arg = args[i];
> @@ -2318,6 +2324,10 @@ static void tcg_reg_alloc_op(TCGContext *s,
>              ts = &s->temps[arg];
>              if (arg_ct->ct & TCG_CT_ALIAS) {
>                  reg = new_args[arg_ct->alias_index];
> +            } else if (arg_ct->ct & TCG_CT_NEWREG) {
> +                reg = tcg_reg_alloc(s, arg_ct->u.regs,
> +                                    i_allocated_regs | o_allocated_regs,
> +                                    ts->indirect_base);
>              } else {
>                  /* if fixed register, we try to use it */
>                  reg = ts->reg;
> @@ -2325,10 +2335,10 @@ static void tcg_reg_alloc_op(TCGContext *s,
>                      tcg_regset_test_reg(arg_ct->u.regs, reg)) {
>                      goto oarg_end;
>                  }
> -                reg = tcg_reg_alloc(s, arg_ct->u.regs, allocated_regs,
> +                reg = tcg_reg_alloc(s, arg_ct->u.regs, o_allocated_regs,
>                                      ts->indirect_base);
>              }
> -            tcg_regset_set_reg(allocated_regs, reg);
> +            tcg_regset_set_reg(o_allocated_regs, reg);
>              /* if a fixed register is used, then a move will be done afterwards */
>              if (!ts->fixed_reg) {
>                  if (ts->val_type == TEMP_VAL_REG) {
> @@ -2357,7 +2367,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
>              tcg_out_mov(s, ts->type, ts->reg, reg);
>          }
>          if (NEED_SYNC_ARG(i)) {
> -            temp_sync(s, ts, allocated_regs, IS_DEAD_ARG(i));
> +            temp_sync(s, ts, o_allocated_regs, IS_DEAD_ARG(i));
>          } else if (IS_DEAD_ARG(i)) {
>              temp_dead(s, ts);
>          }
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 5fd3733..ebfcefd 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -851,6 +851,7 @@ void tcg_dump_op_count(FILE *f, fprintf_function cpu_fprintf);
>
>  #define TCG_CT_ALIAS  0x80
>  #define TCG_CT_IALIAS 0x40
> +#define TCG_CT_NEWREG 0x20 /* output requires a new register */
>  #define TCG_CT_REG    0x01
>  #define TCG_CT_CONST  0x02 /* any constant of register size */


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 04/64] tcg/aarch64: Implement field extraction opcodes
  2016-12-06 12:24   ` Alex Bennée
@ 2016-12-06 16:36     ` Richard Henderson
  2016-12-09 15:41       ` Alex Bennée
  0 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-12-06 16:36 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, Claudio Fontana

On 12/06/2016 04:24 AM, Alex Bennée wrote:
>> > +    case INDEX_op_extract_i64:
>> > +    case INDEX_op_extract_i32:
>> > +        tcg_out_ubfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
>> > +        break;
>> > +
>> > +    case INDEX_op_sextract_i64:
>> > +    case INDEX_op_sextract_i32:
>> > +        tcg_out_sbfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
>> > +        break;
>> > +
> This isn't right, is it? As I read it, extract takes a field at
> offset+len in the source register and moves it to the low bits of the
> destination register. The Bitfield Move instructions are the other way
> around, moving from the low-order bits of the source register to an
> offset+len in the destination.
> 

It is right.  Extract is written as ofs/len in assembly, but encoded as lsb/msb
in the opcode -- just like bitfield move.

Boot an armv7 guest and there should be enough uses to convince you.
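
To make the lsb/msb point concrete, here is a plain-integer sketch of the 32-bit UBFM behaviour. The names are hypothetical and this is a model of the instruction, not QEMU code. UBFM takes two immediates, immr and imms; when imms >= immr it is the extract form (the UBFX alias), and only when imms < immr does it move low-order source bits up into the destination (the UBFIZ/LSL aliases).

```c
#include <assert.h>
#include <stdint.h>

/* Model of 32-bit UBFM Wd, Wn, #immr, #imms.  */
uint32_t ubfm32_model(uint32_t src, unsigned immr, unsigned imms)
{
    assert(immr < 32 && imms < 32);

    if (imms >= immr) {
        /* Extract form: bits imms..immr land in the low bits.  */
        unsigned len = imms - immr + 1;
        uint32_t mask = len == 32 ? ~0u : (1u << len) - 1;
        return (src >> immr) & mask;
    } else {
        /* Insert-in-zero form: the low imms + 1 bits move up to
           bit position 32 - immr.  Here immr >= 1 and imms <= 30.  */
        uint32_t mask = (1u << (imms + 1)) - 1;
        return (src & mask) << (32 - immr);
    }
}

/* The mapping used by the patch: immr = ofs, imms = ofs + len - 1,
   i.e. lsb and msb of the field, which always selects the extract
   form because msb >= lsb.  */
uint32_t extract32_via_ubfm(uint32_t src, unsigned ofs, unsigned len)
{
    assert(len > 0 && ofs + len <= 32);
    return ubfm32_model(src, ofs, ofs + len - 1);
}
```

The signed variant follows the same operand mapping, with the extracted field sign-extended instead of zero-extended.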


r~


* Re: [Qemu-devel] [PATCH v4 21/64] tcg: Transition flat op_defs array to a target callback
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 21/64] tcg: Transition flat op_defs array to a target callback Richard Henderson
@ 2016-12-06 16:38   ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-06 16:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> This will allow the target to tailor the constraints to the
> auto-detected ISA extensions.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
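
The per-target callback added in each backend is a linear search over the existing table. A self-contained sketch of that shape, using a hypothetical OpDefModel type and made-up opcode numbers in place of TCGTargetOpDef and TCGOpcode:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE_MODEL(a) (sizeof(a) / sizeof((a)[0]))

/* Stand-in for TCGTargetOpDef: an opcode plus constraint strings. */
typedef struct {
    int op;
    const char *args_ct_str[4];
} OpDefModel;

/* Illustrative table only; opcode numbers 1 and 2 are invented. */
static const OpDefModel op_defs_model[] = {
    { 1, { "r", "r" } },          /* e.g. an extract-style op */
    { 2, { "r", "0", "rZ" } },    /* e.g. a deposit-style op */
    { -1 },                       /* terminator kept from the old scheme */
};

/* Return the constraint entry for op, or NULL if the target has none. */
const OpDefModel *op_def_lookup_model(int op)
{
    size_t i, n = ARRAY_SIZE_MODEL(op_defs_model);

    for (i = 0; i < n; ++i) {
        if (op_defs_model[i].op == op) {
            return &op_defs_model[i];
        }
    }
    return NULL;
}
```

Because the lookup is a function rather than a static table handed over at init time, a backend could later return different constraint sets for the same opcode depending on what it detected at startup.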

>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.inc.c | 14 ++++++--
>  tcg/arm/tcg-target.inc.c     | 14 ++++++--
>  tcg/i386/tcg-target.inc.c    | 14 ++++++--
>  tcg/ia64/tcg-target.inc.c    | 14 ++++++--
>  tcg/mips/tcg-target.inc.c    | 14 ++++++--
>  tcg/ppc/tcg-target.inc.c     | 14 ++++++--
>  tcg/s390/tcg-target.inc.c    | 14 ++++++--
>  tcg/sparc/tcg-target.inc.c   | 14 ++++++--
>  tcg/tcg.c                    | 86 +++++++++++++++-----------------------------
>  tcg/tcg.h                    |  2 --
>  tcg/tci/tcg-target.inc.c     | 13 ++++++-
>  11 files changed, 136 insertions(+), 77 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index c0e9890..416db45 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -1812,6 +1812,18 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
>      { -1 },
>  };
>
> +static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
> +{
> +    int i, n = ARRAY_SIZE(aarch64_op_defs);
> +
> +    for (i = 0; i < n; ++i) {
> +        if (aarch64_op_defs[i].op == op) {
> +            return &aarch64_op_defs[i];
> +        }
> +    }
> +    return NULL;
> +}
> +
>  static void tcg_target_init(TCGContext *s)
>  {
>      tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffffffff);
> @@ -1834,8 +1846,6 @@ static void tcg_target_init(TCGContext *s)
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_FP);
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_X18); /* platform register */
> -
> -    tcg_add_target_add_op_defs(aarch64_op_defs);
>  }
>
>  /* Saving pairs: (X19, X20) .. (X27, X28), (X29(fp), X30(lr)).  */
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index 6765a9d..4500ca7 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -2006,6 +2006,18 @@ static const TCGTargetOpDef arm_op_defs[] = {
>      { -1 },
>  };
>
> +static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
> +{
> +    int i, n = ARRAY_SIZE(arm_op_defs);
> +
> +    for (i = 0; i < n; ++i) {
> +        if (arm_op_defs[i].op == op) {
> +            return &arm_op_defs[i];
> +        }
> +    }
> +    return NULL;
> +}
> +
>  static void tcg_target_init(TCGContext *s)
>  {
>      /* Only probe for the platform and capabilities if we havn't already
> @@ -2036,8 +2048,6 @@ static void tcg_target_init(TCGContext *s)
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_PC);
> -
> -    tcg_add_target_add_op_defs(arm_op_defs);
>  }
>
>  static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 39f62bd..595c399 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -2330,6 +2330,18 @@ static const TCGTargetOpDef x86_op_defs[] = {
>      { -1 },
>  };
>
> +static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
> +{
> +    int i, n = ARRAY_SIZE(x86_op_defs);
> +
> +    for (i = 0; i < n; ++i) {
> +        if (x86_op_defs[i].op == op) {
> +            return &x86_op_defs[i];
> +        }
> +    }
> +    return NULL;
> +}
> +
>  static int tcg_target_callee_save_regs[] = {
>  #if TCG_TARGET_REG_BITS == 64
>      TCG_REG_RBP,
> @@ -2471,8 +2483,6 @@ static void tcg_target_init(TCGContext *s)
>
>      tcg_regset_clear(s->reserved_regs);
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
> -
> -    tcg_add_target_add_op_defs(x86_op_defs);
>  }
>
>  typedef struct {
> diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
> index b04d716..e4d419d 100644
> --- a/tcg/ia64/tcg-target.inc.c
> +++ b/tcg/ia64/tcg-target.inc.c
> @@ -2352,6 +2352,18 @@ static const TCGTargetOpDef ia64_op_defs[] = {
>      { -1 },
>  };
>
> +static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
> +{
> +    int i, n = ARRAY_SIZE(ia64_op_defs);
> +
> +    for (i = 0; i < n; ++i) {
> +        if (ia64_op_defs[i].op == op) {
> +            return &ia64_op_defs[i];
> +        }
> +    }
> +    return NULL;
> +}
> +
>  /* Generate global QEMU prologue and epilogue code */
>  static void tcg_target_qemu_prologue(TCGContext *s)
>  {
> @@ -2471,6 +2483,4 @@ static void tcg_target_init(TCGContext *s)
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_R5);
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_R6);
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_R7);
> -
> -    tcg_add_target_add_op_defs(ia64_op_defs);
>  }
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 1ecae08..7758b6d 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -1770,6 +1770,18 @@ static const TCGTargetOpDef mips_op_defs[] = {
>      { -1 },
>  };
>
> +static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
> +{
> +    int i, n = ARRAY_SIZE(mips_op_defs);
> +
> +    for (i = 0; i < n; ++i) {
> +        if (mips_op_defs[i].op == op) {
> +            return &mips_op_defs[i];
> +        }
> +    }
> +    return NULL;
> +}
> +
>  static int tcg_target_callee_save_regs[] = {
>      TCG_REG_S0,       /* used for the global env (TCG_AREG0) */
>      TCG_REG_S1,
> @@ -1930,8 +1942,6 @@ static void tcg_target_init(TCGContext *s)
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_RA);   /* return address */
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);   /* stack pointer */
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_GP);   /* global pointer */
> -
> -    tcg_add_target_add_op_defs(mips_op_defs);
>  }
>
>  void tb_set_jmp_target1(uintptr_t jmp_addr, uintptr_t addr)
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index 7ec54a2..a1b7412 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -2634,6 +2634,18 @@ static const TCGTargetOpDef ppc_op_defs[] = {
>      { -1 },
>  };
>
> +static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
> +{
> +    int i, n = ARRAY_SIZE(ppc_op_defs);
> +
> +    for (i = 0; i < n; ++i) {
> +        if (ppc_op_defs[i].op == op) {
> +            return &ppc_op_defs[i];
> +        }
> +    }
> +    return NULL;
> +}
> +
>  static void tcg_target_init(TCGContext *s)
>  {
>      unsigned long hwcap = qemu_getauxval(AT_HWCAP);
> @@ -2670,8 +2682,6 @@ static void tcg_target_init(TCGContext *s)
>      if (USE_REG_RA) {
>          tcg_regset_set_reg(s->reserved_regs, TCG_REG_RA);  /* return addr */
>      }
> -
> -    tcg_add_target_add_op_defs(ppc_op_defs);
>  }
>
>  #ifdef __ELF__
> diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
> index f4c510e..3cb34eb 100644
> --- a/tcg/s390/tcg-target.inc.c
> +++ b/tcg/s390/tcg-target.inc.c
> @@ -2321,6 +2321,18 @@ static const TCGTargetOpDef s390_op_defs[] = {
>      { -1 },
>  };
>
> +static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
> +{
> +    int i, n = ARRAY_SIZE(s390_op_defs);
> +
> +    for (i = 0; i < n; ++i) {
> +        if (s390_op_defs[i].op == op) {
> +            return &s390_op_defs[i];
> +        }
> +    }
> +    return NULL;
> +}
> +
>  static void query_s390_facilities(void)
>  {
>      unsigned long hwcap = qemu_getauxval(AT_HWCAP);
> @@ -2363,8 +2375,6 @@ static void tcg_target_init(TCGContext *s)
>      /* XXX many insns can't be used with R0, so we better avoid it for now */
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_R0);
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
> -
> -    tcg_add_target_add_op_defs(s390_op_defs);
>  }
>
>  #define FRAME_SIZE  ((int)(TCG_TARGET_CALL_STACK_OFFSET          \
> diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
> index 700c434..f2cbf50 100644
> --- a/tcg/sparc/tcg-target.inc.c
> +++ b/tcg/sparc/tcg-target.inc.c
> @@ -1583,6 +1583,18 @@ static const TCGTargetOpDef sparc_op_defs[] = {
>      { -1 },
>  };
>
> +static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
> +{
> +    int i, n = ARRAY_SIZE(sparc_op_defs);
> +
> +    for (i = 0; i < n; ++i) {
> +        if (sparc_op_defs[i].op == op) {
> +            return &sparc_op_defs[i];
> +        }
> +    }
> +    return NULL;
> +}
> +
>  static void tcg_target_init(TCGContext *s)
>  {
>      /* Only probe for the platform and capabilities if we havn't already
> @@ -1622,8 +1634,6 @@ static void tcg_target_init(TCGContext *s)
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_O6); /* stack pointer */
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_T1); /* for internal use */
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_T2); /* for internal use */
> -
> -    tcg_add_target_add_op_defs(sparc_op_defs);
>  }
>
>  #if SPARC64
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 27913f0..5792c1e 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -62,6 +62,7 @@
>  /* Forward declarations for functions declared in tcg-target.inc.c and
>     used here. */
>  static void tcg_target_init(TCGContext *s);
> +static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode);
>  static void tcg_target_qemu_prologue(TCGContext *s);
>  static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>                          intptr_t value, intptr_t addend);
> @@ -319,6 +320,7 @@ static const TCGHelperInfo all_helpers[] = {
>  };
>
>  static int indirect_reg_alloc_order[ARRAY_SIZE(tcg_target_reg_alloc_order)];
> +static void process_op_defs(TCGContext *s);
>
>  void tcg_context_init(TCGContext *s)
>  {
> @@ -362,6 +364,7 @@ void tcg_context_init(TCGContext *s)
>      }
>
>      tcg_target_init(s);
> +    process_op_defs(s);
>
>      /* Reverse the order of the saved registers, assuming they're all at
>         the start of tcg_target_reg_alloc_order.  */
> @@ -1221,29 +1224,33 @@ static void sort_constraints(TCGOpDef *def, int start, int n)
>      }
>  }
>
> -void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs)
> +static void process_op_defs(TCGContext *s)
>  {
>      TCGOpcode op;
> -    TCGOpDef *def;
> -    const char *ct_str;
> -    int i, nb_args;
>
> -    for(;;) {
> -        if (tdefs->op == (TCGOpcode)-1)
> -            break;
> -        op = tdefs->op;
> -        tcg_debug_assert((unsigned)op < NB_OPS);
> -        def = &tcg_op_defs[op];
> -#if defined(CONFIG_DEBUG_TCG)
> -        /* Duplicate entry in op definitions? */
> -        tcg_debug_assert(!def->used);
> -        def->used = 1;
> -#endif
> +    for (op = 0; op < NB_OPS; op++) {
> +        TCGOpDef *def = &tcg_op_defs[op];
> +        const TCGTargetOpDef *tdefs;
> +        int i, nb_args, ok;
> +
> +        if (def->flags & TCG_OPF_NOT_PRESENT) {
> +            continue;
> +        }
> +
>          nb_args = def->nb_iargs + def->nb_oargs;
> -        for(i = 0; i < nb_args; i++) {
> -            ct_str = tdefs->args_ct_str[i];
> -            /* Incomplete TCGTargetOpDef entry? */
> +        if (nb_args == 0) {
> +            continue;
> +        }
> +
> +        tdefs = tcg_target_op_def(op);
> +        /* Missing TCGTargetOpDef entry. */
> +        tcg_debug_assert(tdefs != NULL);
> +
> +        for (i = 0; i < nb_args; i++) {
> +            const char *ct_str = tdefs->args_ct_str[i];
> +            /* Incomplete TCGTargetOpDef entry. */
>              tcg_debug_assert(ct_str != NULL);
> +
>              tcg_regset_clear(def->args_ct[i].u.regs);
>              def->args_ct[i].ct = 0;
>              if (ct_str[0] >= '0' && ct_str[0] <= '9') {
> @@ -1272,11 +1279,9 @@ void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs)
>                          ct_str++;
>                          break;
>                      default:
> -                        if (target_parse_constraint(&def->args_ct[i], &ct_str) < 0) {
> -                            fprintf(stderr, "Invalid constraint '%s' for arg %d of operation '%s'\n",
> -                                    ct_str, i, def->name);
> -                            exit(1);
> -                        }
> +                        ok = target_parse_constraint(&def->args_ct[i], &ct_str);
> +                        /* Typo in TCGTargetOpDef constraint. */
> +                        tcg_debug_assert(ok == 0);
>                      }
>                  }
>              }
> @@ -1288,42 +1293,7 @@ void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs)
>          /* sort the constraints (XXX: this is just an heuristic) */
>          sort_constraints(def, 0, def->nb_oargs);
>          sort_constraints(def, def->nb_oargs, def->nb_iargs);
> -
> -#if 0
> -        {
> -            int i;
> -
> -            printf("%s: sorted=", def->name);
> -            for(i = 0; i < def->nb_oargs + def->nb_iargs; i++)
> -                printf(" %d", def->sorted_args[i]);
> -            printf("\n");
> -        }
> -#endif
> -        tdefs++;
> -    }
> -
> -#if defined(CONFIG_DEBUG_TCG)
> -    i = 0;
> -    for (op = 0; op < tcg_op_defs_max; op++) {
> -        const TCGOpDef *def = &tcg_op_defs[op];
> -        if (def->flags & TCG_OPF_NOT_PRESENT) {
> -            /* Wrong entry in op definitions? */
> -            if (def->used) {
> -                fprintf(stderr, "Invalid op definition for %s\n", def->name);
> -                i = 1;
> -            }
> -        } else {
> -            /* Missing entry in op definitions? */
> -            if (!def->used) {
> -                fprintf(stderr, "Missing op definition for %s\n", def->name);
> -                i = 1;
> -            }
> -        }
> -    }
> -    if (i == 1) {
> -        tcg_abort();
>      }
> -#endif
>  }
>
>  void tcg_op_remove(TCGContext *s, TCGOp *op)
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index ebfcefd..144bdab 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -906,8 +906,6 @@ do {\
>      abort();\
>  } while (0)
>
> -void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs);
> -
>  #if UINTPTR_MAX == UINT32_MAX
>  #define TCGV_NAT_TO_PTR(n) MAKE_TCGV_PTR(GET_TCGV_I32(n))
>  #define TCGV_PTR_TO_NAT(n) MAKE_TCGV_I32(GET_TCGV_PTR(n))
> diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
> index 9dbf4d5..42d4bd6 100644
> --- a/tcg/tci/tcg-target.inc.c
> +++ b/tcg/tci/tcg-target.inc.c
> @@ -259,6 +259,18 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
>      { -1 },
>  };
>
> +static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
> +{
> +    int i, n = ARRAY_SIZE(tcg_target_op_defs);
> +
> +    for (i = 0; i < n; ++i) {
> +        if (tcg_target_op_defs[i].op == op) {
> +            return &tcg_target_op_defs[i];
> +        }
> +    }
> +    return NULL;
> +}
> +
>  static const int tcg_target_reg_alloc_order[] = {
>      TCG_REG_R0,
>      TCG_REG_R1,
> @@ -875,7 +887,6 @@ static void tcg_target_init(TCGContext *s)
>
>      tcg_regset_clear(s->reserved_regs);
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
> -    tcg_add_target_add_op_defs(tcg_target_op_defs);
>
>      /* We use negative offsets from "sp" so that we can distinguish
>         stores that might pretend to be call arguments.  */


--
Alex Bennée

* Re: [Qemu-devel] [PATCH v4 22/64] tcg: Pass the opcode width to target_parse_constraint
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 22/64] tcg: Pass the opcode width to target_parse_constraint Richard Henderson
@ 2016-12-06 16:43   ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-06 16:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> This will let us choose how to interpret a given constraint
> depending on whether the opcode is 32- or 64-bit.  Which will
> let us share more constraint combinations between opcodes.
>
> At the same time, change the interface to return the advanced
> pointer instead of passing it in/out by reference.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
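
For anyone skimming the hunks below, here is a minimal sketch of the
calling convention the commit message describes: the parser consumes one
letter, returns the advanced pointer (NULL for an unknown letter), and
receives the opcode width. Every Demo*/demo_* name and the register
masks are made up for illustration; they are not QEMU's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for TCGType and TCGArgConstraint. */
typedef enum { DEMO_TYPE_I32, DEMO_TYPE_I64 } DemoType;

typedef struct {
    unsigned ct;      /* constraint flag bits */
    uint64_t regs;    /* mask of registers the operand may use */
} DemoConstraint;

#define DEMO_CT_REG        0x01
#define DEMO_CT_CONST_ZERO 0x02

/* Consume one constraint letter.  Returns the advanced pointer,
   or NULL if the letter is not recognised. */
static const char *demo_parse_constraint(DemoConstraint *ct,
                                         const char *ct_str, DemoType type)
{
    switch (*ct_str++) {
    case 'r':
        ct->ct |= DEMO_CT_REG;
        /* Illustrative only: let the width pick the register mask. */
        ct->regs |= (type == DEMO_TYPE_I64) ? 0xffffffffu : 0xffffu;
        break;
    case 'Z':
        ct->ct |= DEMO_CT_CONST_ZERO;
        break;
    default:
        return NULL;
    }
    return ct_str;
}

/* Caller side: keep feeding the returned pointer back in until the
   string is exhausted.  Returns 0 on success, -1 on a bad letter. */
static int demo_parse_all(DemoConstraint *ct, const char *ct_str,
                          DemoType type)
{
    ct->ct = 0;
    ct->regs = 0;
    while (*ct_str != '\0') {
        ct_str = demo_parse_constraint(ct, ct_str, type);
        if (ct_str == NULL) {
            return -1;
        }
    }
    return 0;
}
```

One thing to keep in mind with the `switch (*ct_str++)` form: inside a
case the pointer has already advanced, so `ct_str[0]` is the next
character and the letter that was matched is `ct_str[-1]`.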

> ---
>  tcg/aarch64/tcg-target.inc.c | 15 +++++----------
>  tcg/arm/tcg-target.inc.c     | 15 +++++----------
>  tcg/i386/tcg-target.inc.c    | 14 +++++---------
>  tcg/ia64/tcg-target.inc.c    | 14 +++++---------
>  tcg/mips/tcg-target.inc.c    | 14 +++++---------
>  tcg/ppc/tcg-target.inc.c     | 14 +++++---------
>  tcg/s390/tcg-target.inc.c    | 14 +++++---------
>  tcg/sparc/tcg-target.inc.c   | 14 +++++---------
>  tcg/tcg.c                    | 12 ++++++++----
>  tcg/tci/tcg-target.inc.c     | 12 +++++-------
>  10 files changed, 53 insertions(+), 85 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 416db45..17c0b20 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -115,12 +115,10 @@ static inline void patch_reloc(tcg_insn_unit *code_ptr, int type,
>  #define TCG_CT_CONST_MONE 0x800
>
>  /* parse target specific constraints */
> -static int target_parse_constraint(TCGArgConstraint *ct,
> -                                   const char **pct_str)
> +static const char *target_parse_constraint(TCGArgConstraint *ct,
> +                                           const char *ct_str, TCGType type)
>  {
> -    const char *ct_str = *pct_str;
> -
> -    switch (ct_str[0]) {
> +    switch (*ct_str++) {
>      case 'r':
>          ct->ct |= TCG_CT_REG;
>          tcg_regset_set32(ct->u.regs, 0, (1ULL << TCG_TARGET_NB_REGS) - 1);
> @@ -150,12 +148,9 @@ static int target_parse_constraint(TCGArgConstraint *ct,
>          ct->ct |= TCG_CT_CONST_ZERO;
>          break;
>      default:
> -        return -1;
> +        return NULL;
>      }
> -
> -    ct_str++;
> -    *pct_str = ct_str;
> -    return 0;
> +    return ct_str;
>  }
>
>  static inline bool is_aimm(uint64_t val)
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index 4500ca7..473c170 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -114,12 +114,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>  #define TCG_CT_CONST_ZERO 0x800
>
>  /* parse target specific constraints */
> -static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
> +static const char *target_parse_constraint(TCGArgConstraint *ct,
> +                                           const char *ct_str, TCGType type)
>  {
> -    const char *ct_str;
> -
> -    ct_str = *pct_str;
> -    switch (ct_str[0]) {
> +    switch (*ct_str++) {
>      case 'I':
>          ct->ct |= TCG_CT_CONST_ARM;
>          break;
> @@ -172,12 +170,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>          break;
>
>      default:
> -        return -1;
> +        return NULL;
>      }
> -    ct_str++;
> -    *pct_str = ct_str;
> -
> -    return 0;
> +    return ct_str;
>  }
>
>  static inline uint32_t rotl(uint32_t val, int n)
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 595c399..aa5a248 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -166,12 +166,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>  }
>
>  /* parse target specific constraints */
> -static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
> +static const char *target_parse_constraint(TCGArgConstraint *ct,
> +                                           const char *ct_str, TCGType type)
>  {
> -    const char *ct_str;
> -
> -    ct_str = *pct_str;
> -    switch(ct_str[0]) {
> +    switch(*ct_str++) {
>      case 'a':
>          ct->ct |= TCG_CT_REG;
>          tcg_regset_set_reg(ct->u.regs, TCG_REG_EAX);
> @@ -249,11 +247,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>          break;
>
>      default:
> -        return -1;
> +        return NULL;
>      }
> -    ct_str++;
> -    *pct_str = ct_str;
> -    return 0;
> +    return ct_str;
>  }
>
>  /* test if a constant matches the constraint */
> diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
> index e4d419d..bf9a97d 100644
> --- a/tcg/ia64/tcg-target.inc.c
> +++ b/tcg/ia64/tcg-target.inc.c
> @@ -721,12 +721,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>   */
>
>  /* parse target specific constraints */
> -static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
> +static const char *target_parse_constraint(TCGArgConstraint *ct,
> +                                           const char *ct_str, TCGType type)
>  {
> -    const char *ct_str;
> -
> -    ct_str = *pct_str;
> -    switch(ct_str[0]) {
> +    switch(*ct_str++) {
>      case 'r':
>          ct->ct |= TCG_CT_REG;
>          tcg_regset_set(ct->u.regs, 0xffffffffffffffffull);
> @@ -750,11 +748,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>          ct->ct |= TCG_CT_CONST_ZERO;
>          break;
>      default:
> -        return -1;
> +        return NULL;
>      }
> -    ct_str++;
> -    *pct_str = ct_str;
> -    return 0;
> +    return ct_str;
>  }
>
>  /* test if a constant matches the constraint */
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 7758b6d..4341ea2 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -167,12 +167,10 @@ static inline bool is_p2m1(tcg_target_long val)
>  }
>
>  /* parse target specific constraints */
> -static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
> +static const char *target_parse_constraint(TCGArgConstraint *ct,
> +                                           const char *ct_str, TCGType type)
>  {
> -    const char *ct_str;
> -
> -    ct_str = *pct_str;
> -    switch(ct_str[0]) {
> +    switch(*ct_str++) {
>      case 'r':
>          ct->ct |= TCG_CT_REG;
>          tcg_regset_set(ct->u.regs, 0xffffffff);
> @@ -224,11 +222,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>          ct->ct |= TCG_CT_CONST_ZERO;
>          break;
>      default:
> -        return -1;
> +        return NULL;
>      }
> -    ct_str++;
> -    *pct_str = ct_str;
> -    return 0;
> +    return ct_str;
>  }
>
>  /* test if a constant matches the constraint */
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index a1b7412..bf17161 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -259,12 +259,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>  }
>
>  /* parse target specific constraints */
> -static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
> +static const char *target_parse_constraint(TCGArgConstraint *ct,
> +                                           const char *ct_str, TCGType type)
>  {
> -    const char *ct_str;
> -
> -    ct_str = *pct_str;
> -    switch (ct_str[0]) {
> +    switch (*ct_str++) {
>      case 'A': case 'B': case 'C': case 'D':
>          ct->ct |= TCG_CT_REG;
>          tcg_regset_set_reg(ct->u.regs, 3 + ct_str[0] - 'A');
> @@ -311,11 +309,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>          ct->ct |= TCG_CT_CONST_ZERO;
>          break;
>      default:
> -        return -1;
> +        return NULL;
>      }
> -    ct_str++;
> -    *pct_str = ct_str;
> -    return 0;
> +    return ct_str;
>  }
>
>  /* test if a constant matches the constraint */
> diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
> index 3cb34eb..5275297 100644
> --- a/tcg/s390/tcg-target.inc.c
> +++ b/tcg/s390/tcg-target.inc.c
> @@ -359,11 +359,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>  }
>
>  /* parse target specific constraints */
> -static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
> +static const char *target_parse_constraint(TCGArgConstraint *ct,
> +                                           const char *ct_str, TCGType type)
>  {
> -    const char *ct_str = *pct_str;
> -
> -    switch (ct_str[0]) {
> +    switch (*ct_str++) {
>      case 'r':                  /* all registers */
>          ct->ct |= TCG_CT_REG;
>          tcg_regset_set32(ct->u.regs, 0, 0xffff);
> @@ -409,12 +408,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>          ct->ct |= TCG_CT_CONST_ZERO;
>          break;
>      default:
> -        return -1;
> +        return NULL;
>      }
> -    ct_str++;
> -    *pct_str = ct_str;
> -
> -    return 0;
> +    return ct_str;
>  }
>
>  /* Immediates to be used with logical OR.  This is an optimization only,
> diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
> index f2cbf50..d1f4c0d 100644
> --- a/tcg/sparc/tcg-target.inc.c
> +++ b/tcg/sparc/tcg-target.inc.c
> @@ -319,12 +319,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>  }
>
>  /* parse target specific constraints */
> -static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
> +static const char *target_parse_constraint(TCGArgConstraint *ct,
> +                                           const char *ct_str, TCGType type)
>  {
> -    const char *ct_str;
> -
> -    ct_str = *pct_str;
> -    switch (ct_str[0]) {
> +    switch (*ct_str++) {
>      case 'r':
>          ct->ct |= TCG_CT_REG;
>          tcg_regset_set32(ct->u.regs, 0, 0xffffffff);
> @@ -360,11 +358,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>          ct->ct |= TCG_CT_CONST_ZERO;
>          break;
>      default:
> -        return -1;
> +        return NULL;
>      }
> -    ct_str++;
> -    *pct_str = ct_str;
> -    return 0;
> +    return ct_str;
>  }
>
>  /* test if a constant matches the constraint */
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 5792c1e..8b4dce7 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -96,7 +96,8 @@ static void tcg_register_jit_int(void *buf, size_t size,
>      __attribute__((unused));
>
>  /* Forward declarations for functions declared and used in tcg-target.inc.c. */
> -static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str);
> +static const char *target_parse_constraint(TCGArgConstraint *ct,
> +                                           const char *ct_str, TCGType type);
>  static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
>                         intptr_t arg2);
>  static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
> @@ -1231,7 +1232,8 @@ static void process_op_defs(TCGContext *s)
>      for (op = 0; op < NB_OPS; op++) {
>          TCGOpDef *def = &tcg_op_defs[op];
>          const TCGTargetOpDef *tdefs;
> -        int i, nb_args, ok;
> +        TCGType type;
> +        int i, nb_args;
>
>          if (def->flags & TCG_OPF_NOT_PRESENT) {
>              continue;
> @@ -1246,6 +1248,7 @@ static void process_op_defs(TCGContext *s)
>          /* Missing TCGTargetOpDef entry. */
>          tcg_debug_assert(tdefs != NULL);
>
> +        type = (def->flags & TCG_OPF_64BIT ? TCG_TYPE_I64 : TCG_TYPE_I32);
>          for (i = 0; i < nb_args; i++) {
>              const char *ct_str = tdefs->args_ct_str[i];
>              /* Incomplete TCGTargetOpDef entry. */
> @@ -1279,9 +1282,10 @@ static void process_op_defs(TCGContext *s)
>                          ct_str++;
>                          break;
>                      default:
> -                        ok = target_parse_constraint(&def->args_ct[i], &ct_str);
> +                        ct_str = target_parse_constraint(&def->args_ct[i],
> +                                                         ct_str, type);
>                          /* Typo in TCGTargetOpDef constraint. */
> -                        tcg_debug_assert(ok == 0);
> +                        tcg_debug_assert(ct_str != NULL);
>                      }
>                  }
>              }
> diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
> index 42d4bd6..26ee9b1 100644
> --- a/tcg/tci/tcg-target.inc.c
> +++ b/tcg/tci/tcg-target.inc.c
> @@ -384,10 +384,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
>  }
>
>  /* Parse target specific constraints. */
> -static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
> +static const char *target_parse_constraint(TCGArgConstraint *ct,
> +                                           const char *ct_str, TCGType type)
>  {
> -    const char *ct_str = *pct_str;
> -    switch (ct_str[0]) {
> +    switch (*ct_str++) {
>      case 'r':
>      case 'L':                   /* qemu_ld constraint */
>      case 'S':                   /* qemu_st constraint */
> @@ -395,11 +395,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>          tcg_regset_set32(ct->u.regs, 0, BIT(TCG_TARGET_NB_REGS) - 1);
>          break;
>      default:
> -        return -1;
> +        return NULL;
>      }
> -    ct_str++;
> -    *pct_str = ct_str;
> -    return 0;
> +    return ct_str;
>  }
>
>  #if defined(CONFIG_DEBUG_TCG_INTERPRETER)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v4 23/64] tcg: Allow an operand to be matching or a constant
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 23/64] tcg: Allow an operand to be matching or a constant Richard Henderson
@ 2016-12-08 17:19   ` Alex Bennée
  2016-12-08 17:49     ` Richard Henderson
  0 siblings, 1 reply; 102+ messages in thread
From: Alex Bennée @ 2016-12-08 17:19 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> This allows an output operand to match an input operand
> only when the input operand needs a register.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

It's hard to offer anything more than a mechanical review of this, as
the constraints aren't intuitive to me (I guess because I'm not a gcc
hacker!).

Could we expand the documentation of constraints in tcg/README with a
summary of the global ones?

Should there be a one-to-one mapping of textual constraint descriptions
to the TCG_CT_FOO defines? I'm finding it hard to figure out why the
text->bitfield step is needed. Is it something to do with the merging of
generic TCG op constraints with their backend counterparts?

Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
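
To check my reading of the new loop, here is a small sketch of what I
think it does for an input constraint such as "0i": the leading digit
ties the input to output 0, and the trailing 'i' additionally lets the
input be a constant. The DEMO_* flags, DemoArg and both functions are
hypothetical simplifications (failures are returned rather than
asserted, and 'r' stands in for whatever the backend parser accepts).

```c
#include <stddef.h>

#define DEMO_CT_ALIAS   0x01   /* output shares a register with an input */
#define DEMO_CT_IALIAS  0x02   /* input shares a register with an output */
#define DEMO_CT_CONST   0x04   /* operand may be a constant */
#define DEMO_CT_REG     0x08   /* operand may live in a register */

typedef struct {
    unsigned ct;
    int alias_index;
} DemoArg;

/* Parse the constraint string of input operand i.  args[] holds the
   outputs first, already parsed.  Returns 0 on success, -1 if the
   string is malformed. */
static int demo_parse_input(DemoArg *args, int nb_oargs, int i,
                            const char *str)
{
    const char *p = str;

    args[i].ct = 0;
    args[i].alias_index = 0;
    while (*p != '\0') {
        char c = *p++;

        if (c >= '0' && c <= '9') {
            int oarg = c - '0';

            /* The digit must be the first character and must name an
               output that takes a register. */
            if (p - 1 != str || oarg >= nb_oargs
                || !(args[oarg].ct & DEMO_CT_REG)) {
                return -1;
            }
            args[i] = args[oarg];
            args[oarg].ct |= DEMO_CT_ALIAS;
            args[oarg].alias_index = i;
            args[i].ct |= DEMO_CT_IALIAS;
            args[i].alias_index = oarg;
        } else if (c == 'i') {
            args[i].ct |= DEMO_CT_CONST;
        } else if (c == 'r') {
            args[i].ct |= DEMO_CT_REG;
        } else {
            return -1;
        }
    }
    return 0;
}

/* Allocation side: the output reuses the aliased input's register only
   when that input did not end up as a constant. */
static int demo_output_reuses_input(const DemoArg *out,
                                    const int *const_args)
{
    return (out->ct & DEMO_CT_ALIAS) && !const_args[out->alias_index];
}
```

If that matches the intent, then with "0i" the output keeps its own
register flags plus the alias bit, and the second function mirrors the
extra `!const_args[...]` test added to tcg_reg_alloc_op.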


> ---
>  tcg/tcg.c | 63 ++++++++++++++++++++++++++++++++-------------------------------
>  1 file changed, 32 insertions(+), 31 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 8b4dce7..cb898f1 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1256,37 +1256,37 @@ static void process_op_defs(TCGContext *s)
>
>              tcg_regset_clear(def->args_ct[i].u.regs);
>              def->args_ct[i].ct = 0;
> -            if (ct_str[0] >= '0' && ct_str[0] <= '9') {
> -                int oarg;
> -                oarg = ct_str[0] - '0';
> -                tcg_debug_assert(oarg < def->nb_oargs);
> -                tcg_debug_assert(def->args_ct[oarg].ct & TCG_CT_REG);
> -                /* TCG_CT_ALIAS is for the output arguments. The input
> -                   argument is tagged with TCG_CT_IALIAS. */
> -                def->args_ct[i] = def->args_ct[oarg];
> -                def->args_ct[oarg].ct = TCG_CT_ALIAS;
> -                def->args_ct[oarg].alias_index = i;
> -                def->args_ct[i].ct |= TCG_CT_IALIAS;
> -                def->args_ct[i].alias_index = oarg;
> -            } else {
> -                for(;;) {
> -                    if (*ct_str == '\0')
> -                        break;
> -                    switch(*ct_str) {
> -                    case '&':
> -                        def->args_ct[i].ct |= TCG_CT_NEWREG;
> -                        ct_str++;
> -                        break;
> -                    case 'i':
> -                        def->args_ct[i].ct |= TCG_CT_CONST;
> -                        ct_str++;
> -                        break;
> -                    default:
> -                        ct_str = target_parse_constraint(&def->args_ct[i],
> -                                                         ct_str, type);
> -                        /* Typo in TCGTargetOpDef constraint. */
> -                        tcg_debug_assert(ct_str != NULL);
> +            while (*ct_str != '\0') {
> +                switch(*ct_str) {
> +                case '0' ... '9':
> +                    {
> +                        int oarg = *ct_str - '0';
> +                        tcg_debug_assert(ct_str == tdefs->args_ct_str[i]);
> +                        tcg_debug_assert(oarg < def->nb_oargs);
> +                        tcg_debug_assert(def->args_ct[oarg].ct & TCG_CT_REG);
> +                        /* TCG_CT_ALIAS is for the output arguments.
> +                           The input is tagged with TCG_CT_IALIAS. */
> +                        def->args_ct[i] = def->args_ct[oarg];
> +                        def->args_ct[oarg].ct |= TCG_CT_ALIAS;
> +                        def->args_ct[oarg].alias_index = i;
> +                        def->args_ct[i].ct |= TCG_CT_IALIAS;
> +                        def->args_ct[i].alias_index = oarg;
>                      }
> +                    ct_str++;
> +                    break;
> +                case '&':
> +                    def->args_ct[i].ct |= TCG_CT_NEWREG;
> +                    ct_str++;
> +                    break;
> +                case 'i':
> +                    def->args_ct[i].ct |= TCG_CT_CONST;
> +                    ct_str++;
> +                    break;
> +                default:
> +                    ct_str = target_parse_constraint(&def->args_ct[i],
> +                                                     ct_str, type);
> +                    /* Typo in TCGTargetOpDef constraint. */
> +                    tcg_debug_assert(ct_str != NULL);
>                  }
>              }
>          }
> @@ -2296,7 +2296,8 @@ static void tcg_reg_alloc_op(TCGContext *s,
>              arg = args[i];
>              arg_ct = &def->args_ct[i];
>              ts = &s->temps[arg];
> -            if (arg_ct->ct & TCG_CT_ALIAS) {
> +            if ((arg_ct->ct & TCG_CT_ALIAS)
> +                && !const_args[arg_ct->alias_index]) {
>                  reg = new_args[arg_ct->alias_index];
>              } else if (arg_ct->ct & TCG_CT_NEWREG) {
>                  reg = tcg_reg_alloc(s, arg_ct->u.regs,


--
Alex Bennée

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v4 24/64] tcg: Add clz and ctz opcodes
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 24/64] tcg: Add clz and ctz opcodes Richard Henderson
@ 2016-12-08 17:44   ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-08 17:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
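
For reference, the semantics documented in the tcg/README hunk
(t0 = t1 ? clz(t1) : t2, and likewise for ctz) can be sketched in
portable C as below. The demo_* names are hypothetical, and the loops
are a plain stand-in for the clz32/ctz32 host utilities the helpers in
this patch call.

```c
#include <stdint.h>

/* Count leading zeros of arg; return zero_val when arg is 0. */
static uint32_t demo_clz_i32(uint32_t arg, uint32_t zero_val)
{
    uint32_t n = 0;

    if (arg == 0) {
        return zero_val;
    }
    while (!(arg & 0x80000000u)) {
        arg <<= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros of arg; return zero_val when arg is 0. */
static uint32_t demo_ctz_i32(uint32_t arg, uint32_t zero_val)
{
    uint32_t n = 0;

    if (arg == 0) {
        return zero_val;
    }
    while (!(arg & 1u)) {
        arg >>= 1;
        n++;
    }
    return n;
}
```

Passing the zero-input result as an explicit operand means the caller
chooses what a zero input yields, rather than the opcode fixing it.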

> ---
>  tcg-runtime.c            |  20 +++++++
>  tcg/README               |   8 +++
>  tcg/aarch64/tcg-target.h |   4 ++
>  tcg/arm/tcg-target.h     |   2 +
>  tcg/i386/tcg-target.h    |   4 ++
>  tcg/ia64/tcg-target.h    |   4 ++
>  tcg/mips/tcg-target.h    |   2 +
>  tcg/optimize.c           |  36 ++++++++++++
>  tcg/ppc/tcg-target.h     |   4 ++
>  tcg/s390/tcg-target.h    |   4 ++
>  tcg/sparc/tcg-target.h   |   4 ++
>  tcg/tcg-op.c             | 143 +++++++++++++++++++++++++++++++++++++++++++++++
>  tcg/tcg-op.h             |  16 ++++++
>  tcg/tcg-opc.h            |   4 ++
>  tcg/tcg-runtime.h        |   5 ++
>  tcg/tcg.h                |   2 +
>  tcg/tci/tcg-target.h     |   4 ++
>  17 files changed, 266 insertions(+)
>
> diff --git a/tcg-runtime.c b/tcg-runtime.c
> index 9327b6f..eb3bade 100644
> --- a/tcg-runtime.c
> +++ b/tcg-runtime.c
> @@ -101,6 +101,26 @@ int64_t HELPER(mulsh_i64)(int64_t arg1, int64_t arg2)
>      return h;
>  }
>
> +uint32_t HELPER(clz_i32)(uint32_t arg, uint32_t zero_val)
> +{
> +    return arg ? clz32(arg) : zero_val;
> +}
> +
> +uint32_t HELPER(ctz_i32)(uint32_t arg, uint32_t zero_val)
> +{
> +    return arg ? ctz32(arg) : zero_val;
> +}
> +
> +uint64_t HELPER(clz_i64)(uint64_t arg, uint64_t zero_val)
> +{
> +    return arg ? clz64(arg) : zero_val;
> +}
> +
> +uint64_t HELPER(ctz_i64)(uint64_t arg, uint64_t zero_val)
> +{
> +    return arg ? ctz64(arg) : zero_val;
> +}
> +
>  void HELPER(exit_atomic)(CPUArchState *env)
>  {
>      cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
> diff --git a/tcg/README b/tcg/README
> index 065d9c2..f5ccf04 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -246,6 +246,14 @@ t0=~(t1|t2)
>
>  t0=t1|~t2
>
> +* clz_i32/i64 t0, t1, t2
> +
> +t0 = t1 ? clz(t1) : t2
> +
> +* ctz_i32/i64 t0, t1, t2
> +
> +t0 = t1 ? ctz(t1) : t2
> +
>  ********* Shifts/Rotates
>
>  * shl_i32/i64 t0, t1, t2
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 4a74bd8..976f493 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -62,6 +62,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_eqv_i32          1
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
>  #define TCG_TARGET_HAS_extract_i32      1
>  #define TCG_TARGET_HAS_sextract_i32     1
> @@ -94,6 +96,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_eqv_i64          1
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      1
>  #define TCG_TARGET_HAS_extract_i64      1
>  #define TCG_TARGET_HAS_sextract_i64     1
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index 4e30728..02cc242 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -110,6 +110,8 @@ extern bool use_idiv_instructions;
>  #define TCG_TARGET_HAS_eqv_i32          0
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
>  #define TCG_TARGET_HAS_extract_i32      use_armv7_instructions
>  #define TCG_TARGET_HAS_sextract_i32     use_armv7_instructions
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index dc19c47..f2d9955 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -93,6 +93,8 @@ extern bool have_bmi1;
>  #define TCG_TARGET_HAS_eqv_i32          0
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
>  #define TCG_TARGET_HAS_extract_i32      1
>  #define TCG_TARGET_HAS_sextract_i32     1
> @@ -125,6 +127,8 @@ extern bool have_bmi1;
>  #define TCG_TARGET_HAS_eqv_i64          0
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      1
>  #define TCG_TARGET_HAS_extract_i64      1
>  #define TCG_TARGET_HAS_sextract_i64     0
> diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
> index 8856dc8..9a829ae 100644
> --- a/tcg/ia64/tcg-target.h
> +++ b/tcg/ia64/tcg-target.h
> @@ -140,6 +140,10 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i32         1
>  #define TCG_TARGET_HAS_nand_i64         1
>  #define TCG_TARGET_HAS_nor_i32          1
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i32          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_nor_i64          1
>  #define TCG_TARGET_HAS_orc_i32          1
>  #define TCG_TARGET_HAS_orc_i64          1
> diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
> index f1c3137..f133684 100644
> --- a/tcg/mips/tcg-target.h
> +++ b/tcg/mips/tcg-target.h
> @@ -109,6 +109,8 @@ extern bool use_mips32r2_instructions;
>  #define TCG_TARGET_HAS_rem_i32          1
>  #define TCG_TARGET_HAS_not_i32          1
>  #define TCG_TARGET_HAS_nor_i32          1
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_andc_i32         0
>  #define TCG_TARGET_HAS_orc_i32          0
>  #define TCG_TARGET_HAS_eqv_i32          0
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 9e26bb7..e7ecce4 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -296,6 +296,18 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
>      CASE_OP_32_64(nor):
>          return ~(x | y);
>
> +    case INDEX_op_clz_i32:
> +        return (uint32_t)x ? clz32(x) : y;
> +
> +    case INDEX_op_clz_i64:
> +        return x ? clz64(x) : y;
> +
> +    case INDEX_op_ctz_i32:
> +        return (uint32_t)x ? ctz32(x) : y;
> +
> +    case INDEX_op_ctz_i64:
> +        return x ? ctz64(x) : y;
> +
>      CASE_OP_32_64(ext8s):
>          return (int8_t)x;
>
> @@ -896,6 +908,16 @@ void tcg_optimize(TCGContext *s)
>              mask = temps[args[1]].mask | temps[args[2]].mask;
>              break;
>
> +        case INDEX_op_clz_i32:
> +        case INDEX_op_ctz_i32:
> +            mask = temps[args[2]].mask | 31;
> +            break;
> +
> +        case INDEX_op_clz_i64:
> +        case INDEX_op_ctz_i64:
> +            mask = temps[args[2]].mask | 63;
> +            break;
> +
>          CASE_OP_32_64(setcond):
>          case INDEX_op_setcond2_i32:
>              mask = 1;
> @@ -1052,6 +1074,20 @@ void tcg_optimize(TCGContext *s)
>              }
>              goto do_default;
>
> +        CASE_OP_32_64(clz):
> +        CASE_OP_32_64(ctz):
> +            if (temp_is_const(args[1])) {
> +                TCGArg v = temps[args[1]].val;
> +                if (v != 0) {
> +                    tmp = do_constant_folding(opc, v, 0);
> +                    tcg_opt_gen_movi(s, op, args, args[0], tmp);
> +                } else {
> +                    tcg_opt_gen_mov(s, op, args, args[0], args[2]);
> +                }
> +                break;
> +            }
> +            goto do_default;
> +
>          CASE_OP_32_64(deposit):
>              if (temp_is_const(args[1]) && temp_is_const(args[2])) {
>                  tmp = deposit64(temps[args[1]].val, args[3], args[4],
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index b42c57a..698a599 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -68,6 +68,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_eqv_i32          1
>  #define TCG_TARGET_HAS_nand_i32         1
>  #define TCG_TARGET_HAS_nor_i32          1
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
>  #define TCG_TARGET_HAS_extract_i32      1
>  #define TCG_TARGET_HAS_sextract_i32     0
> @@ -101,6 +103,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_eqv_i64          1
>  #define TCG_TARGET_HAS_nand_i64         1
>  #define TCG_TARGET_HAS_nor_i64          1
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      1
>  #define TCG_TARGET_HAS_extract_i64      1
>  #define TCG_TARGET_HAS_sextract_i64     0
> diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
> index e9ac12e..3ac2dc9 100644
> --- a/tcg/s390/tcg-target.h
> +++ b/tcg/s390/tcg-target.h
> @@ -77,6 +77,8 @@ extern uint64_t s390_facilities;
>  #define TCG_TARGET_HAS_eqv_i32        0
>  #define TCG_TARGET_HAS_nand_i32       0
>  #define TCG_TARGET_HAS_nor_i32        0
> +#define TCG_TARGET_HAS_clz_i32        0
> +#define TCG_TARGET_HAS_ctz_i32        0
>  #define TCG_TARGET_HAS_deposit_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
>  #define TCG_TARGET_HAS_extract_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
>  #define TCG_TARGET_HAS_sextract_i32   0
> @@ -108,6 +110,8 @@ extern uint64_t s390_facilities;
>  #define TCG_TARGET_HAS_eqv_i64        0
>  #define TCG_TARGET_HAS_nand_i64       0
>  #define TCG_TARGET_HAS_nor_i64        0
> +#define TCG_TARGET_HAS_clz_i64        0
> +#define TCG_TARGET_HAS_ctz_i64        0
>  #define TCG_TARGET_HAS_deposit_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
>  #define TCG_TARGET_HAS_extract_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
>  #define TCG_TARGET_HAS_sextract_i64   0
> diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
> index a212167..340837a 100644
> --- a/tcg/sparc/tcg-target.h
> +++ b/tcg/sparc/tcg-target.h
> @@ -110,6 +110,8 @@ extern bool use_vis3_instructions;
>  #define TCG_TARGET_HAS_eqv_i32          0
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      0
>  #define TCG_TARGET_HAS_extract_i32      0
>  #define TCG_TARGET_HAS_sextract_i32     0
> @@ -142,6 +144,8 @@ extern bool use_vis3_instructions;
>  #define TCG_TARGET_HAS_eqv_i64          0
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      0
>  #define TCG_TARGET_HAS_extract_i64      0
>  #define TCG_TARGET_HAS_sextract_i64     0
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 1927e53..2b520c1 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -457,6 +457,85 @@ void tcg_gen_orc_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
>      }
>  }
>
> +void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
> +{
> +    if (TCG_TARGET_HAS_clz_i32) {
> +        tcg_gen_op3_i32(INDEX_op_clz_i32, ret, arg1, arg2);
> +    } else if (TCG_TARGET_HAS_clz_i64) {
> +        TCGv_i64 t1 = tcg_temp_new_i64();
> +        TCGv_i64 t2 = tcg_temp_new_i64();
> +        tcg_gen_extu_i32_i64(t1, arg1);
> +        tcg_gen_extu_i32_i64(t2, arg2);
> +        tcg_gen_addi_i64(t2, t2, 32);
> +        tcg_gen_clz_i64(t1, t1, t2);
> +        tcg_gen_extrl_i64_i32(ret, t1);
> +        tcg_temp_free_i64(t1);
> +        tcg_temp_free_i64(t2);
> +        tcg_gen_subi_i32(ret, ret, 32);
> +    } else {
> +        gen_helper_clz_i32(ret, arg1, arg2);
> +    }
> +}
> +
> +void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
> +{
> +    TCGv_i32 t = tcg_const_i32(arg2);
> +    tcg_gen_clz_i32(ret, arg1, t);
> +    tcg_temp_free_i32(t);
> +}
> +
> +void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
> +{
> +    if (TCG_TARGET_HAS_ctz_i32) {
> +        tcg_gen_op3_i32(INDEX_op_ctz_i32, ret, arg1, arg2);
> +    } else if (TCG_TARGET_HAS_ctz_i64) {
> +        TCGv_i64 t1 = tcg_temp_new_i64();
> +        TCGv_i64 t2 = tcg_temp_new_i64();
> +        tcg_gen_extu_i32_i64(t1, arg1);
> +        tcg_gen_extu_i32_i64(t2, arg2);
> +        tcg_gen_ctz_i64(t1, t1, t2);
> +        tcg_gen_extrl_i64_i32(ret, t1);
> +        tcg_temp_free_i64(t1);
> +        tcg_temp_free_i64(t2);
> +    } else if (TCG_TARGET_HAS_clz_i32) {
> +        TCGv_i32 t1 = tcg_temp_new_i32();
> +        TCGv_i32 t2 = tcg_temp_new_i32();
> +        tcg_gen_neg_i32(t1, arg1);
> +        tcg_gen_xori_i32(t2, arg2, 31);
> +        tcg_gen_and_i32(t1, t1, arg1);
> +        tcg_gen_clz_i32(ret, t1, t2);
> +        tcg_temp_free_i32(t1);
> +        tcg_temp_free_i32(t2);
> +        tcg_gen_xori_i32(ret, ret, 31);
> +    } else if (TCG_TARGET_HAS_clz_i64) {
> +        TCGv_i32 t1 = tcg_temp_new_i32();
> +        TCGv_i32 t2 = tcg_temp_new_i32();
> +        TCGv_i64 x1 = tcg_temp_new_i64();
> +        TCGv_i64 x2 = tcg_temp_new_i64();
> +        tcg_gen_neg_i32(t1, arg1);
> +        tcg_gen_xori_i32(t2, arg2, 63);
> +        tcg_gen_and_i32(t1, t1, arg1);
> +        tcg_gen_extu_i32_i64(x1, t1);
> +        tcg_gen_extu_i32_i64(x2, t2);
> +        tcg_temp_free_i32(t1);
> +        tcg_temp_free_i32(t2);
> +        tcg_gen_clz_i64(x1, x1, x2);
> +        tcg_gen_extrl_i64_i32(ret, x1);
> +        tcg_temp_free_i64(x1);
> +        tcg_temp_free_i64(x2);
> +        tcg_gen_xori_i32(ret, ret, 63);
> +    } else {
> +        gen_helper_ctz_i32(ret, arg1, arg2);
> +    }
> +}
> +
> +void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
> +{
> +    TCGv_i32 t = tcg_const_i32(arg2);
> +    tcg_gen_ctz_i32(ret, arg1, t);
> +    tcg_temp_free_i32(t);
> +}
> +
>  void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
>  {
>      if (TCG_TARGET_HAS_rot_i32) {
> @@ -1703,6 +1782,70 @@ void tcg_gen_orc_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>      }
>  }
>
> +void tcg_gen_clz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
> +{
> +    if (TCG_TARGET_HAS_clz_i64) {
> +        tcg_gen_op3_i64(INDEX_op_clz_i64, ret, arg1, arg2);
> +    } else {
> +        gen_helper_clz_i64(ret, arg1, arg2);
> +    }
> +}
> +
> +void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
> +{
> +    if (TCG_TARGET_REG_BITS == 32
> +        && TCG_TARGET_HAS_clz_i32
> +        && arg2 <= 0xffffffffu) {
> +        TCGv_i32 t = tcg_const_i32((uint32_t)arg2 - 32);
> +        tcg_gen_clz_i32(t, TCGV_LOW(arg1), t);
> +        tcg_gen_addi_i32(t, t, 32);
> +        tcg_gen_clz_i32(TCGV_LOW(ret), TCGV_HIGH(arg1), t);
> +        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
> +        tcg_temp_free_i32(t);
> +    } else {
> +        TCGv_i64 t = tcg_const_i64(arg2);
> +        tcg_gen_clz_i64(ret, arg1, t);
> +        tcg_temp_free_i64(t);
> +    }
> +}
> +
> +void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
> +{
> +    if (TCG_TARGET_HAS_ctz_i64) {
> +        tcg_gen_op3_i64(INDEX_op_ctz_i64, ret, arg1, arg2);
> +    } else if (TCG_TARGET_HAS_clz_i64) {
> +        TCGv_i64 t1 = tcg_temp_new_i64();
> +        TCGv_i64 t2 = tcg_temp_new_i64();
> +        tcg_gen_neg_i64(t1, arg1);
> +        tcg_gen_xori_i64(t2, arg2, 63);
> +        tcg_gen_and_i64(t1, t1, arg1);
> +        tcg_gen_clz_i64(ret, t1, t2);
> +        tcg_temp_free_i64(t1);
> +        tcg_temp_free_i64(t2);
> +        tcg_gen_xori_i64(ret, ret, 63);
> +    } else {
> +        gen_helper_ctz_i64(ret, arg1, arg2);
> +    }
> +}
> +
> +void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
> +{
> +    if (TCG_TARGET_REG_BITS == 32
> +        && TCG_TARGET_HAS_ctz_i32
> +        && arg2 <= 0xffffffffu) {
> +        TCGv_i32 t32 = tcg_const_i32((uint32_t)arg2 - 32);
> +        tcg_gen_ctz_i32(t32, TCGV_HIGH(arg1), t32);
> +        tcg_gen_addi_i32(t32, t32, 32);
> +        tcg_gen_ctz_i32(TCGV_LOW(ret), TCGV_LOW(arg1), t32);
> +        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
> +        tcg_temp_free_i32(t32);
> +    } else {
> +        TCGv_i64 t64 = tcg_const_i64(arg2);
> +        tcg_gen_ctz_i64(ret, arg1, t64);
> +        tcg_temp_free_i64(t64);
> +    }
> +}
> +
>  void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>  {
>      if (TCG_TARGET_HAS_rot_i64) {
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index d42fd0d..7a24e84 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -286,6 +286,10 @@ void tcg_gen_eqv_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_nand_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_nor_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_orc_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
> +void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
> +void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
> +void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
> +void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
>  void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
>  void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
> @@ -469,6 +473,10 @@ void tcg_gen_eqv_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_nand_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_nor_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_orc_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
> +void tcg_gen_clz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
> +void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
> +void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
> +void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
>  void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
>  void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
> @@ -958,6 +966,10 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
>  #define tcg_gen_nand_tl tcg_gen_nand_i64
>  #define tcg_gen_nor_tl tcg_gen_nor_i64
>  #define tcg_gen_orc_tl tcg_gen_orc_i64
> +#define tcg_gen_clz_tl tcg_gen_clz_i64
> +#define tcg_gen_ctz_tl tcg_gen_ctz_i64
> +#define tcg_gen_clzi_tl tcg_gen_clzi_i64
> +#define tcg_gen_ctzi_tl tcg_gen_ctzi_i64
>  #define tcg_gen_rotl_tl tcg_gen_rotl_i64
>  #define tcg_gen_rotli_tl tcg_gen_rotli_i64
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i64
> @@ -1049,6 +1061,10 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
>  #define tcg_gen_nand_tl tcg_gen_nand_i32
>  #define tcg_gen_nor_tl tcg_gen_nor_i32
>  #define tcg_gen_orc_tl tcg_gen_orc_i32
> +#define tcg_gen_clz_tl tcg_gen_clz_i32
> +#define tcg_gen_ctz_tl tcg_gen_ctz_i32
> +#define tcg_gen_clzi_tl tcg_gen_clzi_i32
> +#define tcg_gen_ctzi_tl tcg_gen_ctzi_i32
>  #define tcg_gen_rotl_tl tcg_gen_rotl_i32
>  #define tcg_gen_rotli_tl tcg_gen_rotli_i32
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i32
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index 11563ac..d00db4f 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -104,6 +104,8 @@ DEF(orc_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_orc_i32))
>  DEF(eqv_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_eqv_i32))
>  DEF(nand_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_nand_i32))
>  DEF(nor_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_nor_i32))
> +DEF(clz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_clz_i32))
> +DEF(ctz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_ctz_i32))
>
>  DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
>  DEF(movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
> @@ -171,6 +173,8 @@ DEF(orc_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_orc_i64))
>  DEF(eqv_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_eqv_i64))
>  DEF(nand_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_nand_i64))
>  DEF(nor_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_nor_i64))
> +DEF(clz_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_clz_i64))
> +DEF(ctz_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ctz_i64))
>
>  DEF(add2_i64, 2, 4, 0, IMPL64 | IMPL(TCG_TARGET_HAS_add2_i64))
>  DEF(sub2_i64, 2, 4, 0, IMPL64 | IMPL(TCG_TARGET_HAS_sub2_i64))
> diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
> index 1deb86a..eb1cd76 100644
> --- a/tcg/tcg-runtime.h
> +++ b/tcg/tcg-runtime.h
> @@ -15,6 +15,11 @@ DEF_HELPER_FLAGS_2(sar_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
>  DEF_HELPER_FLAGS_2(mulsh_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
>  DEF_HELPER_FLAGS_2(muluh_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>
> +DEF_HELPER_FLAGS_2(clz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
> +DEF_HELPER_FLAGS_2(ctz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
> +DEF_HELPER_FLAGS_2(clz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(ctz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
> +
>  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
>
>  #ifdef CONFIG_SOFTMMU
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 144bdab..e026282 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -111,6 +111,8 @@ typedef uint64_t TCGRegSet;
>  #define TCG_TARGET_HAS_eqv_i64          0
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      0
>  #define TCG_TARGET_HAS_extract_i64      0
>  #define TCG_TARGET_HAS_sextract_i64     0
> diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
> index 2065042..0646444 100644
> --- a/tcg/tci/tcg-target.h
> +++ b/tcg/tci/tcg-target.h
> @@ -74,6 +74,8 @@
>  #define TCG_TARGET_HAS_eqv_i32          0
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_neg_i32          1
>  #define TCG_TARGET_HAS_not_i32          1
>  #define TCG_TARGET_HAS_orc_i32          0
> @@ -104,6 +106,8 @@
>  #define TCG_TARGET_HAS_eqv_i64          0
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_neg_i64          1
>  #define TCG_TARGET_HAS_not_i64          1
>  #define TCG_TARGET_HAS_orc_i64          0
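
For anyone following the fallback expansions in tcg/tcg-op.c above, here is a minimal standalone sketch of two of the identities they appear to rely on: ctz built from clz via (x & -x), and a 64-bit clz built from two 32-bit clz steps as in the TCG_TARGET_REG_BITS == 32 path of tcg_gen_clzi_i64. The function names (ref_clz32, ctz32_via_clz, clz64_via_clz32) are hypothetical and exist only for this illustration; this is plain C modelling the arithmetic, not TCG code.

```c
#include <stdint.h>

/* Reference semantics from tcg/README: t0 = t1 ? clz(t1) : t2. */
static uint32_t ref_clz32(uint32_t arg, uint32_t zero_val)
{
    uint32_t n = 0;

    if (arg == 0) {
        return zero_val;
    }
    while (!(arg & 0x80000000u)) {
        arg <<= 1;
        n++;
    }
    return n;
}

/*
 * ctz from clz, mirroring the TCG_TARGET_HAS_clz_i32 branch:
 * isolate the lowest set bit, count leading zeros, then flip with 31.
 * The zero value is pre-flipped so the final xor restores it.
 */
static uint32_t ctz32_via_clz(uint32_t arg, uint32_t zero_val)
{
    uint32_t t1 = (0u - arg) & arg;   /* neg + and: lowest set bit only */
    uint32_t t2 = zero_val ^ 31;      /* xori on the default value */

    return ref_clz32(t1, t2) ^ 31;    /* 31 - k == 31 ^ k for 0 <= k <= 31 */
}

/*
 * 64-bit clz from 32-bit halves, mirroring the 32-bit-host path of
 * tcg_gen_clzi_i64.  Only valid when the zero value fits in 32 bits.
 */
static uint32_t clz64_via_clz32(uint64_t arg, uint32_t zero_val)
{
    uint32_t lo = (uint32_t)arg;
    uint32_t hi = (uint32_t)(arg >> 32);
    uint32_t t = ref_clz32(lo, zero_val - 32);  /* low half, biased default */

    t += 32;                    /* low-half count sits below 32 high zeros */
    return ref_clz32(hi, t);    /* high half wins when it is non-zero */
}
```

The subtract-then-add of 32 around the low-half count is what lets a single constant serve as the default for both steps: if both halves are zero, the bias cancels and the caller's zero value comes back unchanged.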


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 38/64] target-arm: Use clz opcode
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 38/64] target-arm: " Richard Henderson
@ 2016-12-08 17:47   ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-08 17:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target-arm/helper-a64.c    | 10 ----------
>  target-arm/helper-a64.h    |  2 --
>  target-arm/helper.c        |  5 -----
>  target-arm/helper.h        |  1 -
>  target-arm/translate-a64.c |  8 ++++----
>  target-arm/translate.c     |  6 +++---
>  6 files changed, 7 insertions(+), 25 deletions(-)
>
> diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
> index 98b97df..77999ff 100644
> --- a/target-arm/helper-a64.c
> +++ b/target-arm/helper-a64.c
> @@ -54,11 +54,6 @@ int64_t HELPER(sdiv64)(int64_t num, int64_t den)
>      return num / den;
>  }
>
> -uint64_t HELPER(clz64)(uint64_t x)
> -{
> -    return clz64(x);
> -}
> -
>  uint64_t HELPER(cls64)(uint64_t x)
>  {
>      return clrsb64(x);
> @@ -69,11 +64,6 @@ uint32_t HELPER(cls32)(uint32_t x)
>      return clrsb32(x);
>  }
>
> -uint32_t HELPER(clz32)(uint32_t x)
> -{
> -    return clz32(x);
> -}
> -
>  uint64_t HELPER(rbit64)(uint64_t x)
>  {
>      return revbit64(x);
> diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
> index dd32000..d320f96 100644
> --- a/target-arm/helper-a64.h
> +++ b/target-arm/helper-a64.h
> @@ -18,10 +18,8 @@
>   */
>  DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>  DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
> -DEF_HELPER_FLAGS_1(clz64, TCG_CALL_NO_RWG_SE, i64, i64)
>  DEF_HELPER_FLAGS_1(cls64, TCG_CALL_NO_RWG_SE, i64, i64)
>  DEF_HELPER_FLAGS_1(cls32, TCG_CALL_NO_RWG_SE, i32, i32)
> -DEF_HELPER_FLAGS_1(clz32, TCG_CALL_NO_RWG_SE, i32, i32)
>  DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
>  DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
>  DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index b5b65ca..0cafdbc 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -5718,11 +5718,6 @@ uint32_t HELPER(uxtb16)(uint32_t x)
>      return res;
>  }
>
> -uint32_t HELPER(clz)(uint32_t x)
> -{
> -    return clz32(x);
> -}
> -
>  int32_t HELPER(sdiv)(int32_t num, int32_t den)
>  {
>      if (den == 0)
> diff --git a/target-arm/helper.h b/target-arm/helper.h
> index 84aa637..df86bf7 100644
> --- a/target-arm/helper.h
> +++ b/target-arm/helper.h
> @@ -1,4 +1,3 @@
> -DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i32, i32)
>  DEF_HELPER_FLAGS_1(sxtb16, TCG_CALL_NO_RWG_SE, i32, i32)
>  DEF_HELPER_FLAGS_1(uxtb16, TCG_CALL_NO_RWG_SE, i32, i32)
>
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index e90487b..12621ff 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -3953,11 +3953,11 @@ static void handle_clz(DisasContext *s, unsigned int sf,
>      tcg_rn = cpu_reg(s, rn);
>
>      if (sf) {
> -        gen_helper_clz64(tcg_rd, tcg_rn);
> +        tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64);
>      } else {
>          TCGv_i32 tcg_tmp32 = tcg_temp_new_i32();
>          tcg_gen_extrl_i64_i32(tcg_tmp32, tcg_rn);
> -        gen_helper_clz(tcg_tmp32, tcg_tmp32);
> +        tcg_gen_clzi_i32(tcg_tmp32, tcg_tmp32, 32);
>          tcg_gen_extu_i32_i64(tcg_rd, tcg_tmp32);
>          tcg_temp_free_i32(tcg_tmp32);
>      }
> @@ -7590,7 +7590,7 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
>      switch (opcode) {
>      case 0x4: /* CLS, CLZ */
>          if (u) {
> -            gen_helper_clz64(tcg_rd, tcg_rn);
> +            tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64);
>          } else {
>              gen_helper_cls64(tcg_rd, tcg_rn);
>          }
> @@ -10260,7 +10260,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
>                      goto do_cmop;
>                  case 0x4: /* CLS */
>                      if (u) {
> -                        gen_helper_clz32(tcg_res, tcg_op);
> +                        tcg_gen_clzi_i32(tcg_res, tcg_op, 32);
>                      } else {
>                          gen_helper_cls32(tcg_res, tcg_op);
>                      }
> diff --git a/target-arm/translate.c b/target-arm/translate.c
> index 08da9ac..c9186b6 100644
> --- a/target-arm/translate.c
> +++ b/target-arm/translate.c
> @@ -7037,7 +7037,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
>                              switch (size) {
>                              case 0: gen_helper_neon_clz_u8(tmp, tmp); break;
>                              case 1: gen_helper_neon_clz_u16(tmp, tmp); break;
> -                            case 2: gen_helper_clz(tmp, tmp); break;
> +                            case 2: tcg_gen_clzi_i32(tmp, tmp, 32); break;
>                              default: abort();
>                              }
>                              break;
> @@ -8219,7 +8219,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
>                  ARCH(5);
>                  rd = (insn >> 12) & 0xf;
>                  tmp = load_reg(s, rm);
> -                gen_helper_clz(tmp, tmp);
> +                tcg_gen_clzi_i32(tmp, tmp, 32);
>                  store_reg(s, rd, tmp);
>              } else {
>                  goto illegal_op;
> @@ -9992,7 +9992,7 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
>                      tcg_temp_free_i32(tmp2);
>                      break;
>                  case 0x18: /* clz */
> -                    gen_helper_clz(tmp, tmp);
> +                    tcg_gen_clzi_i32(tmp, tmp, 32);
>                      break;
>                  case 0x20:
>                  case 0x21:
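
A small sketch of why the conversion above passes 32 (or 64) as the second operand: the removed helpers simply returned clz32(x), which, as far as I understand QEMU's host-utils, yields 32 for a zero input, and that matches the guest CLZ result for zero. The new op takes that value explicitly. Both functions below are hypothetical stand-ins written for this note, not the QEMU implementations.

```c
#include <stdint.h>

/* Stand-in for the removed helper: assumed to return 32 when x == 0. */
static uint32_t old_helper_clz(uint32_t x)
{
    uint32_t n = 0;

    if (x == 0) {
        return 32;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Stand-in for clz_i32 t0, t1, t2:  t0 = t1 ? clz(t1) : t2. */
static uint32_t op_clz_i32(uint32_t arg, uint32_t zero_val)
{
    return arg ? old_helper_clz(arg) : zero_val;
}

/* What tcg_gen_clzi_i32(ret, arg, 32) is expected to compute. */
static uint32_t guest_clz(uint32_t arg)
{
    return op_clz_i32(arg, 32);
}
```

Under that assumption guest_clz() and old_helper_clz() agree for every input, including zero, so the translator change should be behaviour-preserving.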


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 23/64] tcg: Allow an operand to be matching or a constant
  2016-12-08 17:19   ` Alex Bennée
@ 2016-12-08 17:49     ` Richard Henderson
  2016-12-08 20:38       ` Alex Bennée
  0 siblings, 1 reply; 102+ messages in thread
From: Richard Henderson @ 2016-12-08 17:49 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 12/08/2016 09:19 AM, Alex Bennée wrote:
>
> Richard Henderson <rth@twiddle.net> writes:
>
>> This allows an output operand to match an input operand
>> only when the input operand needs a register.
>>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>
> It's hard to offer anything more than a mechanical review for this as
> the constraints aren't intuitive to me (I guess not as a gcc hacker!).

It's even more confusing for a gcc hacker, as we don't have a concept of 
alternatives.  You get one set of constraints.

> Could we expand the documentation of constraints in tcg/README
> with a summary of the global ones?

There's only one global constraint: 'i'.  So... sure, but I don't know how much 
that will help.

> Should there be a one to one mapping of textual constraint descriptions
> to the TCG_CT_FOO defines?

After the entire patch set, there is *not* a 1-1 mapping.  I dispense with that 
in the i386 backend.

> I'm finding it hard to figure out why the
> text->bitfield step is needed. Is it something to do with the merging of
> generic TCG op constraints with their backend counterparts?

It isn't really needed.  I've been considering different ways to get rid of 
that step.  Indeed, to take the whole backend and make it more tabular; make it 
verifiable as a compile-time step; make such data as can be read-only.

But what we have is what we have, for now.


r~

>
> Anyway:
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


* Re: [Qemu-devel] [PATCH v4 42/64] tcg/arm: Handle ctz and clz opcodes
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 42/64] tcg/arm: " Richard Henderson
@ 2016-12-08 17:56   ` Alex Bennée
  2016-12-08 18:13     ` Richard Henderson
  0 siblings, 1 reply; 102+ messages in thread
From: Alex Bennée @ 2016-12-08 17:56 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/arm/tcg-target.h     |  4 ++--
>  tcg/arm/tcg-target.inc.c | 27 +++++++++++++++++++++++++++
>  2 files changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index 02cc242..4cb94dc 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -110,8 +110,8 @@ extern bool use_idiv_instructions;
>  #define TCG_TARGET_HAS_eqv_i32          0
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> -#define TCG_TARGET_HAS_clz_i32          0
> -#define TCG_TARGET_HAS_ctz_i32          0
> +#define TCG_TARGET_HAS_clz_i32          use_armv5t_instructions
> +#define TCG_TARGET_HAS_ctz_i32          use_armv7_instructions
>  #define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
>  #define TCG_TARGET_HAS_extract_i32      use_armv7_instructions
>  #define TCG_TARGET_HAS_sextract_i32     use_armv7_instructions
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index 473c170..2242d21 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -256,6 +256,9 @@ typedef enum {
>      ARITH_BIC = 0xe << 21,
>      ARITH_MVN = 0xf << 21,
>
> +    INSN_CLZ       = 0x016f0f10,
> +    INSN_RBIT      = 0x06ff0f30,
> +
>      INSN_LDR_IMM   = 0x04100000,
>      INSN_LDR_REG   = 0x06100000,
>      INSN_STR_IMM   = 0x04000000,
> @@ -1827,6 +1830,28 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          }
>          break;
>
> +    case INDEX_op_ctz_i32:
> +        tcg_out_dat_reg(s, COND_AL, INSN_RBIT, TCG_REG_TMP, 0, args[1], 0);
> +        a1 = TCG_REG_TMP;
> +        goto do_clz;
> +
> +    case INDEX_op_clz_i32:
> +        a1 = args[1];
> +    do_clz:
> +        a0 = args[0];
> +        a2 = args[2];
> +        c = const_args[2];
> +        if (c && a2 == 32) {
> +            tcg_out_dat_reg(s, COND_AL, INSN_CLZ, a0, 0, a1, 0);
> +            break;
> +        }

Why the early break instead of else leg?

> +        tcg_out_dat_imm(s, COND_AL, ARITH_CMP, 0, a1, 0);
> +        tcg_out_dat_reg(s, COND_NE, INSN_CLZ, a0, 0, a1, 0);
> +        if (c || a0 != a2) {
> +            tcg_out_dat_rIK(s, COND_EQ, ARITH_MOV, ARITH_MVN, a0, 0, a2, c);
> +        }
> +        break;
> +
>      case INDEX_op_brcond_i32:
>          tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
>                         args[0], args[1], const_args[1]);
> @@ -1961,6 +1986,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
>      { INDEX_op_sar_i32, { "r", "r", "ri" } },
>      { INDEX_op_rotl_i32, { "r", "r", "ri" } },
>      { INDEX_op_rotr_i32, { "r", "r", "ri" } },
> +    { INDEX_op_clz_i32, { "r", "r", "rIK" } },
> +    { INDEX_op_ctz_i32, { "r", "r", "rIK" } },
>
>      { INDEX_op_brcond_i32, { "r", "rIN" } },
>      { INDEX_op_setcond_i32, { "r", "r", "rIN" } },

Otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
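
For reference, a rough C model of what the emitted sequence in the hunk above seems to compute: ctz is handled as RBIT followed by the clz path, and the general clz path is CMP / CLZNE / MOVEQ so that a zero input selects the third operand. The names rbit32, model_clz_i32 and model_ctz_i32 are made up for this sketch, and it models values only, not register allocation or the constant-32 fast path.

```c
#include <stdint.h>

/* Bit reversal, standing in for the ARM RBIT instruction. */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;

    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* CMP a1, #0 ; CLZNE a0, a1 ; MOVEQ a0, a2 */
static uint32_t model_clz_i32(uint32_t a1, uint32_t a2)
{
    uint32_t a0 = 0;

    if (a1 != 0) {                      /* NE: count leading zeros */
        while (!(a1 & 0x80000000u)) {
            a1 <<= 1;
            a0++;
        }
    } else {                            /* EQ: take the default operand */
        a0 = a2;
    }
    return a0;
}

/* RBIT tmp, a1 ; then fall into the clz sequence on tmp. */
static uint32_t model_ctz_i32(uint32_t a1, uint32_t a2)
{
    return model_clz_i32(rbit32(a1), a2);
}
```

Since reversing the bits of zero still gives zero, the zero test on the RBIT result is equivalent to testing the original operand, which is why the ctz case can share the do_clz tail unchanged.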

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 42/64] tcg/arm: Handle ctz and clz opcodes
  2016-12-08 17:56   ` Alex Bennée
@ 2016-12-08 18:13     ` Richard Henderson
  0 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-12-08 18:13 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 12/08/2016 09:56 AM, Alex Bennée wrote:
>> +        if (c && a2 == 32) {
>> +            tcg_out_dat_reg(s, COND_AL, INSN_CLZ, a0, 0, a1, 0);
>> +            break;
>> +        }
>
> Why the early break instead of else leg?
>
>> +        tcg_out_dat_imm(s, COND_AL, ARITH_CMP, 0, a1, 0);
>> +        tcg_out_dat_reg(s, COND_NE, INSN_CLZ, a0, 0, a1, 0);
>> +        if (c || a0 != a2) {
>> +            tcg_out_dat_rIK(s, COND_EQ, ARITH_MOV, ARITH_MVN, a0, 0, a2, c);

It keeps this line under 80 columns, is all.


r~


* Re: [Qemu-devel] [PATCH v4 23/64] tcg: Allow an operand to be matching or a constant
  2016-12-08 17:49     ` Richard Henderson
@ 2016-12-08 20:38       ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-08 20:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> On 12/08/2016 09:19 AM, Alex Bennée wrote:
>>
>> Richard Henderson <rth@twiddle.net> writes:
>>
>>> This allows an output operand to match an input operand
>>> only when the input operand needs a register.
>>>
>>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>>
>> It's hard to offer anything more than a mechanical review for this as
>> the constraints aren't intuitive to me (I guess because I'm not a gcc
>> hacker!).
>
> It's even more confusing for a gcc hacker, as we don't have a concept of
> alternatives.  You get one set of constraints.
>
>> Could we expand the documentation of constraints in tcg/README
>> with a summary of the global ones?
>
> There's only one global constraint: 'i'.  So... sure, but I don't know how much
> that will help.

So what do 0..9 and & mean? Or do you mean they are only constraints
applied by the backend?

--
Alex Bennée

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v4 50/64] tcg: Add helpers for clrsb
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 50/64] tcg: Add helpers for clrsb Richard Henderson
@ 2016-12-09  9:51   ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-09  9:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> The number of actual invocations does not warrant an opcode,
> and the backends generating it.  But at least we can eliminate
> redundant helpers.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg-runtime.c     | 10 ++++++++++
>  tcg/tcg-op.c      | 28 ++++++++++++++++++++++++++++
>  tcg/tcg-op.h      |  4 ++++
>  tcg/tcg-runtime.h |  2 ++
>  4 files changed, 44 insertions(+)
>
> diff --git a/tcg-runtime.c b/tcg-runtime.c
> index eb3bade..c8b98df 100644
> --- a/tcg-runtime.c
> +++ b/tcg-runtime.c
> @@ -121,6 +121,16 @@ uint64_t HELPER(ctz_i64)(uint64_t arg, uint64_t zero_val)
>      return arg ? ctz64(arg) : zero_val;
>  }
>
> +uint32_t HELPER(clrsb_i32)(uint32_t arg)
> +{
> +    return clrsb32(arg);
> +}
> +
> +uint64_t HELPER(clrsb_i64)(uint64_t arg)
> +{
> +    return clrsb64(arg);
> +}
> +
>  void HELPER(exit_atomic)(CPUArchState *env)
>  {
>      cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 2b520c1..620e268 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -536,6 +536,20 @@ void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
>      tcg_temp_free_i32(t);
>  }
>
> +void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg)
> +{
> +    if (TCG_TARGET_HAS_clz_i32) {
> +        TCGv_i32 t = tcg_temp_new_i32();
> +        tcg_gen_sari_i32(t, arg, 31);
> +        tcg_gen_xor_i32(t, t, arg);
> +        tcg_gen_clzi_i32(t, t, 32);
> +        tcg_gen_subi_i32(ret, t, 1);
> +        tcg_temp_free_i32(t);
> +    } else {
> +        gen_helper_clrsb_i32(ret, arg);
> +    }
> +}
> +
>  void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
>  {
>      if (TCG_TARGET_HAS_rot_i32) {
> @@ -1846,6 +1860,20 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
>      }
>  }
>
> +void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg)
> +{
> +    if (TCG_TARGET_HAS_clz_i64 || TCG_TARGET_HAS_clz_i32) {
> +        TCGv_i64 t = tcg_temp_new_i64();
> +        tcg_gen_sari_i64(t, arg, 63);
> +        tcg_gen_xor_i64(t, t, arg);
> +        tcg_gen_clzi_i64(t, t, 64);
> +        tcg_gen_subi_i64(ret, t, 1);
> +        tcg_temp_free_i64(t);
> +    } else {
> +        gen_helper_clrsb_i64(ret, arg);
> +    }
> +}
> +
>  void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>  {
>      if (TCG_TARGET_HAS_rot_i64) {
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index 7a24e84..c2f3db9 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -290,6 +290,7 @@ void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
>  void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
> +void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg);
>  void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
>  void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
> @@ -477,6 +478,7 @@ void tcg_gen_clz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
>  void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
> +void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg);
>  void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
>  void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
> @@ -970,6 +972,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
>  #define tcg_gen_ctz_tl tcg_gen_ctz_i64
>  #define tcg_gen_clzi_tl tcg_gen_clzi_i64
>  #define tcg_gen_ctzi_tl tcg_gen_ctzi_i64
> +#define tcg_gen_clrsb_tl tcg_gen_clrsb_i64
>  #define tcg_gen_rotl_tl tcg_gen_rotl_i64
>  #define tcg_gen_rotli_tl tcg_gen_rotli_i64
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i64
> @@ -1065,6 +1068,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
>  #define tcg_gen_ctz_tl tcg_gen_ctz_i32
>  #define tcg_gen_clzi_tl tcg_gen_clzi_i32
>  #define tcg_gen_ctzi_tl tcg_gen_ctzi_i32
> +#define tcg_gen_clrsb_tl tcg_gen_clrsb_i32
>  #define tcg_gen_rotl_tl tcg_gen_rotl_i32
>  #define tcg_gen_rotli_tl tcg_gen_rotli_i32
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i32
> diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
> index eb1cd76..0d30f1a 100644
> --- a/tcg/tcg-runtime.h
> +++ b/tcg/tcg-runtime.h
> @@ -19,6 +19,8 @@ DEF_HELPER_FLAGS_2(clz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
>  DEF_HELPER_FLAGS_2(ctz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
>  DEF_HELPER_FLAGS_2(clz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>  DEF_HELPER_FLAGS_2(ctz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
> +DEF_HELPER_FLAGS_1(clrsb_i32, TCG_CALL_NO_RWG_SE, i32, i32)
> +DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
>
>  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
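
The inline expansion in tcg_gen_clrsb_i32 above relies on the identity
clrsb(x) = clz(x ^ (x >> 31)) - 1, with an arithmetic shift and clz(0) taken
as 32. A minimal plain-C sketch of that identity follows; sketch_clz32,
sketch_clrsb32 and the check function are illustrative names only, not part
of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* clz with an explicit result for zero, like tcg_gen_clzi_i32(t, t, 32). */
static uint32_t sketch_clz32(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading redundant sign bits, i.e. copies of the sign bit below it. */
uint32_t sketch_clrsb32(uint32_t x)
{
    /* Portable stand-in for the arithmetic shift "x >> 31". */
    uint32_t sign = (x & 0x80000000u) ? 0xffffffffu : 0;
    /* After the xor, every leading copy of the sign bit is a zero. */
    uint32_t t = x ^ sign;

    /* The sign bit itself is counted by clz, hence the -1. */
    return sketch_clz32(t, 32) - 1;
}

void check_sketch_clrsb32(void)
{
    assert(sketch_clrsb32(0x00000000u) == 31);
    assert(sketch_clrsb32(0xffffffffu) == 31);
    assert(sketch_clrsb32(0x00000001u) == 30);
    assert(sketch_clrsb32(0x80000000u) == 0);
    assert(sketch_clrsb32(0xc0000000u) == 1);
}
```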


--
Alex Bennée

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v4 51/64] target-arm: Use clrsb helper
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 51/64] target-arm: Use clrsb helper Richard Henderson
@ 2016-12-09  9:52   ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-09  9:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  target-arm/helper-a64.c    | 10 ----------
>  target-arm/helper-a64.h    |  2 --
>  target-arm/translate-a64.c |  8 ++++----
>  3 files changed, 4 insertions(+), 16 deletions(-)
>
> diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
> index 77999ff..d9df82c 100644
> --- a/target-arm/helper-a64.c
> +++ b/target-arm/helper-a64.c
> @@ -54,16 +54,6 @@ int64_t HELPER(sdiv64)(int64_t num, int64_t den)
>      return num / den;
>  }
>
> -uint64_t HELPER(cls64)(uint64_t x)
> -{
> -    return clrsb64(x);
> -}
> -
> -uint32_t HELPER(cls32)(uint32_t x)
> -{
> -    return clrsb32(x);
> -}
> -
>  uint64_t HELPER(rbit64)(uint64_t x)
>  {
>      return revbit64(x);
> diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
> index d320f96..6f9eaba 100644
> --- a/target-arm/helper-a64.h
> +++ b/target-arm/helper-a64.h
> @@ -18,8 +18,6 @@
>   */
>  DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>  DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
> -DEF_HELPER_FLAGS_1(cls64, TCG_CALL_NO_RWG_SE, i64, i64)
> -DEF_HELPER_FLAGS_1(cls32, TCG_CALL_NO_RWG_SE, i32, i32)
>  DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
>  DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
>  DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index 12621ff..f73d63b 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -3971,11 +3971,11 @@ static void handle_cls(DisasContext *s, unsigned int sf,
>      tcg_rn = cpu_reg(s, rn);
>
>      if (sf) {
> -        gen_helper_cls64(tcg_rd, tcg_rn);
> +        tcg_gen_clrsb_i64(tcg_rd, tcg_rn);
>      } else {
>          TCGv_i32 tcg_tmp32 = tcg_temp_new_i32();
>          tcg_gen_extrl_i64_i32(tcg_tmp32, tcg_rn);
> -        gen_helper_cls32(tcg_tmp32, tcg_tmp32);
> +        tcg_gen_clrsb_i32(tcg_tmp32, tcg_tmp32);
>          tcg_gen_extu_i32_i64(tcg_rd, tcg_tmp32);
>          tcg_temp_free_i32(tcg_tmp32);
>      }
> @@ -7592,7 +7592,7 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
>          if (u) {
>              tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64);
>          } else {
> -            gen_helper_cls64(tcg_rd, tcg_rn);
> +            tcg_gen_clrsb_i64(tcg_rd, tcg_rn);
>          }
>          break;
>      case 0x5: /* NOT */
> @@ -10262,7 +10262,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
>                      if (u) {
>                          tcg_gen_clzi_i32(tcg_res, tcg_op, 32);
>                      } else {
> -                        gen_helper_cls32(tcg_res, tcg_op);
> +                        tcg_gen_clrsb_i32(tcg_res, tcg_op);
>                      }
>                      break;
>                  case 0x7: /* SQABS, SQNEG */


--
Alex Bennée

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v4 54/64] tcg: Add opcode for ctpop
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 54/64] tcg: Add opcode for ctpop Richard Henderson
@ 2016-12-09  9:57   ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-09  9:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> The number of actual invocations of ctpop itself does not warrant
> an opcode, but it is very helpful for POWER7 to use in generating
> an expansion for ctz.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
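
To illustrate why a population count helps with ctz: one well-known identity
is ctz(x) = ctpop((x - 1) & ~x), which also yields the word size for x == 0.
The sketch below shows it in plain C. The names are hypothetical, and the
ppc backend patch in this series may use a different but equivalent form.

```c
#include <assert.h>
#include <stdint.h>

/* Naive population count, used here only as a reference. */
static uint32_t sketch_ctpop32(uint32_t x)
{
    uint32_t n = 0;

    while (x) {
        x &= x - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Trailing-zero count expressed through a population count. */
uint32_t sketch_ctz32(uint32_t x)
{
    /* (x - 1) & ~x sets exactly the bits below the lowest set bit of x. */
    return sketch_ctpop32((x - 1) & ~x);
}

void check_sketch_ctz32(void)
{
    assert(sketch_ctz32(1) == 0);
    assert(sketch_ctz32(8) == 3);
    assert(sketch_ctz32(12) == 2);
    assert(sketch_ctz32(0x80000000u) == 31);
    assert(sketch_ctz32(0) == 32);      /* zero input gives the word size */
}
```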

> ---
>  tcg-runtime.c            | 10 ++++++++++
>  tcg/aarch64/tcg-target.h |  2 ++
>  tcg/arm/tcg-target.h     |  1 +
>  tcg/i386/tcg-target.h    |  2 ++
>  tcg/ia64/tcg-target.h    |  2 ++
>  tcg/mips/tcg-target.h    |  1 +
>  tcg/optimize.c           | 14 ++++++++++++++
>  tcg/ppc/tcg-target.h     |  2 ++
>  tcg/s390/tcg-target.h    |  2 ++
>  tcg/sparc/tcg-target.h   |  2 ++
>  tcg/tcg-op.c             | 29 +++++++++++++++++++++++++++++
>  tcg/tcg-op.h             |  4 ++++
>  tcg/tcg-opc.h            |  2 ++
>  tcg/tcg-runtime.h        |  2 ++
>  tcg/tcg.h                |  1 +
>  tcg/tci/tcg-target.h     |  2 ++
>  16 files changed, 78 insertions(+)
>
> diff --git a/tcg-runtime.c b/tcg-runtime.c
> index c8b98df..4c60c96 100644
> --- a/tcg-runtime.c
> +++ b/tcg-runtime.c
> @@ -131,6 +131,16 @@ uint64_t HELPER(clrsb_i64)(uint64_t arg)
>      return clrsb64(arg);
>  }
>
> +uint32_t HELPER(ctpop_i32)(uint32_t arg)
> +{
> +    return ctpop32(arg);
> +}
> +
> +uint64_t HELPER(ctpop_i64)(uint64_t arg)
> +{
> +    return ctpop64(arg);
> +}
> +
>  void HELPER(exit_atomic)(CPUArchState *env)
>  {
>      cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 9d6b00f..1a5ea23 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -64,6 +64,7 @@ typedef enum {
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_clz_i32          1
>  #define TCG_TARGET_HAS_ctz_i32          1
> +#define TCG_TARGET_HAS_ctpop_i32        0
>  #define TCG_TARGET_HAS_deposit_i32      1
>  #define TCG_TARGET_HAS_extract_i32      1
>  #define TCG_TARGET_HAS_sextract_i32     1
> @@ -98,6 +99,7 @@ typedef enum {
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_clz_i64          1
>  #define TCG_TARGET_HAS_ctz_i64          1
> +#define TCG_TARGET_HAS_ctpop_i64        0
>  #define TCG_TARGET_HAS_deposit_i64      1
>  #define TCG_TARGET_HAS_extract_i64      1
>  #define TCG_TARGET_HAS_sextract_i64     1
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index 4cb94dc..09a19c6 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -112,6 +112,7 @@ extern bool use_idiv_instructions;
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_clz_i32          use_armv5t_instructions
>  #define TCG_TARGET_HAS_ctz_i32          use_armv7_instructions
> +#define TCG_TARGET_HAS_ctpop_i32        0
>  #define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
>  #define TCG_TARGET_HAS_extract_i32      use_armv7_instructions
>  #define TCG_TARGET_HAS_sextract_i32     use_armv7_instructions
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 8fff287..b8f73f5 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -95,6 +95,7 @@ extern bool have_bmi1;
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_clz_i32          1
>  #define TCG_TARGET_HAS_ctz_i32          1
> +#define TCG_TARGET_HAS_ctpop_i32        0
>  #define TCG_TARGET_HAS_deposit_i32      1
>  #define TCG_TARGET_HAS_extract_i32      1
>  #define TCG_TARGET_HAS_sextract_i32     1
> @@ -129,6 +130,7 @@ extern bool have_bmi1;
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_clz_i64          1
>  #define TCG_TARGET_HAS_ctz_i64          1
> +#define TCG_TARGET_HAS_ctpop_i64        0
>  #define TCG_TARGET_HAS_deposit_i64      1
>  #define TCG_TARGET_HAS_extract_i64      1
>  #define TCG_TARGET_HAS_sextract_i64     0
> diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
> index 9a829ae..42aea03 100644
> --- a/tcg/ia64/tcg-target.h
> +++ b/tcg/ia64/tcg-target.h
> @@ -144,6 +144,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_clz_i64          0
>  #define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_ctz_i64          0
> +#define TCG_TARGET_HAS_ctpop_i32        0
> +#define TCG_TARGET_HAS_ctpop_i64        0
>  #define TCG_TARGET_HAS_nor_i64          1
>  #define TCG_TARGET_HAS_orc_i32          1
>  #define TCG_TARGET_HAS_orc_i64          1
> diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
> index 0526018..aa7c2b2 100644
> --- a/tcg/mips/tcg-target.h
> +++ b/tcg/mips/tcg-target.h
> @@ -130,6 +130,7 @@ extern bool use_mips32r2_instructions;
>  #define TCG_TARGET_HAS_rot_i32          use_mips32r2_instructions
>  #define TCG_TARGET_HAS_clz_i32          use_mips32r2_instructions
>  #define TCG_TARGET_HAS_ctz_i32          0
> +#define TCG_TARGET_HAS_ctpop_i32        0
>
>  /* optional instructions automatically implemented */
>  #define TCG_TARGET_HAS_neg_i32          0 /* sub  rd, zero, rt   */
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index e7ecce4..adfc56c 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -308,6 +308,12 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
>      case INDEX_op_ctz_i64:
>          return x ? ctz64(x) : y;
>
> +    case INDEX_op_ctpop_i32:
> +        return ctpop32(x);
> +
> +    case INDEX_op_ctpop_i64:
> +        return ctpop64(x);
> +
>      CASE_OP_32_64(ext8s):
>          return (int8_t)x;
>
> @@ -918,6 +924,13 @@ void tcg_optimize(TCGContext *s)
>              mask = temps[args[2]].mask | 63;
>              break;
>
> +        case INDEX_op_ctpop_i32:
> +            mask = 32 | 31;
> +            break;
> +        case INDEX_op_ctpop_i64:
> +            mask = 64 | 63;
> +            break;
> +
>          CASE_OP_32_64(setcond):
>          case INDEX_op_setcond2_i32:
>              mask = 1;
> @@ -1031,6 +1044,7 @@ void tcg_optimize(TCGContext *s)
>          CASE_OP_32_64(ext8u):
>          CASE_OP_32_64(ext16s):
>          CASE_OP_32_64(ext16u):
> +        CASE_OP_32_64(ctpop):
>          case INDEX_op_ext32s_i64:
>          case INDEX_op_ext32u_i64:
>          case INDEX_op_ext_i32_i64:
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index c798c9c..57e66cf 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -72,6 +72,7 @@ extern bool have_isa_3_00;
>  #define TCG_TARGET_HAS_nor_i32          1
>  #define TCG_TARGET_HAS_clz_i32          1
>  #define TCG_TARGET_HAS_ctz_i32          have_isa_3_00
> +#define TCG_TARGET_HAS_ctpop_i32        0
>  #define TCG_TARGET_HAS_deposit_i32      1
>  #define TCG_TARGET_HAS_extract_i32      1
>  #define TCG_TARGET_HAS_sextract_i32     0
> @@ -107,6 +108,7 @@ extern bool have_isa_3_00;
>  #define TCG_TARGET_HAS_nor_i64          1
>  #define TCG_TARGET_HAS_clz_i64          1
>  #define TCG_TARGET_HAS_ctz_i64          have_isa_3_00
> +#define TCG_TARGET_HAS_ctpop_i64        0
>  #define TCG_TARGET_HAS_deposit_i64      1
>  #define TCG_TARGET_HAS_extract_i64      1
>  #define TCG_TARGET_HAS_sextract_i64     0
> diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
> index 22500ba..cbdd2a6 100644
> --- a/tcg/s390/tcg-target.h
> +++ b/tcg/s390/tcg-target.h
> @@ -79,6 +79,7 @@ extern uint64_t s390_facilities;
>  #define TCG_TARGET_HAS_nor_i32        0
>  #define TCG_TARGET_HAS_clz_i32        0
>  #define TCG_TARGET_HAS_ctz_i32        0
> +#define TCG_TARGET_HAS_ctpop_i32      0
>  #define TCG_TARGET_HAS_deposit_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
>  #define TCG_TARGET_HAS_extract_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
>  #define TCG_TARGET_HAS_sextract_i32   0
> @@ -112,6 +113,7 @@ extern uint64_t s390_facilities;
>  #define TCG_TARGET_HAS_nor_i64        0
>  #define TCG_TARGET_HAS_clz_i64        (s390_facilities & FACILITY_EXT_IMM)
>  #define TCG_TARGET_HAS_ctz_i64        0
> +#define TCG_TARGET_HAS_ctpop_i64      0
>  #define TCG_TARGET_HAS_deposit_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
>  #define TCG_TARGET_HAS_extract_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
>  #define TCG_TARGET_HAS_sextract_i64   0
> diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
> index 340837a..b8b74f96f 100644
> --- a/tcg/sparc/tcg-target.h
> +++ b/tcg/sparc/tcg-target.h
> @@ -112,6 +112,7 @@ extern bool use_vis3_instructions;
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_clz_i32          0
>  #define TCG_TARGET_HAS_ctz_i32          0
> +#define TCG_TARGET_HAS_ctpop_i32        0
>  #define TCG_TARGET_HAS_deposit_i32      0
>  #define TCG_TARGET_HAS_extract_i32      0
>  #define TCG_TARGET_HAS_sextract_i32     0
> @@ -146,6 +147,7 @@ extern bool use_vis3_instructions;
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_clz_i64          0
>  #define TCG_TARGET_HAS_ctz_i64          0
> +#define TCG_TARGET_HAS_ctpop_i64        0
>  #define TCG_TARGET_HAS_deposit_i64      0
>  #define TCG_TARGET_HAS_extract_i64      0
>  #define TCG_TARGET_HAS_sextract_i64     0
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 620e268..6f4b1b6 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -550,6 +550,21 @@ void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg)
>      }
>  }
>
> +void tcg_gen_ctpop_i32(TCGv_i32 ret, TCGv_i32 arg1)
> +{
> +    if (TCG_TARGET_HAS_ctpop_i32) {
> +        tcg_gen_op2_i32(INDEX_op_ctpop_i32, ret, arg1);
> +    } else if (TCG_TARGET_HAS_ctpop_i64) {
> +        TCGv_i64 t = tcg_temp_new_i64();
> +        tcg_gen_extu_i32_i64(t, arg1);
> +        tcg_gen_ctpop_i64(t, t);
> +        tcg_gen_extrl_i64_i32(ret, t);
> +        tcg_temp_free_i64(t);
> +    } else {
> +        gen_helper_ctpop_i32(ret, arg1);
> +    }
> +}
> +
>  void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
>  {
>      if (TCG_TARGET_HAS_rot_i32) {
> @@ -1874,6 +1889,20 @@ void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg)
>      }
>  }
>
> +void tcg_gen_ctpop_i64(TCGv_i64 ret, TCGv_i64 arg1)
> +{
> +    if (TCG_TARGET_HAS_ctpop_i64) {
> +        tcg_gen_op2_i64(INDEX_op_ctpop_i64, ret, arg1);
> +    } else if (TCG_TARGET_REG_BITS == 32 && TCG_TARGET_HAS_ctpop_i32) {
> +        tcg_gen_ctpop_i32(TCGV_HIGH(ret), TCGV_HIGH(arg1));
> +        tcg_gen_ctpop_i32(TCGV_LOW(ret), TCGV_LOW(arg1));
> +        tcg_gen_add_i32(TCGV_LOW(ret), TCGV_LOW(ret), TCGV_HIGH(ret));
> +        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
> +    } else {
> +        gen_helper_ctpop_i64(ret, arg1);
> +    }
> +}
> +
>  void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>  {
>      if (TCG_TARGET_HAS_rot_i64) {
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index c2f3db9..c68e300 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -291,6 +291,7 @@ void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
>  void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
>  void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg);
> +void tcg_gen_ctpop_i32(TCGv_i32 a1, TCGv_i32 a2);
>  void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
>  void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
> @@ -479,6 +480,7 @@ void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
>  void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
>  void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg);
> +void tcg_gen_ctpop_i64(TCGv_i64 a1, TCGv_i64 a2);
>  void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
>  void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
> @@ -973,6 +975,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
>  #define tcg_gen_clzi_tl tcg_gen_clzi_i64
>  #define tcg_gen_ctzi_tl tcg_gen_ctzi_i64
>  #define tcg_gen_clrsb_tl tcg_gen_clrsb_i64
> +#define tcg_gen_ctpop_tl tcg_gen_ctpop_i64
>  #define tcg_gen_rotl_tl tcg_gen_rotl_i64
>  #define tcg_gen_rotli_tl tcg_gen_rotli_i64
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i64
> @@ -1069,6 +1072,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
>  #define tcg_gen_clzi_tl tcg_gen_clzi_i32
>  #define tcg_gen_ctzi_tl tcg_gen_ctzi_i32
>  #define tcg_gen_clrsb_tl tcg_gen_clrsb_i32
> +#define tcg_gen_ctpop_tl tcg_gen_ctpop_i32
>  #define tcg_gen_rotl_tl tcg_gen_rotl_i32
>  #define tcg_gen_rotli_tl tcg_gen_rotli_i32
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i32
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index d00db4f..f06f894 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -106,6 +106,7 @@ DEF(nand_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_nand_i32))
>  DEF(nor_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_nor_i32))
>  DEF(clz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_clz_i32))
>  DEF(ctz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_ctz_i32))
> +DEF(ctpop_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ctpop_i32))
>
>  DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
>  DEF(movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
> @@ -175,6 +176,7 @@ DEF(nand_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_nand_i64))
>  DEF(nor_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_nor_i64))
>  DEF(clz_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_clz_i64))
>  DEF(ctz_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ctz_i64))
> +DEF(ctpop_i64, 1, 1, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ctpop_i64))
>
>  DEF(add2_i64, 2, 4, 0, IMPL64 | IMPL(TCG_TARGET_HAS_add2_i64))
>  DEF(sub2_i64, 2, 4, 0, IMPL64 | IMPL(TCG_TARGET_HAS_sub2_i64))
> diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
> index 0d30f1a..114ea6f 100644
> --- a/tcg/tcg-runtime.h
> +++ b/tcg/tcg-runtime.h
> @@ -21,6 +21,8 @@ DEF_HELPER_FLAGS_2(clz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>  DEF_HELPER_FLAGS_2(ctz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>  DEF_HELPER_FLAGS_1(clrsb_i32, TCG_CALL_NO_RWG_SE, i32, i32)
>  DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
> +DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32)
> +DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64)
>
>  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index e026282..631c6f6 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -113,6 +113,7 @@ typedef uint64_t TCGRegSet;
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_clz_i64          0
>  #define TCG_TARGET_HAS_ctz_i64          0
> +#define TCG_TARGET_HAS_ctpop_i64        0
>  #define TCG_TARGET_HAS_deposit_i64      0
>  #define TCG_TARGET_HAS_extract_i64      0
>  #define TCG_TARGET_HAS_sextract_i64     0
> diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
> index 0646444..838bf3a 100644
> --- a/tcg/tci/tcg-target.h
> +++ b/tcg/tci/tcg-target.h
> @@ -76,6 +76,7 @@
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_clz_i32          0
>  #define TCG_TARGET_HAS_ctz_i32          0
> +#define TCG_TARGET_HAS_ctpop_i32        0
>  #define TCG_TARGET_HAS_neg_i32          1
>  #define TCG_TARGET_HAS_not_i32          1
>  #define TCG_TARGET_HAS_orc_i32          0
> @@ -108,6 +109,7 @@
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_clz_i64          0
>  #define TCG_TARGET_HAS_ctz_i64          0
> +#define TCG_TARGET_HAS_ctpop_i64        0
>  #define TCG_TARGET_HAS_neg_i64          1
>  #define TCG_TARGET_HAS_not_i64          1
>  #define TCG_TARGET_HAS_orc_i64          0
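
Two details of the patch can be checked in plain C: the 32-bit-host path of
tcg_gen_ctpop_i64 sums the counts of the two halves, and the optimizer masks
(32 | 31 and 64 | 63) cover every possible result, since a count never
exceeds the word size. The sketch below uses hypothetical names and a naive
reference popcount; it is not the code QEMU runs.

```c
#include <assert.h>
#include <stdint.h>

/* Naive 32-bit population count, used as the building block. */
static uint32_t sketch_ctpop32(uint32_t x)
{
    uint32_t n = 0;

    while (x) {
        x &= x - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* 64-bit count built from two 32-bit counts, as on a 32-bit host. */
uint64_t sketch_ctpop64_from_halves(uint64_t x)
{
    uint32_t hi = sketch_ctpop32((uint32_t)(x >> 32));
    uint32_t lo = sketch_ctpop32((uint32_t)x);

    /* The sum is at most 64, so the high half of the result is zero. */
    return (uint64_t)lo + hi;
}

void check_sketch_ctpop64(void)
{
    uint64_t all = sketch_ctpop64_from_halves(0xffffffffffffffffull);

    assert(all == 64);
    assert((all & ~(uint64_t)(64 | 63)) == 0);  /* fits the i64 mask */
    assert(sketch_ctpop64_from_halves(0) == 0);
    assert(sketch_ctpop64_from_halves(0x0c0c0000c0c0c0c0ull) == 12);
    assert((sketch_ctpop32(0xffffffffu) & ~(uint32_t)(32 | 31)) == 0);
}
```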


--
Alex Bennée

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [Qemu-devel] [PATCH v4 61/64] qemu/host-utils.h: Reduce the operation count in the fallback ctpop
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 61/64] qemu/host-utils.h: Reduce the operation count in the fallback ctpop Richard Henderson
@ 2016-12-09 14:41   ` Alex Bennée
  2016-12-09 17:18     ` Richard Henderson
  0 siblings, 1 reply; 102+ messages in thread
From: Alex Bennée @ 2016-12-09 14:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

As it looked like we were messing about with the Hacker's Delight
algorithms I thought it might be worth defending any changes with some
unit tests. Feel free to include the unit test below:

--8<---------------cut here---------------start------------->8---
From 66857f0be793c86ce9aaa6e02ffccc6552f6e894 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alex=20Benn=C3=A9e?= <alex.bennee@linaro.org>
Date: Fri, 9 Dec 2016 14:36:00 +0000
Subject: [PATCH] new: tests/test-bitcnt
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add some unit tests for bit count functions (currently only ctpop). As
the routines are based on the Hacker's Delight optimisations I based
the test patterns on their tests.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 tests/.gitignore       |   1 +
 tests/Makefile.include |   2 +
 tests/test-bitcnt.c    | 135 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 138 insertions(+)
 create mode 100644 tests/test-bitcnt.c

diff --git a/tests/.gitignore b/tests/.gitignore
index c0d7857538..96986efc1a 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -13,6 +13,7 @@ rcutorture
 test-aio
 test-base64
 test-bitops
+test-bitcnt
 test-blockjob
 test-blockjob-txn
 test-bufferiszero
diff --git a/tests/Makefile.include b/tests/Makefile.include
index e98d3b6bb3..8b85c5399a 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -81,6 +81,7 @@ gcov-files-test-qht-y = util/qht.c
 check-unit-y += tests/test-qht-par$(EXESUF)
 gcov-files-test-qht-par-y = util/qht.c
 check-unit-y += tests/test-bitops$(EXESUF)
+check-unit-y += tests/test-bitcnt$(EXESUF)
 check-unit-$(CONFIG_HAS_GLIB_SUBPROCESS_TESTS) += tests/test-qdev-global-props$(EXESUF)
 check-unit-y += tests/check-qom-interface$(EXESUF)
 gcov-files-check-qom-interface-y = qom/object.c
@@ -570,6 +571,7 @@ tests/test-opts-visitor$(EXESUF): tests/test-opts-visitor.o $(test-qapi-obj-y)

 tests/test-mul64$(EXESUF): tests/test-mul64.o $(test-util-obj-y)
 tests/test-bitops$(EXESUF): tests/test-bitops.o $(test-util-obj-y)
+tests/test-bitcnt$(EXESUF): tests/test-bitcnt.o $(test-util-obj-y)
 tests/test-crypto-hash$(EXESUF): tests/test-crypto-hash.o $(test-crypto-obj-y)
 tests/test-crypto-cipher$(EXESUF): tests/test-crypto-cipher.o $(test-crypto-obj-y)
 tests/test-crypto-secret$(EXESUF): tests/test-crypto-secret.o $(test-crypto-obj-y)
diff --git a/tests/test-bitcnt.c b/tests/test-bitcnt.c
new file mode 100644
index 0000000000..3969b32803
--- /dev/null
+++ b/tests/test-bitcnt.c
@@ -0,0 +1,135 @@
+/*
+ * Test bit count routines
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+
+struct bitcnt_test_data {
+    /* value to count */
+    union {
+        uint8_t  w8;
+        uint16_t w16;
+        uint32_t w32;
+        uint64_t w64;
+    } value;
+    /* expected result */
+    int popct;
+};
+
+struct bitcnt_test_data eight_bit_data[] = {
+    { { .w8 = 0x01 }, .popct=1 },
+    { { .w8 = 0x03 }, .popct=2 },
+    { { .w8 = 0x04 }, .popct=1 },
+    { { .w8 = 0x0f }, .popct=4 },
+    { { .w8 = 0x3f }, .popct=6 },
+    { { .w8 = 0x40 }, .popct=1 },
+    { { .w8 = 0xf0 }, .popct=4 },
+    { { .w8 = 0x7f }, .popct=7 },
+    { { .w8 = 0x80 }, .popct=1 },
+    { { .w8 = 0xf1 }, .popct=5 },
+    { { .w8 = 0xfe }, .popct=7 },
+    { { .w8 = 0xff }, .popct=8 },
+};
+
+static void test_ctpop8(void)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(eight_bit_data); i++) {
+        struct bitcnt_test_data *d = &eight_bit_data[i];
+        g_assert(ctpop8(d->value.w8) == d->popct);
+    }
+}
+
+struct bitcnt_test_data sixteen_bit_data[] = {
+    { { .w16 = 0x0001 }, .popct=1 },
+    { { .w16 = 0x0003 }, .popct=2 },
+    { { .w16 = 0x000f }, .popct=4 },
+    { { .w16 = 0x003f }, .popct=6 },
+    { { .w16 = 0x00f0 }, .popct=4 },
+    { { .w16 = 0x0f0f }, .popct=8 },
+    { { .w16 = 0x1f1f }, .popct=10 },
+    { { .w16 = 0x4000 }, .popct=1 },
+    { { .w16 = 0x4001 }, .popct=2 },
+    { { .w16 = 0x7000 }, .popct=3 },
+    { { .w16 = 0x7fff }, .popct=15 },
+};
+
+static void test_ctpop16(void)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(sixteen_bit_data); i++) {
+        struct bitcnt_test_data *d = &sixteen_bit_data[i];
+        g_assert(ctpop16(d->value.w16) == d->popct);
+    }
+}
+
+struct bitcnt_test_data thirtytwo_bit_data[] = {
+    { { .w32 = 0x00000001 }, .popct=1 },
+    { { .w32 = 0x0000000f }, .popct=4 },
+    { { .w32 = 0x00000f0f }, .popct=8 },
+    { { .w32 = 0x00001f1f }, .popct=10 },
+    { { .w32 = 0x00004001 }, .popct=2 },
+    { { .w32 = 0x00007000 }, .popct=3 },
+    { { .w32 = 0x00007fff }, .popct=15 },
+    { { .w32 = 0x55555555 }, .popct=16 },
+    { { .w32 = 0xaaaaaaaa }, .popct=16 },
+    { { .w32 = 0xff000000 }, .popct=8 },
+    { { .w32 = 0xc0c0c0c0 }, .popct=8 },
+    { { .w32 = 0x0ffffff0 }, .popct=24 },
+    { { .w32 = 0x80000000 }, .popct=1 },
+    { { .w32 = 0xffffffff }, .popct=32 },
+};
+
+static void test_ctpop32(void)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(thirtytwo_bit_data); i++) {
+        struct bitcnt_test_data *d = &thirtytwo_bit_data[i];
+        g_assert(ctpop32(d->value.w32) == d->popct);
+    }
+}
+
+struct bitcnt_test_data sixtyfour_bit_data[] = {
+    { { .w64 = 0x0000000000000001 }, .popct = 1 },
+    { { .w64 = 0x000000000000000f }, .popct = 4 },
+    { { .w64 = 0x0000000000000f0f }, .popct = 8 },
+    { { .w64 = 0x0000000000001f1f }, .popct = 10 },
+    { { .w64 = 0x0000000000004001 }, .popct = 2 },
+    { { .w64 = 0x0000000000007000 }, .popct = 3 },
+    { { .w64 = 0x0000000000007fff }, .popct = 15 },
+    { { .w64 = 0x0000005500555555 }, .popct = 16 },
+    { { .w64 = 0x00aa0000aaaa00aa }, .popct = 16 },
+    { { .w64 = 0x000f000000f00000 }, .popct = 8 },
+    { { .w64 = 0x0c0c0000c0c0c0c0 }, .popct = 12 },
+    { { .w64 = 0xf00f00f0f0f0f000 }, .popct = 24 },
+    { { .w64 = 0x8000000000000000 }, .popct = 1 },
+    { { .w64 = 0xf0f0f0f0f0f0f0f0 }, .popct = 32 },
+};
+
+static void test_ctpop64(void)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(sixtyfour_bit_data); i++) {
+        struct bitcnt_test_data *d = &sixtyfour_bit_data[i];
+        g_assert(ctpop64(d->value.w64) == d->popct);
+    }
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+    g_test_add_func("/bitcnt/ctpop8", test_ctpop8);
+    g_test_add_func("/bitcnt/ctpop16", test_ctpop16);
+    g_test_add_func("/bitcnt/ctpop32", test_ctpop32);
+    g_test_add_func("/bitcnt/ctpop64", test_ctpop64);
+    return g_test_run();
+}
--
2.11.0


--8<---------------cut here---------------end--------------->8---


* Re: [Qemu-devel] [PATCH v4 04/64] tcg/aarch64: Implement field extraction opcodes
  2016-12-06 16:36     ` Richard Henderson
@ 2016-12-09 15:41       ` Alex Bennée
  0 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-09 15:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Claudio Fontana


Richard Henderson <rth@twiddle.net> writes:

> On 12/06/2016 04:24 AM, Alex Bennée wrote:
>>> > +    case INDEX_op_extract_i64:
>>> > +    case INDEX_op_extract_i32:
>>> > +        tcg_out_ubfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
>>> > +        break;
>>> > +
>>> > +    case INDEX_op_sextract_i64:
>>> > +    case INDEX_op_sextract_i32:
>>> > +        tcg_out_sbfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
>>> > +        break;
>>> > +
>> This isn't right, is it? As I read it, extract takes a field at
>> offset+len in the source register and moves it to the low bits of the
>> destination register. The Bitfield Move instructions work the other way
>> around, moving from the low-order bits in the source register to an
>> offset+len in the destination.
>>
>
> It is right.  Extract is written as ofs/len in assembly, but encoded as lsb/msb
> in the opcode -- just like bitfield move.
>
> Boot an armv7 guest and there should be enough uses to convince you.

Yeah, I got confused by the description and missed that UBFX is an alias
of UBFM in those cases.
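
The operand mapping discussed above can be sketched in plain C. This is only a model for illustration, assuming the UBFX-style case where imms >= immr; the function names are hypothetical and are not QEMU or TCG APIs. It shows that an extract given as ofs/len matches a bitfield move given as lsb/msb when immr = ofs and imms = ofs + len - 1, which is what the tcg_out_ubfm() call in the patch passes.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned extract in ofs/len form: LEN bits of X starting at bit OFS.
   Assumes len > 0 and ofs + len <= 64. */
static inline uint64_t extract_ofs_len(uint64_t x, unsigned ofs, unsigned len)
{
    return (x >> ofs) & (~0ULL >> (64 - len));
}

/* Model of a 64-bit UBFM restricted to imms >= immr (the UBFX alias):
   bits immr..imms of X land in the low bits of the result. */
static inline uint64_t ubfm_model(uint64_t x, unsigned immr, unsigned imms)
{
    unsigned width = imms - immr + 1;
    return (x >> immr) & (~0ULL >> (64 - width));
}

/* The mapping used by the backend call: lsb = ofs, msb = ofs + len - 1. */
static inline uint64_t extract_via_ubfm(uint64_t x, unsigned ofs, unsigned len)
{
    return ubfm_model(x, ofs, ofs + len - 1);
}

/* Small self-check comparing the two forms on a few inputs. */
static inline void extract_self_check(void)
{
    assert(extract_via_ubfm(0xabcd1234ULL, 8, 8) == 0x12);
    assert(extract_via_ubfm(0xabcd1234ULL, 8, 8)
           == extract_ofs_len(0xabcd1234ULL, 8, 8));
    assert(extract_via_ubfm(~0ULL, 0, 64) == ~0ULL);
}
```

The signed variant (sextract with SBFM) uses the same operand mapping and differs only in sign-extending the result.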

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 62/64] tcg: Use ctpop to generate ctz if needed
  2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 62/64] tcg: Use ctpop to generate ctz if needed Richard Henderson
@ 2016-12-09 16:07   ` Alex Bennée
  2016-12-09 16:48     ` Richard Henderson
  0 siblings, 1 reply; 102+ messages in thread
From: Alex Bennée @ 2016-12-09 16:07 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Particularly when andc is also available, this is two insns
> shorter than using clz to compute ctz.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/tcg-op.c | 107 ++++++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 65 insertions(+), 42 deletions(-)
>
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 6f4b1b6..d1debde 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -497,43 +497,46 @@ void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
<snip>
>      } else {
> -        gen_helper_ctz_i32(ret, arg1, arg2);
> +        TCGv_i32 z, t;
> +        if (TCG_TARGET_HAS_ctpop_i32 && TCG_TARGET_HAS_andc_i32) {
> +            t = tcg_temp_new_i32();
> +            tcg_gen_subi_i32(t, arg1, 1);
> +            tcg_gen_andc_i32(t, t, arg1);
> +            tcg_gen_ctpop_i32(t, t);
> +        do_movc:

Hmmm and...

<snip>
>  void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg)
> @@ -1842,18 +1845,29 @@ void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>  {
>      if (TCG_TARGET_HAS_ctz_i64) {
>          tcg_gen_op3_i64(INDEX_op_ctz_i64, ret, arg1, arg2);
<snip>
>      } else {
> -        gen_helper_ctz_i64(ret, arg1, arg2);
> +        TCGv_i64 z, t;
> +        if (TCG_TARGET_HAS_ctpop_i64 && TCG_TARGET_HAS_andc_i64) {
> +            t = tcg_temp_new_i64();
> +            tcg_gen_subi_i64(t, arg1, 1);
> +            tcg_gen_andc_i64(t, t, arg1);
> +            tcg_gen_ctpop_i64(t, t);
> +        do_movc:

Hmmm.

So I'm not a goto hater, as it makes sense for a bunch of things. But
this seems a slightly too liberal use of it to my eyes. What's wrong with
a little extra nesting (seeing as the compiler sorts it all out in the
end):

        if ((TCG_TARGET_HAS_ctpop_i32 && TCG_TARGET_HAS_andc_i32)
            || TCG_TARGET_HAS_clz_i32 || TCG_TARGET_HAS_clz_i64) {

            TCGv_i32 z, t;

            if (TCG_TARGET_HAS_ctpop_i32 && TCG_TARGET_HAS_andc_i32) {
                t = tcg_temp_new_i32();
                tcg_gen_subi_i32(t, arg1, 1);
                tcg_gen_andc_i32(t, t, arg1);
                tcg_gen_ctpop_i32(t, t);
            } else if (TCG_TARGET_HAS_clz_i32 || TCG_TARGET_HAS_clz_i64) {
                /* Since all non-x86 hosts have clz(0) == 32, don't fight it.  */
                t = tcg_temp_new_i32();
                tcg_gen_neg_i32(t, arg1);
                tcg_gen_and_i32(t, t, arg1);
                tcg_gen_clzi_i32(t, t, 32);
                tcg_gen_xori_i32(t, t, 31);
            }
            /* final movc */
            z = tcg_const_i32(0);
            tcg_gen_movcond_i32(TCG_COND_EQ, ret, arg1, z, arg2, t);
            tcg_temp_free_i32(t);
            tcg_temp_free_i32(z);
        } else {
            gen_helper_ctz_i32(ret, arg1, arg2);
        }
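
For reference, the two expansions in the snippet above correspond to the following identities on plain integers. This is a minimal sketch in portable C, not the TCG code: popcount32() and clz32_or_32() are hypothetical stand-ins for the ctpop and clzi ops, and the final conditional plays the role of the movcond.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the ctpop op: count set bits by clearing the lowest one. */
static inline uint32_t popcount32(uint32_t x)
{
    uint32_t n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Stand-in for clzi with a zero-input result of 32. */
static inline uint32_t clz32_or_32(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0) {
        return 32;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* subi + andc + ctpop: (x - 1) & ~x is a mask of the trailing zeros. */
static inline uint32_t ctz32_via_ctpop(uint32_t x, uint32_t zero_val)
{
    uint32_t t = popcount32((x - 1) & ~x);
    return x == 0 ? zero_val : t;          /* movcond on x == 0 */
}

/* neg + and + clzi + xori: isolate the lowest set bit, then 31 - clz. */
static inline uint32_t ctz32_via_clz(uint32_t x, uint32_t zero_val)
{
    uint32_t t = clz32_or_32((0u - x) & x) ^ 31;
    return x == 0 ? zero_val : t;          /* movcond on x == 0 */
}

/* Both forms agree, including on the caller-supplied value for zero. */
static inline void ctz_self_check(void)
{
    assert(ctz32_via_ctpop(8, 32) == 3);
    assert(ctz32_via_clz(8, 32) == 3);
    assert(ctz32_via_ctpop(0, 77) == 77);
    assert(ctz32_via_clz(0, 77) == 77);
}
```

Either way the zero input has to be handled separately, which is why both branches can share the trailing movcond.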

--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue
  2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
                   ` (64 preceding siblings ...)
  2016-11-29 13:33 ` [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue no-reply
@ 2016-12-09 16:08 ` Alex Bennée
  65 siblings, 0 replies; 102+ messages in thread
From: Alex Bennée @ 2016-12-09 16:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> This is a combination of two patch sets that have had previous
> revisions, as well as some new patches.  I wanted to post this
> all together since Alex was having trouble with prerequisites.
>
> The full tree is at
>
>   git://github.com/rth7680/qemu.git tcg-2.9

OK, I've finished my pass through this uber-set. I've given it a fairly
good tyre-kicking on ARM (guests and hosts) with RISU, so I think you can
have a:

Tested-by: Alex Bennée <alex.bennee@linaro.org>

for the ARM bits at least.

>
> Changes since v3:
>   * PPC host patches have been properly annotated for cpu revision,
>     - cnttz[wd] are power9 inventions,
>     - cntpop[wd] are power7 inventions.
>
>   * X86 host checks the correct cpuid bit for lzcnt.
>
>   * Generic TCG has significant changes to enable "interesting"
>     combinations of constraints for X86 host bsr/bsf and to some
>     extent lzcnt/tzcnt.
>
>   * Opcode for ctpop.  I had begun with only the helpers for ctpop,
>     but added the opcode after I discovered that power7/8 could use
>     that as a better alternative for implementing ctz.
>
>   * Updates to the i386 and ppc disassemblers, to handle the new
>     insns that we're emitting.
>
>
> r~
>
>
> Richard Henderson (64):
>   tcg: Add field extraction primitives
>   tcg: Minor adjustments to deposit expanders
>   tcg: Add deposit_z expander
>   tcg/aarch64: Implement field extraction opcodes
>   tcg/arm: Move isa detection to tcg-target.h
>   tcg/arm: Implement field extraction opcodes
>   tcg/i386: Implement field extraction opcodes
>   tcg/mips: Implement field extraction opcodes
>   tcg/ppc: Implement field extraction opcodes
>   tcg/s390: Expose host facilities to tcg-target.h
>   tcg/s390: Implement field extraction opcodes
>   tcg/s390: Support deposit into zero
>   target-alpha: Use deposit and extract ops
>   target-arm: Use new deposit and extract ops
>   target-i386: Use new deposit and extract ops
>   target-mips: Use the new extract op
>   target-ppc: Use the new deposit and extract ops
>   target-s390x: Use the new deposit and extract ops
>   tcg/optimize: Fold movcond 0/1 into setcond
>   tcg: Add markup for output requires new register
>   tcg: Transition flat op_defs array to a target callback
>   tcg: Pass the opcode width to target_parse_constraint
>   tcg: Allow an operand to be matching or a constant
>   tcg: Add clz and ctz opcodes
>   disas/i386.c: Handle tzcnt
>   disas/ppc: Handle popcnt and cnttz
>   target-alpha: Use the ctz and clz opcodes
>   target-cris: Use clz opcode
>   target-microblaze: Use clz opcode
>   target-mips: Use clz opcode
>   target-openrisc: Use clz and ctz opcodes
>   target-ppc: Use clz and ctz opcodes
>   target-s390x: Use clz opcode
>   target-tilegx: Use clz and ctz opcodes
>   target-tricore: Use clz opcode
>   target-unicore32: Use clz opcode
>   target-xtensa: Use clz opcode
>   target-arm: Use clz opcode
>   target-i386: Use clz and ctz opcodes
>   tcg/ppc: Handle ctz and clz opcodes
>   tcg/aarch64: Handle ctz and clz opcodes
>   tcg/arm: Handle ctz and clz opcodes
>   tcg/mips: Handle clz opcode
>   tcg/s390: Handle clz opcode
>   tcg/i386: Fully convert tcg_target_op_def
>   tcg/i386: Hoist common arguments in tcg_out_op
>   tcg/i386: Allow bmi2 shiftx to have non-matching operands
>   tcg/i386: Handle ctz and clz opcodes
>   tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR
>   tcg: Add helpers for clrsb
>   target-arm: Use clrsb helper
>   target-tricore: Use clrsb helper
>   target-xtensa: Use clrsb helper
>   tcg: Add opcode for ctpop
>   target-alpha: Use ctpop helper
>   target-ppc: Use ctpop helper
>   target-s390x: Avoid a loop for popcnt
>   target-sparc: Use ctpop helper
>   target-tilegx: Use ctpop helper
>   target-i386: Use ctpop helper
>   qemu/host-utils.h: Reduce the operation count in the fallback ctpop
>   tcg: Use ctpop to generate ctz if needed
>   tcg/ppc: Handle ctpop opcode
>   tcg/i386: Handle ctpop opcode
>
>  disas/i386.c                  |  12 +-
>  disas/ppc.c                   |  10 +
>  include/qemu/host-utils.h     |  25 +-
>  target-alpha/helper.h         |   4 -
>  target-alpha/int_helper.c     |  15 -
>  target-alpha/translate.c      |  73 +++--
>  target-arm/helper-a64.c       |  20 --
>  target-arm/helper-a64.h       |   4 -
>  target-arm/helper.c           |   5 -
>  target-arm/helper.h           |   1 -
>  target-arm/translate-a64.c    |  95 ++----
>  target-arm/translate.c        |  43 +--
>  target-cris/helper.h          |   1 -
>  target-cris/op_helper.c       |   5 -
>  target-cris/translate.c       |   2 +-
>  target-i386/cc_helper.c       |   3 +
>  target-i386/cpu.h             |   1 +
>  target-i386/helper.h          |   2 -
>  target-i386/int_helper.c      |  11 -
>  target-i386/ops_sse.h         |  26 --
>  target-i386/ops_sse_header.h  |   1 -
>  target-i386/translate.c       |  89 ++---
>  target-microblaze/helper.h    |   1 -
>  target-microblaze/op_helper.c |   5 -
>  target-microblaze/translate.c |   2 +-
>  target-mips/helper.h          |   7 -
>  target-mips/op_helper.c       |  22 --
>  target-mips/translate.c       |  35 +-
>  target-openrisc/helper.h      |   2 -
>  target-openrisc/int_helper.c  |  19 --
>  target-openrisc/translate.c   |   6 +-
>  target-ppc/helper.h           |   7 +-
>  target-ppc/int_helper.c       |  38 +--
>  target-ppc/translate.c        |  61 ++--
>  target-s390x/helper.h         |   1 -
>  target-s390x/int_helper.c     |  21 +-
>  target-s390x/translate.c      |  36 ++-
>  target-sparc/helper.c         |   5 -
>  target-sparc/helper.h         |   1 -
>  target-sparc/translate.c      |   2 +-
>  target-tilegx/helper.c        |  15 -
>  target-tilegx/helper.h        |   3 -
>  target-tilegx/translate.c     |   6 +-
>  target-tricore/helper.h       |   3 -
>  target-tricore/op_helper.c    |  15 -
>  target-tricore/translate.c    |   7 +-
>  target-unicore32/helper.c     |  10 -
>  target-unicore32/helper.h     |   3 -
>  target-unicore32/translate.c  |   6 +-
>  target-xtensa/helper.h        |   2 -
>  target-xtensa/op_helper.c     |  13 -
>  target-xtensa/translate.c     |   4 +-
>  tcg-runtime.c                 |  40 +++
>  tcg/README                    |  28 +-
>  tcg/aarch64/tcg-target.h      |  10 +
>  tcg/aarch64/tcg-target.inc.c  |  90 +++++-
>  tcg/arm/tcg-target.h          |  41 ++-
>  tcg/arm/tcg-target.inc.c      | 119 ++++---
>  tcg/i386/tcg-target.h         |  17 +
>  tcg/i386/tcg-target.inc.c     | 732 +++++++++++++++++++++++++++---------------
>  tcg/ia64/tcg-target.h         |  10 +
>  tcg/ia64/tcg-target.inc.c     |  28 +-
>  tcg/mips/tcg-target.h         |   5 +
>  tcg/mips/tcg-target.inc.c     |  66 +++-
>  tcg/optimize.c                |  94 ++++++
>  tcg/ppc/tcg-target.h          |  13 +
>  tcg/ppc/tcg-target.inc.c      | 117 ++++++-
>  tcg/s390/tcg-target.h         | 128 ++++----
>  tcg/s390/tcg-target.inc.c     | 173 ++++++----
>  tcg/sparc/tcg-target.h        |  10 +
>  tcg/sparc/tcg-target.inc.c    |  28 +-
>  tcg/tcg-op.c                  | 695 ++++++++++++++++++++++++++++++++++++++-
>  tcg/tcg-op.h                  |  42 +++
>  tcg/tcg-opc.h                 |  10 +
>  tcg/tcg-runtime.h             |   9 +
>  tcg/tcg.c                     | 173 +++++-----
>  tcg/tcg.h                     |  14 +-
>  tcg/tci/tcg-target.h          |  10 +
>  tcg/tci/tcg-target.inc.c      |  25 +-
>  79 files changed, 2425 insertions(+), 1108 deletions(-)


--
Alex Bennée


* Re: [Qemu-devel] [PATCH v4 62/64] tcg: Use ctpop to generate ctz if needed
  2016-12-09 16:07   ` Alex Bennée
@ 2016-12-09 16:48     ` Richard Henderson
  0 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-12-09 16:48 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 12/09/2016 08:07 AM, Alex Bennée wrote:
> 
> Richard Henderson <rth@twiddle.net> writes:
> 
>> Particularly when andc is also available, this is two insns
>> shorter than using clz to compute ctz.
>>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>  tcg/tcg-op.c | 107 ++++++++++++++++++++++++++++++++++++-----------------------
>>  1 file changed, 65 insertions(+), 42 deletions(-)
>>
>> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
>> index 6f4b1b6..d1debde 100644
>> --- a/tcg/tcg-op.c
>> +++ b/tcg/tcg-op.c
>> @@ -497,43 +497,46 @@ void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
> <snip>
>>      } else {
>> -        gen_helper_ctz_i32(ret, arg1, arg2);
>> +        TCGv_i32 z, t;
>> +        if (TCG_TARGET_HAS_ctpop_i32 && TCG_TARGET_HAS_andc_i32) {
>> +            t = tcg_temp_new_i32();
>> +            tcg_gen_subi_i32(t, arg1, 1);
>> +            tcg_gen_andc_i32(t, t, arg1);
>> +            tcg_gen_ctpop_i32(t, t);
>> +        do_movc:
> 
> Hmmm and...
> 
> <snip>
>>  void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg)
>> @@ -1842,18 +1845,29 @@ void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>>  {
>>      if (TCG_TARGET_HAS_ctz_i64) {
>>          tcg_gen_op3_i64(INDEX_op_ctz_i64, ret, arg1, arg2);
> <snip>
>>      } else {
>> -        gen_helper_ctz_i64(ret, arg1, arg2);
>> +        TCGv_i64 z, t;
>> +        if (TCG_TARGET_HAS_ctpop_i64 && TCG_TARGET_HAS_andc_i64) {
>> +            t = tcg_temp_new_i64();
>> +            tcg_gen_subi_i64(t, arg1, 1);
>> +            tcg_gen_andc_i64(t, t, arg1);
>> +            tcg_gen_ctpop_i64(t, t);
>> +        do_movc:
> 
> Hmmm.
> 
> So I'm not a goto hater, as it makes sense for a bunch of things. But
> this seems a slightly too liberal use of it to my eyes. What's wrong with
> a little extra nesting (seeing as the compiler sorts it all out in the
> end):
> 
>         if ((TCG_TARGET_HAS_ctpop_i32 && TCG_TARGET_HAS_andc_i32)
>             || TCG_TARGET_HAS_clz_i32 || TCG_TARGET_HAS_clz_i64) {
> 
>             TCGv_i32 z, t;
> 
>             if (TCG_TARGET_HAS_ctpop_i32 && TCG_TARGET_HAS_andc_i32) {
>                 t = tcg_temp_new_i32();
>                 tcg_gen_subi_i32(t, arg1, 1);
>                 tcg_gen_andc_i32(t, t, arg1);
>                 tcg_gen_ctpop_i32(t, t);
>             } else if (TCG_TARGET_HAS_clz_i32 || TCG_TARGET_HAS_clz_i64) {
>                 /* Since all non-x86 hosts have clz(0) == 32, don't fight it.  */
>                 t = tcg_temp_new_i32();
>                 tcg_gen_neg_i32(t, arg1);
>                 tcg_gen_and_i32(t, t, arg1);
>                 tcg_gen_clzi_i32(t, t, 32);
>                 tcg_gen_xori_i32(t, t, 31);
>             }
>             /* final movc */
>             z = tcg_const_i32(0);
>             tcg_gen_movcond_i32(TCG_COND_EQ, ret, arg1, z, arg2, t);
>             tcg_temp_free_i32(t);
>             tcg_temp_free_i32(z);
>         } else {
>             gen_helper_ctz_i32(ret, arg1, arg2);
>         }

That does look better.  Thanks,


r~


* Re: [Qemu-devel] [PATCH v4 61/64] qemu/host-utils.h: Reduce the operation count in the fallback ctpop
  2016-12-09 14:41   ` Alex Bennée
@ 2016-12-09 17:18     ` Richard Henderson
  0 siblings, 0 replies; 102+ messages in thread
From: Richard Henderson @ 2016-12-09 17:18 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 12/09/2016 06:41 AM, Alex Bennée wrote:
> +struct bitcnt_test_data sixtyfour_bit_data[] = {
> +    { { .w64 = 0x0000000000000001 }, .popct=1 },

Thanks.  Merged, with a ULL suffix added to the 64-bit constants.
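
As an illustration of the kind of code these tests exercise, here is a generic SWAR population count with explicitly ULL-suffixed 64-bit constants. It is a sketch of the well-known technique and is not claimed to be the exact fallback in include/qemu/host-utils.h; the name ctpop64_sketch is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Generic SWAR popcount: sum bits in 2-, 4- and 8-bit lanes, then add
   the eight byte counts together with one multiply. */
static inline int ctpop64_sketch(uint64_t v)
{
    v = v - ((v >> 1) & 0x5555555555555555ULL);
    v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);
    v = (v + (v >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (int)((v * 0x0101010101010101ULL) >> 56);
}

/* Spot checks using values from the 64-bit test table. */
static inline void ctpop64_self_check(void)
{
    assert(ctpop64_sketch(0x0000000000000001ULL) == 1);
    assert(ctpop64_sketch(0x0c0c0000c0c0c0c0ULL) == 12);
    assert(ctpop64_sketch(0xf0f0f0f0f0f0f0f0ULL) == 32);
}
```

The suffix makes the intended width of each constant explicit instead of leaving it to the compiler's literal typing rules.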


r~


end of thread, other threads:[~2016-12-09 17:19 UTC | newest]

Thread overview: 102+ messages
2016-11-23 13:00 [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue Richard Henderson
2016-11-23 13:00 ` [Qemu-devel] [PATCH v4 01/64] tcg: Add field extraction primitives Richard Henderson
2016-12-05 13:17   ` Alex Bennée
2016-12-05 15:14     ` Richard Henderson
2016-11-23 13:00 ` [Qemu-devel] [PATCH v4 02/64] tcg: Minor adjustments to deposit expanders Richard Henderson
2016-12-05 13:18   ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 03/64] tcg: Add deposit_z expander Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 04/64] tcg/aarch64: Implement field extraction opcodes Richard Henderson
2016-12-06 12:24   ` Alex Bennée
2016-12-06 16:36     ` Richard Henderson
2016-12-09 15:41       ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 05/64] tcg/arm: Move isa detection to tcg-target.h Richard Henderson
2016-12-06 12:34   ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 06/64] tcg/arm: Implement field extraction opcodes Richard Henderson
2016-12-06 16:16   ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 07/64] tcg/i386: " Richard Henderson
2016-11-25 11:16   ` Paolo Bonzini
2016-11-25 11:21     ` Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 08/64] tcg/mips: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 09/64] tcg/ppc: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 10/64] tcg/s390: Expose host facilities to tcg-target.h Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 11/64] tcg/s390: Implement field extraction opcodes Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 12/64] tcg/s390: Support deposit into zero Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 13/64] target-alpha: Use deposit and extract ops Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 14/64] target-arm: Use new " Richard Henderson
2016-12-01 17:19   ` Alex Bennée
2016-12-03 21:01     ` Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 15/64] target-i386: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 16/64] target-mips: Use the new extract op Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 17/64] target-ppc: Use the new deposit and extract ops Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 18/64] target-s390x: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 19/64] tcg/optimize: Fold movcond 0/1 into setcond Richard Henderson
2016-12-06 16:22   ` Alex Bennée
2016-12-06 16:33     ` Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 20/64] tcg: Add markup for output requires new register Richard Henderson
2016-12-06 16:34   ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 21/64] tcg: Transition flat op_defs array to a target callback Richard Henderson
2016-12-06 16:38   ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 22/64] tcg: Pass the opcode width to target_parse_constraint Richard Henderson
2016-12-06 16:43   ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 23/64] tcg: Allow an operand to be matching or a constant Richard Henderson
2016-12-08 17:19   ` Alex Bennée
2016-12-08 17:49     ` Richard Henderson
2016-12-08 20:38       ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 24/64] tcg: Add clz and ctz opcodes Richard Henderson
2016-12-08 17:44   ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 25/64] disas/i386.c: Handle tzcnt Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 26/64] disas/ppc: Handle popcnt and cnttz Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 27/64] target-alpha: Use the ctz and clz opcodes Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 28/64] target-cris: Use clz opcode Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 29/64] target-microblaze: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 30/64] target-mips: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 31/64] target-openrisc: Use clz and ctz opcodes Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 32/64] target-ppc: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 33/64] target-s390x: Use clz opcode Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 34/64] target-tilegx: Use clz and ctz opcodes Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 35/64] target-tricore: Use clz opcode Richard Henderson
2016-11-23 14:58   ` Bastian Koppelmann
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 36/64] target-unicore32: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 37/64] target-xtensa: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 38/64] target-arm: " Richard Henderson
2016-12-08 17:47   ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 39/64] target-i386: Use clz and ctz opcodes Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 40/64] tcg/ppc: Handle ctz and clz opcodes Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 41/64] tcg/aarch64: " Richard Henderson
2016-12-01 18:36   ` Alex Bennée
2016-12-01 18:44     ` Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 42/64] tcg/arm: " Richard Henderson
2016-12-08 17:56   ` Alex Bennée
2016-12-08 18:13     ` Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 43/64] tcg/mips: Handle clz opcode Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 44/64] tcg/s390: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 45/64] tcg/i386: Fully convert tcg_target_op_def Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 46/64] tcg/i386: Hoist common arguments in tcg_out_op Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 47/64] tcg/i386: Allow bmi2 shiftx to have non-matching operands Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 48/64] tcg/i386: Handle ctz and clz opcodes Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 49/64] tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 50/64] tcg: Add helpers for clrsb Richard Henderson
2016-12-09  9:51   ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 51/64] target-arm: Use clrsb helper Richard Henderson
2016-12-09  9:52   ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 52/64] target-tricore: " Richard Henderson
2016-11-23 14:58   ` Bastian Koppelmann
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 53/64] target-xtensa: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 54/64] tcg: Add opcode for ctpop Richard Henderson
2016-12-09  9:57   ` Alex Bennée
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 55/64] target-alpha: Use ctpop helper Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 56/64] target-ppc: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 57/64] target-s390x: Avoid a loop for popcnt Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 58/64] target-sparc: Use ctpop helper Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 59/64] target-tilegx: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 60/64] target-i386: " Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 61/64] qemu/host-utils.h: Reduce the operation count in the fallback ctpop Richard Henderson
2016-12-09 14:41   ` Alex Bennée
2016-12-09 17:18     ` Richard Henderson
2016-11-23 13:01 ` [Qemu-devel] [PATCH v4 62/64] tcg: Use ctpop to generate ctz if needed Richard Henderson
2016-12-09 16:07   ` Alex Bennée
2016-12-09 16:48     ` Richard Henderson
2016-11-23 13:02 ` [Qemu-devel] [PATCH v4 63/64] tcg/ppc: Handle ctpop opcode Richard Henderson
2016-11-23 13:02 ` [Qemu-devel] [PATCH v4 64/64] tcg/i386: " Richard Henderson
2016-11-29 13:33 ` [Qemu-devel] [PATCH v4 00/64] tcg 2.9 patch queue no-reply
2016-12-09 16:08 ` Alex Bennée
