* [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically
@ 2016-11-16 19:25 Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 01/25] tcg: Add clz and ctz opcodes Richard Henderson
                   ` (24 more replies)
  0 siblings, 25 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC
  To: qemu-devel

As you can see from the diffstat below, almost every target was
defining its own helpers for these common bit manipulation ops.

In the case of ctz and clz, they are often central to string search
routines such as strlen, so it does benefit us to make them fast.
So I go ahead and add those as tcg opcodes that can be handled
by the tcg backends.

One perhaps surprising thing about my definition of these opcodes
is that they take an explicit argument for the value to return when
the input is zero.
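
For readers who want the semantics spelled out, here is a minimal
sketch in plain C of what the three-operand form computes,
t0 = t1 ? clz(t1) : t2.  The names op_clz_i32 and op_ctz_i32 are
hypothetical stand-ins for the opcodes, not functions from the series,
and the GCC/Clang builtins __builtin_clz and __builtin_ctz are an
assumption about the host compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of clz_i32: count leading zeros of arg, or
 * return the caller-supplied zero_val when arg is zero. */
static uint32_t op_clz_i32(uint32_t arg, uint32_t zero_val)
{
    /* __builtin_clz(0) is undefined, so zero is tested explicitly. */
    return arg ? (uint32_t)__builtin_clz(arg) : zero_val;
}

/* Hypothetical model of ctz_i32: count trailing zeros of arg, or
 * return zero_val when arg is zero. */
static uint32_t op_ctz_i32(uint32_t arg, uint32_t zero_val)
{
    return arg ? (uint32_t)__builtin_ctz(arg) : zero_val;
}

/* Small self-check: the common case passes the operand width. */
void check_zero_val_semantics(void)
{
    assert(op_clz_i32(1, 32) == 31);
    assert(op_clz_i32(0x80000000u, 32) == 0);
    assert(op_clz_i32(0, 32) == 32);
    assert(op_ctz_i32(8, 32) == 3);
    assert(op_ctz_i32(0, 32) == 32);
}
```

Passing the width as the third operand reproduces the usual
"clz(0) == 32" convention; any other value is returned untouched.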

In the vast majority of cases we simply pass a constant that is
the width of the argument.  But it turns out to have a number of uses
when implementing target-i386 (bsf does not change the register when
the input is zero), target-openrisc (ff1 = ffs(3)), decomposing 64-bit
clz/ctz for a 32-bit host, and actually implementing tcg/i386 (forcing
the value into a register for input to cmov).
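
To make those uses concrete, here is a rough sketch in plain C, not
code from the series.  All function names are hypothetical, and the
GCC/Clang builtins are an assumption about the host compiler.  The
64-bit decomposition follows the same sequence as tcg_gen_clzi_i64 and
tcg_gen_ctzi_i64 in patch 01; the bsf and ff1 models are my own
reading of the parenthetical remarks above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit models of the opcodes. */
static uint32_t clz32_or(uint32_t arg, uint32_t zero_val)
{
    return arg ? (uint32_t)__builtin_clz(arg) : zero_val;
}

static uint32_t ctz32_or(uint32_t arg, uint32_t zero_val)
{
    return arg ? (uint32_t)__builtin_ctz(arg) : zero_val;
}

/* bsf-like: pass the old destination as the zero value, so a zero
 * source leaves the destination unchanged. */
static uint32_t bsf_like(uint32_t dest, uint32_t src)
{
    return ctz32_or(src, dest);
}

/* ff1 = ffs(3): 1-based index of the lowest set bit, 0 if none.
 * A zero value of -1 wraps to 0 after the increment. */
static uint32_t ff1_like(uint32_t src)
{
    return ctz32_or(src, UINT32_MAX) + 1;
}

/* 64-bit clz from two 32-bit ops: the low half is counted first with
 * zero_val - 32 as its fallback, biased by 32, and the result then
 * serves as the fallback for the high half. */
static uint32_t clz64_via_32(uint64_t arg, uint32_t zero_val)
{
    uint32_t t = clz32_or((uint32_t)arg, zero_val - 32);
    t += 32;
    return clz32_or((uint32_t)(arg >> 32), t);
}

/* 64-bit ctz is the mirror image: high half first, then low half. */
static uint32_t ctz64_via_32(uint64_t arg, uint32_t zero_val)
{
    uint32_t t = ctz32_or((uint32_t)(arg >> 32), zero_val - 32);
    t += 32;
    return ctz32_or((uint32_t)arg, t);
}

void check_zero_val_uses(void)
{
    assert(bsf_like(0xdeadbeefu, 0) == 0xdeadbeefu);
    assert(ff1_like(0) == 0 && ff1_like(8) == 4);
    assert(clz64_via_32(0, 64) == 64);
    assert(ctz64_via_32(1ull << 32, 64) == 32);
}
```

The subtraction of 32 is unsigned, so it wraps for zero values below
32 and the later addition of 32 restores the original value.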

Lightly tested with x86_64 and ppc64 hosts.


r~


Richard Henderson (25):
  tcg: Add clz and ctz opcodes
  target-alpha: Use the ctz and clz opcodes
  target-cris: Use clz opcode
  target-microblaze: Use clz opcode
  target-mips: Use clz opcode
  target-openrisc: Use clz and ctz opcodes
  target-ppc: Use clz and ctz opcodes
  target-s390x: Use clz opcode
  target-tilegx: Use clz and ctz opcodes
  target-tricore: Use clz opcode
  target-unicore32: Use clz opcode
  target-xtensa: Use clz opcode
  target-arm: Use clz opcode
  target-i386: Use clz and ctz opcodes
  disas/i386.c: Handle tzcnt
  tcg/i386: Handle ctz and clz opcodes
  tcg/ppc: Handle ctz and clz opcodes
  tcg/aarch64: Handle ctz and clz opcodes
  tcg/arm: Handle ctz and clz opcodes
  tcg/mips: Handle clz opcode
  tcg/s390: Handle clz opcode
  tcg: Add helpers for clrsb
  target-arm: Use clrsb helper
  target-tricore: Use clrsb helper
  target-xtensa: Use clrsb helper

 disas/i386.c                  |  12 ++++-
 target-alpha/helper.h         |   2 -
 target-alpha/int_helper.c     |  10 ----
 target-alpha/translate.c      |   4 +-
 target-arm/helper-a64.c       |  20 --------
 target-arm/helper-a64.h       |   4 --
 target-arm/helper.c           |   5 --
 target-arm/helper.h           |   1 -
 target-arm/translate-a64.c    |  16 +++---
 target-arm/translate.c        |   6 +--
 target-cris/helper.h          |   1 -
 target-cris/op_helper.c       |   5 --
 target-cris/translate.c       |   2 +-
 target-i386/helper.h          |   2 -
 target-i386/int_helper.c      |  11 ----
 target-i386/translate.c       |  31 ++++++------
 target-microblaze/helper.h    |   1 -
 target-microblaze/op_helper.c |   5 --
 target-microblaze/translate.c |   2 +-
 target-mips/helper.h          |   7 ---
 target-mips/op_helper.c       |  22 --------
 target-mips/translate.c       |  23 ++++++---
 target-openrisc/helper.h      |   2 -
 target-openrisc/int_helper.c  |  19 -------
 target-openrisc/translate.c   |   6 ++-
 target-ppc/helper.h           |   4 --
 target-ppc/int_helper.c       |  20 --------
 target-ppc/translate.c        |  20 ++++++--
 target-s390x/helper.h         |   1 -
 target-s390x/int_helper.c     |   6 ---
 target-s390x/translate.c      |   2 +-
 target-tilegx/helper.c        |  10 ----
 target-tilegx/helper.h        |   2 -
 target-tilegx/translate.c     |   4 +-
 target-tricore/helper.h       |   3 --
 target-tricore/op_helper.c    |  15 ------
 target-tricore/translate.c    |   7 +--
 target-unicore32/helper.c     |  10 ----
 target-unicore32/helper.h     |   3 --
 target-unicore32/translate.c  |   6 +--
 target-xtensa/helper.h        |   2 -
 target-xtensa/op_helper.c     |  13 -----
 target-xtensa/translate.c     |   4 +-
 tcg-runtime.c                 |  30 +++++++++++
 tcg/README                    |   8 +++
 tcg/aarch64/tcg-target.h      |   4 ++
 tcg/aarch64/tcg-target.inc.c  |  47 +++++++++++++++++
 tcg/arm/tcg-target.h          |   2 +
 tcg/arm/tcg-target.inc.c      |  27 ++++++++++
 tcg/i386/tcg-target.h         |   4 ++
 tcg/i386/tcg-target.inc.c     |  83 ++++++++++++++++++++++++++----
 tcg/ia64/tcg-target.h         |   4 ++
 tcg/mips/tcg-target.h         |   2 +
 tcg/mips/tcg-target.inc.c     |  34 +++++++++++++
 tcg/optimize.c                |  36 +++++++++++++
 tcg/ppc/tcg-target.h          |   4 ++
 tcg/ppc/tcg-target.inc.c      |  57 +++++++++++++++++++++
 tcg/s390/tcg-target.h         |   4 ++
 tcg/s390/tcg-target.inc.c     |  36 ++++++++++++-
 tcg/sparc/tcg-target.h        |   4 ++
 tcg/tcg-op.c                  | 114 ++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op.h                  |  20 ++++++++
 tcg/tcg-opc.h                 |   4 ++
 tcg/tcg-runtime.h             |   7 +++
 tcg/tcg.h                     |   2 +
 tcg/tci/tcg-target.h          |   4 ++
 66 files changed, 614 insertions(+), 274 deletions(-)

-- 
2.7.4


* [Qemu-devel] [PATCH 01/25] tcg: Add clz and ctz opcodes
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-21 15:11   ` Alex Bennée
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 02/25] target-alpha: Use the ctz and clz opcodes Richard Henderson
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg-runtime.c            | 20 +++++++++++
 tcg/README               |  8 +++++
 tcg/aarch64/tcg-target.h |  4 +++
 tcg/arm/tcg-target.h     |  2 ++
 tcg/i386/tcg-target.h    |  4 +++
 tcg/ia64/tcg-target.h    |  4 +++
 tcg/mips/tcg-target.h    |  2 ++
 tcg/optimize.c           | 36 ++++++++++++++++++++
 tcg/ppc/tcg-target.h     |  4 +++
 tcg/s390/tcg-target.h    |  4 +++
 tcg/sparc/tcg-target.h   |  4 +++
 tcg/tcg-op.c             | 86 ++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op.h             | 16 +++++++++
 tcg/tcg-opc.h            |  4 +++
 tcg/tcg-runtime.h        |  5 +++
 tcg/tcg.h                |  2 ++
 tcg/tci/tcg-target.h     |  4 +++
 17 files changed, 209 insertions(+)

diff --git a/tcg-runtime.c b/tcg-runtime.c
index 9327b6f..eb3bade 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -101,6 +101,26 @@ int64_t HELPER(mulsh_i64)(int64_t arg1, int64_t arg2)
     return h;
 }
 
+uint32_t HELPER(clz_i32)(uint32_t arg, uint32_t zero_val)
+{
+    return arg ? clz32(arg) : zero_val;
+}
+
+uint32_t HELPER(ctz_i32)(uint32_t arg, uint32_t zero_val)
+{
+    return arg ? ctz32(arg) : zero_val;
+}
+
+uint64_t HELPER(clz_i64)(uint64_t arg, uint64_t zero_val)
+{
+    return arg ? clz64(arg) : zero_val;
+}
+
+uint64_t HELPER(ctz_i64)(uint64_t arg, uint64_t zero_val)
+{
+    return arg ? ctz64(arg) : zero_val;
+}
+
 void HELPER(exit_atomic)(CPUArchState *env)
 {
     cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
diff --git a/tcg/README b/tcg/README
index 065d9c2..f5ccf04 100644
--- a/tcg/README
+++ b/tcg/README
@@ -246,6 +246,14 @@ t0=~(t1|t2)
 
 t0=t1|~t2
 
+* clz_i32/i64 t0, t1, t2
+
+t0 = t1 ? clz(t1) : t2
+
+* ctz_i32/i64 t0, t1, t2
+
+t0 = t1 ? ctz(t1) : t2
+
 ********* Shifts/Rotates
 
 * shl_i32/i64 t0, t1, t2
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 4a74bd8..976f493 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -62,6 +62,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32          1
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     1
@@ -94,6 +96,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i64          1
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     1
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 4e30728..02cc242 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -110,6 +110,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
 #define TCG_TARGET_HAS_extract_i32      use_armv7_instructions
 #define TCG_TARGET_HAS_sextract_i32     use_armv7_instructions
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index dc19c47..f2d9955 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -93,6 +93,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     1
@@ -125,6 +127,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_eqv_i64          0
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 8856dc8..9a829ae 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -140,6 +140,10 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32         1
 #define TCG_TARGET_HAS_nand_i64         1
 #define TCG_TARGET_HAS_nor_i32          1
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_nor_i64          1
 #define TCG_TARGET_HAS_orc_i32          1
 #define TCG_TARGET_HAS_orc_i64          1
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index f1c3137..f133684 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -109,6 +109,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_rem_i32          1
 #define TCG_TARGET_HAS_not_i32          1
 #define TCG_TARGET_HAS_nor_i32          1
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_andc_i32         0
 #define TCG_TARGET_HAS_orc_i32          0
 #define TCG_TARGET_HAS_eqv_i32          0
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 28ce624..34a28ac 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -323,6 +323,18 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
     CASE_OP_32_64(nor):
         return ~(x | y);
 
+    case INDEX_op_clz_i32:
+        return (uint32_t)x ? clz32(x) : y;
+
+    case INDEX_op_clz_i64:
+        return x ? clz64(x) : y;
+
+    case INDEX_op_ctz_i32:
+        return (uint32_t)x ? ctz32(x) : y;
+
+    case INDEX_op_ctz_i64:
+        return x ? ctz64(x) : y;
+
     CASE_OP_32_64(ext8s):
         return (int8_t)x;
 
@@ -934,6 +946,16 @@ void tcg_optimize(TCGContext *s)
             mask = temp_info(args[1])->mask | temp_info(args[2])->mask;
             break;
 
+        case INDEX_op_clz_i32:
+        case INDEX_op_ctz_i32:
+            mask = temp_info(args[2])->mask | 31;
+            break;
+
+        case INDEX_op_clz_i64:
+        case INDEX_op_ctz_i64:
+            mask = temp_info(args[2])->mask | 63;
+            break;
+
         CASE_OP_32_64(setcond):
         case INDEX_op_setcond2_i32:
             mask = 1;
@@ -1090,6 +1112,20 @@ void tcg_optimize(TCGContext *s)
             }
             goto do_default;
 
+        CASE_OP_32_64(clz):
+        CASE_OP_32_64(ctz):
+            if (temp_is_const(args[1])) {
+                TCGArg v = temp_info(args[1])->val;
+                if (v != 0) {
+                    tmp = do_constant_folding(opc, v, 0);
+                    tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                } else {
+                    tcg_opt_gen_mov(s, op, args, args[0], args[2]);
+                }
+                break;
+            }
+            goto do_default;
+
         CASE_OP_32_64(deposit):
             if (temp_is_const(args[1]) && temp_is_const(args[2])) {
                 tmp = deposit64(temp_info(args[1])->val, args[3], args[4],
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index b42c57a..698a599 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -68,6 +68,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32          1
 #define TCG_TARGET_HAS_nand_i32         1
 #define TCG_TARGET_HAS_nor_i32          1
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     0
@@ -101,6 +103,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i64          1
 #define TCG_TARGET_HAS_nand_i64         1
 #define TCG_TARGET_HAS_nor_i64          1
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index e9ac12e..3ac2dc9 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -77,6 +77,8 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_eqv_i32        0
 #define TCG_TARGET_HAS_nand_i32       0
 #define TCG_TARGET_HAS_nor_i32        0
+#define TCG_TARGET_HAS_clz_i32        0
+#define TCG_TARGET_HAS_ctz_i32        0
 #define TCG_TARGET_HAS_deposit_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_extract_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_sextract_i32   0
@@ -108,6 +110,8 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_eqv_i64        0
 #define TCG_TARGET_HAS_nand_i64       0
 #define TCG_TARGET_HAS_nor_i64        0
+#define TCG_TARGET_HAS_clz_i64        0
+#define TCG_TARGET_HAS_ctz_i64        0
 #define TCG_TARGET_HAS_deposit_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_extract_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_sextract_i64   0
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index a212167..340837a 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -110,6 +110,8 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_deposit_i32      0
 #define TCG_TARGET_HAS_extract_i32      0
 #define TCG_TARGET_HAS_sextract_i32     0
@@ -142,6 +144,8 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_eqv_i64          0
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_deposit_i64      0
 #define TCG_TARGET_HAS_extract_i64      0
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 1927e53..b45095c 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -457,6 +457,38 @@ void tcg_gen_orc_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
     }
 }
 
+void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
+{
+    if (TCG_TARGET_HAS_clz_i32) {
+        tcg_gen_op3_i32(INDEX_op_clz_i32, ret, arg1, arg2);
+    } else {
+        gen_helper_clz_i32(ret, arg1, arg2);
+    }
+}
+
+void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
+{
+    TCGv_i32 t = tcg_const_i32(arg2);
+    tcg_gen_clz_i32(ret, arg1, t);
+    tcg_temp_free_i32(t);
+}
+
+void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
+{
+    if (TCG_TARGET_HAS_ctz_i32) {
+        tcg_gen_op3_i32(INDEX_op_ctz_i32, ret, arg1, arg2);
+    } else {
+        gen_helper_ctz_i32(ret, arg1, arg2);
+    }
+}
+
+void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
+{
+    TCGv_i32 t = tcg_const_i32(arg2);
+    tcg_gen_ctz_i32(ret, arg1, t);
+    tcg_temp_free_i32(t);
+}
+
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
 {
     if (TCG_TARGET_HAS_rot_i32) {
@@ -1703,6 +1735,60 @@ void tcg_gen_orc_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
     }
 }
 
+void tcg_gen_clz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
+{
+    if (TCG_TARGET_HAS_clz_i64) {
+        tcg_gen_op3_i64(INDEX_op_clz_i64, ret, arg1, arg2);
+    } else {
+        gen_helper_clz_i64(ret, arg1, arg2);
+    }
+}
+
+void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
+{
+    if (TCG_TARGET_REG_BITS == 32
+        && TCG_TARGET_HAS_clz_i32
+        && arg2 <= 0xffffffffu) {
+        TCGv_i32 t = tcg_const_i32((uint32_t)arg2 - 32);
+        tcg_gen_clz_i32(t, TCGV_LOW(arg1), t);
+        tcg_gen_addi_i32(t, t, 32);
+        tcg_gen_clz_i32(TCGV_LOW(ret), TCGV_HIGH(arg1), t);
+        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+        tcg_temp_free_i32(t);
+    } else {
+        TCGv_i64 t = tcg_const_i64(arg2);
+        tcg_gen_clz_i64(ret, arg1, t);
+        tcg_temp_free_i64(t);
+    }
+}
+
+void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
+{
+    if (TCG_TARGET_HAS_ctz_i64) {
+        tcg_gen_op3_i64(INDEX_op_ctz_i64, ret, arg1, arg2);
+    } else {
+        gen_helper_ctz_i64(ret, arg1, arg2);
+    }
+}
+
+void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
+{
+    if (TCG_TARGET_REG_BITS == 32
+        && TCG_TARGET_HAS_ctz_i32
+        && arg2 <= 0xffffffffu) {
+        TCGv_i32 t = tcg_const_i32((uint32_t)arg2 - 32);
+        tcg_gen_ctz_i32(t, TCGV_HIGH(arg1), t);
+        tcg_gen_addi_i32(t, t, 32);
+        tcg_gen_ctz_i32(TCGV_LOW(ret), TCGV_LOW(arg1), t);
+        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+        tcg_temp_free_i32(t);
+    } else {
+        TCGv_i64 t = tcg_const_i64(arg2);
+        tcg_gen_ctz_i64(ret, arg1, t);
+        tcg_temp_free_i64(t);
+    }
+}
+
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
 {
     if (TCG_TARGET_HAS_rot_i64) {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index d42fd0d..7a24e84 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -286,6 +286,10 @@ void tcg_gen_eqv_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_nand_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_nor_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_orc_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
+void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
+void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
+void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
+void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
 void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
@@ -469,6 +473,10 @@ void tcg_gen_eqv_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_nand_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_nor_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_orc_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
+void tcg_gen_clz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
+void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
+void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
+void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
 void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
@@ -958,6 +966,10 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_nand_tl tcg_gen_nand_i64
 #define tcg_gen_nor_tl tcg_gen_nor_i64
 #define tcg_gen_orc_tl tcg_gen_orc_i64
+#define tcg_gen_clz_tl tcg_gen_clz_i64
+#define tcg_gen_ctz_tl tcg_gen_ctz_i64
+#define tcg_gen_clzi_tl tcg_gen_clzi_i64
+#define tcg_gen_ctzi_tl tcg_gen_ctzi_i64
 #define tcg_gen_rotl_tl tcg_gen_rotl_i64
 #define tcg_gen_rotli_tl tcg_gen_rotli_i64
 #define tcg_gen_rotr_tl tcg_gen_rotr_i64
@@ -1049,6 +1061,10 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_nand_tl tcg_gen_nand_i32
 #define tcg_gen_nor_tl tcg_gen_nor_i32
 #define tcg_gen_orc_tl tcg_gen_orc_i32
+#define tcg_gen_clz_tl tcg_gen_clz_i32
+#define tcg_gen_ctz_tl tcg_gen_ctz_i32
+#define tcg_gen_clzi_tl tcg_gen_clzi_i32
+#define tcg_gen_ctzi_tl tcg_gen_ctzi_i32
 #define tcg_gen_rotl_tl tcg_gen_rotl_i32
 #define tcg_gen_rotli_tl tcg_gen_rotli_i32
 #define tcg_gen_rotr_tl tcg_gen_rotr_i32
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 11563ac..d00db4f 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -104,6 +104,8 @@ DEF(orc_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_orc_i32))
 DEF(eqv_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_eqv_i32))
 DEF(nand_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_nand_i32))
 DEF(nor_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_nor_i32))
+DEF(clz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_clz_i32))
+DEF(ctz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_ctz_i32))
 
 DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
 DEF(movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
@@ -171,6 +173,8 @@ DEF(orc_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_orc_i64))
 DEF(eqv_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_eqv_i64))
 DEF(nand_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_nand_i64))
 DEF(nor_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_nor_i64))
+DEF(clz_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_clz_i64))
+DEF(ctz_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ctz_i64))
 
 DEF(add2_i64, 2, 4, 0, IMPL64 | IMPL(TCG_TARGET_HAS_add2_i64))
 DEF(sub2_i64, 2, 4, 0, IMPL64 | IMPL(TCG_TARGET_HAS_sub2_i64))
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index 1deb86a..eb1cd76 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -15,6 +15,11 @@ DEF_HELPER_FLAGS_2(sar_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_2(mulsh_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_2(muluh_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
+DEF_HELPER_FLAGS_2(clz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
+DEF_HELPER_FLAGS_2(ctz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
+DEF_HELPER_FLAGS_2(clz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(ctz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
 #ifdef CONFIG_SOFTMMU
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 730c2d5..ba1389c 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -111,6 +111,8 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_eqv_i64          0
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_deposit_i64      0
 #define TCG_TARGET_HAS_extract_i64      0
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 2065042..0646444 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -74,6 +74,8 @@
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
+#define TCG_TARGET_HAS_clz_i32          0
+#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_neg_i32          1
 #define TCG_TARGET_HAS_not_i32          1
 #define TCG_TARGET_HAS_orc_i32          0
@@ -104,6 +106,8 @@
 #define TCG_TARGET_HAS_eqv_i64          0
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
+#define TCG_TARGET_HAS_clz_i64          0
+#define TCG_TARGET_HAS_ctz_i64          0
 #define TCG_TARGET_HAS_neg_i64          1
 #define TCG_TARGET_HAS_not_i64          1
 #define TCG_TARGET_HAS_orc_i64          0
-- 
2.7.4


* [Qemu-devel] [PATCH 02/25] target-alpha: Use the ctz and clz opcodes
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 01/25] tcg: Add clz and ctz opcodes Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 03/25] target-cris: Use clz opcode Richard Henderson
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-alpha/helper.h     |  2 --
 target-alpha/int_helper.c | 10 ----------
 target-alpha/translate.c  |  4 ++--
 3 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/target-alpha/helper.h b/target-alpha/helper.h
index 004221d..eed3906 100644
--- a/target-alpha/helper.h
+++ b/target-alpha/helper.h
@@ -4,8 +4,6 @@ DEF_HELPER_FLAGS_1(load_pcc, TCG_CALL_NO_RWG_SE, i64, env)
 DEF_HELPER_FLAGS_3(check_overflow, TCG_CALL_NO_WG, void, env, i64, i64)
 
 DEF_HELPER_FLAGS_1(ctpop, TCG_CALL_NO_RWG_SE, i64, i64)
-DEF_HELPER_FLAGS_1(ctlz, TCG_CALL_NO_RWG_SE, i64, i64)
-DEF_HELPER_FLAGS_1(cttz, TCG_CALL_NO_RWG_SE, i64, i64)
 
 DEF_HELPER_FLAGS_2(zap, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(zapnot, TCG_CALL_NO_RWG_SE, i64, i64, i64)
diff --git a/target-alpha/int_helper.c b/target-alpha/int_helper.c
index 19bebfe..3c303bd 100644
--- a/target-alpha/int_helper.c
+++ b/target-alpha/int_helper.c
@@ -29,16 +29,6 @@ uint64_t helper_ctpop(uint64_t arg)
     return ctpop64(arg);
 }
 
-uint64_t helper_ctlz(uint64_t arg)
-{
-    return clz64(arg);
-}
-
-uint64_t helper_cttz(uint64_t arg)
-{
-    return ctz64(arg);
-}
-
 uint64_t helper_zapnot(uint64_t val, uint64_t mskb)
 {
     uint64_t mask;
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 5ac2277..6e2e563 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2555,14 +2555,14 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
             REQUIRE_TB_FLAG(TB_FLAGS_AMASK_CIX);
             REQUIRE_REG_31(ra);
             REQUIRE_NO_LIT;
-            gen_helper_ctlz(vc, vb);
+            tcg_gen_clzi_i64(vc, vb, 64);
             break;
         case 0x33:
             /* CTTZ */
             REQUIRE_TB_FLAG(TB_FLAGS_AMASK_CIX);
             REQUIRE_REG_31(ra);
             REQUIRE_NO_LIT;
-            gen_helper_cttz(vc, vb);
+            tcg_gen_ctzi_i64(vc, vb, 64);
             break;
         case 0x34:
             /* UNPKBW */
-- 
2.7.4


* [Qemu-devel] [PATCH 03/25] target-cris: Use clz opcode
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 01/25] tcg: Add clz and ctz opcodes Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 02/25] target-alpha: Use the ctz and clz opcodes Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 04/25] target-microblaze: " Richard Henderson
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC
  To: qemu-devel; +Cc: Edgar E . Iglesias

Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-cris/helper.h    | 1 -
 target-cris/op_helper.c | 5 -----
 target-cris/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target-cris/helper.h b/target-cris/helper.h
index ff35956..20d21c4 100644
--- a/target-cris/helper.h
+++ b/target-cris/helper.h
@@ -7,7 +7,6 @@ DEF_HELPER_1(rfn, void, env)
 DEF_HELPER_3(movl_sreg_reg, void, env, i32, i32)
 DEF_HELPER_3(movl_reg_sreg, void, env, i32, i32)
 
-DEF_HELPER_FLAGS_1(lz, TCG_CALL_NO_SE, i32, i32)
 DEF_HELPER_FLAGS_4(btst, TCG_CALL_NO_SE, i32, env, i32, i32, i32)
 
 DEF_HELPER_FLAGS_4(evaluate_flags_muls, TCG_CALL_NO_SE, i32, env, i32, i32, i32)
diff --git a/target-cris/op_helper.c b/target-cris/op_helper.c
index 5043039..e92505c 100644
--- a/target-cris/op_helper.c
+++ b/target-cris/op_helper.c
@@ -230,11 +230,6 @@ void helper_rfn(CPUCRISState *env)
 	env->pregs[PR_CCS] |= M_FLAG_V32;
 }
 
-uint32_t helper_lz(uint32_t t0)
-{
-	return clz32(t0);
-}
-
 uint32_t helper_btst(CPUCRISState *env, uint32_t t0, uint32_t t1, uint32_t ccs)
 {
 	/* FIXME: clean this up.  */
diff --git a/target-cris/translate.c b/target-cris/translate.c
index b910427..0ee05ca 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -767,7 +767,7 @@ static void cris_alu_op_exec(DisasContext *dc, int op,
         t_gen_subx_carry(dc, dst);
         break;
     case CC_OP_LZ:
-        gen_helper_lz(dst, b);
+        tcg_gen_clzi_tl(dst, b, TARGET_LONG_BITS);
         break;
     case CC_OP_MULS:
         tcg_gen_muls2_tl(dst, cpu_PR[PR_MOF], a, b);
-- 
2.7.4


* [Qemu-devel] [PATCH 04/25] target-microblaze: Use clz opcode
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (2 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 03/25] target-cris: Use clz opcode Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 05/25] target-mips: " Richard Henderson
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC
  To: qemu-devel; +Cc: Edgar E . Iglesias

Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-microblaze/helper.h    | 1 -
 target-microblaze/op_helper.c | 5 -----
 target-microblaze/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target-microblaze/helper.h b/target-microblaze/helper.h
index bd13826..71a6c08 100644
--- a/target-microblaze/helper.h
+++ b/target-microblaze/helper.h
@@ -3,7 +3,6 @@ DEF_HELPER_1(debug, void, env)
 DEF_HELPER_FLAGS_3(carry, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
 DEF_HELPER_2(cmp, i32, i32, i32)
 DEF_HELPER_2(cmpu, i32, i32, i32)
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i32, i32)
 
 DEF_HELPER_3(divs, i32, env, i32, i32)
 DEF_HELPER_3(divu, i32, env, i32, i32)
diff --git a/target-microblaze/op_helper.c b/target-microblaze/op_helper.c
index 4a856e6..1e07e21 100644
--- a/target-microblaze/op_helper.c
+++ b/target-microblaze/op_helper.c
@@ -145,11 +145,6 @@ uint32_t helper_cmpu(uint32_t a, uint32_t b)
     return t;
 }
 
-uint32_t helper_clz(uint32_t t0)
-{
-    return clz32(t0);
-}
-
 uint32_t helper_carry(uint32_t a, uint32_t b, uint32_t cf)
 {
     return compute_carry(a, b, cf);
diff --git a/target-microblaze/translate.c b/target-microblaze/translate.c
index de2090a..0bb6095 100644
--- a/target-microblaze/translate.c
+++ b/target-microblaze/translate.c
@@ -768,7 +768,7 @@ static void dec_bit(DisasContext *dc)
                 t_gen_raise_exception(dc, EXCP_HW_EXCP);
             }
             if (dc->cpu->env.pvr.regs[2] & PVR2_USE_PCMP_INSTR) {
-                gen_helper_clz(cpu_R[dc->rd], cpu_R[dc->ra]);
+                tcg_gen_clzi_i32(cpu_R[dc->rd], cpu_R[dc->ra], 32);
             }
             break;
         case 0x1e0:
-- 
2.7.4

* [Qemu-devel] [PATCH 05/25] target-mips: Use clz opcode
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Yongbok Kim

Cc: Yongbok Kim <yongbok.kim@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-mips/helper.h    |  7 -------
 target-mips/op_helper.c | 22 ----------------------
 target-mips/translate.c | 23 ++++++++++++++++-------
 3 files changed, 16 insertions(+), 36 deletions(-)
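
Illustration only, not part of the patch: a small host-side C sketch of
why the 32-bit CLZ/CLO sequence in translate.c still gives the right
answer when target_ulong is 64 bits wide. clzi64() is a stand-in model
for a clz op that takes the zero-input result as an explicit argument;
it and the mips64_* names are assumptions made up for this sketch, not
QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference result: leading zeros of a 32-bit value, 32 for zero input. */
static unsigned ref_clz32(uint32_t x)
{
    unsigned n = 0;
    while (n < 32 && !(x & (UINT32_C(1) << (31 - n)))) {
        n++;
    }
    return n;
}

/* Model of a 64-bit clz that takes the zero-input result explicitly. */
static uint64_t clzi64(uint64_t x, uint64_t if_zero)
{
    uint64_t n = 0;
    if (x == 0) {
        return if_zero;
    }
    while (!(x >> 63)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* CLZ with a 64-bit register file (TARGET_LONG_BITS == 64). */
static uint64_t mips64_clz(uint64_t reg)
{
    uint64_t t0 = (uint32_t)reg;   /* ext32u: keep only the low 32 bits */
    t0 = clzi64(t0, 64);           /* clzi with TARGET_LONG_BITS */
    return t0 - (64 - 32);         /* subi: drop the 32 extra zeros */
}

/* CLO is CLZ of the complement (the tcg_gen_not_tl step). */
static uint64_t mips64_clo(uint64_t reg)
{
    return mips64_clz(~reg);
}

/* Compare the op sequence against the 32-bit reference on a few values. */
static void check_mips_cl(void)
{
    static const uint64_t samples[] = {
        0, 1, 0x80000000u, 0xffffffffu, 0x0000ffffu,
        UINT64_C(0xffffffff00000000), UINT64_C(0x123456789abcdef0),
    };
    size_t i;
    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        uint32_t lo = (uint32_t)samples[i];
        assert(mips64_clz(samples[i]) == ref_clz32(lo));
        assert(mips64_clo(samples[i]) == ref_clz32(~lo));
    }
}
```

Calling check_mips_cl() from a test main() exercises both paths. With a
32-bit target_ulong the ext32u is a no-op and the subtraction is by
zero, so the same sequence covers both configurations.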

diff --git a/target-mips/helper.h b/target-mips/helper.h
index 666936c..60efa01 100644
--- a/target-mips/helper.h
+++ b/target-mips/helper.h
@@ -20,13 +20,6 @@ DEF_HELPER_4(scd, tl, env, tl, tl, int)
 #endif
 #endif
 
-DEF_HELPER_FLAGS_1(clo, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, tl, tl)
-#ifdef TARGET_MIPS64
-DEF_HELPER_FLAGS_1(dclo, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(dclz, TCG_CALL_NO_RWG_SE, tl, tl)
-#endif
-
 DEF_HELPER_3(muls, tl, env, tl, tl)
 DEF_HELPER_3(mulsu, tl, env, tl, tl)
 DEF_HELPER_3(macc, tl, env, tl, tl)
diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index 7af4c2f..11d781f 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -103,28 +103,6 @@ HELPER_ST(sd, stq, uint64_t)
 #endif
 #undef HELPER_ST
 
-target_ulong helper_clo (target_ulong arg1)
-{
-    return clo32(arg1);
-}
-
-target_ulong helper_clz (target_ulong arg1)
-{
-    return clz32(arg1);
-}
-
-#if defined(TARGET_MIPS64)
-target_ulong helper_dclo (target_ulong arg1)
-{
-    return clo64(arg1);
-}
-
-target_ulong helper_dclz (target_ulong arg1)
-{
-    return clz64(arg1);
-}
-#endif /* TARGET_MIPS64 */
-
 /* 64 bits arithmetic for 32 bits hosts */
 static inline uint64_t get_HILO(CPUMIPSState *env)
 {
diff --git a/target-mips/translate.c b/target-mips/translate.c
index cf79aa4..24d7657 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -3626,29 +3626,38 @@ static void gen_cl (DisasContext *ctx, uint32_t opc,
         /* Treat as NOP. */
         return;
     }
-    t0 = tcg_temp_new();
+    t0 = cpu_gpr[rd];
     gen_load_gpr(t0, rs);
+
     switch (opc) {
     case OPC_CLO:
     case R6_OPC_CLO:
-        gen_helper_clo(cpu_gpr[rd], t0);
+#if defined(TARGET_MIPS64)
+    case OPC_DCLO:
+    case R6_OPC_DCLO:
+#endif
+        tcg_gen_not_tl(t0, t0);
         break;
+    }
+
+    switch (opc) {
+    case OPC_CLO:
+    case R6_OPC_CLO:
     case OPC_CLZ:
     case R6_OPC_CLZ:
-        gen_helper_clz(cpu_gpr[rd], t0);
+        tcg_gen_ext32u_tl(t0, t0);
+        tcg_gen_clzi_tl(t0, t0, TARGET_LONG_BITS);
+        tcg_gen_subi_tl(t0, t0, TARGET_LONG_BITS - 32);
         break;
 #if defined(TARGET_MIPS64)
     case OPC_DCLO:
     case R6_OPC_DCLO:
-        gen_helper_dclo(cpu_gpr[rd], t0);
-        break;
     case OPC_DCLZ:
     case R6_OPC_DCLZ:
-        gen_helper_dclz(cpu_gpr[rd], t0);
+        tcg_gen_clzi_i64(t0, t0, 64);
         break;
 #endif
     }
-    tcg_temp_free(t0);
 }
 
 /* Godson integer instructions */
-- 
2.7.4

* [Qemu-devel] [PATCH 06/25] target-openrisc: Use clz and ctz opcodes
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jia Liu

Cc: Jia Liu <proljc@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-openrisc/helper.h     |  2 --
 target-openrisc/int_helper.c | 19 -------------------
 target-openrisc/translate.c  |  6 ++++--
 3 files changed, 4 insertions(+), 23 deletions(-)
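
Illustration only, not part of the patch: a host-side C sketch of how
the zero-input argument is used for l.ff1 and l.fl1. Passing -1 to the
ctz op makes the following +1 wrap to 0, which matches the
"x ? ctz32(x) + 1 : 0" form of the removed helper. ctzi32(), clzi32()
and the or_*/ref_* names are assumptions made up for this sketch, not
QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Models of 32-bit ctz/clz ops with an explicit zero-input result. */
static uint32_t ctzi32(uint32_t x, uint32_t if_zero)
{
    uint32_t n = 0;
    if (x == 0) {
        return if_zero;
    }
    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}

static uint32_t clzi32(uint32_t x, uint32_t if_zero)
{
    uint32_t n = 0;
    if (x == 0) {
        return if_zero;
    }
    while (!(x & UINT32_C(0x80000000))) {
        x <<= 1;
        n++;
    }
    return n;
}

/* l.ff1 as generated: ctzi with -1, then addi 1 (wraps to 0 for ra == 0). */
static uint32_t or_ff1(uint32_t ra)
{
    uint32_t rd = ctzi32(ra, UINT32_MAX);
    return rd + 1;
}

/* l.fl1 as generated: clzi with 32, then subfi (32 - rd). */
static uint32_t or_fl1(uint32_t ra)
{
    uint32_t rd = clzi32(ra, 32);
    return 32 - rd;
}

/* Direct definitions: 1-based index of the lowest/highest set bit, 0 if none. */
static uint32_t ref_ff1(uint32_t x)
{
    uint32_t i;
    for (i = 0; i < 32; i++) {
        if (x & (UINT32_C(1) << i)) {
            return i + 1;
        }
    }
    return 0;
}

static uint32_t ref_fl1(uint32_t x)
{
    uint32_t i;
    for (i = 32; i > 0; i--) {
        if (x & (UINT32_C(1) << (i - 1))) {
            return i;
        }
    }
    return 0;
}

static void check_or_ff1_fl1(void)
{
    static const uint32_t samples[] = {
        0, 1, 2, 8, 0x00010000u, 0x7fffffffu, 0x80000000u, 0xffffffffu,
    };
    unsigned i;
    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        assert(or_ff1(samples[i]) == ref_ff1(samples[i]));
        assert(or_fl1(samples[i]) == ref_fl1(samples[i]));
    }
}
```

Calling check_or_ff1_fl1() from a test main() runs the comparison. The
sketch assumes a 32-bit target_ulong, which is what the removed helpers
effectively implemented.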

diff --git a/target-openrisc/helper.h b/target-openrisc/helper.h
index f53fa21..bcc7245 100644
--- a/target-openrisc/helper.h
+++ b/target-openrisc/helper.h
@@ -54,8 +54,6 @@ FOP_CMP(ge)
 #undef FOP_CMP
 
 /* int */
-DEF_HELPER_FLAGS_1(ff1, 0, tl, tl)
-DEF_HELPER_FLAGS_1(fl1, 0, tl, tl)
 DEF_HELPER_FLAGS_3(mul32, 0, i32, env, i32, i32)
 
 /* interrupt */
diff --git a/target-openrisc/int_helper.c b/target-openrisc/int_helper.c
index 4d1f958..ba0fd27 100644
--- a/target-openrisc/int_helper.c
+++ b/target-openrisc/int_helper.c
@@ -24,25 +24,6 @@
 #include "exception.h"
 #include "qemu/host-utils.h"
 
-target_ulong HELPER(ff1)(target_ulong x)
-{
-/*#ifdef TARGET_OPENRISC64
-    return x ? ctz64(x) + 1 : 0;
-#else*/
-    return x ? ctz32(x) + 1 : 0;
-/*#endif*/
-}
-
-target_ulong HELPER(fl1)(target_ulong x)
-{
-/* not used yet, open it when we need or64.  */
-/*#ifdef TARGET_OPENRISC64
-    return 64 - clz64(x);
-#else*/
-    return 32 - clz32(x);
-/*#endif*/
-}
-
 uint32_t HELPER(mul32)(CPUOpenRISCState *env,
                        uint32_t ra, uint32_t rb)
 {
diff --git a/target-openrisc/translate.c b/target-openrisc/translate.c
index 229361a..03fa7db 100644
--- a/target-openrisc/translate.c
+++ b/target-openrisc/translate.c
@@ -602,11 +602,13 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
         switch (op1) {
         case 0x00:    /* l.ff1 */
             LOG_DIS("l.ff1 r%d, r%d, r%d\n", rd, ra, rb);
-            gen_helper_ff1(cpu_R[rd], cpu_R[ra]);
+            tcg_gen_ctzi_tl(cpu_R[rd], cpu_R[ra], -1);
+            tcg_gen_addi_tl(cpu_R[rd], cpu_R[rd], 1);
             break;
         case 0x01:    /* l.fl1 */
             LOG_DIS("l.fl1 r%d, r%d, r%d\n", rd, ra, rb);
-            gen_helper_fl1(cpu_R[rd], cpu_R[ra]);
+            tcg_gen_clzi_tl(cpu_R[rd], cpu_R[ra], TARGET_LONG_BITS);
+            tcg_gen_subfi_tl(cpu_R[rd], TARGET_LONG_BITS, cpu_R[rd]);
             break;
 
         default:
-- 
2.7.4

* [Qemu-devel] [PATCH 07/25] target-ppc: Use clz and ctz opcodes
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (5 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 06/25] target-openrisc: Use clz and ctz opcodes Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-17  3:09   ` David Gibson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 08/25] target-s390x: Use clz opcode Richard Henderson
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, David Gibson

Cc: qemu-ppc@nongnu.org
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/helper.h     |  4 ----
 target-ppc/int_helper.c | 20 --------------------
 target-ppc/translate.c  | 20 ++++++++++++++++----
 3 files changed, 16 insertions(+), 28 deletions(-)
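
Illustration only, not part of the patch: a host-side C sketch of the
cntlzw/cnttzw sequence on a 64-bit target. The removed helpers took a
target_ulong and relied on the implicit conversion to uint32_t when
calling clz32()/ctz32(); the new code makes that truncation explicit
with a 32-bit temporary. clzi32(), ctzi32() and the ppc64_* names are
assumptions made up for this sketch, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Models of 32-bit clz/ctz ops with an explicit zero-input result. */
static uint32_t clzi32(uint32_t x, uint32_t if_zero)
{
    uint32_t n = 0;
    if (x == 0) {
        return if_zero;
    }
    while (!(x & UINT32_C(0x80000000))) {
        x <<= 1;
        n++;
    }
    return n;
}

static uint32_t ctzi32(uint32_t x, uint32_t if_zero)
{
    uint32_t n = 0;
    if (x == 0) {
        return if_zero;
    }
    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* cntlzw with 64-bit GPRs: only the low word of rS is examined. */
static uint64_t ppc64_cntlzw(uint64_t rs)
{
    uint32_t t = (uint32_t)rs;     /* trunc_tl_i32 */
    t = clzi32(t, 32);             /* clzi_i32, 32 for a zero word */
    return (uint64_t)t;            /* extu_i32_tl */
}

/* cnttzw with 64-bit GPRs: same shape, counting from the low end. */
static uint64_t ppc64_cnttzw(uint64_t rs)
{
    uint32_t t = (uint32_t)rs;     /* trunc_tl_i32 */
    t = ctzi32(t, 32);             /* ctzi_i32, 32 for a zero word */
    return (uint64_t)t;            /* extu_i32_tl */
}

/* Bits above the low word must not influence either result. */
static void check_ppc_word_counts(void)
{
    assert(ppc64_cntlzw(UINT64_C(0xffffffff00000000)) == 32);
    assert(ppc64_cnttzw(UINT64_C(0xffffffff00000000)) == 32);
    assert(ppc64_cntlzw(UINT64_C(0x0000000100000001)) == 31);
    assert(ppc64_cnttzw(UINT64_C(0x0000000100000008)) == 3);
}
```

Calling check_ppc_word_counts() from a test main() runs the checks. The
doubleword forms need no temporary because the operand already has the
full register width.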

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index da00f0a..1ed1d2c 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -38,16 +38,12 @@ DEF_HELPER_4(divde, i64, env, i64, i64, i32)
 DEF_HELPER_4(divweu, tl, env, tl, tl, i32)
 DEF_HELPER_4(divwe, tl, env, tl, tl, i32)
 
-DEF_HELPER_FLAGS_1(cntlzw, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(cnttzw, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntb, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntw, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(cmpb, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_3(sraw, tl, env, tl, tl)
 #if defined(TARGET_PPC64)
 DEF_HELPER_FLAGS_2(cmpeqb, TCG_CALL_NO_RWG_SE, i32, tl, tl)
-DEF_HELPER_FLAGS_1(cntlzd, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(cnttzd, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(bpermd, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_3(srad, tl, env, tl, tl)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 9ac204a..a6486ce 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -141,16 +141,6 @@ uint64_t helper_divde(CPUPPCState *env, uint64_t rau, uint64_t rbu, uint32_t oe)
 #endif
 
 
-target_ulong helper_cntlzw(target_ulong t)
-{
-    return clz32(t);
-}
-
-target_ulong helper_cnttzw(target_ulong t)
-{
-    return ctz32(t);
-}
-
 #if defined(TARGET_PPC64)
 /* if x = 0xab, returns 0xababababababababa */
 #define pattern(x) (((x) & 0xff) * (~(target_ulong)0 / 0xff))
@@ -174,16 +164,6 @@ uint32_t helper_cmpeqb(target_ulong ra, target_ulong rb)
 #undef haszero
 #undef hasvalue
 
-target_ulong helper_cntlzd(target_ulong t)
-{
-    return clz64(t);
-}
-
-target_ulong helper_cnttzd(target_ulong t)
-{
-    return ctz64(t);
-}
-
 /* Return invalid random number.
  *
  * FIXME: Add rng backend or other mechanism to get cryptographically suitable
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 435c6f0..1224f56 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1641,7 +1641,13 @@ static void gen_andis_(DisasContext *ctx)
 /* cntlzw */
 static void gen_cntlzw(DisasContext *ctx)
 {
-    gen_helper_cntlzw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+    TCGv_i32 t = tcg_temp_new_i32();
+
+    tcg_gen_trunc_tl_i32(t, cpu_gpr[rS(ctx->opcode)]);
+    tcg_gen_clzi_i32(t, t, 32);
+    tcg_gen_extu_i32_tl(cpu_gpr[rA(ctx->opcode)], t);
+    tcg_temp_free_i32(t);
+
     if (unlikely(Rc(ctx->opcode) != 0))
         gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
 }
@@ -1649,7 +1655,13 @@ static void gen_cntlzw(DisasContext *ctx)
 /* cnttzw */
 static void gen_cnttzw(DisasContext *ctx)
 {
-    gen_helper_cnttzw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+    TCGv_i32 t = tcg_temp_new_i32();
+
+    tcg_gen_trunc_tl_i32(t, cpu_gpr[rS(ctx->opcode)]);
+    tcg_gen_ctzi_i32(t, t, 32);
+    tcg_gen_extu_i32_tl(cpu_gpr[rA(ctx->opcode)], t);
+    tcg_temp_free_i32(t);
+
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
     }
@@ -1891,7 +1903,7 @@ GEN_LOGICAL1(extsw, tcg_gen_ext32s_tl, 0x1E, PPC_64B);
 /* cntlzd */
 static void gen_cntlzd(DisasContext *ctx)
 {
-    gen_helper_cntlzd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+    tcg_gen_clzi_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)], 64);
     if (unlikely(Rc(ctx->opcode) != 0))
         gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
 }
@@ -1899,7 +1911,7 @@ static void gen_cntlzd(DisasContext *ctx)
 /* cnttzd */
 static void gen_cnttzd(DisasContext *ctx)
 {
-    gen_helper_cnttzd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+    tcg_gen_ctzi_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)], 64);
     if (unlikely(Rc(ctx->opcode) != 0)) {
         gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
     }
-- 
2.7.4

* [Qemu-devel] [PATCH 08/25] target-s390x: Use clz opcode
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-s390x/helper.h     | 1 -
 target-s390x/int_helper.c | 6 ------
 target-s390x/translate.c  | 2 +-
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/target-s390x/helper.h b/target-s390x/helper.h
index 207a6e7..9102071 100644
--- a/target-s390x/helper.h
+++ b/target-s390x/helper.h
@@ -70,7 +70,6 @@ DEF_HELPER_FLAGS_4(msdb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_3(tceb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_3(tcdb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_4(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64, i64)
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i64, env, i64, i64)
diff --git a/target-s390x/int_helper.c b/target-s390x/int_helper.c
index 370c94d..5bc470b 100644
--- a/target-s390x/int_helper.c
+++ b/target-s390x/int_helper.c
@@ -117,12 +117,6 @@ uint64_t HELPER(divu64)(CPUS390XState *env, uint64_t ah, uint64_t al,
     return ret;
 }
 
-/* count leading zeros, for find leftmost one */
-uint64_t HELPER(clz)(uint64_t v)
-{
-    return clz64(v);
-}
-
 uint64_t HELPER(cvd)(int32_t reg)
 {
     /* positive 0 */
diff --git a/target-s390x/translate.c b/target-s390x/translate.c
index 6cebb7e..01c6217 100644
--- a/target-s390x/translate.c
+++ b/target-s390x/translate.c
@@ -2249,7 +2249,7 @@ static ExitStatus op_flogr(DisasContext *s, DisasOps *o)
     gen_op_update1_cc_i64(s, CC_OP_FLOGR, o->in2);
 
     /* R1 = IN ? CLZ(IN) : 64.  */
-    gen_helper_clz(o->out, o->in2);
+    tcg_gen_clzi_i64(o->out, o->in2, 64);
 
     /* R1+1 = IN & ~(found bit).  Note that we may attempt to shift this
        value by 64, which is undefined.  But since the shift is 64 iff the
-- 
2.7.4

* [Qemu-devel] [PATCH 09/25] target-tilegx: Use clz and ctz opcodes
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-tilegx/helper.c    | 10 ----------
 target-tilegx/helper.h    |  2 --
 target-tilegx/translate.c |  4 ++--
 3 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/target-tilegx/helper.c b/target-tilegx/helper.c
index b4fba9c..b6f5e29 100644
--- a/target-tilegx/helper.c
+++ b/target-tilegx/helper.c
@@ -55,16 +55,6 @@ void helper_ext01_ics(CPUTLGState *env)
     }
 }
 
-uint64_t helper_cntlz(uint64_t arg)
-{
-    return clz64(arg);
-}
-
-uint64_t helper_cnttz(uint64_t arg)
-{
-    return ctz64(arg);
-}
-
 uint64_t helper_pcnt(uint64_t arg)
 {
     return ctpop64(arg);
diff --git a/target-tilegx/helper.h b/target-tilegx/helper.h
index 9281d0f..bab303a 100644
--- a/target-tilegx/helper.h
+++ b/target-tilegx/helper.h
@@ -1,7 +1,5 @@
 DEF_HELPER_2(exception, noreturn, env, i32)
 DEF_HELPER_1(ext01_ics, void, env)
-DEF_HELPER_FLAGS_1(cntlz, TCG_CALL_NO_RWG_SE, i64, i64)
-DEF_HELPER_FLAGS_1(cnttz, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(pcnt, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(revbits, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_3(shufflebytes, TCG_CALL_NO_RWG_SE, i64, i64, i64, i64)
diff --git a/target-tilegx/translate.c b/target-tilegx/translate.c
index 9c734ee..8a2df1b 100644
--- a/target-tilegx/translate.c
+++ b/target-tilegx/translate.c
@@ -608,12 +608,12 @@ static TileExcp gen_rr_opcode(DisasContext *dc, unsigned opext,
     switch (opext) {
     case OE_RR_X0(CNTLZ):
     case OE_RR_Y0(CNTLZ):
-        gen_helper_cntlz(tdest, tsrca);
+        tcg_gen_clzi_tl(tdest, tsrca, TARGET_LONG_BITS);
         mnemonic = "cntlz";
         break;
     case OE_RR_X0(CNTTZ):
     case OE_RR_Y0(CNTTZ):
-        gen_helper_cnttz(tdest, tsrca);
+        tcg_gen_ctzi_tl(tdest, tsrca, TARGET_LONG_BITS);
         mnemonic = "cnttz";
         break;
     case OE_RR_X0(FSINGLE_PACK1):
-- 
2.7.4

* [Qemu-devel] [PATCH 10/25] target-tricore: Use clz opcode
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (8 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 09/25] target-tilegx: Use clz and ctz opcodes Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-17 14:42   ` Bastian Koppelmann
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 11/25] target-unicore32: " Richard Henderson
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Bastian Koppelmann

Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-tricore/helper.h    |  2 --
 target-tricore/op_helper.c | 10 ----------
 target-tricore/translate.c |  5 +++--
 3 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/target-tricore/helper.h b/target-tricore/helper.h
index 9333e16..2cf04e1 100644
--- a/target-tricore/helper.h
+++ b/target-tricore/helper.h
@@ -87,9 +87,7 @@ DEF_HELPER_FLAGS_2(min_hu, TCG_CALL_NO_RWG_SE, i32, i32, i32)
 DEF_HELPER_FLAGS_2(ixmin, TCG_CALL_NO_RWG_SE, i64, i64, i32)
 DEF_HELPER_FLAGS_2(ixmin_u, TCG_CALL_NO_RWG_SE, i64, i64, i32)
 /* count leading ... */
-DEF_HELPER_FLAGS_1(clo, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(clo_h, TCG_CALL_NO_RWG_SE, i32, i32)
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(clz_h, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(cls, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(cls_h, TCG_CALL_NO_RWG_SE, i32, i32)
diff --git a/target-tricore/op_helper.c b/target-tricore/op_helper.c
index ac02e0a..3731d5e 100644
--- a/target-tricore/op_helper.c
+++ b/target-tricore/op_helper.c
@@ -1733,11 +1733,6 @@ EXTREMA_H_B(min, <)
 
 #undef EXTREMA_H_B
 
-uint32_t helper_clo(target_ulong r1)
-{
-    return clo32(r1);
-}
-
 uint32_t helper_clo_h(target_ulong r1)
 {
     uint32_t ret_hw0 = extract32(r1, 0, 16);
@@ -1756,11 +1751,6 @@ uint32_t helper_clo_h(target_ulong r1)
     return ret_hw0 | (ret_hw1 << 16);
 }
 
-uint32_t helper_clz(target_ulong r1)
-{
-    return clz32(r1);
-}
-
 uint32_t helper_clz_h(target_ulong r1)
 {
     uint32_t ret_hw0 = extract32(r1, 0, 16);
diff --git a/target-tricore/translate.c b/target-tricore/translate.c
index 36f734a..69cdfb9 100644
--- a/target-tricore/translate.c
+++ b/target-tricore/translate.c
@@ -6367,7 +6367,8 @@ static void decode_rr_logical_shift(CPUTriCoreState *env, DisasContext *ctx)
         tcg_gen_andc_tl(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
         break;
     case OPC2_32_RR_CLO:
-        gen_helper_clo(cpu_gpr_d[r3], cpu_gpr_d[r1]);
+        tcg_gen_not_tl(cpu_gpr_d[r3], cpu_gpr_d[r1]);
+        tcg_gen_clzi_tl(cpu_gpr_d[r3], cpu_gpr_d[r3], TARGET_LONG_BITS);
         break;
     case OPC2_32_RR_CLO_H:
         gen_helper_clo_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
@@ -6379,7 +6380,7 @@ static void decode_rr_logical_shift(CPUTriCoreState *env, DisasContext *ctx)
         gen_helper_cls_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
         break;
     case OPC2_32_RR_CLZ:
-        gen_helper_clz(cpu_gpr_d[r3], cpu_gpr_d[r1]);
+        tcg_gen_clzi_tl(cpu_gpr_d[r3], cpu_gpr_d[r1], TARGET_LONG_BITS);
         break;
     case OPC2_32_RR_CLZ_H:
         gen_helper_clz_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
-- 
2.7.4

* [Qemu-devel] [PATCH 11/25] target-unicore32: Use clz opcode
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Guan Xuetao

Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-unicore32/helper.c    | 10 ----------
 target-unicore32/helper.h    |  3 ---
 target-unicore32/translate.c |  6 +++---
 3 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/target-unicore32/helper.c b/target-unicore32/helper.c
index d603bde..7a5613e 100644
--- a/target-unicore32/helper.c
+++ b/target-unicore32/helper.c
@@ -32,16 +32,6 @@ UniCore32CPU *uc32_cpu_init(const char *cpu_model)
     return UNICORE32_CPU(cpu_generic_init(TYPE_UNICORE32_CPU, cpu_model));
 }
 
-uint32_t HELPER(clo)(uint32_t x)
-{
-    return clo32(x);
-}
-
-uint32_t HELPER(clz)(uint32_t x)
-{
-    return clz32(x);
-}
-
 #ifndef CONFIG_USER_ONLY
 void helper_cp0_set(CPUUniCore32State *env, uint32_t val, uint32_t creg,
         uint32_t cop)
diff --git a/target-unicore32/helper.h b/target-unicore32/helper.h
index 9418137..a4a5d45 100644
--- a/target-unicore32/helper.h
+++ b/target-unicore32/helper.h
@@ -13,9 +13,6 @@ DEF_HELPER_3(cp0_get, i32, env, i32, i32)
 DEF_HELPER_1(cp1_putc, void, i32)
 #endif
 
-DEF_HELPER_1(clz, i32, i32)
-DEF_HELPER_1(clo, i32, i32)
-
 DEF_HELPER_2(exception, void, env, i32)
 
 DEF_HELPER_3(asr_write, void, env, i32, i32)
diff --git a/target-unicore32/translate.c b/target-unicore32/translate.c
index 514d460..666a201 100644
--- a/target-unicore32/translate.c
+++ b/target-unicore32/translate.c
@@ -1479,10 +1479,10 @@ static void do_misc(CPUUniCore32State *env, DisasContext *s, uint32_t insn)
         /* clz */
         tmp = load_reg(s, UCOP_REG_M);
         if (UCOP_SET(26)) {
-            gen_helper_clo(tmp, tmp);
-        } else {
-            gen_helper_clz(tmp, tmp);
+            /* clo */
+            tcg_gen_not_i32(tmp, tmp);
         }
+        tcg_gen_clzi_i32(tmp, tmp, 32);
         store_reg(s, UCOP_REG_D, tmp);
         return;
     }
-- 
2.7.4

* [Qemu-devel] [PATCH 12/25] target-xtensa: Use clz opcode
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov

Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-xtensa/helper.h    |  2 --
 target-xtensa/op_helper.c | 13 -------------
 target-xtensa/translate.c | 13 +++++++++++--
 3 files changed, 11 insertions(+), 17 deletions(-)
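
Illustration only, not part of the patch: a host-side C sketch
comparing the removed nsa helper with the branch-free sequence that
replaces it. XOR-ing the value with its own sign mask complements it
only when bit 31 is set, and a zero-input result of 32 followed by the
subtraction of 1 reproduces the helper's special case of 31. clzi32()
and the ref_nsa/new_nsa names are assumptions made up for this sketch,
not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit clz op with an explicit zero-input result. */
static uint32_t clzi32(uint32_t x, uint32_t if_zero)
{
    uint32_t n = 0;
    if (x == 0) {
        return if_zero;
    }
    while (!(x & UINT32_C(0x80000000))) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Same logic as the removed HELPER(nsa). */
static uint32_t ref_nsa(uint32_t v)
{
    if (v & UINT32_C(0x80000000)) {
        v = ~v;
    }
    return v ? clzi32(v, 32) - 1 : 31;
}

/* The generated sequence: sari, xor, clzi, subi. */
static uint32_t new_nsa(uint32_t v)
{
    /* sari by 31 yields all ones when the sign bit is set, else zero;
       written with unsigned arithmetic to stay portable C. */
    uint32_t t0 = UINT32_C(0) - (v >> 31);
    t0 ^= v;                       /* v or ~v; bit 31 is now clear */
    t0 = clzi32(t0, 32);           /* at least 1, or 32 for zero */
    return t0 - 1;
}

static void check_nsa(void)
{
    static const uint32_t samples[] = {
        0, 1, 0x40000000u, 0x7fffffffu, 0x80000000u,
        0xc0000000u, 0xfffffffeu, 0xffffffffu, 0x00012345u,
    };
    unsigned i;
    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        assert(new_nsa(samples[i]) == ref_nsa(samples[i]));
    }
}
```

Calling check_nsa() from a test main() runs the comparison. NSAU needs
no extra steps, since "v ? clz32(v) : 32" is already the clz op with a
zero-input result of 32.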

diff --git a/target-xtensa/helper.h b/target-xtensa/helper.h
index 5ea9c5b..0c8adae 100644
--- a/target-xtensa/helper.h
+++ b/target-xtensa/helper.h
@@ -3,8 +3,6 @@ DEF_HELPER_3(exception_cause, noreturn, env, i32, i32)
 DEF_HELPER_4(exception_cause_vaddr, noreturn, env, i32, i32, i32)
 DEF_HELPER_3(debug_exception, noreturn, env, i32, i32)
 
-DEF_HELPER_FLAGS_1(nsa, TCG_CALL_NO_RWG_SE, i32, i32)
-DEF_HELPER_FLAGS_1(nsau, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_2(wsr_windowbase, void, env, i32)
 DEF_HELPER_4(entry, void, env, i32, i32, i32)
 DEF_HELPER_2(retw, i32, env, i32)
diff --git a/target-xtensa/op_helper.c b/target-xtensa/op_helper.c
index 0a4b214..dc25625 100644
--- a/target-xtensa/op_helper.c
+++ b/target-xtensa/op_helper.c
@@ -161,19 +161,6 @@ void HELPER(debug_exception)(CPUXtensaState *env, uint32_t pc, uint32_t cause)
     HELPER(exception)(env, EXC_DEBUG);
 }
 
-uint32_t HELPER(nsa)(uint32_t v)
-{
-    if (v & 0x80000000) {
-        v = ~v;
-    }
-    return v ? clz32(v) - 1 : 31;
-}
-
-uint32_t HELPER(nsau)(uint32_t v)
-{
-    return v ? clz32(v) : 32;
-}
-
 static void copy_window_from_phys(CPUXtensaState *env,
         uint32_t window, uint32_t phys, uint32_t n)
 {
diff --git a/target-xtensa/translate.c b/target-xtensa/translate.c
index 0858c29..5c719a4 100644
--- a/target-xtensa/translate.c
+++ b/target-xtensa/translate.c
@@ -1372,14 +1372,23 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
                 case 14: /*NSAu*/
                     HAS_OPTION(XTENSA_OPTION_MISC_OP_NSA);
                     if (gen_window_check2(dc, RRR_S, RRR_T)) {
-                        gen_helper_nsa(cpu_R[RRR_T], cpu_R[RRR_S]);
+                        TCGv_i32 t0 = tcg_temp_new_i32();
+
+                        /* if (v & 0x80000000) v = ~v; */
+                        tcg_gen_sari_i32(t0, cpu_R[RRR_S], 31);
+                        tcg_gen_xor_i32(t0, t0, cpu_R[RRR_S]);
+
+                        /* r = (v ? clz(v) : 32) - 1; */
+                        tcg_gen_clzi_i32(t0, t0, 32);
+                        tcg_gen_subi_i32(cpu_R[RRR_T], t0, 1);
+                        tcg_temp_free_i32(t0);
                     }
                     break;
 
                 case 15: /*NSAUu*/
                     HAS_OPTION(XTENSA_OPTION_MISC_OP_NSA);
                     if (gen_window_check2(dc, RRR_S, RRR_T)) {
-                        gen_helper_nsau(cpu_R[RRR_T], cpu_R[RRR_S]);
+                        tcg_gen_clzi_i32(cpu_R[RRR_T], cpu_R[RRR_S], 32);
                     }
                     break;
 
-- 
2.7.4

* [Qemu-devel] [PATCH 13/25] target-arm: Use clz opcode
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Cc: qemu-arm@nongnu.org
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/helper-a64.c    | 10 ----------
 target-arm/helper-a64.h    |  2 --
 target-arm/helper.c        |  5 -----
 target-arm/helper.h        |  1 -
 target-arm/translate-a64.c |  8 ++++----
 target-arm/translate.c     |  6 +++---
 6 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 98b97df..77999ff 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -54,11 +54,6 @@ int64_t HELPER(sdiv64)(int64_t num, int64_t den)
     return num / den;
 }
 
-uint64_t HELPER(clz64)(uint64_t x)
-{
-    return clz64(x);
-}
-
 uint64_t HELPER(cls64)(uint64_t x)
 {
     return clrsb64(x);
@@ -69,11 +64,6 @@ uint32_t HELPER(cls32)(uint32_t x)
     return clrsb32(x);
 }
 
-uint32_t HELPER(clz32)(uint32_t x)
-{
-    return clz32(x);
-}
-
 uint64_t HELPER(rbit64)(uint64_t x)
 {
     return revbit64(x);
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index dd32000..d320f96 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -18,10 +18,8 @@
  */
 DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
-DEF_HELPER_FLAGS_1(clz64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(cls64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(cls32, TCG_CALL_NO_RWG_SE, i32, i32)
-DEF_HELPER_FLAGS_1(clz32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
diff --git a/target-arm/helper.c b/target-arm/helper.c
index b5b65ca..0cafdbc 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -5718,11 +5718,6 @@ uint32_t HELPER(uxtb16)(uint32_t x)
     return res;
 }
 
-uint32_t HELPER(clz)(uint32_t x)
-{
-    return clz32(x);
-}
-
 int32_t HELPER(sdiv)(int32_t num, int32_t den)
 {
     if (den == 0)
diff --git a/target-arm/helper.h b/target-arm/helper.h
index 84aa637..df86bf7 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -1,4 +1,3 @@
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(sxtb16, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(uxtb16, TCG_CALL_NO_RWG_SE, i32, i32)
 
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index e90487b..12621ff 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -3953,11 +3953,11 @@ static void handle_clz(DisasContext *s, unsigned int sf,
     tcg_rn = cpu_reg(s, rn);
 
     if (sf) {
-        gen_helper_clz64(tcg_rd, tcg_rn);
+        tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64);
     } else {
         TCGv_i32 tcg_tmp32 = tcg_temp_new_i32();
         tcg_gen_extrl_i64_i32(tcg_tmp32, tcg_rn);
-        gen_helper_clz(tcg_tmp32, tcg_tmp32);
+        tcg_gen_clzi_i32(tcg_tmp32, tcg_tmp32, 32);
         tcg_gen_extu_i32_i64(tcg_rd, tcg_tmp32);
         tcg_temp_free_i32(tcg_tmp32);
     }
@@ -7590,7 +7590,7 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
     switch (opcode) {
     case 0x4: /* CLS, CLZ */
         if (u) {
-            gen_helper_clz64(tcg_rd, tcg_rn);
+            tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64);
         } else {
             gen_helper_cls64(tcg_rd, tcg_rn);
         }
@@ -10260,7 +10260,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
                     goto do_cmop;
                 case 0x4: /* CLS */
                     if (u) {
-                        gen_helper_clz32(tcg_res, tcg_op);
+                        tcg_gen_clzi_i32(tcg_res, tcg_op, 32);
                     } else {
                         gen_helper_cls32(tcg_res, tcg_op);
                     }
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 08da9ac..c9186b6 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -7037,7 +7037,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             switch (size) {
                             case 0: gen_helper_neon_clz_u8(tmp, tmp); break;
                             case 1: gen_helper_neon_clz_u16(tmp, tmp); break;
-                            case 2: gen_helper_clz(tmp, tmp); break;
+                            case 2: tcg_gen_clzi_i32(tmp, tmp, 32); break;
                             default: abort();
                             }
                             break;
@@ -8219,7 +8219,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                 ARCH(5);
                 rd = (insn >> 12) & 0xf;
                 tmp = load_reg(s, rm);
-                gen_helper_clz(tmp, tmp);
+                tcg_gen_clzi_i32(tmp, tmp, 32);
                 store_reg(s, rd, tmp);
             } else {
                 goto illegal_op;
@@ -9992,7 +9992,7 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
                     tcg_temp_free_i32(tmp2);
                     break;
                 case 0x18: /* clz */
-                    gen_helper_clz(tmp, tmp);
+                    tcg_gen_clzi_i32(tmp, tmp, 32);
                     break;
                 case 0x20:
                 case 0x21:
-- 
2.7.4


* [Qemu-devel] [PATCH 14/25] target-i386: Use clz and ctz opcodes
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (12 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 13/25] target-arm: " Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 15/25] disas/i386.c: Handle tzcnt Richard Henderson
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Eduardo Habkost

Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/helper.h     |  2 --
 target-i386/int_helper.c | 11 -----------
 target-i386/translate.c  | 31 ++++++++++++++-----------------
 3 files changed, 14 insertions(+), 30 deletions(-)

diff --git a/target-i386/helper.h b/target-i386/helper.h
index 4e859eb..1e76b09 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -201,8 +201,6 @@ DEF_HELPER_FLAGS_3(xsetbv, TCG_CALL_NO_WG, void, env, i32, i64)
 DEF_HELPER_FLAGS_2(rdpkru, TCG_CALL_NO_WG, i64, env, i32)
 DEF_HELPER_FLAGS_3(wrpkru, TCG_CALL_NO_WG, void, env, i32, i64)
 
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(ctz, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(pdep, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(pext, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 
diff --git a/target-i386/int_helper.c b/target-i386/int_helper.c
index 9e873ac..4dc5c65 100644
--- a/target-i386/int_helper.c
+++ b/target-i386/int_helper.c
@@ -417,17 +417,6 @@ void helper_idivq_EAX(CPUX86State *env, target_ulong t0)
 # define clztl  clz64
 #endif
 
-/* bit operations */
-target_ulong helper_ctz(target_ulong t0)
-{
-    return ctztl(t0);
-}
-
-target_ulong helper_clz(target_ulong t0)
-{
-    return clztl(t0);
-}
-
 target_ulong helper_pdep(target_ulong src, target_ulong mask)
 {
     target_ulong dest = 0;
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 4d6d36f..0eac334 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -6792,21 +6792,18 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 ? s->cpuid_ext3_features & CPUID_EXT3_ABM
                 : s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI1)) {
             int size = 8 << ot;
+            /* For lzcnt/tzcnt, the C bit is defined by the input.  */
             tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
             if (b & 1) {
                 /* For lzcnt, reduce the target_ulong result by the
                    number of zeros that we expect to find at the top.  */
-                gen_helper_clz(cpu_T0, cpu_T0);
+                tcg_gen_clzi_tl(cpu_T0, cpu_T0, TARGET_LONG_BITS);
                 tcg_gen_subi_tl(cpu_T0, cpu_T0, TARGET_LONG_BITS - size);
             } else {
-                /* For tzcnt, a zero input must return the operand size:
-                   force all bits outside the operand size to 1.  */
-                target_ulong mask = (target_ulong)-2 << (size - 1);
-                tcg_gen_ori_tl(cpu_T0, cpu_T0, mask);
-                gen_helper_ctz(cpu_T0, cpu_T0);
-            }
-            /* For lzcnt/tzcnt, C and Z bits are defined and are
-               related to the result.  */
+                /* For tzcnt, a zero input must return the operand size.  */
+                tcg_gen_ctzi_tl(cpu_T0, cpu_T0, size);
+            }
+            /* For lzcnt/tzcnt, the Z bit is defined by the result.  */
             gen_op_update1_cc();
             set_cc_op(s, CC_OP_BMILGB + ot);
         } else {
@@ -6814,20 +6811,20 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                to the input and not the result.  */
             tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
             set_cc_op(s, CC_OP_LOGICB + ot);
+
+            /* ??? The manual says that the output is undefined when the
+               input is zero, but real hardware leaves it unchanged, and
+               real programs appear to depend on that.  Accomplish this
+               by passing the output as the value to return upon zero.  */
             if (b & 1) {
                 /* For bsr, return the bit index of the first 1 bit,
                    not the count of leading zeros.  */
-                gen_helper_clz(cpu_T0, cpu_T0);
+                tcg_gen_xori_tl(cpu_T1, cpu_regs[reg], TARGET_LONG_BITS - 1);
+                tcg_gen_clz_tl(cpu_T0, cpu_T0, cpu_T1);
                 tcg_gen_xori_tl(cpu_T0, cpu_T0, TARGET_LONG_BITS - 1);
             } else {
-                gen_helper_ctz(cpu_T0, cpu_T0);
+                tcg_gen_ctz_tl(cpu_T0, cpu_T0, cpu_regs[reg]);
             }
-            /* ??? The manual says that the output is undefined when the
-               input is zero, but real hardware leaves it unchanged, and
-               real programs appear to depend on that.  */
-            tcg_gen_movi_tl(cpu_tmp0, 0);
-            tcg_gen_movcond_tl(TCG_COND_EQ, cpu_T0, cpu_cc_dst, cpu_tmp0,
-                               cpu_regs[reg], cpu_T0);
         }
         gen_op_mov_reg_v(ot, reg, cpu_T0);
         break;
-- 
2.7.4
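
Editorial sketch, not part of the patch: the translate.c hunks above rely on
the new ops returning a caller-chosen value when the input is zero. The C
model below is one way to read that behaviour. It assumes the operand has
already been zero-extended to the target width and that TARGET_LONG_BITS is
64. The names ref_clz64, ref_ctz64 and model_* are hypothetical and do not
exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics of the TCG ops: return the count, or zero_val
   when the input is zero.  Hypothetical helper names.  */
static uint64_t ref_clz64(uint64_t x, uint64_t zero_val)
{
    uint64_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x >> 63)) {
        x <<= 1;
        n++;
    }
    return n;
}

static uint64_t ref_ctz64(uint64_t x, uint64_t zero_val)
{
    uint64_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* lzcnt: count over 64 bits, then drop the zeros above the operand.  */
static uint64_t model_lzcnt(uint64_t x, int size)
{
    assert(size == 16 || size == 32 || size == 64);
    return ref_clz64(x, 64) - (64 - size);
}

/* tzcnt: a zero input returns the operand size directly.  */
static uint64_t model_tzcnt(uint64_t x, int size)
{
    assert(size == 16 || size == 32 || size == 64);
    return ref_ctz64(x, size);
}

/* bsr: the result is a bit index (clz ^ 63).  Pre-xoring the old
   destination lets the final xor restore it when the input is zero.  */
static uint64_t model_bsr(uint64_t x, uint64_t old_dest)
{
    return ref_clz64(x, old_dest ^ 63) ^ 63;
}

/* bsf: ctz already is the bit index; a zero input keeps the old value.  */
static uint64_t model_bsf(uint64_t x, uint64_t old_dest)
{
    return ref_ctz64(x, old_dest);
}
```

In this model, model_bsr(0, old) and model_bsf(0, old) both return old
unchanged, which is the hardware behaviour the patch comment says real
programs depend on.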


* [Qemu-devel] [PATCH 15/25] disas/i386.c: Handle tzcnt
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (13 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 14/25] target-i386: Use clz and ctz opcodes Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes Richard Henderson
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Eduardo Habkost

Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 disas/i386.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/disas/i386.c b/disas/i386.c
index 57145d0..07f871f 100644
--- a/disas/i386.c
+++ b/disas/i386.c
@@ -682,6 +682,7 @@ fetch_data(struct disassemble_info *info, bfd_byte *addr)
 #define PREGRP104 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 104 } }
 #define PREGRP105 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 105 } }
 #define PREGRP106 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 106 } }
+#define PREGRP107 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 107 } }
 
 #define X86_64_0  NULL, { { NULL, X86_64_SPECIAL }, { NULL, 0 } }
 #define X86_64_1  NULL, { { NULL, X86_64_SPECIAL }, { NULL, 1 } }
@@ -1247,7 +1248,7 @@ static const struct dis386 dis386_twobyte[] = {
   { "ud2b",		{ XX } },
   { GRP8 },
   { "btcS",		{ Ev, Gv } },
-  { "bsfS",		{ Gv, Ev } },
+  { PREGRP107 },
   { PREGRP36 },
   { "movs{bR|x|bR|x}",	{ Gv, Eb } },
   { "movs{wR|x|wR|x}",	{ Gv, Ew } }, /* yes, there really is movsww ! */
@@ -1431,7 +1432,7 @@ static const unsigned char twobyte_uses_REPZ_prefix[256] = {
   /* 80 */ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 8f */
   /* 90 */ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 9f */
   /* a0 */ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* af */
-  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0, /* bf */
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
   /* c0 */ 0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0, /* cf */
   /* d0 */ 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0, /* df */
   /* e0 */ 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0, /* ef */
@@ -2800,6 +2801,13 @@ static const struct dis386 prefix_user_table[][4] = {
     { "shrxS",	{ Gv, Ev, Bv } },
   },
 
+  /* PREGRP107 */
+  {
+    { "bsfS",	{ Gv, Ev } },
+    { "tzcntS",	{ Gv, Ev } },
+    { "bsfS",	{ Gv, Ev } },
+    { "(bad)",	{ XX } },
+  },
 };
 
 static const struct dis386 x86_64_table[][2] = {
-- 
2.7.4
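
Editorial sketch, not part of the patch: the new PREGRP107 group lets the
disassembler pick a mnemonic for opcode 0f bc according to the prefix byte.
The lookup below illustrates the idea. It assumes the four slots are ordered
no prefix, 0xf3, 0x66, 0xf2, as in the SSE prefix groups of the same table.
The PFX_* flags and lookup_0fbc are hypothetical names, not the
disassembler's own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefix flags; the real disassembler defines its own.  */
enum {
    PFX_REPZ  = 1 << 0,   /* 0xf3 */
    PFX_DATA  = 1 << 1,   /* 0x66 */
    PFX_REPNZ = 1 << 2    /* 0xf2 */
};

/* Same contents as the PREGRP107 entry added by the patch.
   Slot order assumed: no prefix, 0xf3, 0x66, 0xf2.  */
static const char *const pregrp107[4] = { "bsf", "tzcnt", "bsf", "(bad)" };

/* Select the mnemonic for opcode 0f bc from the prefixes seen.  */
static const char *lookup_0fbc(unsigned prefixes)
{
    size_t index = 0;

    assert((prefixes & ~(unsigned)(PFX_REPZ | PFX_DATA | PFX_REPNZ)) == 0);
    if (prefixes & PFX_REPZ) {
        index = 1;
    } else if (prefixes & PFX_DATA) {
        index = 2;
    } else if (prefixes & PFX_REPNZ) {
        index = 3;
    }
    return pregrp107[index];
}
```

Under these assumptions, only the 0xf3 prefix turns bsf into tzcnt, which
matches the extra 1 the patch sets at position 0xbc in
twobyte_uses_REPZ_prefix.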


* [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (14 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 15/25] disas/i386.c: Handle tzcnt Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-17 16:50   ` Bastian Koppelmann
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 17/25] tcg/ppc: " Richard Henderson
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.h     |  8 ++---
 tcg/i386/tcg-target.inc.c | 83 ++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index f2d9955..8fff287 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -93,8 +93,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_clz_i32          0
-#define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_clz_i32          1
+#define TCG_TARGET_HAS_ctz_i32          1
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     1
@@ -127,8 +127,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_eqv_i64          0
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
-#define TCG_TARGET_HAS_clz_i64          0
-#define TCG_TARGET_HAS_ctz_i64          0
+#define TCG_TARGET_HAS_clz_i64          1
+#define TCG_TARGET_HAS_ctz_i64          1
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 39f62bd..3eeb58f 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -92,6 +92,7 @@ static const int tcg_target_call_oarg_regs[] = {
 #define TCG_CT_CONST_S32 0x100
 #define TCG_CT_CONST_U32 0x200
 #define TCG_CT_CONST_I32 0x400
+#define TCG_CT_CONST_WSZ 0x800
 
 /* Registers used with L constraint, which are the first argument 
    registers on x86_64, and two random call clobbered registers on
@@ -225,6 +226,12 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         } else {
             goto case_c;
         }
+    case 'W':
+        /* With TZCNT/LZCNT, we can have operand-size as an input.  */
+        if (have_bmi1) {
+            ct->ct |= TCG_CT_CONST_WSZ;
+        }
+        break;
 
         /* qemu_ld/st address constraint */
     case 'L':
@@ -273,6 +280,9 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
     if ((ct & TCG_CT_CONST_I32) && ~val == (int32_t)~val) {
         return 1;
     }
+    if ((ct & TCG_CT_CONST_WSZ) && val == (type == TCG_TYPE_I32 ? 32 : 64)) {
+        return 1;
+    }
     return 0;
 }
 
@@ -306,6 +316,8 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_ARITH_GvEv	(0x03)		/* ... plus (ARITH_FOO << 3) */
 #define OPC_ANDN        (0xf2 | P_EXT38)
 #define OPC_ADD_GvEv	(OPC_ARITH_GvEv | (ARITH_ADD << 3))
+#define OPC_BSF         (0xbc | P_EXT)
+#define OPC_BSR         (0xbd | P_EXT)
 #define OPC_BSWAP	(0xc8 | P_EXT)
 #define OPC_CALL_Jz	(0xe8)
 #define OPC_CMOVCC      (0x40 | P_EXT)  /* ... plus condition code */
@@ -320,6 +332,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_JMP_long	(0xe9)
 #define OPC_JMP_short	(0xeb)
 #define OPC_LEA         (0x8d)
+#define OPC_LZCNT       (0xbd | P_EXT | P_SIMDF3)
 #define OPC_MOVB_EvGv	(0x88)		/* stores, more or less */
 #define OPC_MOVL_EvGv	(0x89)		/* stores, more or less */
 #define OPC_MOVL_GvEv	(0x8b)		/* loads, more or less */
@@ -346,6 +359,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_SHLX        (0xf7 | P_EXT38 | P_DATA16)
 #define OPC_SHRX        (0xf7 | P_EXT38 | P_SIMDF2)
 #define OPC_TESTL	(0x85)
+#define OPC_TZCNT       (0xbc | P_EXT | P_SIMDF3)
 #define OPC_XCHG_ax_r32	(0x90)
 
 #define OPC_GRP3_Ev	(0xf7)
@@ -431,6 +445,11 @@ static void tcg_out_opc(TCGContext *s, int opc, int r, int rm, int x)
     if (opc & P_ADDR32) {
         tcg_out8(s, 0x67);
     }
+    if (opc & P_SIMDF3) {
+        tcg_out8(s, 0xf3);
+    } else if (opc & P_SIMDF2) {
+        tcg_out8(s, 0xf2);
+    }
 
     rex = 0;
     rex |= (opc & P_REXW) ? 0x8 : 0x0;  /* REX.W */
@@ -465,6 +484,11 @@ static void tcg_out_opc(TCGContext *s, int opc)
     if (opc & P_DATA16) {
         tcg_out8(s, 0x66);
     }
+    if (opc & P_SIMDF3) {
+        tcg_out8(s, 0xf3);
+    } else if (opc & P_SIMDF2) {
+        tcg_out8(s, 0xf2);
+    }
     if (opc & (P_EXT | P_EXT38)) {
         tcg_out8(s, 0x0f);
         if (opc & P_EXT38) {
@@ -1093,13 +1117,11 @@ static void tcg_out_setcond2(TCGContext *s, const TCGArg *args,
 }
 #endif
 
-static void tcg_out_movcond32(TCGContext *s, TCGCond cond, TCGArg dest,
-                              TCGArg c1, TCGArg c2, int const_c2,
-                              TCGArg v1)
+static void tcg_out_cmov(TCGContext *s, TCGCond cond, int rexw,
+                         TCGReg dest, TCGReg v1)
 {
-    tcg_out_cmp(s, c1, c2, const_c2, 0);
     if (have_cmov) {
-        tcg_out_modrm(s, OPC_CMOVCC | tcg_cond_to_jcc[cond], dest, v1);
+        tcg_out_modrm(s, OPC_CMOVCC | tcg_cond_to_jcc[cond] | rexw, dest, v1);
     } else {
         TCGLabel *over = gen_new_label();
         tcg_out_jxx(s, tcg_cond_to_jcc[tcg_invert_cond(cond)], over, 1);
@@ -1108,13 +1130,21 @@ static void tcg_out_movcond32(TCGContext *s, TCGCond cond, TCGArg dest,
     }
 }
 
+static void tcg_out_movcond32(TCGContext *s, TCGCond cond, TCGReg dest,
+                              TCGReg c1, TCGArg c2, int const_c2,
+                              TCGReg v1)
+{
+    tcg_out_cmp(s, c1, c2, const_c2, 0);
+    tcg_out_cmov(s, cond, 0, dest, v1);
+}
+
 #if TCG_TARGET_REG_BITS == 64
-static void tcg_out_movcond64(TCGContext *s, TCGCond cond, TCGArg dest,
-                              TCGArg c1, TCGArg c2, int const_c2,
-                              TCGArg v1)
+static void tcg_out_movcond64(TCGContext *s, TCGCond cond, TCGReg dest,
+                              TCGReg c1, TCGArg c2, int const_c2,
+                              TCGReg v1)
 {
     tcg_out_cmp(s, c1, c2, const_c2, P_REXW);
-    tcg_out_modrm(s, OPC_CMOVCC | tcg_cond_to_jcc[cond] | P_REXW, dest, v1);
+    tcg_out_cmov(s, cond, P_REXW, dest, v1);
 }
 #endif
 
@@ -1993,6 +2023,37 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    OP_32_64(ctz):
+        if (const_args[2]) {
+            tcg_debug_assert(have_bmi1);
+            tcg_debug_assert(args[2] == (rexw ? 64 : 32));
+            tcg_out_modrm(s, OPC_TZCNT + rexw, args[0], args[1]);
+        } else {
+            /* ??? The manual says that the output is undefined when the
+               input is zero, but real hardware leaves it unchanged.  As
+               noted in target-i386/translate.c, real programs depend on
+               this -- now we are one more of those.  */
+            tcg_out_modrm(s, OPC_BSF + rexw, args[0], args[1]);
+            if (args[0] != args[2]) {
+                tcg_out_cmov(s, TCG_COND_EQ, rexw, args[0], args[2]);
+            }
+        }
+        break;
+
+    OP_32_64(clz):
+        if (const_args[2]) {
+            tcg_debug_assert(have_bmi1);
+            tcg_debug_assert(args[2] == (rexw ? 64 : 32));
+            tcg_out_modrm(s, OPC_LZCNT + rexw, args[0], args[1]);
+        } else {
+            /* ??? See above.  */
+            tcg_out_modrm(s, OPC_BSR + rexw, args[0], args[1]);
+            if (args[0] != args[2]) {
+                tcg_out_cmov(s, TCG_COND_EQ, rexw, args[0], args[2]);
+            }
+        }
+        break;
+
     case INDEX_op_brcond_i32:
         tcg_out_brcond32(s, args[2], args[0], args[1], const_args[1],
                          arg_label(args[3]), 0);
@@ -2220,6 +2281,8 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_sar_i32, { "r", "0", "Ci" } },
     { INDEX_op_rotl_i32, { "r", "0", "ci" } },
     { INDEX_op_rotr_i32, { "r", "0", "ci" } },
+    { INDEX_op_clz_i32, { "r", "r", "rW" } },
+    { INDEX_op_ctz_i32, { "r", "r", "rW" } },
 
     { INDEX_op_brcond_i32, { "r", "ri" } },
 
@@ -2281,6 +2344,8 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_sar_i64, { "r", "0", "Ci" } },
     { INDEX_op_rotl_i64, { "r", "0", "ci" } },
     { INDEX_op_rotr_i64, { "r", "0", "ci" } },
+    { INDEX_op_clz_i64, { "r", "r", "rW" } },
+    { INDEX_op_ctz_i64, { "r", "r", "rW" } },
 
     { INDEX_op_brcond_i64, { "r", "re" } },
     { INDEX_op_setcond_i64, { "r", "r", "re" } },
-- 
2.7.4
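
Editorial sketch, not part of the patch: TZCNT and LZCNT share the 0f bc and
0f bd opcode bytes with BSF and BSR and differ only in the mandatory 0xf3
prefix, which must come before any REX byte. That is why the tcg_out_opc
hunks emit the prefix ahead of the REX computation. The standalone encoder
below shows the resulting byte layout for the register-to-register form
only. encode_bitscan is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a register-to-register 0f bc / 0f bd instruction into out.
   Returns the number of bytes written (at most 5).  Hypothetical helper.  */
static size_t encode_bitscan(uint8_t *out, uint8_t opcode, int f3_prefix,
                             int rexw, unsigned dst, unsigned src)
{
    size_t n = 0;
    uint8_t rex;

    assert(dst < 16 && src < 16);
    if (f3_prefix) {
        out[n++] = 0xf3;            /* mandatory prefix precedes REX */
    }
    /* REX.W selects 64-bit, REX.R extends dst, REX.B extends src.  */
    rex = (rexw ? 0x8 : 0) | ((dst & 8) >> 1) | ((src & 8) >> 3);
    if (rex) {
        out[n++] = 0x40 | rex;
    }
    out[n++] = 0x0f;                /* two-byte opcode escape */
    out[n++] = opcode;              /* 0xbc: bsf/tzcnt, 0xbd: bsr/lzcnt */
    out[n++] = 0xc0 | ((dst & 7) << 3) | (src & 7);   /* ModRM, mod = 11 */
    return n;
}
```

With dst = 0 (eax) and src = 1 (ecx), the prefixed 0xbc form encodes as
f3 0f bc c1 and the unprefixed form as 0f bc c1, so a CPU without TZCNT
support decodes the former as a prefixed BSF.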


* [Qemu-devel] [PATCH 17/25] tcg/ppc: Handle ctz and clz opcodes
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (15 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 18/25] tcg/aarch64: " Richard Henderson
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc

Cc: qemu-ppc@nongnu.org
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc/tcg-target.h     |  8 +++----
 tcg/ppc/tcg-target.inc.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 698a599..442be63 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -68,8 +68,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32          1
 #define TCG_TARGET_HAS_nand_i32         1
 #define TCG_TARGET_HAS_nor_i32          1
-#define TCG_TARGET_HAS_clz_i32          0
-#define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_clz_i32          1
+#define TCG_TARGET_HAS_ctz_i32          1
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     0
@@ -103,8 +103,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i64          1
 #define TCG_TARGET_HAS_nand_i64         1
 #define TCG_TARGET_HAS_nor_i64          1
-#define TCG_TARGET_HAS_clz_i64          0
-#define TCG_TARGET_HAS_ctz_i64          0
+#define TCG_TARGET_HAS_clz_i64          1
+#define TCG_TARGET_HAS_ctz_i64          1
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 7ec54a2..bf147af 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -77,6 +77,7 @@
 #define TCG_CT_CONST_U32  0x800
 #define TCG_CT_CONST_ZERO 0x1000
 #define TCG_CT_CONST_MONE 0x2000
+#define TCG_CT_CONST_WSZ  0x4000
 
 static tcg_insn_unit *tb_ret_addr;
 
@@ -307,6 +308,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
     case 'U':
         ct->ct |= TCG_CT_CONST_U32;
         break;
+    case 'W':
+        ct->ct |= TCG_CT_CONST_WSZ;
+        break;
     case 'Z':
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
@@ -345,6 +349,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
         return 1;
     } else if ((ct & TCG_CT_CONST_MONE) && val == -1) {
         return 1;
+    } else if ((ct & TCG_CT_CONST_WSZ)
+               && val == (type == TCG_TYPE_I32 ? 32 : 64)) {
+        return 1;
     }
     return 0;
 }
@@ -449,6 +456,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define NOR    XO31(124)
 #define CNTLZW XO31( 26)
 #define CNTLZD XO31( 58)
+#define CNTTZW XO31(538)
+#define CNTTZD XO31(570)
 #define ANDC   XO31( 60)
 #define ORC    XO31(412)
 #define EQV    XO31(284)
@@ -1170,6 +1179,32 @@ static void tcg_out_movcond(TCGContext *s, TCGType type, TCGCond cond,
     }
 }
 
+static void tcg_out_cntxz(TCGContext *s, TCGType type, uint32_t opc,
+                          TCGArg a0, TCGArg a1, TCGArg a2, bool const_a2)
+{
+    if (const_a2 && a2 == (type == TCG_TYPE_I32 ? 32 : 64)) {
+        tcg_out32(s, opc | RA(a0) | RS(a1));
+    } else {
+        tcg_out_cmp(s, TCG_COND_EQ, a1, 0, 1, 7, type);
+        /* Note that the only other valid constant for a2 is 0.  */
+        if (HAVE_ISEL) {
+            tcg_out32(s, opc | RA(TCG_REG_R0) | RS(a1));
+            tcg_out32(s, tcg_to_isel[TCG_COND_EQ] | TAB(a0, a2, TCG_REG_R0));
+        } else if (!const_a2 && a0 == a2) {
+            tcg_out32(s, tcg_to_bc[TCG_COND_EQ] | 8);
+            tcg_out32(s, opc | RA(a0) | RS(a1));
+        } else {
+            tcg_out32(s, opc | RA(a0) | RS(a1));
+            tcg_out32(s, tcg_to_bc[TCG_COND_NE] | 8);
+            if (const_a2) {
+                tcg_out_movi(s, type, a0, 0);
+            } else {
+                tcg_out_mov(s, type, a0, a2);
+            }
+        }
+    }
+}
+
 static void tcg_out_cmp2(TCGContext *s, const TCGArg *args,
                          const int *const_args)
 {
@@ -2107,6 +2142,24 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out32(s, NOR | SAB(args[1], args[0], args[2]));
         break;
 
+    case INDEX_op_clz_i32:
+        tcg_out_cntxz(s, TCG_TYPE_I32, CNTLZW, args[0], args[1],
+                      args[2], const_args[2]);
+        break;
+    case INDEX_op_ctz_i32:
+        tcg_out_cntxz(s, TCG_TYPE_I32, CNTTZW, args[0], args[1],
+                      args[2], const_args[2]);
+        break;
+
+    case INDEX_op_clz_i64:
+        tcg_out_cntxz(s, TCG_TYPE_I64, CNTLZD, args[0], args[1],
+                      args[2], const_args[2]);
+        break;
+    case INDEX_op_ctz_i64:
+        tcg_out_cntxz(s, TCG_TYPE_I64, CNTTZD, args[0], args[1],
+                      args[2], const_args[2]);
+        break;
+
     case INDEX_op_mul_i32:
         a0 = args[0], a1 = args[1], a2 = args[2];
         if (const_args[2]) {
@@ -2519,6 +2572,8 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_eqv_i32, { "r", "r", "ri" } },
     { INDEX_op_nand_i32, { "r", "r", "r" } },
     { INDEX_op_nor_i32, { "r", "r", "r" } },
+    { INDEX_op_clz_i32, { "r", "r", "rZW" } },
+    { INDEX_op_ctz_i32, { "r", "r", "rZW" } },
 
     { INDEX_op_shl_i32, { "r", "r", "ri" } },
     { INDEX_op_shr_i32, { "r", "r", "ri" } },
@@ -2567,6 +2622,8 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_eqv_i64, { "r", "r", "r" } },
     { INDEX_op_nand_i64, { "r", "r", "r" } },
     { INDEX_op_nor_i64, { "r", "r", "r" } },
+    { INDEX_op_clz_i64, { "r", "r", "rZW" } },
+    { INDEX_op_ctz_i64, { "r", "r", "rZW" } },
 
     { INDEX_op_shl_i64, { "r", "r", "ri" } },
     { INDEX_op_shr_i64, { "r", "r", "ri" } },
-- 
2.7.4


* [Qemu-devel] [PATCH 18/25] tcg/aarch64: Handle ctz and clz opcodes
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (16 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 17/25] tcg/ppc: " Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-17 11:53   ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 19/25] tcg/arm: " Richard Henderson
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Claudio Fontana

Cc: Claudio Fontana <claudio.fontana@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.h     |  8 ++++----
 tcg/aarch64/tcg-target.inc.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 976f493..9d6b00f 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -62,8 +62,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32          1
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_clz_i32          0
-#define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_clz_i32          1
+#define TCG_TARGET_HAS_ctz_i32          1
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     1
@@ -96,8 +96,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i64          1
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
-#define TCG_TARGET_HAS_clz_i64          0
-#define TCG_TARGET_HAS_ctz_i64          0
+#define TCG_TARGET_HAS_clz_i64          1
+#define TCG_TARGET_HAS_ctz_i64          1
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     1
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index c0e9890..51abc1b 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -206,6 +206,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
     if ((ct & TCG_CT_CONST_MONE) && val == -1) {
         return 1;
     }
+    if ((ct & TCG_CT_CONST_WSZ) && val == (type ? 64 : 32)) {
+        return 1;
+    }
 
     return 0;
 }
@@ -344,8 +347,12 @@ typedef enum {
     /* Conditional select instructions.  */
     I3506_CSEL      = 0x1a800000,
     I3506_CSINC     = 0x1a800400,
+    I3506_CSINV     = 0x5a800000,
+    I3506_CSNEG     = 0x5a800400,
 
     /* Data-processing (1 source) instructions.  */
+    I3507_CLZ       = 0x5ac01000,
+    I3507_RBIT      = 0x5ac00000,
     I3507_REV16     = 0x5ac00400,
     I3507_REV32     = 0x5ac00800,
     I3507_REV64     = 0x5ac00c00,
@@ -998,6 +1005,32 @@ static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
     tcg_out32(s, sync[a0 & TCG_MO_ALL]);
 }
 
+static void tcg_out_clz(TCGContext *s, TCGType ext, TCGReg d,
+                        TCGReg a, TCGArg b, bool const_b)
+{
+    if (const_b && b == (ext ? 64 : 32)) {
+        tcg_out_insn(s, 3507, CLZ, ext, d, a);
+    } else {
+        AArch64Insn sel = I3506_CSEL;
+
+        tcg_out_cmp(s, ext, a, 0, 1);
+        tcg_out_insn(s, 3507, CLZ, ext, TCG_REG_TMP, a);
+
+        if (const_b) {
+            if (b == -1) {
+                b = TCG_REG_XZR;
+                sel = I3506_CSINV;
+            } else if (b == 0) {
+                b = TCG_REG_XZR;
+            } else {
+                tcg_out_movi(s, ext, d, b);
+                b = d;
+            }
+        }
+        tcg_out_insn_3506(s, sel, ext, d, TCG_REG_TMP, b, TCG_COND_NE);
+    }
+}
+
 #ifdef CONFIG_SOFTMMU
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     TCGMemOpIdx oi, uintptr_t ra)
@@ -1564,6 +1597,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_clz_i64:
+    case INDEX_op_clz_i32:
+        tcg_out_clz(s, ext, a0, a1, a2, c2);
+        break;
+    case INDEX_op_ctz_i64:
+    case INDEX_op_ctz_i32:
+        tcg_out_insn(s, 3507, RBIT, ext, TCG_REG_TMP, a1);
+        tcg_out_clz(s, ext, a0, TCG_REG_TMP, a2, c2);
+        break;
+
     case INDEX_op_brcond_i32:
         a1 = (int32_t)a1;
         /* FALLTHRU */
@@ -1755,11 +1798,15 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_sar_i32, { "r", "r", "ri" } },
     { INDEX_op_rotl_i32, { "r", "r", "ri" } },
     { INDEX_op_rotr_i32, { "r", "r", "ri" } },
+    { INDEX_op_clz_i32, { "r", "r", "rAL" } },
+    { INDEX_op_ctz_i32, { "r", "r", "rAL" } },
     { INDEX_op_shl_i64, { "r", "r", "ri" } },
     { INDEX_op_shr_i64, { "r", "r", "ri" } },
     { INDEX_op_sar_i64, { "r", "r", "ri" } },
     { INDEX_op_rotl_i64, { "r", "r", "ri" } },
     { INDEX_op_rotr_i64, { "r", "r", "ri" } },
+    { INDEX_op_clz_i64, { "r", "r", "rAL" } },
+    { INDEX_op_ctz_i64, { "r", "r", "rAL" } },
 
     { INDEX_op_brcond_i32, { "r", "rA" } },
     { INDEX_op_brcond_i64, { "r", "rA" } },
-- 
2.7.4
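
Editorial sketch, not part of the patch: the ctz cases above are emitted as
RBIT followed by the clz sequence, using the identity that the trailing
zeros of x are the leading zeros of the bit-reversed x. The 32-bit C model
below checks that identity, including the caller-supplied value for a zero
input. Testing the reversed value for zero is equivalent to testing the
original, since reversal maps zero only to zero. All function names here are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order of a 32-bit word, as RBIT does.  */
static uint32_t rbit32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
    return (x >> 16) | (x << 16);
}

/* Count leading zeros, returning zero_val for a zero input.  */
static uint32_t clz32_model(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x >> 31)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctz expressed as clz of the reversed word.  */
static uint32_t ctz32_via_rbit(uint32_t x, uint32_t zero_val)
{
    return clz32_model(rbit32(x), zero_val);
}

/* Small self-check of the identity on a few inputs.  */
static void ctz_selfcheck(void)
{
    assert(rbit32(1) == 0x80000000u);
    assert(ctz32_via_rbit(8, 32) == 3);
    assert(ctz32_via_rbit(0x80000000u, 32) == 31);
    assert(ctz32_via_rbit(0, 32) == 32);
}
```

The same identity underlies the INSN_RBIT plus INSN_CLZ pairing used for
ctz in the 32-bit Arm backend patch later in this series.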


* [Qemu-devel] [PATCH 19/25] tcg/arm: Handle ctz and clz opcodes
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (17 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 18/25] tcg/aarch64: " Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 20/25] tcg/mips: Handle clz opcode Richard Henderson
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Cc: qemu-arm@nongnu.org
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.h     |  4 ++--
 tcg/arm/tcg-target.inc.c | 27 +++++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 02cc242..4cb94dc 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -110,8 +110,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_clz_i32          0
-#define TCG_TARGET_HAS_ctz_i32          0
+#define TCG_TARGET_HAS_clz_i32          use_armv5t_instructions
+#define TCG_TARGET_HAS_ctz_i32          use_armv7_instructions
 #define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
 #define TCG_TARGET_HAS_extract_i32      use_armv7_instructions
 #define TCG_TARGET_HAS_sextract_i32     use_armv7_instructions
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 6765a9d..7595c04 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -261,6 +261,9 @@ typedef enum {
     ARITH_BIC = 0xe << 21,
     ARITH_MVN = 0xf << 21,
 
+    INSN_CLZ       = 0x016f0f10,
+    INSN_RBIT      = 0x06ff0f30,
+
     INSN_LDR_IMM   = 0x04100000,
     INSN_LDR_REG   = 0x06100000,
     INSN_STR_IMM   = 0x04000000,
@@ -1832,6 +1835,28 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_ctz_i32:
+        tcg_out_dat_reg(s, COND_AL, INSN_RBIT, TCG_REG_TMP, 0, args[1], 0);
+        a1 = TCG_REG_TMP;
+        goto do_clz;
+
+    case INDEX_op_clz_i32:
+        a1 = args[1];
+    do_clz:
+        a0 = args[0];
+        a2 = args[2];
+        c = const_args[2];
+        if (c && a2 == 32) {
+            tcg_out_dat_reg(s, COND_AL, INSN_CLZ, a0, 0, a1, 0);
+            break;
+        }
+        tcg_out_dat_imm(s, COND_AL, ARITH_CMP, 0, a1, 0);
+        tcg_out_dat_reg(s, COND_NE, INSN_CLZ, a0, 0, a1, 0);
+        if (c || a0 != a2) {
+            tcg_out_dat_rIK(s, COND_EQ, ARITH_MOV, ARITH_MVN, a0, 0, a2, c);
+        }
+        break;
+
     case INDEX_op_brcond_i32:
         tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
                        args[0], args[1], const_args[1]);
@@ -1966,6 +1991,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_sar_i32, { "r", "r", "ri" } },
     { INDEX_op_rotl_i32, { "r", "r", "ri" } },
     { INDEX_op_rotr_i32, { "r", "r", "ri" } },
+    { INDEX_op_clz_i32, { "r", "r", "rIK" } },
+    { INDEX_op_ctz_i32, { "r", "r", "rIK" } },
 
     { INDEX_op_brcond_i32, { "r", "rIN" } },
     { INDEX_op_setcond_i32, { "r", "r", "rIN" } },
-- 
2.7.4
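
For readers following the tcg/arm patch above: the ctz case emits RBIT
into the temporary and then jumps into the clz code, which relies on the
identity ctz(x) = clz(bit-reverse(x)).  The following is a minimal C
sketch of that identity only, not of the emitted host code.  The names
rbit32_model, clz32_model and ctz32_model are made up for illustration
and are not QEMU functions; zero_val mirrors the third operand of the
TCG opcodes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of ARM RBIT: reverse the bit order of a word. */
static uint32_t rbit32_model(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Hypothetical model of clz with an explicit result for zero input. */
static uint32_t clz32_model(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;
    if (x == 0) {
        return zero_val;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctz expressed as clz of the bit-reversed input. */
static uint32_t ctz32_model(uint32_t x, uint32_t zero_val)
{
    uint32_t n = clz32_model(rbit32_model(x), zero_val);
    /* For nonzero input the count is always a valid bit index. */
    assert(x == 0 || n < 32);
    return n;
}
```

Bit reversal maps zero to zero, so the zero-input value passes through
the clz path unchanged.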

* [Qemu-devel] [PATCH 20/25] tcg/mips: Handle clz opcode
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (18 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 19/25] tcg/arm: " Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 21/25] tcg/s390: " Richard Henderson
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Aurelien Jarno

Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.h     |  4 ++--
 tcg/mips/tcg-target.inc.c | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index f133684..0526018 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -109,8 +109,6 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_rem_i32          1
 #define TCG_TARGET_HAS_not_i32          1
 #define TCG_TARGET_HAS_nor_i32          1
-#define TCG_TARGET_HAS_clz_i32          0
-#define TCG_TARGET_HAS_ctz_i32          0
 #define TCG_TARGET_HAS_andc_i32         0
 #define TCG_TARGET_HAS_orc_i32          0
 #define TCG_TARGET_HAS_eqv_i32          0
@@ -130,6 +128,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_ext8s_i32        use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext16s_i32       use_mips32r2_instructions
 #define TCG_TARGET_HAS_rot_i32          use_mips32r2_instructions
+#define TCG_TARGET_HAS_clz_i32          use_mips32r2_instructions
+#define TCG_TARGET_HAS_ctz_i32          0
 
 /* optional instructions automatically implemented */
 #define TCG_TARGET_HAS_neg_i32          0 /* sub  rd, zero, rt   */
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 1ecae08..6196d59 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -160,6 +160,7 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_S16  0x400    /* Signed 16-bit: -32768 - 32767 */
 #define TCG_CT_CONST_P2M1 0x800    /* Power of 2 minus 1.  */
 #define TCG_CT_CONST_N16  0x1000   /* "Negatable" 16-bit: -32767 - 32767 */
+#define TCG_CT_CONST_WSZ  0x2000   /* word size */
 
 static inline bool is_p2m1(tcg_target_long val)
 {
@@ -217,6 +218,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
     case 'N':
         ct->ct |= TCG_CT_CONST_N16;
         break;
+    case 'W':
+        ct->ct |= TCG_CT_CONST_WSZ;
+        break;
     case 'Z':
         /* We are cheating a bit here, using the fact that the register
            ZERO is also the register number 0. Hence there is no need
@@ -250,6 +254,8 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
     } else if ((ct & TCG_CT_CONST_P2M1)
                && use_mips32r2_instructions && is_p2m1(val)) {
         return 1;
+    } else if ((ct & TCG_CT_CONST_WSZ) && val == 32) {
+        return 1;
     }
     return 0;
 }
@@ -317,6 +323,7 @@ typedef enum {
     OPC_SLTU     = OPC_SPECIAL | 0x2B,
     OPC_SELEQZ   = OPC_SPECIAL | 0x35,
     OPC_SELNEZ   = OPC_SPECIAL | 0x37,
+    OPC_CLZ_R6   = OPC_SPECIAL | 0120,
 
     OPC_REGIMM   = 0x01 << 26,
     OPC_BLTZ     = OPC_REGIMM | (0x00 << 16),
@@ -324,6 +331,7 @@ typedef enum {
 
     OPC_SPECIAL2 = 0x1c << 26,
     OPC_MUL_R5   = OPC_SPECIAL2 | 0x002,
+    OPC_CLZ      = OPC_SPECIAL2 | 040,
 
     OPC_SPECIAL3 = 0x1f << 26,
     OPC_EXT      = OPC_SPECIAL3 | 0x000,
@@ -1629,6 +1637,31 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_clz_i32:
+        if (use_mips32r6_instructions) {
+            if (a2 == 32) {
+                tcg_out_opc_reg(s, OPC_CLZ_R6, a0, a1, 0);
+            } else {
+                tcg_out_opc_reg(s, OPC_CLZ_R6, TCG_TMP0, a1, 0);
+                tcg_out_movcond(s, TCG_COND_EQ, a0, a1, 0, a2, TCG_TMP0);
+            }
+        } else {
+            if (a2 == 32) {
+                tcg_out_opc_reg(s, OPC_CLZ, a0, a1, a1);
+            } else if (a0 == a2) {
+                tcg_out_opc_reg(s, OPC_CLZ, TCG_TMP0, a1, a1);
+                tcg_out_opc_reg(s, OPC_MOVN, a0, TCG_TMP0, a1);
+            } else if (a0 != a1) {
+                tcg_out_opc_reg(s, OPC_CLZ, a0, a1, a1);
+                tcg_out_opc_reg(s, OPC_MOVZ, a0, a2, a1);
+            } else {
+                tcg_out_opc_reg(s, OPC_CLZ, TCG_TMP0, a1, a1);
+                tcg_out_opc_reg(s, OPC_MOVZ, TCG_TMP0, a2, a1);
+                tcg_out_mov(s, TCG_TYPE_REG, a0, TCG_TMP0);
+            }
+        }
+        break;
+
     case INDEX_op_bswap32_i32:
         tcg_out_opc_reg(s, OPC_WSBH, a0, 0, a1);
         tcg_out_opc_sa(s, OPC_ROTR, a0, a0, 16);
@@ -1731,6 +1764,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_sar_i32, { "r", "rZ", "ri" } },
     { INDEX_op_rotr_i32, { "r", "rZ", "ri" } },
     { INDEX_op_rotl_i32, { "r", "rZ", "ri" } },
+    { INDEX_op_clz_i32,  { "r", "r", "rWZ" } },
 
     { INDEX_op_bswap16_i32, { "r", "r" } },
     { INDEX_op_bswap32_i32, { "r", "r" } },
-- 
2.7.4

* [Qemu-devel] [PATCH 21/25] tcg/s390: Handle clz opcode
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (19 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 20/25] tcg/mips: Handle clz opcode Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 22/25] tcg: Add helpers for clrsb Richard Henderson
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/s390/tcg-target.h     |  2 +-
 tcg/s390/tcg-target.inc.c | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 3ac2dc9..22500ba 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -110,7 +110,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_eqv_i64        0
 #define TCG_TARGET_HAS_nand_i64       0
 #define TCG_TARGET_HAS_nor_i64        0
-#define TCG_TARGET_HAS_clz_i64        0
+#define TCG_TARGET_HAS_clz_i64        (s390_facilities & FACILITY_EXT_IMM)
 #define TCG_TARGET_HAS_ctz_i64        0
 #define TCG_TARGET_HAS_deposit_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_extract_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index f4c510e..81d9563 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -50,7 +50,7 @@
 #define TCG_REG_NONE    0
 
 /* A scratch register that may be be used throughout the backend.  */
-#define TCG_TMP0        TCG_REG_R14
+#define TCG_TMP0        TCG_REG_R1
 
 #ifndef CONFIG_SOFTMMU
 #define TCG_GUEST_BASE_REG TCG_REG_R13
@@ -133,6 +133,7 @@ typedef enum S390Opcode {
     RRE_DLR     = 0xb997,
     RRE_DSGFR   = 0xb91d,
     RRE_DSGR    = 0xb90d,
+    RRE_FLOGR   = 0xb983,
     RRE_LGBR    = 0xb906,
     RRE_LCGR    = 0xb903,
     RRE_LGFR    = 0xb914,
@@ -1245,6 +1246,33 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
     }
 }
 
+static void tgen_clz(TCGContext *s, TCGReg dest, TCGReg a1,
+                     TCGArg a2, int a2const)
+{
+    /* Since this sets both R and R+1, we have no choice but to store the
+       result into R0, allowing R1 == TCG_TMP0 to be clobbered as well.  */
+    QEMU_BUILD_BUG_ON(TCG_TMP0 != TCG_REG_R1);
+    tcg_out_insn(s, RRE, FLOGR, TCG_REG_R0, a1);
+
+    if (a2const && a2 == 64) {
+        tcg_out_mov(s, TCG_TYPE_I64, dest, TCG_REG_R0);
+    } else {
+        if (a2const) {
+            tcg_out_movi(s, TCG_TYPE_I64, dest, a2);
+        } else {
+            tcg_out_mov(s, TCG_TYPE_I64, dest, a2);
+        }
+        if (s390_facilities & FACILITY_LOAD_ON_COND) {
+            /* Emit: if (one bit found) dest = r0.  */
+            tcg_out_insn(s, RRF, LOCGR, dest, TCG_REG_R0, 2);
+        } else {
+            /* Emit: if (no one bit found) goto over; dest = r0; over:  */
+            tcg_out_insn(s, RI, BRC, 8, (4 + 4) >> 1);
+            tcg_out_insn(s, RRE, LGR, dest, TCG_REG_R0);
+        }
+    }
+}
+
 static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
                          int ofs, int len, int z)
 {
@@ -2185,6 +2213,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tgen_extract(s, args[0], args[1], args[2], args[3]);
         break;
 
+    case INDEX_op_clz_i64:
+        tgen_clz(s, args[0], args[1], args[2], const_args[2]);
+        break;
+
     case INDEX_op_mb:
         /* The host memory model is quite strong, we simply need to
            serialize the instruction stream.  */
@@ -2308,6 +2340,8 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_bswap32_i64, { "r", "r" } },
     { INDEX_op_bswap64_i64, { "r", "r" } },
 
+    { INDEX_op_clz_i64, { "r", "r", "ri" } },
+
     { INDEX_op_add2_i64, { "r", "r", "0", "1", "rA", "r" } },
     { INDEX_op_sub2_i64, { "r", "r", "0", "1", "rA", "r" } },
 
-- 
2.7.4
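
As a reading aid for tgen_clz above, here is a small C model of what the
patch relies on from FLOGR: the count lands in R (64 when no one bit is
found), R+1 is clobbered with the input minus its leftmost one bit, and
the condition code distinguishes the two cases.  This is a sketch from
my reading of the instruction's documented behaviour, not the backend
code.  The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the architectural outputs of FLOGR. */
typedef struct {
    uint64_t r;         /* bit position of leftmost one, or 64 */
    uint64_t r_plus_1;  /* input with the leftmost one bit cleared */
    int cc;             /* 0: no one bit found, 2: one bit found */
} flogr_result;

static flogr_result flogr_model(uint64_t src)
{
    flogr_result out;
    if (src == 0) {
        out.r = 64;
        out.r_plus_1 = 0;
        out.cc = 0;
    } else {
        uint64_t top = UINT64_C(1) << 63;
        uint64_t n = 0;
        while (!(src & (top >> n))) {
            n++;
        }
        out.r = n;
        out.r_plus_1 = src & ~(top >> n);
        out.cc = 2;
    }
    return out;
}

/* Mirrors the shape of tgen_clz: preload dest with the zero-input
   value, then overwrite it only when a one bit was found.  */
static uint64_t clz64_select_model(uint64_t src, uint64_t zero_val)
{
    flogr_result f = flogr_model(src);
    uint64_t dest = zero_val;
    assert(f.cc == 0 || f.cc == 2);
    if (f.cc == 2) {
        dest = f.r;
    }
    return dest;
}
```

When zero_val is the constant 64 the select is unnecessary, since the
raw FLOGR result is already correct; that is the first branch of
tgen_clz.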

* [Qemu-devel] [PATCH 22/25] tcg: Add helpers for clrsb
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (20 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 21/25] tcg/s390: " Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 23/25] target-arm: Use clrsb helper Richard Henderson
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel

The number of actual invocations does not warrant a dedicated opcode
and backend support for generating it.  But at least we can eliminate
the redundant per-target helpers.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg-runtime.c     | 10 ++++++++++
 tcg/tcg-op.c      | 28 ++++++++++++++++++++++++++++
 tcg/tcg-op.h      |  4 ++++
 tcg/tcg-runtime.h |  2 ++
 4 files changed, 44 insertions(+)

diff --git a/tcg-runtime.c b/tcg-runtime.c
index eb3bade..c8b98df 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -121,6 +121,16 @@ uint64_t HELPER(ctz_i64)(uint64_t arg, uint64_t zero_val)
     return arg ? ctz64(arg) : zero_val;
 }
 
+uint32_t HELPER(clrsb_i32)(uint32_t arg)
+{
+    return clrsb32(arg);
+}
+
+uint64_t HELPER(clrsb_i64)(uint64_t arg)
+{
+    return clrsb64(arg);
+}
+
 void HELPER(exit_atomic)(CPUArchState *env)
 {
     cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index b45095c..728c4b3 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -489,6 +489,20 @@ void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
     tcg_temp_free_i32(t);
 }
 
+void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg)
+{
+    if (TCG_TARGET_HAS_clz_i32) {
+        TCGv_i32 t = tcg_temp_new_i32();
+        tcg_gen_sari_i32(t, arg, 31);
+        tcg_gen_xor_i32(t, t, arg);
+        tcg_gen_clzi_i32(t, t, 32);
+        tcg_gen_subi_i32(ret, t, 1);
+        tcg_temp_free_i32(t);
+    } else {
+        gen_helper_clrsb_i32(ret, arg);
+    }
+}
+
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
 {
     if (TCG_TARGET_HAS_rot_i32) {
@@ -1789,6 +1803,20 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
     }
 }
 
+void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg)
+{
+    if (TCG_TARGET_HAS_clz_i64 || TCG_TARGET_HAS_clz_i32) {
+        TCGv_i64 t = tcg_temp_new_i64();
+        tcg_gen_sari_i64(t, arg, 63);
+        tcg_gen_xor_i64(t, t, arg);
+        tcg_gen_clzi_i64(t, t, 64);
+        tcg_gen_subi_i64(ret, t, 1);
+        tcg_temp_free_i64(t);
+    } else {
+        gen_helper_clrsb_i64(ret, arg);
+    }
+}
+
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
 {
     if (TCG_TARGET_HAS_rot_i64) {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 7a24e84..c2f3db9 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -290,6 +290,7 @@ void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
 void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
+void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
 void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
@@ -477,6 +478,7 @@ void tcg_gen_clz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
 void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
+void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg);
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
 void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
@@ -970,6 +972,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_ctz_tl tcg_gen_ctz_i64
 #define tcg_gen_clzi_tl tcg_gen_clzi_i64
 #define tcg_gen_ctzi_tl tcg_gen_ctzi_i64
+#define tcg_gen_clrsb_tl tcg_gen_clrsb_i64
 #define tcg_gen_rotl_tl tcg_gen_rotl_i64
 #define tcg_gen_rotli_tl tcg_gen_rotli_i64
 #define tcg_gen_rotr_tl tcg_gen_rotr_i64
@@ -1065,6 +1068,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_ctz_tl tcg_gen_ctz_i32
 #define tcg_gen_clzi_tl tcg_gen_clzi_i32
 #define tcg_gen_ctzi_tl tcg_gen_ctzi_i32
+#define tcg_gen_clrsb_tl tcg_gen_clrsb_i32
 #define tcg_gen_rotl_tl tcg_gen_rotl_i32
 #define tcg_gen_rotli_tl tcg_gen_rotli_i32
 #define tcg_gen_rotr_tl tcg_gen_rotr_i32
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index eb1cd76..0d30f1a 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -19,6 +19,8 @@ DEF_HELPER_FLAGS_2(clz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
 DEF_HELPER_FLAGS_2(ctz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
 DEF_HELPER_FLAGS_2(clz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(ctz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+DEF_HELPER_FLAGS_1(clrsb_i32, TCG_CALL_NO_RWG_SE, i32, i32)
+DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
-- 
2.7.4
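
The inline expansion in tcg_gen_clrsb_i32 above (sari, xor, clzi, subi)
can be checked against a plain C model.  The sketch below follows the
same four steps; clz32_model and clrsb32_model are hypothetical names
used only for this illustration, and the sign mask is computed without
relying on the behaviour of right-shifting a negative value in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of clz with an explicit result for zero input. */
static uint32_t clz32_model(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;
    if (x == 0) {
        return zero_val;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading redundant sign bits, step for step as in the patch. */
static uint32_t clrsb32_model(uint32_t x)
{
    /* sari t, arg, 31: all ones for a negative input, else zero. */
    uint32_t sign = (x & 0x80000000u) ? 0xffffffffu : 0;
    /* xor t, t, arg: the sign bit and its copies become zeros. */
    uint32_t t = sign ^ x;
    /* clzi t, t, 32: bit 31 of t is always clear, so n >= 1. */
    uint32_t n = clz32_model(t, 32);
    assert(n >= 1 && n <= 32);
    /* subi ret, t, 1: do not count the sign bit itself. */
    return n - 1;
}
```

The zero-input value of 32 matters here: for inputs 0 and -1 the xor
yields zero, and the expected result is 31.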

* [Qemu-devel] [PATCH 23/25] target-arm: Use clrsb helper
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (21 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 22/25] tcg: Add helpers for clrsb Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 24/25] target-tricore: " Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 25/25] target-xtensa: " Richard Henderson
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm

Cc: qemu-arm@nongnu.org
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/helper-a64.c    | 10 ----------
 target-arm/helper-a64.h    |  2 --
 target-arm/translate-a64.c |  8 ++++----
 3 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 77999ff..d9df82c 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -54,16 +54,6 @@ int64_t HELPER(sdiv64)(int64_t num, int64_t den)
     return num / den;
 }
 
-uint64_t HELPER(cls64)(uint64_t x)
-{
-    return clrsb64(x);
-}
-
-uint32_t HELPER(cls32)(uint32_t x)
-{
-    return clrsb32(x);
-}
-
 uint64_t HELPER(rbit64)(uint64_t x)
 {
     return revbit64(x);
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index d320f96..6f9eaba 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -18,8 +18,6 @@
  */
 DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
-DEF_HELPER_FLAGS_1(cls64, TCG_CALL_NO_RWG_SE, i64, i64)
-DEF_HELPER_FLAGS_1(cls32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 12621ff..f73d63b 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -3971,11 +3971,11 @@ static void handle_cls(DisasContext *s, unsigned int sf,
     tcg_rn = cpu_reg(s, rn);
 
     if (sf) {
-        gen_helper_cls64(tcg_rd, tcg_rn);
+        tcg_gen_clrsb_i64(tcg_rd, tcg_rn);
     } else {
         TCGv_i32 tcg_tmp32 = tcg_temp_new_i32();
         tcg_gen_extrl_i64_i32(tcg_tmp32, tcg_rn);
-        gen_helper_cls32(tcg_tmp32, tcg_tmp32);
+        tcg_gen_clrsb_i32(tcg_tmp32, tcg_tmp32);
         tcg_gen_extu_i32_i64(tcg_rd, tcg_tmp32);
         tcg_temp_free_i32(tcg_tmp32);
     }
@@ -7592,7 +7592,7 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
         if (u) {
             tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64);
         } else {
-            gen_helper_cls64(tcg_rd, tcg_rn);
+            tcg_gen_clrsb_i64(tcg_rd, tcg_rn);
         }
         break;
     case 0x5: /* NOT */
@@ -10262,7 +10262,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
                     if (u) {
                         tcg_gen_clzi_i32(tcg_res, tcg_op, 32);
                     } else {
-                        gen_helper_cls32(tcg_res, tcg_op);
+                        tcg_gen_clrsb_i32(tcg_res, tcg_op);
                     }
                     break;
                 case 0x7: /* SQABS, SQNEG */
-- 
2.7.4

* [Qemu-devel] [PATCH 24/25] target-tricore: Use clrsb helper
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (22 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 23/25] target-arm: Use clrsb helper Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 25/25] target-xtensa: " Richard Henderson
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Bastian Koppelmann

Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-tricore/helper.h    | 1 -
 target-tricore/op_helper.c | 5 -----
 target-tricore/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target-tricore/helper.h b/target-tricore/helper.h
index 2cf04e1..d215349 100644
--- a/target-tricore/helper.h
+++ b/target-tricore/helper.h
@@ -89,7 +89,6 @@ DEF_HELPER_FLAGS_2(ixmin_u, TCG_CALL_NO_RWG_SE, i64, i64, i32)
 /* count leading ... */
 DEF_HELPER_FLAGS_1(clo_h, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(clz_h, TCG_CALL_NO_RWG_SE, i32, i32)
-DEF_HELPER_FLAGS_1(cls, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(cls_h, TCG_CALL_NO_RWG_SE, i32, i32)
 /* sh */
 DEF_HELPER_FLAGS_2(sh, TCG_CALL_NO_RWG_SE, i32, i32, i32)
diff --git a/target-tricore/op_helper.c b/target-tricore/op_helper.c
index 3731d5e..7af202c 100644
--- a/target-tricore/op_helper.c
+++ b/target-tricore/op_helper.c
@@ -1769,11 +1769,6 @@ uint32_t helper_clz_h(target_ulong r1)
     return ret_hw0 | (ret_hw1 << 16);
 }
 
-uint32_t helper_cls(target_ulong r1)
-{
-    return clrsb32(r1);
-}
-
 uint32_t helper_cls_h(target_ulong r1)
 {
     uint32_t ret_hw0 = extract32(r1, 0, 16);
diff --git a/target-tricore/translate.c b/target-tricore/translate.c
index 69cdfb9..41b1d27 100644
--- a/target-tricore/translate.c
+++ b/target-tricore/translate.c
@@ -6374,7 +6374,7 @@ static void decode_rr_logical_shift(CPUTriCoreState *env, DisasContext *ctx)
         gen_helper_clo_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
         break;
     case OPC2_32_RR_CLS:
-        gen_helper_cls(cpu_gpr_d[r3], cpu_gpr_d[r1]);
+        tcg_gen_clrsb_tl(cpu_gpr_d[r3], cpu_gpr_d[r1]);
         break;
     case OPC2_32_RR_CLS_H:
         gen_helper_cls_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
-- 
2.7.4

* [Qemu-devel] [PATCH 25/25] target-xtensa: Use clrsb helper
  2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
                   ` (23 preceding siblings ...)
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 24/25] target-tricore: " Richard Henderson
@ 2016-11-16 19:25 ` Richard Henderson
  24 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-16 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Max Filippov

Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-xtensa/translate.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/target-xtensa/translate.c b/target-xtensa/translate.c
index 5c719a4..5a93705 100644
--- a/target-xtensa/translate.c
+++ b/target-xtensa/translate.c
@@ -1372,16 +1372,7 @@ static void disas_xtensa_insn(CPUXtensaState *env, DisasContext *dc)
                 case 14: /*NSAu*/
                     HAS_OPTION(XTENSA_OPTION_MISC_OP_NSA);
                     if (gen_window_check2(dc, RRR_S, RRR_T)) {
-                        TCGv_i32 t0 = tcg_temp_new_i32();
-
-                        /* if (v & 0x80000000) v = ~v; */
-                        tcg_gen_sari_i32(t0, cpu_R[RRR_S], 31);
-                        tcg_gen_xor_i32(t0, t0, cpu_R[RRR_S]);
-
-                        /* r = (v ? clz(v) : 32) - 1; */
-                        tcg_gen_clzi_i32(t0, t0, 32);
-                        tcg_gen_subi_i32(cpu_R[RRR_T], t0, 1);
-                        tcg_temp_free_i32(t0);
+                        tcg_gen_clrsb_i32(cpu_R[RRR_T], cpu_R[RRR_S]);
                     }
                     break;
 
-- 
2.7.4

* Re: [Qemu-devel] [PATCH 07/25] target-ppc: Use clz and ctz opcodes
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 07/25] target-ppc: " Richard Henderson
@ 2016-11-17  3:09   ` David Gibson
  0 siblings, 0 replies; 40+ messages in thread
From: David Gibson @ 2016-11-17  3:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-ppc

On Wed, Nov 16, 2016 at 08:25:17PM +0100, Richard Henderson wrote:
> Cc: qemu-ppc@nongnu.org
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target-ppc/helper.h     |  4 ----
>  target-ppc/int_helper.c | 20 --------------------
>  target-ppc/translate.c  | 20 ++++++++++++++++----
>  3 files changed, 16 insertions(+), 28 deletions(-)
> 
> diff --git a/target-ppc/helper.h b/target-ppc/helper.h
> index da00f0a..1ed1d2c 100644
> --- a/target-ppc/helper.h
> +++ b/target-ppc/helper.h
> @@ -38,16 +38,12 @@ DEF_HELPER_4(divde, i64, env, i64, i64, i32)
>  DEF_HELPER_4(divweu, tl, env, tl, tl, i32)
>  DEF_HELPER_4(divwe, tl, env, tl, tl, i32)
>  
> -DEF_HELPER_FLAGS_1(cntlzw, TCG_CALL_NO_RWG_SE, tl, tl)
> -DEF_HELPER_FLAGS_1(cnttzw, TCG_CALL_NO_RWG_SE, tl, tl)
>  DEF_HELPER_FLAGS_1(popcntb, TCG_CALL_NO_RWG_SE, tl, tl)
>  DEF_HELPER_FLAGS_1(popcntw, TCG_CALL_NO_RWG_SE, tl, tl)
>  DEF_HELPER_FLAGS_2(cmpb, TCG_CALL_NO_RWG_SE, tl, tl, tl)
>  DEF_HELPER_3(sraw, tl, env, tl, tl)
>  #if defined(TARGET_PPC64)
>  DEF_HELPER_FLAGS_2(cmpeqb, TCG_CALL_NO_RWG_SE, i32, tl, tl)
> -DEF_HELPER_FLAGS_1(cntlzd, TCG_CALL_NO_RWG_SE, tl, tl)
> -DEF_HELPER_FLAGS_1(cnttzd, TCG_CALL_NO_RWG_SE, tl, tl)
>  DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_NO_RWG_SE, tl, tl)
>  DEF_HELPER_FLAGS_2(bpermd, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>  DEF_HELPER_3(srad, tl, env, tl, tl)
> diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
> index 9ac204a..a6486ce 100644
> --- a/target-ppc/int_helper.c
> +++ b/target-ppc/int_helper.c
> @@ -141,16 +141,6 @@ uint64_t helper_divde(CPUPPCState *env, uint64_t rau, uint64_t rbu, uint32_t oe)
>  #endif
>  
>  
> -target_ulong helper_cntlzw(target_ulong t)
> -{
> -    return clz32(t);
> -}
> -
> -target_ulong helper_cnttzw(target_ulong t)
> -{
> -    return ctz32(t);
> -}
> -
>  #if defined(TARGET_PPC64)
>  /* if x = 0xab, returns 0xababababababababa */
>  #define pattern(x) (((x) & 0xff) * (~(target_ulong)0 / 0xff))
> @@ -174,16 +164,6 @@ uint32_t helper_cmpeqb(target_ulong ra, target_ulong rb)
>  #undef haszero
>  #undef hasvalue
>  
> -target_ulong helper_cntlzd(target_ulong t)
> -{
> -    return clz64(t);
> -}
> -
> -target_ulong helper_cnttzd(target_ulong t)
> -{
> -    return ctz64(t);
> -}
> -
>  /* Return invalid random number.
>   *
>   * FIXME: Add rng backend or other mechanism to get cryptographically suitable
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 435c6f0..1224f56 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -1641,7 +1641,13 @@ static void gen_andis_(DisasContext *ctx)
>  /* cntlzw */
>  static void gen_cntlzw(DisasContext *ctx)
>  {
> -    gen_helper_cntlzw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
> +    TCGv_i32 t = tcg_temp_new_i32();
> +
> +    tcg_gen_trunc_tl_i32(t, cpu_gpr[rS(ctx->opcode)]);
> +    tcg_gen_clzi_i32(t, t, 32);
> +    tcg_gen_extu_i32_tl(cpu_gpr[rA(ctx->opcode)], t);
> +    tcg_temp_free_i32(t);
> +
>      if (unlikely(Rc(ctx->opcode) != 0))
>          gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
>  }
> @@ -1649,7 +1655,13 @@ static void gen_cntlzw(DisasContext *ctx)
>  /* cnttzw */
>  static void gen_cnttzw(DisasContext *ctx)
>  {
> -    gen_helper_cnttzw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
> +    TCGv_i32 t = tcg_temp_new_i32();
> +
> +    tcg_gen_trunc_tl_i32(t, cpu_gpr[rS(ctx->opcode)]);
> +    tcg_gen_ctzi_i32(t, t, 32);
> +    tcg_gen_extu_i32_tl(cpu_gpr[rA(ctx->opcode)], t);
> +    tcg_temp_free_i32(t);
> +
>      if (unlikely(Rc(ctx->opcode) != 0)) {
>          gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
>      }
> @@ -1891,7 +1903,7 @@ GEN_LOGICAL1(extsw, tcg_gen_ext32s_tl, 0x1E, PPC_64B);
>  /* cntlzd */
>  static void gen_cntlzd(DisasContext *ctx)
>  {
> -    gen_helper_cntlzd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
> +    tcg_gen_clzi_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)], 64);
>      if (unlikely(Rc(ctx->opcode) != 0))
>          gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
>  }
> @@ -1899,7 +1911,7 @@ static void gen_cntlzd(DisasContext *ctx)
>  /* cnttzd */
>  static void gen_cnttzd(DisasContext *ctx)
>  {
> -    gen_helper_cnttzd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
> +    tcg_gen_ctzi_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)], 64);
>      if (unlikely(Rc(ctx->opcode) != 0)) {
>          gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
>      }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 18/25] tcg/aarch64: Handle ctz and clz opcodes
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 18/25] tcg/aarch64: " Richard Henderson
@ 2016-11-17 11:53   ` Richard Henderson
  2016-11-22 10:41     ` Alex Bennée
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2016-11-17 11:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Claudio Fontana

On 11/16/2016 08:25 PM, Richard Henderson wrote:
> @@ -206,6 +206,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
>      if ((ct & TCG_CT_CONST_MONE) && val == -1) {
>          return 1;
>      }
> +    if ((ct & TCG_CT_CONST_WSZ) && val == (type ? 64 : 32)) {
> +        return 1;
> +    }
>
>      return 0;
>  }

Bah.  Forgot to revert this hunk at the last minute.


r~

* Re: [Qemu-devel] [PATCH 10/25] target-tricore: Use clz opcode
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 10/25] target-tricore: Use clz opcode Richard Henderson
@ 2016-11-17 14:42   ` Bastian Koppelmann
  2016-11-17 15:47     ` Bastian Koppelmann
  0 siblings, 1 reply; 40+ messages in thread
From: Bastian Koppelmann @ 2016-11-17 14:42 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 11/16/2016 08:25 PM, Richard Henderson wrote:
> diff --git a/target-tricore/translate.c b/target-tricore/translate.c
> index 36f734a..69cdfb9 100644
> --- a/target-tricore/translate.c
> +++ b/target-tricore/translate.c
> @@ -6367,7 +6367,8 @@ static void decode_rr_logical_shift(CPUTriCoreState *env, DisasContext *ctx)
>          tcg_gen_andc_tl(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
>          break;
>      case OPC2_32_RR_CLO:
> -        gen_helper_clo(cpu_gpr_d[r3], cpu_gpr_d[r1]);
> +        tcg_gen_not_tl(cpu_gpr_d[r3], cpu_gpr_d[r1]);
> +        tcg_gen_clzi_tl(cpu_gpr_d[r3], cpu_gpr_d[r3], TARGET_LONG_BITS);

This doesn't work for r1 = 0. It returns 0x1f, but should return 0. I
guess the error is not here, but I couldn't figure out where exactly it is.

Cheers,
    Bastian

* Re: [Qemu-devel] [PATCH 10/25] target-tricore: Use clz opcode
  2016-11-17 14:42   ` Bastian Koppelmann
@ 2016-11-17 15:47     ` Bastian Koppelmann
  0 siblings, 0 replies; 40+ messages in thread
From: Bastian Koppelmann @ 2016-11-17 15:47 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 11/17/2016 03:42 PM, Bastian Koppelmann wrote:
> On 11/16/2016 08:25 PM, Richard Henderson wrote:
>> diff --git a/target-tricore/translate.c b/target-tricore/translate.c
>> index 36f734a..69cdfb9 100644
>> --- a/target-tricore/translate.c
>> +++ b/target-tricore/translate.c
>> @@ -6367,7 +6367,8 @@ static void decode_rr_logical_shift(CPUTriCoreState *env, DisasContext *ctx)
>>          tcg_gen_andc_tl(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
>>          break;
>>      case OPC2_32_RR_CLO:
>> -        gen_helper_clo(cpu_gpr_d[r3], cpu_gpr_d[r1]);
>> +        tcg_gen_not_tl(cpu_gpr_d[r3], cpu_gpr_d[r1]);
>> +        tcg_gen_clzi_tl(cpu_gpr_d[r3], cpu_gpr_d[r3], TARGET_LONG_BITS);
> 
> This doesn't work for r1 = 0. It returns 0x1f, but should return 0. I
> guess the error is not here, but I couldn't figure out where exactly it is.

Ah I forgot to mention -- I'm running this on x86_64.

Cheers,
    Bastian

* Re: [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes Richard Henderson
@ 2016-11-17 16:50   ` Bastian Koppelmann
  2016-11-17 19:53     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Bastian Koppelmann @ 2016-11-17 16:50 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 11/16/2016 08:25 PM, Richard Henderson wrote:
> +
> +    OP_32_64(clz):
> +        if (const_args[2]) {
> +            tcg_debug_assert(have_bmi1);
> +            tcg_debug_assert(args[2] == (rexw ? 64 : 32));
> +            tcg_out_modrm(s, OPC_LZCNT + rexw, args[0], args[1]);
> +        } else {
> +            /* ??? See above.  */
> +            tcg_out_modrm(s, OPC_BSR + rexw, args[0], args[1]);

The Intel ISA manual states that BSR finds the bit index of the most
significant set bit, where the least significant bit is index 0. So for the
input 0x2 it returns 1, which is not the number of leading zeros.
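
A minimal C sketch of the difference, assuming 32-bit operands. The
function names bsr32_model and clz32_from_bsr are hypothetical and exist
only for this illustration; the model covers nonzero inputs only, since
the manual leaves the BSR destination undefined for a zero source.

```c
#include <stdint.h>

/* Model of x86 BSR for a nonzero 32-bit input: the index of the most
   significant set bit, with the least significant bit at index 0. */
static unsigned bsr32_model(uint32_t x)
{
    unsigned idx = 0;

    while (x >>= 1) {
        idx++;
    }
    return idx;
}

/* Leading-zero count derived from the BSR index.  For an index in
   0..31, 31 - idx and 31 ^ idx give the same value. */
static unsigned clz32_from_bsr(uint32_t x)
{
    return 31 ^ bsr32_model(x);
}
```

For the input 0x2 the model gives an index of 1, while the leading-zero
count is 30. For 0xffffffff the index is 31 (0x1f) and the count is 0,
which would be consistent with the 0x1f reported for the tricore CLO case
if the BSR result were used unadjusted.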

Cheers,
    Bastian

* Re: [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes
  2016-11-17 16:50   ` Bastian Koppelmann
@ 2016-11-17 19:53     ` Richard Henderson
  2016-11-17 19:59       ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2016-11-17 19:53 UTC (permalink / raw)
  To: Bastian Koppelmann, qemu-devel

On 11/17/2016 05:50 PM, Bastian Koppelmann wrote:
> On 11/16/2016 08:25 PM, Richard Henderson wrote:
>> +
>> +    OP_32_64(clz):
>> +        if (const_args[2]) {
>> +            tcg_debug_assert(have_bmi1);
>> +            tcg_debug_assert(args[2] == (rexw ? 64 : 32));
>> +            tcg_out_modrm(s, OPC_LZCNT + rexw, args[0], args[1]);
>> +        } else {
>> +            /* ??? See above.  */
>> +            tcg_out_modrm(s, OPC_BSR + rexw, args[0], args[1]);
>
> The Intel ISA manual states that it find the bit index of the most
> significant bit, where the least significant bit is index 0. So for the
> input 0x2 this should return 1. However this is not the number of
> leading zeros.

Oh, of course you're right.  I thought I was testing this, but while alpha does 
have this operation, it turns out it isn't used much.


r~

* Re: [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes
  2016-11-17 19:53     ` Richard Henderson
@ 2016-11-17 19:59       ` Richard Henderson
  2016-11-17 22:09         ` Bastian Koppelmann
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2016-11-17 19:59 UTC (permalink / raw)
  To: Bastian Koppelmann, qemu-devel

On 11/17/2016 08:53 PM, Richard Henderson wrote:
> On 11/17/2016 05:50 PM, Bastian Koppelmann wrote:
>> On 11/16/2016 08:25 PM, Richard Henderson wrote:
>>> +
>>> +    OP_32_64(clz):
>>> +        if (const_args[2]) {
>>> +            tcg_debug_assert(have_bmi1);
>>> +            tcg_debug_assert(args[2] == (rexw ? 64 : 32));
>>> +            tcg_out_modrm(s, OPC_LZCNT + rexw, args[0], args[1]);
>>> +        } else {
>>> +            /* ??? See above.  */
>>> +            tcg_out_modrm(s, OPC_BSR + rexw, args[0], args[1]);
>>
>> The Intel ISA manual states that it find the bit index of the most
>> significant bit, where the least significant bit is index 0. So for the
>> input 0x2 this should return 1. However this is not the number of
>> leading zeros.
>
> Oh, of course you're right.  I thought I was testing this, but while alpha does
> have this operation, it turns out it isn't used much.

Alternately, what I tested was on a haswell machine, which takes the LZCNT 
path, which *does* produce the intended results.  Just the BSR path doesn't.


r~

* Re: [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes
  2016-11-17 19:59       ` Richard Henderson
@ 2016-11-17 22:09         ` Bastian Koppelmann
  2016-11-17 23:03           ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Bastian Koppelmann @ 2016-11-17 22:09 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 11/17/2016 08:59 PM, Richard Henderson wrote:
> On 11/17/2016 08:53 PM, Richard Henderson wrote:
>> On 11/17/2016 05:50 PM, Bastian Koppelmann wrote:
>>> On 11/16/2016 08:25 PM, Richard Henderson wrote:
>>>> +
>>>> +    OP_32_64(clz):
>>>> +        if (const_args[2]) {
>>>> +            tcg_debug_assert(have_bmi1);
>>>> +            tcg_debug_assert(args[2] == (rexw ? 64 : 32));
>>>> +            tcg_out_modrm(s, OPC_LZCNT + rexw, args[0], args[1]);
>>>> +        } else {
>>>> +            /* ??? See above.  */
>>>> +            tcg_out_modrm(s, OPC_BSR + rexw, args[0], args[1]);
>>>
>>> The Intel ISA manual states that it find the bit index of the most
>>> significant bit, where the least significant bit is index 0. So for the
>>> input 0x2 this should return 1. However this is not the number of
>>> leading zeros.
>>
>> Oh, of course you're right.  I thought I was testing this, but while
>> alpha does
>> have this operation, it turns out it isn't used much.
> 
> Alternately, what I tested was on a haswell machine, which takes the
> LZCNT path, which *does* produce the intended results.  Just the BSR
> path doesn't.

Luckily my old laptop is a Core 2 Duo without LZCNT :)

Cheers,
    Bastian

* Re: [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes
  2016-11-17 22:09         ` Bastian Koppelmann
@ 2016-11-17 23:03           ` Richard Henderson
  2016-11-18 12:48             ` Bastian Koppelmann
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2016-11-17 23:03 UTC (permalink / raw)
  To: Bastian Koppelmann, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1488 bytes --]

On 11/17/2016 11:09 PM, Bastian Koppelmann wrote:
> On 11/17/2016 08:59 PM, Richard Henderson wrote:
>> On 11/17/2016 08:53 PM, Richard Henderson wrote:
>>> On 11/17/2016 05:50 PM, Bastian Koppelmann wrote:
>>>> On 11/16/2016 08:25 PM, Richard Henderson wrote:
>>>>> +
>>>>> +    OP_32_64(clz):
>>>>> +        if (const_args[2]) {
>>>>> +            tcg_debug_assert(have_bmi1);
>>>>> +            tcg_debug_assert(args[2] == (rexw ? 64 : 32));
>>>>> +            tcg_out_modrm(s, OPC_LZCNT + rexw, args[0], args[1]);
>>>>> +        } else {
>>>>> +            /* ??? See above.  */
>>>>> +            tcg_out_modrm(s, OPC_BSR + rexw, args[0], args[1]);
>>>>
>>>> The Intel ISA manual states that it find the bit index of the most
>>>> significant bit, where the least significant bit is index 0. So for the
>>>> input 0x2 this should return 1. However this is not the number of
>>>> leading zeros.
>>>
>>> Oh, of course you're right.  I thought I was testing this, but while
>>> alpha does
>>> have this operation, it turns out it isn't used much.
>>
>> Alternately, what I tested was on a haswell machine, which takes the
>> LZCNT path, which *does* produce the intended results.  Just the BSR
>> path doesn't.
>
> Luckily my old laptop is a Core 2 Duo without LZCNT :)

Heh.  Well, I've given it another few tests with LZCNT hacked off, and with 
i686 32-bit.  Here's an incremental update.  Wherein I also note that lzcnt 
isn't in the same cpuid flag as tzcnt.  Double whoops.
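
The zero-input handling in the BSR fallback of the attached tcg_out_clz
can be modelled in plain C roughly as follows. This is a sketch of one
branch only (the case where the JNE skips the load of the zero value),
for 32-bit operands; bsr32_model and clz32_fallback_model are invented
names, and the real code emits host instructions rather than computing
anything itself.

```c
#include <stdint.h>

/* Model of BSR for a nonzero input: index of the most significant
   set bit.  Hypothetical helper for illustration. */
static unsigned bsr32_model(uint32_t x)
{
    unsigned idx = 0;

    while (x >>= 1) {
        idx++;
    }
    return idx;
}

/* Model of the fallback: arg2 is the value to return for a zero input. */
static uint32_t clz32_fallback_model(uint32_t arg1, uint32_t arg2)
{
    const uint32_t rev = 31;    /* SIZE - 1 for 32-bit operands */
    uint32_t dest;

    if (arg1 != 0) {
        /* BSR clears ZF, so the JNE skips the load below. */
        dest = bsr32_model(arg1);
    } else {
        /* Load the zero value pre-flipped, so that the shared XOR
           after the label restores it. */
        dest = arg2 ^ rev;
    }
    /* The XOR emitted after the label: turns a BSR index into a
       leading-zero count, or undoes the pre-flip above. */
    return dest ^ rev;
}
```

Pre-flipping the zero value lets both paths share the final XOR, so only
one conditional jump is needed.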


r~


[-- Attachment #2: 0001-fixup-tcg-i386.patch --]
[-- Type: text/x-patch, Size: 5865 bytes --]

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 3eeb58f..c3f7adc 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -139,6 +139,11 @@ static bool have_bmi2;
 #else
 # define have_bmi2 0
 #endif
+#if defined(CONFIG_CPUID_H) && defined(bit_LZCNT)
+static bool have_lzcnt;
+#else
+# define have_lzcnt 0
+#endif
 
 static tcg_insn_unit *tb_ret_addr;
 
@@ -1148,6 +1153,76 @@ static void tcg_out_movcond64(TCGContext *s, TCGCond cond, TCGReg dest,
 }
 #endif
 
+static void tcg_out_ctz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
+                        TCGArg arg2, bool a2const)
+{
+    if (a2const) {
+        tcg_debug_assert(have_bmi1);
+        tcg_debug_assert(arg2 == (rexw ? 64 : 32));
+        tcg_out_modrm(s, OPC_TZCNT + rexw, dest, arg1);
+    } else {
+        /* ??? The manual says that the output is undefined when the
+           input is zero, but real hardware leaves it unchanged.  As
+           noted in target-i386/translate.c, real programs depend on
+           this -- now we are one more of those.  */
+        /* ??? We could avoid this if TCG had an early clobber marking
+           for the output.  */
+        tcg_out_modrm(s, OPC_BSF + rexw, dest, arg1);
+        if (dest != arg2) {
+            tcg_out_cmov(s, TCG_COND_EQ, rexw, dest, arg2);
+        }
+    }
+}
+
+static void tcg_out_clz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
+                        TCGArg arg2, bool a2const)
+{
+    TCGLabel *over;
+    TCGType type;
+    unsigned rev;
+
+    /* ??? All this would be easier (and would avoid the semi-undefined
+       behaviour) if TCG had an early clobber marking for the output.  */
+
+    if (have_lzcnt) {
+        if (a2const && arg2 == (rexw ? 64 : 32)) {
+            tcg_out_modrm(s, OPC_LZCNT + rexw, dest, arg1);
+            return;
+        }
+        if (!a2const && dest != arg2) {
+            tcg_out_modrm(s, OPC_LZCNT + rexw, dest, arg1);
+            tcg_out_cmov(s, TCG_COND_LTU, rexw, dest, arg2);
+            return;
+        }
+    }
+
+    over = gen_new_label();
+    type = rexw ? TCG_TYPE_I64: TCG_TYPE_I32;
+    rev = rexw ? 63 : 31;
+
+    tcg_out_modrm(s, OPC_BSR + rexw, dest, arg1);
+
+    /* Recall that the output of BSR is the index not the count.
+       Therefore we must adjust the result by ^ (SIZE-1).  In some
+       cases below, we prefer an extra XOR to an extra JMP.  */
+    if (!a2const && dest == arg2) {
+        /* ??? See the comment in tcg_out_ctz re BSF.  */
+        tcg_out_jxx(s, tcg_cond_to_jcc[TCG_COND_EQ], over, 1);
+        tgen_arithi(s, ARITH_XOR + rexw, dest, rev, 0);
+        tcg_out_label(s, over, s->code_ptr);
+    } else {
+        tcg_out_jxx(s, tcg_cond_to_jcc[TCG_COND_NE], over, 1);
+        if (a2const) {
+            tcg_out_movi(s, type, dest, arg2 ^ rev);
+        } else {
+            tcg_out_mov(s, type, dest, arg2);
+            tgen_arithi(s, ARITH_XOR + rexw, dest, rev, 0);
+        }
+        tcg_out_label(s, over, s->code_ptr);
+        tgen_arithi(s, ARITH_XOR + rexw, dest, rev, 0);
+    }
+}
+
 static void tcg_out_branch(TCGContext *s, int call, tcg_insn_unit *dest)
 {
     intptr_t disp = tcg_pcrel_diff(s, dest) - 5;
@@ -2024,34 +2099,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     OP_32_64(ctz):
-        if (const_args[2]) {
-            tcg_debug_assert(have_bmi1);
-            tcg_debug_assert(args[2] == (rexw ? 64 : 32));
-            tcg_out_modrm(s, OPC_TZCNT + rexw, args[0], args[1]);
-        } else {
-            /* ??? The manual says that the output is undefined when the
-               input is zero, but real hardware leaves it unchanged.  As
-               noted in target-i386/translate.c, real programs depend on
-               this -- now we are one more of those.  */
-            tcg_out_modrm(s, OPC_BSF + rexw, args[0], args[1]);
-            if (args[0] != args[2]) {
-                tcg_out_cmov(s, TCG_COND_EQ, rexw, args[0], args[2]);
-            }
-        }
+        tcg_out_ctz(s, rexw, args[0], args[1], args[2], const_args[2]);
         break;
-
     OP_32_64(clz):
-        if (const_args[2]) {
-            tcg_debug_assert(have_bmi1);
-            tcg_debug_assert(args[2] == (rexw ? 64 : 32));
-            tcg_out_modrm(s, OPC_LZCNT + rexw, args[0], args[1]);
-        } else {
-            /* ??? See above.  */
-            tcg_out_modrm(s, OPC_BSR + rexw, args[0], args[1]);
-            if (args[0] != args[2]) {
-                tcg_out_cmov(s, TCG_COND_EQ, rexw, args[0], args[2]);
-            }
-        }
+        tcg_out_clz(s, rexw, args[0], args[1], args[2], const_args[2]);
         break;
 
     case INDEX_op_brcond_i32:
@@ -2281,7 +2332,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_sar_i32, { "r", "0", "Ci" } },
     { INDEX_op_rotl_i32, { "r", "0", "ci" } },
     { INDEX_op_rotr_i32, { "r", "0", "ci" } },
-    { INDEX_op_clz_i32, { "r", "r", "rW" } },
+    { INDEX_op_clz_i32, { "r", "r", "ri" } },
     { INDEX_op_ctz_i32, { "r", "r", "rW" } },
 
     { INDEX_op_brcond_i32, { "r", "ri" } },
@@ -2344,7 +2395,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_sar_i64, { "r", "0", "Ci" } },
     { INDEX_op_rotl_i64, { "r", "0", "ci" } },
     { INDEX_op_rotr_i64, { "r", "0", "ci" } },
-    { INDEX_op_clz_i64, { "r", "r", "rW" } },
+    { INDEX_op_clz_i64, { "r", "r", "re" } },
     { INDEX_op_ctz_i64, { "r", "r", "rW" } },
 
     { INDEX_op_brcond_i64, { "r", "re" } },
@@ -2498,6 +2549,10 @@ static void tcg_target_init(TCGContext *s)
            need to probe for it.  */
         have_movbe = (c & bit_MOVBE) != 0;
 #endif
+#ifndef have_lzcnt
+        /* LZCNT was introduced with AMD Barcelona and Intel Haswell CPUs.  */
+        have_lzcnt = (c & bit_LZCNT) != 0;
+#endif
     }
 
     if (max >= 7) {

* Re: [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes
  2016-11-17 23:03           ` Richard Henderson
@ 2016-11-18 12:48             ` Bastian Koppelmann
  2016-11-21 10:37               ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Bastian Koppelmann @ 2016-11-18 12:48 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 11/18/2016 12:03 AM, Richard Henderson wrote:
> On 11/17/2016 11:09 PM, Bastian Koppelmann wrote:
>> On 11/17/2016 08:59 PM, Richard Henderson wrote:
>>> On 11/17/2016 08:53 PM, Richard Henderson wrote:
>>>> On 11/17/2016 05:50 PM, Bastian Koppelmann wrote:
>>>>> On 11/16/2016 08:25 PM, Richard Henderson wrote:
>>>>>> +
>>>>>> +    OP_32_64(clz):
>>>>>> +        if (const_args[2]) {
>>>>>> +            tcg_debug_assert(have_bmi1);
>>>>>> +            tcg_debug_assert(args[2] == (rexw ? 64 : 32));
>>>>>> +            tcg_out_modrm(s, OPC_LZCNT + rexw, args[0], args[1]);
>>>>>> +        } else {
>>>>>> +            /* ??? See above.  */
>>>>>> +            tcg_out_modrm(s, OPC_BSR + rexw, args[0], args[1]);
>>>>>
>>>>> The Intel ISA manual states that it find the bit index of the most
>>>>> significant bit, where the least significant bit is index 0. So for
>>>>> the
>>>>> input 0x2 this should return 1. However this is not the number of
>>>>> leading zeros.
>>>>
>>>> Oh, of course you're right.  I thought I was testing this, but while
>>>> alpha does
>>>> have this operation, it turns out it isn't used much.
>>>
>>> Alternately, what I tested was on a haswell machine, which takes the
>>> LZCNT path, which *does* produce the intended results.  Just the BSR
>>> path doesn't.
>>
>> Luckily my old laptop is a Core 2 Duo without LZCNT :)
> 
> Heh.  Well, I've given it another few tests with LZCNT hacked off, and
> with i686 32-bit.  Here's an incremental update.  Wherein I also note
> that lzcnt isn't in the same cpuid flag as tzcnt.  Double whoops.

My processor[1] seems to lie about the LZCNT cpuid flag. It says it has
LZCNT but executes it as BSR. According to [2], the ABM flag is used to
indicate LZCNT support.
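
A sketch of probing that flag, assuming a GCC or Clang host that provides
<cpuid.h> and its __get_cpuid() helper. The ABM/LZCNT bit is bit 5 of ECX
in the extended leaf 0x80000001; the names EXT_ECX_LZCNT, ecx_has_lzcnt
and host_has_lzcnt are made up for this example and are not QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Bit 5 of ECX in CPUID leaf 0x80000001 (ABM on AMD, LZCNT on Intel).
   Local name chosen for this sketch. */
#define EXT_ECX_LZCNT (1u << 5)

/* Decode the flag from an ECX value read from leaf 0x80000001. */
static bool ecx_has_lzcnt(uint32_t ext_ecx)
{
    return (ext_ecx & EXT_ECX_LZCNT) != 0;
}

/* Query the host; reports false on non-x86 builds or when the
   extended leaf is not available. */
static bool host_has_lzcnt(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int a, b, c, d;

    /* __get_cpuid returns 0 if the requested leaf is unsupported. */
    if (__get_cpuid(0x80000001u, &a, &b, &c, &d)) {
        return ecx_has_lzcnt(c);
    }
#endif
    return false;
}
```

Bit 5 of ECX in the basic leaf 1 is VMX, and vmx does appear in the flags
quoted in [1]. If the patch tests bit_LZCNT against the leaf 1 value,
that may explain the false positive.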

Cheers,
    Bastian


[1]
$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     P8400  @ 2.26GHz
stepping	: 10
microcode	: 0xa0b
cpu MHz		: 1600.000
cache size	: 3072 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf eagerfpu pni
dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave
lahf_lm tpr_shadow vnmi flexpriority dtherm ida
bugs		:
bogomips	: 4523.35
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

[2] https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets

* Re: [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes
  2016-11-18 12:48             ` Bastian Koppelmann
@ 2016-11-21 10:37               ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-21 10:37 UTC (permalink / raw)
  To: Bastian Koppelmann, qemu-devel

On 11/18/2016 01:48 PM, Bastian Koppelmann wrote:
> On 11/18/2016 12:03 AM, Richard Henderson wrote:
>> On 11/17/2016 11:09 PM, Bastian Koppelmann wrote:
>>> On 11/17/2016 08:59 PM, Richard Henderson wrote:
>>>> On 11/17/2016 08:53 PM, Richard Henderson wrote:
>>>>> On 11/17/2016 05:50 PM, Bastian Koppelmann wrote:
>>>>>> On 11/16/2016 08:25 PM, Richard Henderson wrote:
>>>>>>> +
>>>>>>> +    OP_32_64(clz):
>>>>>>> +        if (const_args[2]) {
>>>>>>> +            tcg_debug_assert(have_bmi1);
>>>>>>> +            tcg_debug_assert(args[2] == (rexw ? 64 : 32));
>>>>>>> +            tcg_out_modrm(s, OPC_LZCNT + rexw, args[0], args[1]);
>>>>>>> +        } else {
>>>>>>> +            /* ??? See above.  */
>>>>>>> +            tcg_out_modrm(s, OPC_BSR + rexw, args[0], args[1]);
>>>>>>
>>>>>> The Intel ISA manual states that it find the bit index of the most
>>>>>> significant bit, where the least significant bit is index 0. So for
>>>>>> the
>>>>>> input 0x2 this should return 1. However this is not the number of
>>>>>> leading zeros.
>>>>>
>>>>> Oh, of course you're right.  I thought I was testing this, but while
>>>>> alpha does
>>>>> have this operation, it turns out it isn't used much.
>>>>
>>>> Alternately, what I tested was on a haswell machine, which takes the
>>>> LZCNT path, which *does* produce the intended results.  Just the BSR
>>>> path doesn't.
>>>
>>> Luckily my old laptop is a Core 2 Duo without LZCNT :)
>>
>> Heh.  Well, I've given it another few tests with LZCNT hacked off, and
>> with i686 32-bit.  Here's an incremental update.  Wherein I also note
>> that lzcnt isn't in the same cpuid flag as tzcnt.  Double whoops.
>
> My processor[1] seems to lie about the LZCNT cpuid flag. It says it has
> LZCNT but executes it as BSR. According to [2] ABM flag is used to
> indicate LZCNT support.

Yes, the gcc cpuid.h comment for the lzcnt bit, i.e. to which leaf it should 
apply, is wrong.  I'll get that fixed in the next revision.


r~

* Re: [Qemu-devel] [PATCH 01/25] tcg: Add clz and ctz opcodes
  2016-11-16 19:25 ` [Qemu-devel] [PATCH 01/25] tcg: Add clz and ctz opcodes Richard Henderson
@ 2016-11-21 15:11   ` Alex Bennée
  2016-11-21 16:05     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Bennée @ 2016-11-21 15:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg-runtime.c            | 20 +++++++++++
>  tcg/README               |  8 +++++
>  tcg/aarch64/tcg-target.h |  4 +++
>  tcg/arm/tcg-target.h     |  2 ++
>  tcg/i386/tcg-target.h    |  4 +++
>  tcg/ia64/tcg-target.h    |  4 +++
>  tcg/mips/tcg-target.h    |  2 ++
>  tcg/optimize.c           | 36 ++++++++++++++++++++
>  tcg/ppc/tcg-target.h     |  4 +++
>  tcg/s390/tcg-target.h    |  4 +++
>  tcg/sparc/tcg-target.h   |  4 +++
>  tcg/tcg-op.c             | 86 ++++++++++++++++++++++++++++++++++++++++++++++++
>  tcg/tcg-op.h             | 16 +++++++++
>  tcg/tcg-opc.h            |  4 +++
>  tcg/tcg-runtime.h        |  5 +++
>  tcg/tcg.h                |  2 ++
>  tcg/tci/tcg-target.h     |  4 +++
>  17 files changed, 209 insertions(+)
>
> diff --git a/tcg-runtime.c b/tcg-runtime.c
> index 9327b6f..eb3bade 100644
> --- a/tcg-runtime.c
> +++ b/tcg-runtime.c
> @@ -101,6 +101,26 @@ int64_t HELPER(mulsh_i64)(int64_t arg1, int64_t arg2)
>      return h;
>  }
>
> +uint32_t HELPER(clz_i32)(uint32_t arg, uint32_t zero_val)
> +{
> +    return arg ? clz32(arg) : zero_val;
> +}
> +
> +uint32_t HELPER(ctz_i32)(uint32_t arg, uint32_t zero_val)
> +{
> +    return arg ? ctz32(arg) : zero_val;
> +}
> +
> +uint64_t HELPER(clz_i64)(uint64_t arg, uint64_t zero_val)
> +{
> +    return arg ? clz64(arg) : zero_val;
> +}
> +
> +uint64_t HELPER(ctz_i64)(uint64_t arg, uint64_t zero_val)
> +{
> +    return arg ? ctz64(arg) : zero_val;
> +}
> +
>  void HELPER(exit_atomic)(CPUArchState *env)
>  {
>      cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
> diff --git a/tcg/README b/tcg/README
> index 065d9c2..f5ccf04 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -246,6 +246,14 @@ t0=~(t1|t2)
>
>  t0=t1|~t2
>
> +* clz_i32/i64 t0, t1, t2
> +
> +t0 = t1 ? clz(t1) : t2
> +
> +* ctz_i32/i64 t0, t1, t2
> +
> +t0 = t1 ? ctz(t1) : t2
> +
>  ********* Shifts/Rotates
>
>  * shl_i32/i64 t0, t1, t2
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 4a74bd8..976f493 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -62,6 +62,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_eqv_i32          1
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
>  #define TCG_TARGET_HAS_extract_i32      1
>  #define TCG_TARGET_HAS_sextract_i32     1
> @@ -94,6 +96,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_eqv_i64          1
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      1
>  #define TCG_TARGET_HAS_extract_i64      1
>  #define TCG_TARGET_HAS_sextract_i64     1
> diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
> index 4e30728..02cc242 100644
> --- a/tcg/arm/tcg-target.h
> +++ b/tcg/arm/tcg-target.h
> @@ -110,6 +110,8 @@ extern bool use_idiv_instructions;
>  #define TCG_TARGET_HAS_eqv_i32          0
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
>  #define TCG_TARGET_HAS_extract_i32      use_armv7_instructions
>  #define TCG_TARGET_HAS_sextract_i32     use_armv7_instructions
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index dc19c47..f2d9955 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -93,6 +93,8 @@ extern bool have_bmi1;
>  #define TCG_TARGET_HAS_eqv_i32          0
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
>  #define TCG_TARGET_HAS_extract_i32      1
>  #define TCG_TARGET_HAS_sextract_i32     1
> @@ -125,6 +127,8 @@ extern bool have_bmi1;
>  #define TCG_TARGET_HAS_eqv_i64          0
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      1
>  #define TCG_TARGET_HAS_extract_i64      1
>  #define TCG_TARGET_HAS_sextract_i64     0
> diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
> index 8856dc8..9a829ae 100644
> --- a/tcg/ia64/tcg-target.h
> +++ b/tcg/ia64/tcg-target.h
> @@ -140,6 +140,10 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i32         1
>  #define TCG_TARGET_HAS_nand_i64         1
>  #define TCG_TARGET_HAS_nor_i32          1
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i32          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_nor_i64          1
>  #define TCG_TARGET_HAS_orc_i32          1
>  #define TCG_TARGET_HAS_orc_i64          1
> diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
> index f1c3137..f133684 100644
> --- a/tcg/mips/tcg-target.h
> +++ b/tcg/mips/tcg-target.h
> @@ -109,6 +109,8 @@ extern bool use_mips32r2_instructions;
>  #define TCG_TARGET_HAS_rem_i32          1
>  #define TCG_TARGET_HAS_not_i32          1
>  #define TCG_TARGET_HAS_nor_i32          1
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_andc_i32         0
>  #define TCG_TARGET_HAS_orc_i32          0
>  #define TCG_TARGET_HAS_eqv_i32          0
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 28ce624..34a28ac 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -323,6 +323,18 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
>      CASE_OP_32_64(nor):
>          return ~(x | y);
>
> +    case INDEX_op_clz_i32:
> +        return (uint32_t)x ? clz32(x) : y;
> +
> +    case INDEX_op_clz_i64:
> +        return x ? clz64(x) : y;
> +
> +    case INDEX_op_ctz_i32:
> +        return (uint32_t)x ? ctz32(x) : y;
> +
> +    case INDEX_op_ctz_i64:
> +        return x ? ctz64(x) : y;
> +
>      CASE_OP_32_64(ext8s):
>          return (int8_t)x;
>
> @@ -934,6 +946,16 @@ void tcg_optimize(TCGContext *s)
>              mask = temp_info(args[1])->mask | temp_info(args[2])->mask;
>              break;
>
> +        case INDEX_op_clz_i32:
> +        case INDEX_op_ctz_i32:
> +            mask = temp_info(args[2])->mask | 31;
> +            break;
> +
> +        case INDEX_op_clz_i64:
> +        case INDEX_op_ctz_i64:
> +            mask = temp_info(args[2])->mask | 63;
> +            break;
> +

Did I miss a pre-requisite here?

/home/alex/lsrc/qemu/qemu.git/tcg/optimize.c: In function ‘tcg_optimize’:
/home/alex/lsrc/qemu/qemu.git/tcg/optimize.c:900:20: error: implicit declaration of function ‘temp_info’ [-Werror=implicit-function-declaration]
             mask = temp_info(args[2])->mask | 31;
                    ^
/home/alex/lsrc/qemu/qemu.git/tcg/optimize.c:900:13: error: nested extern declaration of ‘temp_info’ [-Werror=nested-externs]
             mask = temp_info(args[2])->mask | 31;
             ^
/home/alex/lsrc/qemu/qemu.git/tcg/optimize.c:900:38: error: invalid type argument of ‘->’ (have ‘int’)
             mask = temp_info(args[2])->mask | 31;
                                      ^
/home/alex/lsrc/qemu/qemu.git/tcg/optimize.c:905:38: error: invalid type argument of ‘->’ (have ‘int’)
             mask = temp_info(args[2])->mask | 63;
                                      ^
/home/alex/lsrc/qemu/qemu.git/tcg/optimize.c:1067:46: error: invalid type argument of ‘->’ (have ‘int’)
                 TCGArg v = temp_info(args[1])->val;
                                              ^
cc1: all warnings being treated as errors
/home/alex/lsrc/qemu/qemu.git/rules.mak:60: recipe for target 'tcg/optimize.o' failed


>          CASE_OP_32_64(setcond):
>          case INDEX_op_setcond2_i32:
>              mask = 1;
> @@ -1090,6 +1112,20 @@ void tcg_optimize(TCGContext *s)
>              }
>              goto do_default;
>
> +        CASE_OP_32_64(clz):
> +        CASE_OP_32_64(ctz):
> +            if (temp_is_const(args[1])) {
> +                TCGArg v = temp_info(args[1])->val;
> +                if (v != 0) {
> +                    tmp = do_constant_folding(opc, v, 0);
> +                    tcg_opt_gen_movi(s, op, args, args[0], tmp);
> +                } else {
> +                    tcg_opt_gen_mov(s, op, args, args[0], args[2]);
> +                }
> +                break;
> +            }
> +            goto do_default;
> +
>          CASE_OP_32_64(deposit):
>              if (temp_is_const(args[1]) && temp_is_const(args[2])) {
>                  tmp = deposit64(temp_info(args[1])->val, args[3], args[4],
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index b42c57a..698a599 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -68,6 +68,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_eqv_i32          1
>  #define TCG_TARGET_HAS_nand_i32         1
>  #define TCG_TARGET_HAS_nor_i32          1
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
>  #define TCG_TARGET_HAS_extract_i32      1
>  #define TCG_TARGET_HAS_sextract_i32     0
> @@ -101,6 +103,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_eqv_i64          1
>  #define TCG_TARGET_HAS_nand_i64         1
>  #define TCG_TARGET_HAS_nor_i64          1
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      1
>  #define TCG_TARGET_HAS_extract_i64      1
>  #define TCG_TARGET_HAS_sextract_i64     0
> diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
> index e9ac12e..3ac2dc9 100644
> --- a/tcg/s390/tcg-target.h
> +++ b/tcg/s390/tcg-target.h
> @@ -77,6 +77,8 @@ extern uint64_t s390_facilities;
>  #define TCG_TARGET_HAS_eqv_i32        0
>  #define TCG_TARGET_HAS_nand_i32       0
>  #define TCG_TARGET_HAS_nor_i32        0
> +#define TCG_TARGET_HAS_clz_i32        0
> +#define TCG_TARGET_HAS_ctz_i32        0
>  #define TCG_TARGET_HAS_deposit_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
>  #define TCG_TARGET_HAS_extract_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
>  #define TCG_TARGET_HAS_sextract_i32   0
> @@ -108,6 +110,8 @@ extern uint64_t s390_facilities;
>  #define TCG_TARGET_HAS_eqv_i64        0
>  #define TCG_TARGET_HAS_nand_i64       0
>  #define TCG_TARGET_HAS_nor_i64        0
> +#define TCG_TARGET_HAS_clz_i64        0
> +#define TCG_TARGET_HAS_ctz_i64        0
>  #define TCG_TARGET_HAS_deposit_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
>  #define TCG_TARGET_HAS_extract_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
>  #define TCG_TARGET_HAS_sextract_i64   0
> diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
> index a212167..340837a 100644
> --- a/tcg/sparc/tcg-target.h
> +++ b/tcg/sparc/tcg-target.h
> @@ -110,6 +110,8 @@ extern bool use_vis3_instructions;
>  #define TCG_TARGET_HAS_eqv_i32          0
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      0
>  #define TCG_TARGET_HAS_extract_i32      0
>  #define TCG_TARGET_HAS_sextract_i32     0
> @@ -142,6 +144,8 @@ extern bool use_vis3_instructions;
>  #define TCG_TARGET_HAS_eqv_i64          0
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      0
>  #define TCG_TARGET_HAS_extract_i64      0
>  #define TCG_TARGET_HAS_sextract_i64     0
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 1927e53..b45095c 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -457,6 +457,38 @@ void tcg_gen_orc_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
>      }
>  }
>
> +void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
> +{
> +    if (TCG_TARGET_HAS_clz_i32) {
> +        tcg_gen_op3_i32(INDEX_op_clz_i32, ret, arg1, arg2);
> +    } else {
> +        gen_helper_clz_i32(ret, arg1, arg2);
> +    }
> +}
> +
> +void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
> +{
> +    TCGv_i32 t = tcg_const_i32(arg2);
> +    tcg_gen_clz_i32(ret, arg1, t);
> +    tcg_temp_free_i32(t);
> +}
> +
> +void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
> +{
> +    if (TCG_TARGET_HAS_ctz_i32) {
> +        tcg_gen_op3_i32(INDEX_op_ctz_i32, ret, arg1, arg2);
> +    } else {
> +        gen_helper_ctz_i32(ret, arg1, arg2);
> +    }
> +}
> +
> +void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
> +{
> +    TCGv_i32 t = tcg_const_i32(arg2);
> +    tcg_gen_ctz_i32(ret, arg1, t);
> +    tcg_temp_free_i32(t);
> +}
> +
>  void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
>  {
>      if (TCG_TARGET_HAS_rot_i32) {
> @@ -1703,6 +1735,60 @@ void tcg_gen_orc_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>      }
>  }
>
> +void tcg_gen_clz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
> +{
> +    if (TCG_TARGET_HAS_clz_i64) {
> +        tcg_gen_op3_i64(INDEX_op_clz_i64, ret, arg1, arg2);
> +    } else {
> +        gen_helper_clz_i64(ret, arg1, arg2);
> +    }
> +}
> +
> +void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
> +{
> +    if (TCG_TARGET_REG_BITS == 32
> +        && TCG_TARGET_HAS_clz_i32
> +        && arg2 <= 0xffffffffu) {
> +        TCGv_i32 t = tcg_const_i32((uint32_t)arg2 - 32);
> +        tcg_gen_clz_i32(t, TCGV_LOW(arg1), t);
> +        tcg_gen_addi_i32(t, t, 32);
> +        tcg_gen_clz_i32(TCGV_LOW(ret), TCGV_HIGH(arg1), t);
> +        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
> +        tcg_temp_free_i32(t);
> +    } else {
> +        TCGv_i64 t = tcg_const_i64(arg2);
> +        tcg_gen_clz_i64(ret, arg1, t);
> +        tcg_temp_free_i64(t);
> +    }
> +}
> +
> +void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
> +{
> +    if (TCG_TARGET_HAS_ctz_i64) {
> +        tcg_gen_op3_i64(INDEX_op_ctz_i64, ret, arg1, arg2);
> +    } else {
> +        gen_helper_ctz_i64(ret, arg1, arg2);
> +    }
> +}
> +
> +void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2)
> +{
> +    if (TCG_TARGET_REG_BITS == 32
> +        && TCG_TARGET_HAS_ctz_i32
> +        && arg2 <= 0xffffffffu) {
> +        TCGv_i32 t = tcg_const_i32((uint32_t)arg2 - 32);
> +        tcg_gen_ctz_i32(t, TCGV_HIGH(arg1), t);
> +        tcg_gen_addi_i32(t, t, 32);
> +        tcg_gen_ctz_i32(TCGV_LOW(ret), TCGV_LOW(arg1), t);
> +        tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
> +        tcg_temp_free_i32(t);
> +    } else {
> +        TCGv_i64 t = tcg_const_i64(arg2);
> +        tcg_gen_ctz_i64(ret, arg1, t);
> +        tcg_temp_free_i64(t);
> +    }
> +}
> +
>  void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
>  {
>      if (TCG_TARGET_HAS_rot_i64) {
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index d42fd0d..7a24e84 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -286,6 +286,10 @@ void tcg_gen_eqv_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_nand_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_nor_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_orc_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
> +void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
> +void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
> +void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
> +void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
>  void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
>  void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
>  void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
> @@ -469,6 +473,10 @@ void tcg_gen_eqv_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_nand_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_nor_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_orc_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
> +void tcg_gen_clz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
> +void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
> +void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
> +void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
>  void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
>  void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
>  void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
> @@ -958,6 +966,10 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
>  #define tcg_gen_nand_tl tcg_gen_nand_i64
>  #define tcg_gen_nor_tl tcg_gen_nor_i64
>  #define tcg_gen_orc_tl tcg_gen_orc_i64
> +#define tcg_gen_clz_tl tcg_gen_clz_i64
> +#define tcg_gen_ctz_tl tcg_gen_ctz_i64
> +#define tcg_gen_clzi_tl tcg_gen_clzi_i64
> +#define tcg_gen_ctzi_tl tcg_gen_ctzi_i64
>  #define tcg_gen_rotl_tl tcg_gen_rotl_i64
>  #define tcg_gen_rotli_tl tcg_gen_rotli_i64
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i64
> @@ -1049,6 +1061,10 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
>  #define tcg_gen_nand_tl tcg_gen_nand_i32
>  #define tcg_gen_nor_tl tcg_gen_nor_i32
>  #define tcg_gen_orc_tl tcg_gen_orc_i32
> +#define tcg_gen_clz_tl tcg_gen_clz_i32
> +#define tcg_gen_ctz_tl tcg_gen_ctz_i32
> +#define tcg_gen_clzi_tl tcg_gen_clzi_i32
> +#define tcg_gen_ctzi_tl tcg_gen_ctzi_i32
>  #define tcg_gen_rotl_tl tcg_gen_rotl_i32
>  #define tcg_gen_rotli_tl tcg_gen_rotli_i32
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i32
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index 11563ac..d00db4f 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -104,6 +104,8 @@ DEF(orc_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_orc_i32))
>  DEF(eqv_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_eqv_i32))
>  DEF(nand_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_nand_i32))
>  DEF(nor_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_nor_i32))
> +DEF(clz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_clz_i32))
> +DEF(ctz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_ctz_i32))
>
>  DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
>  DEF(movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
> @@ -171,6 +173,8 @@ DEF(orc_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_orc_i64))
>  DEF(eqv_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_eqv_i64))
>  DEF(nand_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_nand_i64))
>  DEF(nor_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_nor_i64))
> +DEF(clz_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_clz_i64))
> +DEF(ctz_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_ctz_i64))
>
>  DEF(add2_i64, 2, 4, 0, IMPL64 | IMPL(TCG_TARGET_HAS_add2_i64))
>  DEF(sub2_i64, 2, 4, 0, IMPL64 | IMPL(TCG_TARGET_HAS_sub2_i64))
> diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
> index 1deb86a..eb1cd76 100644
> --- a/tcg/tcg-runtime.h
> +++ b/tcg/tcg-runtime.h
> @@ -15,6 +15,11 @@ DEF_HELPER_FLAGS_2(sar_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
>  DEF_HELPER_FLAGS_2(mulsh_i64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
>  DEF_HELPER_FLAGS_2(muluh_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
>
> +DEF_HELPER_FLAGS_2(clz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
> +DEF_HELPER_FLAGS_2(ctz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
> +DEF_HELPER_FLAGS_2(clz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(ctz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
> +
>  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
>
>  #ifdef CONFIG_SOFTMMU
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 730c2d5..ba1389c 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -111,6 +111,8 @@ typedef uint64_t TCGRegSet;
>  #define TCG_TARGET_HAS_eqv_i64          0
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      0
>  #define TCG_TARGET_HAS_extract_i64      0
>  #define TCG_TARGET_HAS_sextract_i64     0
> diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
> index 2065042..0646444 100644
> --- a/tcg/tci/tcg-target.h
> +++ b/tcg/tci/tcg-target.h
> @@ -74,6 +74,8 @@
>  #define TCG_TARGET_HAS_eqv_i32          0
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
> +#define TCG_TARGET_HAS_clz_i32          0
> +#define TCG_TARGET_HAS_ctz_i32          0
>  #define TCG_TARGET_HAS_neg_i32          1
>  #define TCG_TARGET_HAS_not_i32          1
>  #define TCG_TARGET_HAS_orc_i32          0
> @@ -104,6 +106,8 @@
>  #define TCG_TARGET_HAS_eqv_i64          0
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
> +#define TCG_TARGET_HAS_clz_i64          0
> +#define TCG_TARGET_HAS_ctz_i64          0
>  #define TCG_TARGET_HAS_neg_i64          1
>  #define TCG_TARGET_HAS_not_i64          1
>  #define TCG_TARGET_HAS_orc_i64          0
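
For anyone reading the tcg_gen_clzi_i64/tcg_gen_ctzi_i64 hunks above, here is a
plain C model of the 32-bit-host decomposition. It is only a sketch under the
semantics stated in the cover letter (the second operand is the value returned
when the input is zero). The names clz32_or, ctz32_or, clz64_via_clz32 and
ctz64_via_ctz32 are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Model of clz_i32: leading-zero count, or zero_val when x == 0. */
static uint32_t clz32_or(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;
    if (x == 0) {
        return zero_val;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Model of ctz_i32: trailing-zero count, or zero_val when x == 0. */
static uint32_t ctz32_or(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;
    if (x == 0) {
        return zero_val;
    }
    while (!(x & 1u)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* 64-bit clz from two 32-bit ops, mirroring the op order in the hunk. */
static uint32_t clz64_via_clz32(uint64_t x, uint32_t zero_val)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    /* Low half first; the "- 32" is undone by the "+ 32" below,
     * so unsigned wrap-around for zero_val < 32 is harmless. */
    uint32_t t = clz32_or(lo, zero_val - 32);
    t += 32;
    /* High half decides; t is used only when hi == 0. */
    return clz32_or(hi, t);
}

/* 64-bit ctz: same idea with the halves swapped. */
static uint32_t ctz64_via_ctz32(uint64_t x, uint32_t zero_val)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t t = ctz32_or(hi, zero_val - 32);
    t += 32;
    return ctz32_or(lo, t);
}
```

The point of the explicit zero-input operand shows up here: the inner op's
result is fed straight in as the outer op's fallback, so no branch or
conditional move is needed in the generated code.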


--
Alex Bennée

* Re: [Qemu-devel] [PATCH 01/25] tcg: Add clz and ctz opcodes
  2016-11-21 15:11   ` Alex Bennée
@ 2016-11-21 16:05     ` Richard Henderson
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2016-11-21 16:05 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel

On 11/21/2016 04:11 PM, Alex Bennée wrote:
>> > +        case INDEX_op_clz_i32:
>> > +        case INDEX_op_ctz_i32:
>> > +            mask = temp_info(args[2])->mask | 31;
>> > +            break;
>> > +
>> > +        case INDEX_op_clz_i64:
>> > +        case INDEX_op_ctz_i64:
>> > +            mask = temp_info(args[2])->mask | 63;
>> > +            break;
>> > +
> Did I miss a pre-requisite here?
>
> /home/alex/lsrc/qemu/qemu.git/tcg/optimize.c: In function ‘tcg_optimize’:
> /home/alex/lsrc/qemu/qemu.git/tcg/optimize.c:900:20: error: implicit declaration of function ‘temp_info’ [-Werror=implicit-function-declaration]
>              mask = temp_info(args[2])->mask | 31;
>                     ^
> /home/alex/lsrc/qemu/qemu.git/tcg/optimize.c:900:13: error: nested extern declaration of ‘temp_info’ [-Werror=nested-externs]
>              mask = temp_info(args[2])->mask | 31;
>              ^
> /home/alex/lsrc/qemu/qemu.git/tcg/optimize.c:900:38: error: invalid type argument of ‘->’ (have ‘int’)
>              mask = temp_info(args[2])->mask | 31;
>                                       ^
> /home/alex/lsrc/qemu/qemu.git/tcg/optimize.c:905:38: error: invalid type argument of ‘->’ (have ‘int’)
>              mask = temp_info(args[2])->mask | 63;
>                                       ^
> /home/alex/lsrc/qemu/qemu.git/tcg/optimize.c:1067:46: error: invalid type argument of ‘->’ (have ‘int’)
>                  TCGArg v = temp_info(args[1])->val;
>                                               ^
> cc1: all warnings being treated as errors
> /home/alex/lsrc/qemu/qemu.git/rules.mak:60: recipe for target 'tcg/optimize.o' failed
>
>

Hmm, it would appear that I posted the series from the wrong branch, where I 
had other changes installed.

I can re-post later.  In the meantime you could have a look at the branch:

   git://github.com/rth7680/qemu.git tcg-2.9
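
Side note for readers of the quoted optimize.c hunk: assuming `mask` there
tracks the bits that may be nonzero in the result, the `| 31` follows from the
opcode's definition. A 32-bit clz/ctz yields either a count in 0..31 or the
fallback operand. A small C model of that reasoning (clz32_or and
clz32_result_mask are illustrative names, not QEMU functions):

```c
#include <stdint.h>

/* Model of clz_i32: leading-zero count, or zero_val when x == 0. */
static uint32_t clz32_or(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;
    if (x == 0) {
        return zero_val;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Bits that can possibly be set in the result: any count 0..31 fits in
 * the low five bits, and otherwise the result is the fallback itself. */
static uint32_t clz32_result_mask(uint32_t fallback_mask)
{
    return fallback_mask | 31;
}
```

With a constant fallback of 32, the mask comes out as 63, so later passes can
treat bits 6 and up of the result as known zero.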


r~

* Re: [Qemu-devel] [PATCH 18/25] tcg/aarch64: Handle ctz and clz opcodes
  2016-11-17 11:53   ` Richard Henderson
@ 2016-11-22 10:41     ` Alex Bennée
  0 siblings, 0 replies; 40+ messages in thread
From: Alex Bennée @ 2016-11-22 10:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Claudio Fontana


Richard Henderson <rth@twiddle.net> writes:

> On 11/16/2016 08:25 PM, Richard Henderson wrote:
>> @@ -206,6 +206,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
>>      if ((ct & TCG_CT_CONST_MONE) && val == -1) {
>>          return 1;
>>      }
>> +    if ((ct & TCG_CT_CONST_WSZ) && val == (type ? 64 : 32)) {
>> +        return 1;
>> +    }
>>
>>      return 0;
>>  }
>
> Bah.  Forgot to revert this hunk at the last minute.
>
>
> r~

I'm also seeing asserts fire as it decodes risu tests:

IN:
0x0000004000801148:  b37ad6fc      bfi x28, x23, #6, #54
0x000000400080114c:  00005af0      unallocated (Unallocated)

qemu-aarch64: /home/alex/qemu.git/tcg/tcg-op.c:1937: tcg_gen_deposit_i64: Assertion `ofs + len <= 64' failed.

Thread 1 "qemu-aarch64" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
58      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x0000007fb7ac5df4 in __GI_abort () at abort.c:89
#2  0x0000007fb7abe22c in __assert_fail_base (fmt=0x7fb7bad9f0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5555747068 "ofs + len <= 64", file=file@entry=0x5555746ec8 "/home/alex/qemu.git/tcg/tcg-op.c", line=line@entry=1937, function=function@entry=0x55557472f0 <__PRETTY_FUNCTION__.46784> "tcg_gen_deposit_i64") at assert.c:92
#3  0x0000007fb7abe2c4 in __GI___assert_fail (assertion=0x5555747068 "ofs + len <= 64", file=0x5555746ec8 "/home/alex/qemu.git/tcg/tcg-op.c", line=1937, function=0x55557472f0 <__PRETTY_FUNCTION__.46784> "tcg_gen_deposit_i64") at assert.c:101
#4  0x00000055555ce1e4 in tcg_gen_deposit_i64 (ret=0x1f, arg1=0x1f, arg2=0x3c, ofs=23, len=48) at /home/alex/qemu.git/tcg/tcg-op.c:1937
#5  0x0000005555694a7c in disas_bitfield (s=0x7fffffea08, insn=3010051815) at /home/alex/qemu.git/target-arm/translate-a64.c:3249
#6  0x0000005555694dec in disas_data_proc_imm (s=0x7fffffea08, insn=3010051815) at /home/alex/qemu.git/target-arm/translate-a64.c:3341
#7  0x00000055556a5d30 in disas_a64_insn (env=0x555783ca18, s=0x7fffffea08) at /home/alex/qemu.git/target-arm/translate-a64.c:11154
#8  0x00000055556a624c in gen_intermediate_code_a64 (cpu=0x5557834720, tb=0x7fb5822e50) at /home/alex/qemu.git/target-arm/translate-a64.c:11312
#9  0x0000005555651be0 in gen_intermediate_code (env=0x555783ca18, tb=0x7fb5822e50) at /home/alex/qemu.git/target-arm/translate.c:11588
#10 0x00000055555b8324 in tb_gen_code (cpu=0x5557834720, pc=274886299984, cs_base=0, flags=2147483648, cflags=0) at /home/alex/qemu.git/translate-all.c:1311
#11 0x00000055555bafe8 in tb_find (cpu=0x5557834720, last_tb=0x0, tb_exit=0) at /home/alex/qemu.git/cpu-exec.c:346
#12 0x00000055555bb72c in cpu_exec (cpu=0x5557834720) at /home/alex/qemu.git/cpu-exec.c:637
#13 0x00000055555f1410 in cpu_loop (env=0x555783ca18) at /home/alex/qemu.git/linux-user/main.c:788
#14 0x00000055555f2f74 in main (argc=7, argv=0x7ffffff6b8, envp=0x7ffffff6f8) at /home/alex/qemu.git/linux-user/main.c:4557
(gdb)

Annoyingly, in_asm only dumps after a decode, but I believe the
faulting instruction is:

  0xb369bee7

    14c:       00005af0        .inst   0x00005af0 ; undefined
    150:       b369bee7        bfxil   x7, x23, #41, #7
    154:       00005af0        .inst   0x00005af0 ; undefined
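
To cross-check the disassembly above, the fields of 0xb369bee7 can be pulled
apart by hand. This is a sketch based on the A64 bitfield (SBFM/BFM/UBFM)
encoding layout; BitfieldFields, decode_bitfield and bfxil_operands are
hypothetical names for illustration and do not come from translate-a64.c.

```c
#include <stdint.h>

/* Fields of an A64 bitfield-class instruction word. */
typedef struct {
    unsigned sf;    /* bit 31: 1 = 64-bit operation */
    unsigned opc;   /* bits 30:29: 0 = SBFM, 1 = BFM, 2 = UBFM */
    unsigned n;     /* bit 22 */
    unsigned immr;  /* bits 21:16 */
    unsigned imms;  /* bits 15:10 */
    unsigned rn;    /* bits 9:5 */
    unsigned rd;    /* bits 4:0 */
} BitfieldFields;

static BitfieldFields decode_bitfield(uint32_t insn)
{
    BitfieldFields f;
    f.sf   = (insn >> 31) & 1;
    f.opc  = (insn >> 29) & 3;
    f.n    = (insn >> 22) & 1;
    f.immr = (insn >> 16) & 0x3f;
    f.imms = (insn >> 10) & 0x3f;
    f.rn   = (insn >> 5) & 0x1f;
    f.rd   = insn & 0x1f;
    return f;
}

/* BFM with imms >= immr is the BFXIL alias: copy `width` bits of Rn,
 * starting at bit `lsb`, into the low bits of Rd. */
static void bfxil_operands(const BitfieldFields *f,
                           unsigned *lsb, unsigned *width)
{
    *lsb = f->immr;
    *width = f->imms - f->immr + 1;
}
```

For 0xb369bee7 this gives immr = 41, imms = 47, Rn = 23, Rd = 7, i.e.
bfxil x7, x23, #41, #7, and 3010051815 (the insn value in frames #5 to #6) is
the same word in decimal. A 7-bit deposit at offset 0 would satisfy
ofs + len <= 64. The ofs=23, len=48 seen in frame #4 happen to equal
64 - immr and imms + 1; that is only an arithmetic observation, not a diagnosis.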

--
Alex Bennée

end of thread (last message 2016-11-22 10:41 UTC)

Thread overview: 40+ messages
2016-11-16 19:25 [Qemu-devel] [PATCH 00/25] tcg: Handle clz, ctz, and clrsb generically Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 01/25] tcg: Add clz and ctz opcodes Richard Henderson
2016-11-21 15:11   ` Alex Bennée
2016-11-21 16:05     ` Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 02/25] target-alpha: Use the ctz and clz opcodes Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 03/25] target-cris: Use clz opcode Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 04/25] target-microblaze: " Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 05/25] target-mips: " Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 06/25] target-openrisc: Use clz and ctz opcodes Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 07/25] target-ppc: " Richard Henderson
2016-11-17  3:09   ` David Gibson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 08/25] target-s390x: Use clz opcode Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 09/25] target-tilegx: Use clz and ctz opcodes Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 10/25] target-tricore: Use clz opcode Richard Henderson
2016-11-17 14:42   ` Bastian Koppelmann
2016-11-17 15:47     ` Bastian Koppelmann
2016-11-16 19:25 ` [Qemu-devel] [PATCH 11/25] target-unicore32: " Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 12/25] target-xtensa: " Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 13/25] target-arm: " Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 14/25] target-i386: Use clz and ctz opcodes Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 15/25] disas/i386.c: Handle tzcnt Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 16/25] tcg/i386: Handle ctz and clz opcodes Richard Henderson
2016-11-17 16:50   ` Bastian Koppelmann
2016-11-17 19:53     ` Richard Henderson
2016-11-17 19:59       ` Richard Henderson
2016-11-17 22:09         ` Bastian Koppelmann
2016-11-17 23:03           ` Richard Henderson
2016-11-18 12:48             ` Bastian Koppelmann
2016-11-21 10:37               ` Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 17/25] tcg/ppc: " Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 18/25] tcg/aarch64: " Richard Henderson
2016-11-17 11:53   ` Richard Henderson
2016-11-22 10:41     ` Alex Bennée
2016-11-16 19:25 ` [Qemu-devel] [PATCH 19/25] tcg/arm: " Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 20/25] tcg/mips: Handle clz opcode Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 21/25] tcg/s390: " Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 22/25] tcg: Add helpers for clrsb Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 23/25] target-arm: Use clrsb helper Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 24/25] target-tricore: " Richard Henderson
2016-11-16 19:25 ` [Qemu-devel] [PATCH 25/25] target-xtensa: " Richard Henderson
