* [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives
@ 2016-10-18 15:10 Richard Henderson
From: Richard Henderson @ 2016-10-18 15:10 UTC
  To: qemu-devel

Better tested this time, including aarch64 host.

Changes since v1:
  * Added tcg_gen_deposit_z_*.  Depositing into zero turns out to be
    quite common among targets.  Providing that as a primitive expander
    allows us to easily generate optimal-ish code for hosts with and
    without a real deposit operation.
  * Cleanups in tcg/s390 akin to those I already did for tcg/arm.
  * Add support in tcg/s390 for deposit into zero.
  * More special cases in the expanders for better code generation,
    especially on an x86 host *without* the extract primitives.
  * Fixed a silly think-o on the aarch64 host.


r~


Richard Henderson (18):
  tcg: Add field extraction primitives
  tcg: Minor adjustments to deposit expanders
  tcg: Add deposit_z expander
  tcg/aarch64: Implement field extraction opcodes
  tcg/arm: Move isa detection to tcg-target.h
  tcg/arm: Implement field extraction opcodes
  tcg/i386: Implement field extraction opcodes
  tcg/mips: Implement field extraction opcodes
  tcg/ppc: Implement field extraction opcodes
  tcg/s390: Expose host facilities to tcg-target.h
  tcg/s390: Implement field extraction opcodes
  tcg/s390: Support deposit into zero
  target-alpha: Use deposit and extract ops
  target-arm: Use new deposit and extract ops
  target-i386: Use new deposit and extract ops
  target-mips: Use the new extract op
  target-ppc: Use the new deposit and extract ops
  target-s390x: Use the new deposit and extract ops

 target-alpha/translate.c     |  67 ++++---
 target-arm/translate-a64.c   |  79 +++-----
 target-arm/translate.c       |  37 +---
 target-i386/translate.c      |  45 +++--
 target-mips/translate.c      |  12 +-
 target-ppc/translate.c       |  35 ++--
 target-s390x/translate.c     |  34 ++--
 tcg/aarch64/tcg-target.h     |   4 +
 tcg/aarch64/tcg-target.inc.c |  14 ++
 tcg/arm/tcg-target.h         |  38 +++-
 tcg/arm/tcg-target.inc.c     |  63 +++---
 tcg/i386/tcg-target.h        |  10 +
 tcg/i386/tcg-target.inc.c    |  38 ++++
 tcg/ia64/tcg-target.h        |   4 +
 tcg/mips/tcg-target.h        |   2 +
 tcg/mips/tcg-target.inc.c    |   4 +
 tcg/optimize.c               |  29 +++
 tcg/ppc/tcg-target.h         |   4 +
 tcg/ppc/tcg-target.inc.c     |  10 +
 tcg/s390/tcg-target.h        | 122 +++++++-----
 tcg/s390/tcg-target.inc.c    | 113 ++++++-----
 tcg/sparc/tcg-target.h       |   4 +
 tcg/tcg-op.c                 | 465 ++++++++++++++++++++++++++++++++++++++++++-
 tcg/tcg-op.h                 |  18 ++
 tcg/tcg-opc.h                |   4 +
 tcg/tcg.h                    |   8 +
 tcg/tci/tcg-target.h         |   4 +
 27 files changed, 954 insertions(+), 313 deletions(-)

-- 
2.7.4

* [Qemu-devel] [PATCH v2 01/18] tcg: Add field extraction primitives
From: Richard Henderson @ 2016-10-18 15:10 UTC
  To: qemu-devel

Adds tcg_gen_extract_* and tcg_gen_sextract_* for extraction of
fixed position bitfields, much like we already have for deposit.
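
For readers unfamiliar with the operation, the following is a minimal
sketch of the intended 32-bit semantics in plain C. It is not code from
this patch: the names ref_extract32, ref_extract32_mask and
ref_sextract32 are hypothetical, and the sketch only assumes the same
argument constraints that the new expanders assert (len > 0 and
ofs + len <= 32).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: zero-extended field of LEN bits at bit OFS,
   computed as a shift-left / logical-shift-right pair. */
static uint32_t ref_extract32(uint32_t value, unsigned ofs, unsigned len)
{
    assert(len > 0 && len <= 32 && ofs + len <= 32);
    /* Both shift counts are in [0, 31], so neither shift is undefined. */
    return (value << (32 - len - ofs)) >> (32 - len);
}

/* Same result computed as shift-right followed by AND with a mask. */
static uint32_t ref_extract32_mask(uint32_t value, unsigned ofs, unsigned len)
{
    assert(len > 0 && len <= 32 && ofs + len <= 32);
    uint32_t mask = len == 32 ? UINT32_MAX : (UINT32_C(1) << len) - 1;
    return (value >> ofs) & mask;
}

/* Hypothetical reference: sign-extended field of LEN bits at bit OFS.
   The xor/subtract step replicates the field's top bit without relying
   on implementation-defined right shifts of negative values. */
static int32_t ref_sextract32(uint32_t value, unsigned ofs, unsigned len)
{
    uint32_t field = ref_extract32(value, ofs, len);
    uint32_t sign = UINT32_C(1) << (len - 1);
    return (int32_t)((int64_t)(field ^ sign) - (int64_t)sign);
}
```

For example, ref_extract32(0x12345678, 8, 8) yields 0x56, while
ref_sextract32(0x0000F000, 12, 4) yields -1 because the top bit of the
4-bit field is set. The two unsigned variants mirror the two fallback
sequences an expander can choose between when the host has no extract
instruction.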

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.h |   4 +
 tcg/arm/tcg-target.h     |   2 +
 tcg/i386/tcg-target.h    |   4 +
 tcg/ia64/tcg-target.h    |   4 +
 tcg/mips/tcg-target.h    |   2 +
 tcg/optimize.c           |  29 +++++
 tcg/ppc/tcg-target.h     |   4 +
 tcg/s390/tcg-target.h    |   4 +
 tcg/sparc/tcg-target.h   |   4 +
 tcg/tcg-op.c             | 316 +++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op.h             |  12 ++
 tcg/tcg-opc.h            |   4 +
 tcg/tcg.h                |   8 ++
 tcg/tci/tcg-target.h     |   4 +
 14 files changed, 401 insertions(+)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index a1d101f..410c31b 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -63,6 +63,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
@@ -93,6 +95,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
 #define TCG_TARGET_HAS_sub2_i64         1
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index a0e1acf..8e724be 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -80,6 +80,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_mulu2_i32        1
 #define TCG_TARGET_HAS_muls2_i32        1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 524cfc6..7625188 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -94,6 +94,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
@@ -124,6 +126,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
 #define TCG_TARGET_HAS_sub2_i64         1
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 6dddb7f..8856dc8 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -149,6 +149,10 @@ typedef enum {
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i32     0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_add2_i32         0
 #define TCG_TARGET_HAS_add2_i64         0
 #define TCG_TARGET_HAS_sub2_i32         0
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 3aeac87..1bcea3b 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -123,6 +123,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_bswap16_i32      use_mips32r2_instructions
 #define TCG_TARGET_HAS_bswap32_i32      use_mips32r2_instructions
 #define TCG_TARGET_HAS_deposit_i32      use_mips32r2_instructions
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_ext8s_i32        use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext16s_i32       use_mips32r2_instructions
 #define TCG_TARGET_HAS_rot_i32          use_mips32r2_instructions
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 0f13490..0be71f8 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -878,6 +878,19 @@ void tcg_optimize(TCGContext *s)
                              temps[args[2]].mask);
             break;
 
+        CASE_OP_32_64(extract):
+            mask = extract64(temps[args[1]].mask, args[2], args[3]);
+            if (args[2] == 0) {
+                affected = temps[args[1]].mask & ~mask;
+            }
+            break;
+        CASE_OP_32_64(sextract):
+            mask = sextract64(temps[args[1]].mask, args[2], args[3]);
+            if (args[2] == 0 && (tcg_target_long)mask >= 0) {
+                affected = temps[args[1]].mask & ~mask;
+            }
+            break;
+
         CASE_OP_32_64(or):
         CASE_OP_32_64(xor):
             mask = temps[args[1]].mask | temps[args[2]].mask;
@@ -1048,6 +1061,22 @@ void tcg_optimize(TCGContext *s)
             }
             goto do_default;
 
+        CASE_OP_32_64(extract):
+            if (temp_is_const(args[1])) {
+                tmp = extract64(temps[args[1]].val, args[2], args[3]);
+                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                break;
+            }
+            goto do_default;
+
+        CASE_OP_32_64(sextract):
+            if (temp_is_const(args[1])) {
+                tmp = sextract64(temps[args[1]].val, args[2], args[3]);
+                tcg_opt_gen_movi(s, op, args, args[0], tmp);
+                break;
+            }
+            goto do_default;
+
         CASE_OP_32_64(setcond):
             tmp = do_constant_folding_cond(opc, args[1], args[2], args[3]);
             if (tmp != 2) {
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index dd032f2..c765d3e 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -69,6 +69,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32         1
 #define TCG_TARGET_HAS_nor_i32          1
 #define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_mulu2_i32        0
 #define TCG_TARGET_HAS_muls2_i32        0
@@ -100,6 +102,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i64         1
 #define TCG_TARGET_HAS_nor_i64          1
 #define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
 #define TCG_TARGET_HAS_sub2_i64         1
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 0c1af24..9583df4 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -66,6 +66,8 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
@@ -95,6 +97,8 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
 #define TCG_TARGET_HAS_sub2_i64         1
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 88f9c90..a212167 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -111,6 +111,8 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      0
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
@@ -141,6 +143,8 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      0
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
 #define TCG_TARGET_HAS_sub2_i64         1
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 291d50b..144ac0c 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -570,6 +570,131 @@ void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
     tcg_temp_free_i32(t1);
 }
 
+void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
+                         unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 32);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 32);
+    tcg_debug_assert(ofs + len <= 32);
+
+    /* Canonicalize certain special cases, even if extract is supported.  */
+    if (ofs + len == 32) {
+        tcg_gen_shri_i32(ret, arg, 32 - len);
+        return;
+    }
+    if (ofs == 0) {
+        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
+        return;
+    }
+
+    if (TCG_TARGET_HAS_extract_i32
+        && TCG_TARGET_extract_i32_valid(ofs, len)) {
+        tcg_gen_op4ii_i32(INDEX_op_extract_i32, ret, arg, ofs, len);
+        return;
+    }
+
+    /* Assume that zero-extension, if available, is cheaper than a shift.  */
+    switch (ofs + len) {
+    case 16:
+        if (TCG_TARGET_HAS_ext16u_i32) {
+            tcg_gen_ext16u_i32(ret, arg);
+            tcg_gen_shri_i32(ret, ret, ofs);
+            return;
+        }
+        break;
+    case 8:
+        if (TCG_TARGET_HAS_ext8u_i32) {
+            tcg_gen_ext8u_i32(ret, arg);
+            tcg_gen_shri_i32(ret, ret, ofs);
+            return;
+        }
+        break;
+    }
+
+    /* ??? Ideally we'd know what values are available for immediate AND.
+       Assume that 8 bits are available, plus the special case of 16,
+       so that we get ext8u, ext16u.  */
+    switch (len) {
+    case 1 ... 8: case 16:
+        tcg_gen_shri_i32(ret, arg, ofs);
+        tcg_gen_andi_i32(ret, ret, (1u << len) - 1);
+        break;
+    default:
+        tcg_gen_shli_i32(ret, arg, 32 - len - ofs);
+        tcg_gen_shri_i32(ret, ret, 32 - len);
+        break;
+    }
+}
+
+void tcg_gen_sextract_i32(TCGv_i32 ret, TCGv_i32 arg,
+                          unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 32);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 32);
+    tcg_debug_assert(ofs + len <= 32);
+
+    /* Canonicalize certain special cases, even if sextract is supported.  */
+    if (ofs + len == 32) {
+        tcg_gen_sari_i32(ret, arg, 32 - len);
+        return;
+    }
+    if (ofs == 0) {
+        switch (len) {
+        case 16:
+            tcg_gen_ext16s_i32(ret, arg);
+            return;
+        case 8:
+            tcg_gen_ext8s_i32(ret, arg);
+            return;
+        }
+    }
+
+    if (TCG_TARGET_HAS_sextract_i32
+        && TCG_TARGET_extract_i32_valid(ofs, len)) {
+        tcg_gen_op4ii_i32(INDEX_op_sextract_i32, ret, arg, ofs, len);
+        return;
+    }
+
+    /* Assume that sign-extension, if available, is cheaper than a shift.  */
+    switch (ofs + len) {
+    case 16:
+        if (TCG_TARGET_HAS_ext16s_i32) {
+            tcg_gen_ext16s_i32(ret, arg);
+            tcg_gen_sari_i32(ret, ret, ofs);
+            return;
+        }
+        break;
+    case 8:
+        if (TCG_TARGET_HAS_ext8s_i32) {
+            tcg_gen_ext8s_i32(ret, arg);
+            tcg_gen_sari_i32(ret, ret, ofs);
+            return;
+        }
+        break;
+    }
+    switch (len) {
+    case 16:
+        if (TCG_TARGET_HAS_ext16s_i32) {
+            tcg_gen_shri_i32(ret, arg, ofs);
+            tcg_gen_ext16s_i32(ret, ret);
+            return;
+        }
+        break;
+    case 8:
+        if (TCG_TARGET_HAS_ext8s_i32) {
+            tcg_gen_shri_i32(ret, arg, ofs);
+            tcg_gen_ext8s_i32(ret, ret);
+            return;
+        }
+        break;
+    }
+
+    tcg_gen_shli_i32(ret, arg, 32 - len - ofs);
+    tcg_gen_sari_i32(ret, ret, 32 - len);
+}
+
 void tcg_gen_movcond_i32(TCGCond cond, TCGv_i32 ret, TCGv_i32 c1,
                          TCGv_i32 c2, TCGv_i32 v1, TCGv_i32 v2)
 {
@@ -1618,6 +1743,197 @@ void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
     tcg_temp_free_i64(t1);
 }
 
+void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
+                         unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 64);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 64);
+    tcg_debug_assert(ofs + len <= 64);
+
+    /* Canonicalize certain special cases, even if extract is supported.  */
+    if (ofs + len == 64) {
+        tcg_gen_shri_i64(ret, arg, 64 - len);
+        return;
+    }
+    if (ofs == 0) {
+        tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
+        return;
+    }
+
+    if (TCG_TARGET_REG_BITS == 32) {
+        /* Look for a 32-bit extract within one of the two words.  */
+        if (ofs >= 32) {
+            tcg_gen_extract_i32(TCGV_LOW(ret), TCGV_HIGH(arg), ofs - 32, len);
+            tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+            return;
+        }
+        if (ofs + len <= 32) {
+            tcg_gen_extract_i32(TCGV_LOW(ret), TCGV_LOW(arg), ofs, len);
+            tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+            return;
+        }
+        /* The field is split across two words.  One double-word
+           shift is better than two double-word shifts.  */
+        goto do_shift_and;
+    }
+
+    if (TCG_TARGET_HAS_extract_i64
+        && TCG_TARGET_extract_i64_valid(ofs, len)) {
+        tcg_gen_op4ii_i64(INDEX_op_extract_i64, ret, arg, ofs, len);
+        return;
+    }
+
+    /* Assume that zero-extension, if available, is cheaper than a shift.  */
+    switch (ofs + len) {
+    case 32:
+        if (TCG_TARGET_HAS_ext32u_i64) {
+            tcg_gen_ext32u_i64(ret, arg);
+            tcg_gen_shri_i64(ret, ret, ofs);
+            return;
+        }
+        break;
+    case 16:
+        if (TCG_TARGET_HAS_ext16u_i64) {
+            tcg_gen_ext16u_i64(ret, arg);
+            tcg_gen_shri_i64(ret, ret, ofs);
+            return;
+        }
+        break;
+    case 8:
+        if (TCG_TARGET_HAS_ext8u_i64) {
+            tcg_gen_ext8u_i64(ret, arg);
+            tcg_gen_shri_i64(ret, ret, ofs);
+            return;
+        }
+        break;
+    }
+
+    /* ??? Ideally we'd know what values are available for immediate AND.
+       Assume that 8 bits are available, plus the special cases of 16 and 32,
+       so that we get ext8u, ext16u, and ext32u.  */
+    switch (len) {
+    case 1 ... 8: case 16: case 32:
+    do_shift_and:
+        tcg_gen_shri_i64(ret, arg, ofs);
+        tcg_gen_andi_i64(ret, ret, (1ull << len) - 1);
+        break;
+    default:
+        tcg_gen_shli_i64(ret, arg, 64 - len - ofs);
+        tcg_gen_shri_i64(ret, ret, 64 - len);
+        break;
+    }
+}
+
+void tcg_gen_sextract_i64(TCGv_i64 ret, TCGv_i64 arg,
+                          unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 64);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 64);
+    tcg_debug_assert(ofs + len <= 64);
+
+    /* Canonicalize certain special cases, even if sextract is supported.  */
+    if (ofs + len == 64) {
+        tcg_gen_sari_i64(ret, arg, 64 - len);
+        return;
+    }
+    if (ofs == 0) {
+        switch (len) {
+        case 32:
+            tcg_gen_ext32s_i64(ret, arg);
+            return;
+        case 16:
+            tcg_gen_ext16s_i64(ret, arg);
+            return;
+        case 8:
+            tcg_gen_ext8s_i64(ret, arg);
+            return;
+        }
+    }
+
+    if (TCG_TARGET_REG_BITS == 32 && len <= 32) {
+        /* Look for a 32-bit extract within one of the two words.  */
+        if (ofs >= 32) {
+            tcg_gen_sextract_i32(TCGV_LOW(ret), TCGV_HIGH(arg), ofs - 32, len);
+        } else if (ofs + len <= 32) {
+            tcg_gen_sextract_i32(TCGV_LOW(ret), TCGV_LOW(arg), ofs, len);
+        } else {
+            /* The 32-bit field is split across the two words.  We can
+               perform one double-word shift to place the field at the
+               top of the low word and do the rest as single word shifts.  */
+            int sr = ofs + len - 32;
+            int sl = 32 - sr;
+            TCGv_i32 t = tcg_temp_new_i32();
+
+            tcg_gen_shli_i32(t, TCGV_HIGH(arg), sl);
+            tcg_gen_shri_i32(TCGV_LOW(ret), TCGV_LOW(arg), sr);
+            tcg_gen_or_i32(TCGV_LOW(ret), TCGV_LOW(ret), t);
+            tcg_temp_free_i32(t);
+
+            tcg_gen_sari_i32(TCGV_LOW(ret), TCGV_LOW(ret), 32 - len);
+        }
+        tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
+        return;
+    }
+
+    if (TCG_TARGET_HAS_sextract_i64
+        && TCG_TARGET_extract_i64_valid(ofs, len)) {
+        tcg_gen_op4ii_i64(INDEX_op_sextract_i64, ret, arg, ofs, len);
+        return;
+    }
+
+    /* Assume that sign-extension, if available, is cheaper than a shift.  */
+    switch (ofs + len) {
+    case 32:
+        if (TCG_TARGET_HAS_ext32s_i64) {
+            tcg_gen_ext32s_i64(ret, arg);
+            tcg_gen_sari_i64(ret, ret, ofs);
+            return;
+        }
+        break;
+    case 16:
+        if (TCG_TARGET_HAS_ext16s_i64) {
+            tcg_gen_ext16s_i64(ret, arg);
+            tcg_gen_sari_i64(ret, ret, ofs);
+            return;
+        }
+        break;
+    case 8:
+        if (TCG_TARGET_HAS_ext8s_i64) {
+            tcg_gen_ext8s_i64(ret, arg);
+            tcg_gen_sari_i64(ret, ret, ofs);
+            return;
+        }
+        break;
+    }
+    switch (len) {
+    case 32:
+        if (TCG_TARGET_HAS_ext32s_i64) {
+            tcg_gen_shri_i64(ret, arg, ofs);
+            tcg_gen_ext32s_i64(ret, ret);
+            return;
+        }
+        break;
+    case 16:
+        if (TCG_TARGET_HAS_ext16s_i64) {
+            tcg_gen_shri_i64(ret, arg, ofs);
+            tcg_gen_ext16s_i64(ret, ret);
+            return;
+        }
+        break;
+    case 8:
+        if (TCG_TARGET_HAS_ext8s_i64) {
+            tcg_gen_shri_i64(ret, arg, ofs);
+            tcg_gen_ext8s_i64(ret, ret);
+            return;
+        }
+        break;
+    }
+    tcg_gen_shli_i64(ret, arg, 64 - len - ofs);
+    tcg_gen_sari_i64(ret, ret, 64 - len);
+}
+
 void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, TCGv_i64 c1,
                          TCGv_i64 c2, TCGv_i64 v1, TCGv_i64 v2)
 {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 02cb376..21d30cb 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -292,6 +292,10 @@ void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
 void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
                          unsigned int ofs, unsigned int len);
+void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
+                         unsigned int ofs, unsigned int len);
+void tcg_gen_sextract_i32(TCGv_i32 ret, TCGv_i32 arg,
+                          unsigned int ofs, unsigned int len);
 void tcg_gen_brcond_i32(TCGCond cond, TCGv_i32 arg1, TCGv_i32 arg2, TCGLabel *);
 void tcg_gen_brcondi_i32(TCGCond cond, TCGv_i32 arg1, int32_t arg2, TCGLabel *);
 void tcg_gen_setcond_i32(TCGCond cond, TCGv_i32 ret,
@@ -468,6 +472,10 @@ void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
 void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
                          unsigned int ofs, unsigned int len);
+void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
+                         unsigned int ofs, unsigned int len);
+void tcg_gen_sextract_i64(TCGv_i64 ret, TCGv_i64 arg,
+                          unsigned int ofs, unsigned int len);
 void tcg_gen_brcond_i64(TCGCond cond, TCGv_i64 arg1, TCGv_i64 arg2, TCGLabel *);
 void tcg_gen_brcondi_i64(TCGCond cond, TCGv_i64 arg1, int64_t arg2, TCGLabel *);
 void tcg_gen_setcond_i64(TCGCond cond, TCGv_i64 ret,
@@ -925,6 +933,8 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_rotr_tl tcg_gen_rotr_i64
 #define tcg_gen_rotri_tl tcg_gen_rotri_i64
 #define tcg_gen_deposit_tl tcg_gen_deposit_i64
+#define tcg_gen_extract_tl tcg_gen_extract_i64
+#define tcg_gen_sextract_tl tcg_gen_sextract_i64
 #define tcg_const_tl tcg_const_i64
 #define tcg_const_local_tl tcg_const_local_i64
 #define tcg_gen_movcond_tl tcg_gen_movcond_i64
@@ -1002,6 +1012,8 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_rotr_tl tcg_gen_rotr_i32
 #define tcg_gen_rotri_tl tcg_gen_rotri_i32
 #define tcg_gen_deposit_tl tcg_gen_deposit_i32
+#define tcg_gen_extract_tl tcg_gen_extract_i32
+#define tcg_gen_sextract_tl tcg_gen_sextract_i32
 #define tcg_const_tl tcg_const_i32
 #define tcg_const_local_tl tcg_const_local_i32
 #define tcg_gen_movcond_tl tcg_gen_movcond_i32
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 45528d2..11563ac 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -77,6 +77,8 @@ DEF(sar_i32, 1, 2, 0, 0)
 DEF(rotl_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_rot_i32))
 DEF(rotr_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_rot_i32))
 DEF(deposit_i32, 1, 2, 2, IMPL(TCG_TARGET_HAS_deposit_i32))
+DEF(extract_i32, 1, 1, 2, IMPL(TCG_TARGET_HAS_extract_i32))
+DEF(sextract_i32, 1, 1, 2, IMPL(TCG_TARGET_HAS_sextract_i32))
 
 DEF(brcond_i32, 0, 2, 2, TCG_OPF_BB_END)
 
@@ -139,6 +141,8 @@ DEF(sar_i64, 1, 2, 0, IMPL64)
 DEF(rotl_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(rotr_i64, 1, 2, 0, IMPL64 | IMPL(TCG_TARGET_HAS_rot_i64))
 DEF(deposit_i64, 1, 2, 2, IMPL64 | IMPL(TCG_TARGET_HAS_deposit_i64))
+DEF(extract_i64, 1, 1, 2, IMPL64 | IMPL(TCG_TARGET_HAS_extract_i64))
+DEF(sextract_i64, 1, 1, 2, IMPL64 | IMPL(TCG_TARGET_HAS_sextract_i64))
 
 /* size changing ops */
 DEF(ext_i32_i64, 1, 1, 0, IMPL64)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index c9949aa..e2d34b6 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -112,6 +112,8 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      0
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      0
 #define TCG_TARGET_HAS_add2_i64         0
 #define TCG_TARGET_HAS_sub2_i64         0
@@ -130,6 +132,12 @@ typedef uint64_t TCGRegSet;
 #ifndef TCG_TARGET_deposit_i64_valid
 #define TCG_TARGET_deposit_i64_valid(ofs, len) 1
 #endif
+#ifndef TCG_TARGET_extract_i32_valid
+#define TCG_TARGET_extract_i32_valid(ofs, len) 1
+#endif
+#ifndef TCG_TARGET_extract_i64_valid
+#define TCG_TARGET_extract_i64_valid(ofs, len) 1
+#endif
 
 /* Only one of DIV or DIV2 should be defined.  */
 #if defined(TCG_TARGET_HAS_div_i32)
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 868228b..2065042 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -69,6 +69,8 @@
 #define TCG_TARGET_HAS_ext16u_i32       1
 #define TCG_TARGET_HAS_andc_i32         0
 #define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
@@ -88,6 +90,8 @@
 #define TCG_TARGET_HAS_bswap32_i64      1
 #define TCG_TARGET_HAS_bswap64_i64      1
 #define TCG_TARGET_HAS_deposit_i64      1
+#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_div_i64          0
 #define TCG_TARGET_HAS_rem_i64          0
 #define TCG_TARGET_HAS_ext8s_i64        1
-- 
2.7.4

* [Qemu-devel] [PATCH v2 02/18] tcg: Minor adjustments to deposit expanders
From: Richard Henderson @ 2016-10-18 15:10 UTC
  To: qemu-devel

Assert that len is not 0.

Since we have asserted that ofs + len <= N, a later
check for len == N implies that ofs == 0.
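
The implication can be checked by brute force. The sketch below is
illustrative only; deposit_args_valid and count_counterexamples are
hypothetical helpers, not part of this patch, and they simply restate
the four asserts for an N-bit value.

```c
#include <stdbool.h>

/* Hypothetical helper: true if (ofs, len) passes the same checks the
   deposit expanders assert for an N-bit value. */
static bool deposit_args_valid(unsigned n, unsigned ofs, unsigned len)
{
    return ofs < n && len > 0 && len <= n && ofs + len <= n;
}

/* Count valid (ofs, len) pairs that have len == n but ofs != 0.
   If the reasoning above holds, this is always zero. */
static unsigned count_counterexamples(unsigned n)
{
    unsigned count = 0;

    for (unsigned ofs = 0; ofs <= n; ofs++) {
        for (unsigned len = 0; len <= n; len++) {
            if (deposit_args_valid(n, ofs, len) && len == n && ofs != 0) {
                count++;
            }
        }
    }
    return count;
}
```

Since ofs + len <= N and len == N leave no room for a nonzero ofs, the
count is zero for both N = 32 and N = 64, which is why the explicit
ofs == 0 test can be dropped.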

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/tcg-op.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 144ac0c..505dacd 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -543,10 +543,11 @@ void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
     TCGv_i32 t1;
 
     tcg_debug_assert(ofs < 32);
+    tcg_debug_assert(len > 0);
     tcg_debug_assert(len <= 32);
     tcg_debug_assert(ofs + len <= 32);
 
-    if (ofs == 0 && len == 32) {
+    if (len == 32) {
         tcg_gen_mov_i32(ret, arg2);
         return;
     }
@@ -1701,10 +1702,11 @@ void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
     TCGv_i64 t1;
 
     tcg_debug_assert(ofs < 64);
+    tcg_debug_assert(len > 0);
     tcg_debug_assert(len <= 64);
     tcg_debug_assert(ofs + len <= 64);
 
-    if (ofs == 0 && len == 64) {
+    if (len == 64) {
         tcg_gen_mov_i64(ret, arg2);
         return;
     }
-- 
2.7.4

* [Qemu-devel] [PATCH v2 03/18] tcg: Add deposit_z expander
From: Richard Henderson @ 2016-10-18 15:10 UTC
  To: qemu-devel

While we don't require a new opcode, it is handy to have an expander
that knows the first source is zero.
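
As a rough illustration of why the zero case is simpler, here is a
sketch of the 32-bit semantics in plain C. The names low_mask32,
ref_deposit32 and ref_deposit_z32 are hypothetical and exist only for
this example; the argument constraints are assumed to match the asserts
in the expanders.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low LEN bits set; LEN may be 32. */
static uint32_t low_mask32(unsigned len)
{
    return len == 32 ? UINT32_MAX : (UINT32_C(1) << len) - 1;
}

/* General deposit: replace LEN bits of BASE starting at bit OFS with
   the low LEN bits of FIELD. */
static uint32_t ref_deposit32(uint32_t base, uint32_t field,
                              unsigned ofs, unsigned len)
{
    assert(len > 0 && len <= 32 && ofs + len <= 32);
    uint32_t mask = low_mask32(len) << ofs;
    return (base & ~mask) | ((field << ofs) & mask);
}

/* Deposit into zero: with BASE known to be 0 there is nothing to
   preserve, so only the mask and the shift remain. */
static uint32_t ref_deposit_z32(uint32_t field, unsigned ofs, unsigned len)
{
    assert(len > 0 && len <= 32 && ofs + len <= 32);
    return (field & low_mask32(len)) << ofs;
}
```

For instance, ref_deposit_z32(0xABCD, 8, 8) yields 0xCD00, the same
value as ref_deposit32(0, 0xABCD, 8, 8). Knowing the base is zero lets
an expander fall back to an AND plus a shift (or a zero-extension plus
a shift) on hosts without a deposit instruction.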

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/tcg-op.c | 143 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-op.h |   6 +++
 2 files changed, 149 insertions(+)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 505dacd..911da2e 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -571,6 +571,64 @@ void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
     tcg_temp_free_i32(t1);
 }
 
+void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
+                           unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 32);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 32);
+    tcg_debug_assert(ofs + len <= 32);
+
+    if (ofs + len == 32) {
+        tcg_gen_shli_i32(ret, arg, ofs);
+    } else if (ofs == 0) {
+        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
+    } else if (TCG_TARGET_HAS_deposit_i32
+               && TCG_TARGET_deposit_i32_valid(ofs, len)) {
+        TCGv_i32 zero = tcg_const_i32(0);
+        tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, zero, arg, ofs, len);
+        tcg_temp_free_i32(zero);
+    } else {
+        /* To help two-operand hosts we prefer to zero-extend first,
+           which allows ARG to stay live.  */
+        switch (len) {
+        case 16:
+            if (TCG_TARGET_HAS_ext16u_i32) {
+                tcg_gen_ext16u_i32(ret, arg);
+                tcg_gen_shli_i32(ret, ret, ofs);
+                return;
+            }
+            break;
+        case 8:
+            if (TCG_TARGET_HAS_ext8u_i32) {
+                tcg_gen_ext8u_i32(ret, arg);
+                tcg_gen_shli_i32(ret, ret, ofs);
+                return;
+            }
+            break;
+        }
+        /* Otherwise prefer zero-extension over AND for code size.  */
+        switch (ofs + len) {
+        case 16:
+            if (TCG_TARGET_HAS_ext16u_i32) {
+                tcg_gen_shli_i32(ret, arg, ofs);
+                tcg_gen_ext16u_i32(ret, ret);
+                return;
+            }
+            break;
+        case 8:
+            if (TCG_TARGET_HAS_ext8u_i32) {
+                tcg_gen_shli_i32(ret, arg, ofs);
+                tcg_gen_ext8u_i32(ret, ret);
+                return;
+            }
+            break;
+        }
+        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
+        tcg_gen_shli_i32(ret, ret, ofs);
+    }
+}
+
 void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
                          unsigned int ofs, unsigned int len)
 {
@@ -1745,6 +1803,91 @@ void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
     tcg_temp_free_i64(t1);
 }
 
+void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
+                           unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 64);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 64);
+    tcg_debug_assert(ofs + len <= 64);
+
+    if (ofs + len == 64) {
+        tcg_gen_shli_i64(ret, arg, ofs);
+    } else if (ofs == 0) {
+        tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
+    } else if (TCG_TARGET_HAS_deposit_i64
+               && TCG_TARGET_deposit_i64_valid(ofs, len)) {
+        TCGv_i64 zero = tcg_const_i64(0);
+        tcg_gen_op5ii_i64(INDEX_op_deposit_i64, ret, zero, arg, ofs, len);
+        tcg_temp_free_i64(zero);
+    } else {
+        if (TCG_TARGET_REG_BITS == 32) {
+            if (ofs >= 32) {
+                tcg_gen_deposit_z_i32(TCGV_HIGH(ret), TCGV_LOW(arg),
+                                      ofs - 32, len);
+                tcg_gen_movi_i32(TCGV_LOW(ret), 0);
+                return;
+            }
+            if (ofs + len <= 32) {
+                tcg_gen_deposit_z_i32(TCGV_LOW(ret), TCGV_LOW(arg), ofs, len);
+                tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+                return;
+            }
+        }
+        /* To help two-operand hosts we prefer to zero-extend first,
+           which allows ARG to stay live.  */
+        switch (len) {
+        case 32:
+            if (TCG_TARGET_HAS_ext32u_i64) {
+                tcg_gen_ext32u_i64(ret, arg);
+                tcg_gen_shli_i64(ret, ret, ofs);
+                return;
+            }
+            break;
+        case 16:
+            if (TCG_TARGET_HAS_ext16u_i64) {
+                tcg_gen_ext16u_i64(ret, arg);
+                tcg_gen_shli_i64(ret, ret, ofs);
+                return;
+            }
+            break;
+        case 8:
+            if (TCG_TARGET_HAS_ext8u_i64) {
+                tcg_gen_ext8u_i64(ret, arg);
+                tcg_gen_shli_i64(ret, ret, ofs);
+                return;
+            }
+            break;
+        }
+        /* Otherwise prefer zero-extension over AND for code size.  */
+        switch (ofs + len) {
+        case 32:
+            if (TCG_TARGET_HAS_ext32u_i64) {
+                tcg_gen_shli_i64(ret, arg, ofs);
+                tcg_gen_ext32u_i64(ret, ret);
+                return;
+            }
+            break;
+        case 16:
+            if (TCG_TARGET_HAS_ext16u_i64) {
+                tcg_gen_shli_i64(ret, arg, ofs);
+                tcg_gen_ext16u_i64(ret, ret);
+                return;
+            }
+            break;
+        case 8:
+            if (TCG_TARGET_HAS_ext8u_i64) {
+                tcg_gen_shli_i64(ret, arg, ofs);
+                tcg_gen_ext8u_i64(ret, ret);
+                return;
+            }
+            break;
+        }
+        tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
+        tcg_gen_shli_i64(ret, ret, ofs);
+    }
+}
+
 void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
                          unsigned int ofs, unsigned int len)
 {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 21d30cb..177c7a0 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -292,6 +292,8 @@ void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
 void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
                          unsigned int ofs, unsigned int len);
+void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
+                           unsigned int ofs, unsigned int len);
 void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
                          unsigned int ofs, unsigned int len);
 void tcg_gen_sextract_i32(TCGv_i32 ret, TCGv_i32 arg,
@@ -472,6 +474,8 @@ void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
 void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
                          unsigned int ofs, unsigned int len);
+void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
+                           unsigned int ofs, unsigned int len);
 void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
                          unsigned int ofs, unsigned int len);
 void tcg_gen_sextract_i64(TCGv_i64 ret, TCGv_i64 arg,
@@ -933,6 +937,7 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_rotr_tl tcg_gen_rotr_i64
 #define tcg_gen_rotri_tl tcg_gen_rotri_i64
 #define tcg_gen_deposit_tl tcg_gen_deposit_i64
+#define tcg_gen_deposit_z_tl tcg_gen_deposit_z_i64
 #define tcg_gen_extract_tl tcg_gen_extract_i64
 #define tcg_gen_sextract_tl tcg_gen_sextract_i64
 #define tcg_const_tl tcg_const_i64
@@ -1012,6 +1017,7 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_rotr_tl tcg_gen_rotr_i32
 #define tcg_gen_rotri_tl tcg_gen_rotri_i32
 #define tcg_gen_deposit_tl tcg_gen_deposit_i32
+#define tcg_gen_deposit_z_tl tcg_gen_deposit_z_i32
 #define tcg_gen_extract_tl tcg_gen_extract_i32
 #define tcg_gen_sextract_tl tcg_gen_sextract_i32
 #define tcg_const_tl tcg_const_i32
-- 
2.7.4


* [Qemu-devel] [PATCH v2 04/18] tcg/aarch64: Implement field extraction opcodes
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (2 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 03/18] tcg: Add deposit_z expander Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-18 15:33   ` Claudio Fontana
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 05/18] tcg/arm: Move isa detection to tcg-target.h Richard Henderson
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
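Both opcodes map onto the bitfield-move instructions with immr = ofs
and imms = ofs + len - 1.  A small C model of the signed variant may
help when checking the operands; ref_sextract32 and bfm_imms are
hypothetical names for this note, not functions in the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Value of the "imms" operand passed to UBFM/SBFM for a field
   starting at OFS and LEN bits wide (hypothetical helper).  */
static inline unsigned bfm_imms(unsigned ofs, unsigned len)
{
    assert(len > 0);
    return ofs + len - 1;
}

/* Reference semantics of a signed 32-bit field extract, returned as
   the two's complement bit pattern to stay within unsigned math.  */
static inline uint32_t ref_sextract32(uint32_t x, unsigned ofs,
                                      unsigned len)
{
    assert(len > 0 && ofs + len <= 32);
    uint32_t mask = (uint32_t)((UINT64_C(1) << len) - 1);
    uint32_t sign = UINT32_C(1) << (len - 1);
    uint32_t v = (x >> ofs) & mask;

    /* Flip the sign bit, then subtract it to propagate it upward.  */
    return (v ^ sign) - sign;
}
```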
 tcg/aarch64/tcg-target.h     |  8 ++++----
 tcg/aarch64/tcg-target.inc.c | 14 ++++++++++++++
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 410c31b..4a74bd8 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -63,8 +63,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
-#define TCG_TARGET_HAS_extract_i32      0
-#define TCG_TARGET_HAS_sextract_i32     0
+#define TCG_TARGET_HAS_extract_i32      1
+#define TCG_TARGET_HAS_sextract_i32     1
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
@@ -95,8 +95,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
-#define TCG_TARGET_HAS_extract_i64      0
-#define TCG_TARGET_HAS_sextract_i64     0
+#define TCG_TARGET_HAS_extract_i64      1
+#define TCG_TARGET_HAS_sextract_i64     1
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
 #define TCG_TARGET_HAS_sub2_i64         1
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1939d35..c0e9890 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1640,6 +1640,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_dep(s, ext, a0, REG0(2), args[3], args[4]);
         break;
 
+    case INDEX_op_extract_i64:
+    case INDEX_op_extract_i32:
+        tcg_out_ubfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
+        break;
+
+    case INDEX_op_sextract_i64:
+    case INDEX_op_sextract_i32:
+        tcg_out_sbfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
+        break;
+
     case INDEX_op_add2_i32:
         tcg_out_addsub2(s, TCG_TYPE_I32, a0, a1, REG0(2), REG0(3),
                         (int32_t)args[4], args[5], const_args[4],
@@ -1785,6 +1795,10 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
 
     { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
     { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
+    { INDEX_op_extract_i32, { "r", "r" } },
+    { INDEX_op_extract_i64, { "r", "r" } },
+    { INDEX_op_sextract_i32, { "r", "r" } },
+    { INDEX_op_sextract_i64, { "r", "r" } },
 
     { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rA", "rMZ" } },
     { INDEX_op_add2_i64, { "r", "r", "rZ", "rZ", "rA", "rMZ" } },
-- 
2.7.4


* [Qemu-devel] [PATCH v2 05/18] tcg/arm: Move isa detection to tcg-target.h
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (3 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 04/18] tcg/aarch64: Implement field extraction opcodes Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 06/18] tcg/arm: Implement field extraction opcodes Richard Henderson
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.h     | 36 ++++++++++++++++++++++++++++++++----
 tcg/arm/tcg-target.inc.c | 41 +----------------------------------------
 2 files changed, 33 insertions(+), 44 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 8e724be..d1fe12b 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -26,6 +26,37 @@
 #ifndef ARM_TCG_TARGET_H
 #define ARM_TCG_TARGET_H
 
+/* The __ARM_ARCH define is provided by gcc 4.8.  Construct it otherwise.  */
+#ifndef __ARM_ARCH
+# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
+     || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
+     || defined(__ARM_ARCH_7EM__)
+#  define __ARM_ARCH 7
+# elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
+       || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
+       || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6T2__)
+#  define __ARM_ARCH 6
+# elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5E__) \
+       || defined(__ARM_ARCH_5T__) || defined(__ARM_ARCH_5TE__) \
+       || defined(__ARM_ARCH_5TEJ__)
+#  define __ARM_ARCH 5
+# else
+#  define __ARM_ARCH 4
+# endif
+#endif
+
+extern int arm_arch;
+
+#if defined(__ARM_ARCH_5T__) \
+    || defined(__ARM_ARCH_5TE__) || defined(__ARM_ARCH_5TEJ__)
+# define use_armv5t_instructions 1
+#else
+# define use_armv5t_instructions use_armv6_instructions
+#endif
+
+#define use_armv6_instructions  (__ARM_ARCH >= 6 || arm_arch >= 6)
+#define use_armv7_instructions  (__ARM_ARCH >= 7 || arm_arch >= 7)
+
 #undef TCG_TARGET_STACK_GROWSUP
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
@@ -79,7 +110,7 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_eqv_i32          0
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_deposit_i32      1
+#define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
 #define TCG_TARGET_HAS_extract_i32      0
 #define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
@@ -90,9 +121,6 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_div_i32          use_idiv_instructions
 #define TCG_TARGET_HAS_rem_i32          0
 
-extern bool tcg_target_deposit_valid(int ofs, int len);
-#define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_valid
-
 enum {
     TCG_AREG0 = TCG_REG_R6,
 };
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index ffa0d40..1415c27 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -25,36 +25,7 @@
 #include "elf.h"
 #include "tcg-be-ldst.h"
 
-/* The __ARM_ARCH define is provided by gcc 4.8.  Construct it otherwise.  */
-#ifndef __ARM_ARCH
-# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
-     || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
-     || defined(__ARM_ARCH_7EM__)
-#  define __ARM_ARCH 7
-# elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
-       || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
-       || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6T2__)
-#  define __ARM_ARCH 6
-# elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5E__) \
-       || defined(__ARM_ARCH_5T__) || defined(__ARM_ARCH_5TE__) \
-       || defined(__ARM_ARCH_5TEJ__)
-#  define __ARM_ARCH 5
-# else
-#  define __ARM_ARCH 4
-# endif
-#endif
-
-static int arm_arch = __ARM_ARCH;
-
-#if defined(__ARM_ARCH_5T__) \
-    || defined(__ARM_ARCH_5TE__) || defined(__ARM_ARCH_5TEJ__)
-# define use_armv5t_instructions 1
-#else
-# define use_armv5t_instructions use_armv6_instructions
-#endif
-
-#define use_armv6_instructions  (__ARM_ARCH >= 6 || arm_arch >= 6)
-#define use_armv7_instructions  (__ARM_ARCH >= 7 || arm_arch >= 7)
+int arm_arch = __ARM_ARCH;
 
 #ifndef use_idiv_instructions
 bool use_idiv_instructions;
@@ -730,16 +701,6 @@ static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
     }
 }
 
-bool tcg_target_deposit_valid(int ofs, int len)
-{
-    /* ??? Without bfi, we could improve over generic code by combining
-       the right-shift from a non-zero ofs with the orr.  We do run into
-       problems when rd == rs, and the mask generated from ofs+len doesn't
-       fit into an immediate.  We would have to be careful not to pessimize
-       wrt the optimizations performed on the expanded code.  */
-    return use_armv7_instructions;
-}
-
 static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
                                    TCGArg a1, int ofs, int len, bool const_a1)
 {
-- 
2.7.4


* [Qemu-devel] [PATCH v2 06/18] tcg/arm: Implement field extraction opcodes
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (4 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 05/18] tcg/arm: Move isa detection to tcg-target.h Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 07/18] tcg/i386: " Richard Henderson
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
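The field layout used by tcg_out_extract can be modelled as below.  This
is a sketch for checking the shifts only; enc_ubfx is a hypothetical
name, and the constant 0x07e00050 is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble an A32 UBFX word from its fields:
   cond[31:28] | 0111111[27:21] | width-1[20:16] | Rd[15:12]
   | lsb[11:7] | 101[6:4] | Rn[3:0].  */
static inline uint32_t enc_ubfx(unsigned cond, unsigned rd, unsigned rn,
                                unsigned lsb, unsigned len)
{
    assert(cond < 16 && rd < 16 && rn < 16);
    assert(lsb < 32 && len > 0 && lsb + len <= 32);

    return 0x07e00050u | (cond << 28) | ((len - 1) << 16)
           | (rd << 12) | (lsb << 7) | rn;
}
```

SBFX differs only in the base constant (0x07a00050 in the patch).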
 tcg/arm/tcg-target.h     |  4 ++--
 tcg/arm/tcg-target.inc.c | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index d1fe12b..4e30728 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -111,8 +111,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      use_armv7_instructions
-#define TCG_TARGET_HAS_extract_i32      0
-#define TCG_TARGET_HAS_sextract_i32     0
+#define TCG_TARGET_HAS_extract_i32      use_armv7_instructions
+#define TCG_TARGET_HAS_sextract_i32     use_armv7_instructions
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_mulu2_i32        1
 #define TCG_TARGET_HAS_muls2_i32        1
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 1415c27..6765a9d 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -713,6 +713,20 @@ static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
               | (ofs << 7) | ((ofs + len - 1) << 16));
 }
 
+static inline void tcg_out_extract(TCGContext *s, int cond, TCGReg rd,
+                                   TCGArg a1, int ofs, int len)
+{
+    tcg_out32(s, 0x07e00050 | (cond << 28) | (rd << 12) | a1
+              | (ofs << 7) | ((len - 1) << 16));
+}
+
+static inline void tcg_out_sextract(TCGContext *s, int cond, TCGReg rd,
+                                    TCGArg a1, int ofs, int len)
+{
+    tcg_out32(s, 0x07a00050 | (cond << 28) | (rd << 12) | a1
+              | (ofs << 7) | ((len - 1) << 16));
+}
+
 /* Note that this routine is used for both LDR and LDRH formats, so we do
    not wish to include an immediate shift at this point.  */
 static void tcg_out_memop_r(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
@@ -1894,6 +1908,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_deposit(s, COND_AL, args[0], args[2],
                         args[3], args[4], const_args[2]);
         break;
+    case INDEX_op_extract_i32:
+        tcg_out_extract(s, COND_AL, args[0], args[1], args[2], args[3]);
+        break;
+    case INDEX_op_sextract_i32:
+        tcg_out_sextract(s, COND_AL, args[0], args[1], args[2], args[3]);
+        break;
 
     case INDEX_op_div_i32:
         tcg_out_sdiv(s, COND_AL, args[0], args[1], args[2]);
@@ -1976,6 +1996,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_ext16u_i32, { "r", "r" } },
 
     { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
+    { INDEX_op_extract_i32, { "r", "r" } },
+    { INDEX_op_sextract_i32, { "r", "r" } },
 
     { INDEX_op_div_i32, { "r", "r", "r" } },
     { INDEX_op_divu_i32, { "r", "r", "r" } },
-- 
2.7.4


* [Qemu-devel] [PATCH v2 07/18] tcg/i386: Implement field extraction opcodes
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (5 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 06/18] tcg/arm: Implement field extraction opcodes Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-25 12:46   ` Paolo Bonzini
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 08/18] tcg/mips: " Richard Henderson
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
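The ofs + len == 32 case of extract_i64 relies on a 32-bit right shift
zero-extending into the upper half of the register.  A plain C sketch
of that equivalence follows; both function names are hypothetical and
only model the arithmetic, not the emitted instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics of an unsigned 64-bit field extract.  */
static inline uint64_t ref_extract64(uint64_t x, unsigned ofs,
                                     unsigned len)
{
    assert(len > 0 && len <= 64 && ofs + len <= 64);
    uint64_t mask = len == 64 ? ~UINT64_C(0) : (UINT64_C(1) << len) - 1;
    return (x >> ofs) & mask;
}

/* Model of "movl src, dst; shrl $ofs, dst": truncate to 32 bits,
   shift right, and let the result zero-extend to 64 bits.  */
static inline uint64_t ref_shr32_zext(uint64_t x, unsigned ofs)
{
    assert(ofs < 32);
    return (uint64_t)((uint32_t)x >> ofs);
}
```

For any ofs below 32, ref_extract64(x, ofs, 32 - ofs) and
ref_shr32_zext(x, ofs) should return the same value, which is the
condition TCG_TARGET_extract_i64_valid tests for.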
 tcg/i386/tcg-target.h     | 12 +++++++++---
 tcg/i386/tcg-target.inc.c | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 7625188..dc19c47 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -94,8 +94,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
-#define TCG_TARGET_HAS_extract_i32      0
-#define TCG_TARGET_HAS_sextract_i32     0
+#define TCG_TARGET_HAS_extract_i32      1
+#define TCG_TARGET_HAS_sextract_i32     1
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         1
 #define TCG_TARGET_HAS_sub2_i32         1
@@ -126,7 +126,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
-#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
@@ -142,6 +142,12 @@ extern bool have_bmi1;
      ((ofs) == 0 && (len) == 16))
 #define TCG_TARGET_deposit_i64_valid    TCG_TARGET_deposit_i32_valid
 
+/* Check for the possibility of high-byte extraction and, for 64-bit,
+   zero-extending 32-bit right-shift.  */
+#define TCG_TARGET_extract_i32_valid(ofs, len) ((ofs) == 8 && (len) == 8)
+#define TCG_TARGET_extract_i64_valid(ofs, len) \
+    (((ofs) == 8 && (len) == 8) || ((ofs) + (len)) == 32)
+
 #if TCG_TARGET_REG_BITS == 64
 # define TCG_AREG0 TCG_REG_R14
 #else
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index eeb1777..39f62bd 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2143,6 +2143,40 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_extract_i64:
+        if (args[2] + args[3] == 32) {
+            /* This is a 32-bit zero-extending right shift.  */
+            tcg_out_mov(s, TCG_TYPE_I32, args[0], args[1]);
+            tcg_out_shifti(s, SHIFT_SHR, args[0], args[2]);
+            break;
+        }
+        /* FALLTHRU */
+    case INDEX_op_extract_i32:
+        /* On the off-chance that we can use the high-byte registers.
+           Otherwise we emit the same ext16 + shift pattern that we
+           would have gotten from the normal tcg-op.c expansion.  */
+        tcg_debug_assert(args[2] == 8 && args[3] == 8);
+        if (args[1] < 4 && args[0] < 8) {
+            tcg_out_modrm(s, OPC_MOVZBL, args[0], args[1] + 4);
+        } else {
+            tcg_out_ext16u(s, args[0], args[1]);
+            tcg_out_shifti(s, SHIFT_SHR, args[0], 8);
+        }
+        break;
+
+    case INDEX_op_sextract_i32:
+        /* We don't implement sextract_i64, as we cannot sign-extend to
+           64-bits without using the REX prefix that explicitly excludes
+           access to the high-byte registers.  */
+        tcg_debug_assert(args[2] == 8 && args[3] == 8);
+        if (args[1] < 4 && args[0] < 8) {
+            tcg_out_modrm(s, OPC_MOVSBL, args[0], args[1] + 4);
+        } else {
+            tcg_out_ext16s(s, args[0], args[1], 0);
+            tcg_out_shifti(s, SHIFT_SAR, args[0], 8);
+        }
+        break;
+
     case INDEX_op_mb:
         tcg_out_mb(s, args[0]);
         break;
@@ -2204,6 +2238,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_setcond_i32, { "q", "r", "ri" } },
 
     { INDEX_op_deposit_i32, { "Q", "0", "Q" } },
+    { INDEX_op_extract_i32, { "r", "r" } },
+    { INDEX_op_sextract_i32, { "r", "r" } },
+
     { INDEX_op_movcond_i32, { "r", "r", "ri", "r", "0" } },
 
     { INDEX_op_mulu2_i32, { "a", "d", "a", "r" } },
@@ -2265,6 +2302,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_extu_i32_i64, { "r", "r" } },
 
     { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
+    { INDEX_op_extract_i64, { "r", "r" } },
     { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
 
     { INDEX_op_mulu2_i64, { "a", "d", "a", "r" } },
-- 
2.7.4


* [Qemu-devel] [PATCH v2 08/18] tcg/mips: Implement field extraction opcodes
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (6 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 07/18] tcg/i386: " Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-27 13:40   ` Yongbok Kim
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 09/18] tcg/ppc: " Richard Henderson
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.h     | 2 +-
 tcg/mips/tcg-target.inc.c | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 1bcea3b..f1c3137 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -123,7 +123,7 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_bswap16_i32      use_mips32r2_instructions
 #define TCG_TARGET_HAS_bswap32_i32      use_mips32r2_instructions
 #define TCG_TARGET_HAS_deposit_i32      use_mips32r2_instructions
-#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_extract_i32      use_mips32r2_instructions
 #define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_ext8s_i32        use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext16s_i32       use_mips32r2_instructions
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index abce602..192dd49 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -1637,6 +1637,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_deposit_i32:
         tcg_out_opc_bf(s, OPC_INS, a0, a2, args[3] + args[4] - 1, args[3]);
         break;
+    case INDEX_op_extract_i32:
+        tcg_out_opc_bf(s, OPC_EXT, a0, a1, args[3] - 1, a2);
+        break;
 
     case INDEX_op_brcond_i32:
         tcg_out_brcond(s, a2, a0, a1, arg_label(args[3]));
@@ -1736,6 +1739,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_ext16s_i32, { "r", "rZ" } },
 
     { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
+    { INDEX_op_extract_i32, { "r", "r" } },
 
     { INDEX_op_brcond_i32, { "rZ", "rZ" } },
 #if use_mips32r6_instructions
-- 
2.7.4


* [Qemu-devel] [PATCH v2 09/18] tcg/ppc: Implement field extraction opcodes
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (7 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 08/18] tcg/mips: " Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 10/18] tcg/s390: Expose host facilities to tcg-target.h Richard Henderson
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
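The 32-bit case uses rlwinm with SH = 32 - ofs, MB = 32 - len, ME = 31,
i.e. rotate the field down to bit 0 and mask off everything above it.
A C model of that identity, under the assumption that ofs is non-zero
(as it is for the operands used in this patch); ref_rlwinm and
ref_extract32 are hypothetical names for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Model of rlwinm RA,RS,SH,MB,ME for the MB <= ME case.  PowerPC
   numbers bits from the most significant end, so bit 0 is the MSB.  */
static inline uint32_t ref_rlwinm(uint32_t rs, unsigned sh,
                                  unsigned mb, unsigned me)
{
    assert(sh < 32 && mb <= me && me < 32);
    uint32_t rot = sh ? (rs << sh) | (rs >> (32 - sh)) : rs;
    uint32_t mask = (UINT32_MAX >> mb) & (UINT32_MAX << (31 - me));
    return rot & mask;
}

/* Reference semantics of an unsigned 32-bit field extract.  */
static inline uint32_t ref_extract32(uint32_t x, unsigned ofs,
                                     unsigned len)
{
    assert(ofs > 0 && ofs < 32 && len > 0 && ofs + len <= 32);
    return (x >> ofs) & (UINT32_MAX >> (32 - len));
}
```

The 64-bit case with rldicl follows the same pattern with 64 in place
of 32 and an implicit mask end at bit 63.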
 tcg/ppc/tcg-target.h     |  4 ++--
 tcg/ppc/tcg-target.inc.c | 10 ++++++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index c765d3e..b42c57a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -69,7 +69,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32         1
 #define TCG_TARGET_HAS_nor_i32          1
 #define TCG_TARGET_HAS_deposit_i32      1
-#define TCG_TARGET_HAS_extract_i32      0
+#define TCG_TARGET_HAS_extract_i32      1
 #define TCG_TARGET_HAS_sextract_i32     0
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_mulu2_i32        0
@@ -102,7 +102,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i64         1
 #define TCG_TARGET_HAS_nor_i64          1
 #define TCG_TARGET_HAS_deposit_i64      1
-#define TCG_TARGET_HAS_extract_i64      0
+#define TCG_TARGET_HAS_extract_i64      1
 #define TCG_TARGET_HAS_sextract_i64     0
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         1
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index a3262cf..7ec54a2 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -2396,6 +2396,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         break;
 
+    case INDEX_op_extract_i32:
+        tcg_out_rlw(s, RLWINM, args[0], args[1],
+                    32 - args[2], 32 - args[3], 31);
+        break;
+    case INDEX_op_extract_i64:
+        tcg_out_rld(s, RLDICL, args[0], args[1], 64 - args[2], 64 - args[3]);
+        break;
+
     case INDEX_op_movcond_i32:
         tcg_out_movcond(s, TCG_TYPE_I32, args[5], args[0], args[1], args[2],
                         args[3], args[4], const_args[2]);
@@ -2530,6 +2538,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_movcond_i32, { "r", "r", "ri", "rZ", "rZ" } },
 
     { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
+    { INDEX_op_extract_i32, { "r", "r" } },
 
     { INDEX_op_muluh_i32, { "r", "r", "r" } },
     { INDEX_op_mulsh_i32, { "r", "r", "r" } },
@@ -2585,6 +2594,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_movcond_i64, { "r", "r", "ri", "rZ", "rZ" } },
 
     { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
+    { INDEX_op_extract_i64, { "r", "r" } },
 
     { INDEX_op_mulsh_i64, { "r", "r", "r" } },
     { INDEX_op_muluh_i64, { "r", "r", "r" } },
-- 
2.7.4


* [Qemu-devel] [PATCH v2 10/18] tcg/s390: Expose host facilities to tcg-target.h
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (8 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 09/18] tcg/ppc: " Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 11/18] tcg/s390: Implement field extraction opcodes Richard Henderson
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

This lets us expose facilities to TCG_TARGET_HAS_* defines
directly, rather than hiding behind function calls.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
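The FACILITY_* masks below use (63 - n) because facility bits are
numbered from the most significant end of the doubleword.  A small
sketch of testing such a bit; FACILITY_BIT and has_facility are
hypothetical helpers for this note and are not added by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for facility number N within the first facility doubleword,
   where facility 0 is the most significant bit.  */
#define FACILITY_BIT(n)  (UINT64_C(1) << (63 - (n)))

/* Return non-zero if facility N is set in FACILITIES.  */
static inline int has_facility(uint64_t facilities, unsigned n)
{
    assert(n < 64);
    return (facilities & FACILITY_BIT(n)) != 0;
}
```

With the masks visible in the header, a TCG_TARGET_HAS_* define can be
a plain expression on s390_facilities instead of a call into
tcg-target.inc.c.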
 tcg/s390/tcg-target.h     | 126 ++++++++++++++++++++++++----------------------
 tcg/s390/tcg-target.inc.c |  74 +++++++++++----------------
 2 files changed, 96 insertions(+), 104 deletions(-)

diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 9583df4..9220f1f 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -49,67 +49,75 @@ typedef enum TCGReg {
 
 #define TCG_TARGET_NB_REGS 16
 
-/* optional instructions */
-#define TCG_TARGET_HAS_div2_i32         1
-#define TCG_TARGET_HAS_rot_i32          1
-#define TCG_TARGET_HAS_ext8s_i32        1
-#define TCG_TARGET_HAS_ext16s_i32       1
-#define TCG_TARGET_HAS_ext8u_i32        1
-#define TCG_TARGET_HAS_ext16u_i32       1
-#define TCG_TARGET_HAS_bswap16_i32      1
-#define TCG_TARGET_HAS_bswap32_i32      1
-#define TCG_TARGET_HAS_not_i32          0
-#define TCG_TARGET_HAS_neg_i32          1
-#define TCG_TARGET_HAS_andc_i32         0
-#define TCG_TARGET_HAS_orc_i32          0
-#define TCG_TARGET_HAS_eqv_i32          0
-#define TCG_TARGET_HAS_nand_i32         0
-#define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_deposit_i32      1
-#define TCG_TARGET_HAS_extract_i32      0
-#define TCG_TARGET_HAS_sextract_i32     0
-#define TCG_TARGET_HAS_movcond_i32      1
-#define TCG_TARGET_HAS_add2_i32         1
-#define TCG_TARGET_HAS_sub2_i32         1
-#define TCG_TARGET_HAS_mulu2_i32        0
-#define TCG_TARGET_HAS_muls2_i32        0
-#define TCG_TARGET_HAS_muluh_i32        0
-#define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_extrl_i64_i32    0
-#define TCG_TARGET_HAS_extrh_i64_i32    0
+/* A list of relevant facilities used by this translator.  Some of these
+   are required for proper operation, and these are checked at startup.  */
+
+#define FACILITY_ZARCH_ACTIVE	(1ULL << (63 - 2))
+#define FACILITY_LONG_DISP	(1ULL << (63 - 18))
+#define FACILITY_EXT_IMM	(1ULL << (63 - 21))
+#define FACILITY_GEN_INST_EXT	(1ULL << (63 - 34))
+#define FACILITY_LOAD_ON_COND   (1ULL << (63 - 45))
+#define FACILITY_FAST_BCR_SER   FACILITY_LOAD_ON_COND
 
-#define TCG_TARGET_HAS_div2_i64         1
-#define TCG_TARGET_HAS_rot_i64          1
-#define TCG_TARGET_HAS_ext8s_i64        1
-#define TCG_TARGET_HAS_ext16s_i64       1
-#define TCG_TARGET_HAS_ext32s_i64       1
-#define TCG_TARGET_HAS_ext8u_i64        1
-#define TCG_TARGET_HAS_ext16u_i64       1
-#define TCG_TARGET_HAS_ext32u_i64       1
-#define TCG_TARGET_HAS_bswap16_i64      1
-#define TCG_TARGET_HAS_bswap32_i64      1
-#define TCG_TARGET_HAS_bswap64_i64      1
-#define TCG_TARGET_HAS_not_i64          0
-#define TCG_TARGET_HAS_neg_i64          1
-#define TCG_TARGET_HAS_andc_i64         0
-#define TCG_TARGET_HAS_orc_i64          0
-#define TCG_TARGET_HAS_eqv_i64          0
-#define TCG_TARGET_HAS_nand_i64         0
-#define TCG_TARGET_HAS_nor_i64          0
-#define TCG_TARGET_HAS_deposit_i64      1
-#define TCG_TARGET_HAS_extract_i64      0
-#define TCG_TARGET_HAS_sextract_i64     0
-#define TCG_TARGET_HAS_movcond_i64      1
-#define TCG_TARGET_HAS_add2_i64         1
-#define TCG_TARGET_HAS_sub2_i64         1
-#define TCG_TARGET_HAS_mulu2_i64        1
-#define TCG_TARGET_HAS_muls2_i64        0
-#define TCG_TARGET_HAS_muluh_i64        0
-#define TCG_TARGET_HAS_mulsh_i64        0
+extern uint64_t s390_facilities;
+
+/* optional instructions */
+#define TCG_TARGET_HAS_div2_i32       1
+#define TCG_TARGET_HAS_rot_i32        1
+#define TCG_TARGET_HAS_ext8s_i32      1
+#define TCG_TARGET_HAS_ext16s_i32     1
+#define TCG_TARGET_HAS_ext8u_i32      1
+#define TCG_TARGET_HAS_ext16u_i32     1
+#define TCG_TARGET_HAS_bswap16_i32    1
+#define TCG_TARGET_HAS_bswap32_i32    1
+#define TCG_TARGET_HAS_not_i32        0
+#define TCG_TARGET_HAS_neg_i32        1
+#define TCG_TARGET_HAS_andc_i32       0
+#define TCG_TARGET_HAS_orc_i32        0
+#define TCG_TARGET_HAS_eqv_i32        0
+#define TCG_TARGET_HAS_nand_i32       0
+#define TCG_TARGET_HAS_nor_i32        0
+#define TCG_TARGET_HAS_deposit_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
+#define TCG_TARGET_HAS_extract_i32    0
+#define TCG_TARGET_HAS_sextract_i32   0
+#define TCG_TARGET_HAS_movcond_i32    1
+#define TCG_TARGET_HAS_add2_i32       1
+#define TCG_TARGET_HAS_sub2_i32       1
+#define TCG_TARGET_HAS_mulu2_i32      0
+#define TCG_TARGET_HAS_muls2_i32      0
+#define TCG_TARGET_HAS_muluh_i32      0
+#define TCG_TARGET_HAS_mulsh_i32      0
+#define TCG_TARGET_HAS_extrl_i64_i32  0
+#define TCG_TARGET_HAS_extrh_i64_i32  0
 
-extern bool tcg_target_deposit_valid(int ofs, int len);
-#define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_valid
-#define TCG_TARGET_deposit_i64_valid  tcg_target_deposit_valid
+#define TCG_TARGET_HAS_div2_i64       1
+#define TCG_TARGET_HAS_rot_i64        1
+#define TCG_TARGET_HAS_ext8s_i64      1
+#define TCG_TARGET_HAS_ext16s_i64     1
+#define TCG_TARGET_HAS_ext32s_i64     1
+#define TCG_TARGET_HAS_ext8u_i64      1
+#define TCG_TARGET_HAS_ext16u_i64     1
+#define TCG_TARGET_HAS_ext32u_i64     1
+#define TCG_TARGET_HAS_bswap16_i64    1
+#define TCG_TARGET_HAS_bswap32_i64    1
+#define TCG_TARGET_HAS_bswap64_i64    1
+#define TCG_TARGET_HAS_not_i64        0
+#define TCG_TARGET_HAS_neg_i64        1
+#define TCG_TARGET_HAS_andc_i64       0
+#define TCG_TARGET_HAS_orc_i64        0
+#define TCG_TARGET_HAS_eqv_i64        0
+#define TCG_TARGET_HAS_nand_i64       0
+#define TCG_TARGET_HAS_nor_i64        0
+#define TCG_TARGET_HAS_deposit_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
+#define TCG_TARGET_HAS_extract_i64    0
+#define TCG_TARGET_HAS_sextract_i64   0
+#define TCG_TARGET_HAS_movcond_i64    1
+#define TCG_TARGET_HAS_add2_i64       1
+#define TCG_TARGET_HAS_sub2_i64       1
+#define TCG_TARGET_HAS_mulu2_i64      1
+#define TCG_TARGET_HAS_muls2_i64      0
+#define TCG_TARGET_HAS_muluh_i64      0
+#define TCG_TARGET_HAS_mulsh_i64      0
 
 /* used for function call generation */
 #define TCG_REG_CALL_STACK		TCG_REG_R15
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 253d4a0..9f51133 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -334,18 +334,7 @@ static void * const qemu_st_helpers[16] = {
 #endif
 
 static tcg_insn_unit *tb_ret_addr;
-
-/* A list of relevant facilities used by this translator.  Some of these
-   are required for proper operation, and these are checked at startup.  */
-
-#define FACILITY_ZARCH_ACTIVE	(1ULL << (63 - 2))
-#define FACILITY_LONG_DISP	(1ULL << (63 - 18))
-#define FACILITY_EXT_IMM	(1ULL << (63 - 21))
-#define FACILITY_GEN_INST_EXT	(1ULL << (63 - 34))
-#define FACILITY_LOAD_ON_COND   (1ULL << (63 - 45))
-#define FACILITY_FAST_BCR_SER   FACILITY_LOAD_ON_COND
-
-static uint64_t facilities;
+uint64_t s390_facilities;
 
 static void patch_reloc(tcg_insn_unit *code_ptr, int type,
                         intptr_t value, intptr_t addend)
@@ -432,7 +421,7 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 
 static int tcg_match_ori(TCGType type, tcg_target_long val)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         if (type == TCG_TYPE_I32) {
             /* All 32-bit ORs can be performed with 1 48-bit insn.  */
             return 1;
@@ -444,7 +433,7 @@ static int tcg_match_ori(TCGType type, tcg_target_long val)
         if (val == (int16_t)val) {
             return 0;
         }
-        if (facilities & FACILITY_EXT_IMM) {
+        if (s390_facilities & FACILITY_EXT_IMM) {
             if (val == (int32_t)val) {
                 return 0;
             }
@@ -461,7 +450,7 @@ static int tcg_match_ori(TCGType type, tcg_target_long val)
 
 static int tcg_match_xori(TCGType type, tcg_target_long val)
 {
-    if ((facilities & FACILITY_EXT_IMM) == 0) {
+    if ((s390_facilities & FACILITY_EXT_IMM) == 0) {
         return 0;
     }
 
@@ -482,7 +471,7 @@ static int tcg_match_xori(TCGType type, tcg_target_long val)
 
 static int tcg_match_cmpi(TCGType type, tcg_target_long val)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         /* The COMPARE IMMEDIATE instruction is available.  */
         if (type == TCG_TYPE_I32) {
             /* We have a 32-bit immediate and can compare against anything.  */
@@ -511,7 +500,7 @@ static int tcg_match_cmpi(TCGType type, tcg_target_long val)
 
 static int tcg_match_add2i(TCGType type, tcg_target_long val)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         if (type == TCG_TYPE_I32) {
             return 1;
         } else if (val >= -0xffffffffll && val <= 0xffffffffll) {
@@ -541,7 +530,7 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
            general-instruction-extensions, then we have MULTIPLY SINGLE
            IMMEDIATE with a signed 32-bit, otherwise we have only
            MULTIPLY HALFWORD IMMEDIATE, with a signed 16-bit.  */
-        if (facilities & FACILITY_GEN_INST_EXT) {
+        if (s390_facilities & FACILITY_GEN_INST_EXT) {
             return val == (int32_t)val;
         } else {
             return val == (int16_t)val;
@@ -668,7 +657,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
     }
 
     /* Try all 48-bit insns that can load it in one go.  */
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         if (sval == (int32_t)sval) {
             tcg_out_insn(s, RIL, LGFI, ret, sval);
             return;
@@ -694,7 +683,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 
     /* If extended immediates are not present, then we may have to issue
        several instructions to load the low 32 bits.  */
-    if (!(facilities & FACILITY_EXT_IMM)) {
+    if (!(s390_facilities & FACILITY_EXT_IMM)) {
         /* A 32-bit unsigned value can be loaded in 2 insns.  And given
            that the lli_insns loop above did not succeed, we know that
            both insns are required.  */
@@ -727,7 +716,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 
     /* Insert data into the high 32-bits.  */
     uval = uval >> 31 >> 1;
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         if (uval < 0x10000) {
             tcg_out_insn(s, RI, IIHL, ret, uval);
         } else if ((uval & 0xffff) == 0) {
@@ -810,7 +799,7 @@ static void tcg_out_ld_abs(TCGContext *s, TCGType type, TCGReg dest, void *abs)
 {
     intptr_t addr = (intptr_t)abs;
 
-    if ((facilities & FACILITY_GEN_INST_EXT) && !(addr & 1)) {
+    if ((s390_facilities & FACILITY_GEN_INST_EXT) && !(addr & 1)) {
         ptrdiff_t disp = tcg_pcrel_diff(s, abs) >> 1;
         if (disp == (int32_t)disp) {
             if (type == TCG_TYPE_I32) {
@@ -837,7 +826,7 @@ static inline void tcg_out_risbg(TCGContext *s, TCGReg dest, TCGReg src,
 
 static void tgen_ext8s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         tcg_out_insn(s, RRE, LGBR, dest, src);
         return;
     }
@@ -857,7 +846,7 @@ static void tgen_ext8s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 
 static void tgen_ext8u(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         tcg_out_insn(s, RRE, LLGCR, dest, src);
         return;
     }
@@ -877,7 +866,7 @@ static void tgen_ext8u(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 
 static void tgen_ext16s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         tcg_out_insn(s, RRE, LGHR, dest, src);
         return;
     }
@@ -897,7 +886,7 @@ static void tgen_ext16s(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 
 static void tgen_ext16u(TCGContext *s, TCGType type, TCGReg dest, TCGReg src)
 {
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         tcg_out_insn(s, RRE, LLGHR, dest, src);
         return;
     }
@@ -985,7 +974,7 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
         tgen_ext32u(s, dest, dest);
         return;
     }
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         if ((val & valid) == 0xff) {
             tgen_ext8u(s, TCG_TYPE_I64, dest, dest);
             return;
@@ -1006,7 +995,7 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
     }
 
     /* Try all 48-bit insns that can perform it in one go.  */
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         for (i = 0; i < 2; i++) {
             tcg_target_ulong mask = ~(0xffffffffull << i*32);
             if (((val | ~valid) & mask) == mask) {
@@ -1015,7 +1004,7 @@ static void tgen_andi(TCGContext *s, TCGType type, TCGReg dest, uint64_t val)
             }
         }
     }
-    if ((facilities & FACILITY_GEN_INST_EXT) && risbg_mask(val)) {
+    if ((s390_facilities & FACILITY_GEN_INST_EXT) && risbg_mask(val)) {
         tgen_andi_risbg(s, dest, dest, val);
         return;
     }
@@ -1045,7 +1034,7 @@ static void tgen64_ori(TCGContext *s, TCGReg dest, tcg_target_ulong val)
         return;
     }
 
-    if (facilities & FACILITY_EXT_IMM) {
+    if (s390_facilities & FACILITY_EXT_IMM) {
         /* Try all 32-bit insns that can perform it in one go.  */
         for (i = 0; i < 4; i++) {
             tcg_target_ulong mask = (0xffffull << i*16);
@@ -1220,7 +1209,7 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
     }
 
     cc = tgen_cmp(s, type, cond, c1, c2, c2const);
-    if (facilities & FACILITY_LOAD_ON_COND) {
+    if (s390_facilities & FACILITY_LOAD_ON_COND) {
         /* Emit: d = 0, t = 1, d = (cc ? t : d).  */
         tcg_out_movi(s, TCG_TYPE_I64, dest, 0);
         tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, 1);
@@ -1237,7 +1226,7 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
                          TCGReg c1, TCGArg c2, int c2const, TCGReg r3)
 {
     int cc;
-    if (facilities & FACILITY_LOAD_ON_COND) {
+    if (s390_facilities & FACILITY_LOAD_ON_COND) {
         cc = tgen_cmp(s, type, c, c1, c2, c2const);
         tcg_out_insn(s, RRF, LOCGR, dest, r3, cc);
     } else {
@@ -1250,11 +1239,6 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
     }
 }
 
-bool tcg_target_deposit_valid(int ofs, int len)
-{
-    return (facilities & FACILITY_GEN_INST_EXT) != 0;
-}
-
 static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
                          int ofs, int len)
 {
@@ -1332,7 +1316,7 @@ static void tgen_brcond(TCGContext *s, TCGType type, TCGCond c,
 {
     int cc;
 
-    if (facilities & FACILITY_GEN_INST_EXT) {
+    if (s390_facilities & FACILITY_GEN_INST_EXT) {
         bool is_unsigned = is_unsigned_cond(c);
         bool in_range;
         S390Opcode opc;
@@ -1519,7 +1503,7 @@ static TCGReg tcg_out_tlb_read(TCGContext* s, TCGReg addr_reg, TCGMemOp opc,
     a_off = (a_bits >= s_bits ? 0 : s_mask - a_mask);
     tlb_mask = (uint64_t)TARGET_PAGE_MASK | a_mask;
 
-    if (facilities & FACILITY_GEN_INST_EXT) {
+    if (s390_facilities & FACILITY_GEN_INST_EXT) {
         tcg_out_risbg(s, TCG_REG_R2, addr_reg,
                       64 - CPU_TLB_BITS - CPU_TLB_ENTRY_BITS,
                       63 - CPU_TLB_ENTRY_BITS,
@@ -1790,7 +1774,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                     tcg_out_insn(s, RI, AHI, a0, a2);
                     break;
                 }
-                if (facilities & FACILITY_EXT_IMM) {
+                if (s390_facilities & FACILITY_EXT_IMM) {
                     tcg_out_insn(s, RIL, AFI, a0, a2);
                     break;
                 }
@@ -1986,7 +1970,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                     tcg_out_insn(s, RI, AGHI, a0, a2);
                     break;
                 }
-                if (facilities & FACILITY_EXT_IMM) {
+                if (s390_facilities & FACILITY_EXT_IMM) {
                     if (a2 == (int32_t)a2) {
                         tcg_out_insn(s, RIL, AGFI, a0, a2);
                         break;
@@ -2175,7 +2159,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
            serialize the instruction stream.  */
         if (args[0] & TCG_MO_ST_LD) {
             tcg_out_insn(s, RR, BCR,
-                         facilities & FACILITY_FAST_BCR_SER ? 14 : 15, 0);
+                         s390_facilities & FACILITY_FAST_BCR_SER ? 14 : 15, 0);
         }
         break;
 
@@ -2304,7 +2288,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { -1 },
 };
 
-static void query_facilities(void)
+static void query_s390_facilities(void)
 {
     unsigned long hwcap = qemu_getauxval(AT_HWCAP);
 
@@ -2315,7 +2299,7 @@ static void query_facilities(void)
         register void *r1 __asm__("1");
 
         /* stfle 0(%r1) */
-        r1 = &facilities;
+        r1 = &s390_facilities;
         asm volatile(".word 0xb2b0,0x1000"
                      : "=r"(r0) : "0"(0), "r"(r1) : "memory", "cc");
     }
@@ -2323,7 +2307,7 @@ static void query_facilities(void)
 
 static void tcg_target_init(TCGContext *s)
 {
-    query_facilities();
+    query_s390_facilities();
 
     tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffff);
     tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffff);
-- 
2.7.4


* [Qemu-devel] [PATCH v2 11/18] tcg/s390: Implement field extraction opcodes
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (9 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 10/18] tcg/s390: Expose host facilities to tcg-target.h Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 12/18] tcg/s390: Support deposit into zero Richard Henderson
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/s390/tcg-target.h     |  4 ++--
 tcg/s390/tcg-target.inc.c | 11 +++++++++++
 2 files changed, 13 insertions(+), 2 deletions(-)
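
Note on the RISBG operands used by tgen_extract below: rotating left by
64 - ofs brings the field down to the low bits, and selecting bits
64 - len through 63 (big-endian numbering) with the zero flag set clears
everything else. The following is a simplified software model, not the
backend code. risbg_model() assumes msb <= lsb (no wrap-around of the
selected range) and reduces the rotate count modulo 64; both are
assumptions of this sketch, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left; n is reduced mod 64 so that n == 0 and n == 64 are safe.  */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Simplified RISBG model.  Bits are numbered 0 (MSB) to 63 (LSB).
   Selected bits msb..lsb come from the rotated source; the remaining
   bits are zeroed if z is set, otherwise kept from dest.  */
static uint64_t risbg_model(uint64_t dest, uint64_t src,
                            int msb, int lsb, int rot, int z)
{
    uint64_t mask = (~0ULL >> msb) & (~0ULL << (63 - lsb));
    uint64_t keep = z ? 0 : (dest & ~mask);
    return keep | (rotl64(src, (unsigned)rot) & mask);
}

/* Same operand computation as tgen_extract in the patch.  */
static uint64_t extract_model(uint64_t src, int ofs, int len)
{
    return risbg_model(0, src, 64 - len, 63, 64 - ofs, 1);
}

/* Reference: len bits starting at bit ofs, counted from the LSB.  */
static uint64_t extract_ref(uint64_t src, int ofs, int len)
{
    return (src >> ofs) & (~0ULL >> (64 - len));
}
```

Under these assumptions the model agrees with the shift-and-mask
reference for every valid (ofs, len) pair with 1 <= len and
ofs + len <= 64.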

diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 9220f1f..52b33d7 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -78,7 +78,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_nand_i32       0
 #define TCG_TARGET_HAS_nor_i32        0
 #define TCG_TARGET_HAS_deposit_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
-#define TCG_TARGET_HAS_extract_i32    0
+#define TCG_TARGET_HAS_extract_i32    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_sextract_i32   0
 #define TCG_TARGET_HAS_movcond_i32    1
 #define TCG_TARGET_HAS_add2_i32       1
@@ -109,7 +109,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_nand_i64       0
 #define TCG_TARGET_HAS_nor_i64        0
 #define TCG_TARGET_HAS_deposit_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
-#define TCG_TARGET_HAS_extract_i64    0
+#define TCG_TARGET_HAS_extract_i64    (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_sextract_i64   0
 #define TCG_TARGET_HAS_movcond_i64    1
 #define TCG_TARGET_HAS_add2_i64       1
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 9f51133..083c992 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -1247,6 +1247,12 @@ static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
     tcg_out_risbg(s, dest, src, msb, lsb, ofs, 0);
 }
 
+static void tgen_extract(TCGContext *s, TCGReg dest, TCGReg src,
+                         int ofs, int len)
+{
+    tcg_out_risbg(s, dest, src, 64 - len, 63, 64 - ofs, 1);
+}
+
 static void tgen_gotoi(TCGContext *s, int cc, tcg_insn_unit *dest)
 {
     ptrdiff_t off = dest - s->code_ptr;
@@ -2153,6 +2159,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     OP_32_64(deposit):
         tgen_deposit(s, args[0], args[2], args[3], args[4]);
         break;
+    OP_32_64(extract):
+        tgen_extract(s, args[0], args[1], args[2], args[3]);
+        break;
 
     case INDEX_op_mb:
         /* The host memory model is quite strong, we simply need to
@@ -2222,6 +2231,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_setcond_i32, { "r", "r", "rC" } },
     { INDEX_op_movcond_i32, { "r", "r", "rC", "r", "0" } },
     { INDEX_op_deposit_i32, { "r", "0", "r" } },
+    { INDEX_op_extract_i32, { "r", "r" } },
 
     { INDEX_op_qemu_ld_i32, { "r", "L" } },
     { INDEX_op_qemu_ld_i64, { "r", "L" } },
@@ -2283,6 +2293,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_setcond_i64, { "r", "r", "rC" } },
     { INDEX_op_movcond_i64, { "r", "r", "rC", "r", "0" } },
     { INDEX_op_deposit_i64, { "r", "0", "r" } },
+    { INDEX_op_extract_i64, { "r", "r" } },
 
     { INDEX_op_mb, { } },
     { -1 },
-- 
2.7.4


* [Qemu-devel] [PATCH v2 12/18] tcg/s390: Support deposit into zero
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (10 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 11/18] tcg/s390: Implement field extraction opcodes Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 13/18] target-alpha: Use deposit and extract ops Richard Henderson
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Since the "0" matching constraint cannot be combined with the new "Z"
(constant zero) constraint, the first input may now arrive in any
register, so we must handle that data movement by hand.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/s390/tcg-target.inc.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)
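
Note on the register fix-up in the deposit case below: once the first
input is no longer tied to the output, the backend has to copy a1 into
a0 itself, and it must first save a2 if that copy would overwrite it.
The sketch below models that sequence on a plain array standing in for
the register file. The names deposit_ref(), deposit_op_model(), R_TMP0
and NREGS are hypothetical, and the index chosen for the scratch
register is arbitrary in this model.

```c
#include <assert.h>
#include <stdint.h>

enum { NREGS = 16, R_TMP0 = 15 };  /* hypothetical model register file */

/* Reference deposit: replace len bits of base at bit ofs with val.  */
static uint64_t deposit_ref(uint64_t base, uint64_t val, int ofs, int len)
{
    uint64_t mask = (~0ULL >> (64 - len)) << ofs;
    return (base & ~mask) | ((val << ofs) & mask);
}

/* Model of the deposit case: regs[a0] = deposit(a1, a2, ofs, len).
   a1_is_zero plays the role of const_args[1] in the patch.  */
static void deposit_op_model(uint64_t regs[NREGS], int a0, int a1, int a2,
                             int ofs, int len, int a1_is_zero)
{
    if (a1_is_zero) {
        /* Deposit into zero: bits outside the field are cleared.  */
        regs[a0] = deposit_ref(0, regs[a2], ofs, len);
        return;
    }
    if (a0 != a1) {
        if (a0 == a2) {
            /* The move below would clobber the value to insert.  */
            regs[R_TMP0] = regs[a2];
            a2 = R_TMP0;
        }
        regs[a0] = regs[a1];
    }
    regs[a0] = deposit_ref(regs[a0], regs[a2], ofs, len);
}
```

The a0 == a2 case is the one that needs the scratch register: without
the save, the copy of a1 into a0 would destroy the value that is about
to be inserted.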

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 083c992..f4c510e 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -43,6 +43,7 @@
 #define TCG_CT_CONST_XORI  0x400
 #define TCG_CT_CONST_CMPI  0x800
 #define TCG_CT_CONST_ADLI  0x1000
+#define TCG_CT_CONST_ZERO  0x2000
 
 /* Several places within the instruction set 0 means "no register"
    rather than TCG_REG_R0.  */
@@ -404,6 +405,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
     case 'C':
         ct->ct |= TCG_CT_CONST_CMPI;
         break;
+    case 'Z':
+        ct->ct |= TCG_CT_CONST_ZERO;
+        break;
     default:
         return -1;
     }
@@ -543,6 +547,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
         return tcg_match_xori(type, val);
     } else if (ct & TCG_CT_CONST_CMPI) {
         return tcg_match_cmpi(type, val);
+    } else if (ct & TCG_CT_CONST_ZERO) {
+        return val == 0;
     }
 
     return 0;
@@ -1240,11 +1246,11 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
 }
 
 static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
-                         int ofs, int len)
+                         int ofs, int len, int z)
 {
     int lsb = (63 - ofs);
     int msb = lsb - (len - 1);
-    tcg_out_risbg(s, dest, src, msb, lsb, ofs, 0);
+    tcg_out_risbg(s, dest, src, msb, lsb, ofs, z);
 }
 
 static void tgen_extract(TCGContext *s, TCGReg dest, TCGReg src,
@@ -2157,8 +2163,24 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     OP_32_64(deposit):
-        tgen_deposit(s, args[0], args[2], args[3], args[4]);
+        a0 = args[0], a1 = args[1], a2 = args[2];
+        if (const_args[1]) {
+            tgen_deposit(s, a0, a2, args[3], args[4], 1);
+        } else {
+            /* Since we can't support "0Z" as a constraint, we allow a1 in
+               any register.  Fix things up as if a matching constraint.  */
+            if (a0 != a1) {
+                TCGType type = (opc == INDEX_op_deposit_i64);
+                if (a0 == a2) {
+                    tcg_out_mov(s, type, TCG_TMP0, a2);
+                    a2 = TCG_TMP0;
+                }
+                tcg_out_mov(s, type, a0, a1);
+            }
+            tgen_deposit(s, a0, a2, args[3], args[4], 0);
+        }
         break;
+
     OP_32_64(extract):
         tgen_extract(s, args[0], args[1], args[2], args[3]);
         break;
@@ -2230,7 +2252,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_brcond_i32, { "r", "rC" } },
     { INDEX_op_setcond_i32, { "r", "r", "rC" } },
     { INDEX_op_movcond_i32, { "r", "r", "rC", "r", "0" } },
-    { INDEX_op_deposit_i32, { "r", "0", "r" } },
+    { INDEX_op_deposit_i32, { "r", "rZ", "r" } },
     { INDEX_op_extract_i32, { "r", "r" } },
 
     { INDEX_op_qemu_ld_i32, { "r", "L" } },
-- 
2.7.4


* [Qemu-devel] [PATCH v2 13/18] target-alpha: Use deposit and extract ops
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (11 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 12/18] tcg/s390: Support deposit into zero Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 14/18] target-arm: Use new " Richard Henderson
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-alpha/translate.c | 67 ++++++++++++++++++++++++++++++------------------
 1 file changed, 42 insertions(+), 25 deletions(-)
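
Note on the literal case of gen_ext_l below: a right shift followed by
a ZAPNOT with a contiguous low byte mask is the same as a single field
extract, with the length clamped so that pos + len does not exceed 64.
The sketch below checks that equivalence in plain C. It assumes
byte_mask is one of 0x01, 0x03, 0x0f or 0xff, as used by the EXTxL
instructions. count_trailing_ones() is a local re-implementation
standing in for cto32, and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing one bits (local stand-in for cto32).  */
static int count_trailing_ones(uint32_t v)
{
    int n = 0;
    while (v & 1) {
        n++;
        v >>= 1;
    }
    return n;
}

/* ZAPNOT: keep byte i of val only if bit i of byte_mask is set.  */
static uint64_t zapnot(uint64_t val, uint8_t byte_mask)
{
    uint64_t mask = 0;
    for (int i = 0; i < 8; i++) {
        if (byte_mask & (1u << i)) {
            mask |= 0xffULL << (i * 8);
        }
    }
    return val & mask;
}

/* Old sequence: shift right, then zap the unwanted bytes.  */
static uint64_t ext_l_old(uint64_t va, uint8_t lit, uint8_t byte_mask)
{
    return zapnot(va >> ((lit & 7) * 8), byte_mask);
}

/* New sequence: one extract with a clamped length (byte_mask != 0).  */
static uint64_t ext_l_new(uint64_t va, uint8_t lit, uint8_t byte_mask)
{
    int pos = (lit & 7) * 8;
    int len = count_trailing_ones(byte_mask) * 8;
    if (pos + len >= 64) {
        len = 64 - pos;
    }
    return (va >> pos) & (~0ULL >> (64 - len));
}
```

For the four byte masks listed above, both functions should return the
same value for every literal, which is why the trailing gen_zapnoti call
is only needed on the non-literal path.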

diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index af717ca..3314223 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -953,7 +953,13 @@ static void gen_ext_h(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
                       uint8_t lit, uint8_t byte_mask)
 {
     if (islit) {
-        tcg_gen_shli_i64(vc, va, (64 - lit * 8) & 0x3f);
+        int pos = (64 - lit * 8) & 0x3f;
+        int len = cto32(byte_mask) * 8;
+        if (pos < len) {
+            tcg_gen_deposit_z_i64(vc, va, pos, len - pos);
+        } else {
+            tcg_gen_movi_i64(vc, 0);
+        }
     } else {
         TCGv tmp = tcg_temp_new();
         tcg_gen_shli_i64(tmp, load_gpr(ctx, rb), 3);
@@ -970,38 +976,44 @@ static void gen_ext_l(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
                       uint8_t lit, uint8_t byte_mask)
 {
     if (islit) {
-        tcg_gen_shri_i64(vc, va, (lit & 7) * 8);
+        int pos = (lit & 7) * 8;
+        int len = cto32(byte_mask) * 8;
+        if (pos + len >= 64) {
+            len = 64 - pos;
+        }
+        tcg_gen_extract_i64(vc, va, pos, len);
     } else {
         TCGv tmp = tcg_temp_new();
         tcg_gen_andi_i64(tmp, load_gpr(ctx, rb), 7);
         tcg_gen_shli_i64(tmp, tmp, 3);
         tcg_gen_shr_i64(vc, va, tmp);
         tcg_temp_free(tmp);
+        gen_zapnoti(vc, vc, byte_mask);
     }
-    gen_zapnoti(vc, vc, byte_mask);
 }
 
 /* INSWH, INSLH, INSQH */
 static void gen_ins_h(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
                       uint8_t lit, uint8_t byte_mask)
 {
-    TCGv tmp = tcg_temp_new();
-
-    /* The instruction description has us left-shift the byte mask and extract
-       bits <15:8> and apply that zap at the end.  This is equivalent to simply
-       performing the zap first and shifting afterward.  */
-    gen_zapnoti(tmp, va, byte_mask);
-
     if (islit) {
-        lit &= 7;
-        if (unlikely(lit == 0)) {
-            tcg_gen_movi_i64(vc, 0);
+        int pos = 64 - (lit & 7) * 8;
+        int len = cto32(byte_mask) * 8;
+        if (pos < len) {
+            tcg_gen_extract_i64(vc, va, pos, len - pos);
         } else {
-            tcg_gen_shri_i64(vc, tmp, 64 - lit * 8);
+            tcg_gen_movi_i64(vc, 0);
         }
     } else {
+        TCGv tmp = tcg_temp_new();
         TCGv shift = tcg_temp_new();
 
+        /* The instruction description has us left-shift the byte mask
+           and extract bits <15:8> and apply that zap at the end.  This
+           is equivalent to simply performing the zap first and shifting
+           afterward.  */
+        gen_zapnoti(tmp, va, byte_mask);
+
         /* If (B & 7) == 0, we need to shift by 64 and leave a zero.  Do this
            portably by splitting the shift into two parts: shift_count-1 and 1.
            Arrange for the -1 by using ones-complement instead of
@@ -1014,32 +1026,37 @@ static void gen_ins_h(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
         tcg_gen_shr_i64(vc, tmp, shift);
         tcg_gen_shri_i64(vc, vc, 1);
         tcg_temp_free(shift);
+        tcg_temp_free(tmp);
     }
-    tcg_temp_free(tmp);
 }
 
 /* INSBL, INSWL, INSLL, INSQL */
 static void gen_ins_l(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
                       uint8_t lit, uint8_t byte_mask)
 {
-    TCGv tmp = tcg_temp_new();
-
-    /* The instruction description has us left-shift the byte mask
-       the same number of byte slots as the data and apply the zap
-       at the end.  This is equivalent to simply performing the zap
-       first and shifting afterward.  */
-    gen_zapnoti(tmp, va, byte_mask);
-
     if (islit) {
-        tcg_gen_shli_i64(vc, tmp, (lit & 7) * 8);
+        int pos = (lit & 7) * 8;
+        int len = cto32(byte_mask) * 8;
+        if (pos + len > 64) {
+            len = 64 - pos;
+        }
+        tcg_gen_deposit_z_i64(vc, va, pos, len);
     } else {
+        TCGv tmp = tcg_temp_new();
         TCGv shift = tcg_temp_new();
+
+        /* The instruction description has us left-shift the byte mask
+           and extract bits <15:8> and apply that zap at the end.  This
+           is equivalent to simply performing the zap first and shifting
+           afterward.  */
+        gen_zapnoti(tmp, va, byte_mask);
+
         tcg_gen_andi_i64(shift, load_gpr(ctx, rb), 7);
         tcg_gen_shli_i64(shift, shift, 3);
         tcg_gen_shl_i64(vc, tmp, shift);
         tcg_temp_free(shift);
+        tcg_temp_free(tmp);
     }
-    tcg_temp_free(tmp);
 }
 
 /* MSKWH, MSKLH, MSKQH */
-- 
2.7.4


* [Qemu-devel] [PATCH v2 14/18] target-arm: Use new deposit and extract ops
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (12 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 13/18] target-alpha: Use deposit and extract ops Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 15/18] target-i386: " Richard Henderson
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Use the new primitives for UBFX and SBFX.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 79 +++++++++++++++-------------------------------
 target-arm/translate.c     | 37 +++++-----------------
 2 files changed, 34 insertions(+), 82 deletions(-)
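
Note on the ri <= si path in disas_bitfield below: with
len = si - ri + 1, the UBFM aliases (UBFX, LSR, UXTB, UXTH) reduce to an
unsigned field extract and the SBFM aliases (SBFX, ASR, SXTB, SXTH,
SXTW) to a signed one. The sketch below models only the 64-bit forms in
plain C. The function names are hypothetical, and the signed variant
assumes that right-shifting a negative int64_t is an arithmetic shift,
which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned extract of len bits starting at bit pos (from the LSB).  */
static uint64_t extract64_model(uint64_t v, int pos, int len)
{
    return (v >> pos) & (~0ULL >> (64 - len));
}

/* Signed extract: move the field to the top, then shift it back down
   arithmetically (assumes >> on int64_t preserves the sign).  */
static int64_t sextract64_model(uint64_t v, int pos, int len)
{
    return (int64_t)(v << (64 - len - pos)) >> (64 - len);
}

/* UBFM with ri <= si, 64-bit form.  */
static uint64_t ubfm_simple(uint64_t v, int ri, int si)
{
    assert(ri <= si);
    return extract64_model(v, ri, si - ri + 1);
}

/* SBFM with ri <= si, 64-bit form.  */
static int64_t sbfm_simple(uint64_t v, int ri, int si)
{
    assert(ri <= si);
    return sextract64_model(v, ri, si - ri + 1);
}
```

For example, ubfm_simple(v, 0, 7) corresponds to UXTB,
ubfm_simple(v, n, 63) to LSR by n, sbfm_simple(v, 0, 7) to SXTB and
sbfm_simple(v, n, 63) to ASR by n.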

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 2d5c1a2..8041715 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -3157,67 +3157,40 @@ static void disas_bitfield(DisasContext *s, uint32_t insn)
        low 32-bits anyway.  */
     tcg_tmp = read_cpu_reg(s, rn, 1);
 
-    /* Recognize the common aliases.  */
-    if (opc == 0) { /* SBFM */
-        if (ri == 0) {
-            if (si == 7) { /* SXTB */
-                tcg_gen_ext8s_i64(tcg_rd, tcg_tmp);
-                goto done;
-            } else if (si == 15) { /* SXTH */
-                tcg_gen_ext16s_i64(tcg_rd, tcg_tmp);
-                goto done;
-            } else if (si == 31) { /* SXTW */
-                tcg_gen_ext32s_i64(tcg_rd, tcg_tmp);
-                goto done;
-            }
-        }
-        if (si == 63 || (si == 31 && ri <= si)) { /* ASR */
-            if (si == 31) {
-                tcg_gen_ext32s_i64(tcg_tmp, tcg_tmp);
-            }
-            tcg_gen_sari_i64(tcg_rd, tcg_tmp, ri);
+    /* Recognize simple(r) extractions.  */
+    if (ri <= si) {
+        int len = (si - ri) + 1;
+        if (opc == 0) { /* SBFM: ASR, SBFX, SXTB, SXTH, SXTW */
+            tcg_gen_sextract_i64(tcg_rd, tcg_tmp, ri, len);
             goto done;
-        }
-    } else if (opc == 2) { /* UBFM */
-        if (ri == 0) { /* UXTB, UXTH, plus non-canonical AND */
-            tcg_gen_andi_i64(tcg_rd, tcg_tmp, bitmask64(si + 1));
-            return;
-        }
-        if (si == 63 || (si == 31 && ri <= si)) { /* LSR */
-            if (si == 31) {
-                tcg_gen_ext32u_i64(tcg_tmp, tcg_tmp);
-            }
-            tcg_gen_shri_i64(tcg_rd, tcg_tmp, ri);
+        } else if (opc == 2) { /* UBFM: UBFX, LSR, UXTB, UXTH */
+            tcg_gen_extract_i64(tcg_rd, tcg_tmp, ri, len);
             return;
         }
-        if (si + 1 == ri && si != bitsize - 1) { /* LSL */
-            int shift = bitsize - 1 - si;
-            tcg_gen_shli_i64(tcg_rd, tcg_tmp, shift);
-            goto done;
-        }
     }
 
-    if (opc != 1) { /* SBFM or UBFM */
-        tcg_gen_movi_i64(tcg_rd, 0);
-    }
+    /* Do the bit move operation.  Note that above we handled ri <= si,
+       Wd<s-r:0> = Wn<s:r>, via tcg_gen_*extract_i64.  Now we handle
+       the ri > si case, Wd<32+s-r,32-r> = Wn<s:0>, via deposit.  */
+    pos = bitsize - ri;
+    len = si + 1;
 
-    /* do the bit move operation */
-    if (si >= ri) {
-        /* Wd<s-r:0> = Wn<s:r> */
-        tcg_gen_shri_i64(tcg_tmp, tcg_tmp, ri);
-        pos = 0;
-        len = (si - ri) + 1;
-    } else {
-        /* Wd<32+s-r,32-r> = Wn<s:0> */
-        pos = bitsize - ri;
-        len = si + 1;
+    if (opc == 0 && len < ri) {
+        /* SBFM: sign extend the destination field from len to fill
+           the balance of the word.  Let the deposit below insert all
+           of those sign bits.  */
+        tcg_gen_sextract_i64(tcg_tmp, tcg_tmp, 0, len);
+        len = ri;
     }
 
-    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, pos, len);
-
-    if (opc == 0) { /* SBFM - sign extend the destination field */
-        tcg_gen_shli_i64(tcg_rd, tcg_rd, 64 - (pos + len));
-        tcg_gen_sari_i64(tcg_rd, tcg_rd, 64 - (pos + len));
+    if (opc == 1) { /* BFM */
+        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, pos, len);
+    } else {
+        /* SBFM or UBFM: We start with zero, and we haven't modified
+           any bits outside bitsize, therefore the zero-extension
+           below is unneeded.  */
+        tcg_gen_deposit_z_i64(tcg_rd, tcg_tmp, pos, len);
+        return;
     }
 
  done:
diff --git a/target-arm/translate.c b/target-arm/translate.c
index aaf6135..37ad61d 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -297,29 +297,6 @@ static void gen_revsh(TCGv_i32 var)
     tcg_gen_ext16s_i32(var, var);
 }
 
-/* Unsigned bitfield extract.  */
-static void gen_ubfx(TCGv_i32 var, int shift, uint32_t mask)
-{
-    if (shift)
-        tcg_gen_shri_i32(var, var, shift);
-    tcg_gen_andi_i32(var, var, mask);
-}
-
-/* Signed bitfield extract.  */
-static void gen_sbfx(TCGv_i32 var, int shift, int width)
-{
-    uint32_t signbit;
-
-    if (shift)
-        tcg_gen_sari_i32(var, var, shift);
-    if (shift + width < 32) {
-        signbit = 1u << (width - 1);
-        tcg_gen_andi_i32(var, var, (1u << width) - 1);
-        tcg_gen_xori_i32(var, var, signbit);
-        tcg_gen_subi_i32(var, var, signbit);
-    }
-}
-
 /* Return (b << 32) + a. Mark inputs as dead */
 static TCGv_i64 gen_addq_msw(TCGv_i64 a, TCGv_i32 b)
 {
@@ -9234,9 +9211,9 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                             goto illegal_op;
                         if (i < 32) {
                             if (op1 & 0x20) {
-                                gen_ubfx(tmp, shift, (1u << i) - 1);
+                                tcg_gen_extract_i32(tmp, tmp, shift, i);
                             } else {
-                                gen_sbfx(tmp, shift, i);
+                                tcg_gen_sextract_i32(tmp, tmp, shift, i);
                             }
                         }
                         store_reg(s, rd, tmp);
@@ -10551,15 +10528,17 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
                         imm++;
                         if (shift + imm > 32)
                             goto illegal_op;
-                        if (imm < 32)
-                            gen_sbfx(tmp, shift, imm);
+                        if (imm < 32) {
+                            tcg_gen_sextract_i32(tmp, tmp, shift, imm);
+                        }
                         break;
                     case 6: /* Unsigned bitfield extract.  */
                         imm++;
                         if (shift + imm > 32)
                             goto illegal_op;
-                        if (imm < 32)
-                            gen_ubfx(tmp, shift, (1u << imm) - 1);
+                        if (imm < 32) {
+                            tcg_gen_extract_i32(tmp, tmp, shift, imm);
+                        }
                         break;
                     case 3: /* Bitfield insert/clear.  */
                         if (imm < shift)
-- 
2.7.4
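
The comment in disas_bfm above splits the bitfield move into an extract
case (ri <= si) and a deposit case (ri > si).  A minimal value-level
sketch of that split for 64-bit UBFM follows.  All of the names here
(ones64, model_extract64, model_deposit_z64, ubfm64_ref, ubfm64_split)
are hypothetical and only model the results; they are not the TCG
expanders.  ubfm64_ref is one reading of the architectural
rotate-and-mask definition and is an assumption, not something taken
from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low n bits set, valid for 1 <= n <= 64. */
static uint64_t ones64(unsigned n)
{
    assert(n >= 1 && n <= 64);
    return n == 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/* Rotate right by r (mod 64) without shifting by the full width. */
static uint64_t ror64(uint64_t x, unsigned r)
{
    r &= 63;
    return r ? (x >> r) | (x << (64 - r)) : x;
}

/* Model of an unsigned extract: bits [ofs, ofs+len) moved to bit 0. */
static uint64_t model_extract64(uint64_t x, unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 64);
    return (x >> ofs) & ones64(len);
}

/* Model of deposit into zero: low len bits of x placed at ofs. */
static uint64_t model_deposit_z64(uint64_t x, unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 64);
    return (x & ones64(len)) << ofs;
}

/* Assumed reference: rotate right by r, then apply both masks. */
static uint64_t ubfm64_ref(uint64_t src, unsigned r, unsigned s)
{
    uint64_t wmask = ror64(ones64(s + 1), r);
    uint64_t tmask = ones64(((s - r) & 63) + 1);
    return ror64(src, r) & wmask & tmask;
}

/* The split described in the patch comment. */
static uint64_t ubfm64_split(uint64_t src, unsigned r, unsigned s)
{
    if (r <= s) {
        /* Wd<s-r:0> = Wn<s:r> */
        return model_extract64(src, r, s - r + 1);
    }
    /* Wd<64+s-r:64-r> = Wn<s:0>, all other bits zero */
    return model_deposit_z64(src, 64 - r, s + 1);
}

/* Count disagreements over every (r, s) pair for a few inputs. */
static int ubfm64_mismatches(void)
{
    static const uint64_t samples[] = {
        0, 1, UINT64_C(0x8000000000000000),
        UINT64_C(0x0123456789abcdef), ~UINT64_C(0)
    };
    int bad = 0;

    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        for (unsigned r = 0; r < 64; r++) {
            for (unsigned s = 0; s < 64; s++) {
                bad += ubfm64_ref(samples[i], r, s)
                       != ubfm64_split(samples[i], r, s);
            }
        }
    }
    return bad;
}
```

Calling ubfm64_mismatches() from a small test program is one way to
check that the two formulations agree for the sampled inputs.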

* [Qemu-devel] [PATCH v2 15/18] target-i386: Use new deposit and extract ops
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (13 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 14/18] target-arm: Use new " Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 16/18] target-mips: Use the new extract op Richard Henderson
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Convert a couple of places where it was easy to identify a right-shift
followed by a zero-extend or and-with-immediate, and the obvious
sign-extract from a high byte register.
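
As a rough illustration of what the replaced sequences compute, here is
a value-level sketch.  The helpers model_extract and model_sextract are
hypothetical stand-ins that only model the results of the TCG ops; the
sign extension uses the xor/subtract idiom so that the sketch does not
rely on signed right shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low len bits set, valid for 1 <= len <= 64. */
static uint64_t low_ones64(unsigned len)
{
    assert(len >= 1 && len <= 64);
    return len == 64 ? ~UINT64_C(0) : (UINT64_C(1) << len) - 1;
}

/* Right-shift then mask: the result of an unsigned field extract. */
static uint64_t model_extract(uint64_t x, unsigned ofs, unsigned len)
{
    assert(ofs + len <= 64);
    return (x >> ofs) & low_ones64(len);
}

/* Same field, with its top bit replicated through bit 63. */
static uint64_t model_sextract(uint64_t x, unsigned ofs, unsigned len)
{
    uint64_t v = model_extract(x, ofs, len);
    uint64_t sign = UINT64_C(1) << (len - 1);

    return (v ^ sign) - sign;
}
```

With a register value of 0x12348056, model_extract(v, 8, 8) models a
read of the high byte register, and model_sextract(v, 8, 8) models the
sign-extending move from it; a single flag bit is model_extract(v, n, 1)
for bit position n.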

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 2f60e9c..4a3014c 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -383,8 +383,7 @@ static void gen_op_mov_reg_v(TCGMemOp ot, int reg, TCGv t0)
 static inline void gen_op_mov_v_reg(TCGMemOp ot, TCGv t0, int reg)
 {
     if (ot == MO_8 && byte_reg_is_xH(reg)) {
-        tcg_gen_shri_tl(t0, cpu_regs[reg - 4], 8);
-        tcg_gen_ext8u_tl(t0, t0);
+        tcg_gen_extract_tl(t0, cpu_regs[reg - 4], 8, 8);
     } else {
         tcg_gen_mov_tl(t0, cpu_regs[reg]);
     }
@@ -3715,8 +3714,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 
                     /* Extract the LEN into a mask.  Lengths larger than
                        operand size get all ones.  */
-                    tcg_gen_shri_tl(cpu_A0, cpu_regs[s->vex_v], 8);
-                    tcg_gen_ext8u_tl(cpu_A0, cpu_A0);
+                    tcg_gen_extract_tl(cpu_A0, cpu_regs[s->vex_v], 8, 8);
                     tcg_gen_movcond_tl(TCG_COND_LEU, cpu_A0, cpu_A0, bound,
                                        cpu_A0, bound);
                     tcg_temp_free(bound);
@@ -3867,9 +3865,8 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                             gen_compute_eflags(s);
                         }
                         carry_in = cpu_tmp0;
-                        tcg_gen_shri_tl(carry_in, cpu_cc_src,
-                                        ctz32(b == 0x1f6 ? CC_C : CC_O));
-                        tcg_gen_andi_tl(carry_in, carry_in, 1);
+                        tcg_gen_extract_tl(carry_in, cpu_cc_src,
+                                           ctz32(b == 0x1f6 ? CC_C : CC_O), 1);
                     }
 
                     switch (ot) {
@@ -5340,21 +5337,25 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             rm = (modrm & 7) | REX_B(s);
 
             if (mod == 3) {
-                gen_op_mov_v_reg(ot, cpu_T0, rm);
-                switch (s_ot) {
-                case MO_UB:
-                    tcg_gen_ext8u_tl(cpu_T0, cpu_T0);
-                    break;
-                case MO_SB:
-                    tcg_gen_ext8s_tl(cpu_T0, cpu_T0);
-                    break;
-                case MO_UW:
-                    tcg_gen_ext16u_tl(cpu_T0, cpu_T0);
-                    break;
-                default:
-                case MO_SW:
-                    tcg_gen_ext16s_tl(cpu_T0, cpu_T0);
-                    break;
+                if (s_ot == MO_SB && byte_reg_is_xH(rm)) {
+                    tcg_gen_sextract_tl(cpu_T0, cpu_regs[rm - 4], 8, 8);
+                } else {
+                    gen_op_mov_v_reg(ot, cpu_T0, rm);
+                    switch (s_ot) {
+                    case MO_UB:
+                        tcg_gen_ext8u_tl(cpu_T0, cpu_T0);
+                        break;
+                    case MO_SB:
+                        tcg_gen_ext8s_tl(cpu_T0, cpu_T0);
+                        break;
+                    case MO_UW:
+                        tcg_gen_ext16u_tl(cpu_T0, cpu_T0);
+                        break;
+                    default:
+                    case MO_SW:
+                        tcg_gen_ext16s_tl(cpu_T0, cpu_T0);
+                        break;
+                    }
                 }
                 gen_op_mov_reg_v(d_ot, reg, cpu_T0);
             } else {
-- 
2.7.4

* [Qemu-devel] [PATCH v2 16/18] target-mips: Use the new extract op
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (14 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 15/18] target-i386: " Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-27 12:43   ` Yongbok Kim
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 17/18] target-ppc: Use the new deposit and extract ops Richard Henderson
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Use extract for EXT and DEXT.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-mips/translate.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index d8dde7a..cf79aa4 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -4484,11 +4484,12 @@ static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
         if (lsb + msb > 31) {
             goto fail;
         }
-        tcg_gen_shri_tl(t0, t1, lsb);
         if (msb != 31) {
-            tcg_gen_andi_tl(t0, t0, (1U << (msb + 1)) - 1);
+            tcg_gen_extract_tl(t0, t1, lsb, msb + 1);
         } else {
-            tcg_gen_ext32s_tl(t0, t0);
+            /* The two checks together imply that lsb == 0,
+               so this is a simple sign-extension.  */
+            tcg_gen_ext32s_tl(t0, t1);
         }
         break;
 #if defined(TARGET_MIPS64)
@@ -4503,10 +4504,7 @@ static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
         if (lsb + msb > 63) {
             goto fail;
         }
-        tcg_gen_shri_tl(t0, t1, lsb);
-        if (msb != 63) {
-            tcg_gen_andi_tl(t0, t0, (1ULL << (msb + 1)) - 1);
-        }
+        tcg_gen_extract_tl(t0, t1, lsb, msb + 1);
         break;
 #endif
     case OPC_INS:
-- 
2.7.4

* [Qemu-devel] [PATCH v2 17/18] target-ppc: Use the new deposit and extract ops
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (15 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 16/18] target-mips: Use the new extract op Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-27  2:09   ` David Gibson
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 18/18] target-s390x: " Richard Henderson
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Use the new primitives for RLWINM and RLDICL.
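
A minimal sketch of why the two fast paths are safe, restricted to the
32-bit rotate-and-mask and ignoring the upper half of a 64-bit
register.  Everything here (rotl32, ppc_mask32, rlwinm_ref, rlwinm_fast,
rlwinm_mismatches) is a hypothetical model written for illustration, not
code from target-ppc.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31;
    return sh ? (x << sh) | (x >> (32 - sh)) : x;
}

/* 32-bit MASK(mb, me): bit 0 is the MSB, and mb > me wraps around. */
static uint32_t ppc_mask32(unsigned mb, unsigned me)
{
    uint32_t from_mb = (uint32_t)0xffffffffu >> mb;
    uint32_t upto_me = (uint32_t)0xffffffffu << (31 - me);

    return mb <= me ? (from_mb & upto_me) : (from_mb | upto_me);
}

/* Generic definition: rotate left, then mask. */
static uint32_t rlwinm_ref(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask32(mb, me);
}

static uint32_t low_ones32(int len)
{
    assert(len >= 1 && len <= 32);
    return len == 32 ? 0xffffffffu : (1u << len) - 1;
}

/* Mirrors the two conditions in the patch; returns 0 if neither applies. */
static int rlwinm_fast(uint32_t rs, int sh, int mb, int me, uint32_t *out)
{
    int len = me - mb + 1;
    int rsh = (32 - sh) & 31;

    if (sh != 0 && len > 0 && me == 31 - sh) {
        *out = (rs & low_ones32(len)) << sh;    /* deposit into zero */
        return 1;
    }
    if (me == 31 && rsh + len <= 32) {
        *out = (rs >> rsh) & low_ones32(len);   /* unsigned extract */
        return 1;
    }
    return 0;
}

/* Compare the fast paths with the generic form for all field values. */
static int rlwinm_mismatches(void)
{
    static const uint32_t samples[] = {
        0, 1, 0x80000000u, 0x12345678u, 0xdeadbeefu, 0xffffffffu
    };
    int bad = 0;

    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        for (int sh = 0; sh < 32; sh++) {
            for (int mb = 0; mb < 32; mb++) {
                for (int me = 0; me < 32; me++) {
                    uint32_t fast;

                    if (rlwinm_fast(samples[i], sh, mb, me, &fast)) {
                        bad += fast != rlwinm_ref(samples[i], sh, mb, me);
                    }
                }
            }
        }
    }
    return bad;
}
```

For example, a shift left by 4 is sh=4, mb=0, me=27, which takes the
deposit-into-zero path with len=28.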

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/translate.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index bfc1301..7b12303 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1970,16 +1970,16 @@ static void gen_rlwinm(DisasContext *ctx)
 {
     TCGv t_ra = cpu_gpr[rA(ctx->opcode)];
     TCGv t_rs = cpu_gpr[rS(ctx->opcode)];
-    uint32_t sh = SH(ctx->opcode);
-    uint32_t mb = MB(ctx->opcode);
-    uint32_t me = ME(ctx->opcode);
-
-    if (mb == 0 && me == (31 - sh)) {
-        tcg_gen_shli_tl(t_ra, t_rs, sh);
-        tcg_gen_ext32u_tl(t_ra, t_ra);
-    } else if (sh != 0 && me == 31 && sh == (32 - mb)) {
-        tcg_gen_ext32u_tl(t_ra, t_rs);
-        tcg_gen_shri_tl(t_ra, t_ra, mb);
+    int sh = SH(ctx->opcode);
+    int mb = MB(ctx->opcode);
+    int me = ME(ctx->opcode);
+    int len = me - mb + 1;
+    int rsh = (32 - sh) & 31;
+
+    if (sh != 0 && len > 0 && me == (31 - sh)) {
+        tcg_gen_deposit_z_tl(t_ra, t_rs, sh, len);
+    } else if (me == 31 && rsh + len <= 32) {
+        tcg_gen_extract_tl(t_ra, t_rs, rsh, len);
     } else {
         target_ulong mask;
 #if defined(TARGET_PPC64)
@@ -1987,8 +1987,9 @@ static void gen_rlwinm(DisasContext *ctx)
         me += 32;
 #endif
         mask = MASK(mb, me);
-
-        if (mask <= 0xffffffffu) {
+        if (sh == 0) {
+            tcg_gen_andi_tl(t_ra, t_rs, mask);
+        } else if (mask <= 0xffffffffu) {
             TCGv_i32 t0 = tcg_temp_new_i32();
             tcg_gen_trunc_tl_i32(t0, t_rs);
             tcg_gen_rotli_i32(t0, t0, sh);
@@ -2091,11 +2092,13 @@ static void gen_rldinm(DisasContext *ctx, int mb, int me, int sh)
 {
     TCGv t_ra = cpu_gpr[rA(ctx->opcode)];
     TCGv t_rs = cpu_gpr[rS(ctx->opcode)];
+    int len = me - mb + 1;
+    int rsh = (64 - sh) & 63;
 
-    if (sh != 0 && mb == 0 && me == (63 - sh)) {
-        tcg_gen_shli_tl(t_ra, t_rs, sh);
-    } else if (sh != 0 && me == 63 && sh == (64 - mb)) {
-        tcg_gen_shri_tl(t_ra, t_rs, mb);
+    if (sh != 0 && len > 0 && me == (63 - sh)) {
+        tcg_gen_deposit_z_tl(t_ra, t_rs, sh, len);
+    } else if (me == 63 && rsh + len <= 64) {
+        tcg_gen_extract_tl(t_ra, t_rs, rsh, len);
     } else {
         tcg_gen_rotli_tl(t_ra, t_rs, sh);
         tcg_gen_andi_tl(t_ra, t_ra, MASK(mb, me));
-- 
2.7.4

* [Qemu-devel] [PATCH v2 18/18] target-s390x: Use the new deposit and extract ops
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (16 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 17/18] target-ppc: Use the new deposit and extract ops Richard Henderson
@ 2016-10-18 15:10 ` Richard Henderson
  2016-10-18 16:15 ` [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives no-reply
  2016-10-24 19:04 ` Richard Henderson
  19 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 15:10 UTC (permalink / raw)
  To: qemu-devel

Use the new primitives for RISBG.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-s390x/translate.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/target-s390x/translate.c b/target-s390x/translate.c
index 02bc705..6cebb7e 100644
--- a/target-s390x/translate.c
+++ b/target-s390x/translate.c
@@ -3134,20 +3134,26 @@ static ExitStatus op_risbg(DisasContext *s, DisasOps *o)
         }
     }
 
-    /* In some cases we can implement this with deposit, which can be more
-       efficient on some hosts.  */
-    if (~mask == imask && i3 <= i4) {
-        if (s->fields->op2 == 0x5d) {
-            i3 += 32, i4 += 32;
-        }
+    len = i4 - i3 + 1;
+    pos = 63 - i4;
+    rot = i5 & 63;
+    if (s->fields->op2 == 0x5d) {
+        pos += 32;
+    }
+
+    /* In some cases we can implement this with extract.  */
+    if (imask == 0 && pos == 0 && len > 0 && rot + len <= 64) {
+        tcg_gen_extract_i64(o->out, o->in2, rot, len);
+        return NO_EXIT;
+    }
+
+    /* In some cases we can implement this with deposit.  */
+    if (len > 0 && (imask == 0 || ~mask == imask)) {
         /* Note that we rotate the bits to be inserted to the lsb, not to
            the position as described in the PoO.  */
-        len = i4 - i3 + 1;
-        pos = 63 - i4;
-        rot = (i5 - pos) & 63;
+        rot = (rot - pos) & 63;
     } else {
-        pos = len = -1;
-        rot = i5 & 63;
+        pos = -1;
     }
 
     /* Rotate the input as necessary.  */
@@ -3155,7 +3161,11 @@ static ExitStatus op_risbg(DisasContext *s, DisasOps *o)
 
     /* Insert the selected bits into the output.  */
     if (pos >= 0) {
-        tcg_gen_deposit_i64(o->out, o->out, o->in2, pos, len);
+        if (imask == 0) {
+            tcg_gen_deposit_z_i64(o->out, o->in2, pos, len);
+        } else {
+            tcg_gen_deposit_i64(o->out, o->out, o->in2, pos, len);
+        }
     } else if (imask == 0) {
         tcg_gen_andi_i64(o->out, o->in2, mask);
     } else {
-- 
2.7.4

* Re: [Qemu-devel] [PATCH v2 04/18] tcg/aarch64: Implement field extraction opcodes
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 04/18] tcg/aarch64: Implement field extraction opcodes Richard Henderson
@ 2016-10-18 15:33   ` Claudio Fontana
  2016-10-18 16:11     ` Richard Henderson
  0 siblings, 1 reply; 35+ messages in thread
From: Claudio Fontana @ 2016-10-18 15:33 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 18.10.2016 17:10, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.h     |  8 ++++----
>  tcg/aarch64/tcg-target.inc.c | 14 ++++++++++++++
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 410c31b..4a74bd8 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -63,8 +63,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      1
> -#define TCG_TARGET_HAS_extract_i32      0
> -#define TCG_TARGET_HAS_sextract_i32     0
> +#define TCG_TARGET_HAS_extract_i32      1
> +#define TCG_TARGET_HAS_sextract_i32     1
>  #define TCG_TARGET_HAS_movcond_i32      1
>  #define TCG_TARGET_HAS_add2_i32         1
>  #define TCG_TARGET_HAS_sub2_i32         1
> @@ -95,8 +95,8 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      1
> -#define TCG_TARGET_HAS_extract_i64      0
> -#define TCG_TARGET_HAS_sextract_i64     0
> +#define TCG_TARGET_HAS_extract_i64      1
> +#define TCG_TARGET_HAS_sextract_i64     1
>  #define TCG_TARGET_HAS_movcond_i64      1
>  #define TCG_TARGET_HAS_add2_i64         1
>  #define TCG_TARGET_HAS_sub2_i64         1
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 1939d35..c0e9890 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -1640,6 +1640,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          tcg_out_dep(s, ext, a0, REG0(2), args[3], args[4]);
>          break;
>  
> +    case INDEX_op_extract_i64:
> +    case INDEX_op_extract_i32:
> +        tcg_out_ubfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
> +        break;
> +
> +    case INDEX_op_sextract_i64:
> +    case INDEX_op_sextract_i32:
> +        tcg_out_sbfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
> +        break;
> +

ah, probably missing something obvious.. not args[3]?



>      case INDEX_op_add2_i32:
>          tcg_out_addsub2(s, TCG_TYPE_I32, a0, a1, REG0(2), REG0(3),
>                          (int32_t)args[4], args[5], const_args[4],
> @@ -1785,6 +1795,10 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
>  
>      { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
>      { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
> +    { INDEX_op_extract_i32, { "r", "r" } },
> +    { INDEX_op_extract_i64, { "r", "r" } },
> +    { INDEX_op_sextract_i32, { "r", "r" } },
> +    { INDEX_op_sextract_i64, { "r", "r" } },
>  
>      { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rA", "rMZ" } },
>      { INDEX_op_add2_i64, { "r", "r", "rZ", "rZ", "rA", "rMZ" } },
> 

* Re: [Qemu-devel] [PATCH v2 04/18] tcg/aarch64: Implement field extraction opcodes
  2016-10-18 15:33   ` Claudio Fontana
@ 2016-10-18 16:11     ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-18 16:11 UTC (permalink / raw)
  To: Claudio Fontana, qemu-devel

On 10/18/2016 08:33 AM, Claudio Fontana wrote:
>> > +    case INDEX_op_extract_i64:
>> > +    case INDEX_op_extract_i32:
>> > +        tcg_out_ubfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
>> > +        break;
>> > +
>> > +    case INDEX_op_sextract_i64:
>> > +    case INDEX_op_sextract_i32:
>> > +        tcg_out_sbfm(s, ext, a0, a1, a2, a2 + args[3] - 1);
>> > +        break;
>> > +
> ah, probably missing something obvious.. not args[3]?

No, the r and s fields encode lsb and msb of the field.

I had the same wrong memory of the encoding, which is fair considering the
aarch32 instruction *does* use lsb and size-1.

The failure was quite obvious even in the disassembly once I actually got a 
machine checked out to test it.


r~
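
To make the encoding difference discussed above concrete, here is a
small sketch of how an (ofs, len) pair maps onto the immediate fields.
The struct and function names are hypothetical and exist only for
illustration; they are not the helpers in tcg/aarch64 or tcg/arm.

```c
#include <assert.h>

/* AArch64 UBFM/SBFM extract form: immr is the lsb, imms is the msb. */
struct a64_bfm_fields {
    unsigned immr;
    unsigned imms;
};

/* AArch32 UBFX/SBFX: the fields are the lsb and the width minus one. */
struct a32_bfx_fields {
    unsigned lsb;
    unsigned widthm1;
};

static struct a64_bfm_fields a64_extract_fields(unsigned ofs, unsigned len)
{
    struct a64_bfm_fields f;

    assert(len >= 1 && ofs + len <= 64);
    f.immr = ofs;
    f.imms = ofs + len - 1;     /* msb of the field, not len - 1 */
    return f;
}

static struct a32_bfx_fields a32_extract_fields(unsigned ofs, unsigned len)
{
    struct a32_bfx_fields f;

    assert(len >= 1 && ofs + len <= 32);
    f.lsb = ofs;
    f.widthm1 = len - 1;
    return f;
}
```

For an 8-bit field at offset 8, the AArch64 fields come out as immr=8,
imms=15, while the AArch32 fields are lsb=8, widthm1=7; this is why the
aarch64 backend passes a2 + args[3] - 1 rather than args[3].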

* Re: [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (17 preceding siblings ...)
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 18/18] target-s390x: " Richard Henderson
@ 2016-10-18 16:15 ` no-reply
  2016-10-24 19:04 ` Richard Henderson
  19 siblings, 0 replies; 35+ messages in thread
From: no-reply @ 2016-10-18 16:15 UTC (permalink / raw)
  To: rth; +Cc: famz, qemu-devel

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1476803431-7208-1-git-send-email-rth@twiddle.net
Subject: [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
c797e38 target-s390x: Use the new deposit and extract ops
8665678 target-ppc: Use the new deposit and extract ops
3ecbf06 target-mips: Use the new extract op
2729c07 target-i386: Use new deposit and extract ops
07d7282 target-arm: Use new deposit and extract ops
c2a8f3c target-alpha: Use deposit and extract ops
1e56779 tcg/s390: Support deposit into zero
21e361f tcg/s390: Implement field extraction opcodes
23f1f91 tcg/s390: Expose host facilities to tcg-target.h
a2378d7 tcg/ppc: Implement field extraction opcodes
d2df385 tcg/mips: Implement field extraction opcodes
ad6690d tcg/i386: Implement field extraction opcodes
84cdccb tcg/arm: Implement field extraction opcodes
2337625 tcg/arm: Move isa detection to tcg-target.h
1e01570 tcg/aarch64: Implement field extraction opcodes
f6f72e0 tcg: Add deposit_z expander
a0dcab5 tcg: Minor adjustments to deposit expanders
3359201 tcg: Add field extraction primitives

=== OUTPUT BEGIN ===
Checking PATCH 1/18: tcg: Add field extraction primitives...
ERROR: spaces required around that ':' (ctx:VxE)
#105: FILE: tcg/optimize.c:881:
+        CASE_OP_32_64(extract):
                               ^

ERROR: spaces required around that ':' (ctx:VxE)
#111: FILE: tcg/optimize.c:887:
+        CASE_OP_32_64(sextract):
                                ^

ERROR: spaces required around that ':' (ctx:VxE)
#125: FILE: tcg/optimize.c:1064:
+        CASE_OP_32_64(extract):
                               ^

ERROR: spaces required around that ':' (ctx:VxE)
#133: FILE: tcg/optimize.c:1072:
+        CASE_OP_32_64(sextract):
                                ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#237: FILE: tcg/tcg-op.c:592:
+        && TCG_TARGET_extract_i32_valid(ofs, len)) {
         ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#300: FILE: tcg/tcg-op.c:655:
+        && TCG_TARGET_extract_i32_valid(ofs, len)) {
         ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#386: FILE: tcg/tcg-op.c:1782:
+        && TCG_TARGET_extract_i64_valid(ofs, len)) {
         ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#485: FILE: tcg/tcg-op.c:1881:
+        && TCG_TARGET_extract_i64_valid(ofs, len)) {
         ^

total: 8 errors, 0 warnings, 563 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/18: tcg: Minor adjustments to deposit expanders...
Checking PATCH 3/18: tcg: Add deposit_z expander...
ERROR: space prohibited after that '&&' (ctx:ExW)
#33: FILE: tcg/tcg-op.c:587:
+               && TCG_TARGET_deposit_i32_valid(ofs, len)) {
                ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#98: FILE: tcg/tcg-op.c:1819:
+               && TCG_TARGET_deposit_i64_valid(ofs, len)) {
                ^

total: 2 errors, 0 warnings, 185 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 4/18: tcg/aarch64: Implement field extraction opcodes...
Checking PATCH 5/18: tcg/arm: Move isa detection to tcg-target.h...
WARNING: architecture specific defines should be avoided
#18: FILE: tcg/arm/tcg-target.h:30:
+#ifndef __ARM_ARCH

WARNING: architecture specific defines should be avoided
#19: FILE: tcg/arm/tcg-target.h:31:
+# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \

WARNING: architecture specific defines should be avoided
#38: FILE: tcg/arm/tcg-target.h:50:
+#if defined(__ARM_ARCH_5T__) \

total: 0 errors, 3 warnings, 107 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 6/18: tcg/arm: Implement field extraction opcodes...
Checking PATCH 7/18: tcg/i386: Implement field extraction opcodes...
Checking PATCH 8/18: tcg/mips: Implement field extraction opcodes...
Checking PATCH 9/18: tcg/ppc: Implement field extraction opcodes...
Checking PATCH 10/18: tcg/s390: Expose host facilities to tcg-target.h...
ERROR: code indent should never use tabs
#51: FILE: tcg/s390/tcg-target.h:55:
+#define FACILITY_ZARCH_ACTIVE^I(1ULL << (63 - 2))$

ERROR: code indent should never use tabs
#52: FILE: tcg/s390/tcg-target.h:56:
+#define FACILITY_LONG_DISP^I(1ULL << (63 - 18))$

ERROR: code indent should never use tabs
#53: FILE: tcg/s390/tcg-target.h:57:
+#define FACILITY_EXT_IMM^I(1ULL << (63 - 21))$

ERROR: code indent should never use tabs
#54: FILE: tcg/s390/tcg-target.h:58:
+#define FACILITY_GEN_INST_EXT^I(1ULL << (63 - 34))$

total: 4 errors, 0 warnings, 388 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 11/18: tcg/s390: Implement field extraction opcodes...
ERROR: spaces required around that ':' (ctx:VxE)
#52: FILE: tcg/s390/tcg-target.inc.c:2162:
+    OP_32_64(extract):
                      ^

total: 1 errors, 0 warnings, 51 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 12/18: tcg/s390: Support deposit into zero...
Checking PATCH 13/18: target-alpha: Use deposit and extract ops...
Checking PATCH 14/18: target-arm: Use new deposit and extract ops...
Checking PATCH 15/18: target-i386: Use new deposit and extract ops...
Checking PATCH 16/18: target-mips: Use the new extract op...
Checking PATCH 17/18: target-ppc: Use the new deposit and extract ops...
Checking PATCH 18/18: target-s390x: Use the new deposit and extract ops...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

* Re: [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives
  2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
                   ` (18 preceding siblings ...)
  2016-10-18 16:15 ` [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives no-reply
@ 2016-10-24 19:04 ` Richard Henderson
  2016-10-25 11:48   ` Eduardo Habkost
                     ` (3 more replies)
  19 siblings, 4 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-24 19:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Eduardo Habkost, Yongbok Kim, David Gibson,
	qemu-ppc, Alexander Graf

Pinging target maintainers.  If I don't get responses by the end of the week, 
I'll only push the generic tcg bits and the two targets that I maintain.


r~


On 10/18/2016 08:10 AM, Richard Henderson wrote:
> Better tested this time, including aarch64 host.
>
> Changes since v1:
>   * Added tcg_gen_deposit_z_*.  Depositing into zero turns out to be
>     quite common among targets.  Providing that as a primitive expander
>     allows us to easily generate optimal-ish code for hosts with and
>     without a real deposit operation.
>   * Cleanups in tcg/s390 akin to those I already did for tcg/arm.
>   * Add support in tcg/s390 for deposit into zero.
>   * More special cases in the expanders for better code generation,
>     especially on an x86 host *without* the extract primitives.
>   * Silly think-o on aarch64 host.
>
>
> r~
>
>
> Richard Henderson (18):
>   tcg: Add field extraction primitives
>   tcg: Minor adjustments to deposit expanders
>   tcg: Add deposit_z expander
>   tcg/aarch64: Implement field extraction opcodes
>   tcg/arm: Move isa detection to tcg-target.h
>   tcg/arm: Implement field extraction opcodes
>   tcg/i386: Implement field extraction opcodes
>   tcg/mips: Implement field extraction opcodes
>   tcg/ppc: Implement field extraction opcodes
>   tcg/s390: Expose host facilities to tcg-target.h
>   tcg/s390: Implement field extraction opcodes
>   tcg/s390: Support deposit into zero
>   target-alpha: Use deposit and extract ops
>   target-arm: Use new deposit and extract ops
>   target-i386: Use new deposit and extract ops
>   target-mips: Use the new extract op
>   target-ppc: Use the new deposit and extract ops
>   target-s390x: Use the new deposit and extract ops
>
>  target-alpha/translate.c     |  67 ++++---
>  target-arm/translate-a64.c   |  79 +++-----
>  target-arm/translate.c       |  37 +---
>  target-i386/translate.c      |  45 +++--
>  target-mips/translate.c      |  12 +-
>  target-ppc/translate.c       |  35 ++--
>  target-s390x/translate.c     |  34 ++--
>  tcg/aarch64/tcg-target.h     |   4 +
>  tcg/aarch64/tcg-target.inc.c |  14 ++
>  tcg/arm/tcg-target.h         |  38 +++-
>  tcg/arm/tcg-target.inc.c     |  63 +++---
>  tcg/i386/tcg-target.h        |  10 +
>  tcg/i386/tcg-target.inc.c    |  38 ++++
>  tcg/ia64/tcg-target.h        |   4 +
>  tcg/mips/tcg-target.h        |   2 +
>  tcg/mips/tcg-target.inc.c    |   4 +
>  tcg/optimize.c               |  29 +++
>  tcg/ppc/tcg-target.h         |   4 +
>  tcg/ppc/tcg-target.inc.c     |  10 +
>  tcg/s390/tcg-target.h        | 122 +++++++-----
>  tcg/s390/tcg-target.inc.c    | 113 ++++++-----
>  tcg/sparc/tcg-target.h       |   4 +
>  tcg/tcg-op.c                 | 465 ++++++++++++++++++++++++++++++++++++++++++-
>  tcg/tcg-op.h                 |  18 ++
>  tcg/tcg-opc.h                |   4 +
>  tcg/tcg.h                    |   8 +
>  tcg/tci/tcg-target.h         |   4 +
>  27 files changed, 954 insertions(+), 313 deletions(-)
>

* Re: [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives
  2016-10-24 19:04 ` Richard Henderson
@ 2016-10-25 11:48   ` Eduardo Habkost
  2016-10-25 12:49   ` Paolo Bonzini
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Eduardo Habkost @ 2016-10-25 11:48 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, Peter Maydell, Yongbok Kim, David Gibson, qemu-ppc,
	Alexander Graf

On Mon, Oct 24, 2016 at 12:04:33PM -0700, Richard Henderson wrote:
> Pinging target maintainers.  If I don't get responses by the end of the
> week, I'll only push the generic tcg bits and the two targets that I
> maintain.

I can't say I fully reviewed it, but I trust your judgement. For
the i386 parts:

Acked-by: Eduardo Habkost <ehabkost@redhat.com>

> 
> 
> r~
> 
> 
> On 10/18/2016 08:10 AM, Richard Henderson wrote:
> > Better tested this time, including aarch64 host.
> > 
> > Changes since v1:
> >   * Added tcg_gen_deposit_z_*.  Depositing into zero turns out to be
> >     quite common among targets.  Providing that as a primitive expander
> >     allows us to easily generate optimal-ish code for hosts with and
> >     without a real deposit operation.
> >   * Cleanups in tcg/s390 akin to those I already did for tcg/arm.
> >   * Add support in tcg/s390 for deposit into zero.
> >   * More special cases in the expanders for better code generation,
> >     especially on an x86 host *without* the extract primitives.
> >   * Silly think-o on aarch64 host.
> > 
> > 
> > r~
> > 
> > 
> > Richard Henderson (18):
> >   tcg: Add field extraction primitives
> >   tcg: Minor adjustments to deposit expanders
> >   tcg: Add deposit_z expander
> >   tcg/aarch64: Implement field extraction opcodes
> >   tcg/arm: Move isa detection to tcg-target.h
> >   tcg/arm: Implement field extraction opcodes
> >   tcg/i386: Implement field extraction opcodes
> >   tcg/mips: Implement field extraction opcodes
> >   tcg/ppc: Implement field extraction opcodes
> >   tcg/s390: Expose host facilities to tcg-target.h
> >   tcg/s390: Implement field extraction opcodes
> >   tcg/s390: Support deposit into zero
> >   target-alpha: Use deposit and extract ops
> >   target-arm: Use new deposit and extract ops
> >   target-i386: Use new deposit and extract ops
> >   target-mips: Use the new extract op
> >   target-ppc: Use the new deposit and extract ops
> >   target-s390x: Use the new deposit and extract ops
> > 
> >  target-alpha/translate.c     |  67 ++++---
> >  target-arm/translate-a64.c   |  79 +++-----
> >  target-arm/translate.c       |  37 +---
> >  target-i386/translate.c      |  45 +++--
> >  target-mips/translate.c      |  12 +-
> >  target-ppc/translate.c       |  35 ++--
> >  target-s390x/translate.c     |  34 ++--
> >  tcg/aarch64/tcg-target.h     |   4 +
> >  tcg/aarch64/tcg-target.inc.c |  14 ++
> >  tcg/arm/tcg-target.h         |  38 +++-
> >  tcg/arm/tcg-target.inc.c     |  63 +++---
> >  tcg/i386/tcg-target.h        |  10 +
> >  tcg/i386/tcg-target.inc.c    |  38 ++++
> >  tcg/ia64/tcg-target.h        |   4 +
> >  tcg/mips/tcg-target.h        |   2 +
> >  tcg/mips/tcg-target.inc.c    |   4 +
> >  tcg/optimize.c               |  29 +++
> >  tcg/ppc/tcg-target.h         |   4 +
> >  tcg/ppc/tcg-target.inc.c     |  10 +
> >  tcg/s390/tcg-target.h        | 122 +++++++-----
> >  tcg/s390/tcg-target.inc.c    | 113 ++++++-----
> >  tcg/sparc/tcg-target.h       |   4 +
> >  tcg/tcg-op.c                 | 465 ++++++++++++++++++++++++++++++++++++++++++-
> >  tcg/tcg-op.h                 |  18 ++
> >  tcg/tcg-opc.h                |   4 +
> >  tcg/tcg.h                    |   8 +
> >  tcg/tci/tcg-target.h         |   4 +
> >  27 files changed, 954 insertions(+), 313 deletions(-)
> > 
> 

-- 
Eduardo


* Re: [Qemu-devel] [PATCH v2 07/18] tcg/i386: Implement field extraction opcodes
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 07/18] tcg/i386: " Richard Henderson
@ 2016-10-25 12:46   ` Paolo Bonzini
  2016-10-25 16:46     ` Richard Henderson
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2016-10-25 12:46 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel



On 18/10/2016 17:10, Richard Henderson wrote:
> +    case INDEX_op_extract_i32:
> +        /* On the off-chance that we can use the high-byte registers.
> +           Otherwise we emit the same ext16 + shift pattern that we
> +           would have gotten from the normal tcg-op.c expansion.  */
> +        tcg_debug_assert(args[2] == 8 && args[3] == 8);
> +        if (args[1] < 4 && args[0] < 8) {
> +            tcg_out_modrm(s, OPC_MOVZBL, args[0], args[1] + 4);
> +        } else {
> +            tcg_out_ext16u(s, args[0], args[1]);
> +            tcg_out_shifti(s, SHIFT_SHR, args[0], 8);
> +        }

Since the opcode is pretty rare, perhaps it's worth restricting the
constraints to, respectively, a new constraint for 0xff ("R"?) and "Q"?
It should generate slightly better code without constraining the
register allocator too much.

Paolo


* Re: [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives
  2016-10-24 19:04 ` Richard Henderson
  2016-10-25 11:48   ` Eduardo Habkost
@ 2016-10-25 12:49   ` Paolo Bonzini
  2016-10-26  3:02   ` David Gibson
  2016-10-31 14:00   ` Peter Maydell
  3 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2016-10-25 12:49 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Peter Maydell, Eduardo Habkost, Alexander Graf, qemu-ppc,
	Yongbok Kim, David Gibson



On 24/10/2016 21:04, Richard Henderson wrote:
> Pinging target maintainers.  If I don't get responses by the end of the
> week, I'll only push the generic tcg bits and the two targets that I
> maintain.

There's no documentation in tcg/README; apart from that looks good.

Paolo

> 
> r~
> 
> 
> On 10/18/2016 08:10 AM, Richard Henderson wrote:
>> Better tested this time, including aarch64 host.
>>
>> Changes since v1:
>>   * Added tcg_gen_deposit_z_*.  Depositing into zero turns out to be
>>     quite common among targets.  Providing that as a primitive expander
>>     allows us to easily generate optimal-ish code for hosts with and
>>     without a real deposit operation.
>>   * Cleanups in tcg/s390 akin to those I already did for tcg/arm.
>>   * Add support in tcg/s390 for deposit into zero.
>>   * More special cases in the expanders for better code generation,
>>     especially on an x86 host *without* the extract primitives.
>>   * Silly think-o on aarch64 host.
>>
>>
>> r~
>>
>>
>> Richard Henderson (18):
>>   tcg: Add field extraction primitives
>>   tcg: Minor adjustments to deposit expanders
>>   tcg: Add deposit_z expander
>>   tcg/aarch64: Implement field extraction opcodes
>>   tcg/arm: Move isa detection to tcg-target.h
>>   tcg/arm: Implement field extraction opcodes
>>   tcg/i386: Implement field extraction opcodes
>>   tcg/mips: Implement field extraction opcodes
>>   tcg/ppc: Implement field extraction opcodes
>>   tcg/s390: Expose host facilities to tcg-target.h
>>   tcg/s390: Implement field extraction opcodes
>>   tcg/s390: Support deposit into zero
>>   target-alpha: Use deposit and extract ops
>>   target-arm: Use new deposit and extract ops
>>   target-i386: Use new deposit and extract ops
>>   target-mips: Use the new extract op
>>   target-ppc: Use the new deposit and extract ops
>>   target-s390x: Use the new deposit and extract ops
>>
>>  target-alpha/translate.c     |  67 ++++---
>>  target-arm/translate-a64.c   |  79 +++-----
>>  target-arm/translate.c       |  37 +---
>>  target-i386/translate.c      |  45 +++--
>>  target-mips/translate.c      |  12 +-
>>  target-ppc/translate.c       |  35 ++--
>>  target-s390x/translate.c     |  34 ++--
>>  tcg/aarch64/tcg-target.h     |   4 +
>>  tcg/aarch64/tcg-target.inc.c |  14 ++
>>  tcg/arm/tcg-target.h         |  38 +++-
>>  tcg/arm/tcg-target.inc.c     |  63 +++---
>>  tcg/i386/tcg-target.h        |  10 +
>>  tcg/i386/tcg-target.inc.c    |  38 ++++
>>  tcg/ia64/tcg-target.h        |   4 +
>>  tcg/mips/tcg-target.h        |   2 +
>>  tcg/mips/tcg-target.inc.c    |   4 +
>>  tcg/optimize.c               |  29 +++
>>  tcg/ppc/tcg-target.h         |   4 +
>>  tcg/ppc/tcg-target.inc.c     |  10 +
>>  tcg/s390/tcg-target.h        | 122 +++++++-----
>>  tcg/s390/tcg-target.inc.c    | 113 ++++++-----
>>  tcg/sparc/tcg-target.h       |   4 +
>>  tcg/tcg-op.c                 | 465 ++++++++++++++++++++++++++++++++++++++++++-
>>  tcg/tcg-op.h                 |  18 ++
>>  tcg/tcg-opc.h                |   4 +
>>  tcg/tcg.h                    |   8 +
>>  tcg/tci/tcg-target.h         |   4 +
>>  27 files changed, 954 insertions(+), 313 deletions(-)
>>
> 
> 
> 


* Re: [Qemu-devel] [PATCH v2 07/18] tcg/i386: Implement field extraction opcodes
  2016-10-25 12:46   ` Paolo Bonzini
@ 2016-10-25 16:46     ` Richard Henderson
  2016-10-25 16:48       ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2016-10-25 16:46 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

On 10/25/2016 05:46 AM, Paolo Bonzini wrote:
> 
> 
> On 18/10/2016 17:10, Richard Henderson wrote:
>> +    case INDEX_op_extract_i32:
>> +        /* On the off-chance that we can use the high-byte registers.
>> +           Otherwise we emit the same ext16 + shift pattern that we
>> +           would have gotten from the normal tcg-op.c expansion.  */
>> +        tcg_debug_assert(args[2] == 8 && args[3] == 8);
>> +        if (args[1] < 4 && args[0] < 8) {
>> +            tcg_out_modrm(s, OPC_MOVZBL, args[0], args[1] + 4);
>> +        } else {
>> +            tcg_out_ext16u(s, args[0], args[1]);
>> +            tcg_out_shifti(s, SHIFT_SHR, args[0], 8);
>> +        }
> 
> Since the opcode is pretty rare, perhaps it's worth restricting the
> constraints to, respectively, a new constraint for 0xff ("R"?) and "Q"?
> It should generate slightly better code without constraining the
> register allocator too much.

I tried that, but since our allocator does nothing to look forward to future
uses, it will only properly load a value into Q if this is the first use of the
value within the TB.  Otherwise it'll generate an extra move to satisfy the
constraint.

Given that movzwl can operate on any source, and can copy to another
destination at the same time, it's wasteful to force the register allocator to
generate the extra move.

This ext16u+shift form is what we'll generate without the special case here.
So if you prefer I could drop the %[abcd]h special case entirely.

The one that's particularly valuable is the 32-bit shift as extraction from a
64-bit input.  That turns out to happen lots for e.g. ppc64abi32 guest.


r~
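
To make the equivalence discussed above concrete, here is a minimal C
sketch of what an extract with ofs == 8 and len == 8 computes, next to the
ext16u + shift fallback.  The helper names (extract_u32, ext16u_then_shr8,
extract_u64, shr32_of_low_half) are hypothetical and model only the
arithmetic, not the TCG backend.  The last pair reflects one reading of
"the 32-bit shift as extraction from a 64-bit input", namely a field that
ends at bit 32; that reading is an assumption, not something stated in the
thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model: the zero-extended field of LEN bits
   that starts at bit OFS of VALUE.  */
static uint32_t extract_u32(uint32_t value, unsigned ofs, unsigned len)
{
    assert(len >= 1 && len <= 32 && ofs <= 32 - len);
    uint32_t mask = len == 32 ? UINT32_MAX : (1u << len) - 1;
    return (value >> ofs) & mask;
}

/* The fallback pattern named in the patch comment: zero-extend the
   low 16 bits, then shift right by 8.  */
static uint32_t ext16u_then_shr8(uint32_t value)
{
    uint32_t low16 = value & 0xffffu;
    return low16 >> 8;
}

/* Same reference model for a 64-bit input.  */
static uint64_t extract_u64(uint64_t value, unsigned ofs, unsigned len)
{
    assert(len >= 1 && len <= 64 && ofs <= 64 - len);
    uint64_t mask = len == 64 ? UINT64_MAX : (UINT64_C(1) << len) - 1;
    return (value >> ofs) & mask;
}

/* Assumed reading: a field covering bits OFS..31 of a 64-bit input is
   a 32-bit logical right shift of the low half, zero-extended.  */
static uint64_t shr32_of_low_half(uint64_t value, unsigned ofs)
{
    assert(ofs < 32);
    return (uint32_t)value >> ofs;
}
```

For any v, extract_u32(v, 8, 8) and ext16u_then_shr8(v) agree, which is
why dropping the %[abcd]h special case would not change the result, only
the instruction count when the source happens to sit in a Q register.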


* Re: [Qemu-devel] [PATCH v2 07/18] tcg/i386: Implement field extraction opcodes
  2016-10-25 16:46     ` Richard Henderson
@ 2016-10-25 16:48       ` Paolo Bonzini
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2016-10-25 16:48 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel



On 25/10/2016 18:46, Richard Henderson wrote:
> On 10/25/2016 05:46 AM, Paolo Bonzini wrote:
>>
>>
>> On 18/10/2016 17:10, Richard Henderson wrote:
>>> +    case INDEX_op_extract_i32:
>>> +        /* On the off-chance that we can use the high-byte registers.
>>> +           Otherwise we emit the same ext16 + shift pattern that we
>>> +           would have gotten from the normal tcg-op.c expansion.  */
>>> +        tcg_debug_assert(args[2] == 8 && args[3] == 8);
>>> +        if (args[1] < 4 && args[0] < 8) {
>>> +            tcg_out_modrm(s, OPC_MOVZBL, args[0], args[1] + 4);
>>> +        } else {
>>> +            tcg_out_ext16u(s, args[0], args[1]);
>>> +            tcg_out_shifti(s, SHIFT_SHR, args[0], 8);
>>> +        }
>>
>> Since the opcode is pretty rare, perhaps it's worth restricting the
>> constraints to, respectively, a new constraint for 0xff ("R"?) and "Q"?
>> It should generate slightly better code without constraining the
>> register allocator too much.
> 
> I tried that, but since our allocator does nothing to look forward to future
> uses, it will only properly load a value into Q if this is the first use of the
> value within the TB.  Otherwise it'll generate an extra move to satisfy the
> constraint.
> 
> Given that movzwl can operate on any source, and can copy to another
> destination at the same time, it's wasteful to force the register allocator to
> generate the extra move.
> 
> This ext16u+shift form is what we'll generate without the special case here.
> So if you prefer I could drop the %[abcd]h special case entirely.

Nah, as you said there's always a chance of satisfying the constraint
(and of getting a better register allocator).

> The one that's particularly valuable is the 32-bit shift as extraction from a
> 64-bit input.  That turns out to happen lots for e.g. ppc64abi32 guest.

Sounds good, thanks!

Paolo


* Re: [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives
  2016-10-24 19:04 ` Richard Henderson
  2016-10-25 11:48   ` Eduardo Habkost
  2016-10-25 12:49   ` Paolo Bonzini
@ 2016-10-26  3:02   ` David Gibson
  2016-10-31 14:00   ` Peter Maydell
  3 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2016-10-26  3:02 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, Peter Maydell, Eduardo Habkost, Yongbok Kim,
	qemu-ppc, Alexander Graf


On Mon, Oct 24, 2016 at 12:04:33PM -0700, Richard Henderson wrote:
> Pinging target maintainers.  If I don't get responses by the end of the
> week, I'll only push the generic tcg bits and the two targets that I
> maintain.

Sorry, missed this first time around.  The ppc host side looked ok to
me, but the ppc target side didn't look quite right.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH v2 17/18] target-ppc: Use the new deposit and extract ops
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 17/18] target-ppc: Use the new deposit and extract ops Richard Henderson
@ 2016-10-27  2:09   ` David Gibson
  0 siblings, 0 replies; 35+ messages in thread
From: David Gibson @ 2016-10-27  2:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


On Tue, Oct 18, 2016 at 08:10:30AM -0700, Richard Henderson wrote:
> Use the new primitives for RLWINM and RLDICL.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  target-ppc/translate.c | 35 +++++++++++++++++++----------------
>  1 file changed, 19 insertions(+), 16 deletions(-)
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index bfc1301..7b12303 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -1970,16 +1970,16 @@ static void gen_rlwinm(DisasContext *ctx)
>  {
>      TCGv t_ra = cpu_gpr[rA(ctx->opcode)];
>      TCGv t_rs = cpu_gpr[rS(ctx->opcode)];
> -    uint32_t sh = SH(ctx->opcode);
> -    uint32_t mb = MB(ctx->opcode);
> -    uint32_t me = ME(ctx->opcode);
> -
> -    if (mb == 0 && me == (31 - sh)) {
> -        tcg_gen_shli_tl(t_ra, t_rs, sh);
> -        tcg_gen_ext32u_tl(t_ra, t_ra);
> -    } else if (sh != 0 && me == 31 && sh == (32 - mb)) {
> -        tcg_gen_ext32u_tl(t_ra, t_rs);
> -        tcg_gen_shri_tl(t_ra, t_ra, mb);
> +    int sh = SH(ctx->opcode);
> +    int mb = MB(ctx->opcode);
> +    int me = ME(ctx->opcode);
> +    int len = me - mb + 1;
> +    int rsh = (32 - sh) & 31;
> +
> +    if (sh != 0 && len > 0 && me == (31 - sh)) {
> +        tcg_gen_deposit_z_tl(t_ra, t_rs, sh, len);
> +    } else if (me == 31 && rsh + len <= 32) {
> +        tcg_gen_extract_tl(t_ra, t_rs, rsh, len);
>      } else {
>          target_ulong mask;
>  #if defined(TARGET_PPC64)
> @@ -1987,8 +1987,9 @@ static void gen_rlwinm(DisasContext *ctx)
>          me += 32;
>  #endif
>          mask = MASK(mb, me);
> -
> -        if (mask <= 0xffffffffu) {
> +        if (sh == 0) {
> +            tcg_gen_andi_tl(t_ra, t_rs, mask);
> +        } else if (mask <= 0xffffffffu) {
>              TCGv_i32 t0 = tcg_temp_new_i32();
>              tcg_gen_trunc_tl_i32(t0, t_rs);
>              tcg_gen_rotli_i32(t0, t0, sh);
> @@ -2091,11 +2092,13 @@ static void gen_rldinm(DisasContext *ctx, int mb, int me, int sh)
>  {
>      TCGv t_ra = cpu_gpr[rA(ctx->opcode)];
>      TCGv t_rs = cpu_gpr[rS(ctx->opcode)];
> +    int len = me - mb + 1;
> +    int rsh = (64 - sh) & 63;
>  
> -    if (sh != 0 && mb == 0 && me == (63 - sh)) {
> -        tcg_gen_shli_tl(t_ra, t_rs, sh);
> -    } else if (sh != 0 && me == 63 && sh == (64 - mb)) {
> -        tcg_gen_shri_tl(t_ra, t_rs, mb);
> +    if (sh != 0 && len > 0 && me == (63 - sh)) {
> +        tcg_gen_deposit_z_tl(t_ra, t_rs, sh, len);
> +    } else if (me == 63 && rsh + len <= 64) {
> +        tcg_gen_extract_tl(t_ra, t_rs, rsh, len);
>      } else {
>          tcg_gen_rotli_tl(t_ra, t_rs, sh);
>          tcg_gen_andi_tl(t_ra, t_ra, MASK(mb, me));

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
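
The two fast paths in the gen_rlwinm hunk above can be checked against
the plain rotate-and-mask definition.  The following is a sketch of a
32-bit model only: rotl32, ppc_mask32, rlwinm_ref, deposit_z32,
extract32_ref and rlwinm_fast are hypothetical names written for this
check, and ppc_mask32 assumes the usual PowerPC convention that mask
bits are numbered from the most significant bit (bit 0) and wrap when
mb > me.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Mask with big-endian bits MB..ME set (bit 0 is the MSB); the mask
   wraps around when MB > ME.  */
static uint32_t ppc_mask32(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xffffffffu >> mb;        /* BE bits mb..31 */
    uint32_t to_me = 0xffffffffu << (31 - me);   /* BE bits 0..me  */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* Reference definition: rotate left, then AND with the mask.  */
static uint32_t rlwinm_ref(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask32(mb, me);
}

static uint32_t field_mask(int len)
{
    return len == 32 ? 0xffffffffu : (1u << len) - 1;
}

/* Deposit the low LEN bits of V at OFS into an all-zero word.  */
static uint32_t deposit_z32(uint32_t v, int ofs, int len)
{
    return (v & field_mask(len)) << ofs;
}

static uint32_t extract32_ref(uint32_t v, int ofs, int len)
{
    return (v >> ofs) & field_mask(len);
}

/* Same conditions as the patch; returns 1 when a fast path applies
   and stores its result in *OUT, 0 when the general path is needed.  */
static int rlwinm_fast(uint32_t rs, int sh, int mb, int me, uint32_t *out)
{
    int len = me - mb + 1;
    int rsh = (32 - sh) & 31;

    if (sh != 0 && len > 0 && me == 31 - sh) {
        *out = deposit_z32(rs, sh, len);
        return 1;
    }
    if (me == 31 && rsh + len <= 32) {
        *out = extract32_ref(rs, rsh, len);
        return 1;
    }
    return 0;
}
```

Looping sh, mb and me over 0..31 and comparing rlwinm_fast with
rlwinm_ref whenever it returns 1 exercises every combination the two
conditions accept.  The len > 0 test is what keeps wrapping masks
(mb > me) out of the deposit path.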



* Re: [Qemu-devel] [PATCH v2 16/18] target-mips: Use the new extract op
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 16/18] target-mips: Use the new extract op Richard Henderson
@ 2016-10-27 12:43   ` Yongbok Kim
  0 siblings, 0 replies; 35+ messages in thread
From: Yongbok Kim @ 2016-10-27 12:43 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel



On 18/10/2016 16:10, Richard Henderson wrote:
> Use extract for EXT and DEXT.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-mips/translate.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index d8dde7a..cf79aa4 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -4484,11 +4484,12 @@ static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
>          if (lsb + msb > 31) {
>              goto fail;
>          }
> -        tcg_gen_shri_tl(t0, t1, lsb);
>          if (msb != 31) {
> -            tcg_gen_andi_tl(t0, t0, (1U << (msb + 1)) - 1);
> +            tcg_gen_extract_tl(t0, t1, lsb, msb + 1);
>          } else {
> -            tcg_gen_ext32s_tl(t0, t0);
> +            /* The two checks together imply that lsb == 0,
> +               so this is a simple sign-extension.  */
> +            tcg_gen_ext32s_tl(t0, t1);
>          }
>          break;
>  #if defined(TARGET_MIPS64)
> @@ -4503,10 +4504,7 @@ static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
>          if (lsb + msb > 63) {
>              goto fail;
>          }
> -        tcg_gen_shri_tl(t0, t1, lsb);
> -        if (msb != 63) {
> -            tcg_gen_andi_tl(t0, t0, (1ULL << (msb + 1)) - 1);
> -        }
> +        tcg_gen_extract_tl(t0, t1, lsb, msb + 1);
>          break;
>  #endif
>      case OPC_INS:
> 

Reviewed-by: Yongbok Kim <yongbok.kim@imgtec.com>

Regards,
Yongbok
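
As a cross-check of the EXT hunk quoted above, the old shift-and-mask
sequence and the new extract-based sequence can be modelled side by
side.  This is a sketch under assumptions: target_ulong is taken to be
64 bits (a MIPS64 target), and ext32s_model, extract_tl_model, ext_old
and ext_new are hypothetical stand-ins for the TCG ops, operating on
plain integers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t target_ulong;   /* assumption: 64-bit target */

/* Sign-extend the low 32 bits to the full register width.  */
static target_ulong ext32s_model(target_ulong v)
{
    uint32_t lo = (uint32_t)v;
    return (lo & 0x80000000u) ? (lo | UINT64_C(0xffffffff00000000)) : lo;
}

/* Zero-extended field of LEN bits starting at bit OFS.  */
static target_ulong extract_tl_model(target_ulong v, unsigned ofs, unsigned len)
{
    assert(len >= 1 && len <= 64 && ofs <= 64 - len);
    target_ulong mask = len == 64 ? UINT64_MAX : (UINT64_C(1) << len) - 1;
    return (v >> ofs) & mask;
}

/* Sequence removed by the patch: shift first, then mask or sign-extend.  */
static target_ulong ext_old(target_ulong t1, unsigned lsb, unsigned msb)
{
    assert(lsb + msb <= 31);     /* otherwise the translator bails out */
    target_ulong t0 = t1 >> lsb;
    if (msb != 31) {
        t0 &= (1u << (msb + 1)) - 1;
    } else {
        t0 = ext32s_model(t0);
    }
    return t0;
}

/* Sequence added by the patch: one extract, or a plain sign-extension
   when msb == 31 (which, with lsb + msb <= 31, forces lsb == 0).  */
static target_ulong ext_new(target_ulong t1, unsigned lsb, unsigned msb)
{
    assert(lsb + msb <= 31);
    if (msb != 31) {
        return extract_tl_model(t1, lsb, msb + 1);
    }
    return ext32s_model(t1);
}
```

For every lsb and msb with lsb + msb <= 31 the two functions return the
same value, which is the property the review relies on.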


* Re: [Qemu-devel] [PATCH v2 08/18] tcg/mips: Implement field extraction opcodes
  2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 08/18] tcg/mips: " Richard Henderson
@ 2016-10-27 13:40   ` Yongbok Kim
  2016-10-27 14:19     ` Richard Henderson
  0 siblings, 1 reply; 35+ messages in thread
From: Yongbok Kim @ 2016-10-27 13:40 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: James Hogan, aurelien



On 18/10/2016 16:10, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/mips/tcg-target.h     | 2 +-
>  tcg/mips/tcg-target.inc.c | 4 ++++
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
> index 1bcea3b..f1c3137 100644
> --- a/tcg/mips/tcg-target.h
> +++ b/tcg/mips/tcg-target.h
> @@ -123,7 +123,7 @@ extern bool use_mips32r2_instructions;
>  #define TCG_TARGET_HAS_bswap16_i32      use_mips32r2_instructions
>  #define TCG_TARGET_HAS_bswap32_i32      use_mips32r2_instructions
>  #define TCG_TARGET_HAS_deposit_i32      use_mips32r2_instructions
> -#define TCG_TARGET_HAS_extract_i32      0
> +#define TCG_TARGET_HAS_extract_i32      use_mips32r2_instructions
>  #define TCG_TARGET_HAS_sextract_i32     0
>  #define TCG_TARGET_HAS_ext8s_i32        use_mips32r2_instructions
>  #define TCG_TARGET_HAS_ext16s_i32       use_mips32r2_instructions
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index abce602..192dd49 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -1637,6 +1637,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_deposit_i32:
>          tcg_out_opc_bf(s, OPC_INS, a0, a2, args[3] + args[4] - 1, args[3]);
>          break;
> +    case INDEX_op_extract_i32:
> +        tcg_out_opc_bf(s, OPC_EXT, a0, a1, a3 + args[3] - 1, a2);

The msbd (5th argument) must be (a2 + args[3] - 1). In fact "a3" isn't defined.

> +        break;
>  
>      case INDEX_op_brcond_i32:
>          tcg_out_brcond(s, a2, a0, a1, arg_label(args[3]));
> @@ -1736,6 +1739,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
>      { INDEX_op_ext16s_i32, { "r", "rZ" } },
>  
>      { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
> +    { INDEX_op_extract_i32, { "r", "r" } },
>  
>      { INDEX_op_brcond_i32, { "rZ", "rZ" } },
>  #if use_mips32r6_instructions
> 

cc-ing Aurelien and James.

Regards,
Yongbok


* Re: [Qemu-devel] [PATCH v2 08/18] tcg/mips: Implement field extraction opcodes
  2016-10-27 13:40   ` Yongbok Kim
@ 2016-10-27 14:19     ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-27 14:19 UTC (permalink / raw)
  To: Yongbok Kim, qemu-devel; +Cc: James Hogan, aurelien

On 10/27/2016 06:40 AM, Yongbok Kim wrote:
>> > +    case INDEX_op_extract_i32:
>> > +        tcg_out_opc_bf(s, OPC_EXT, a0, a1, a3 + args[3] - 1, a2);
> The msbd (5th argument) must be (a2 + args[3] - 1). In fact "a3" isn't defined.
>

Yes, I eventually caught this typo when I finally got around to the 
cross-compile.  Thanks.


r~


* Re: [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives
  2016-10-24 19:04 ` Richard Henderson
                     ` (2 preceding siblings ...)
  2016-10-26  3:02   ` David Gibson
@ 2016-10-31 14:00   ` Peter Maydell
  2016-10-31 14:36     ` Richard Henderson
  3 siblings, 1 reply; 35+ messages in thread
From: Peter Maydell @ 2016-10-31 14:00 UTC (permalink / raw)
  To: Richard Henderson
  Cc: QEMU Developers, Eduardo Habkost, Yongbok Kim, David Gibson,
	qemu-ppc, Alexander Graf, Alex Bennée

On 24 October 2016 at 20:04, Richard Henderson <rth@twiddle.net> wrote:
> Pinging target maintainers.  If I don't get responses by the end of the
> week, I'll only push the generic tcg bits and the two targets that I
> maintain.

Sorry we didn't get to this :-( Unfortunately our risu test
environment had bitrotted a bit so it wasn't as easy as I'd
hoped to test the ARM frontend stuff.

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives
  2016-10-31 14:00   ` Peter Maydell
@ 2016-10-31 14:36     ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2016-10-31 14:36 UTC (permalink / raw)
  To: Peter Maydell
  Cc: QEMU Developers, Eduardo Habkost, Yongbok Kim, David Gibson,
	qemu-ppc, Alexander Graf, Alex Bennée

On 10/31/2016 08:00 AM, Peter Maydell wrote:
> On 24 October 2016 at 20:04, Richard Henderson <rth@twiddle.net> wrote:
>> Pinging target maintainers.  If I don't get responses by the end of the
>> week, I'll only push the generic tcg bits and the two targets that I
>> maintain.
>
> Sorry we didn't get to this :-( Unfortunately our risu test
> environment had bitrotted a bit so it wasn't as easy as I'd
> hoped to test the ARM frontend stuff.

Yeah, I ran into a few problems in testing arm32 hosting ppc64 that really 
exercised this (much more so than i686).  I'll have to push this whole patch 
set off til next time.


r~


end of thread, other threads:[~2016-10-31 14:36 UTC | newest]

Thread overview: 35+ messages
2016-10-18 15:10 [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 01/18] tcg: Add field extraction primitives Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 02/18] tcg: Minor adjustments to deposit expanders Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 03/18] tcg: Add deposit_z expander Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 04/18] tcg/aarch64: Implement field extraction opcodes Richard Henderson
2016-10-18 15:33   ` Claudio Fontana
2016-10-18 16:11     ` Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 05/18] tcg/arm: Move isa detection to tcg-target.h Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 06/18] tcg/arm: Implement field extraction opcodes Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 07/18] tcg/i386: " Richard Henderson
2016-10-25 12:46   ` Paolo Bonzini
2016-10-25 16:46     ` Richard Henderson
2016-10-25 16:48       ` Paolo Bonzini
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 08/18] tcg/mips: " Richard Henderson
2016-10-27 13:40   ` Yongbok Kim
2016-10-27 14:19     ` Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 09/18] tcg/ppc: " Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 10/18] tcg/s390: Expose host facilities to tcg-target.h Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 11/18] tcg/s390: Implement field extraction opcodes Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 12/18] tcg/s390: Support deposit into zero Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 13/18] target-alpha: Use deposit and extract ops Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 14/18] target-arm: Use new " Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 15/18] target-i386: " Richard Henderson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 16/18] target-mips: Use the new extract op Richard Henderson
2016-10-27 12:43   ` Yongbok Kim
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 17/18] target-ppc: Use the new deposit and extract ops Richard Henderson
2016-10-27  2:09   ` David Gibson
2016-10-18 15:10 ` [Qemu-devel] [PATCH v2 18/18] target-s390x: " Richard Henderson
2016-10-18 16:15 ` [Qemu-devel] [PATCH v2 00/18] tcg field extract primitives no-reply
2016-10-24 19:04 ` Richard Henderson
2016-10-25 11:48   ` Eduardo Habkost
2016-10-25 12:49   ` Paolo Bonzini
2016-10-26  3:02   ` David Gibson
2016-10-31 14:00   ` Peter Maydell
2016-10-31 14:36     ` Richard Henderson
