All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2
@ 2011-01-11  3:23 Richard Henderson
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 1/7] tcg: Define "deposit" as an optional operation Richard Henderson
                   ` (8 more replies)
  0 siblings, 9 replies; 40+ messages in thread
From: Richard Henderson @ 2011-01-11  3:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, aurelien

Changes since v1:
  * No attempt to pack pos+len into one operand.  Updated backends
    to match this change.

  * Example in the README is a bit more complex.

  * Define an official tcg_scratch_alloc routine, used by the i386
    target for the case in which we need a scratch register.  I had
    said that triggering this wasn't possible with mainline, but I
    was wrong.  The rlwimi example I posted in the other thread hits
    this case.

Aurelien suggested using a shorter name like "dep", but I think that
would be more confusing than not.  This opcode is not really used
enough to warrent cropping 4 characters, in my opinion.


r~


Richard Henderson (7):
  tcg: Define "deposit" as an optional operation.
  tcg-ppc: Implement deposit operation.
  tcg-hppa: Implement deposit operation.
  tcg-ia64: Implement deposit operation.
  tcg-i386: Implement deposit operation.
  target-i386: Use deposit operation.
  target-ppc: Use deposit operation.

 target-i386/translate.c |   34 +++-----------
 target-ppc/translate.c  |   10 ++++
 tcg/README              |   14 ++++++
 tcg/hppa/tcg-target.c   |   58 +++++++++++++++++++++---
 tcg/hppa/tcg-target.h   |    1 +
 tcg/i386/tcg-target.c   |   67 ++++++++++++++++++++++++++-
 tcg/i386/tcg-target.h   |    2 +
 tcg/ia64/tcg-target.c   |  115 +++++++++++++++++++++++++++++++++++++++++++++++
 tcg/ia64/tcg-target.h   |    2 +
 tcg/ppc/tcg-target.c    |   17 +++++++-
 tcg/ppc/tcg-target.h    |    1 +
 tcg/tcg-op.h            |   64 ++++++++++++++++++++++++++
 tcg/tcg-opc.h           |    6 +++
 tcg/tcg.c               |   11 +++++
 14 files changed, 364 insertions(+), 38 deletions(-)

-- 
1.7.2.3

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH 1/7] tcg: Define "deposit" as an optional operation.
  2011-01-11  3:23 [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Richard Henderson
@ 2011-01-11  3:23 ` Richard Henderson
  2011-01-11 23:45   ` Aurelien Jarno
  2011-01-12 11:00   ` Edgar E. Iglesias
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 2/7] tcg-ppc: Implement deposit operation Richard Henderson
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: Richard Henderson @ 2011-01-11  3:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/README    |   14 ++++++++++++
 tcg/tcg-op.h  |   64 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-opc.h |    6 +++++
 3 files changed, 84 insertions(+), 0 deletions(-)

diff --git a/tcg/README b/tcg/README
index a18a87f..a50ecc6 100644
--- a/tcg/README
+++ b/tcg/README
@@ -285,6 +285,20 @@ the four high order bytes are set to zero.
 Indicate that the value of t0 won't be used later. It is useful to
 force dead code elimination.
 
+* deposit_i32/i64 dest, t1, t2, pos, loc
+
+Deposit T2 as a bitfield into T1, placing the result in DEST.
+The bitfield is described by POS/LOC, which are immediate values:
+
+  LEN - the length of the bitfield
+  POS - the position of the first bit, counting from the LSB
+
+For example, pos=8, len=4 indicates a 4-bit field at bit 8.
+This operation would be equivalent to
+
+  dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
+
+
 ********* Conditional moves
 
 * setcond_i32/i64 cond, dest, t1, t2
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 3ee0a58..207a89f 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -254,6 +254,30 @@ static inline void tcg_gen_op5i_i64(TCGOpcode opc, TCGv_i64 arg1, TCGv_i64 arg2,
     *gen_opparam_ptr++ = arg5;
 }
 
+static inline void tcg_gen_op5ii_i32(TCGOpcode opc, TCGv_i32 arg1,
+                                     TCGv_i32 arg2, TCGv_i32 arg3,
+                                     TCGArg arg4, TCGArg arg5)
+{
+    *gen_opc_ptr++ = opc;
+    *gen_opparam_ptr++ = GET_TCGV_I32(arg1);
+    *gen_opparam_ptr++ = GET_TCGV_I32(arg2);
+    *gen_opparam_ptr++ = GET_TCGV_I32(arg3);
+    *gen_opparam_ptr++ = arg4;
+    *gen_opparam_ptr++ = arg5;
+}
+
+static inline void tcg_gen_op5ii_i64(TCGOpcode opc, TCGv_i64 arg1,
+                                     TCGv_i64 arg2, TCGv_i64 arg3,
+                                     TCGArg arg4, TCGArg arg5)
+{
+    *gen_opc_ptr++ = opc;
+    *gen_opparam_ptr++ = GET_TCGV_I64(arg1);
+    *gen_opparam_ptr++ = GET_TCGV_I64(arg2);
+    *gen_opparam_ptr++ = GET_TCGV_I64(arg3);
+    *gen_opparam_ptr++ = arg4;
+    *gen_opparam_ptr++ = arg5;
+}
+
 static inline void tcg_gen_op6_i32(TCGOpcode opc, TCGv_i32 arg1, TCGv_i32 arg2,
                                    TCGv_i32 arg3, TCGv_i32 arg4, TCGv_i32 arg5,
                                    TCGv_i32 arg6)
@@ -2071,6 +2095,44 @@ static inline void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
     }
 }
 
+static inline void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1,
+				       TCGv_i32 arg2, unsigned int ofs,
+				       unsigned int len)
+{
+#ifdef TCG_TARGET_HAS_deposit_i32
+  tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, arg1, arg2, ofs, len);
+#else
+  uint32_t mask = (1u << len) - 1;
+  TCGv_i32 t1 = tcg_temp_new_i32 ();
+
+  tcg_gen_andi_i32(t1, arg2, mask);
+  tcg_gen_shli_i32(t1, t1, ofs);
+  tcg_gen_andi_i32(ret, arg1, ~(mask << ofs));
+  tcg_gen_or_i32(ret, ret, t1);
+
+  tcg_temp_free_i32(t1);
+#endif
+}
+
+static inline void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1,
+				       TCGv_i64 arg2, unsigned int ofs,
+				       unsigned int len)
+{
+#ifdef TCG_TARGET_HAS_deposit_i64
+  tcg_gen_op5ii_i64(INDEX_op_deposit_i64, ret, arg1, arg2, ofs, len);
+#else
+  uint64_t mask = (1ull << len) - 1;
+  TCGv_i64 t1 = tcg_temp_new_i64 ();
+
+  tcg_gen_andi_i64(t1, arg2, mask);
+  tcg_gen_shli_i64(t1, t1, ofs);
+  tcg_gen_andi_i64(ret, arg1, ~(mask << ofs));
+  tcg_gen_or_i64(ret, ret, t1);
+
+  tcg_temp_free_i64(t1);
+#endif
+}
+
 /***************************************/
 /* QEMU specific operations. Their type depend on the QEMU CPU
    type. */
@@ -2384,6 +2446,7 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_rotli_tl tcg_gen_rotli_i64
 #define tcg_gen_rotr_tl tcg_gen_rotr_i64
 #define tcg_gen_rotri_tl tcg_gen_rotri_i64
+#define tcg_gen_deposit_tl tcg_gen_deposit_i64
 #define tcg_const_tl tcg_const_i64
 #define tcg_const_local_tl tcg_const_local_i64
 #else
@@ -2454,6 +2517,7 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #define tcg_gen_rotli_tl tcg_gen_rotli_i32
 #define tcg_gen_rotr_tl tcg_gen_rotr_i32
 #define tcg_gen_rotri_tl tcg_gen_rotri_i32
+#define tcg_gen_deposit_tl tcg_gen_deposit_i32
 #define tcg_const_tl tcg_const_i32
 #define tcg_const_local_tl tcg_const_local_i32
 #endif
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 2a98fed..2c7ca1a 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -78,6 +78,9 @@ DEF(sar_i32, 1, 2, 0, 0)
 DEF(rotl_i32, 1, 2, 0, 0)
 DEF(rotr_i32, 1, 2, 0, 0)
 #endif
+#ifdef TCG_TARGET_HAS_deposit_i32
+DEF(deposit_i32, 1, 2, 2, 0)
+#endif
 
 DEF(brcond_i32, 0, 2, 2, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
 #if TCG_TARGET_REG_BITS == 32
@@ -168,6 +171,9 @@ DEF(sar_i64, 1, 2, 0, 0)
 DEF(rotl_i64, 1, 2, 0, 0)
 DEF(rotr_i64, 1, 2, 0, 0)
 #endif
+#ifdef TCG_TARGET_HAS_deposit_i64
+DEF(deposit_i64, 1, 2, 2, 0)
+#endif
 
 DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
 #ifdef TCG_TARGET_HAS_ext8s_i64
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH 2/7] tcg-ppc: Implement deposit operation.
  2011-01-11  3:23 [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Richard Henderson
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 1/7] tcg: Define "deposit" as an optional operation Richard Henderson
@ 2011-01-11  3:23 ` Richard Henderson
  2011-01-11 12:40   ` [Qemu-devel] " malc
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 3/7] tcg-hppa: " Richard Henderson
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2011-01-11  3:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc/tcg-target.c |   17 ++++++++++++++++-
 tcg/ppc/tcg-target.h |    1 +
 2 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index 7970268..39aa4f1 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -1611,6 +1611,21 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         break;
 
+    case INDEX_op_deposit_i32:
+        {
+            unsigned len = args[4];
+            unsigned lsb_ofs = args[3];
+            unsigned msb_ofs = 31 - lsb_ofs;
+
+            tcg_out32 (s, RLWIMI
+                       | RA(args[0])
+                       | RS(args[2])
+                       | SH(lsb_ofs)
+                       | MB(msb_ofs - len + 1)
+                       | ME(msb_ofs));
+        }
+        break;
+
     case INDEX_op_add2_i32:
         if (args[0] == args[3] || args[0] == args[5]) {
             tcg_out32 (s, ADDC | TAB (0, args[2], args[4]));
@@ -1829,9 +1844,9 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_shl_i32, { "r", "r", "ri" } },
     { INDEX_op_shr_i32, { "r", "r", "ri" } },
     { INDEX_op_sar_i32, { "r", "r", "ri" } },
-
     { INDEX_op_rotl_i32, { "r", "r", "ri" } },
     { INDEX_op_rotr_i32, { "r", "r", "ri" } },
+    { INDEX_op_deposit_i32, { "r", "0", "r" } },
 
     { INDEX_op_brcond_i32, { "r", "ri" } },
 
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index a1f8599..bbf38d5 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -92,6 +92,7 @@ enum {
 #define TCG_TARGET_HAS_eqv_i32
 #define TCG_TARGET_HAS_nand_i32
 #define TCG_TARGET_HAS_nor_i32
+#define TCG_TARGET_HAS_deposit_i32
 
 #define TCG_AREG0 TCG_REG_R27
 
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH 3/7] tcg-hppa: Implement deposit operation.
  2011-01-11  3:23 [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Richard Henderson
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 1/7] tcg: Define "deposit" as an optional operation Richard Henderson
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 2/7] tcg-ppc: Implement deposit operation Richard Henderson
@ 2011-01-11  3:23 ` Richard Henderson
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 4/7] tcg-ia64: " Richard Henderson
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2011-01-11  3:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/hppa/tcg-target.c |   58 +++++++++++++++++++++++++++++++++++++++++++-----
 tcg/hppa/tcg-target.h |    1 +
 2 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/tcg/hppa/tcg-target.c b/tcg/hppa/tcg-target.c
index 7f4653e..a572cbf 100644
--- a/tcg/hppa/tcg-target.c
+++ b/tcg/hppa/tcg-target.c
@@ -467,6 +467,22 @@ static inline void tcg_out_dep(TCGContext *s, int ret, int arg,
               | INSN_SHDEP_CP(31 - ofs) | INSN_DEP_LEN(len));
 }
 
+static inline void tcg_out_depi(TCGContext *s, int ret, int arg,
+                                unsigned ofs, unsigned len)
+{
+    assert(ofs < 32 && len <= 32 - ofs);
+    tcg_out32(s, INSN_DEPI | INSN_R2(ret) | INSN_IM5(arg)
+              | INSN_SHDEP_CP(31 - ofs) | INSN_DEP_LEN(len));
+}
+
+static inline void tcg_out_zdep(TCGContext *s, int ret, int arg,
+                                unsigned ofs, unsigned len)
+{
+    assert(ofs < 32 && len <= 32 - ofs);
+    tcg_out32(s, INSN_ZDEP | INSN_R2(ret) | INSN_R1(arg)
+              | INSN_SHDEP_CP(31 - ofs) | INSN_DEP_LEN(len));
+}
+
 static inline void tcg_out_shd(TCGContext *s, int ret, int hi, int lo,
                                unsigned count)
 {
@@ -499,8 +515,7 @@ static void tcg_out_ori(TCGContext *s, int ret, int arg, tcg_target_ulong m)
     assert(bs1 == 32 || (1ul << bs1) > m);
 
     tcg_out_mov(s, TCG_TYPE_I32, ret, arg);
-    tcg_out32(s, INSN_DEPI | INSN_R2(ret) | INSN_IM5(-1)
-              | INSN_SHDEP_CP(31 - bs0) | INSN_DEP_LEN(bs1 - bs0));
+    tcg_out_depi(s, ret, -1, bs0, bs1 - bs0);
 }
 
 static void tcg_out_andi(TCGContext *s, int ret, int arg, tcg_target_ulong m)
@@ -529,8 +544,7 @@ static void tcg_out_andi(TCGContext *s, int ret, int arg, tcg_target_ulong m)
         tcg_out_extr(s, ret, arg, 0, ls0, 0);
     } else {
         tcg_out_mov(s, TCG_TYPE_I32, ret, arg);
-        tcg_out32(s, INSN_DEPI | INSN_R2(ret) | INSN_IM5(0)
-                  | INSN_SHDEP_CP(31 - ls0) | INSN_DEP_LEN(ls1 - ls0));
+        tcg_out_depi(s, ret, 0, ls0, ls1 - ls0);
     }
 }
 
@@ -547,8 +561,7 @@ static inline void tcg_out_ext16s(TCGContext *s, int ret, int arg)
 static void tcg_out_shli(TCGContext *s, int ret, int arg, int count)
 {
     count &= 31;
-    tcg_out32(s, INSN_ZDEP | INSN_R2(ret) | INSN_R1(arg)
-              | INSN_SHDEP_CP(31 - count) | INSN_DEP_LEN(32 - count));
+    tcg_out_zdep(s, ret, arg, count, 32 - count);
 }
 
 static void tcg_out_shl(TCGContext *s, int ret, int arg, int creg)
@@ -1407,6 +1420,38 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         break;
 
+    case INDEX_op_deposit_i32:
+        {
+            unsigned ofs = args[3], len = args[4];
+            int arg2 = args[2];
+            int arg1 = args[1];
+            int arg0 = args[0];
+
+            if (const_args[1]) {
+                if (const_args[2]) {
+                    tcg_out_movi(s, TCG_TYPE_I32, arg0,
+                                 (arg2 & ((1u << len) - 1)) << ofs);
+                } else {
+                    tcg_out_zdep(s, arg0, arg2, ofs, len);
+                }
+            } else {
+                if (const_args[2]) {
+                    tcg_out_mov(s, TCG_TYPE_I32, arg0, arg1);
+                    tcg_out_depi(s, arg0, arg2, ofs, len);
+                } else {
+                    if (arg0 != arg1) {
+                        if (arg0 == arg2) {
+                            tcg_out_mov(s, TCG_TYPE_I32, TCG_REG_R20, arg2);
+                            arg2 = TCG_REG_R20;
+                        }
+                        tcg_out_mov(s, TCG_TYPE_I32, arg0, arg1);
+                    }
+                    tcg_out_dep(s, arg0, arg2, ofs, len);
+                }
+            }
+        }
+        break;
+
     case INDEX_op_mul_i32:
         tcg_out_xmpyu(s, args[0], TCG_REG_R0, args[1], args[2]);
         break;
@@ -1534,6 +1579,7 @@ static const TCGTargetOpDef hppa_op_defs[] = {
     { INDEX_op_sar_i32, { "r", "r", "ri" } },
     { INDEX_op_rotl_i32, { "r", "r", "ri" } },
     { INDEX_op_rotr_i32, { "r", "r", "ri" } },
+    { INDEX_op_deposit_i32, { "r", "rZ", "rJ" } },
 
     { INDEX_op_bswap16_i32, { "r", "r" } },
     { INDEX_op_bswap32_i32, { "r", "r" } },
diff --git a/tcg/hppa/tcg-target.h b/tcg/hppa/tcg-target.h
index a5cc440..d3fe075 100644
--- a/tcg/hppa/tcg-target.h
+++ b/tcg/hppa/tcg-target.h
@@ -87,6 +87,7 @@ enum {
 /* optional instructions */
 // #define TCG_TARGET_HAS_div_i32
 #define TCG_TARGET_HAS_rot_i32
+#define TCG_TARGET_HAS_deposit_i32
 #define TCG_TARGET_HAS_ext8s_i32
 #define TCG_TARGET_HAS_ext16s_i32
 #define TCG_TARGET_HAS_bswap16_i32
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH 4/7] tcg-ia64: Implement deposit operation.
  2011-01-11  3:23 [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Richard Henderson
                   ` (2 preceding siblings ...)
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 3/7] tcg-hppa: " Richard Henderson
@ 2011-01-11  3:23 ` Richard Henderson
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 5/7] tcg-i386: " Richard Henderson
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2011-01-11  3:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ia64/tcg-target.c |  115 +++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/ia64/tcg-target.h |    2 +
 2 files changed, 117 insertions(+), 0 deletions(-)

diff --git a/tcg/ia64/tcg-target.c b/tcg/ia64/tcg-target.c
index e2e44f7..5d19a61 100644
--- a/tcg/ia64/tcg-target.c
+++ b/tcg/ia64/tcg-target.c
@@ -237,6 +237,8 @@ enum {
     OPC_CMP4_LT_A6            = 0x18400000000ull,
     OPC_CMP4_LTU_A6           = 0x1a400000000ull,
     OPC_CMP4_EQ_A6            = 0x1c400000000ull,
+    OPC_DEP_I14               = 0x0ae00000000ull,
+    OPC_DEP_I15               = 0x08000000000ull,
     OPC_DEP_Z_I12             = 0x0a600000000ull,
     OPC_EXTR_I11              = 0x0a400002000ull,
     OPC_EXTR_U_I11            = 0x0a400000000ull,
@@ -508,6 +510,32 @@ static inline uint64_t tcg_opc_i12(int qp, uint64_t opc, int r1,
            | (qp & 0x3f);
 }
 
+static inline uint64_t tcg_opc_i14(int qp, uint64_t opc, int r1, int imm1,
+                                   int r3, uint64_t len,
+                                   uint64_t cpos)
+{
+    return opc
+           | ((imm1 & 1LL) << 36)
+           | ((len & 0x0f) << 27)
+           | ((r3 & 0x7f) << 20)
+           | ((cpos & 0x3f) << 14)
+           | ((r1 & 0x7f) << 6)
+           | (qp & 0x3f);
+}
+
+static inline uint64_t tcg_opc_i15(int qp, uint64_t opc, int r1,
+                                   int r2, int r3, uint64_t len,
+                                   uint64_t cpos)
+{
+    return opc
+           | ((cpos & 0x3f) << 31)
+           | ((len & 0x0f) << 27)
+           | ((r3 & 0x7f) << 20)
+           | ((r2 & 0x7f) << 13)
+           | ((r1 & 0x7f) << 6)
+           | (qp & 0x3f);
+}
+
 static inline uint64_t tcg_opc_i18(int qp, uint64_t opc, uint64_t imm)
 {
     return opc
@@ -1335,6 +1363,84 @@ static inline void tcg_out_bswap64(TCGContext *s, TCGArg ret, TCGArg arg)
                    tcg_opc_i3 (TCG_REG_P0, OPC_MUX1_I3, ret, arg, 0xb));
 }
 
+static void tcg_out_deposit_i32(TCGContext *s, TCGArg out, TCGArg in,
+                                TCGArg val, unsigned ofs, unsigned len)
+{
+    uint64_t nop_m = tcg_opc_m48(TCG_REG_P0, OPC_NOP_M48, 0);
+
+    if (in == 0) {
+        tcg_out_bundle(s, mmI, nop_m, nop_m,
+                       tcg_opc_i12(TCG_REG_P0, OPC_DEP_Z_I12, out, val,
+                                   len - 1, 63 - ofs));
+    } else if (val == 0) {
+        tcg_out_bundle(s, mmI, nop_m, nop_m,
+                       tcg_opc_i14(TCG_REG_P0, OPC_DEP_I14, out, val,
+                                   in, len - 1, 63 - ofs));
+    } else if (len <= 16) {
+        tcg_out_bundle(s, mmI, nop_m, nop_m,
+                       tcg_opc_i15(TCG_REG_P0, OPC_DEP_I15, out, val, in,
+                                   len - 1, 63 - ofs));
+    } else {
+        /* Perform the 17- to 32-bit deposit in two parts.  At the same
+           time we perform the first deposit, extract the high 16-bits
+           into a scratch register.  */
+        tcg_out_bundle(s, miI, nop_m,
+                       tcg_opc_i11(TCG_REG_P0, OPC_EXTR_U_I11, TCG_REG_R2,
+                                   val, 16, 31 - 16),
+                       tcg_opc_i15(TCG_REG_P0, OPC_DEP_I15, out, val, in,
+                                   16 - 1, 63 - ofs));
+        tcg_out_bundle(s, mmI, nop_m, nop_m,
+                       tcg_opc_i15(TCG_REG_P0, OPC_DEP_I15, out, TCG_REG_R2,
+                                   out, len - 16 - 1, 63 - (ofs + 16)));
+    }
+}
+
+static void tcg_out_deposit_i64(TCGContext *s, TCGArg out, TCGArg in,
+                                TCGArg val, unsigned ofs, unsigned len)
+{
+    uint64_t nop_m = tcg_opc_m48(TCG_REG_P0, OPC_NOP_M48, 0);
+
+    if (in == 0) {
+        tcg_out_bundle(s, mmI, nop_m, nop_m,
+                       tcg_opc_i12(TCG_REG_P0, OPC_DEP_Z_I12, out, in,
+                                   len - 1, 63 - ofs));
+    } else if (val == 0) {
+        tcg_out_bundle(s, mmI, nop_m, nop_m,
+                       tcg_opc_i14(TCG_REG_P0, OPC_DEP_I14, out, val,
+                                   in, len - 1, 63 - ofs));
+    } else if (len <= 16) {
+        tcg_out_bundle(s, mmI, nop_m, nop_m,
+                       tcg_opc_i15(TCG_REG_P0, OPC_DEP_I15, out, val, in,
+                                   len - 1, 63 - ofs));
+    } else {
+        uint64_t ror = 0, shrp, rol = 0;
+
+        if (ofs) {
+	    ror = tcg_opc_i10(TCG_REG_P0, OPC_SHRP_I10, TCG_REG_R2,
+                              in, in, ofs);
+            in = TCG_REG_R2;
+        }
+
+        shrp = tcg_opc_i10(TCG_REG_P0, OPC_SHRP_I10, out, in, val, len);
+
+        ofs = (ofs - len) & 63;
+        if (ofs) {
+            rol = tcg_opc_i10(TCG_REG_P0, OPC_SHRP_I10, out, out, out, 64-ofs);
+        }
+
+        if (ror) {
+            tcg_out_bundle(s, mII, nop_m, ror, shrp);
+            if (rol) {
+                tcg_out_bundle(s, mmI, nop_m, nop_m, rol);
+            }
+        } else if (rol) {
+            tcg_out_bundle(s, mII, nop_m, shrp, rol);
+        } else {
+            tcg_out_bundle(s, mmI, nop_m, nop_m, shrp);
+        }
+    }
+}
+
 static inline uint64_t tcg_opc_cmp_a(int qp, TCGCond cond, TCGArg arg1,
                                      TCGArg arg2, int cmp4)
 {
@@ -2063,6 +2169,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_rotr_i64(s, args[0], args[1], args[2], const_args[2]);
         break;
 
+    case INDEX_op_deposit_i32:
+        tcg_out_deposit_i32(s, args[0], args[1], args[2], args[3], args[4]);
+        break;
+    case INDEX_op_deposit_i64:
+        tcg_out_deposit_i64(s, args[0], args[1], args[2], args[3], args[4]);
+        break;
+
     case INDEX_op_ext8s_i32:
     case INDEX_op_ext8s_i64:
         tcg_out_ext(s, OPC_SXT1_I29, args[0], args[1]);
@@ -2192,6 +2305,7 @@ static const TCGTargetOpDef ia64_op_defs[] = {
     { INDEX_op_shr_i32, { "r", "rZ", "ri" } },
     { INDEX_op_rotl_i32, { "r", "rZ", "ri" } },
     { INDEX_op_rotr_i32, { "r", "rZ", "ri" } },
+    { INDEX_op_deposit_i32, { "r", "rZ", "rZ" } },
 
     { INDEX_op_ext8s_i32, { "r", "rZ"} },
     { INDEX_op_ext8u_i32, { "r", "rZ"} },
@@ -2238,6 +2352,7 @@ static const TCGTargetOpDef ia64_op_defs[] = {
     { INDEX_op_shr_i64, { "r", "rZ", "ri" } },
     { INDEX_op_rotl_i64, { "r", "rZ", "ri" } },
     { INDEX_op_rotr_i64, { "r", "rZ", "ri" } },
+    { INDEX_op_deposit_i64, { "r", "rZ", "rZ" } },
 
     { INDEX_op_ext8s_i64, { "r", "rZ"} },
     { INDEX_op_ext8u_i64, { "r", "rZ"} },
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index e56e88f..80e3534 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -131,6 +131,8 @@ enum {
 #define TCG_TARGET_HAS_orc_i64
 #define TCG_TARGET_HAS_rot_i32
 #define TCG_TARGET_HAS_rot_i64
+#define TCG_TARGET_HAS_deposit_i32
+#define TCG_TARGET_HAS_deposit_i64
 
 /* optional instructions automatically implemented */
 #undef TCG_TARGET_HAS_neg_i32   /* sub r1, r0, r3 */
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-11  3:23 [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Richard Henderson
                   ` (3 preceding siblings ...)
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 4/7] tcg-ia64: " Richard Henderson
@ 2011-01-11  3:23 ` Richard Henderson
  2011-01-25 12:27   ` Edgar E. Iglesias
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 6/7] target-i386: Use " Richard Henderson
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2011-01-11  3:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, aurelien

Special case deposits that are implementable with byte and word stores.
Otherwise implement with double-word shift plus rotates.

Expose tcg_scratch_alloc to the backend for allocation of scratch registers.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++--
 tcg/i386/tcg-target.h |    2 +
 tcg/tcg.c             |   11 ++++++++
 3 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index bb19a95..ba1c45a 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -258,7 +258,8 @@ static inline int tcg_target_const_match(tcg_target_long val,
 #define OPC_JMP_long	(0xe9)
 #define OPC_JMP_short	(0xeb)
 #define OPC_LEA         (0x8d)
-#define OPC_MOVB_EvGv	(0x88)		/* stores, more or less */
+#define OPC_MOVB_EbGb	(0x88)		/* stores, more or less */
+#define OPC_MOVB_GbEb   (0x8a)		/* loads, more or less */
 #define OPC_MOVL_EvGv	(0x89)		/* stores, more or less */
 #define OPC_MOVL_GvEv	(0x8b)		/* loads, more or less */
 #define OPC_MOVL_EvIz	(0xc7)
@@ -277,6 +278,7 @@ static inline int tcg_target_const_match(tcg_target_long val,
 #define OPC_SHIFT_1	(0xd1)
 #define OPC_SHIFT_Ib	(0xc1)
 #define OPC_SHIFT_cl	(0xd3)
+#define OPC_SHRD_Ib	(0xac | P_EXT)
 #define OPC_TESTL	(0x85)
 #define OPC_XCHG_ax_r32	(0x90)
 
@@ -710,6 +712,59 @@ static void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
     }
 }
 
+static void tcg_out_deposit(TCGContext *s, int inout, int val,
+                            unsigned ofs, unsigned len, int rexw)
+{
+    /* Look for MOVW special case.  */
+    if (ofs == 0 && len == 16) {
+        tcg_out_modrm(s, OPC_MOVL_GvEv + P_DATA16, inout, val);
+        return;
+    }
+
+    /* Look for MOVB w/ %reg_l special case.  */
+    if (ofs == 0 && len == 8
+        && (TCG_TARGET_REG_BITS == 64 || (inout < 4 && val < 4))) {
+        tcg_out_modrm(s, OPC_MOVB_GbEb + P_REXB_R + P_REXB_RM, inout, val);
+        return;
+    }
+
+    /* Look for MOVB w/ %reg_h special case.  */
+    if (ofs == 8 && len == 8 && inout < 4 && val < 4) {
+        tcg_out_modrm(s, OPC_MOVB_GbEb, inout + 4, val);
+        return;
+    }
+
+    /* If we have a real deposit from self, we need a temporary.  */
+    if (inout == val) {
+        TCGType type = rexw ? TCG_TYPE_I64 : TCG_TYPE_I32;
+        TCGRegSet live;
+
+        tcg_regset_clear(live);
+        tcg_regset_set_reg(live, inout);
+        val = tcg_scratch_alloc(s, type, live);
+
+        tcg_out_mov(s, type, val, inout);
+    }
+
+    /* Arrange for the field to be at offset 0.  */
+    if (ofs != 0) {
+        tcg_out_shifti(s, SHIFT_ROR + rexw, inout, ofs);
+    }
+
+    /* Shift the value into the top of the word.  This shifts the old
+       field out of the bottom of the word and leaves us with the whole
+       word rotated right by the size of the field.  */
+    tcg_out_modrm(s, OPC_SHRD_Ib + rexw, val, inout);
+    tcg_out8(s, len);
+
+    /* Restore the field to its proper location.  */
+    ofs = (len + ofs) & (rexw ? 63 : 31);
+    if (ofs != 0) {
+        tcg_out_shifti(s, SHIFT_ROL + rexw, inout, ofs);
+    }
+}
+
+
 /* Use SMALL != 0 to force a short forward branch.  */
 static void tcg_out_jxx(TCGContext *s, int opc, int label_index, int small)
 {
@@ -1266,7 +1321,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int datalo, int datahi,
 
     switch (sizeop) {
     case 0:
-        tcg_out_modrm_offset(s, OPC_MOVB_EvGv + P_REXB_R, datalo, base, ofs);
+        tcg_out_modrm_offset(s, OPC_MOVB_EbGb + P_REXB_R, datalo, base, ofs);
         break;
     case 1:
         if (bswap) {
@@ -1504,7 +1559,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     OP_32_64(st8):
-        tcg_out_modrm_offset(s, OPC_MOVB_EvGv | P_REXB_R,
+        tcg_out_modrm_offset(s, OPC_MOVB_EbGb | P_REXB_R,
                              args[0], args[1], args[2]);
         break;
     OP_32_64(st16):
@@ -1603,6 +1658,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    OP_32_64(deposit):
+        tcg_out_deposit(s, args[0], args[2], args[3], args[4], rexw);
+        break;
+
     case INDEX_op_brcond_i32:
         tcg_out_brcond32(s, args[2], args[0], args[1], const_args[1],
                          args[3], 0);
@@ -1783,6 +1842,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_sar_i32, { "r", "0", "ci" } },
     { INDEX_op_rotl_i32, { "r", "0", "ci" } },
     { INDEX_op_rotr_i32, { "r", "0", "ci" } },
+    { INDEX_op_deposit_i32, { "r", "0", "r" } },
 
     { INDEX_op_brcond_i32, { "r", "ri" } },
 
@@ -1835,6 +1895,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_sar_i64, { "r", "0", "ci" } },
     { INDEX_op_rotl_i64, { "r", "0", "ci" } },
     { INDEX_op_rotr_i64, { "r", "0", "ci" } },
+    { INDEX_op_deposit_i64, { "r", "0", "r" } },
 
     { INDEX_op_brcond_i64, { "r", "re" } },
     { INDEX_op_setcond_i64, { "r", "r", "re" } },
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index bfafbfc..9f90d17 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -77,6 +77,7 @@ enum {
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32
 #define TCG_TARGET_HAS_rot_i32
+#define TCG_TARGET_HAS_deposit_i32
 #define TCG_TARGET_HAS_ext8s_i32
 #define TCG_TARGET_HAS_ext16s_i32
 #define TCG_TARGET_HAS_ext8u_i32
@@ -94,6 +95,7 @@ enum {
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_div2_i64
 #define TCG_TARGET_HAS_rot_i64
+#define TCG_TARGET_HAS_deposit_i64
 #define TCG_TARGET_HAS_ext8s_i64
 #define TCG_TARGET_HAS_ext16s_i64
 #define TCG_TARGET_HAS_ext32s_i64
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5dd6a2c..d190a20 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -156,6 +156,17 @@ int gen_new_label(void)
     return idx;
 }
 
+static int tcg_reg_alloc(TCGContext *s, TCGRegSet reg1, TCGRegSet reg2);
+
+/* For use by backends, allocate a scratch register of TYPE, avoiding
+   any that are present in live.  */
+
+static inline int tcg_scratch_alloc(TCGContext *s, TCGType type, TCGRegSet live)
+{
+    tcg_regset_or (live, s->reserved_regs, live);
+    return tcg_reg_alloc(s, tcg_target_available_regs[type], live);
+}
+
 #include "tcg-target.c"
 
 /* pool based memory allocation */
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH 6/7] target-i386: Use deposit operation.
  2011-01-11  3:23 [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Richard Henderson
                   ` (4 preceding siblings ...)
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 5/7] tcg-i386: " Richard Henderson
@ 2011-01-11  3:23 ` Richard Henderson
  2011-01-12 11:01   ` Edgar E. Iglesias
  2011-01-20 11:06   ` Aurelien Jarno
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 7/7] target-ppc: " Richard Henderson
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: Richard Henderson @ 2011-01-11  3:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, aurelien

Use this for assignment to the low byte or low word of a register.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/translate.c |   34 ++++++----------------------------
 1 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 7b6e3c2..c008450 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -274,28 +274,16 @@ static inline void gen_op_andl_A0_ffff(void)
 
 static inline void gen_op_mov_reg_v(int ot, int reg, TCGv t0)
 {
-    TCGv tmp;
-
     switch(ot) {
     case OT_BYTE:
-        tmp = tcg_temp_new();
-        tcg_gen_ext8u_tl(tmp, t0);
         if (reg < 4 X86_64_DEF( || reg >= 8 || x86_64_hregs)) {
-            tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xff);
-            tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], tmp);
+            tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], t0, 0, 8);
         } else {
-            tcg_gen_shli_tl(tmp, tmp, 8);
-            tcg_gen_andi_tl(cpu_regs[reg - 4], cpu_regs[reg - 4], ~0xff00);
-            tcg_gen_or_tl(cpu_regs[reg - 4], cpu_regs[reg - 4], tmp);
+            tcg_gen_deposit_tl(cpu_regs[reg - 4], cpu_regs[reg - 4], t0, 8, 8);
         }
-        tcg_temp_free(tmp);
         break;
     case OT_WORD:
-        tmp = tcg_temp_new();
-        tcg_gen_ext16u_tl(tmp, t0);
-        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
-        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], tmp);
-        tcg_temp_free(tmp);
+        tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], t0, 0, 16);
         break;
     default: /* XXX this shouldn't be reached;  abort? */
     case OT_LONG:
@@ -323,15 +311,9 @@ static inline void gen_op_mov_reg_T1(int ot, int reg)
 
 static inline void gen_op_mov_reg_A0(int size, int reg)
 {
-    TCGv tmp;
-
     switch(size) {
     case 0:
-        tmp = tcg_temp_new();
-        tcg_gen_ext16u_tl(tmp, cpu_A0);
-        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
-        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], tmp);
-        tcg_temp_free(tmp);
+        tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], cpu_A0, 0, 16);
         break;
     default: /* XXX this shouldn't be reached;  abort? */
     case 1:
@@ -415,9 +397,7 @@ static inline void gen_op_add_reg_im(int size, int reg, int32_t val)
     switch(size) {
     case 0:
         tcg_gen_addi_tl(cpu_tmp0, cpu_regs[reg], val);
-        tcg_gen_ext16u_tl(cpu_tmp0, cpu_tmp0);
-        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
-        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0);
+        tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0, 0, 16);
         break;
     case 1:
         tcg_gen_addi_tl(cpu_tmp0, cpu_regs[reg], val);
@@ -439,9 +419,7 @@ static inline void gen_op_add_reg_T0(int size, int reg)
     switch(size) {
     case 0:
         tcg_gen_add_tl(cpu_tmp0, cpu_regs[reg], cpu_T[0]);
-        tcg_gen_ext16u_tl(cpu_tmp0, cpu_tmp0);
-        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
-        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0);
+        tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0, 0, 16);
         break;
     case 1:
         tcg_gen_add_tl(cpu_tmp0, cpu_regs[reg], cpu_T[0]);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH 7/7] target-ppc: Use deposit operation.
  2011-01-11  3:23 [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Richard Henderson
                   ` (5 preceding siblings ...)
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 6/7] target-i386: Use " Richard Henderson
@ 2011-01-11  3:23 ` Richard Henderson
  2011-01-11 23:45 ` [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Aurelien Jarno
  2011-01-20 11:31 ` Edgar E. Iglesias
  8 siblings, 0 replies; 40+ messages in thread
From: Richard Henderson @ 2011-01-11  3:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, aurelien

Use this in implementing rl[wd]imi, at least for the cases
that don't require true rotation.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/translate.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 74e06d7..f45c0ec 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1516,6 +1516,11 @@ static void gen_rlwimi(DisasContext *ctx)
     sh = SH(ctx->opcode);
     if (likely(sh == 0 && mb == 0 && me == 31)) {
         tcg_gen_ext32u_tl(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+    } else if ((31 - me) == sh && mb <= me) {
+        /* This is a well-behaved bitfield deposit.  */
+        tcg_gen_deposit_tl (cpu_gpr[rA(ctx->opcode)],
+                            cpu_gpr[rA(ctx->opcode)],
+                            cpu_gpr[rS(ctx->opcode)], sh, me - mb + 1);
     } else {
         target_ulong mask;
         TCGv t1;
@@ -1761,6 +1766,11 @@ static inline void gen_rldimi(DisasContext *ctx, int mbn, int shn)
     me = 63 - sh;
     if (unlikely(sh == 0 && mb == 0)) {
         tcg_gen_mov_tl(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+    } else if (mb <= me) {
+        /* This is a well-behaved bitfield deposit.  */
+        tcg_gen_deposit_tl (cpu_gpr[rA(ctx->opcode)],
+                            cpu_gpr[rA(ctx->opcode)],
+                            cpu_gpr[rS(ctx->opcode)], sh, me - mb + 1);
     } else {
         TCGv t0, t1;
         target_ulong mask;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] Re: [PATCH 2/7] tcg-ppc: Implement deposit operation.
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 2/7] tcg-ppc: Implement deposit operation Richard Henderson
@ 2011-01-11 12:40   ` malc
  2011-01-11 16:32     ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: malc @ 2011-01-11 12:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien, agraf

On Mon, 10 Jan 2011, Richard Henderson wrote:

> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/ppc/tcg-target.c |   17 ++++++++++++++++-
>  tcg/ppc/tcg-target.h |    1 +
>  2 files changed, 17 insertions(+), 1 deletions(-)
> 
> diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
> index 7970268..39aa4f1 100644
> --- a/tcg/ppc/tcg-target.c
> +++ b/tcg/ppc/tcg-target.c
> @@ -1611,6 +1611,21 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>          }
>          break;
>  
> +    case INDEX_op_deposit_i32:
> +        {
> +            unsigned len = args[4];
> +            unsigned lsb_ofs = args[3];
> +            unsigned msb_ofs = 31 - lsb_ofs;
> +
> +            tcg_out32 (s, RLWIMI
> +                       | RA(args[0])
> +                       | RS(args[2])
> +                       | SH(lsb_ofs)
> +                       | MB(msb_ofs - len + 1)
> +                       | ME(msb_ofs));
> +        }
> +        break;
> +
>      case INDEX_op_add2_i32:
>          if (args[0] == args[3] || args[0] == args[5]) {
>              tcg_out32 (s, ADDC | TAB (0, args[2], args[4]));
> @@ -1829,9 +1844,9 @@ static const TCGTargetOpDef ppc_op_defs[] = {
>      { INDEX_op_shl_i32, { "r", "r", "ri" } },
>      { INDEX_op_shr_i32, { "r", "r", "ri" } },
>      { INDEX_op_sar_i32, { "r", "r", "ri" } },
> -
>      { INDEX_op_rotl_i32, { "r", "r", "ri" } },
>      { INDEX_op_rotr_i32, { "r", "r", "ri" } },
> +    { INDEX_op_deposit_i32, { "r", "0", "r" } },
>  

Are you sure about "0" constraint?

>      { INDEX_op_brcond_i32, { "r", "ri" } },
>  
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index a1f8599..bbf38d5 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -92,6 +92,7 @@ enum {
>  #define TCG_TARGET_HAS_eqv_i32
>  #define TCG_TARGET_HAS_nand_i32
>  #define TCG_TARGET_HAS_nor_i32
> +#define TCG_TARGET_HAS_deposit_i32
>  
>  #define TCG_AREG0 TCG_REG_R27
>  
> 

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Qemu-devel] Re: [PATCH 2/7] tcg-ppc: Implement deposit operation.
  2011-01-11 12:40   ` [Qemu-devel] " malc
@ 2011-01-11 16:32     ` Richard Henderson
  2011-01-11 19:11       ` malc
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2011-01-11 16:32 UTC (permalink / raw)
  To: malc; +Cc: qemu-devel, aurelien, agraf

On 01/11/2011 04:40 AM, malc wrote:
>> +            tcg_out32 (s, RLWIMI
>> +                       | RA(args[0])
>> +                       | RS(args[2])
>> +                       | SH(lsb_ofs)
>> +                       | MB(msb_ofs - len + 1)
>> +                       | ME(msb_ofs));
>> +        }
...
>> +    { INDEX_op_deposit_i32, { "r", "0", "r" } },
>>  
> 
> Are you sure about "0" constraint?

Yes, rlwimi inserts a value into the destination register.
There is no second source register for the instruction.


r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Qemu-devel] Re: [PATCH 2/7] tcg-ppc: Implement deposit operation.
  2011-01-11 16:32     ` Richard Henderson
@ 2011-01-11 19:11       ` malc
  0 siblings, 0 replies; 40+ messages in thread
From: malc @ 2011-01-11 19:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien, agraf

On Tue, 11 Jan 2011, Richard Henderson wrote:

> On 01/11/2011 04:40 AM, malc wrote:
> >> +            tcg_out32 (s, RLWIMI
> >> +                       | RA(args[0])
> >> +                       | RS(args[2])
> >> +                       | SH(lsb_ofs)
> >> +                       | MB(msb_ofs - len + 1)
> >> +                       | ME(msb_ofs));
> >> +        }
> ...
> >> +    { INDEX_op_deposit_i32, { "r", "0", "r" } },
> >>  
> > 
> > Are you sure about "0" constraint?
> 
> Yes, rlwimi inserts a value into the destination register.
> There is no second source register for the instruction.
> 

Indeed, nothing else uses "0" in tcg/ppc so it caught my attention.

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2
  2011-01-11  3:23 [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Richard Henderson
                   ` (6 preceding siblings ...)
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 7/7] target-ppc: " Richard Henderson
@ 2011-01-11 23:45 ` Aurelien Jarno
  2011-01-20 11:31 ` Edgar E. Iglesias
  8 siblings, 0 replies; 40+ messages in thread
From: Aurelien Jarno @ 2011-01-11 23:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, agraf

On Mon, Jan 10, 2011 at 07:23:41PM -0800, Richard Henderson wrote:
> Changes since v1:
>   * No attempt to pack pos+len into one operand.  Updated backends
>     to match this change.
> 
>   * Example in the README is a bit more complex.
> 
>   * Define an official tcg_scratch_alloc routine, used by the i386
>     target for the case in which we need a scratch register.  I had
>     said that triggering this wasn't possible with mainline, but I
>     was wrong.  The rlwimi example I posted in the other thread hits
>     this case.
> 
> Aurelien suggested using a shorter name like "dep", but I think that
> would be more confusing than not.  This opcode is not really used
> enough to warrent cropping 4 characters, in my opinion.
> 

It was just a suggestion, I am also fine with "deposit".

The implementation of the deposit op (that is the first patch) looks 
fine to me, and I am convince this new op is useful. It's probably going
to be useful for the s390 target, that tried to solve the same problem 
with a sync op.

To go further, I think we should split the patch series into two (no
need to resend it), the op itself, and the implementation/usage in
various host/targets. The latter can go part by part, so the various
persons can review them separately. The whole series is still useful 
anyway to see the potential of such a new op.

Even if the code for the new op is quite simple, adding a new op has
some consequences (the main one being is that it's difficult to remove 
it latter), so I propose that it is reviewed/acked by a few persons
before being committed. People, please do that or send your comments.

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] tcg: Define "deposit" as an optional operation.
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 1/7] tcg: Define "deposit" as an optional operation Richard Henderson
@ 2011-01-11 23:45   ` Aurelien Jarno
  2011-01-12 11:00   ` Edgar E. Iglesias
  1 sibling, 0 replies; 40+ messages in thread
From: Aurelien Jarno @ 2011-01-11 23:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, agraf

On Mon, Jan 10, 2011 at 07:23:42PM -0800, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/README    |   14 ++++++++++++
>  tcg/tcg-op.h  |   64 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tcg/tcg-opc.h |    6 +++++
>  3 files changed, 84 insertions(+), 0 deletions(-)

Acked-by: Aurelien Jarno <aurelien@aurel32.net>

> diff --git a/tcg/README b/tcg/README
> index a18a87f..a50ecc6 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -285,6 +285,20 @@ the four high order bytes are set to zero.
>  Indicate that the value of t0 won't be used later. It is useful to
>  force dead code elimination.
>  
> +* deposit_i32/i64 dest, t1, t2, pos, loc
> +
> +Deposit T2 as a bitfield into T1, placing the result in DEST.
> +The bitfield is described by POS/LOC, which are immediate values:
> +
> +  LEN - the length of the bitfield
> +  POS - the position of the first bit, counting from the LSB
> +
> +For example, pos=8, len=4 indicates a 4-bit field at bit 8.
> +This operation would be equivalent to
> +
> +  dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
> +
> +
>  ********* Conditional moves
>  
>  * setcond_i32/i64 cond, dest, t1, t2
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index 3ee0a58..207a89f 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -254,6 +254,30 @@ static inline void tcg_gen_op5i_i64(TCGOpcode opc, TCGv_i64 arg1, TCGv_i64 arg2,
>      *gen_opparam_ptr++ = arg5;
>  }
>  
> +static inline void tcg_gen_op5ii_i32(TCGOpcode opc, TCGv_i32 arg1,
> +                                     TCGv_i32 arg2, TCGv_i32 arg3,
> +                                     TCGArg arg4, TCGArg arg5)
> +{
> +    *gen_opc_ptr++ = opc;
> +    *gen_opparam_ptr++ = GET_TCGV_I32(arg1);
> +    *gen_opparam_ptr++ = GET_TCGV_I32(arg2);
> +    *gen_opparam_ptr++ = GET_TCGV_I32(arg3);
> +    *gen_opparam_ptr++ = arg4;
> +    *gen_opparam_ptr++ = arg5;
> +}
> +
> +static inline void tcg_gen_op5ii_i64(TCGOpcode opc, TCGv_i64 arg1,
> +                                     TCGv_i64 arg2, TCGv_i64 arg3,
> +                                     TCGArg arg4, TCGArg arg5)
> +{
> +    *gen_opc_ptr++ = opc;
> +    *gen_opparam_ptr++ = GET_TCGV_I64(arg1);
> +    *gen_opparam_ptr++ = GET_TCGV_I64(arg2);
> +    *gen_opparam_ptr++ = GET_TCGV_I64(arg3);
> +    *gen_opparam_ptr++ = arg4;
> +    *gen_opparam_ptr++ = arg5;
> +}
> +
>  static inline void tcg_gen_op6_i32(TCGOpcode opc, TCGv_i32 arg1, TCGv_i32 arg2,
>                                     TCGv_i32 arg3, TCGv_i32 arg4, TCGv_i32 arg5,
>                                     TCGv_i32 arg6)
> @@ -2071,6 +2095,44 @@ static inline void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      }
>  }
>  
> +static inline void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1,
> +				       TCGv_i32 arg2, unsigned int ofs,
> +				       unsigned int len)
> +{
> +#ifdef TCG_TARGET_HAS_deposit_i32
> +  tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, arg1, arg2, ofs, len);
> +#else
> +  uint32_t mask = (1u << len) - 1;
> +  TCGv_i32 t1 = tcg_temp_new_i32 ();
> +
> +  tcg_gen_andi_i32(t1, arg2, mask);
> +  tcg_gen_shli_i32(t1, t1, ofs);
> +  tcg_gen_andi_i32(ret, arg1, ~(mask << ofs));
> +  tcg_gen_or_i32(ret, ret, t1);
> +
> +  tcg_temp_free_i32(t1);
> +#endif
> +}
> +
> +static inline void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1,
> +				       TCGv_i64 arg2, unsigned int ofs,
> +				       unsigned int len)
> +{
> +#ifdef TCG_TARGET_HAS_deposit_i64
> +  tcg_gen_op5ii_i64(INDEX_op_deposit_i64, ret, arg1, arg2, ofs, len);
> +#else
> +  uint64_t mask = (1ull << len) - 1;
> +  TCGv_i64 t1 = tcg_temp_new_i64 ();
> +
> +  tcg_gen_andi_i64(t1, arg2, mask);
> +  tcg_gen_shli_i64(t1, t1, ofs);
> +  tcg_gen_andi_i64(ret, arg1, ~(mask << ofs));
> +  tcg_gen_or_i64(ret, ret, t1);
> +
> +  tcg_temp_free_i64(t1);
> +#endif
> +}
> +
>  /***************************************/
>  /* QEMU specific operations. Their type depend on the QEMU CPU
>     type. */
> @@ -2384,6 +2446,7 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
>  #define tcg_gen_rotli_tl tcg_gen_rotli_i64
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i64
>  #define tcg_gen_rotri_tl tcg_gen_rotri_i64
> +#define tcg_gen_deposit_tl tcg_gen_deposit_i64
>  #define tcg_const_tl tcg_const_i64
>  #define tcg_const_local_tl tcg_const_local_i64
>  #else
> @@ -2454,6 +2517,7 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
>  #define tcg_gen_rotli_tl tcg_gen_rotli_i32
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i32
>  #define tcg_gen_rotri_tl tcg_gen_rotri_i32
> +#define tcg_gen_deposit_tl tcg_gen_deposit_i32
>  #define tcg_const_tl tcg_const_i32
>  #define tcg_const_local_tl tcg_const_local_i32
>  #endif
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index 2a98fed..2c7ca1a 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -78,6 +78,9 @@ DEF(sar_i32, 1, 2, 0, 0)
>  DEF(rotl_i32, 1, 2, 0, 0)
>  DEF(rotr_i32, 1, 2, 0, 0)
>  #endif
> +#ifdef TCG_TARGET_HAS_deposit_i32
> +DEF(deposit_i32, 1, 2, 2, 0)
> +#endif
>  
>  DEF(brcond_i32, 0, 2, 2, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
>  #if TCG_TARGET_REG_BITS == 32
> @@ -168,6 +171,9 @@ DEF(sar_i64, 1, 2, 0, 0)
>  DEF(rotl_i64, 1, 2, 0, 0)
>  DEF(rotr_i64, 1, 2, 0, 0)
>  #endif
> +#ifdef TCG_TARGET_HAS_deposit_i64
> +DEF(deposit_i64, 1, 2, 2, 0)
> +#endif
>  
>  DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
>  #ifdef TCG_TARGET_HAS_ext8s_i64
> -- 
> 1.7.2.3
> 
> 
> 

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] tcg: Define "deposit" as an optional operation.
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 1/7] tcg: Define "deposit" as an optional operation Richard Henderson
  2011-01-11 23:45   ` Aurelien Jarno
@ 2011-01-12 11:00   ` Edgar E. Iglesias
  1 sibling, 0 replies; 40+ messages in thread
From: Edgar E. Iglesias @ 2011-01-12 11:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien, agraf

On Mon, Jan 10, 2011 at 07:23:42PM -0800, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/README    |   14 ++++++++++++
>  tcg/tcg-op.h  |   64 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tcg/tcg-opc.h |    6 +++++
>  3 files changed, 84 insertions(+), 0 deletions(-)
> 
> diff --git a/tcg/README b/tcg/README
> index a18a87f..a50ecc6 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -285,6 +285,20 @@ the four high order bytes are set to zero.
>  Indicate that the value of t0 won't be used later. It is useful to
>  force dead code elimination.
>  
> +* deposit_i32/i64 dest, t1, t2, pos, loc
                                        ^^^
                                  Should maybe be len.

> +
> +Deposit T2 as a bitfield into T1, placing the result in DEST.
> +The bitfield is described by POS/LOC, which are immediate values:
                                    ^^^
                                    LEN


The rest looks good, thanks.

Acked-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>


> +
> +  LEN - the length of the bitfield
> +  POS - the position of the first bit, counting from the LSB
> +
> +For example, pos=8, len=4 indicates a 4-bit field at bit 8.
> +This operation would be equivalent to
> +
> +  dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
> +
> +
>  ********* Conditional moves
>  
>  * setcond_i32/i64 cond, dest, t1, t2
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index 3ee0a58..207a89f 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -254,6 +254,30 @@ static inline void tcg_gen_op5i_i64(TCGOpcode opc, TCGv_i64 arg1, TCGv_i64 arg2,
>      *gen_opparam_ptr++ = arg5;
>  }
>  
> +static inline void tcg_gen_op5ii_i32(TCGOpcode opc, TCGv_i32 arg1,
> +                                     TCGv_i32 arg2, TCGv_i32 arg3,
> +                                     TCGArg arg4, TCGArg arg5)
> +{
> +    *gen_opc_ptr++ = opc;
> +    *gen_opparam_ptr++ = GET_TCGV_I32(arg1);
> +    *gen_opparam_ptr++ = GET_TCGV_I32(arg2);
> +    *gen_opparam_ptr++ = GET_TCGV_I32(arg3);
> +    *gen_opparam_ptr++ = arg4;
> +    *gen_opparam_ptr++ = arg5;
> +}
> +
> +static inline void tcg_gen_op5ii_i64(TCGOpcode opc, TCGv_i64 arg1,
> +                                     TCGv_i64 arg2, TCGv_i64 arg3,
> +                                     TCGArg arg4, TCGArg arg5)
> +{
> +    *gen_opc_ptr++ = opc;
> +    *gen_opparam_ptr++ = GET_TCGV_I64(arg1);
> +    *gen_opparam_ptr++ = GET_TCGV_I64(arg2);
> +    *gen_opparam_ptr++ = GET_TCGV_I64(arg3);
> +    *gen_opparam_ptr++ = arg4;
> +    *gen_opparam_ptr++ = arg5;
> +}
> +
>  static inline void tcg_gen_op6_i32(TCGOpcode opc, TCGv_i32 arg1, TCGv_i32 arg2,
>                                     TCGv_i32 arg3, TCGv_i32 arg4, TCGv_i32 arg5,
>                                     TCGv_i32 arg6)
> @@ -2071,6 +2095,44 @@ static inline void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
>      }
>  }
>  
> +static inline void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1,
> +				       TCGv_i32 arg2, unsigned int ofs,
> +				       unsigned int len)
> +{
> +#ifdef TCG_TARGET_HAS_deposit_i32
> +  tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, arg1, arg2, ofs, len);
> +#else
> +  uint32_t mask = (1u << len) - 1;
> +  TCGv_i32 t1 = tcg_temp_new_i32 ();
> +
> +  tcg_gen_andi_i32(t1, arg2, mask);
> +  tcg_gen_shli_i32(t1, t1, ofs);
> +  tcg_gen_andi_i32(ret, arg1, ~(mask << ofs));
> +  tcg_gen_or_i32(ret, ret, t1);
> +
> +  tcg_temp_free_i32(t1);
> +#endif
> +}
> +
> +static inline void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1,
> +				       TCGv_i64 arg2, unsigned int ofs,
> +				       unsigned int len)
> +{
> +#ifdef TCG_TARGET_HAS_deposit_i64
> +  tcg_gen_op5ii_i64(INDEX_op_deposit_i64, ret, arg1, arg2, ofs, len);
> +#else
> +  uint64_t mask = (1ull << len) - 1;
> +  TCGv_i64 t1 = tcg_temp_new_i64 ();
> +
> +  tcg_gen_andi_i64(t1, arg2, mask);
> +  tcg_gen_shli_i64(t1, t1, ofs);
> +  tcg_gen_andi_i64(ret, arg1, ~(mask << ofs));
> +  tcg_gen_or_i64(ret, ret, t1);
> +
> +  tcg_temp_free_i64(t1);
> +#endif
> +}
> +
>  /***************************************/
>  /* QEMU specific operations. Their type depend on the QEMU CPU
>     type. */
> @@ -2384,6 +2446,7 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
>  #define tcg_gen_rotli_tl tcg_gen_rotli_i64
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i64
>  #define tcg_gen_rotri_tl tcg_gen_rotri_i64
> +#define tcg_gen_deposit_tl tcg_gen_deposit_i64
>  #define tcg_const_tl tcg_const_i64
>  #define tcg_const_local_tl tcg_const_local_i64
>  #else
> @@ -2454,6 +2517,7 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
>  #define tcg_gen_rotli_tl tcg_gen_rotli_i32
>  #define tcg_gen_rotr_tl tcg_gen_rotr_i32
>  #define tcg_gen_rotri_tl tcg_gen_rotri_i32
> +#define tcg_gen_deposit_tl tcg_gen_deposit_i32
>  #define tcg_const_tl tcg_const_i32
>  #define tcg_const_local_tl tcg_const_local_i32
>  #endif
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index 2a98fed..2c7ca1a 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -78,6 +78,9 @@ DEF(sar_i32, 1, 2, 0, 0)
>  DEF(rotl_i32, 1, 2, 0, 0)
>  DEF(rotr_i32, 1, 2, 0, 0)
>  #endif
> +#ifdef TCG_TARGET_HAS_deposit_i32
> +DEF(deposit_i32, 1, 2, 2, 0)
> +#endif
>  
>  DEF(brcond_i32, 0, 2, 2, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
>  #if TCG_TARGET_REG_BITS == 32
> @@ -168,6 +171,9 @@ DEF(sar_i64, 1, 2, 0, 0)
>  DEF(rotl_i64, 1, 2, 0, 0)
>  DEF(rotr_i64, 1, 2, 0, 0)
>  #endif
> +#ifdef TCG_TARGET_HAS_deposit_i64
> +DEF(deposit_i64, 1, 2, 2, 0)
> +#endif
>  
>  DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
>  #ifdef TCG_TARGET_HAS_ext8s_i64
> -- 
> 1.7.2.3
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 6/7] target-i386: Use deposit operation.
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 6/7] target-i386: Use " Richard Henderson
@ 2011-01-12 11:01   ` Edgar E. Iglesias
  2011-01-20 11:06   ` Aurelien Jarno
  1 sibling, 0 replies; 40+ messages in thread
From: Edgar E. Iglesias @ 2011-01-12 11:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien, agraf

On Mon, Jan 10, 2011 at 07:23:47PM -0800, Richard Henderson wrote:
> Use this for assignment to the low byte or low word of a register.

Looks good.

Acked-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>


> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-i386/translate.c |   34 ++++++----------------------------
>  1 files changed, 6 insertions(+), 28 deletions(-)
> 
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index 7b6e3c2..c008450 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -274,28 +274,16 @@ static inline void gen_op_andl_A0_ffff(void)
>  
>  static inline void gen_op_mov_reg_v(int ot, int reg, TCGv t0)
>  {
> -    TCGv tmp;
> -
>      switch(ot) {
>      case OT_BYTE:
> -        tmp = tcg_temp_new();
> -        tcg_gen_ext8u_tl(tmp, t0);
>          if (reg < 4 X86_64_DEF( || reg >= 8 || x86_64_hregs)) {
> -            tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xff);
> -            tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], tmp);
> +            tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], t0, 0, 8);
>          } else {
> -            tcg_gen_shli_tl(tmp, tmp, 8);
> -            tcg_gen_andi_tl(cpu_regs[reg - 4], cpu_regs[reg - 4], ~0xff00);
> -            tcg_gen_or_tl(cpu_regs[reg - 4], cpu_regs[reg - 4], tmp);
> +            tcg_gen_deposit_tl(cpu_regs[reg - 4], cpu_regs[reg - 4], t0, 8, 8);
>          }
> -        tcg_temp_free(tmp);
>          break;
>      case OT_WORD:
> -        tmp = tcg_temp_new();
> -        tcg_gen_ext16u_tl(tmp, t0);
> -        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
> -        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], tmp);
> -        tcg_temp_free(tmp);
> +        tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], t0, 0, 16);
>          break;
>      default: /* XXX this shouldn't be reached;  abort? */
>      case OT_LONG:
> @@ -323,15 +311,9 @@ static inline void gen_op_mov_reg_T1(int ot, int reg)
>  
>  static inline void gen_op_mov_reg_A0(int size, int reg)
>  {
> -    TCGv tmp;
> -
>      switch(size) {
>      case 0:
> -        tmp = tcg_temp_new();
> -        tcg_gen_ext16u_tl(tmp, cpu_A0);
> -        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
> -        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], tmp);
> -        tcg_temp_free(tmp);
> +        tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], cpu_A0, 0, 16);
>          break;
>      default: /* XXX this shouldn't be reached;  abort? */
>      case 1:
> @@ -415,9 +397,7 @@ static inline void gen_op_add_reg_im(int size, int reg, int32_t val)
>      switch(size) {
>      case 0:
>          tcg_gen_addi_tl(cpu_tmp0, cpu_regs[reg], val);
> -        tcg_gen_ext16u_tl(cpu_tmp0, cpu_tmp0);
> -        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
> -        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0);
> +        tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0, 0, 16);
>          break;
>      case 1:
>          tcg_gen_addi_tl(cpu_tmp0, cpu_regs[reg], val);
> @@ -439,9 +419,7 @@ static inline void gen_op_add_reg_T0(int size, int reg)
>      switch(size) {
>      case 0:
>          tcg_gen_add_tl(cpu_tmp0, cpu_regs[reg], cpu_T[0]);
> -        tcg_gen_ext16u_tl(cpu_tmp0, cpu_tmp0);
> -        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
> -        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0);
> +        tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0, 0, 16);
>          break;
>      case 1:
>          tcg_gen_add_tl(cpu_tmp0, cpu_regs[reg], cpu_T[0]);
> -- 
> 1.7.2.3
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 6/7] target-i386: Use deposit operation.
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 6/7] target-i386: Use " Richard Henderson
  2011-01-12 11:01   ` Edgar E. Iglesias
@ 2011-01-20 11:06   ` Aurelien Jarno
  1 sibling, 0 replies; 40+ messages in thread
From: Aurelien Jarno @ 2011-01-20 11:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, agraf

On Mon, Jan 10, 2011 at 07:23:47PM -0800, Richard Henderson wrote:
> Use this for assignment to the low byte or low word of a register.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-i386/translate.c |   34 ++++++----------------------------
>  1 files changed, 6 insertions(+), 28 deletions(-)

Acked-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>

> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index 7b6e3c2..c008450 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -274,28 +274,16 @@ static inline void gen_op_andl_A0_ffff(void)
>  
>  static inline void gen_op_mov_reg_v(int ot, int reg, TCGv t0)
>  {
> -    TCGv tmp;
> -
>      switch(ot) {
>      case OT_BYTE:
> -        tmp = tcg_temp_new();
> -        tcg_gen_ext8u_tl(tmp, t0);
>          if (reg < 4 X86_64_DEF( || reg >= 8 || x86_64_hregs)) {
> -            tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xff);
> -            tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], tmp);
> +            tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], t0, 0, 8);
>          } else {
> -            tcg_gen_shli_tl(tmp, tmp, 8);
> -            tcg_gen_andi_tl(cpu_regs[reg - 4], cpu_regs[reg - 4], ~0xff00);
> -            tcg_gen_or_tl(cpu_regs[reg - 4], cpu_regs[reg - 4], tmp);
> +            tcg_gen_deposit_tl(cpu_regs[reg - 4], cpu_regs[reg - 4], t0, 8, 8);
>          }
> -        tcg_temp_free(tmp);
>          break;
>      case OT_WORD:
> -        tmp = tcg_temp_new();
> -        tcg_gen_ext16u_tl(tmp, t0);
> -        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
> -        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], tmp);
> -        tcg_temp_free(tmp);
> +        tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], t0, 0, 16);
>          break;
>      default: /* XXX this shouldn't be reached;  abort? */
>      case OT_LONG:
> @@ -323,15 +311,9 @@ static inline void gen_op_mov_reg_T1(int ot, int reg)
>  
>  static inline void gen_op_mov_reg_A0(int size, int reg)
>  {
> -    TCGv tmp;
> -
>      switch(size) {
>      case 0:
> -        tmp = tcg_temp_new();
> -        tcg_gen_ext16u_tl(tmp, cpu_A0);
> -        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
> -        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], tmp);
> -        tcg_temp_free(tmp);
> +        tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], cpu_A0, 0, 16);
>          break;
>      default: /* XXX this shouldn't be reached;  abort? */
>      case 1:
> @@ -415,9 +397,7 @@ static inline void gen_op_add_reg_im(int size, int reg, int32_t val)
>      switch(size) {
>      case 0:
>          tcg_gen_addi_tl(cpu_tmp0, cpu_regs[reg], val);
> -        tcg_gen_ext16u_tl(cpu_tmp0, cpu_tmp0);
> -        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
> -        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0);
> +        tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0, 0, 16);
>          break;
>      case 1:
>          tcg_gen_addi_tl(cpu_tmp0, cpu_regs[reg], val);
> @@ -439,9 +419,7 @@ static inline void gen_op_add_reg_T0(int size, int reg)
>      switch(size) {
>      case 0:
>          tcg_gen_add_tl(cpu_tmp0, cpu_regs[reg], cpu_T[0]);
> -        tcg_gen_ext16u_tl(cpu_tmp0, cpu_tmp0);
> -        tcg_gen_andi_tl(cpu_regs[reg], cpu_regs[reg], ~0xffff);
> -        tcg_gen_or_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0);
> +        tcg_gen_deposit_tl(cpu_regs[reg], cpu_regs[reg], cpu_tmp0, 0, 16);
>          break;
>      case 1:
>          tcg_gen_add_tl(cpu_tmp0, cpu_regs[reg], cpu_T[0]);
> -- 
> 1.7.2.3
> 
> 
> 

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2
  2011-01-11  3:23 [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Richard Henderson
                   ` (7 preceding siblings ...)
  2011-01-11 23:45 ` [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Aurelien Jarno
@ 2011-01-20 11:31 ` Edgar E. Iglesias
  8 siblings, 0 replies; 40+ messages in thread
From: Edgar E. Iglesias @ 2011-01-20 11:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien, agraf

On Mon, Jan 10, 2011 at 07:23:41PM -0800, Richard Henderson wrote:
> Changes since v1:
>   * No attempt to pack pos+len into one operand.  Updated backends
>     to match this change.
> 
>   * Example in the README is a bit more complex.
> 
>   * Define an official tcg_scratch_alloc routine, used by the i386
>     target for the case in which we need a scratch register.  I had
>     said that triggering this wasn't possible with mainline, but I
>     was wrong.  The rlwimi example I posted in the other thread hits
>     this case.
> 
> Aurelien suggested using a shorter name like "dep", but I think that
> would be more confusing than not.  This opcode is not really used
> enough to warrent cropping 4 characters, in my opinion.

Thanks Richard. In agreement with Aurelien I've applied patch 1 and 6
as a first step. I've also applied a follow-up patch to fix the
copy-pastos in the README (loc -> len).

Hopefully as more acks come in, we'll apply the rest.

Cheers

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-11  3:23 ` [Qemu-devel] [PATCH 5/7] tcg-i386: " Richard Henderson
@ 2011-01-25 12:27   ` Edgar E. Iglesias
  2011-01-25 16:13     ` Richard Henderson
  2011-01-31  8:33     ` Aurelien Jarno
  0 siblings, 2 replies; 40+ messages in thread
From: Edgar E. Iglesias @ 2011-01-25 12:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien, agraf

On Mon, Jan 10, 2011 at 07:23:46PM -0800, Richard Henderson wrote:
> Special case deposits that are implementable with byte and word stores.
> Otherwise implement with double-word shift plus rotates.
> 
> Expose tcg_scratch_alloc to the backend for allocation of scratch registers.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>

Hi,

I've tested this patch a bit and got mixed results. I tested with patched
CRIS and MicroBlaze translators. The patch works OK (it doesn't break
anything) for the usecases I had but I saw a bit of a slowdown with
MicroBlaze (compare to not using deposit at all).

I suspect that the fast 8 and 16 bit x86 deposits are giving me a slight
speedup with CRIS. But MicroBlaze uses one bit fields into bit 2 and
31. Those seem to be slower with deposit than with other tcg sequences.

I would have guessed that at worst, this patch would be equally fast
as any TCG sequence. Am I missing something?

These are the patches I've applied:

Microblaze translator:
diff --git a/target-microblaze/translate.c b/target-microblaze/translate.c
index 2207431..39ab3a5 100644
--- a/target-microblaze/translate.c
+++ b/target-microblaze/translate.c
@@ -160,6 +160,7 @@ static void read_carry(DisasContext *dc, TCGv d)
 
 static void write_carry(DisasContext *dc, TCGv v)
 {
+#if 0
     TCGv t0 = tcg_temp_new();
     tcg_gen_shli_tl(t0, v, 31);
     tcg_gen_sari_tl(t0, t0, 31);
@@ -168,6 +169,10 @@ static void write_carry(DisasContext *dc, TCGv v)
                     ~(MSR_C | MSR_CC));
     tcg_gen_or_tl(cpu_SR[SR_MSR], cpu_SR[SR_MSR], t0);
     tcg_temp_free(t0);
+#else
+    tcg_gen_deposit_tl(cpu_SR[SR_MSR], cpu_SR[SR_MSR], v, 2, 1);
+    tcg_gen_deposit_tl(cpu_SR[SR_MSR], cpu_SR[SR_MSR], v, 31, 1);
+#endif
 }


CRIS translator:
commit 9f427e14b2535a067bf046fea093f28cfaa92f7f
Author: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Date:   Fri Jan 21 22:09:44 2011 +0100

    cris: Use deposit for ALU writeback
    
    Most ALU insns on CRIS have deposit semantics in the writeback
    stage. Use the new deposit tcg operation to perform the write
    back to registers.
    
    Move the extract of the result into cc_result to the slow path
    in evaluate_flags.
    
    Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>

diff --git a/target-cris/translate.c b/target-cris/translate.c
index f4cc125..018ce68 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -861,11 +861,6 @@ static void cris_alu_op_exec(DisasContext *dc, int op,
 			BUG();
 			break;
 	}
-
-	if (size == 1)
-		tcg_gen_andi_tl(dst, dst, 0xff);
-	else if (size == 2)
-		tcg_gen_andi_tl(dst, dst, 0xffff);
 }
 
 static void cris_alu(DisasContext *dc, int op,
@@ -880,6 +875,7 @@ static void cris_alu(DisasContext *dc, int op,
 		tmp = tcg_temp_new();
 		writeback = 0;
 	} else if (size == 4) {
+		/* We write directly into the dest.  */
 		tmp = d;
 		writeback = 0;
 	} else
@@ -892,11 +888,7 @@ static void cris_alu(DisasContext *dc, int op,
 
 	/* Writeback.  */
 	if (writeback) {
-		if (size == 1)
-			tcg_gen_andi_tl(d, d, ~0xff);
-		else
-			tcg_gen_andi_tl(d, d, ~0xffff);
-		tcg_gen_or_tl(d, d, tmp);
+		tcg_gen_deposit_tl(d, d, tmp, 0, size * 8);
 	}
 	if (!TCGV_EQUAL(tmp, d))
 		tcg_temp_free(tmp);
@@ -941,6 +933,10 @@ static void gen_tst_cc (DisasContext *dc, TCGv cc, int cond)
 	 * When this function is done, T0 should be non-zero if the condition
 	 * code is true.
 	 */
+	if (dc->cc_size != 4) {
+		tcg_gen_andi_tl(cc_result, cc_result,
+				(1 << (dc->cc_size * 8)) - 1);
+	}
 	arith_opt = arith_cc(dc) && !dc->flags_uptodate;
 	move_opt = (dc->cc_op == CC_OP_MOVE);
 	switch (cond) {

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-25 12:27   ` Edgar E. Iglesias
@ 2011-01-25 16:13     ` Richard Henderson
  2011-01-25 16:48       ` Edgar E. Iglesias
  2011-01-31  8:33     ` Aurelien Jarno
  1 sibling, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2011-01-25 16:13 UTC (permalink / raw)
  To: Edgar E. Iglesias; +Cc: qemu-devel, aurelien, agraf

On 01/25/2011 04:27 AM, Edgar E. Iglesias wrote:
> I've tested this patch a bit and got mixed results. I tested with patched
> CRIS and MicroBlaze translators. The patch works OK (it doesn't break
> anything) for the usecases I had but I saw a bit of a slowdown with
> MicroBlaze (compare to not using deposit at all).
> 
> I suspect that the fast 8 and 16 bit x86 deposits are giving me a slight
> speedup with CRIS. But MicroBlaze uses one bit fields into bit 2 and
> 31. Those seem to be slower with deposit than with other tcg sequences.
> 
> I would have guessed that at worst, this patch would be equally fast
> as any TCG sequence. Am I missing something?

With or without the i386 tcg-target.c changes?

If without, then I'm stumped, since it looks like identical tcg ops
being emitted.

It with, then perhaps SHLD is slower than I thought.  I see that GCC
lists this insn as "vector decoded" for AMD cores, as opposed to
"direct decoded".  If this insn is indeed microcoded on some hosts
then maybe the i386 tcg-target patch isn't such a great idea.

That said, there are still other tcg targets which support this 
operation directly.  I would be shocked if you measured a slowdown
with these changes on a ppc host, for instance.


r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-25 16:13     ` Richard Henderson
@ 2011-01-25 16:48       ` Edgar E. Iglesias
  2011-01-25 22:07         ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Edgar E. Iglesias @ 2011-01-25 16:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien, agraf

On Tue, Jan 25, 2011 at 08:13:53AM -0800, Richard Henderson wrote:
> On 01/25/2011 04:27 AM, Edgar E. Iglesias wrote:
> > I've tested this patch a bit and got mixed results. I tested with patched
> > CRIS and MicroBlaze translators. The patch works OK (it doesn't break
> > anything) for the usecases I had but I saw a bit of a slowdown with
> > MicroBlaze (compare to not using deposit at all).
> > 
> > I suspect that the fast 8 and 16 bit x86 deposits are giving me a slight
> > speedup with CRIS. But MicroBlaze uses one bit fields into bit 2 and
> > 31. Those seem to be slower with deposit than with other tcg sequences.
> > 
> > I would have guessed that at worst, this patch would be equally fast
> > as any TCG sequence. Am I missing something?
> 
> With or without the i386 tcg-target.c changes?
> 
> If without, then I'm stumped, since it looks like identical tcg ops
> being emitted.

It's with the tcg-target patch.

> It with, then perhaps SHLD is slower than I thought.  I see that GCC
> lists this insn as "vector decoded" for AMD cores, as opposed to
> "direct decoded".  If this insn is indeed microcoded on some hosts
> then maybe the i386 tcg-target patch isn't such a great idea.

OK, I see. Maybe we should try to emit an insn sequence more similar
to what tcg was emitting (for the non 8 & 16-bit deposits)?
That ought too at least give similar results as before for those and
give us a speedup for the byte and word moves.

> That said, there are still other tcg targets which support this 
> operation directly.  I would be shocked if you measured a slowdown
> with these changes on a ppc host, for instance.

Yep, agreed.

Cheers

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-25 16:48       ` Edgar E. Iglesias
@ 2011-01-25 22:07         ` Richard Henderson
  2011-01-26  8:53           ` Edgar E. Iglesias
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2011-01-25 22:07 UTC (permalink / raw)
  To: Edgar E. Iglesias; +Cc: qemu-devel, aurelien, agraf

[-- Attachment #1: Type: text/plain, Size: 585 bytes --]

On 01/25/2011 08:48 AM, Edgar E. Iglesias wrote:
> OK, I see. Maybe we should try to emit an insn sequence more similar
> to what tcg was emitting (for the non 8 & 16-bit deposits)?
> That ought too at least give similar results as before for those and
> give us a speedup for the byte and word moves.

Please try this replacement version for tcg/i386/*.

I was able to run your cris testsuite, and stuff looked ok there.
But for some reason the microblaze kernel would not boot.  It seems
that the kernel commandline isn't in place properly and it isn't 
finding the disk image.


r~

[-- Attachment #2: z --]
[-- Type: text/plain, Size: 7146 bytes --]

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index bb19a95..ec49b34 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -258,7 +258,8 @@ static inline int tcg_target_const_match(tcg_target_long val,
 #define OPC_JMP_long	(0xe9)
 #define OPC_JMP_short	(0xeb)
 #define OPC_LEA         (0x8d)
-#define OPC_MOVB_EvGv	(0x88)		/* stores, more or less */
+#define OPC_MOVB_EbGb	(0x88)		/* stores, more or less */
+#define OPC_MOVB_GbEb   (0x8a)		/* loads, more or less */
 #define OPC_MOVL_EvGv	(0x89)		/* stores, more or less */
 #define OPC_MOVL_GvEv	(0x8b)		/* loads, more or less */
 #define OPC_MOVL_EvIz	(0xc7)
@@ -277,6 +278,7 @@ static inline int tcg_target_const_match(tcg_target_long val,
 #define OPC_SHIFT_1	(0xd1)
 #define OPC_SHIFT_Ib	(0xc1)
 #define OPC_SHIFT_cl	(0xd3)
+#define OPC_SHRD_Ib	(0xac | P_EXT)
 #define OPC_TESTL	(0x85)
 #define OPC_XCHG_ax_r32	(0x90)
 
@@ -710,6 +712,107 @@ static void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
     }
 }
 
+static void tcg_out_deposit(TCGContext *s, int inout, int val,
+                            unsigned ofs, unsigned len, int rexw)
+{
+    TCGType type = rexw ? TCG_TYPE_I64 : TCG_TYPE_I32;
+    tcg_target_ulong imask, vmask;
+    TCGRegSet live;
+    int scratch;
+
+    /* Look for MOVB w/ %reg_h special case.  */
+    if (ofs == 8 && len == 8 && inout < 4 && val < 4) {
+        tcg_out_modrm(s, OPC_MOVB_GbEb, inout + 4, val);
+        return;
+    }
+
+    /* Look for MOVB/MOVW special cases.  */
+    if (len == 16
+        || (len == 8
+            && (TCG_TARGET_REG_BITS == 64 || (inout < 4 && val < 4)))) {
+        /* If the offset is non-zero, and we have a deposit from self,
+           then we need a tempoarary.  */
+        if (ofs != 0 && inout == val) {
+            tcg_regset_clear(live);
+            tcg_regset_set_reg(live, inout);
+            val = tcg_scratch_alloc(s, type, live);
+            tcg_out_mov(s, type, val, inout);
+        }
+
+        /* If the offset is non-zero, rotate the destination into place.  */
+        if (ofs != 0) {
+            tcg_out_shifti(s, SHIFT_ROR + rexw, inout, ofs);
+        }
+
+        if (len == 8) {
+            tcg_out_modrm(s, OPC_MOVB_GbEb + P_REXB_R + P_REXB_RM, inout, val);
+        } else {
+            tcg_out_modrm(s, OPC_MOVL_GvEv + P_DATA16, inout, val);
+        }
+
+        /* Restore the destination to its proper location.  */
+        if (ofs != 0) {
+            tcg_out_shifti(s, SHIFT_ROL + rexw, inout, ofs);
+        }
+        return;
+    }
+
+    /* Otherwise we can't support this operation natively.  It's possible to
+       play tricks with rotates and shld in order to implement this.  While
+       this is much smaller than masks, but it turns out that shld is too slow
+       on many cpus.  */
+    tcg_regset_clear(live);
+    tcg_regset_set_reg(live, inout);
+    tcg_regset_set_reg(live, val);
+    scratch = tcg_scratch_alloc(s, type, live);
+
+    vmask = ((tcg_target_ulong)1 << len) - 1;
+    imask = ~(vmask << ofs);
+
+    /* Careful, some 64-bit masks cannot use immediate operands.  */
+    if (type == TCG_TYPE_I64 && imask != (int32_t)imask) {
+        bool val_scratch = false;
+
+        /* Since we are going to clobber INOUT first, the destination
+           bitfield cannot overlap the input bits.  */
+        if (inout == val && ofs < len) {
+            tcg_regset_set_reg(live, scratch);
+            val = tcg_scratch_alloc(s, type, live);
+            tcg_out_mov(s, type, val, inout);
+            val_scratch = true;
+        }
+
+        tcg_out_movi(s, type, scratch, imask);
+        tgen_arithr(s, ARITH_AND + rexw, inout, scratch);
+
+        if (vmask > 0xffffffffu) {
+            tcg_out_movi(s, type, scratch, vmask);
+            tgen_arithr(s, ARITH_AND + P_REXW, scratch, val);
+        } else {
+            if (val_scratch) {
+                scratch = val;
+            } else {
+                tcg_out_mov(s, TCG_TYPE_I32, scratch, val);
+            }
+            tgen_arithi(s, ARITH_AND, scratch, vmask, 0);
+        }
+
+        tcg_out_shifti(s, SHIFT_SHL + P_REXW, scratch, ofs);
+        tgen_arithr(s, ARITH_OR + P_REXW, inout, scratch);
+        return;
+    }
+
+    /* Both IMASK and VMASK are valid immediate operands, which means that
+       VAL may be treated as a 32-bit value.  */
+    tcg_out_mov(s, TCG_TYPE_I32, scratch, val);
+    tgen_arithi(s, ARITH_AND, scratch, vmask, 0);
+    tcg_out_shifti (s, SHIFT_SHL + rexw, scratch, ofs);
+
+    tgen_arithi(s, ARITH_AND + rexw, inout, imask, 0);
+    tgen_arithr(s, ARITH_OR + rexw, inout, scratch);
+}
+
+
 /* Use SMALL != 0 to force a short forward branch.  */
 static void tcg_out_jxx(TCGContext *s, int opc, int label_index, int small)
 {
@@ -1266,7 +1369,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int datalo, int datahi,
 
     switch (sizeop) {
     case 0:
-        tcg_out_modrm_offset(s, OPC_MOVB_EvGv + P_REXB_R, datalo, base, ofs);
+        tcg_out_modrm_offset(s, OPC_MOVB_EbGb + P_REXB_R, datalo, base, ofs);
         break;
     case 1:
         if (bswap) {
@@ -1504,7 +1607,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     OP_32_64(st8):
-        tcg_out_modrm_offset(s, OPC_MOVB_EvGv | P_REXB_R,
+        tcg_out_modrm_offset(s, OPC_MOVB_EbGb | P_REXB_R,
                              args[0], args[1], args[2]);
         break;
     OP_32_64(st16):
@@ -1603,6 +1706,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    OP_32_64(deposit):
+        tcg_out_deposit(s, args[0], args[2], args[3], args[4], rexw);
+        break;
+
     case INDEX_op_brcond_i32:
         tcg_out_brcond32(s, args[2], args[0], args[1], const_args[1],
                          args[3], 0);
@@ -1783,6 +1890,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_sar_i32, { "r", "0", "ci" } },
     { INDEX_op_rotl_i32, { "r", "0", "ci" } },
     { INDEX_op_rotr_i32, { "r", "0", "ci" } },
+    { INDEX_op_deposit_i32, { "r", "0", "r" } },
 
     { INDEX_op_brcond_i32, { "r", "ri" } },
 
@@ -1835,6 +1943,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_sar_i64, { "r", "0", "ci" } },
     { INDEX_op_rotl_i64, { "r", "0", "ci" } },
     { INDEX_op_rotr_i64, { "r", "0", "ci" } },
+    { INDEX_op_deposit_i64, { "r", "0", "r" } },
 
     { INDEX_op_brcond_i64, { "r", "re" } },
     { INDEX_op_setcond_i64, { "r", "r", "re" } },
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index bfafbfc..9f90d17 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -77,6 +77,7 @@ enum {
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32
 #define TCG_TARGET_HAS_rot_i32
+#define TCG_TARGET_HAS_deposit_i32
 #define TCG_TARGET_HAS_ext8s_i32
 #define TCG_TARGET_HAS_ext16s_i32
 #define TCG_TARGET_HAS_ext8u_i32
@@ -94,6 +95,7 @@ enum {
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_div2_i64
 #define TCG_TARGET_HAS_rot_i64
+#define TCG_TARGET_HAS_deposit_i64
 #define TCG_TARGET_HAS_ext8s_i64
 #define TCG_TARGET_HAS_ext16s_i64
 #define TCG_TARGET_HAS_ext32s_i64

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-25 22:07         ` Richard Henderson
@ 2011-01-26  8:53           ` Edgar E. Iglesias
  2011-01-26  9:23             ` Alexander Graf
  0 siblings, 1 reply; 40+ messages in thread
From: Edgar E. Iglesias @ 2011-01-26  8:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, aurelien, agraf

On Tue, Jan 25, 2011 at 02:07:15PM -0800, Richard Henderson wrote:
> On 01/25/2011 08:48 AM, Edgar E. Iglesias wrote:
> > OK, I see. Maybe we should try to emit an insn sequence more similar
> > to what tcg was emitting (for the non 8 & 16-bit deposits)?
> > That ought too at least give similar results as before for those and
> > give us a speedup for the byte and word moves.
> 
> Please try this replacement version for tcg/i386/*.
> 
> I was able to run your cris testsuite, and stuff looked ok there.
> But for some reason the microblaze kernel would not boot.  It seems
> that the kernel commandline isn't in place properly and it isn't 
> finding the disk image.

Yes, you need to build qemu with libfdt for that image to boot.
IIRC, libfdt comes with the dtc package on some dists. In the future
I'll see if can upload an image that boots without the need of
devicetree manipulation in qemu.

I tried your new patch and got similar results as before. Maaybe slightly
faster but within the noise.

I looked a little bit more at it and realized that I'm probably not doing
a fair comparition with microblaze. The write_carry sequence actually
bit deposits a bit into two locations with a sequence of tcg ops that
is not much longer than the one to deposit a single bit. So I'm basically
comparing the cost of a single tcg deposit sequence with the cost of two
tcg deposit backend ops. I should probably just accept that the new
deposit op is not worth using for that particular case.

It would be nice if somebody else also tested this patch. Before we
agree on applying it.

One note, the tcg_scratch_alloc hunk from the previous version was
missing on this one.

Thanks

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26  8:53           ` Edgar E. Iglesias
@ 2011-01-26  9:23             ` Alexander Graf
  2011-01-26 15:50               ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Alexander Graf @ 2011-01-26  9:23 UTC (permalink / raw)
  To: Edgar E.Iglesias; +Cc: qemu-devel, aurelien, Richard Henderson


On 26.01.2011, at 09:53, Edgar E. Iglesias wrote:

> On Tue, Jan 25, 2011 at 02:07:15PM -0800, Richard Henderson wrote:
>> On 01/25/2011 08:48 AM, Edgar E. Iglesias wrote:
>>> OK, I see. Maybe we should try to emit an insn sequence more similar
>>> to what tcg was emitting (for the non 8 & 16-bit deposits)?
>>> That ought too at least give similar results as before for those and
>>> give us a speedup for the byte and word moves.
>> 
>> Please try this replacement version for tcg/i386/*.
>> 
>> I was able to run your cris testsuite, and stuff looked ok there.
>> But for some reason the microblaze kernel would not boot.  It seems
>> that the kernel commandline isn't in place properly and it isn't 
>> finding the disk image.
> 
> Yes, you need to build qemu with libfdt for that image to boot.
> IIRC, libfdt comes with the dtc package on some dists. In the future
> I'll see if can upload an image that boots without the need of
> devicetree manipulation in qemu.
> 
> I tried your new patch and got similar results as before. Maaybe slightly
> faster but within the noise.
> 
> I looked a little bit more at it and realized that I'm probably not doing
> a fair comparition with microblaze. The write_carry sequence actually
> bit deposits a bit into two locations with a sequence of tcg ops that
> is not much longer than the one to deposit a single bit. So I'm basically
> comparing the cost of a single tcg deposit sequence with the cost of two
> tcg deposit backend ops. I should probably just accept that the new
> deposit op is not worth using for that particular case.
> 
> It would be nice if somebody else also tested this patch. Before we
> agree on applying it.
> 
> One note, the tcg_scratch_alloc hunk from the previous version was
> missing on this one.

I'm using the deposit instruction on S/390 with the implementation patch for x86 where it appears to not improve performance:

Booting into initrd shell:

3.53s  w/ x86 deposit patch 
3.52s  w/o x86 deposit patch


But maybe this is the same problem as with the shift opcode that was reported earlier in the thread:

agraf@toonie:/studio/s390/qemu-s390> grep deposit target-s390x/translate.c 
    tcg_gen_deposit_i64(regs[reg], regs[reg], tmp, 0, 32);
    tcg_gen_deposit_i64(regs[reg], regs[reg], v, 0, 32);
    tcg_gen_deposit_i64(regs[reg], regs[reg], tmp, 0, 16);
    tcg_gen_deposit_i64(regs[reg], regs[reg], v, 0, 8);
        tcg_gen_deposit_i64(regs[r1], regs[r1], tmp, 48, 16);
        tcg_gen_deposit_i64(regs[r1], regs[r1], tmp, 32, 16);

The 0, 32 and 0, 16 versions should get accelerated pretty well while the 32, 16 and 48, 16 are not I assume?

I'm not saying that the deposit instruction doesn't make sense btw. Code cleanup wise it was awesome - 10 lines of tcg combined into a single call :). Maybe a simple andi and ori emission for unknown masks might make more sense though?


Alex

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26  9:23             ` Alexander Graf
@ 2011-01-26 15:50               ` Richard Henderson
  2011-01-26 16:00                 ` Alexander Graf
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2011-01-26 15:50 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Edgar E.Iglesias, qemu-devel, aurelien

On 01/26/2011 01:23 AM, Alexander Graf wrote:
> agraf@toonie:/studio/s390/qemu-s390> grep deposit target-s390x/translate.c 
>     tcg_gen_deposit_i64(regs[reg], regs[reg], tmp, 0, 32);
>     tcg_gen_deposit_i64(regs[reg], regs[reg], v, 0, 32);
>     tcg_gen_deposit_i64(regs[reg], regs[reg], tmp, 0, 16);
>     tcg_gen_deposit_i64(regs[reg], regs[reg], v, 0, 8);
>         tcg_gen_deposit_i64(regs[r1], regs[r1], tmp, 48, 16);
>         tcg_gen_deposit_i64(regs[r1], regs[r1], tmp, 32, 16);
> 
> The 0, 32 and 0, 16 versions should get accelerated pretty well while
> the 32, 16 and 48, 16 are not I assume?

No, only the 0,16 and 0,8 deposits correspond to a hardware insn on x86.

Given that the 0,32 lowpart writeback is almost certainly the most common
operation for s390x, I doubt the deposit patch will help with an x86 host.

Have you thought about buffering the lowpart writeback in the translator?
I.e. when a 32-bit insn writes to a register, remember that value without
writing it back.  If the next insn in the TB is also 32-bit, reuse the 
saved value, etc.  Only perform the writeback for 64-bit insns using the
register as a source, end of TB, and places that can take an exception.


r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26 15:50               ` Richard Henderson
@ 2011-01-26 16:00                 ` Alexander Graf
  2011-01-26 16:34                   ` Avi Kivity
  2011-01-26 16:40                   ` Richard Henderson
  0 siblings, 2 replies; 40+ messages in thread
From: Alexander Graf @ 2011-01-26 16:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Edgar E.Iglesias, qemu-devel, aurelien

Richard Henderson wrote:
> On 01/26/2011 01:23 AM, Alexander Graf wrote:
>   
>> agraf@toonie:/studio/s390/qemu-s390> grep deposit target-s390x/translate.c 
>>     tcg_gen_deposit_i64(regs[reg], regs[reg], tmp, 0, 32);
>>     tcg_gen_deposit_i64(regs[reg], regs[reg], v, 0, 32);
>>     tcg_gen_deposit_i64(regs[reg], regs[reg], tmp, 0, 16);
>>     tcg_gen_deposit_i64(regs[reg], regs[reg], v, 0, 8);
>>         tcg_gen_deposit_i64(regs[r1], regs[r1], tmp, 48, 16);
>>         tcg_gen_deposit_i64(regs[r1], regs[r1], tmp, 32, 16);
>>
>> The 0, 32 and 0, 16 versions should get accelerated pretty well while
>> the 32, 16 and 48, 16 are not I assume?
>>     
>
> No, only the 0,16 and 0,8 deposits correspond to a hardware insn on x86.
>
> Given that the 0,32 lowpart writeback is almost certainly the most common
> operation for s390x, I doubt the deposit patch will help with an x86 host.
>
> Have you thought about buffering the lowpart writeback in the translator?
> I.e. when a 32-bit insn writes to a register, remember that value without
> writing it back.  If the next insn in the TB is also 32-bit, reuse the 
> saved value, etc.  Only perform the writeback for 64-bit insns using the
> register as a source, end of TB, and places that can take an exception.
>   

That's basically what uli's tcg_sync op did which got rejected upstream :).

Keeping it only inside of the translator would break on page faults, as
the lower 32 bits of the register would lie around in a temporary which
is invisible for the page fault resolver.


Alex

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26 16:00                 ` Alexander Graf
@ 2011-01-26 16:34                   ` Avi Kivity
  2011-01-26 16:40                   ` Richard Henderson
  1 sibling, 0 replies; 40+ messages in thread
From: Avi Kivity @ 2011-01-26 16:34 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Edgar E.Iglesias, qemu-devel, aurelien, Richard Henderson

On 01/26/2011 06:00 PM, Alexander Graf wrote:
> Richard Henderson wrote:
> >  On 01/26/2011 01:23 AM, Alexander Graf wrote:
> >
> >>  agraf@toonie:/studio/s390/qemu-s390>  grep deposit target-s390x/translate.c
> >>      tcg_gen_deposit_i64(regs[reg], regs[reg], tmp, 0, 32);
> >>      tcg_gen_deposit_i64(regs[reg], regs[reg], v, 0, 32);
> >>      tcg_gen_deposit_i64(regs[reg], regs[reg], tmp, 0, 16);
> >>      tcg_gen_deposit_i64(regs[reg], regs[reg], v, 0, 8);
> >>          tcg_gen_deposit_i64(regs[r1], regs[r1], tmp, 48, 16);
> >>          tcg_gen_deposit_i64(regs[r1], regs[r1], tmp, 32, 16);
> >>
> >>  The 0, 32 and 0, 16 versions should get accelerated pretty well while
> >>  the 32, 16 and 48, 16 are not I assume?
> >>
> >
> >  No, only the 0,16 and 0,8 deposits correspond to a hardware insn on x86.
> >
> >  Given that the 0,32 lowpart writeback is almost certainly the most common
> >  operation for s390x, I doubt the deposit patch will help with an x86 host.
> >
> >  Have you thought about buffering the lowpart writeback in the translator?
> >  I.e. when a 32-bit insn writes to a register, remember that value without
> >  writing it back.  If the next insn in the TB is also 32-bit, reuse the
> >  saved value, etc.  Only perform the writeback for 64-bit insns using the
> >  register as a source, end of TB, and places that can take an exception.
> >
>
> That's basically what uli's tcg_sync op did which got rejected upstream :).
>
> Keeping it only inside of the translator would break on page faults, as
> the lower 32 bits of the register would lie around in a temporary which
> is invisible for the page fault resolver.

It should be possible to have the exception injector look up instruction 
pointer in a bitmap in the TB, and if set, execute a completion which 
writes back the temporary to the correct register.  Not trivial, but I 
imagine useful for many optimizations which combine several target 
instructions into one host instruction (probably most useful for 
risc->cisc translations).

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26 16:00                 ` Alexander Graf
  2011-01-26 16:34                   ` Avi Kivity
@ 2011-01-26 16:40                   ` Richard Henderson
  2011-01-26 16:50                     ` Alexander Graf
  1 sibling, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2011-01-26 16:40 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Edgar E.Iglesias, qemu-devel, aurelien

On 01/26/2011 08:00 AM, Alexander Graf wrote:
> Keeping it only inside of the translator would break on page faults, as
> the lower 32 bits of the register would lie around in a temporary which
> is invisible for the page fault resolver.

Given that QEMU doesn't support truely async signals, and the fact that
the translator can tell which insns can fault, I can't imagine that this
is actually a problem.

You should get the same sequence of writebacks when translating the TB
the second time for tcg_gen_code_search_pc.

Am I totally confused here?


r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26 16:40                   ` Richard Henderson
@ 2011-01-26 16:50                     ` Alexander Graf
  2011-01-26 18:21                       ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Alexander Graf @ 2011-01-26 16:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Edgar E.Iglesias, qemu-devel, aurelien

Richard Henderson wrote:
> On 01/26/2011 08:00 AM, Alexander Graf wrote:
>   
>> Keeping it only inside of the translator would break on page faults, as
>> the lower 32 bits of the register would lie around in a temporary which
>> is invisible for the page fault resolver.
>>     
>
> Given that QEMU doesn't support truely async signals, and the fact that
> the translator can tell which insns can fault, I can't imagine that this
> is actually a problem.
>
> You should get the same sequence of writebacks when translating the TB
> the second time for tcg_gen_code_search_pc.
>
> Am I totally confused here?
>   

Oh, you mean basically to have the following:

TCGv_i32 regs32[16];
TCGv_i64 regs[16];

Then declare both as globals with offset and just switch between the
access type using a disas struct variable. Once the TB ends, I'd
obviously have to sync back to 64 bit again.

Yes, that would work. I'll see if I can get that rolling once I'm done
with the CC optimizations :)


Alex

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26 16:50                     ` Alexander Graf
@ 2011-01-26 18:21                       ` Richard Henderson
  2011-01-26 18:27                         ` Alexander Graf
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2011-01-26 18:21 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Edgar E.Iglesias, qemu-devel, aurelien

On 01/26/2011 08:50 AM, Alexander Graf wrote:
> Oh, you mean basically to have the following:
> 
> TCGv_i32 regs32[16];
> TCGv_i64 regs[16];
> 
> Then declare both as globals with offset and just switch between the
> access type using a disas struct variable. Once the TB ends, I'd
> obviously have to sync back to 64 bit again.

Maybe?  I don't think that regs32 can overlap with regs in the real
cpu structure.  Otherwise you have the same sort of sync problems as
was originally rejected.

I had been picturing regs32 as a member of DisasContext, and it would
hold tcg_temp_new_i32 variables as-needed.  Something like

#if TCG_TARGET_REG_BITS == 64

static void writeback_reg32(DisasContext *dc, int r)
{
    // ??? This macro should really exist to match TCGV_UNUSED_I32 etc.
    if (!TCGV_IS_UNUSED_I32 (dc->regs32[r])) {
        tcg_gen_deposit_tl(cpu_regs[r], cpu_regs[r],
                           MAKE_TCGV_I64 (GET_TCGV_I32 (dc->regs32[r])),
                           0, 32);
    }
}

static void writeback_all_reg32(DisasContext *dc)
{
    int i;
    for (i = 0; i < 16; ++i) {
        flush_reg32(dc, i);
    }
}

static void flush_all_reg32(DisasContext *dc)
{
    int i;
    for (i = 0; i < 16; ++i) {
        if (!TCGV_IS_UNUSED_I32 (dc->regs32[r])) {
            tcg_temp_free_i32 (dc->regs32[r]);
            TCGV_UNUSED_I32 (dc->regs32[r]);
        }
    }
}

static TCGv_i32 get_reg32(DisasContext *dc, int r)
{
    if (TCGV_IS_UNUSED_I32 (dc->regs32[r])) {
        dc->regs32[r] = tcg_temp_new_i32();
        tcg_gen_trunc_i64_i32(dc->regs32[r], cpu_regs[r]);
    }
    return dc->regs32[r];
}

static TCGv_i64 get_reg64(DisasContext *dc, int r)
{
    writeback_reg32(dc, r);
    return cpu_regs[r];
}

#elif TCG_TARGET_REG_BITS == 32

static void writeback_reg32(DisasContext *dc, int r) { }
static void writeback_all_reg32(DisasContext *dc) { }
static void flush_all_reg32(DisasContext *dc) { }

static TCGv_i32 get_reg32(DisasContext *dc, int r)
{
    return TCGV_LOW(cpu_regs[r]);
}

static TCGv_i64 get_reg64(DisasContext *dc, int r)
{
    return cpu_regs[r];
}

#endif


r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26 18:21                       ` Richard Henderson
@ 2011-01-26 18:27                         ` Alexander Graf
  2011-01-26 18:55                           ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Alexander Graf @ 2011-01-26 18:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Edgar E.Iglesias, qemu-devel, aurelien

Richard Henderson wrote:
> On 01/26/2011 08:50 AM, Alexander Graf wrote:
>   
>> Oh, you mean basically to have the following:
>>
>> TCGv_i32 regs32[16];
>> TCGv_i64 regs[16];
>>
>> Then declare both as globals with offset and just switch between the
>> access type using a disas struct variable. Once the TB ends, I'd
>> obviously have to sync back to 64 bit again.
>>     
>
> Maybe?  I don't think that regs32 can overlap with regs in the real
> cpu structure.  Otherwise you have the same sort of sync problems as
> was originally rejected.
>
> I had been picturing regs32 as a member of DisasContext, and it would
> hold tcg_temp_new_i32 variables as-needed.  Something like
>   

That would mean that during a TB the reg32 parts would be in
temporaries, right? So they'd end up being somewhere in a register.

What do I do on a page fault now? I could rerun the translator to find
out which registers would be stuck in a temporary, but I have no way to
actually read the temporary's value, as all register state is thrown
away on a page fault IIUC.

So the only alternative to this would be that I'd keep both 32 bit and
64 bit register states around in env, declare both as global, and sync
them manually on access. I guess.


Alex

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26 18:27                         ` Alexander Graf
@ 2011-01-26 18:55                           ` Richard Henderson
  2011-01-26 19:01                             ` Alexander Graf
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2011-01-26 18:55 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Edgar E.Iglesias, qemu-devel, aurelien

On 01/26/2011 10:27 AM, Alexander Graf wrote:
> What do I do on a page fault now? I could rerun the translator to find
> out which registers would be stuck in a temporary, but I have no way to
> actually read the temporary's value, as all register state is thrown
> away on a page fault IIUC.

When does a page-fault happen?

As far as I know, it does not happen at random.  Which seems to be 
what you are suggesting.


r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26 18:55                           ` Richard Henderson
@ 2011-01-26 19:01                             ` Alexander Graf
  2011-01-26 19:05                               ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Alexander Graf @ 2011-01-26 19:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Edgar E.Iglesias, qemu-devel, aurelien

Richard Henderson wrote:
> On 01/26/2011 10:27 AM, Alexander Graf wrote:
>   
>> What do I do on a page fault now? I could rerun the translator to find
>> out which registers would be stuck in a temporary, but I have no way to
>> actually read the temporary's value, as all register state is thrown
>> away on a page fault IIUC.
>>     
>
> When does a page-fault happen?
>
> As far as I know, it does not happen at random.  Which seems to be 
> what you are suggesting.
>   

It happens on load/store and potentially helpers. The main difference
IIUC between globals and temps is that globals are kept in registers as
long as possible (read: until load/store or helper or tb end gets
emitted) while temporaries are not stored back to any memory, so they
are lost on load/store.

So what you are suggesting is basically to use a different set of
globals for regs32 and to keep track of their usage throughout the TB,
so we can convert on demand. We can't use temporaries for that unless we
manually store them off on load/store/helper/tb end which means we'd
rewrite the globals treatment in target code :).


Alex

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26 19:01                             ` Alexander Graf
@ 2011-01-26 19:05                               ` Richard Henderson
  2011-01-26 19:09                                 ` Alexander Graf
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2011-01-26 19:05 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Edgar E.Iglesias, qemu-devel, aurelien

On 01/26/2011 11:01 AM, Alexander Graf wrote:
>> As far as I know, it does not happen at random.  Which seems to be 
>> what you are suggesting.
> 
> It happens on load/store and potentially helpers. The main difference
> IIUC between globals and temps is that globals are kept in registers as
> long as possible (read: until load/store or helper or tb end gets
> emitted) while temporaries are not stored back to any memory, so they
> are lost on load/store.
> 
> So what you are suggesting is basically to use a different set of
> globals for regs32 and to keep track of their usage throughout the TB,
> so we can convert on demand. We can't use temporaries for that unless we
> manually store them off on load/store/helper/tb end which means we'd
> rewrite the globals treatment in target code :).

No, what I'm suggesting is manually storing the reg32 temporaries back
to their reg64 origins in the translator immediately before issuing the
load/store/helper/tbend, at which point the generic TCG bits write back
the reg64 globals to their env origin.

Do you have a pointer to your s390x tree?



r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26 19:05                               ` Richard Henderson
@ 2011-01-26 19:09                                 ` Alexander Graf
  2011-01-26 19:19                                   ` Richard Henderson
  0 siblings, 1 reply; 40+ messages in thread
From: Alexander Graf @ 2011-01-26 19:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Edgar E.Iglesias, qemu-devel, aurelien

Richard Henderson wrote:
> On 01/26/2011 11:01 AM, Alexander Graf wrote:
>   
>>> As far as I know, it does not happen at random.  Which seems to be 
>>> what you are suggesting.
>>>       
>> It happens on load/store and potentially helpers. The main difference
>> IIUC between globals and temps is that globals are kept in registers as
>> long as possible (read: until load/store or helper or tb end gets
>> emitted) while temporaries are not stored back to any memory, so they
>> are lost on load/store.
>>
>> So what you are suggesting is basically to use a different set of
>> globals for regs32 and to keep track of their usage throughout the TB,
>> so we can convert on demand. We can't use temporaries for that unless we
>> manually store them off on load/store/helper/tb end which means we'd
>> rewrite the globals treatment in target code :).
>>     
>
> No, what I'm suggesting is manually storing the reg32 temporaries back
> to their reg64 origins in the translator immediately before issuing the
> load/store/helper/tbend, at which point the generic TCG bits write back
> the reg64 globals to their env origin.
>   

Oh, I see :). That makes sense now. Thanks for the clarification :).

> Do you have a pointer to your s390x tree?
>   

Sure.

  git://repo.or.cz/qemu/agraf.git  s390

Please be aware of the fact that I'm currently reworking the whole CC
model, so if you start working on the register acceleration now there
will be conflicts for sure :). Do you have code to test it with?


Alex

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26 19:09                                 ` Alexander Graf
@ 2011-01-26 19:19                                   ` Richard Henderson
  2011-01-26 19:27                                     ` Alexander Graf
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2011-01-26 19:19 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Edgar E.Iglesias, qemu-devel, aurelien

On 01/26/2011 11:09 AM, Alexander Graf wrote:
> Please be aware of the fact that I'm currently reworking the whole CC
> model, so if you start working on the register acceleration now there
> will be conflicts for sure :).

Roger.

> Do you have code to test it with?

Er, I assume I can pull something out of my hercules installation,
but if you already have images tarred up, that would be helpful.


r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-26 19:19                                   ` Richard Henderson
@ 2011-01-26 19:27                                     ` Alexander Graf
  0 siblings, 0 replies; 40+ messages in thread
From: Alexander Graf @ 2011-01-26 19:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Edgar E.Iglesias, qemu-devel, aurelien

Richard Henderson wrote:
> On 01/26/2011 11:09 AM, Alexander Graf wrote:
>   
>> Please be aware of the fact that I'm currently reworking the whole CC
>> model, so if you start working on the register acceleration now there
>> will be conflicts for sure :).
>>     
>
> Roger.
>
>   
>> Do you have code to test it with?
>>     
>
> Er, I assume I can pull something out of my hercules installation,
> but if you already have images tarred up, that would be helpful.
>   

I'll gladly put together something :).


Alex

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-25 12:27   ` Edgar E. Iglesias
  2011-01-25 16:13     ` Richard Henderson
@ 2011-01-31  8:33     ` Aurelien Jarno
  2011-02-08 18:05       ` Richard Henderson
  1 sibling, 1 reply; 40+ messages in thread
From: Aurelien Jarno @ 2011-01-31  8:33 UTC (permalink / raw)
  To: Edgar E. Iglesias; +Cc: agraf, qemu-devel, Richard Henderson

Hi,

On Tue, Jan 25, 2011 at 01:27:49PM +0100, Edgar E. Iglesias wrote:
> On Mon, Jan 10, 2011 at 07:23:46PM -0800, Richard Henderson wrote:
> > Special case deposits that are implementable with byte and word stores.
> > Otherwise implement with double-word shift plus rotates.
> > 
> > Expose tcg_scratch_alloc to the backend for allocation of scratch registers.
> > 
> > Signed-off-by: Richard Henderson <rth@twiddle.net>
> 
> Hi,
> 
> I've tested this patch a bit and got mixed results. I tested with patched
> CRIS and MicroBlaze translators. The patch works OK (it doesn't break
> anything) for the usecases I had but I saw a bit of a slowdown with
> MicroBlaze (compare to not using deposit at all).
> 

This week-end I have tested it emulating an x86-64 machine on x86-64,
with all the patch series applied. I have measured the boot time from
the bootloader up to the graphical environment of a Debian installation
I used -snapshot to make sure the host hard-drive is not introducing any
noise in the measurement (so that the whole image is in the host cache),
and did the measurement 10 times. The machine is a Core 2 Q9650, nothing
else was running on the machine except the few standard daemons.

I have found that the boot time is roughly 1.8% faster with the patch
series applied. It's undoubtedly an improvement, but still close to the
measurement noise. This is a bit disappointing...

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-01-31  8:33     ` Aurelien Jarno
@ 2011-02-08 18:05       ` Richard Henderson
  2011-02-09  7:41         ` [Qemu-devel] " Paolo Bonzini
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Henderson @ 2011-02-08 18:05 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: Edgar E. Iglesias, agraf, qemu-devel

On 01/31/2011 12:33 AM, Aurelien Jarno wrote:
> This week-end I have tested it emulating an x86-64 machine on x86-64,
> with all the patch series applied. I have measured the boot time from
> the bootloader up to the graphical environment of a Debian installation
> I used -snapshot to make sure the host hard-drive is not introducing any
> noise in the measurement (so that the whole image is in the host cache),
> and did the measurement 10 times. The machine is a Core 2 Q9650, nothing
> else was running on the machine except the few standard daemons.
> 
> I have found that the boot time is roughly 1.8% faster with the patch
> series applied. It's undoubtedly an improvement, but still close to the
> measurement noise. This is a bit disappointing...

It's also not terribly surprising, with that test scenario.  GCC tries
not to generate partial register stores, except when (as here) it's 
really a bitfield insert.

A test that might show off the deposit code more would be booting a
16-bit OS.  Either FreeDOS, or Windows 3.1 (if anyone still has a copy).
In that case, the translator will be emitting a deposit op for almost
every guest instruction.

(Which is probably a mistake from a translator point of view -- there's
no reason we can't emulate 16-bit operations with 32-bit operations given
that the high bits are ignorable.)


r~

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Qemu-devel] Re: [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-02-08 18:05       ` Richard Henderson
@ 2011-02-09  7:41         ` Paolo Bonzini
  2011-02-09 17:24           ` Blue Swirl
  0 siblings, 1 reply; 40+ messages in thread
From: Paolo Bonzini @ 2011-02-09  7:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Edgar E. Iglesias, agraf, Aurelien Jarno, qemu-devel

On 02/08/2011 07:05 PM, Richard Henderson wrote:
> (Which is probably a mistake from a translator point of view -- there's
> no reason we can't emulate 16-bit operations with 32-bit operations given
> that the high bits are ignorable.)

Not really, you never know if the guest is going to use a 66 prefix on 
the next instruction.

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 5/7] tcg-i386: Implement deposit operation.
  2011-02-09  7:41         ` [Qemu-devel] " Paolo Bonzini
@ 2011-02-09 17:24           ` Blue Swirl
  0 siblings, 0 replies; 40+ messages in thread
From: Blue Swirl @ 2011-02-09 17:24 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, Edgar E. Iglesias, agraf, Aurelien Jarno, Richard Henderson

On Wed, Feb 9, 2011 at 9:41 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 02/08/2011 07:05 PM, Richard Henderson wrote:
>>
>> (Which is probably a mistake from a translator point of view -- there's
>> no reason we can't emulate 16-bit operations with 32-bit operations given
>> that the high bits are ignorable.)
>
> Not really, you never know if the guest is going to use a 66 prefix on the
> next instruction.

Perhaps similar system to current lazy condition code evaluation could
be used. The translator would keep record of high bits use status, and
if they are getting used, emit extra ops to clear (or recalculate?)
the high bits.

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2011-02-09 17:24 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-11  3:23 [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Richard Henderson
2011-01-11  3:23 ` [Qemu-devel] [PATCH 1/7] tcg: Define "deposit" as an optional operation Richard Henderson
2011-01-11 23:45   ` Aurelien Jarno
2011-01-12 11:00   ` Edgar E. Iglesias
2011-01-11  3:23 ` [Qemu-devel] [PATCH 2/7] tcg-ppc: Implement deposit operation Richard Henderson
2011-01-11 12:40   ` [Qemu-devel] " malc
2011-01-11 16:32     ` Richard Henderson
2011-01-11 19:11       ` malc
2011-01-11  3:23 ` [Qemu-devel] [PATCH 3/7] tcg-hppa: " Richard Henderson
2011-01-11  3:23 ` [Qemu-devel] [PATCH 4/7] tcg-ia64: " Richard Henderson
2011-01-11  3:23 ` [Qemu-devel] [PATCH 5/7] tcg-i386: " Richard Henderson
2011-01-25 12:27   ` Edgar E. Iglesias
2011-01-25 16:13     ` Richard Henderson
2011-01-25 16:48       ` Edgar E. Iglesias
2011-01-25 22:07         ` Richard Henderson
2011-01-26  8:53           ` Edgar E. Iglesias
2011-01-26  9:23             ` Alexander Graf
2011-01-26 15:50               ` Richard Henderson
2011-01-26 16:00                 ` Alexander Graf
2011-01-26 16:34                   ` Avi Kivity
2011-01-26 16:40                   ` Richard Henderson
2011-01-26 16:50                     ` Alexander Graf
2011-01-26 18:21                       ` Richard Henderson
2011-01-26 18:27                         ` Alexander Graf
2011-01-26 18:55                           ` Richard Henderson
2011-01-26 19:01                             ` Alexander Graf
2011-01-26 19:05                               ` Richard Henderson
2011-01-26 19:09                                 ` Alexander Graf
2011-01-26 19:19                                   ` Richard Henderson
2011-01-26 19:27                                     ` Alexander Graf
2011-01-31  8:33     ` Aurelien Jarno
2011-02-08 18:05       ` Richard Henderson
2011-02-09  7:41         ` [Qemu-devel] " Paolo Bonzini
2011-02-09 17:24           ` Blue Swirl
2011-01-11  3:23 ` [Qemu-devel] [PATCH 6/7] target-i386: Use " Richard Henderson
2011-01-12 11:01   ` Edgar E. Iglesias
2011-01-20 11:06   ` Aurelien Jarno
2011-01-11  3:23 ` [Qemu-devel] [PATCH 7/7] target-ppc: " Richard Henderson
2011-01-11 23:45 ` [Qemu-devel] [PATCH 0/7] Define "deposit" tcg operation, v2 Aurelien Jarno
2011-01-20 11:31 ` Edgar E. Iglesias

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.