* [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
@ 2013-09-02 17:54 Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 01/29] tcg-aarch64: Set ext based on TCG_OPF_64BIT Richard Henderson
                   ` (30 more replies)
  0 siblings, 31 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

I'm not sure if I posted v2 or not, but my branch is named -3,
therefore this is v3.  ;-)

The jumbo "fixme" patch from v1 has been split up, and the series has been
updated for the changes in the tlb helpers over the past few weeks.
For the benefit of trivial conflict resolution, it's relative to a
tree that contains basically all of my patches.

See git://github.com/rth7680/qemu.git tcg-aarch-3 for the tree, if
you find yourself missing any of the dependencies.


r~


Richard Henderson (29):
  tcg-aarch64: Set ext based on TCG_OPF_64BIT
  tcg-aarch64: Change all ext variables to bool
  tcg-aarch64: Don't handle mov/movi in tcg_out_op
  tcg-aarch64: Hoist common argument loads in tcg_out_op
  tcg-aarch64: Change enum aarch64_arith_opc to AArch64Insn
  tcg-aarch64: Merge enum aarch64_srr_opc with AArch64Insn
  tcg-aarch64: Introduce tcg_fmt_* functions
  tcg-aarch64: Introduce tcg_fmt_Rdn_aimm
  tcg-aarch64: Implement mov with tcg_fmt_* functions
  tcg-aarch64: Handle constant operands to add, sub, and compare
  tcg-aarch64: Handle constant operands to and, or, xor
  tcg-aarch64: Support andc, orc, eqv, not
  tcg-aarch64: Handle zero as first argument to sub
  tcg-aarch64: Support movcond
  tcg-aarch64: Support deposit
  tcg-aarch64: Support add2, sub2
  tcg-aarch64: Support muluh, mulsh
  tcg-aarch64: Support div, rem
  tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s
  tcg-aarch64: Improve tcg_out_movi
  tcg-aarch64: Avoid add with zero in tlb load
  tcg-aarch64: Use adrp in tcg_out_movi
  tcg-aarch64: Pass return address to load/store helpers directly.
  tcg-aarch64: Use tcg_out_call for qemu_ld/st
  tcg-aarch64: Use symbolic names for branches
  tcg-aarch64: Implement tcg_register_jit
  tcg-aarch64: Reuse FP and LR in translated code
  tcg-aarch64: Introduce tcg_out_ldst_pair
  tcg-aarch64: Remove redundant CPU_TLB_ENTRY_BITS check

 include/exec/exec-all.h  |   18 -
 tcg/aarch64/tcg-target.c | 1276 ++++++++++++++++++++++++++++++----------------
 tcg/aarch64/tcg-target.h |   76 +--
 3 files changed, 867 insertions(+), 503 deletions(-)

-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 01/29] tcg-aarch64: Set ext based on TCG_OPF_64BIT
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-12  8:25   ` Claudio Fontana
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 02/29] tcg-aarch64: Change all ext variables to bool Richard Henderson
                   ` (29 subsequent siblings)
  30 siblings, 1 reply; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 28 +++++++---------------------
 1 file changed, 7 insertions(+), 21 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 55ff700..5b067fe 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1105,9 +1105,9 @@ static inline void tcg_out_load_pair(TCGContext *s, TCGReg addr,
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                        const TCGArg *args, const int *const_args)
 {
-    /* ext will be set in the switch below, which will fall through to the
-       common code. It triggers the use of extended regs where appropriate. */
-    int ext = 0;
+    /* 99% of the time, we can signal the use of extension registers
+       by looking to see if the opcode handles 64-bit data.  */
+    bool ext = (tcg_op_defs[opc].flags & TCG_OPF_64BIT) != 0;
 
     switch (opc) {
     case INDEX_op_exit_tb:
@@ -1163,7 +1163,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_mov_i64:
-        ext = 1; /* fall through */
     case INDEX_op_mov_i32:
         tcg_out_movr(s, ext, args[0], args[1]);
         break;
@@ -1176,43 +1175,36 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_add_i64:
-        ext = 1; /* fall through */
     case INDEX_op_add_i32:
         tcg_out_arith(s, ARITH_ADD, ext, args[0], args[1], args[2], 0);
         break;
 
     case INDEX_op_sub_i64:
-        ext = 1; /* fall through */
     case INDEX_op_sub_i32:
         tcg_out_arith(s, ARITH_SUB, ext, args[0], args[1], args[2], 0);
         break;
 
     case INDEX_op_and_i64:
-        ext = 1; /* fall through */
     case INDEX_op_and_i32:
         tcg_out_arith(s, ARITH_AND, ext, args[0], args[1], args[2], 0);
         break;
 
     case INDEX_op_or_i64:
-        ext = 1; /* fall through */
     case INDEX_op_or_i32:
         tcg_out_arith(s, ARITH_OR, ext, args[0], args[1], args[2], 0);
         break;
 
     case INDEX_op_xor_i64:
-        ext = 1; /* fall through */
     case INDEX_op_xor_i32:
         tcg_out_arith(s, ARITH_XOR, ext, args[0], args[1], args[2], 0);
         break;
 
     case INDEX_op_mul_i64:
-        ext = 1; /* fall through */
     case INDEX_op_mul_i32:
         tcg_out_mul(s, ext, args[0], args[1], args[2]);
         break;
 
     case INDEX_op_shl_i64:
-        ext = 1; /* fall through */
     case INDEX_op_shl_i32:
         if (const_args[2]) {    /* LSL / UBFM Wd, Wn, (32 - m) */
             tcg_out_shl(s, ext, args[0], args[1], args[2]);
@@ -1222,7 +1214,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_shr_i64:
-        ext = 1; /* fall through */
     case INDEX_op_shr_i32:
         if (const_args[2]) {    /* LSR / UBFM Wd, Wn, m, 31 */
             tcg_out_shr(s, ext, args[0], args[1], args[2]);
@@ -1232,7 +1223,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_sar_i64:
-        ext = 1; /* fall through */
     case INDEX_op_sar_i32:
         if (const_args[2]) {    /* ASR / SBFM Wd, Wn, m, 31 */
             tcg_out_sar(s, ext, args[0], args[1], args[2]);
@@ -1242,7 +1232,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_rotr_i64:
-        ext = 1; /* fall through */
     case INDEX_op_rotr_i32:
         if (const_args[2]) {    /* ROR / EXTR Wd, Wm, Wm, m */
             tcg_out_rotr(s, ext, args[0], args[1], args[2]);
@@ -1252,7 +1241,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_rotl_i64:
-        ext = 1; /* fall through */
     case INDEX_op_rotl_i32:     /* same as rotate right by (32 - m) */
         if (const_args[2]) {    /* ROR / EXTR Wd, Wm, Wm, 32 - m */
             tcg_out_rotl(s, ext, args[0], args[1], args[2]);
@@ -1265,14 +1253,12 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_brcond_i64:
-        ext = 1; /* fall through */
     case INDEX_op_brcond_i32: /* CMP 0, 1, cond(2), label 3 */
         tcg_out_cmp(s, ext, args[0], args[1], 0);
         tcg_out_goto_label_cond(s, args[2], args[3]);
         break;
 
     case INDEX_op_setcond_i64:
-        ext = 1; /* fall through */
     case INDEX_op_setcond_i32:
         tcg_out_cmp(s, ext, args[1], args[2], 0);
         tcg_out_cset(s, 0, args[0], args[3]);
@@ -1315,9 +1301,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_qemu_st(s, args, 3);
         break;
 
-    case INDEX_op_bswap64_i64:
-        ext = 1; /* fall through */
     case INDEX_op_bswap32_i64:
+        /* Despite the _i64, this is a 32-bit bswap.  */
+        ext = 0;
+        /* FALLTHRU */
+    case INDEX_op_bswap64_i64:
     case INDEX_op_bswap32_i32:
         tcg_out_rev(s, ext, args[0], args[1]);
         break;
@@ -1327,12 +1315,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_ext8s_i64:
-        ext = 1; /* fall through */
     case INDEX_op_ext8s_i32:
         tcg_out_sxt(s, ext, 0, args[0], args[1]);
         break;
     case INDEX_op_ext16s_i64:
-        ext = 1; /* fall through */
     case INDEX_op_ext16s_i32:
         tcg_out_sxt(s, ext, 1, args[0], args[1]);
         break;
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 02/29] tcg-aarch64: Change all ext variables to bool
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 01/29] tcg-aarch64: Set ext based on TCG_OPF_64BIT Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-12  8:29   ` Claudio Fontana
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 03/29] tcg-aarch64: Don't handle mov/movi in tcg_out_op Richard Henderson
                   ` (28 subsequent siblings)
  30 siblings, 1 reply; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 5b067fe..bde4c72 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -326,7 +326,7 @@ static inline void tcg_out_ldst_12(TCGContext *s,
               | op_type << 20 | scaled_uimm << 10 | rn << 5 | rd);
 }
 
-static inline void tcg_out_movr(TCGContext *s, int ext, TCGReg rd, TCGReg src)
+static inline void tcg_out_movr(TCGContext *s, bool ext, TCGReg rd, TCGReg src)
 {
     /* register to register move using MOV (shifted register with no shift) */
     /* using MOV 0x2a0003e0 | (shift).. */
@@ -407,7 +407,7 @@ static inline void tcg_out_ldst(TCGContext *s, enum aarch64_ldst_op_data data,
 }
 
 /* mov alias implemented with add immediate, useful to move to/from SP */
-static inline void tcg_out_movr_sp(TCGContext *s, int ext, TCGReg rd, TCGReg rn)
+static inline void tcg_out_movr_sp(TCGContext *s, bool ext, TCGReg rd, TCGReg rn)
 {
     /* using ADD 0x11000000 | (ext) | rn << 5 | rd */
     unsigned int base = ext ? 0x91000000 : 0x11000000;
@@ -437,7 +437,7 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
 }
 
 static inline void tcg_out_arith(TCGContext *s, enum aarch64_arith_opc opc,
-                                 int ext, TCGReg rd, TCGReg rn, TCGReg rm,
+                                 bool ext, TCGReg rd, TCGReg rn, TCGReg rm,
                                  int shift_imm)
 {
     /* Using shifted register arithmetic operations */
@@ -453,7 +453,7 @@ static inline void tcg_out_arith(TCGContext *s, enum aarch64_arith_opc opc,
     tcg_out32(s, base | rm << 16 | shift | rn << 5 | rd);
 }
 
-static inline void tcg_out_mul(TCGContext *s, int ext,
+static inline void tcg_out_mul(TCGContext *s, bool ext,
                                TCGReg rd, TCGReg rn, TCGReg rm)
 {
     /* Using MADD 0x1b000000 with Ra = wzr alias MUL 0x1b007c00 */
@@ -462,7 +462,7 @@ static inline void tcg_out_mul(TCGContext *s, int ext,
 }
 
 static inline void tcg_out_shiftrot_reg(TCGContext *s,
-                                        enum aarch64_srr_opc opc, int ext,
+                                        enum aarch64_srr_opc opc, bool ext,
                                         TCGReg rd, TCGReg rn, TCGReg rm)
 {
     /* using 2-source data processing instructions 0x1ac02000 */
@@ -470,7 +470,7 @@ static inline void tcg_out_shiftrot_reg(TCGContext *s,
     tcg_out32(s, base | rm << 16 | opc << 8 | rn << 5 | rd);
 }
 
-static inline void tcg_out_ubfm(TCGContext *s, int ext, TCGReg rd, TCGReg rn,
+static inline void tcg_out_ubfm(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
                                 unsigned int a, unsigned int b)
 {
     /* Using UBFM 0x53000000 Wd, Wn, a, b */
@@ -478,7 +478,7 @@ static inline void tcg_out_ubfm(TCGContext *s, int ext, TCGReg rd, TCGReg rn,
     tcg_out32(s, base | a << 16 | b << 10 | rn << 5 | rd);
 }
 
-static inline void tcg_out_sbfm(TCGContext *s, int ext, TCGReg rd, TCGReg rn,
+static inline void tcg_out_sbfm(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
                                 unsigned int a, unsigned int b)
 {
     /* Using SBFM 0x13000000 Wd, Wn, a, b */
@@ -486,7 +486,7 @@ static inline void tcg_out_sbfm(TCGContext *s, int ext, TCGReg rd, TCGReg rn,
     tcg_out32(s, base | a << 16 | b << 10 | rn << 5 | rd);
 }
 
-static inline void tcg_out_extr(TCGContext *s, int ext, TCGReg rd,
+static inline void tcg_out_extr(TCGContext *s, bool ext, TCGReg rd,
                                 TCGReg rn, TCGReg rm, unsigned int a)
 {
     /* Using EXTR 0x13800000 Wd, Wn, Wm, a */
@@ -494,7 +494,7 @@ static inline void tcg_out_extr(TCGContext *s, int ext, TCGReg rd,
     tcg_out32(s, base | rm << 16 | a << 10 | rn << 5 | rd);
 }
 
-static inline void tcg_out_shl(TCGContext *s, int ext,
+static inline void tcg_out_shl(TCGContext *s, bool ext,
                                TCGReg rd, TCGReg rn, unsigned int m)
 {
     int bits, max;
@@ -503,28 +503,28 @@ static inline void tcg_out_shl(TCGContext *s, int ext,
     tcg_out_ubfm(s, ext, rd, rn, bits - (m & max), max - (m & max));
 }
 
-static inline void tcg_out_shr(TCGContext *s, int ext,
+static inline void tcg_out_shr(TCGContext *s, bool ext,
                                TCGReg rd, TCGReg rn, unsigned int m)
 {
     int max = ext ? 63 : 31;
     tcg_out_ubfm(s, ext, rd, rn, m & max, max);
 }
 
-static inline void tcg_out_sar(TCGContext *s, int ext,
+static inline void tcg_out_sar(TCGContext *s, bool ext,
                                TCGReg rd, TCGReg rn, unsigned int m)
 {
     int max = ext ? 63 : 31;
     tcg_out_sbfm(s, ext, rd, rn, m & max, max);
 }
 
-static inline void tcg_out_rotr(TCGContext *s, int ext,
+static inline void tcg_out_rotr(TCGContext *s, bool ext,
                                 TCGReg rd, TCGReg rn, unsigned int m)
 {
     int max = ext ? 63 : 31;
     tcg_out_extr(s, ext, rd, rn, rn, m & max);
 }
 
-static inline void tcg_out_rotl(TCGContext *s, int ext,
+static inline void tcg_out_rotl(TCGContext *s, bool ext,
                                 TCGReg rd, TCGReg rn, unsigned int m)
 {
     int bits, max;
@@ -533,14 +533,14 @@ static inline void tcg_out_rotl(TCGContext *s, int ext,
     tcg_out_extr(s, ext, rd, rn, rn, bits - (m & max));
 }
 
-static inline void tcg_out_cmp(TCGContext *s, int ext, TCGReg rn, TCGReg rm,
+static inline void tcg_out_cmp(TCGContext *s, bool ext, TCGReg rn, TCGReg rm,
                                int shift_imm)
 {
     /* Using CMP alias SUBS wzr, Wn, Wm */
     tcg_out_arith(s, ARITH_SUBS, ext, TCG_REG_XZR, rn, rm, shift_imm);
 }
 
-static inline void tcg_out_cset(TCGContext *s, int ext, TCGReg rd, TCGCond c)
+static inline void tcg_out_cset(TCGContext *s, bool ext, TCGReg rd, TCGCond c)
 {
     /* Using CSET alias of CSINC 0x1a800400 Xd, XZR, XZR, invert(cond) */
     unsigned int base = ext ? 0x9a9f07e0 : 0x1a9f07e0;
@@ -637,7 +637,7 @@ aarch64_limm(unsigned int m, unsigned int r)
    to test a 32bit reg against 0xff000000, pass M = 8,  R = 8.
    to test a 32bit reg against 0xff0000ff, pass M = 16, R = 8.
  */
-static inline void tcg_out_tst(TCGContext *s, int ext, TCGReg rn,
+static inline void tcg_out_tst(TCGContext *s, bool ext, TCGReg rn,
                                unsigned int m, unsigned int r)
 {
     /* using TST alias of ANDS XZR, Xn,#bimm64 0x7200001f */
@@ -646,7 +646,7 @@ static inline void tcg_out_tst(TCGContext *s, int ext, TCGReg rn,
 }
 
 /* and a register with a bit pattern, similarly to TST, no flags change */
-static inline void tcg_out_andi(TCGContext *s, int ext, TCGReg rd, TCGReg rn,
+static inline void tcg_out_andi(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
                                 unsigned int m, unsigned int r)
 {
     /* using AND 0x12000000 */
@@ -700,21 +700,21 @@ static inline void tcg_out_goto_label_cond(TCGContext *s,
     }
 }
 
-static inline void tcg_out_rev(TCGContext *s, int ext, TCGReg rd, TCGReg rm)
+static inline void tcg_out_rev(TCGContext *s, bool ext, TCGReg rd, TCGReg rm)
 {
     /* using REV 0x5ac00800 */
     unsigned int base = ext ? 0xdac00c00 : 0x5ac00800;
     tcg_out32(s, base | rm << 5 | rd);
 }
 
-static inline void tcg_out_rev16(TCGContext *s, int ext, TCGReg rd, TCGReg rm)
+static inline void tcg_out_rev16(TCGContext *s, bool ext, TCGReg rd, TCGReg rm)
 {
     /* using REV16 0x5ac00400 */
     unsigned int base = ext ? 0xdac00400 : 0x5ac00400;
     tcg_out32(s, base | rm << 5 | rd);
 }
 
-static inline void tcg_out_sxt(TCGContext *s, int ext, int s_bits,
+static inline void tcg_out_sxt(TCGContext *s, bool ext, int s_bits,
                                TCGReg rd, TCGReg rn)
 {
     /* using ALIASes SXTB 0x13001c00, SXTH 0x13003c00, SXTW 0x93407c00
@@ -732,7 +732,7 @@ static inline void tcg_out_uxt(TCGContext *s, int s_bits,
     tcg_out_ubfm(s, 0, rd, rn, 0, bits);
 }
 
-static inline void tcg_out_addi(TCGContext *s, int ext,
+static inline void tcg_out_addi(TCGContext *s, bool ext,
                                 TCGReg rd, TCGReg rn, unsigned int aimm)
 {
     /* add immediate aimm unsigned 12bit value (with LSL 0 or 12) */
@@ -752,7 +752,7 @@ static inline void tcg_out_addi(TCGContext *s, int ext,
     tcg_out32(s, base | aimm | (rn << 5) | rd);
 }
 
-static inline void tcg_out_subi(TCGContext *s, int ext,
+static inline void tcg_out_subi(TCGContext *s, bool ext,
                                 TCGReg rd, TCGReg rn, unsigned int aimm)
 {
     /* sub immediate aimm unsigned 12bit value (with LSL 0 or 12) */
-- 
1.8.3.1



* [Qemu-devel] [PATCH v3 03/29] tcg-aarch64: Don't handle mov/movi in tcg_out_op
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 01/29] tcg-aarch64: Set ext based on TCG_OPF_64BIT Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 02/29] tcg-aarch64: Change all ext variables to bool Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-12  8:30   ` Claudio Fontana
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 04/29] tcg-aarch64: Hoist common argument loads " Richard Henderson
                   ` (27 subsequent siblings)
  30 siblings, 1 reply; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index bde4c72..79a447d 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1162,18 +1162,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                      args[0], args[1], args[2]);
         break;
 
-    case INDEX_op_mov_i64:
-    case INDEX_op_mov_i32:
-        tcg_out_movr(s, ext, args[0], args[1]);
-        break;
-
-    case INDEX_op_movi_i64:
-        tcg_out_movi(s, TCG_TYPE_I64, args[0], args[1]);
-        break;
-    case INDEX_op_movi_i32:
-        tcg_out_movi(s, TCG_TYPE_I32, args[0], args[1]);
-        break;
-
     case INDEX_op_add_i64:
     case INDEX_op_add_i32:
         tcg_out_arith(s, ARITH_ADD, ext, args[0], args[1], args[2], 0);
@@ -1337,8 +1325,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_movr(s, 0, args[0], args[1]);
         break;
 
+    case INDEX_op_mov_i64:
+    case INDEX_op_mov_i32:
+    case INDEX_op_movi_i64:
+    case INDEX_op_movi_i32:
+        /* Always implemented with tcg_out_mov/i, never with tcg_out_op.  */
     default:
-        tcg_abort(); /* opcode not implemented */
+        /* Opcode not implemented.  */
+        tcg_abort();
     }
 }
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 04/29] tcg-aarch64: Hoist common argument loads in tcg_out_op
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 03/29] tcg-aarch64: Don't handle mov/movi in tcg_out_op Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 05/29] tcg-aarch64: Change enum aarch64_arith_opc to AArch64Insn Richard Henderson
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

This reduces the code size of the function significantly.
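
The pattern can be sketched in isolation (a hypothetical miniature, not
QEMU's actual types or opcodes): load the operands that nearly every
switch arm uses into locals once, so each arm indexes nothing.

```c
#include <assert.h>

typedef long TCGArg;  /* stand-in for QEMU's TCGArg */

static long emit_op(int opc, const TCGArg *args, const int *const_args)
{
    /* Hoisted loads: done once instead of in every switch arm. */
    TCGArg a0 = args[0], a1 = args[1], a2 = args[2];
    int c2 = const_args[2];

    switch (opc) {
    case 0:  /* add-like arm: uses the short local names */
        return a0 + a1 + a2;
    case 1:  /* shift-like arm: constant vs register operand */
        return c2 ? a1 << a2 : a0;
    default:
        return -1;
    }
}
```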

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 95 +++++++++++++++++++++++++-----------------------
 1 file changed, 50 insertions(+), 45 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 79a447d..974a1d0 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1103,15 +1103,22 @@ static inline void tcg_out_load_pair(TCGContext *s, TCGReg addr,
 }
 
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
-                       const TCGArg *args, const int *const_args)
+                       const TCGArg args[TCG_MAX_OP_ARGS],
+                       const int const_args[TCG_MAX_OP_ARGS])
 {
     /* 99% of the time, we can signal the use of extension registers
        by looking to see if the opcode handles 64-bit data.  */
     bool ext = (tcg_op_defs[opc].flags & TCG_OPF_64BIT) != 0;
 
+    /* Hoist the loads of the most common arguments.  */
+    TCGArg a0 = args[0];
+    TCGArg a1 = args[1];
+    TCGArg a2 = args[2];
+    int c2 = const_args[2];
+
     switch (opc) {
     case INDEX_op_exit_tb:
-        tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_X0, args[0]);
+        tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_X0, a0);
         tcg_out_goto(s, (tcg_target_long)tb_ret_addr);
         break;
 
@@ -1120,23 +1127,23 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 #error "USE_DIRECT_JUMP required for aarch64"
 #endif
         assert(s->tb_jmp_offset != NULL); /* consistency for USE_DIRECT_JUMP */
-        s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
+        s->tb_jmp_offset[a0] = s->code_ptr - s->code_buf;
         /* actual branch destination will be patched by
            aarch64_tb_set_jmp_target later, beware retranslation. */
         tcg_out_goto_noaddr(s);
-        s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
+        s->tb_next_offset[a0] = s->code_ptr - s->code_buf;
         break;
 
     case INDEX_op_call:
         if (const_args[0]) {
-            tcg_out_call(s, args[0]);
+            tcg_out_call(s, a0);
         } else {
-            tcg_out_callr(s, args[0]);
+            tcg_out_callr(s, a0);
         }
         break;
 
     case INDEX_op_br:
-        tcg_out_goto_label(s, args[0]);
+        tcg_out_goto_label(s, a0);
         break;
 
     case INDEX_op_ld_i32:
@@ -1159,97 +1166,95 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_st16_i64:
     case INDEX_op_st32_i64:
         tcg_out_ldst(s, aarch64_ldst_get_data(opc), aarch64_ldst_get_type(opc),
-                     args[0], args[1], args[2]);
+                     a0, a1, a2);
         break;
 
     case INDEX_op_add_i64:
     case INDEX_op_add_i32:
-        tcg_out_arith(s, ARITH_ADD, ext, args[0], args[1], args[2], 0);
+        tcg_out_arith(s, ARITH_ADD, ext, a0, a1, a2, 0);
         break;
 
     case INDEX_op_sub_i64:
     case INDEX_op_sub_i32:
-        tcg_out_arith(s, ARITH_SUB, ext, args[0], args[1], args[2], 0);
+        tcg_out_arith(s, ARITH_SUB, ext, a0, a1, a2, 0);
         break;
 
     case INDEX_op_and_i64:
     case INDEX_op_and_i32:
-        tcg_out_arith(s, ARITH_AND, ext, args[0], args[1], args[2], 0);
+        tcg_out_arith(s, ARITH_AND, ext, a0, a1, a2, 0);
         break;
 
     case INDEX_op_or_i64:
     case INDEX_op_or_i32:
-        tcg_out_arith(s, ARITH_OR, ext, args[0], args[1], args[2], 0);
+        tcg_out_arith(s, ARITH_OR, ext, a0, a1, a2, 0);
         break;
 
     case INDEX_op_xor_i64:
     case INDEX_op_xor_i32:
-        tcg_out_arith(s, ARITH_XOR, ext, args[0], args[1], args[2], 0);
+        tcg_out_arith(s, ARITH_XOR, ext, a0, a1, a2, 0);
         break;
 
     case INDEX_op_mul_i64:
     case INDEX_op_mul_i32:
-        tcg_out_mul(s, ext, args[0], args[1], args[2]);
+        tcg_out_mul(s, ext, a0, a1, a2);
         break;
 
     case INDEX_op_shl_i64:
     case INDEX_op_shl_i32:
-        if (const_args[2]) {    /* LSL / UBFM Wd, Wn, (32 - m) */
-            tcg_out_shl(s, ext, args[0], args[1], args[2]);
+        if (c2) {    /* LSL / UBFM Wd, Wn, (32 - m) */
+            tcg_out_shl(s, ext, a0, a1, a2);
         } else {                /* LSL / LSLV */
-            tcg_out_shiftrot_reg(s, SRR_SHL, ext, args[0], args[1], args[2]);
+            tcg_out_shiftrot_reg(s, SRR_SHL, ext, a0, a1, a2);
         }
         break;
 
     case INDEX_op_shr_i64:
     case INDEX_op_shr_i32:
-        if (const_args[2]) {    /* LSR / UBFM Wd, Wn, m, 31 */
-            tcg_out_shr(s, ext, args[0], args[1], args[2]);
+        if (c2) {    /* LSR / UBFM Wd, Wn, m, 31 */
+            tcg_out_shr(s, ext, a0, a1, a2);
         } else {                /* LSR / LSRV */
-            tcg_out_shiftrot_reg(s, SRR_SHR, ext, args[0], args[1], args[2]);
+            tcg_out_shiftrot_reg(s, SRR_SHR, ext, a0, a1, a2);
         }
         break;
 
     case INDEX_op_sar_i64:
     case INDEX_op_sar_i32:
-        if (const_args[2]) {    /* ASR / SBFM Wd, Wn, m, 31 */
-            tcg_out_sar(s, ext, args[0], args[1], args[2]);
+        if (c2) {    /* ASR / SBFM Wd, Wn, m, 31 */
+            tcg_out_sar(s, ext, a0, a1, a2);
         } else {                /* ASR / ASRV */
-            tcg_out_shiftrot_reg(s, SRR_SAR, ext, args[0], args[1], args[2]);
+            tcg_out_shiftrot_reg(s, SRR_SAR, ext, a0, a1, a2);
         }
         break;
 
     case INDEX_op_rotr_i64:
     case INDEX_op_rotr_i32:
-        if (const_args[2]) {    /* ROR / EXTR Wd, Wm, Wm, m */
-            tcg_out_rotr(s, ext, args[0], args[1], args[2]);
+        if (c2) {    /* ROR / EXTR Wd, Wm, Wm, m */
+            tcg_out_rotr(s, ext, a0, a1, a2);
         } else {                /* ROR / RORV */
-            tcg_out_shiftrot_reg(s, SRR_ROR, ext, args[0], args[1], args[2]);
+            tcg_out_shiftrot_reg(s, SRR_ROR, ext, a0, a1, a2);
         }
         break;
 
     case INDEX_op_rotl_i64:
     case INDEX_op_rotl_i32:     /* same as rotate right by (32 - m) */
-        if (const_args[2]) {    /* ROR / EXTR Wd, Wm, Wm, 32 - m */
-            tcg_out_rotl(s, ext, args[0], args[1], args[2]);
+        if (c2) {    /* ROR / EXTR Wd, Wm, Wm, 32 - m */
+            tcg_out_rotl(s, ext, a0, a1, a2);
         } else {
-            tcg_out_arith(s, ARITH_SUB, 0,
-                          TCG_REG_TMP, TCG_REG_XZR, args[2], 0);
-            tcg_out_shiftrot_reg(s, SRR_ROR, ext,
-                                 args[0], args[1], TCG_REG_TMP);
+            tcg_out_arith(s, ARITH_SUB, 0, TCG_REG_TMP, TCG_REG_XZR, a2, 0);
+            tcg_out_shiftrot_reg(s, SRR_ROR, ext, a0, a1, TCG_REG_TMP);
         }
         break;
 
     case INDEX_op_brcond_i64:
-    case INDEX_op_brcond_i32: /* CMP 0, 1, cond(2), label 3 */
-        tcg_out_cmp(s, ext, args[0], args[1], 0);
-        tcg_out_goto_label_cond(s, args[2], args[3]);
+    case INDEX_op_brcond_i32:
+        tcg_out_cmp(s, ext, a0, a1, 0);
+        tcg_out_goto_label_cond(s, a2, args[3]);
         break;
 
     case INDEX_op_setcond_i64:
     case INDEX_op_setcond_i32:
-        tcg_out_cmp(s, ext, args[1], args[2], 0);
-        tcg_out_cset(s, 0, args[0], args[3]);
+        tcg_out_cmp(s, ext, a1, a2, 0);
+        tcg_out_cset(s, 0, a0, args[3]);
         break;
 
     case INDEX_op_qemu_ld8u:
@@ -1295,34 +1300,34 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         /* FALLTHRU */
     case INDEX_op_bswap64_i64:
     case INDEX_op_bswap32_i32:
-        tcg_out_rev(s, ext, args[0], args[1]);
+        tcg_out_rev(s, ext, a0, a1);
         break;
     case INDEX_op_bswap16_i64:
     case INDEX_op_bswap16_i32:
-        tcg_out_rev16(s, 0, args[0], args[1]);
+        tcg_out_rev16(s, 0, a0, a1);
         break;
 
     case INDEX_op_ext8s_i64:
     case INDEX_op_ext8s_i32:
-        tcg_out_sxt(s, ext, 0, args[0], args[1]);
+        tcg_out_sxt(s, ext, 0, a0, a1);
         break;
     case INDEX_op_ext16s_i64:
     case INDEX_op_ext16s_i32:
-        tcg_out_sxt(s, ext, 1, args[0], args[1]);
+        tcg_out_sxt(s, ext, 1, a0, a1);
         break;
     case INDEX_op_ext32s_i64:
-        tcg_out_sxt(s, 1, 2, args[0], args[1]);
+        tcg_out_sxt(s, 1, 2, a0, a1);
         break;
     case INDEX_op_ext8u_i64:
     case INDEX_op_ext8u_i32:
-        tcg_out_uxt(s, 0, args[0], args[1]);
+        tcg_out_uxt(s, 0, a0, a1);
         break;
     case INDEX_op_ext16u_i64:
     case INDEX_op_ext16u_i32:
-        tcg_out_uxt(s, 1, args[0], args[1]);
+        tcg_out_uxt(s, 1, a0, a1);
         break;
     case INDEX_op_ext32u_i64:
-        tcg_out_movr(s, 0, args[0], args[1]);
+        tcg_out_movr(s, 0, a0, a1);
         break;
 
     case INDEX_op_mov_i64:
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 05/29] tcg-aarch64: Change enum aarch64_arith_opc to AArch64Insn
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (3 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 04/29] tcg-aarch64: Hoist common argument loads " Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 06/29] tcg-aarch64: Merge enum aarch64_srr_opc with AArch64Insn Richard Henderson
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

And since we're no longer talking about opcodes, store the values
pre-shifted into their final position in the instruction word,
avoiding a shift at runtime.
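
The equivalence of the two encodings can be sketched as follows; the
constants are the SUBS encodings from the patch itself (0x6b in the old
enum, 0x6b000000 in the new one), and the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme (enum aarch64_arith_opc): the opcode byte is shifted into
   place at runtime, with the 64-bit "ext" bit folded in before the shift. */
static uint32_t encode_old(unsigned opc, int ext)
{
    return (uint32_t)(ext ? (0x80 | opc) : opc) << 24;
}

/* New scheme (AArch64Insn): the enum value is already the full
   instruction template, so only an OR with the sf bit (bit 31) remains. */
static uint32_t encode_new(uint32_t insn, int ext)
{
    return insn | (ext ? 0x80000000u : 0u);
}
```

Both produce the same instruction word; the shift has simply moved from
runtime into the enum definition.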

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 43 +++++++++++++++++++++++--------------------
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 974a1d0..d1ca402 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -199,16 +199,19 @@ enum aarch64_ldst_op_type { /* type of operation */
     LDST_LD_S_W = 0xc,  /* load and sign-extend into Wt */
 };
 
-enum aarch64_arith_opc {
-    ARITH_AND = 0x0a,
-    ARITH_ADD = 0x0b,
-    ARITH_OR = 0x2a,
-    ARITH_ADDS = 0x2b,
-    ARITH_XOR = 0x4a,
-    ARITH_SUB = 0x4b,
-    ARITH_ANDS = 0x6a,
-    ARITH_SUBS = 0x6b,
-};
+typedef enum {
+    /* Logical shifted register instructions */
+    INSN_AND    = 0x0a000000,
+    INSN_ORR    = 0x2a000000,
+    INSN_EOR    = 0x4a000000,
+    INSN_ANDS   = 0x6a000000,
+
+    /* Add/subtract shifted register instructions */
+    INSN_ADD    = 0x0b000000,
+    INSN_ADDS   = 0x2b000000,
+    INSN_SUB    = 0x4b000000,
+    INSN_SUBS   = 0x6b000000,
+} AArch64Insn;
 
 enum aarch64_srr_opc {
     SRR_SHL = 0x0,
@@ -436,13 +439,13 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
                  arg, arg1, arg2);
 }
 
-static inline void tcg_out_arith(TCGContext *s, enum aarch64_arith_opc opc,
+static inline void tcg_out_arith(TCGContext *s, AArch64Insn insn,
                                  bool ext, TCGReg rd, TCGReg rn, TCGReg rm,
                                  int shift_imm)
 {
     /* Using shifted register arithmetic operations */
     /* if extended register operation (64bit) just OR with 0x80 << 24 */
-    unsigned int shift, base = ext ? (0x80 | opc) << 24 : opc << 24;
+    unsigned int shift, base = insn | (ext ? 0x80000000 : 0);
     if (shift_imm == 0) {
         shift = 0;
     } else if (shift_imm > 0) {
@@ -537,7 +540,7 @@ static inline void tcg_out_cmp(TCGContext *s, bool ext, TCGReg rn, TCGReg rm,
                                int shift_imm)
 {
     /* Using CMP alias SUBS wzr, Wn, Wm */
-    tcg_out_arith(s, ARITH_SUBS, ext, TCG_REG_XZR, rn, rm, shift_imm);
+    tcg_out_arith(s, INSN_SUBS, ext, TCG_REG_XZR, rn, rm, shift_imm);
 }
 
 static inline void tcg_out_cset(TCGContext *s, bool ext, TCGReg rd, TCGCond c)
@@ -894,7 +897,7 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg,
     tcg_out_addi(s, 1, TCG_REG_X2, base, tlb_offset & 0xfff000);
     /* Merge the tlb index contribution into X2.
        X2 = X2 + (X0 << CPU_TLB_ENTRY_BITS) */
-    tcg_out_arith(s, ARITH_ADD, 1, TCG_REG_X2, TCG_REG_X2,
+    tcg_out_arith(s, INSN_ADD, 1, TCG_REG_X2, TCG_REG_X2,
                   TCG_REG_X0, -CPU_TLB_ENTRY_BITS);
     /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
        X0 = load [X2 + (tlb_offset & 0x000fff)] */
@@ -1171,27 +1174,27 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_add_i64:
     case INDEX_op_add_i32:
-        tcg_out_arith(s, ARITH_ADD, ext, a0, a1, a2, 0);
+        tcg_out_arith(s, INSN_ADD, ext, a0, a1, a2, 0);
         break;
 
     case INDEX_op_sub_i64:
     case INDEX_op_sub_i32:
-        tcg_out_arith(s, ARITH_SUB, ext, a0, a1, a2, 0);
+        tcg_out_arith(s, INSN_SUB, ext, a0, a1, a2, 0);
         break;
 
     case INDEX_op_and_i64:
     case INDEX_op_and_i32:
-        tcg_out_arith(s, ARITH_AND, ext, a0, a1, a2, 0);
+        tcg_out_arith(s, INSN_AND, ext, a0, a1, a2, 0);
         break;
 
     case INDEX_op_or_i64:
     case INDEX_op_or_i32:
-        tcg_out_arith(s, ARITH_OR, ext, a0, a1, a2, 0);
+        tcg_out_arith(s, INSN_ORR, ext, a0, a1, a2, 0);
         break;
 
     case INDEX_op_xor_i64:
     case INDEX_op_xor_i32:
-        tcg_out_arith(s, ARITH_XOR, ext, a0, a1, a2, 0);
+        tcg_out_arith(s, INSN_EOR, ext, a0, a1, a2, 0);
         break;
 
     case INDEX_op_mul_i64:
@@ -1240,7 +1243,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         if (c2) {    /* ROR / EXTR Wd, Wm, Wm, 32 - m */
             tcg_out_rotl(s, ext, a0, a1, a2);
         } else {
-            tcg_out_arith(s, ARITH_SUB, 0, TCG_REG_TMP, TCG_REG_XZR, a2, 0);
+            tcg_out_arith(s, INSN_SUB, 0, TCG_REG_TMP, TCG_REG_XZR, a2, 0);
             tcg_out_shiftrot_reg(s, SRR_ROR, ext, a0, a1, TCG_REG_TMP);
         }
         break;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [Qemu-devel] [PATCH v3 06/29] tcg-aarch64: Merge enum aarch64_srr_opc with AArch64Insn
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

And since we're no longer talking about opcodes, merge the 0x1ac02000
data2 primary opcode with the shift subcode to create the full insn.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 49 ++++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index d1ca402..de97fbd 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -211,14 +211,13 @@ typedef enum {
     INSN_ADDS   = 0x2b000000,
     INSN_SUB    = 0x4b000000,
     INSN_SUBS   = 0x6b000000,
-} AArch64Insn;
 
-enum aarch64_srr_opc {
-    SRR_SHL = 0x0,
-    SRR_SHR = 0x4,
-    SRR_SAR = 0x8,
-    SRR_ROR = 0xc
-};
+    /* Data-processing (2 source) instructions */
+    INSN_LSLV  = 0x1ac02000,
+    INSN_LSRV  = 0x1ac02400,
+    INSN_ASRV  = 0x1ac02800,
+    INSN_RORV  = 0x1ac02c00,
+} AArch64Insn;
 
 static inline enum aarch64_ldst_op_data
 aarch64_ldst_get_data(TCGOpcode tcg_op)
@@ -465,12 +464,12 @@ static inline void tcg_out_mul(TCGContext *s, bool ext,
 }
 
 static inline void tcg_out_shiftrot_reg(TCGContext *s,
-                                        enum aarch64_srr_opc opc, bool ext,
+                                        AArch64Insn insn, bool ext,
                                         TCGReg rd, TCGReg rn, TCGReg rm)
 {
     /* using 2-source data processing instructions 0x1ac02000 */
-    unsigned int base = ext ? 0x9ac02000 : 0x1ac02000;
-    tcg_out32(s, base | rm << 16 | opc << 8 | rn << 5 | rd);
+    unsigned int base = insn | (ext ? 0x80000000 : 0);
+    tcg_out32(s, base | rm << 16 | rn << 5 | rd);
 }
 
 static inline void tcg_out_ubfm(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
@@ -1204,47 +1203,47 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_shl_i64:
     case INDEX_op_shl_i32:
-        if (c2) {    /* LSL / UBFM Wd, Wn, (32 - m) */
+        if (c2) {
             tcg_out_shl(s, ext, a0, a1, a2);
-        } else {                /* LSL / LSLV */
-            tcg_out_shiftrot_reg(s, SRR_SHL, ext, a0, a1, a2);
+        } else {
+            tcg_out_shiftrot_reg(s, INSN_LSLV, ext, a0, a1, a2);
         }
         break;
 
     case INDEX_op_shr_i64:
     case INDEX_op_shr_i32:
-        if (c2) {    /* LSR / UBFM Wd, Wn, m, 31 */
+        if (c2) {
             tcg_out_shr(s, ext, a0, a1, a2);
-        } else {                /* LSR / LSRV */
-            tcg_out_shiftrot_reg(s, SRR_SHR, ext, a0, a1, a2);
+        } else {
+            tcg_out_shiftrot_reg(s, INSN_LSRV, ext, a0, a1, a2);
         }
         break;
 
     case INDEX_op_sar_i64:
     case INDEX_op_sar_i32:
-        if (c2) {    /* ASR / SBFM Wd, Wn, m, 31 */
+        if (c2) {
             tcg_out_sar(s, ext, a0, a1, a2);
-        } else {                /* ASR / ASRV */
-            tcg_out_shiftrot_reg(s, SRR_SAR, ext, a0, a1, a2);
+        } else {
+            tcg_out_shiftrot_reg(s, INSN_ASRV, ext, a0, a1, a2);
         }
         break;
 
     case INDEX_op_rotr_i64:
     case INDEX_op_rotr_i32:
-        if (c2) {    /* ROR / EXTR Wd, Wm, Wm, m */
+        if (c2) {
             tcg_out_rotr(s, ext, a0, a1, a2);
-        } else {                /* ROR / RORV */
-            tcg_out_shiftrot_reg(s, SRR_ROR, ext, a0, a1, a2);
+        } else {
+            tcg_out_shiftrot_reg(s, INSN_RORV, ext, a0, a1, a2);
         }
         break;
 
     case INDEX_op_rotl_i64:
-    case INDEX_op_rotl_i32:     /* same as rotate right by (32 - m) */
-        if (c2) {    /* ROR / EXTR Wd, Wm, Wm, 32 - m */
+    case INDEX_op_rotl_i32:
+        if (c2) {
             tcg_out_rotl(s, ext, a0, a1, a2);
         } else {
             tcg_out_arith(s, INSN_SUB, 0, TCG_REG_TMP, TCG_REG_XZR, a2, 0);
-            tcg_out_shiftrot_reg(s, SRR_ROR, ext, a0, a1, TCG_REG_TMP);
+            tcg_out_shiftrot_reg(s, INSN_RORV, ext, a0, a1, TCG_REG_TMP);
         }
         break;
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 07/29] tcg-aarch64: Introduce tcg_fmt_* functions
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Now that we've converted opcode fields to pre-shifted insns, we
can merge the implementation of arithmetic and shift insns.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 78 +++++++++++++++++++++++-------------------------
 1 file changed, 38 insertions(+), 40 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index de97fbd..e2f3d1c 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -297,6 +297,30 @@ static inline uint32_t tcg_in32(TCGContext *s)
     return v;
 }
 
+/*
+ * Encode various formats.  Note that since the architecture document is
+ * still private, these names are made up.
+ */
+
+static inline void tcg_fmt_Rdnm(TCGContext *s, AArch64Insn insn, bool ext,
+                                TCGReg rd, TCGReg rn, TCGReg rm)
+{
+    tcg_out32(s, insn | ext << 31 | rm << 16 | rn << 5 | rd);
+}
+
+static inline void tcg_fmt_Rdnm_shift(TCGContext *s, AArch64Insn insn,
+                                      bool ext, TCGReg rd, TCGReg rn,
+                                      TCGReg rm, int shift_imm)
+{
+    unsigned int shift;
+    if (shift_imm > 0) {
+        shift = shift_imm << 10 | 1 << 22;
+    } else {
+        shift = (-shift_imm) << 10;
+    }
+    tcg_out32(s, insn | ext << 31 | shift | rm << 16 | rn << 5 | rd);
+}
+
 static inline void tcg_out_ldst_9(TCGContext *s,
                                   enum aarch64_ldst_op_data op_data,
                                   enum aarch64_ldst_op_type op_type,
@@ -438,23 +462,6 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
                  arg, arg1, arg2);
 }
 
-static inline void tcg_out_arith(TCGContext *s, AArch64Insn insn,
-                                 bool ext, TCGReg rd, TCGReg rn, TCGReg rm,
-                                 int shift_imm)
-{
-    /* Using shifted register arithmetic operations */
-    /* if extended register operation (64bit) just OR with 0x80 << 24 */
-    unsigned int shift, base = insn | (ext ? 0x80000000 : 0);
-    if (shift_imm == 0) {
-        shift = 0;
-    } else if (shift_imm > 0) {
-        shift = shift_imm << 10 | 1 << 22;
-    } else /* (shift_imm < 0) */ {
-        shift = (-shift_imm) << 10;
-    }
-    tcg_out32(s, base | rm << 16 | shift | rn << 5 | rd);
-}
-
 static inline void tcg_out_mul(TCGContext *s, bool ext,
                                TCGReg rd, TCGReg rn, TCGReg rm)
 {
@@ -463,15 +470,6 @@ static inline void tcg_out_mul(TCGContext *s, bool ext,
     tcg_out32(s, base | rm << 16 | rn << 5 | rd);
 }
 
-static inline void tcg_out_shiftrot_reg(TCGContext *s,
-                                        AArch64Insn insn, bool ext,
-                                        TCGReg rd, TCGReg rn, TCGReg rm)
-{
-    /* using 2-source data processing instructions 0x1ac02000 */
-    unsigned int base = insn | (ext ? 0x80000000 : 0);
-    tcg_out32(s, base | rm << 16 | rn << 5 | rd);
-}
-
 static inline void tcg_out_ubfm(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
                                 unsigned int a, unsigned int b)
 {
@@ -539,7 +537,7 @@ static inline void tcg_out_cmp(TCGContext *s, bool ext, TCGReg rn, TCGReg rm,
                                int shift_imm)
 {
     /* Using CMP alias SUBS wzr, Wn, Wm */
-    tcg_out_arith(s, INSN_SUBS, ext, TCG_REG_XZR, rn, rm, shift_imm);
+    tcg_fmt_Rdnm_shift(s, INSN_SUBS, ext, TCG_REG_XZR, rn, rm, shift_imm);
 }
 
 static inline void tcg_out_cset(TCGContext *s, bool ext, TCGReg rd, TCGCond c)
@@ -896,8 +894,8 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg,
     tcg_out_addi(s, 1, TCG_REG_X2, base, tlb_offset & 0xfff000);
     /* Merge the tlb index contribution into X2.
        X2 = X2 + (X0 << CPU_TLB_ENTRY_BITS) */
-    tcg_out_arith(s, INSN_ADD, 1, TCG_REG_X2, TCG_REG_X2,
-                  TCG_REG_X0, -CPU_TLB_ENTRY_BITS);
+    tcg_fmt_Rdnm_shift(s, INSN_ADD, 1, TCG_REG_X2, TCG_REG_X2,
+                       TCG_REG_X0, -CPU_TLB_ENTRY_BITS);
     /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
        X0 = load [X2 + (tlb_offset & 0x000fff)] */
     tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? LDST_64 : LDST_32,
@@ -1173,27 +1171,27 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_add_i64:
     case INDEX_op_add_i32:
-        tcg_out_arith(s, INSN_ADD, ext, a0, a1, a2, 0);
+        tcg_fmt_Rdnm(s, INSN_ADD, ext, a0, a1, a2);
         break;
 
     case INDEX_op_sub_i64:
     case INDEX_op_sub_i32:
-        tcg_out_arith(s, INSN_SUB, ext, a0, a1, a2, 0);
+        tcg_fmt_Rdnm(s, INSN_SUB, ext, a0, a1, a2);
         break;
 
     case INDEX_op_and_i64:
     case INDEX_op_and_i32:
-        tcg_out_arith(s, INSN_AND, ext, a0, a1, a2, 0);
+        tcg_fmt_Rdnm(s, INSN_AND, ext, a0, a1, a2);
         break;
 
     case INDEX_op_or_i64:
     case INDEX_op_or_i32:
-        tcg_out_arith(s, INSN_ORR, ext, a0, a1, a2, 0);
+        tcg_fmt_Rdnm(s, INSN_ORR, ext, a0, a1, a2);
         break;
 
     case INDEX_op_xor_i64:
     case INDEX_op_xor_i32:
-        tcg_out_arith(s, INSN_EOR, ext, a0, a1, a2, 0);
+        tcg_fmt_Rdnm(s, INSN_EOR, ext, a0, a1, a2);
         break;
 
     case INDEX_op_mul_i64:
@@ -1206,7 +1204,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         if (c2) {
             tcg_out_shl(s, ext, a0, a1, a2);
         } else {
-            tcg_out_shiftrot_reg(s, INSN_LSLV, ext, a0, a1, a2);
+            tcg_fmt_Rdnm(s, INSN_LSLV, ext, a0, a1, a2);
         }
         break;
 
@@ -1215,7 +1213,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         if (c2) {
             tcg_out_shr(s, ext, a0, a1, a2);
         } else {
-            tcg_out_shiftrot_reg(s, INSN_LSRV, ext, a0, a1, a2);
+            tcg_fmt_Rdnm(s, INSN_LSRV, ext, a0, a1, a2);
         }
         break;
 
@@ -1224,7 +1222,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         if (c2) {
             tcg_out_sar(s, ext, a0, a1, a2);
         } else {
-            tcg_out_shiftrot_reg(s, INSN_ASRV, ext, a0, a1, a2);
+            tcg_fmt_Rdnm(s, INSN_ASRV, ext, a0, a1, a2);
         }
         break;
 
@@ -1233,7 +1231,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         if (c2) {
             tcg_out_rotr(s, ext, a0, a1, a2);
         } else {
-            tcg_out_shiftrot_reg(s, INSN_RORV, ext, a0, a1, a2);
+            tcg_fmt_Rdnm(s, INSN_RORV, ext, a0, a1, a2);
         }
         break;
 
@@ -1242,8 +1240,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         if (c2) {
             tcg_out_rotl(s, ext, a0, a1, a2);
         } else {
-            tcg_out_arith(s, INSN_SUB, 0, TCG_REG_TMP, TCG_REG_XZR, a2, 0);
-            tcg_out_shiftrot_reg(s, INSN_RORV, ext, a0, a1, TCG_REG_TMP);
+            tcg_fmt_Rdnm(s, INSN_SUB, 0, TCG_REG_TMP, TCG_REG_XZR, a2);
+            tcg_fmt_Rdnm(s, INSN_RORV, ext, a0, a1, TCG_REG_TMP);
         }
         break;
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 08/29] tcg-aarch64: Introduce tcg_fmt_Rdn_aimm
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

This merges the implementation of tcg_out_addi and tcg_out_subi.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 75 +++++++++++++++++-------------------------------
 1 file changed, 27 insertions(+), 48 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index e2f3d1c..bd6f823 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -206,6 +206,12 @@ typedef enum {
     INSN_EOR    = 0x4a000000,
     INSN_ANDS   = 0x6a000000,
 
+    /* Add/subtract immediate instructions */
+    INSN_ADDI  = 0x11000000,
+    INSN_ADDSI = 0x31000000,
+    INSN_SUBI  = 0x51000000,
+    INSN_SUBSI = 0x71000000,
+
     /* Add/subtract shifted register instructions */
     INSN_ADD    = 0x0b000000,
     INSN_ADDS   = 0x2b000000,
@@ -321,6 +327,18 @@ static inline void tcg_fmt_Rdnm_shift(TCGContext *s, AArch64Insn insn,
     tcg_out32(s, insn | ext << 31 | shift | rm << 16 | rn << 5 | rd);
 }
 
+static inline void tcg_fmt_Rdn_aimm(TCGContext *s, AArch64Insn insn, bool ext,
+                                    TCGReg rd, TCGReg rn, unsigned int aimm)
+{
+    if (aimm > 0xfff) {
+        assert((aimm & 0xfff) == 0);
+        aimm >>= 12;
+        assert(aimm <= 0xfff);
+        aimm |= 1 << 12;  /* apply LSL 12 */
+    }
+    tcg_out32(s, insn | ext << 31 | aimm << 10 | rn << 5 | rd);
+}
+
 static inline void tcg_out_ldst_9(TCGContext *s,
                                   enum aarch64_ldst_op_data op_data,
                                   enum aarch64_ldst_op_type op_type,
@@ -732,46 +750,6 @@ static inline void tcg_out_uxt(TCGContext *s, int s_bits,
     tcg_out_ubfm(s, 0, rd, rn, 0, bits);
 }
 
-static inline void tcg_out_addi(TCGContext *s, bool ext,
-                                TCGReg rd, TCGReg rn, unsigned int aimm)
-{
-    /* add immediate aimm unsigned 12bit value (with LSL 0 or 12) */
-    /* using ADD 0x11000000 | (ext) | (aimm << 10) | (rn << 5) | rd */
-    unsigned int base = ext ? 0x91000000 : 0x11000000;
-
-    if (aimm <= 0xfff) {
-        aimm <<= 10;
-    } else {
-        /* we can only shift left by 12, on assert we cannot represent */
-        assert(!(aimm & 0xfff));
-        assert(aimm <= 0xfff000);
-        base |= 1 << 22; /* apply LSL 12 */
-        aimm >>= 2;
-    }
-
-    tcg_out32(s, base | aimm | (rn << 5) | rd);
-}
-
-static inline void tcg_out_subi(TCGContext *s, bool ext,
-                                TCGReg rd, TCGReg rn, unsigned int aimm)
-{
-    /* sub immediate aimm unsigned 12bit value (with LSL 0 or 12) */
-    /* using SUB 0x51000000 | (ext) | (aimm << 10) | (rn << 5) | rd */
-    unsigned int base = ext ? 0xd1000000 : 0x51000000;
-
-    if (aimm <= 0xfff) {
-        aimm <<= 10;
-    } else {
-        /* we can only shift left by 12, on assert we cannot represent */
-        assert(!(aimm & 0xfff));
-        assert(aimm <= 0xfff000);
-        base |= 1 << 22; /* apply LSL 12 */
-        aimm >>= 2;
-    }
-
-    tcg_out32(s, base | aimm | (rn << 5) | rd);
-}
-
 static inline void tcg_out_nop(TCGContext *s)
 {
     tcg_out32(s, 0xd503201f);
@@ -889,9 +867,9 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg,
                  (TARGET_LONG_BITS - TARGET_PAGE_BITS) + s_bits,
                  (TARGET_LONG_BITS - TARGET_PAGE_BITS));
     /* Add any "high bits" from the tlb offset to the env address into X2,
-       to take advantage of the LSL12 form of the addi instruction.
+       to take advantage of the LSL12 form of the ADDI instruction.
        X2 = env + (tlb_offset & 0xfff000) */
-    tcg_out_addi(s, 1, TCG_REG_X2, base, tlb_offset & 0xfff000);
+    tcg_fmt_Rdn_aimm(s, INSN_ADDI, 1, TCG_REG_X2, base, tlb_offset & 0xfff000);
     /* Merge the tlb index contribution into X2.
        X2 = X2 + (X0 << CPU_TLB_ENTRY_BITS) */
     tcg_fmt_Rdnm_shift(s, INSN_ADD, 1, TCG_REG_X2, TCG_REG_X2,
@@ -1500,9 +1478,10 @@ static void tcg_target_qemu_prologue(TCGContext *s)
         tcg_out_store_pair(s, TCG_REG_FP, r, r + 1, idx);
     }
 
-    /* make stack space for TCG locals */
-    tcg_out_subi(s, 1, TCG_REG_SP, TCG_REG_SP,
-                 frame_size_tcg_locals * TCG_TARGET_STACK_ALIGN);
+    /* Make stack space for TCG locals.  */
+    tcg_fmt_Rdn_aimm(s, INSN_SUBI, 1, TCG_REG_SP, TCG_REG_SP,
+                     frame_size_tcg_locals * TCG_TARGET_STACK_ALIGN);
+
     /* inform TCG about how to find TCG locals with register, offset, size */
     tcg_set_frame(s, TCG_REG_SP, TCG_STATIC_CALL_ARGS_SIZE,
                   CPU_TEMP_BUF_NLONGS * sizeof(long));
@@ -1519,9 +1498,9 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
     tb_ret_addr = s->code_ptr;
 
-    /* remove TCG locals stack space */
-    tcg_out_addi(s, 1, TCG_REG_SP, TCG_REG_SP,
-                 frame_size_tcg_locals * TCG_TARGET_STACK_ALIGN);
+    /* Remove TCG locals stack space.  */
+    tcg_fmt_Rdn_aimm(s, INSN_ADDI, 1, TCG_REG_SP, TCG_REG_SP,
+                     frame_size_tcg_locals * TCG_TARGET_STACK_ALIGN);
 
     /* restore registers x19..x28.
        FP must be preserved, so it still points to callee_saved area */
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 09/29] tcg-aarch64: Implement mov with tcg_fmt_* functions
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Avoid the magic numbers in the current implementation.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index bd6f823..64c8d19 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -370,12 +370,17 @@ static inline void tcg_out_ldst_12(TCGContext *s,
               | op_type << 20 | scaled_uimm << 10 | rn << 5 | rd);
 }
 
-static inline void tcg_out_movr(TCGContext *s, bool ext, TCGReg rd, TCGReg src)
+/* Register to register move using ORR (shifted register with no shift). */
+static inline void tcg_out_movr(TCGContext *s, bool ext, TCGReg rd, TCGReg rm)
 {
-    /* register to register move using MOV (shifted register with no shift) */
-    /* using MOV 0x2a0003e0 | (shift).. */
-    unsigned int base = ext ? 0xaa0003e0 : 0x2a0003e0;
-    tcg_out32(s, base | src << 16 | rd);
+    tcg_fmt_Rdnm(s, INSN_ORR, ext, rd, TCG_REG_XZR, rm);
+}
+
+/* Register to register move using ADDI (move to/from SP).  */
+static inline void tcg_out_movr_sp(TCGContext *s, bool ext,
+                                   TCGReg rd, TCGReg rn)
+{
+    tcg_fmt_Rdn_aimm(s, INSN_ADDI, ext, rd, rn, 0);
 }
 
 static inline void tcg_out_movi_aux(TCGContext *s,
@@ -450,14 +455,6 @@ static inline void tcg_out_ldst(TCGContext *s, enum aarch64_ldst_op_data data,
     tcg_out_ldst_r(s, data, type, rd, rn, TCG_REG_TMP);
 }
 
-/* mov alias implemented with add immediate, useful to move to/from SP */
-static inline void tcg_out_movr_sp(TCGContext *s, bool ext, TCGReg rd, TCGReg rn)
-{
-    /* using ADD 0x11000000 | (ext) | rn << 5 | rd */
-    unsigned int base = ext ? 0x91000000 : 0x11000000;
-    tcg_out32(s, base | rn << 5 | rd);
-}
-
 static inline void tcg_out_mov(TCGContext *s,
                                TCGType type, TCGReg ret, TCGReg arg)
 {
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 10/29] tcg-aarch64: Handle constant operands to add, sub, and compare
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 101 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 79 insertions(+), 22 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 64c8d19..dbb1c45 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -106,6 +106,9 @@ static inline void patch_reloc(uint8_t *code_ptr, int type,
     }
 }
 
+#define TCG_CT_CONST_IS32 0x100
+#define TCG_CT_CONST_AIMM 0x200
+
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct,
                                    const char **pct_str)
@@ -129,6 +132,12 @@ static int target_parse_constraint(TCGArgConstraint *ct,
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_X3);
 #endif
         break;
+    case 'w': /* The operand should be considered 32-bit.  */
+        ct->ct |= TCG_CT_CONST_IS32;
+        break;
+    case 'A': /* Valid for arithmetic immediate (positive or negative).  */
+        ct->ct |= TCG_CT_CONST_AIMM;
+        break;
     default:
         return -1;
     }
@@ -138,14 +147,25 @@ static int target_parse_constraint(TCGArgConstraint *ct,
     return 0;
 }
 
-static inline int tcg_target_const_match(tcg_target_long val,
-                                         const TCGArgConstraint *arg_ct)
+static inline bool is_aimm(uint64_t val)
+{
+    return (val & ~0xfff) == 0 || (val & ~0xfff000) == 0;
+}
+
+static int tcg_target_const_match(tcg_target_long val,
+                                  const TCGArgConstraint *arg_ct)
 {
     int ct = arg_ct->ct;
 
     if (ct & TCG_CT_CONST) {
         return 1;
     }
+    if (ct & TCG_CT_CONST_IS32) {
+        val = (int32_t)val;
+    }
+    if ((ct & TCG_CT_CONST_AIMM) && (is_aimm(val) || is_aimm(-val))) {
+        return 1;
+    }
 
     return 0;
 }
@@ -548,11 +568,21 @@ static inline void tcg_out_rotl(TCGContext *s, bool ext,
     tcg_out_extr(s, ext, rd, rn, rn, bits - (m & max));
 }
 
-static inline void tcg_out_cmp(TCGContext *s, bool ext, TCGReg rn, TCGReg rm,
-                               int shift_imm)
+static void tcg_out_cmp(TCGContext *s, bool ext, TCGReg a,
+                        tcg_target_long b, bool const_b)
 {
-    /* Using CMP alias SUBS wzr, Wn, Wm */
-    tcg_fmt_Rdnm_shift(s, INSN_SUBS, ext, TCG_REG_XZR, rn, rm, shift_imm);
+    if (const_b) {
+        /* Using CMP or CMN aliases.  */
+        AArch64Insn insn = INSN_SUBSI;
+        if (b < 0) {
+            insn = INSN_ADDSI;
+            b = -b;
+        }
+        tcg_fmt_Rdn_aimm(s, insn, ext, TCG_REG_XZR, a, b);
+    } else {
+        /* Using CMP alias SUBS wzr, Wn, Wm */
+        tcg_fmt_Rdnm(s, INSN_SUBS, ext, TCG_REG_XZR, a, b);
+    }
 }
 
 static inline void tcg_out_cset(TCGContext *s, bool ext, TCGReg rd, TCGCond c)
@@ -747,6 +777,17 @@ static inline void tcg_out_uxt(TCGContext *s, int s_bits,
     tcg_out_ubfm(s, 0, rd, rn, 0, bits);
 }
 
+static void tcg_out_addsubi(TCGContext *s, int ext, TCGReg rd,
+                            TCGReg rn, int aimm)
+{
+    AArch64Insn insn = INSN_ADDI;
+    if (aimm < 0) {
+        insn = INSN_SUBI;
+        aimm = -aimm;
+    }
+    tcg_fmt_Rdn_aimm(s, insn, ext, rd, rn, aimm);
+}
+
 static inline void tcg_out_nop(TCGContext *s)
 {
     tcg_out32(s, 0xd503201f);
@@ -1144,14 +1185,26 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                      a0, a1, a2);
         break;
 
-    case INDEX_op_add_i64:
     case INDEX_op_add_i32:
-        tcg_fmt_Rdnm(s, INSN_ADD, ext, a0, a1, a2);
+        a2 = (int32_t)a2;
+        /* FALLTHRU */
+    case INDEX_op_add_i64:
+        if (c2) {
+            tcg_out_addsubi(s, ext, a0, a1, a2);
+        } else {
+            tcg_fmt_Rdnm(s, INSN_ADD, ext, a0, a1, a2);
+        }
         break;
 
-    case INDEX_op_sub_i64:
     case INDEX_op_sub_i32:
-        tcg_fmt_Rdnm(s, INSN_SUB, ext, a0, a1, a2);
+        a2 = (int32_t)a2;
+        /* FALLTHRU */
+    case INDEX_op_sub_i64:
+        if (c2) {
+            tcg_out_addsubi(s, ext, a0, a1, -a2);
+        } else {
+            tcg_fmt_Rdnm(s, INSN_SUB, ext, a0, a1, a2);
+        }
         break;
 
     case INDEX_op_and_i64:
@@ -1220,15 +1273,19 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
-    case INDEX_op_brcond_i64:
     case INDEX_op_brcond_i32:
-        tcg_out_cmp(s, ext, a0, a1, 0);
+        a1 = (int32_t)a1;
+        /* FALLTHRU */
+    case INDEX_op_brcond_i64:
+        tcg_out_cmp(s, ext, a0, a1, const_args[1]);
         tcg_out_goto_label_cond(s, a2, args[3]);
         break;
 
-    case INDEX_op_setcond_i64:
     case INDEX_op_setcond_i32:
-        tcg_out_cmp(s, ext, a1, a2, 0);
+        a2 = (int32_t)a2;
+        /* FALLTHRU */
+    case INDEX_op_setcond_i64:
+        tcg_out_cmp(s, ext, a1, a2, c2);
         tcg_out_cset(s, 0, a0, args[3]);
         break;
 
@@ -1349,10 +1406,10 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_st32_i64, { "r", "r" } },
     { INDEX_op_st_i64, { "r", "r" } },
 
-    { INDEX_op_add_i32, { "r", "r", "r" } },
-    { INDEX_op_add_i64, { "r", "r", "r" } },
-    { INDEX_op_sub_i32, { "r", "r", "r" } },
-    { INDEX_op_sub_i64, { "r", "r", "r" } },
+    { INDEX_op_add_i32, { "r", "r", "rwA" } },
+    { INDEX_op_add_i64, { "r", "r", "rA" } },
+    { INDEX_op_sub_i32, { "r", "r", "rwA" } },
+    { INDEX_op_sub_i64, { "r", "r", "rA" } },
     { INDEX_op_mul_i32, { "r", "r", "r" } },
     { INDEX_op_mul_i64, { "r", "r", "r" } },
     { INDEX_op_and_i32, { "r", "r", "r" } },
@@ -1373,10 +1430,10 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_rotl_i64, { "r", "r", "ri" } },
     { INDEX_op_rotr_i64, { "r", "r", "ri" } },
 
-    { INDEX_op_brcond_i32, { "r", "r" } },
-    { INDEX_op_setcond_i32, { "r", "r", "r" } },
-    { INDEX_op_brcond_i64, { "r", "r" } },
-    { INDEX_op_setcond_i64, { "r", "r", "r" } },
+    { INDEX_op_brcond_i32, { "r", "rwA" } },
+    { INDEX_op_brcond_i64, { "r", "rA" } },
+    { INDEX_op_setcond_i32, { "r", "r", "rwA" } },
+    { INDEX_op_setcond_i64, { "r", "r", "rA" } },
 
     { INDEX_op_qemu_ld8u, { "r", "l" } },
     { INDEX_op_qemu_ld8s, { "r", "l" } },
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 11/29] tcg-aarch64: Handle constant operands to and, or, xor
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Handle a simplified set of logical immediates for the moment.

The way gcc and binutils do it, with 52k worth of tables and a
binary search depth of ceil(log2(5334)) = 13, seems slow for the
most common cases.
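
The simplified predicate accepts a single run of consecutive 1 bits,
possibly rotated, and its inverse. A quick Python model of the bit
trick used by is_limm below (a sketch, assuming 64-bit wrapping
arithmetic; not the full replicated-pattern check real hardware allows):

```python
MASK64 = (1 << 64) - 1

def is_limm(val):
    """Match bit patterns 0....01....1 and 0..01..10..0,
    and their inverses."""
    val &= MASK64
    # Normalize so the most significant bit is clear.
    if val >> 63:
        val = ~val & MASK64
    if val == 0:
        return False
    # Adding the lowest set bit collapses a contiguous 1-run;
    # the result is a power of two iff there was a single run.
    val = (val + (val & -val)) & MASK64
    return (val & (val - 1)) == 0
```

For example, 0x0ff0 (one rotated run) is accepted, while 0x101
(two separate runs) is rejected.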

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 148 +++++++++++++++++++++++++++++++----------------
 1 file changed, 99 insertions(+), 49 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index dbb1c45..9324185 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -108,6 +108,7 @@ static inline void patch_reloc(uint8_t *code_ptr, int type,
 
 #define TCG_CT_CONST_IS32 0x100
 #define TCG_CT_CONST_AIMM 0x200
+#define TCG_CT_CONST_LIMM 0x400
 
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct,
@@ -138,6 +139,9 @@ static int target_parse_constraint(TCGArgConstraint *ct,
     case 'A': /* Valid for arithmetic immediate (positive or negative).  */
         ct->ct |= TCG_CT_CONST_AIMM;
         break;
+    case 'L': /* Valid for logical immediate.  */
+        ct->ct |= TCG_CT_CONST_LIMM;
+        break;
     default:
         return -1;
     }
@@ -152,6 +156,26 @@ static inline bool is_aimm(uint64_t val)
     return (val & ~0xfff) == 0 || (val & ~0xfff000) == 0;
 }
 
+static inline bool is_limm(uint64_t val)
+{
+    /* Taking a simplified view of the logical immediates for now, ignoring
+       the replication that can happen across the field.  Match bit patterns
+       of the forms
+           0....01....1
+           0..01..10..0
+       and their inverses.  */
+
+    /* Make things easier below, by testing the form with msb clear. */
+    if ((int64_t)val < 0) {
+        val = ~val;
+    }
+    if (val == 0) {
+        return false;
+    }
+    val += val & -val;
+    return (val & (val - 1)) == 0;
+}
+
 static int tcg_target_const_match(tcg_target_long val,
                                   const TCGArgConstraint *arg_ct)
 {
@@ -166,6 +190,9 @@ static int tcg_target_const_match(tcg_target_long val,
     if ((ct & TCG_CT_CONST_AIMM) && (is_aimm(val) || is_aimm(-val))) {
         return 1;
     }
+    if ((ct & TCG_CT_CONST_LIMM) && is_limm(val)) {
+        return 1;
+    }
 
     return 0;
 }
@@ -220,6 +247,11 @@ enum aarch64_ldst_op_type { /* type of operation */
 };
 
 typedef enum {
+    /* Logical immediate instructions */
+    INSN_ANDI  = 0x12000000,
+    INSN_ORRI  = 0x32000000,
+    INSN_EORI  = 0x52000000,
+
     /* Logical shifted register instructions */
     INSN_AND    = 0x0a000000,
     INSN_ORR    = 0x2a000000,
@@ -359,6 +391,41 @@ static inline void tcg_fmt_Rdn_aimm(TCGContext *s, AArch64Insn insn, bool ext,
     tcg_out32(s, insn | ext << 31 | aimm << 10 | rn << 5 | rd);
 }
 
+static inline void tcg_fmt_Rdn_r_s(TCGContext *s, AArch64Insn insn, bool ext,
+                                   TCGReg rd, TCGReg rn, int r, int m)
+{
+    tcg_out32(s, insn | ext * 0x80400000 | r << 16 | m << 10 | rn << 5 | rd);
+}
+
+static void tcg_fmt_Rdn_limm(TCGContext *s, AArch64Insn insn, bool ext,
+                             TCGReg rd, TCGReg rn, uint64_t limm)
+{
+    unsigned h, l, r, c;
+
+    /* See is_limm comment about simplified logical immediates.  */
+    assert(is_limm(limm));
+
+    h = clz64(limm);
+    l = ctz64(limm);
+    if (l == 0) {
+        r = 0;                  /* form 0....01....1 */
+        c = ctz64(~limm) - 1;
+        if (h == 0) {
+            r = clz64(~limm);   /* form 1..10..01..1 */
+            c += r;
+        }
+    } else {
+        r = 64 - l;             /* form 1....10....0 or 0..01..10..0 */
+        c = r - h - 1;
+    }
+    if (!ext) {
+        r &= 31;
+        c &= 31;
+    }
+
+    tcg_fmt_Rdn_r_s(s, insn, ext, rd, rn, r, c);
+}
+
 static inline void tcg_out_ldst_9(TCGContext *s,
                                   enum aarch64_ldst_op_data op_data,
                                   enum aarch64_ldst_op_type op_type,
@@ -665,40 +732,6 @@ static inline void tcg_out_call(TCGContext *s, tcg_target_long target)
     }
 }
 
-/* encode a logical immediate, mapping user parameter
-   M=set bits pattern length to S=M-1 */
-static inline unsigned int
-aarch64_limm(unsigned int m, unsigned int r)
-{
-    assert(m > 0);
-    return r << 16 | (m - 1) << 10;
-}
-
-/* test a register against an immediate bit pattern made of
-   M set bits rotated right by R.
-   Examples:
-   to test a 32/64 reg against 0x00000007, pass M = 3,  R = 0.
-   to test a 32/64 reg against 0x000000ff, pass M = 8,  R = 0.
-   to test a 32bit reg against 0xff000000, pass M = 8,  R = 8.
-   to test a 32bit reg against 0xff0000ff, pass M = 16, R = 8.
- */
-static inline void tcg_out_tst(TCGContext *s, bool ext, TCGReg rn,
-                               unsigned int m, unsigned int r)
-{
-    /* using TST alias of ANDS XZR, Xn,#bimm64 0x7200001f */
-    unsigned int base = ext ? 0xf240001f : 0x7200001f;
-    tcg_out32(s, base | aarch64_limm(m, r) | rn << 5);
-}
-
-/* and a register with a bit pattern, similarly to TST, no flags change */
-static inline void tcg_out_andi(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
-                                unsigned int m, unsigned int r)
-{
-    /* using AND 0x12000000 */
-    unsigned int base = ext ? 0x92400000 : 0x12000000;
-    tcg_out32(s, base | aarch64_limm(m, r) | rn << 5 | rd);
-}
-
 static inline void tcg_out_ret(TCGContext *s)
 {
     /* emit RET { LR } */
@@ -901,9 +934,8 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg,
     /* Store the page mask part of the address and the low s_bits into X3.
        Later this allows checking for equality and alignment at the same time.
        X3 = addr_reg & (PAGE_MASK | ((1 << s_bits) - 1)) */
-    tcg_out_andi(s, (TARGET_LONG_BITS == 64), TCG_REG_X3, addr_reg,
-                 (TARGET_LONG_BITS - TARGET_PAGE_BITS) + s_bits,
-                 (TARGET_LONG_BITS - TARGET_PAGE_BITS));
+    tcg_fmt_Rdn_limm(s, INSN_ANDI, TARGET_LONG_BITS == 64, TCG_REG_X3,
+                     addr_reg, TARGET_PAGE_MASK | ((1 << s_bits) - 1));
     /* Add any "high bits" from the tlb offset to the env address into X2,
        to take advantage of the LSL12 form of the ADDI instruction.
        X2 = env + (tlb_offset & 0xfff000) */
@@ -1207,19 +1239,37 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
-    case INDEX_op_and_i64:
     case INDEX_op_and_i32:
-        tcg_fmt_Rdnm(s, INSN_AND, ext, a0, a1, a2);
+        a2 = (int32_t)a2;
+        /* FALLTHRU */
+    case INDEX_op_and_i64:
+        if (c2) {
+            tcg_fmt_Rdn_limm(s, INSN_ANDI, ext, a0, a1, a2);
+        } else {
+            tcg_fmt_Rdnm(s, INSN_AND, ext, a0, a1, a2);
+        }
         break;
 
-    case INDEX_op_or_i64:
     case INDEX_op_or_i32:
-        tcg_fmt_Rdnm(s, INSN_ORR, ext, a0, a1, a2);
+        a2 = (int32_t)a2;
+        /* FALLTHRU */
+    case INDEX_op_or_i64:
+        if (c2) {
+            tcg_fmt_Rdn_limm(s, INSN_ORRI, ext, a0, a1, a2);
+        } else {
+            tcg_fmt_Rdnm(s, INSN_ORR, ext, a0, a1, a2);
+        }
         break;
 
-    case INDEX_op_xor_i64:
     case INDEX_op_xor_i32:
-        tcg_fmt_Rdnm(s, INSN_EOR, ext, a0, a1, a2);
+        a2 = (int32_t)a2;
+        /* FALLTHRU */
+    case INDEX_op_xor_i64:
+        if (c2) {
+            tcg_fmt_Rdn_limm(s, INSN_EORI, ext, a0, a1, a2);
+        } else {
+            tcg_fmt_Rdnm(s, INSN_EOR, ext, a0, a1, a2);
+        }
         break;
 
     case INDEX_op_mul_i64:
@@ -1412,12 +1462,12 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_sub_i64, { "r", "r", "rA" } },
     { INDEX_op_mul_i32, { "r", "r", "r" } },
     { INDEX_op_mul_i64, { "r", "r", "r" } },
-    { INDEX_op_and_i32, { "r", "r", "r" } },
-    { INDEX_op_and_i64, { "r", "r", "r" } },
-    { INDEX_op_or_i32, { "r", "r", "r" } },
-    { INDEX_op_or_i64, { "r", "r", "r" } },
-    { INDEX_op_xor_i32, { "r", "r", "r" } },
-    { INDEX_op_xor_i64, { "r", "r", "r" } },
+    { INDEX_op_and_i32, { "r", "r", "rwL" } },
+    { INDEX_op_and_i64, { "r", "r", "rL" } },
+    { INDEX_op_or_i32, { "r", "r", "rwL" } },
+    { INDEX_op_or_i64, { "r", "r", "rL" } },
+    { INDEX_op_xor_i32, { "r", "r", "rwL" } },
+    { INDEX_op_xor_i64, { "r", "r", "rL" } },
 
     { INDEX_op_shl_i32, { "r", "r", "ri" } },
     { INDEX_op_shr_i32, { "r", "r", "ri" } },
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 12/29] tcg-aarch64: Support andc, orc, eqv, not
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (10 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 11/29] tcg-aarch64: Handle constant operands to and, or, xor Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 13/29] tcg-aarch64: Handle zero as first argument to sub Richard Henderson
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
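When the second operand is an immediate, the inverted-register forms
added below reduce to the plain logical-immediate instructions with a
complemented mask: andc(a, b) = a AND NOT b, orc(a, b) = a OR NOT b,
eqv(a, b) = a XOR NOT b. A minimal sketch of those identities
(hypothetical helpers, 64-bit values only):

```python
MASK64 = (1 << 64) - 1

def andc(a, b):
    # BIC Rd, Rn, Rm  ==  AND with the complement of Rm.
    return a & (~b & MASK64)

def orc(a, b):
    # ORN Rd, Rn, Rm  ==  ORR with the complement of Rm.
    return (a | (~b & MASK64)) & MASK64

def eqv(a, b):
    # EON Rd, Rn, Rm  ==  EOR with the complement of Rm.
    return (a ^ (~b & MASK64)) & MASK64
```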
 tcg/aarch64/tcg-target.c | 65 ++++++++++++++++++++++++++++++++++++++++++------
 tcg/aarch64/tcg-target.h | 16 ++++++------
 2 files changed, 65 insertions(+), 16 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 9324185..eb080ed 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -253,10 +253,12 @@ typedef enum {
     INSN_EORI  = 0x52000000,
 
     /* Logical shifted register instructions */
-    INSN_AND    = 0x0a000000,
-    INSN_ORR    = 0x2a000000,
-    INSN_EOR    = 0x4a000000,
-    INSN_ANDS   = 0x6a000000,
+    INSN_AND   = 0x0a000000,
+    INSN_BIC   = 0x0a200000,
+    INSN_ORR   = 0x2a000000,
+    INSN_ORN   = 0x2a200000,
+    INSN_EOR   = 0x4a000000,
+    INSN_EON   = 0x4a200000,
 
     /* Add/subtract immediate instructions */
     INSN_ADDI  = 0x11000000,
@@ -265,10 +267,10 @@ typedef enum {
     INSN_SUBSI = 0x71000000,
 
     /* Add/subtract shifted register instructions */
-    INSN_ADD    = 0x0b000000,
-    INSN_ADDS   = 0x2b000000,
-    INSN_SUB    = 0x4b000000,
-    INSN_SUBS   = 0x6b000000,
+    INSN_ADD   = 0x0b000000,
+    INSN_ADDS  = 0x2b000000,
+    INSN_SUB   = 0x4b000000,
+    INSN_SUBS  = 0x6b000000,
 
     /* Data-processing (2 source) instructions */
     INSN_LSLV  = 0x1ac02000,
@@ -1250,6 +1252,17 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_andc_i32:
+        a2 = (int32_t)a2;
+        /* FALLTHRU */
+    case INDEX_op_andc_i64:
+        if (c2) {
+            tcg_fmt_Rdn_limm(s, INSN_ANDI, ext, a0, a1, ~a2);
+        } else {
+            tcg_fmt_Rdnm(s, INSN_BIC, ext, a0, a1, a2);
+        }
+        break;
+
     case INDEX_op_or_i32:
         a2 = (int32_t)a2;
         /* FALLTHRU */
@@ -1261,6 +1274,17 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_orc_i32:
+        a2 = (int32_t)a2;
+        /* FALLTHRU */
+    case INDEX_op_orc_i64:
+        if (c2) {
+            tcg_fmt_Rdn_limm(s, INSN_ORRI, ext, a0, a1, ~a2);
+        } else {
+            tcg_fmt_Rdnm(s, INSN_ORN, ext, a0, a1, a2);
+        }
+        break;
+
     case INDEX_op_xor_i32:
         a2 = (int32_t)a2;
         /* FALLTHRU */
@@ -1272,6 +1296,22 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_eqv_i32:
+        a2 = (int32_t)a2;
+        /* FALLTHRU */
+    case INDEX_op_eqv_i64:
+        if (c2) {
+            tcg_fmt_Rdn_limm(s, INSN_EORI, ext, a0, a1, ~a2);
+        } else {
+            tcg_fmt_Rdnm(s, INSN_EON, ext, a0, a1, a2);
+        }
+        break;
+
+    case INDEX_op_not_i64:
+    case INDEX_op_not_i32:
+        tcg_fmt_Rdnm(s, INSN_ORN, ext, a0, TCG_REG_XZR, a1);
+        break;
+
     case INDEX_op_mul_i64:
     case INDEX_op_mul_i32:
         tcg_out_mul(s, ext, a0, a1, a2);
@@ -1468,6 +1508,15 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_or_i64, { "r", "r", "rL" } },
     { INDEX_op_xor_i32, { "r", "r", "rwL" } },
     { INDEX_op_xor_i64, { "r", "r", "rL" } },
+    { INDEX_op_andc_i32, { "r", "r", "rwL" } },
+    { INDEX_op_andc_i64, { "r", "r", "rL" } },
+    { INDEX_op_orc_i32, { "r", "r", "rwL" } },
+    { INDEX_op_orc_i64, { "r", "r", "rL" } },
+    { INDEX_op_eqv_i32, { "r", "r", "rwL" } },
+    { INDEX_op_eqv_i64, { "r", "r", "rL" } },
+
+    { INDEX_op_not_i32, { "r", "r" } },
+    { INDEX_op_not_i64, { "r", "r" } },
 
     { INDEX_op_shl_i32, { "r", "r", "ri" } },
     { INDEX_op_shr_i32, { "r", "r", "ri" } },
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 26ee28b..6242136 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -47,12 +47,12 @@ typedef enum {
 #define TCG_TARGET_HAS_ext16u_i32       1
 #define TCG_TARGET_HAS_bswap16_i32      1
 #define TCG_TARGET_HAS_bswap32_i32      1
-#define TCG_TARGET_HAS_not_i32          0
+#define TCG_TARGET_HAS_not_i32          1
 #define TCG_TARGET_HAS_neg_i32          0
 #define TCG_TARGET_HAS_rot_i32          1
-#define TCG_TARGET_HAS_andc_i32         0
-#define TCG_TARGET_HAS_orc_i32          0
-#define TCG_TARGET_HAS_eqv_i32          0
+#define TCG_TARGET_HAS_andc_i32         1
+#define TCG_TARGET_HAS_orc_i32          1
+#define TCG_TARGET_HAS_eqv_i32          1
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      0
@@ -75,12 +75,12 @@ typedef enum {
 #define TCG_TARGET_HAS_bswap16_i64      1
 #define TCG_TARGET_HAS_bswap32_i64      1
 #define TCG_TARGET_HAS_bswap64_i64      1
-#define TCG_TARGET_HAS_not_i64          0
+#define TCG_TARGET_HAS_not_i64          1
 #define TCG_TARGET_HAS_neg_i64          0
 #define TCG_TARGET_HAS_rot_i64          1
-#define TCG_TARGET_HAS_andc_i64         0
-#define TCG_TARGET_HAS_orc_i64          0
-#define TCG_TARGET_HAS_eqv_i64          0
+#define TCG_TARGET_HAS_andc_i64         1
+#define TCG_TARGET_HAS_orc_i64          1
+#define TCG_TARGET_HAS_eqv_i64          1
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      0
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 13/29] tcg-aarch64: Handle zero as first argument to sub
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (11 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 12/29] tcg-aarch64: Support andc, orc, eqv, not Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 14/29] tcg-aarch64: Support movcond Richard Henderson
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

This is needed to properly handle neg, which generic TCG code
expands into a subtraction from zero.
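
With the rZ constraint, that subtraction costs nothing extra: the
zero operand becomes XZR and neg is a single SUB. A sketch of the
arithmetic (hypothetical helper names, wrapping 64-bit values):

```python
MASK64 = (1 << 64) - 1

def sub64(a, b):
    # Wrapping 64-bit subtraction, as SUB Xd, Xn, Xm computes.
    return (a - b) & MASK64

def neg64(x):
    # neg x  ==  sub xzr, x  (XZR reads as zero).
    return sub64(0, x)
```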

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index eb080ed..ea1db85 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -109,6 +109,7 @@ static inline void patch_reloc(uint8_t *code_ptr, int type,
 #define TCG_CT_CONST_IS32 0x100
 #define TCG_CT_CONST_AIMM 0x200
 #define TCG_CT_CONST_LIMM 0x400
+#define TCG_CT_CONST_ZERO 0x800
 
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct,
@@ -142,6 +143,9 @@ static int target_parse_constraint(TCGArgConstraint *ct,
     case 'L': /* Valid for logical immediate.  */
         ct->ct |= TCG_CT_CONST_LIMM;
         break;
+    case 'Z': /* zero */
+        ct->ct |= TCG_CT_CONST_ZERO;
+        break;
     default:
         return -1;
     }
@@ -193,6 +197,9 @@ static int tcg_target_const_match(tcg_target_long val,
     if ((ct & TCG_CT_CONST_LIMM) && is_limm(val)) {
         return 1;
     }
+    if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
+        return 1;
+    }
 
     return 0;
 }
@@ -1166,6 +1173,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     TCGArg a2 = args[2];
     int c2 = const_args[2];
 
+    /* Some operands are defined with "rZ" constraint, a register or
+       the zero register.  These need not actually test args[I] == 0.  */
+#define REG0(I)  (const_args[I] ? TCG_REG_XZR : (TCGReg)args[I])
+
     switch (opc) {
     case INDEX_op_exit_tb:
         tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_X0, a0);
@@ -1235,9 +1246,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         /* FALLTHRU */
     case INDEX_op_sub_i64:
         if (c2) {
-            tcg_out_addsubi(s, ext, a0, a1, -a2);
+            /* Arithmetic immediate instructions use Xn|sp, and thus
+               we cannot encode the zero register if tcg optimization
+               is turned off and both A1 and A2 are constants.  */
+            if (const_args[1]) {
+                tcg_out_movi(s, ext ? TCG_TYPE_I64 : TCG_TYPE_I32, a0, -a2);
+            } else {
+                tcg_out_addsubi(s, ext, a0, a1, -a2);
+            }
         } else {
-            tcg_fmt_Rdnm(s, INSN_SUB, ext, a0, a1, a2);
+            tcg_fmt_Rdnm(s, INSN_SUB, ext, a0, REG0(1), a2);
         }
         break;
 
@@ -1461,6 +1479,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         /* Opcode not implemented.  */
         tcg_abort();
     }
+
+#undef REG0
 }
 
 static const TCGTargetOpDef aarch64_op_defs[] = {
@@ -1498,8 +1518,8 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
 
     { INDEX_op_add_i32, { "r", "r", "rwA" } },
     { INDEX_op_add_i64, { "r", "r", "rA" } },
-    { INDEX_op_sub_i32, { "r", "r", "rwA" } },
-    { INDEX_op_sub_i64, { "r", "r", "rA" } },
+    { INDEX_op_sub_i32, { "r", "rZ", "rwA" } },
+    { INDEX_op_sub_i64, { "r", "rZ", "rA" } },
     { INDEX_op_mul_i32, { "r", "r", "r" } },
     { INDEX_op_mul_i64, { "r", "r", "r" } },
     { INDEX_op_and_i32, { "r", "r", "rwL" } },
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 14/29] tcg-aarch64: Support movcond
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (12 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 13/29] tcg-aarch64: Handle zero as first argument to sub Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-09 15:09   ` Claudio Fontana
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 15/29] tcg-aarch64: Support deposit Richard Henderson
                   ` (16 subsequent siblings)
  30 siblings, 1 reply; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Also tidy the implementation of setcond in order to share code.
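
The sharing works because CSET is only an alias:
CSET Rd, cond == CSINC Rd, XZR, XZR, invert(cond). A sketch of the
conditional-select semantics (values rather than registers, helper
names assumed for illustration):

```python
MASK64 = (1 << 64) - 1

def csel(cond, rn, rm):
    # CSEL: rn if the condition holds, else rm.
    return rn if cond else rm

def csinc(cond, rn, rm):
    # CSINC: rn if the condition holds, else rm + 1 (wrapping).
    return rn if cond else (rm + 1) & MASK64

def cset(cond):
    # CSET cond == CSINC xzr, xzr, invert(cond): yields 1 or 0.
    return csinc(not cond, 0, 0)
```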

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 33 +++++++++++++++++++++++++--------
 tcg/aarch64/tcg-target.h |  4 ++--
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index ea1db85..322660d 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -284,6 +284,10 @@ typedef enum {
     INSN_LSRV  = 0x1ac02400,
     INSN_ASRV  = 0x1ac02800,
     INSN_RORV  = 0x1ac02c00,
+
+    /* Conditional select instructions */
+    INSN_CSEL  = 0x1a800000,
+    INSN_CSINC = 0x1a800400,
 } AArch64Insn;
 
 static inline enum aarch64_ldst_op_data
@@ -435,6 +439,14 @@ static void tcg_fmt_Rdn_limm(TCGContext *s, AArch64Insn insn, bool ext,
     tcg_fmt_Rdn_r_s(s, insn, ext, rd, rn, r, c);
 }
 
+static inline void tcg_fmt_Rdnm_cond(TCGContext *s, AArch64Insn insn,
+                                     bool ext, TCGReg rd, TCGReg rn,
+                                     TCGReg rm, TCGCond c)
+{
+    tcg_out32(s, insn | ext << 31 | rm << 16 | rn << 5 | rd
+              | tcg_cond_to_aarch64[c] << 12);
+}
+
 static inline void tcg_out_ldst_9(TCGContext *s,
                                   enum aarch64_ldst_op_data op_data,
                                   enum aarch64_ldst_op_type op_type,
@@ -661,13 +673,6 @@ static void tcg_out_cmp(TCGContext *s, bool ext, TCGReg a,
     }
 }
 
-static inline void tcg_out_cset(TCGContext *s, bool ext, TCGReg rd, TCGCond c)
-{
-    /* Using CSET alias of CSINC 0x1a800400 Xd, XZR, XZR, invert(cond) */
-    unsigned int base = ext ? 0x9a9f07e0 : 0x1a9f07e0;
-    tcg_out32(s, base | tcg_cond_to_aarch64[tcg_invert_cond(c)] << 12 | rd);
-}
-
 static inline void tcg_out_goto(TCGContext *s, tcg_target_long target)
 {
     tcg_target_long offset;
@@ -1394,7 +1399,17 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         /* FALLTHRU */
     case INDEX_op_setcond_i64:
         tcg_out_cmp(s, ext, a1, a2, c2);
-        tcg_out_cset(s, 0, a0, args[3]);
+        /* Using CSET alias of CSINC Xd, XZR, XZR, invert(cond) */
+        tcg_fmt_Rdnm_cond(s, INSN_CSINC, 0, a0, TCG_REG_XZR,
+                          TCG_REG_XZR, tcg_invert_cond(args[3]));
+        break;
+
+    case INDEX_op_movcond_i32:
+        a2 = (int32_t)a2;
+        /* FALLTHRU */
+    case INDEX_op_movcond_i64:
+        tcg_out_cmp(s, ext, a1, a2, c2);
+        tcg_fmt_Rdnm_cond(s, INSN_CSEL, ext, a0, REG0(3), REG0(4), args[5]);
         break;
 
     case INDEX_op_qemu_ld8u:
@@ -1553,6 +1568,8 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_brcond_i64, { "r", "rA" } },
     { INDEX_op_setcond_i32, { "r", "r", "rwA" } },
     { INDEX_op_setcond_i64, { "r", "r", "rA" } },
+    { INDEX_op_movcond_i32, { "r", "r", "rwA", "rZ", "rZ" } },
+    { INDEX_op_movcond_i64, { "r", "r", "rwA", "rZ", "rZ" } },
 
     { INDEX_op_qemu_ld8u, { "r", "l" } },
     { INDEX_op_qemu_ld8s, { "r", "l" } },
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 6242136..ff073ca 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -56,7 +56,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      0
-#define TCG_TARGET_HAS_movcond_i32      0
+#define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         0
 #define TCG_TARGET_HAS_sub2_i32         0
 #define TCG_TARGET_HAS_mulu2_i32        0
@@ -84,7 +84,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      0
-#define TCG_TARGET_HAS_movcond_i64      0
+#define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         0
 #define TCG_TARGET_HAS_sub2_i64         0
 #define TCG_TARGET_HAS_mulu2_i64        0
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 15/29] tcg-aarch64: Support deposit
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (13 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 14/29] tcg-aarch64: Support movcond Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 16/29] tcg-aarch64: Support add2, sub2 Richard Henderson
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Also tidy the implementation of ubfm, sbfm, extr in order to share code.
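
The BFM-based deposit maps TCG's (lsb, width) pair onto the
instruction's rotate/size fields as r = (size - lsb) mod size and
s = width - 1. A Python model of the resulting deposit semantics
(a sketch on plain values, not an encoder):

```python
def deposit(dst, src, lsb, width, size=64):
    # Insert the low `width` bits of src into dst at bit `lsb`,
    # as BFI (an alias of BFM) does; bits of dst outside the
    # field are preserved.
    mask = ((1 << width) - 1) << lsb
    return (dst & ~mask & ((1 << size) - 1)) | ((src << lsb) & mask)
```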

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 47 ++++++++++++++++++++++++++++++++++-------------
 tcg/aarch64/tcg-target.h |  4 ++--
 2 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 322660d..0dc3fee 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -285,6 +285,12 @@ typedef enum {
     INSN_ASRV  = 0x1ac02800,
     INSN_RORV  = 0x1ac02c00,
 
+    /* Bitfield instructions */
+    INSN_BFM   = 0x33000000,
+    INSN_SBFM  = 0x13000000,
+    INSN_UBFM  = 0x53000000,
+    INSN_EXTR  = 0x13800000,
+
     /* Conditional select instructions */
     INSN_CSEL  = 0x1a800000,
     INSN_CSINC = 0x1a800400,
@@ -593,28 +599,28 @@ static inline void tcg_out_mul(TCGContext *s, bool ext,
     tcg_out32(s, base | rm << 16 | rn << 5 | rd);
 }
 
+static inline void tcg_out_bfm(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
+                               unsigned int a, unsigned int b)
+{
+    tcg_fmt_Rdn_r_s(s, INSN_BFM, ext, rd, rn, a, b);
+}
+
 static inline void tcg_out_ubfm(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
                                 unsigned int a, unsigned int b)
 {
-    /* Using UBFM 0x53000000 Wd, Wn, a, b */
-    unsigned int base = ext ? 0xd3400000 : 0x53000000;
-    tcg_out32(s, base | a << 16 | b << 10 | rn << 5 | rd);
+    tcg_fmt_Rdn_r_s(s, INSN_UBFM, ext, rd, rn, a, b);
 }
 
 static inline void tcg_out_sbfm(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
                                 unsigned int a, unsigned int b)
 {
-    /* Using SBFM 0x13000000 Wd, Wn, a, b */
-    unsigned int base = ext ? 0x93400000 : 0x13000000;
-    tcg_out32(s, base | a << 16 | b << 10 | rn << 5 | rd);
+    tcg_fmt_Rdn_r_s(s, INSN_SBFM, ext, rd, rn, a, b);
 }
 
 static inline void tcg_out_extr(TCGContext *s, bool ext, TCGReg rd,
                                 TCGReg rn, TCGReg rm, unsigned int a)
 {
-    /* Using EXTR 0x13800000 Wd, Wn, Wm, a */
-    unsigned int base = ext ? 0x93c00000 : 0x13800000;
-    tcg_out32(s, base | rm << 16 | a << 10 | rn << 5 | rd);
+    tcg_fmt_Rdn_r_s(s, INSN_EXTR, ext, rd, rn, a, rm);
 }
 
 static inline void tcg_out_shl(TCGContext *s, bool ext,
@@ -656,6 +662,15 @@ static inline void tcg_out_rotl(TCGContext *s, bool ext,
     tcg_out_extr(s, ext, rd, rn, rn, bits - (m & max));
 }
 
+static inline void tcg_out_dep(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
+                               unsigned lsb, unsigned width)
+{
+    unsigned size = ext ? 64 : 32;
+    unsigned a = (size - lsb) & (size - 1);
+    unsigned b = width - 1;
+    tcg_out_bfm(s, ext, rd, rn, a, b);
+}
+
 static void tcg_out_cmp(TCGContext *s, bool ext, TCGReg a,
                         tcg_target_long b, bool const_b)
 {
@@ -809,8 +824,7 @@ static inline void tcg_out_rev16(TCGContext *s, bool ext, TCGReg rd, TCGReg rm)
 static inline void tcg_out_sxt(TCGContext *s, bool ext, int s_bits,
                                TCGReg rd, TCGReg rn)
 {
-    /* using ALIASes SXTB 0x13001c00, SXTH 0x13003c00, SXTW 0x93407c00
-       of SBFM Xd, Xn, #0, #7|15|31 */
+    /* Using ALIASes SXTB, SXTH, SXTW, of SBFM Xd, Xn, #0, #7|15|31 */
     int bits = 8 * (1 << s_bits) - 1;
     tcg_out_sbfm(s, ext, rd, rn, 0, bits);
 }
@@ -818,8 +832,7 @@ static inline void tcg_out_sxt(TCGContext *s, bool ext, int s_bits,
 static inline void tcg_out_uxt(TCGContext *s, int s_bits,
                                TCGReg rd, TCGReg rn)
 {
-    /* using ALIASes UXTB 0x53001c00, UXTH 0x53003c00
-       of UBFM Wd, Wn, #0, #7|15 */
+    /* Using ALIASes UXTB, UXTH of UBFM Wd, Wn, #0, #7|15 */
     int bits = 8 * (1 << s_bits) - 1;
     tcg_out_ubfm(s, 0, rd, rn, 0, bits);
 }
@@ -1485,6 +1498,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_movr(s, 0, a0, a1);
         break;
 
+    case INDEX_op_deposit_i64:
+    case INDEX_op_deposit_i32:
+        tcg_out_dep(s, ext, a0, REG0(2), args[3], args[4]);
+        break;
+
     case INDEX_op_mov_i64:
     case INDEX_op_mov_i32:
     case INDEX_op_movi_i64:
@@ -1604,6 +1622,9 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_ext16u_i64, { "r", "r" } },
     { INDEX_op_ext32u_i64, { "r", "r" } },
 
+    { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
+    { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
+
     { -1 },
 };
 
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index ff073ca..712e4e7 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -55,7 +55,7 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32          1
 #define TCG_TARGET_HAS_nand_i32         0
 #define TCG_TARGET_HAS_nor_i32          0
-#define TCG_TARGET_HAS_deposit_i32      0
+#define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_movcond_i32      1
 #define TCG_TARGET_HAS_add2_i32         0
 #define TCG_TARGET_HAS_sub2_i32         0
@@ -83,7 +83,7 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i64          1
 #define TCG_TARGET_HAS_nand_i64         0
 #define TCG_TARGET_HAS_nor_i64          0
-#define TCG_TARGET_HAS_deposit_i64      0
+#define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_movcond_i64      1
 #define TCG_TARGET_HAS_add2_i64         0
 #define TCG_TARGET_HAS_sub2_i64         0
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 16/29] tcg-aarch64: Support add2, sub2
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (14 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 15/29] tcg-aarch64: Support deposit Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 17/29] tcg-aarch64: Support muluh, mulsh Richard Henderson
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
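Double-word arithmetic is built from ADDS/SUBS on the low half,
which sets the carry, and ADC/SBC on the high half, which consumes
it (the only constant high parts supported are 0 and -1, via the
identity SBC = rn + ~rm + c). A sketch of the end-to-end result on
values (hypothetical helpers, 64-bit halves):

```python
MASK64 = (1 << 64) - 1

def add2(al, ah, bl, bh):
    # ADDS for the low word produces the carry ADC consumes.
    lo = al + bl
    rl = lo & MASK64
    carry = lo >> 64
    rh = (ah + bh + carry) & MASK64
    return rl, rh

def sub2(al, ah, bl, bh):
    # SUBS for the low word produces the borrow SBC consumes.
    lo = al - bl
    rl = lo & MASK64
    borrow = 1 if lo < 0 else 0
    rh = (ah - bh - borrow) & MASK64
    return rl, rh
```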
 tcg/aarch64/tcg-target.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/aarch64/tcg-target.h |  8 ++---
 2 files changed, 82 insertions(+), 4 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 0dc3fee..0587765 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -110,6 +110,7 @@ static inline void patch_reloc(uint8_t *code_ptr, int type,
 #define TCG_CT_CONST_AIMM 0x200
 #define TCG_CT_CONST_LIMM 0x400
 #define TCG_CT_CONST_ZERO 0x800
+#define TCG_CT_CONST_MONE 0x1000
 
 /* parse target specific constraints */
 static int target_parse_constraint(TCGArgConstraint *ct,
@@ -143,6 +144,9 @@ static int target_parse_constraint(TCGArgConstraint *ct,
     case 'L': /* Valid for logical immediate.  */
         ct->ct |= TCG_CT_CONST_LIMM;
         break;
+    case 'M': /* minus one */
+        ct->ct |= TCG_CT_CONST_MONE;
+        break;
     case 'Z': /* zero */
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
@@ -200,6 +204,9 @@ static int tcg_target_const_match(tcg_target_long val,
     if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
         return 1;
     }
+    if ((ct & TCG_CT_CONST_MONE) && val == -1) {
+        return 1;
+    }
 
     return 0;
 }
@@ -279,6 +286,10 @@ typedef enum {
     INSN_SUB   = 0x4b000000,
     INSN_SUBS  = 0x6b000000,
 
+    /* Add/subtract with carry instructions */
+    INSN_ADC   = 0x1a000000,
+    INSN_SBC   = 0x5a000000,
+
     /* Data-processing (2 source) instructions */
     INSN_LSLV  = 0x1ac02000,
     INSN_LSRV  = 0x1ac02400,
@@ -848,6 +859,47 @@ static void tcg_out_addsubi(TCGContext *s, int ext, TCGReg rd,
     tcg_fmt_Rdn_aimm(s, insn, ext, rd, rn, aimm);
 }
 
+static inline void tcg_out_addsub2(TCGContext *s, int ext, TCGReg rl,
+                                   TCGReg rh, TCGReg al, TCGReg ah,
+                                   tcg_target_long bl, tcg_target_long bh,
+                                   bool const_bl, bool const_bh, bool sub)
+{
+    TCGReg orig_rl = rl;
+    AArch64Insn insn;
+
+    if (rl == ah || (!const_bh && rl == bh)) {
+        rl = TCG_REG_TMP;
+    }
+
+    if (const_bl) {
+        insn = INSN_ADDSI;
+        if ((bl < 0) ^ sub) {
+            insn = INSN_SUBSI;
+            bl = -bl;
+        }
+        tcg_fmt_Rdn_aimm(s, insn, ext, rl, al, bl);
+    } else {
+        tcg_fmt_Rdnm(s, sub ? INSN_SUBS : INSN_ADDS, ext, rl, al, bl);
+    }
+
+    insn = INSN_ADC;
+    if (const_bh) {
+        /* Note that the only two constants we support are 0 and -1, and
+           that SBC = rn + ~rm + c, so adc -1 is sbc 0, and vice-versa.  */
+        if ((bh != 0) ^ sub) {
+            insn = INSN_SBC;
+        }
+        bh = TCG_REG_XZR;
+    } else if (sub) {
+        insn = INSN_SBC;
+    }
+    tcg_fmt_Rdnm(s, insn, ext, rh, ah, bh);
+
+    if (rl != orig_rl) {
+        tcg_out_movr(s, ext, orig_rl, rl);
+    }
+}
+
 static inline void tcg_out_nop(TCGContext *s)
 {
     tcg_out32(s, 0xd503201f);
@@ -1503,6 +1555,27 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_dep(s, ext, a0, REG0(2), args[3], args[4]);
         break;
 
+    case INDEX_op_add2_i32:
+        a2 = (int32_t)args[4];
+        c2 = false;
+        goto do_addsub2;
+    case INDEX_op_add2_i64:
+        a2 = args[4];
+        c2 = false;
+        goto do_addsub2;
+    case INDEX_op_sub2_i32:
+        a2 = (int32_t)args[4];
+        c2 = true;
+        goto do_addsub2;
+    case INDEX_op_sub2_i64:
+        a2 = args[4];
+        c2 = true;
+        goto do_addsub2;
+    do_addsub2:
+        tcg_out_addsub2(s, ext, a0, a1, REG0(2), REG0(3), a2,
+                        args[5], const_args[4], const_args[5], c2);
+        break;
+
     case INDEX_op_mov_i64:
     case INDEX_op_mov_i32:
     case INDEX_op_movi_i64:
@@ -1625,6 +1698,11 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
     { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
 
+    { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rwA", "rwMZ" } },
+    { INDEX_op_add2_i64, { "r", "r", "rZ", "rZ", "rA", "rMZ" } },
+    { INDEX_op_sub2_i32, { "r", "r", "rZ", "rZ", "rwA", "rwMZ" } },
+    { INDEX_op_sub2_i64, { "r", "r", "rZ", "rZ", "rA", "rMZ" } },
+
     { -1 },
 };
 
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 712e4e7..05e43e4 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -57,8 +57,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nor_i32          0
 #define TCG_TARGET_HAS_deposit_i32      1
 #define TCG_TARGET_HAS_movcond_i32      1
-#define TCG_TARGET_HAS_add2_i32         0
-#define TCG_TARGET_HAS_sub2_i32         0
+#define TCG_TARGET_HAS_add2_i32         1
+#define TCG_TARGET_HAS_sub2_i32         1
 #define TCG_TARGET_HAS_mulu2_i32        0
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
@@ -85,8 +85,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nor_i64          0
 #define TCG_TARGET_HAS_deposit_i64      1
 #define TCG_TARGET_HAS_movcond_i64      1
-#define TCG_TARGET_HAS_add2_i64         0
-#define TCG_TARGET_HAS_sub2_i64         0
+#define TCG_TARGET_HAS_add2_i64         1
+#define TCG_TARGET_HAS_sub2_i64         1
 #define TCG_TARGET_HAS_mulu2_i64        0
 #define TCG_TARGET_HAS_muls2_i64        0
 #define TCG_TARGET_HAS_muluh_i64        0
-- 
1.8.3.1
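As a standalone illustration (plain C, not part of the patch), the carry discipline used above can be modeled directly: SBC computes rn + ~rm + C, which is exactly why an adc against constant -1 is the same operation as an sbc against the zero register in the const_bh case.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ADDS/ADC: a 128-bit add built from two 64-bit halves. */
static void add2(uint64_t *rl, uint64_t *rh,
                 uint64_t al, uint64_t ah, uint64_t bl, uint64_t bh)
{
    uint64_t lo = al + bl;
    unsigned c = lo < al;        /* ADDS sets C on unsigned carry-out */
    *rl = lo;
    *rh = ah + bh + c;           /* ADC adds the carry back in */
}

/* Model of SUBS/SBC: SBC computes rn + ~rm + C, with C meaning "no borrow". */
static void sub2(uint64_t *rl, uint64_t *rh,
                 uint64_t al, uint64_t ah, uint64_t bl, uint64_t bh)
{
    unsigned c = al >= bl;       /* SUBS sets C when no borrow occurs */
    *rl = al - bl;
    *rh = ah + ~bh + c;          /* same value as ah - bh - borrow */
}
```

Since ah + (-1) + c equals ah + ~0 + c, "adc #-1" and "sbc xzr" produce identical results, and vice versa, which is the identity the constant-high-part path exploits.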

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [Qemu-devel] [PATCH v3 17/29] tcg-aarch64: Support muluh, mulsh
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (15 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 16/29] tcg-aarch64: Support add2, sub2 Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 18/29] tcg-aarch64: Support div, rem Richard Henderson
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 12 ++++++++++++
 tcg/aarch64/tcg-target.h |  4 ++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 0587765..5ab0596 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -295,6 +295,8 @@ typedef enum {
     INSN_LSRV  = 0x1ac02400,
     INSN_ASRV  = 0x1ac02800,
     INSN_RORV  = 0x1ac02c00,
+    INSN_SMULH = 0x9b407c00,
+    INSN_UMULH = 0x9bc07c00,
 
     /* Bitfield instructions */
     INSN_BFM   = 0x33000000,
@@ -1576,6 +1578,13 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                         args[5], const_args[4], const_args[5], c2);
         break;
 
+    case INDEX_op_muluh_i64:
+        tcg_fmt_Rdnm(s, INSN_UMULH, 1, a0, a1, a2);
+        break;
+    case INDEX_op_mulsh_i64:
+        tcg_fmt_Rdnm(s, INSN_SMULH, 1, a0, a1, a2);
+        break;
+
     case INDEX_op_mov_i64:
     case INDEX_op_mov_i32:
     case INDEX_op_movi_i64:
@@ -1703,6 +1712,9 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_sub2_i32, { "r", "r", "rZ", "rZ", "rwA", "rwMZ" } },
     { INDEX_op_sub2_i64, { "r", "r", "rZ", "rZ", "rA", "rMZ" } },
 
+    { INDEX_op_muluh_i64, { "r", "r", "r" } },
+    { INDEX_op_mulsh_i64, { "r", "r", "r" } },
+
     { -1 },
 };
 
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 05e43e4..8fd6771 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -89,8 +89,8 @@ typedef enum {
 #define TCG_TARGET_HAS_sub2_i64         1
 #define TCG_TARGET_HAS_mulu2_i64        0
 #define TCG_TARGET_HAS_muls2_i64        0
-#define TCG_TARGET_HAS_muluh_i64        0
-#define TCG_TARGET_HAS_mulsh_i64        0
+#define TCG_TARGET_HAS_muluh_i64        1
+#define TCG_TARGET_HAS_mulsh_i64        1
 
 enum {
     TCG_AREG0 = TCG_REG_X19,
-- 
1.8.3.1
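UMULH and SMULH return the high 64 bits of the full 128-bit product. A reference model in plain C (illustrative only; it uses the GCC/Clang __int128 extension, which is available on any 64-bit QEMU host):

```c
#include <assert.h>
#include <stdint.h>

/* High 64 bits of the unsigned 64x64->128 product, as UMULH computes. */
static uint64_t umulh(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* High 64 bits of the signed 64x64->128 product, as SMULH computes. */
static int64_t smulh(int64_t a, int64_t b)
{
    return (int64_t)(((__int128)a * b) >> 64);
}
```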

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [Qemu-devel] [PATCH v3 18/29] tcg-aarch64: Support div, rem
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (16 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 17/29] tcg-aarch64: Support muluh, mulsh Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 19/29] tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s Richard Henderson
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

For remainder, generic code will produce mul+sub,
whereas we can implement it with a single msub.


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 50 +++++++++++++++++++++++++++++++++++++++---------
 tcg/aarch64/tcg-target.h |  8 ++++----
 2 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 5ab0596..09ccd67 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -297,6 +297,12 @@ typedef enum {
     INSN_RORV  = 0x1ac02c00,
     INSN_SMULH = 0x9b407c00,
     INSN_UMULH = 0x9bc07c00,
+    INSN_UDIV  = 0x1ac00800,
+    INSN_SDIV  = 0x1ac00c00,
+
+    /* Data-processing (3 source) instructions */
+    INSN_MADD  = 0x1b000000,
+    INSN_MSUB  = 0x1b008000,
 
     /* Bitfield instructions */
     INSN_BFM   = 0x33000000,
@@ -398,6 +404,12 @@ static inline void tcg_fmt_Rdnm(TCGContext *s, AArch64Insn insn, bool ext,
     tcg_out32(s, insn | ext << 31 | rm << 16 | rn << 5 | rd);
 }
 
+static inline void tcg_fmt_Rdnma(TCGContext *s, AArch64Insn insn, bool ext,
+                                 TCGReg rd, TCGReg rn, TCGReg rm, TCGReg ra)
+{
+    tcg_out32(s, insn | ext << 31 | rm << 16 | ra << 10 | rn << 5 | rd);
+}
+
 static inline void tcg_fmt_Rdnm_shift(TCGContext *s, AArch64Insn insn,
                                       bool ext, TCGReg rd, TCGReg rn,
                                       TCGReg rm, int shift_imm)
@@ -604,14 +616,6 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
                  arg, arg1, arg2);
 }
 
-static inline void tcg_out_mul(TCGContext *s, bool ext,
-                               TCGReg rd, TCGReg rn, TCGReg rm)
-{
-    /* Using MADD 0x1b000000 with Ra = wzr alias MUL 0x1b007c00 */
-    unsigned int base = ext ? 0x9b007c00 : 0x1b007c00;
-    tcg_out32(s, base | rm << 16 | rn << 5 | rd);
-}
-
 static inline void tcg_out_bfm(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
                                unsigned int a, unsigned int b)
 {
@@ -1404,7 +1408,27 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_mul_i64:
     case INDEX_op_mul_i32:
-        tcg_out_mul(s, ext, a0, a1, a2);
+        tcg_fmt_Rdnma(s, INSN_MADD, ext, a0, a1, a2, TCG_REG_XZR);
+        break;
+
+    case INDEX_op_div_i64:
+    case INDEX_op_div_i32:
+        tcg_fmt_Rdnm(s, INSN_SDIV, ext, a0, a1, a2);
+        break;
+    case INDEX_op_divu_i64:
+    case INDEX_op_divu_i32:
+        tcg_fmt_Rdnm(s, INSN_UDIV, ext, a0, a1, a2);
+        break;
+
+    case INDEX_op_rem_i64:
+    case INDEX_op_rem_i32:
+        tcg_fmt_Rdnm(s, INSN_SDIV, ext, TCG_REG_TMP, a1, a2);
+        tcg_fmt_Rdnma(s, INSN_MSUB, ext, a0, TCG_REG_TMP, a2, a1);
+        break;
+    case INDEX_op_remu_i64:
+    case INDEX_op_remu_i32:
+        tcg_fmt_Rdnm(s, INSN_UDIV, ext, TCG_REG_TMP, a1, a2);
+        tcg_fmt_Rdnma(s, INSN_MSUB, ext, a0, TCG_REG_TMP, a2, a1);
         break;
 
     case INDEX_op_shl_i64:
@@ -1637,6 +1661,14 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_sub_i64, { "r", "rZ", "rA" } },
     { INDEX_op_mul_i32, { "r", "r", "r" } },
     { INDEX_op_mul_i64, { "r", "r", "r" } },
+    { INDEX_op_div_i32, { "r", "r", "r" } },
+    { INDEX_op_div_i64, { "r", "r", "r" } },
+    { INDEX_op_divu_i32, { "r", "r", "r" } },
+    { INDEX_op_divu_i64, { "r", "r", "r" } },
+    { INDEX_op_rem_i32, { "r", "r", "r" } },
+    { INDEX_op_rem_i64, { "r", "r", "r" } },
+    { INDEX_op_remu_i32, { "r", "r", "r" } },
+    { INDEX_op_remu_i64, { "r", "r", "r" } },
     { INDEX_op_and_i32, { "r", "r", "rwL" } },
     { INDEX_op_and_i64, { "r", "r", "rL" } },
     { INDEX_op_or_i32, { "r", "r", "rwL" } },
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 8fd6771..bf72e62 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -39,8 +39,8 @@ typedef enum {
 #define TCG_TARGET_CALL_STACK_OFFSET    0
 
 /* optional instructions */
-#define TCG_TARGET_HAS_div_i32          0
-#define TCG_TARGET_HAS_rem_i32          0
+#define TCG_TARGET_HAS_div_i32          1
+#define TCG_TARGET_HAS_rem_i32          1
 #define TCG_TARGET_HAS_ext8s_i32        1
 #define TCG_TARGET_HAS_ext16s_i32       1
 #define TCG_TARGET_HAS_ext8u_i32        1
@@ -64,8 +64,8 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 
-#define TCG_TARGET_HAS_div_i64          0
-#define TCG_TARGET_HAS_rem_i64          0
+#define TCG_TARGET_HAS_div_i64          1
+#define TCG_TARGET_HAS_rem_i64          1
 #define TCG_TARGET_HAS_ext8s_i64        1
 #define TCG_TARGET_HAS_ext16s_i64       1
 #define TCG_TARGET_HAS_ext32s_i64       1
-- 
1.8.3.1
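The remainder lowering above relies on MSUB computing ra - rn * rm: with the quotient from SDIV/UDIV parked in a temporary, one msub finishes the job. A C sketch of the identity (illustrative; AArch64 division truncates toward zero, matching C semantics):

```c
#include <assert.h>
#include <stdint.h>

/* rem = a - (a / b) * b: the SDIV + MSUB sequence the patch emits. */
static int64_t rem_sdiv_msub(int64_t a, int64_t b)
{
    int64_t q = a / b;           /* SDIV: truncating division */
    return a - q * b;            /* MSUB: rd = ra - rn * rm */
}
```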


* [Qemu-devel] [PATCH v3 19/29] tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (17 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 18/29] tcg-aarch64: Support div, rem Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-05 13:32   ` Claudio Fontana
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 20/29] tcg-aarch64: Improve tcg_out_movi Richard Henderson
                   ` (11 subsequent siblings)
  30 siblings, 1 reply; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Cleaning up the implementation of tcg_out_movi at the same time.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 48 ++++++++++++++++++++++--------------------------
 1 file changed, 22 insertions(+), 26 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 09ccd67..59e5026 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -274,6 +274,11 @@ typedef enum {
     INSN_EOR   = 0x4a000000,
     INSN_EON   = 0x4a200000,
 
+    /* Move wide immediate instructions */
+    INSN_MOVN  = 0x12800000,
+    INSN_MOVZ  = 0x52800000,
+    INSN_MOVK  = 0x72800000,
+
     /* Add/subtract immediate instructions */
     INSN_ADDI  = 0x11000000,
     INSN_ADDSI = 0x31000000,
@@ -478,6 +483,12 @@ static inline void tcg_fmt_Rdnm_cond(TCGContext *s, AArch64Insn insn,
               | tcg_cond_to_aarch64[c] << 12);
 }
 
+static inline void tcg_fmt_Rd_uimm_s(TCGContext *s, AArch64Insn insn, bool ext,
+                                     TCGReg rd, uint16_t half, unsigned shift)
+{
+    tcg_out32(s, insn | ext << 31 | shift << 17 | half << 5 | rd);
+}
+
 static inline void tcg_out_ldst_9(TCGContext *s,
                                   enum aarch64_ldst_op_data op_data,
                                   enum aarch64_ldst_op_type op_type,
@@ -522,38 +533,23 @@ static inline void tcg_out_movr_sp(TCGContext *s, bool ext,
     tcg_fmt_Rdn_aimm(s, INSN_ADDI, ext, rd, rn, 0);
 }
 
-static inline void tcg_out_movi_aux(TCGContext *s,
-                                    TCGReg rd, uint64_t value)
+static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
+                         tcg_target_long value)
 {
-    uint32_t half, base, shift, movk = 0;
-    /* construct halfwords of the immediate with MOVZ/MOVK with LSL */
-    /* using MOVZ 0x52800000 | extended reg.. */
-    base = (value > 0xffffffff) ? 0xd2800000 : 0x52800000;
-    /* count trailing zeros in 16 bit steps, mapping 64 to 0. Emit the
-       first MOVZ with the half-word immediate skipping the zeros, with a shift
-       (LSL) equal to this number. Then morph all next instructions into MOVKs.
-       Zero the processed half-word in the value, continue until empty.
-       We build the final result 16bits at a time with up to 4 instructions,
-       but do not emit instructions for 16bit zero holes. */
+    AArch64Insn insn = INSN_MOVZ;
+
+    if (type == TCG_TYPE_I32) {
+        value = (uint32_t)value;
+    }
+
     do {
-        shift = ctz64(value) & (63 & -16);
-        half = (value >> shift) & 0xffff;
-        tcg_out32(s, base | movk | shift << 17 | half << 5 | rd);
-        movk = 0x20000000; /* morph next MOVZs into MOVKs */
+        unsigned shift = ctz64(value) & (63 & -16);
+        tcg_fmt_Rd_uimm_s(s, insn, shift >= 32, rd, value >> shift, shift);
         value &= ~(0xffffUL << shift);
+        insn = INSN_MOVK;
     } while (value);
 }
 
-static inline void tcg_out_movi(TCGContext *s, TCGType type,
-                                TCGReg rd, tcg_target_long value)
-{
-    if (type == TCG_TYPE_I64) {
-        tcg_out_movi_aux(s, rd, value);
-    } else {
-        tcg_out_movi_aux(s, rd, value & 0xffffffff);
-    }
-}
-
 static inline void tcg_out_ldst_r(TCGContext *s,
                                   enum aarch64_ldst_op_data op_data,
                                   enum aarch64_ldst_op_type op_type,
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 20/29] tcg-aarch64: Improve tcg_out_movi
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (18 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 19/29] tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 21/29] tcg-aarch64: Avoid add with zero in tlb load Richard Henderson
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Handle small positive and negative numbers early.  Check for logical
immediates.  Check whether beginning with MOVN reduces the insn count.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 61 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 53 insertions(+), 8 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 59e5026..54f5ce8 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -534,20 +534,65 @@ static inline void tcg_out_movr_sp(TCGContext *s, bool ext,
 }
 
 static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
-                         tcg_target_long value)
+                         tcg_target_long svalue)
 {
-    AArch64Insn insn = INSN_MOVZ;
+    tcg_target_ulong value = svalue;
+    AArch64Insn insn;
+    bool ext = true;
+    int wantinv, shift;
 
     if (type == TCG_TYPE_I32) {
-        value = (uint32_t)value;
+        value = (uint32_t)svalue;
+        svalue = (int32_t)svalue;
+        ext = false;
+    } else if ((value & ~0xffffffffull) == 0) {
+        svalue = (int32_t)svalue;
+        ext = false;
+    }
+
+    /* Check small positive values.  */
+    if ((value & ~0xffffull) == 0) {
+        tcg_fmt_Rd_uimm_s(s, INSN_MOVZ, 0, rd, value, 0);
+        return;
+    }
+    /* Check small negative values.  */
+    if ((~svalue & 0xffffull) == 0) {
+        tcg_fmt_Rd_uimm_s(s, INSN_MOVN, ext, rd, ~svalue, 0);
+        return;
     }
+    /* Check for bitfield immediates.  */
+    if (is_limm(svalue)) {
+        tcg_fmt_Rdn_limm(s, INSN_ORRI, ext, rd, TCG_REG_XZR, svalue);
+        return;
+    }
+
+    /* Would it take fewer insns to begin with MOVN?  */
+    insn = INSN_MOVZ;
+    wantinv = 0;
+    wantinv += (value & 0x000000000000ffffull) == 0;
+    wantinv += (value & 0x00000000ffff0000ull) == 0;
+    wantinv += (value & 0x0000ffff00000000ull) == 0;
+    wantinv += (value & 0xffff000000000000ull) == 0;
+    wantinv = -wantinv;
+    wantinv += (~value & 0x000000000000ffffull) == 0;
+    wantinv += (~value & 0x00000000ffff0000ull) == 0;
+    wantinv += (~value & 0x0000ffff00000000ull) == 0;
+    wantinv += (~value & 0xffff000000000000ull) == 0;
+    if (wantinv > 0) {
+        insn = INSN_MOVN;
+        value = ~value;
+    }
+
+    shift = ctz64(value) & (63 & -16);
+    tcg_fmt_Rd_uimm_s(s, insn, ext, rd, value >> shift, shift);
+    value &= ~(0xffffUL << shift);
 
-    do {
-        unsigned shift = ctz64(value) & (63 & -16);
-        tcg_fmt_Rd_uimm_s(s, insn, shift >= 32, rd, value >> shift, shift);
+    while (value) {
+        shift = ctz64(value) & (63 & -16);
+        tcg_fmt_Rd_uimm_s(s, INSN_MOVK, ext, rd,
+                          (value ^ -(wantinv > 0)) >> shift, shift);
         value &= ~(0xffffUL << shift);
-        insn = INSN_MOVK;
-    } while (value);
+    }
 }
 
 static inline void tcg_out_ldst_r(TCGContext *s,
-- 
1.8.3.1
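The wantinv bookkeeping above simply compares how many 16-bit chunks of the value are all-zero against how many are all-one; when more are all-one, starting from MOVN (which sets every other halfword to ones for free) and patching with inverted MOVKs is shorter. A standalone sketch of that heuristic (illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* Return nonzero when more halfwords of 'value' are 0xffff than are
   0x0000, i.e. when beginning with MOVN saves instructions over MOVZ. */
static int prefer_movn(uint64_t value)
{
    int zeros = 0, ones = 0;
    for (int shift = 0; shift < 64; shift += 16) {
        uint16_t h = value >> shift;
        zeros += (h == 0);
        ones += (h == 0xffff);
    }
    return ones > zeros;
}
```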


* [Qemu-devel] [PATCH v3 21/29] tcg-aarch64: Avoid add with zero in tlb load
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (19 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 20/29] tcg-aarch64: Improve tcg_out_movi Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 22/29] tcg-aarch64: Use adrp in tcg_out_movi Richard Henderson
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Some guest env structures are small enough to reach the TLB with only a 12-bit addition.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 54f5ce8..ddf1ece 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1045,46 +1045,58 @@ static void add_qemu_ldst_label(TCGContext *s, int is_ld, int opc,
    slow path for the failure case, which will be patched later when finalizing
    the slow path. Generated code returns the host addend in X1,
    clobbers X0,X2,X3,TMP. */
-static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg,
-            int s_bits, uint8_t **label_ptr, int mem_index, int is_read)
+static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, int s_bits,
+                             uint8_t **label_ptr, int mem_index, int is_read)
 {
     TCGReg base = TCG_AREG0;
     int tlb_offset = is_read ?
         offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
         : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write);
+
     /* Extract the TLB index from the address into X0.
        X0<CPU_TLB_BITS:0> =
        addr_reg<TARGET_PAGE_BITS+CPU_TLB_BITS:TARGET_PAGE_BITS> */
     tcg_out_ubfm(s, (TARGET_LONG_BITS == 64), TCG_REG_X0, addr_reg,
                  TARGET_PAGE_BITS, TARGET_PAGE_BITS + CPU_TLB_BITS);
+
     /* Store the page mask part of the address and the low s_bits into X3.
        Later this allows checking for equality and alignment at the same time.
        X3 = addr_reg & (PAGE_MASK | ((1 << s_bits) - 1)) */
     tcg_fmt_Rdn_limm(s, INSN_ANDI, TARGET_LONG_BITS == 64, TCG_REG_X3,
                      addr_reg, TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+
     /* Add any "high bits" from the tlb offset to the env address into X2,
        to take advantage of the LSL12 form of the ADDI instruction.
        X2 = env + (tlb_offset & 0xfff000) */
-    tcg_fmt_Rdn_aimm(s, INSN_ADDI, 1, TCG_REG_X2, base, tlb_offset & 0xfff000);
+    if (tlb_offset & 0xfff000) {
+        tcg_fmt_Rdn_aimm(s, INSN_ADDI, 1, TCG_REG_X2, base,
+                         tlb_offset & 0xfff000);
+        base = TCG_REG_X2;
+    }
+
     /* Merge the tlb index contribution into X2.
        X2 = X2 + (X0 << CPU_TLB_ENTRY_BITS) */
-    tcg_fmt_Rdnm_shift(s, INSN_ADD, 1, TCG_REG_X2, TCG_REG_X2,
+    tcg_fmt_Rdnm_shift(s, INSN_ADD, 1, TCG_REG_X2, base,
                        TCG_REG_X0, -CPU_TLB_ENTRY_BITS);
+
     /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
        X0 = load [X2 + (tlb_offset & 0x000fff)] */
     tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? LDST_64 : LDST_32,
                  LDST_LD, TCG_REG_X0, TCG_REG_X2,
                  (tlb_offset & 0xfff));
+
     /* Load the tlb addend. Do that early to avoid stalling.
        X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */
     tcg_out_ldst(s, LDST_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
                  (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) -
                  (is_read ? offsetof(CPUTLBEntry, addr_read)
                   : offsetof(CPUTLBEntry, addr_write)));
+
     /* Perform the address comparison. */
     tcg_out_cmp(s, (TARGET_LONG_BITS == 64), TCG_REG_X0, TCG_REG_X3, 0);
-    *label_ptr = s->code_ptr;
+
     /* If not equal, we jump to the slow path. */
+    *label_ptr = s->code_ptr;
     tcg_out_goto_cond_noaddr(s, TCG_COND_NE);
 }
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 22/29] tcg-aarch64: Use adrp in tcg_out_movi
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (20 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 21/29] tcg-aarch64: Avoid add with zero in tlb load Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 23/29] tcg-aarch64: Pass return address to load/store helpers directly Richard Henderson
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Loading a QEMU pointer as an immediate happens often.  E.g.

- exit_tb $0x7fa8140013
+ exit_tb $0x7f81ee0013
...
- :  d2800260        mov     x0, #0x13
- :  f2b50280        movk    x0, #0xa814, lsl #16
- :  f2c00fe0        movk    x0, #0x7f, lsl #32
+ :  90ff1000        adrp    x0, 0x7f81ee0000
+ :  91004c00        add     x0, x0, #0x13

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index ddf1ece..be74d2b 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -318,6 +318,10 @@ typedef enum {
     /* Conditional select instructions */
     INSN_CSEL  = 0x1a800000,
     INSN_CSINC = 0x1a800400,
+
+    /* PC relative addressing instructions */
+    INSN_ADR   = 0x10000000,
+    INSN_ADRP  = 0x90000000,
 } AArch64Insn;
 
 static inline enum aarch64_ldst_op_data
@@ -489,6 +493,12 @@ static inline void tcg_fmt_Rd_uimm_s(TCGContext *s, AArch64Insn insn, bool ext,
     tcg_out32(s, insn | ext << 31 | shift << 17 | half << 5 | rd);
 }
 
+static inline void tcg_fmt_Rd_disp21(TCGContext *s, AArch64Insn insn,
+                                     TCGReg rd, tcg_target_long disp)
+{
+    tcg_out32(s, insn | (disp & 3) << 29 | (disp & 0x1ffffc) << (5 - 2) | rd);
+}
+
 static inline void tcg_out_ldst_9(TCGContext *s,
                                   enum aarch64_ldst_op_data op_data,
                                   enum aarch64_ldst_op_type op_type,
@@ -566,6 +576,17 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
         return;
     }
 
+    /* Look for host pointer values within 4G of the PC.  This happens
+       often when loading pointers to QEMU's data structures.  */
+    svalue = (value >> 12) - ((intptr_t)s->code_ptr >> 12);
+    if (svalue == sextract64(svalue, 0, 21)) {
+        tcg_fmt_Rd_disp21(s, INSN_ADRP, rd, svalue);
+        if (value & 0xfff) {
+            tcg_fmt_Rdn_aimm(s, INSN_ADDI, ext, rd, rd, value & 0xfff);
+        }
+        return;
+    }
+
     /* Would it take fewer insns to begin with MOVN?  */
     insn = INSN_MOVZ;
     wantinv = 0;
-- 
1.8.3.1
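ADRP addresses by 4K pages with a signed 21-bit page displacement, so any target within roughly +/- 4GB of the PC is reachable in one insn, plus one ADDI for the low 12 bits. The reachability test used in the patch can be checked standalone (illustrative C; sextract64 reimplemented here with the same signature as QEMU's bitops helper):

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extract bits [start, start+length), as QEMU's sextract64 does. */
static int64_t sextract64(uint64_t value, int start, int length)
{
    return (int64_t)(value << (64 - length - start)) >> (64 - length);
}

/* Can 'target' be reached from 'pc' with ADRP (+ optional ADDI)?  The
   page delta must survive a round trip through a signed 21-bit field. */
static int adrp_reaches(uint64_t target, uint64_t pc)
{
    int64_t disp = (int64_t)(target >> 12) - (int64_t)(pc >> 12);
    return disp == sextract64(disp, 0, 21);
}
```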


* [Qemu-devel] [PATCH v3 23/29] tcg-aarch64: Pass return address to load/store helpers directly.
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (21 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 22/29] tcg-aarch64: Use adrp in tcg_out_movi Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 24/29] tcg-aarch64: Use tcg_out_call for qemu_ld/st Richard Henderson
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/exec-all.h  | 18 ------------------
 tcg/aarch64/tcg-target.c | 44 ++++++++++++++++++++++++++------------------
 2 files changed, 26 insertions(+), 36 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 7510246..9a3ec05 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -320,24 +320,6 @@ extern uintptr_t tci_tb_ptr;
 
 #define GETPC()  (GETRA() - GETPC_ADJ)
 
-/* The LDST optimizations splits code generation into fast and slow path.
-   In some implementations, we pass the "logical" return address manually;
-   in others, we must infer the logical return from the true return.  */
-#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
-# if defined(__aarch64__)
-#  define GETRA_LDST(RA)  tcg_getra_ldst(RA)
-static inline uintptr_t tcg_getra_ldst(uintptr_t ra)
-{
-    int32_t b;
-    ra += 4;                    /* skip one instruction */
-    b = *(int32_t *)ra;         /* load the branch insn */
-    b = (b << 6) >> (6 - 2);    /* extract the displacement */
-    ra += b;                    /* apply the displacement  */
-    return ra;
-}
-# endif
-#endif /* CONFIG_QEMU_LDST_OPTIMIZATION */
-
 /* ??? Delete these once they are no longer used.  */
 bool is_tcg_gen_code(uintptr_t pc_ptr);
 #ifdef GETRA_LDST
diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index be74d2b..1d0db02 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -968,39 +968,46 @@ static inline void tcg_out_addsub2(TCGContext *s, int ext, TCGReg rl,
     }
 }
 
-static inline void tcg_out_nop(TCGContext *s)
-{
-    tcg_out32(s, 0xd503201f);
-}
-
 #ifdef CONFIG_SOFTMMU
-/* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
-   int mmu_idx) */
+/* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
+ *                                     int mmu_idx, uintptr_t ra)
+ */
 static const void * const qemu_ld_helpers[4] = {
-    helper_ldb_mmu,
-    helper_ldw_mmu,
-    helper_ldl_mmu,
-    helper_ldq_mmu,
+    helper_ret_ldub_mmu,
+    helper_ret_lduw_mmu,
+    helper_ret_ldul_mmu,
+    helper_ret_ldq_mmu,
 };
 
-/* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
-   uintxx_t val, int mmu_idx) */
+/* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
+ *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
+ */
 static const void * const qemu_st_helpers[4] = {
-    helper_stb_mmu,
-    helper_stw_mmu,
-    helper_stl_mmu,
-    helper_stq_mmu,
+    helper_ret_stb_mmu,
+    helper_ret_stw_mmu,
+    helper_ret_stl_mmu,
+    helper_ret_stq_mmu,
 };
 
+static inline void tcg_out_adr(TCGContext *s, TCGReg rd, tcg_target_long addr)
+{
+    addr -= (tcg_target_long)s->code_ptr;
+    assert(addr == sextract64(addr, 0, 21));
+    tcg_fmt_Rd_disp21(s, INSN_ADR, rd, addr);
+}
+
 static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
     reloc_pc19(lb->label_ptr[0], (tcg_target_long)s->code_ptr);
     tcg_out_movr(s, 1, TCG_REG_X0, TCG_AREG0);
     tcg_out_movr(s, (TARGET_LONG_BITS == 64), TCG_REG_X1, lb->addrlo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, lb->mem_index);
+    tcg_out_adr(s, TCG_REG_X3, (uintptr_t)lb->raddr);
+
     tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP,
                  (tcg_target_long)qemu_ld_helpers[lb->opc & 3]);
     tcg_out_callr(s, TCG_REG_TMP);
+
     if (lb->opc & 0x04) {
         tcg_out_sxt(s, 1, lb->opc & 3, lb->datalo_reg, TCG_REG_X0);
     } else {
@@ -1018,11 +1025,12 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movr(s, (TARGET_LONG_BITS == 64), TCG_REG_X1, lb->addrlo_reg);
     tcg_out_movr(s, 1, TCG_REG_X2, lb->datalo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, lb->mem_index);
+    tcg_out_adr(s, TCG_REG_X4, (uintptr_t)lb->raddr);
+
     tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP,
                  (tcg_target_long)qemu_st_helpers[lb->opc & 3]);
     tcg_out_callr(s, TCG_REG_TMP);
 
-    tcg_out_nop(s);
     tcg_out_goto(s, (tcg_target_long)lb->raddr);
 }
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 24/29] tcg-aarch64: Use tcg_out_call for qemu_ld/st
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (22 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 23/29] tcg-aarch64: Pass return address to load/store helpers directly Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 25/29] tcg-aarch64: Use symbolic names for branches Richard Henderson
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

In some cases, a direct branch will be in range.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 1d0db02..42edf9e 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1004,9 +1004,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, lb->mem_index);
     tcg_out_adr(s, TCG_REG_X3, (uintptr_t)lb->raddr);
 
-    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP,
-                 (tcg_target_long)qemu_ld_helpers[lb->opc & 3]);
-    tcg_out_callr(s, TCG_REG_TMP);
+    tcg_out_call(s, (tcg_target_long)qemu_ld_helpers[lb->opc & 3]);
 
     if (lb->opc & 0x04) {
         tcg_out_sxt(s, 1, lb->opc & 3, lb->datalo_reg, TCG_REG_X0);
@@ -1027,9 +1025,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, lb->mem_index);
     tcg_out_adr(s, TCG_REG_X4, (uintptr_t)lb->raddr);
 
-    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP,
-                 (tcg_target_long)qemu_st_helpers[lb->opc & 3]);
-    tcg_out_callr(s, TCG_REG_TMP);
+    tcg_out_call(s, (tcg_target_long)qemu_st_helpers[lb->opc & 3]);
 
     tcg_out_goto(s, (tcg_target_long)lb->raddr);
 }
-- 
1.8.3.1

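The "in range" case the commit message refers to is the reach of BL: a signed 26-bit count of 32-bit words, i.e. ±128 MiB from the call site; `tcg_out_call` falls back to `tcg_out_movi` plus BLR otherwise. A minimal sketch of that decision, where `in_bl_range` is an illustrative local name mirroring the bound check inside `tcg_out_call`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* B and BL hold a signed 26-bit count of 4-byte words, reaching
   +/-128 MiB from the instruction.  Mirrors the range test in
   tcg_out_call; the helper name is local to this sketch. */
static bool in_bl_range(intptr_t target, intptr_t code_ptr)
{
    intptr_t offset = (target - code_ptr) / 4;
    return offset >= -0x02000000 && offset < 0x02000000;
}
```

With the code buffer at 0, a target just under +128 MiB is direct while one word further is not.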

* [Qemu-devel] [PATCH v3 25/29] tcg-aarch64: Use symbolic names for branches
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (23 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 24/29] tcg-aarch64: Use tcg_out_call for qemu_ld/st Richard Henderson
@ 2013-09-02 17:54 ` Richard Henderson
  2013-09-02 17:55 ` [Qemu-devel] [PATCH v3 26/29] tcg-aarch64: Implement tcg_register_jit Richard Henderson
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 48 ++++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 42edf9e..385d97a 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -322,6 +322,14 @@ typedef enum {
     /* PC relative addressing instructions */
     INSN_ADR   = 0x10000000,
     INSN_ADRP  = 0x90000000,
+
+    /* Branch instructions */
+    INSN_B     = 0x14000000,
+    INSN_BL    = 0x94000000,
+    INSN_BR    = 0xd61f0000,
+    INSN_BLR   = 0xd63f0000,
+    INSN_RET   = 0xd65f0000,
+    INSN_B_C   = 0x54000000,
 } AArch64Insn;
 
 static inline enum aarch64_ldst_op_data
@@ -769,15 +777,14 @@ static void tcg_out_cmp(TCGContext *s, bool ext, TCGReg a,
 
 static inline void tcg_out_goto(TCGContext *s, tcg_target_long target)
 {
-    tcg_target_long offset;
-    offset = (target - (tcg_target_long)s->code_ptr) / 4;
+    tcg_target_long offset = (target - (tcg_target_long)s->code_ptr) / 4;
 
     if (offset < -0x02000000 || offset >= 0x02000000) {
         /* out of 26bit range */
         tcg_abort();
     }
 
-    tcg_out32(s, 0x14000000 | (offset & 0x03ffffff));
+    tcg_out32(s, INSN_B | (offset & 0x03ffffff));
 }
 
 static inline void tcg_out_goto_noaddr(TCGContext *s)
@@ -787,25 +794,21 @@ static inline void tcg_out_goto_noaddr(TCGContext *s)
        kept coherent during retranslation.
        Mask away possible garbage in the high bits for the first translation,
        while keeping the offset bits for retranslation. */
-    uint32_t insn;
-    insn = (tcg_in32(s) & 0x03ffffff) | 0x14000000;
-    tcg_out32(s, insn);
+    uint32_t old = tcg_in32(s) & 0x03ffffff;
+    tcg_out32(s, INSN_B | old);
 }
 
 static inline void tcg_out_goto_cond_noaddr(TCGContext *s, TCGCond c)
 {
-    /* see comments in tcg_out_goto_noaddr */
-    uint32_t insn;
-    insn = tcg_in32(s) & (0x07ffff << 5);
-    insn |= 0x54000000 | tcg_cond_to_aarch64[c];
-    tcg_out32(s, insn);
+    /* See comments in tcg_out_goto_noaddr.  */
+    uint32_t old = tcg_in32(s) & (0x07ffff << 5);
+    tcg_out32(s, INSN_B_C | tcg_cond_to_aarch64[c] | old);
 }
 
 static inline void tcg_out_goto_cond(TCGContext *s, TCGCond c,
                                      tcg_target_long target)
 {
-    tcg_target_long offset;
-    offset = (target - (tcg_target_long)s->code_ptr) / 4;
+    tcg_target_long offset = (target - (tcg_target_long)s->code_ptr) / 4;
 
     if (offset < -0x40000 || offset >= 0x40000) {
         /* out of 19bit range */
@@ -813,37 +816,34 @@ static inline void tcg_out_goto_cond(TCGContext *s, TCGCond c,
     }
 
     offset &= 0x7ffff;
-    tcg_out32(s, 0x54000000 | tcg_cond_to_aarch64[c] | offset << 5);
+    tcg_out32(s, INSN_B_C | tcg_cond_to_aarch64[c] | offset << 5);
 }
 
 static inline void tcg_out_callr(TCGContext *s, TCGReg reg)
 {
-    tcg_out32(s, 0xd63f0000 | reg << 5);
+    tcg_out32(s, INSN_BLR | reg << 5);
 }
 
 static inline void tcg_out_gotor(TCGContext *s, TCGReg reg)
 {
-    tcg_out32(s, 0xd61f0000 | reg << 5);
+    tcg_out32(s, INSN_BR | reg << 5);
 }
 
 static inline void tcg_out_call(TCGContext *s, tcg_target_long target)
 {
-    tcg_target_long offset;
-
-    offset = (target - (tcg_target_long)s->code_ptr) / 4;
+    tcg_target_long offset = (target - (tcg_target_long)s->code_ptr) / 4;
 
     if (offset < -0x02000000 || offset >= 0x02000000) { /* out of 26bit rng */
         tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, target);
         tcg_out_callr(s, TCG_REG_TMP);
     } else {
-        tcg_out32(s, 0x94000000 | (offset & 0x03ffffff));
+        tcg_out32(s, INSN_BL | (offset & 0x03ffffff));
     }
 }
 
-static inline void tcg_out_ret(TCGContext *s)
+static inline void tcg_out_ret(TCGContext *s, TCGReg rn)
 {
-    /* emit RET { LR } */
-    tcg_out32(s, 0xd65f03c0);
+    tcg_out32(s, INSN_RET | rn << 5);
 }
 
 void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
@@ -1923,5 +1923,5 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     /* pop (FP, LR), restore SP to previous frame, return */
     tcg_out_pop_pair(s, TCG_REG_SP,
                      TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
-    tcg_out_ret(s);
+    tcg_out_ret(s, TCG_REG_LR);
 }
-- 
1.8.3.1

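A quick cross-check that the symbolic names above compose to the literals they replace: the register-indirect branches BR/BLR/RET all put Rn in bits [9:5], so `INSN_RET | lr << 5` must reproduce the old hard-coded `0xd65f03c0`. The helper below is a local illustration; the patch itself open-codes `insn | reg << 5` at each call site:

```c
#include <assert.h>
#include <stdint.h>

/* Opcode values taken from the AArch64Insn additions in the patch.
   Defines rather than an enum, since the values exceed INT_MAX. */
#define INSN_BR  0xd61f0000u
#define INSN_BLR 0xd63f0000u
#define INSN_RET 0xd65f0000u

/* BR/BLR/RET place the target register Rn in bits [9:5]. */
static uint32_t encode_reg_branch(uint32_t insn, unsigned rn)
{
    return insn | rn << 5;
}
```

`encode_reg_branch(INSN_RET, 30)` gives `0xd65f03c0`, matching the "emit RET { LR }" constant deleted by the patch.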

* [Qemu-devel] [PATCH v3 26/29] tcg-aarch64: Implement tcg_register_jit
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (24 preceding siblings ...)
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 25/29] tcg-aarch64: Use symbolic names for branches Richard Henderson
@ 2013-09-02 17:55 ` Richard Henderson
  2013-09-02 17:55 ` [Qemu-devel] [PATCH v3 27/29] tcg-aarch64: Reuse FP and LR in translated code Richard Henderson
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 93 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 70 insertions(+), 23 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 385d97a..be51d97 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1860,30 +1860,27 @@ static void tcg_target_init(TCGContext *s)
     tcg_add_target_add_op_defs(aarch64_op_defs);
 }
 
+/* Saving pairs: (X19, X20) .. (X27, X28), (X29(fp), X30(lr)).  */
+#define PUSH_SIZE  ((30 - 19 + 1) * 8)
+
+#define FRAME_SIZE \
+    ((PUSH_SIZE \
+      + TCG_STATIC_CALL_ARGS_SIZE \
+      + CPU_TEMP_BUF_NLONGS * sizeof(long) \
+      + TCG_TARGET_STACK_ALIGN - 1) \
+     & ~(TCG_TARGET_STACK_ALIGN - 1))
+
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
-    /* NB: frame sizes are in 16 byte stack units! */
-    int frame_size_callee_saved, frame_size_tcg_locals;
     TCGReg r;
 
-    /* save pairs             (FP, LR) and (X19, X20) .. (X27, X28) */
-    frame_size_callee_saved = (1) + (TCG_REG_X28 - TCG_REG_X19) / 2 + 1;
-
-    /* frame size requirement for TCG local variables */
-    frame_size_tcg_locals = TCG_STATIC_CALL_ARGS_SIZE
-        + CPU_TEMP_BUF_NLONGS * sizeof(long)
-        + (TCG_TARGET_STACK_ALIGN - 1);
-    frame_size_tcg_locals &= ~(TCG_TARGET_STACK_ALIGN - 1);
-    frame_size_tcg_locals /= TCG_TARGET_STACK_ALIGN;
-
-    /* push (FP, LR) and update sp */
-    tcg_out_push_pair(s, TCG_REG_SP,
-                      TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
+    /* Push (FP, LR) and allocate space for all saved registers.  */
+    tcg_out_push_pair(s, TCG_REG_SP, TCG_REG_FP, TCG_REG_LR, PUSH_SIZE / 16);
 
     /* FP -> callee_saved */
     tcg_out_movr_sp(s, 1, TCG_REG_FP, TCG_REG_SP);
 
-    /* store callee-preserved regs x19..x28 using FP -> callee_saved */
+    /* Store callee-preserved regs x19..x28.  */
     for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
         int idx = (r - TCG_REG_X19) / 2 + 1;
         tcg_out_store_pair(s, TCG_REG_FP, r, r + 1, idx);
@@ -1891,7 +1888,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
     /* Make stack space for TCG locals.  */
     tcg_fmt_Rdn_aimm(s, INSN_SUBI, 1, TCG_REG_SP, TCG_REG_SP,
-                     frame_size_tcg_locals * TCG_TARGET_STACK_ALIGN);
+                     FRAME_SIZE - PUSH_SIZE);
 
     /* inform TCG about how to find TCG locals with register, offset, size */
     tcg_set_frame(s, TCG_REG_SP, TCG_STATIC_CALL_ARGS_SIZE,
@@ -1911,17 +1908,67 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
     /* Remove TCG locals stack space.  */
     tcg_fmt_Rdn_aimm(s, INSN_ADDI, 1, TCG_REG_SP, TCG_REG_SP,
-                     frame_size_tcg_locals * TCG_TARGET_STACK_ALIGN);
+                     FRAME_SIZE - PUSH_SIZE);
 
-    /* restore registers x19..x28.
-       FP must be preserved, so it still points to callee_saved area */
+    /* Restore callee-preserved registers x19..x28.  */
     for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
         int idx = (r - TCG_REG_X19) / 2 + 1;
         tcg_out_load_pair(s, TCG_REG_FP, r, r + 1, idx);
     }
 
-    /* pop (FP, LR), restore SP to previous frame, return */
-    tcg_out_pop_pair(s, TCG_REG_SP,
-                     TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
+    /* Pop (FP, LR), restore SP to previous frame, return.  */
+    tcg_out_pop_pair(s, TCG_REG_SP, TCG_REG_FP, TCG_REG_LR, PUSH_SIZE / 16);
     tcg_out_ret(s, TCG_REG_LR);
 }
+
+typedef struct {
+    DebugFrameCIE cie;
+    DebugFrameFDEHeader fde;
+    uint8_t fde_def_cfa[4];
+    uint8_t fde_reg_ofs[24];
+} DebugFrame;
+
+/* We're expecting a 2 byte uleb128 encoded value.  */
+QEMU_BUILD_BUG_ON(FRAME_SIZE >= (1 << 14));
+
+#define ELF_HOST_MACHINE EM_AARCH64
+
+static DebugFrame debug_frame = {
+    .cie.len = sizeof(DebugFrameCIE)-4, /* length after .len member */
+    .cie.id = -1,
+    .cie.version = 1,
+    .cie.code_align = 1,
+    .cie.data_align = 0x78,             /* sleb128 -8 */
+    .cie.return_column = 30,            /* lr (x30) */
+
+    /* Total FDE size does not include the "len" member.  */
+    .fde.len = sizeof(DebugFrame) - offsetof(DebugFrame, fde.cie_offset),
+
+    .fde_def_cfa = {
+        12, 31,                         /* DW_CFA_def_cfa sp, ... */
+        (FRAME_SIZE & 0x7f) | 0x80,     /* ... uleb128 FRAME_SIZE */
+        (FRAME_SIZE >> 7)
+    },
+    .fde_reg_ofs = {
+        0x80 + 28, 1,                   /* DW_CFA_offset, x28,  -8 */
+        0x80 + 27, 2,                   /* DW_CFA_offset, x27, -16 */
+        0x80 + 26, 3,                   /* DW_CFA_offset, x26, -24 */
+        0x80 + 25, 4,                   /* DW_CFA_offset, x25, -32 */
+        0x80 + 24, 5,                   /* DW_CFA_offset, x24, -40 */
+        0x80 + 23, 6,                   /* DW_CFA_offset, x23, -48 */
+        0x80 + 22, 7,                   /* DW_CFA_offset, x22, -56 */
+        0x80 + 21, 8,                   /* DW_CFA_offset, x21, -64 */
+        0x80 + 20, 9,                   /* DW_CFA_offset, x20, -72 */
+        0x80 + 19, 10,                  /* DW_CFA_offset, x19, -80 */
+        0x80 + 30, 11,                  /* DW_CFA_offset,  lr, -88 */
+        0x80 + 29, 12,                  /* DW_CFA_offset,  fp, -96 */
+    }
+};
+
+void tcg_register_jit(void *buf, size_t buf_size)
+{
+    debug_frame.fde.func_start = (tcg_target_long) buf;
+    debug_frame.fde.func_len = buf_size;
+
+    tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
+}
-- 
1.8.3.1

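The `QEMU_BUILD_BUG_ON(FRAME_SIZE >= (1 << 14))` above exists because `fde_def_cfa` hard-codes FRAME_SIZE as exactly two uleb128 bytes. A sketch of that fixed-width encoding; `uleb128_2` and its decoder are hypothetical names for illustration (real uleb128 is variable-length, and the two-byte form is merely decodable, not canonical, for values under 0x80):

```c
#include <assert.h>
#include <stdint.h>

/* Encode VALUE as exactly two uleb128 bytes, as the FDE template does:
   the low 7 bits with the continuation bit set, then the rest.
   Requires value < (1 << 14) -- hence the QEMU_BUILD_BUG_ON. */
static void uleb128_2(unsigned value, uint8_t out[2])
{
    assert(value < (1u << 14));
    out[0] = (value & 0x7f) | 0x80;
    out[1] = value >> 7;
}

/* Matching decoder, to check the round trip. */
static unsigned uleb128_2_decode(const uint8_t in[2])
{
    return (in[0] & 0x7f) | (unsigned)in[1] << 7;
}
```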

* [Qemu-devel] [PATCH v3 27/29] tcg-aarch64: Reuse FP and LR in translated code
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (25 preceding siblings ...)
  2013-09-02 17:55 ` [Qemu-devel] [PATCH v3 26/29] tcg-aarch64: Implement tcg_register_jit Richard Henderson
@ 2013-09-02 17:55 ` Richard Henderson
  2013-09-02 17:55 ` [Qemu-devel] [PATCH v3 28/29] tcg-aarch64: Introduce tcg_out_ldst_pair Richard Henderson
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

We don't need the FP within translated code, and the LR is
otherwise unused.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 36 +++++++++++++++---------------------
 tcg/aarch64/tcg-target.h | 32 +++++++++++++++++---------------
 2 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index be51d97..f59809c 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -17,10 +17,7 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
     "%x0", "%x1", "%x2", "%x3", "%x4", "%x5", "%x6", "%x7",
     "%x8", "%x9", "%x10", "%x11", "%x12", "%x13", "%x14", "%x15",
     "%x16", "%x17", "%x18", "%x19", "%x20", "%x21", "%x22", "%x23",
-    "%x24", "%x25", "%x26", "%x27", "%x28",
-    "%fp", /* frame pointer */
-    "%lr", /* link register */
-    "%sp",  /* stack pointer */
+    "%x24", "%x25", "%x26", "%x27", "%x28", "%x29", "%x30", "%sp",
 };
 #endif /* NDEBUG */
 
@@ -33,18 +30,19 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_X20, TCG_REG_X21, TCG_REG_X22, TCG_REG_X23,
     TCG_REG_X24, TCG_REG_X25, TCG_REG_X26, TCG_REG_X27,
-    TCG_REG_X28, /* we will reserve this for GUEST_BASE if configured */
+    TCG_REG_X28,
+    TCG_REG_X29, /* maybe used for TCG_REG_GUEST_BASE */
 
-    TCG_REG_X9, TCG_REG_X10, TCG_REG_X11, TCG_REG_X12,
-    TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
+    TCG_REG_X8, TCG_REG_X9, TCG_REG_X10, TCG_REG_X11,
+    TCG_REG_X12, TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
     TCG_REG_X16, TCG_REG_X17,
 
-    TCG_REG_X18, TCG_REG_X19, /* will not use these, see tcg_target_init */
-
     TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3,
     TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7,
 
-    TCG_REG_X8, /* will not use, see tcg_target_init */
+    /* X18 reserved by system */
+    /* X19 reserved for AREG0 */
+    /* X30 reserved as temporary */
 };
 
 static const int tcg_target_call_iarg_regs[8] = {
@@ -55,13 +53,13 @@ static const int tcg_target_call_oarg_regs[1] = {
     TCG_REG_X0
 };
 
-#define TCG_REG_TMP TCG_REG_X8
+#define TCG_REG_TMP TCG_REG_X30
 
 #ifndef CONFIG_SOFTMMU
-# if defined(CONFIG_USE_GUEST_BASE)
-# define TCG_REG_GUEST_BASE TCG_REG_X28
+# ifdef CONFIG_USE_GUEST_BASE
+#  define TCG_REG_GUEST_BASE TCG_REG_X29
 # else
-# define TCG_REG_GUEST_BASE TCG_REG_XZR
+#  define TCG_REG_GUEST_BASE TCG_REG_XZR
 # endif
 #endif
 
@@ -1849,11 +1847,10 @@ static void tcg_target_init(TCGContext *s)
                      (1 << TCG_REG_X12) | (1 << TCG_REG_X13) |
                      (1 << TCG_REG_X14) | (1 << TCG_REG_X15) |
                      (1 << TCG_REG_X16) | (1 << TCG_REG_X17) |
-                     (1 << TCG_REG_X18));
+                     (1 << TCG_REG_X18) | (1 << TCG_REG_X30));
 
     tcg_regset_clear(s->reserved_regs);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_FP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_X18); /* platform register */
 
@@ -1877,13 +1874,10 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     /* Push (FP, LR) and allocate space for all saved registers.  */
     tcg_out_push_pair(s, TCG_REG_SP, TCG_REG_FP, TCG_REG_LR, PUSH_SIZE / 16);
 
-    /* FP -> callee_saved */
-    tcg_out_movr_sp(s, 1, TCG_REG_FP, TCG_REG_SP);
-
     /* Store callee-preserved regs x19..x28.  */
     for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
         int idx = (r - TCG_REG_X19) / 2 + 1;
-        tcg_out_store_pair(s, TCG_REG_FP, r, r + 1, idx);
+        tcg_out_store_pair(s, TCG_REG_SP, r, r + 1, idx);
     }
 
     /* Make stack space for TCG locals.  */
@@ -1913,7 +1907,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     /* Restore callee-preserved registers x19..x28.  */
     for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
         int idx = (r - TCG_REG_X19) / 2 + 1;
-        tcg_out_load_pair(s, TCG_REG_FP, r, r + 1, idx);
+        tcg_out_load_pair(s, TCG_REG_SP, r, r + 1, idx);
     }
 
     /* Pop (FP, LR), restore SP to previous frame, return.  */
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index bf72e62..04f3870 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -17,17 +17,23 @@
 #undef TCG_TARGET_STACK_GROWSUP
 
 typedef enum {
-    TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3, TCG_REG_X4,
-    TCG_REG_X5, TCG_REG_X6, TCG_REG_X7, TCG_REG_X8, TCG_REG_X9,
-    TCG_REG_X10, TCG_REG_X11, TCG_REG_X12, TCG_REG_X13, TCG_REG_X14,
-    TCG_REG_X15, TCG_REG_X16, TCG_REG_X17, TCG_REG_X18, TCG_REG_X19,
-    TCG_REG_X20, TCG_REG_X21, TCG_REG_X22, TCG_REG_X23, TCG_REG_X24,
-    TCG_REG_X25, TCG_REG_X26, TCG_REG_X27, TCG_REG_X28,
-    TCG_REG_FP,  /* frame pointer */
-    TCG_REG_LR, /* link register */
-    TCG_REG_SP,  /* stack pointer or zero register */
-    TCG_REG_XZR = TCG_REG_SP /* same register number */
-    /* program counter is not directly accessible! */
+    TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3,
+    TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7,
+    TCG_REG_X8, TCG_REG_X9, TCG_REG_X10, TCG_REG_X11,
+    TCG_REG_X12, TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
+    TCG_REG_X16, TCG_REG_X17, TCG_REG_X18, TCG_REG_X19,
+    TCG_REG_X20, TCG_REG_X21, TCG_REG_X22, TCG_REG_X23,
+    TCG_REG_X24, TCG_REG_X25, TCG_REG_X26, TCG_REG_X27,
+    TCG_REG_X28, TCG_REG_X29, TCG_REG_X30,
+
+    /* X31 is either the stack pointer or zero, depending on context.  */
+    TCG_REG_SP = 31,
+    TCG_REG_XZR = 31,
+
+    /* Aliases.  */
+    TCG_REG_FP = TCG_REG_X29,
+    TCG_REG_LR = TCG_REG_X30,
+    TCG_AREG0  = TCG_REG_X19,
 } TCGReg;
 
 #define TCG_TARGET_NB_REGS 32
@@ -92,10 +98,6 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
 
-enum {
-    TCG_AREG0 = TCG_REG_X19,
-};
-
 static inline void flush_icache_range(tcg_target_ulong start,
                                       tcg_target_ulong stop)
 {
-- 
1.8.3.1


* [Qemu-devel] [PATCH v3 28/29] tcg-aarch64: Introduce tcg_out_ldst_pair
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (26 preceding siblings ...)
  2013-09-02 17:55 ` [Qemu-devel] [PATCH v3 27/29] tcg-aarch64: Reuse FP and LR in translated code Richard Henderson
@ 2013-09-02 17:55 ` Richard Henderson
  2013-09-02 17:55 ` [Qemu-devel] [PATCH v3 29/29] tcg-aarch64: Remove redundant CPU_TLB_ENTRY_BITS check Richard Henderson
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Combines 4 other inline functions and tidies the prologue.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 84 ++++++++++++++++--------------------------------
 1 file changed, 27 insertions(+), 57 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index f59809c..3a68821 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -328,6 +328,10 @@ typedef enum {
     INSN_BLR   = 0xd63f0000,
     INSN_RET   = 0xd65f0000,
     INSN_B_C   = 0x54000000,
+
+    /* Load/store instructions */
+    INSN_LDP   = 0x28400000,
+    INSN_STP   = 0x28000000,
 } AArch64Insn;
 
 static inline enum aarch64_ldst_op_data
@@ -1261,56 +1265,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 
 static uint8_t *tb_ret_addr;
 
-/* callee stack use example:
-   stp     x29, x30, [sp,#-32]!
-   mov     x29, sp
-   stp     x1, x2, [sp,#16]
-   ...
-   ldp     x1, x2, [sp,#16]
-   ldp     x29, x30, [sp],#32
-   ret
-*/
-
-/* push r1 and r2, and alloc stack space for a total of
-   alloc_n elements (1 element=16 bytes, must be between 1 and 31. */
-static inline void tcg_out_push_pair(TCGContext *s, TCGReg addr,
-                                     TCGReg r1, TCGReg r2, int alloc_n)
-{
-    /* using indexed scaled simm7 STP 0x28800000 | (ext) | 0x01000000 (pre-idx)
-       | alloc_n * (-1) << 16 | r2 << 10 | addr << 5 | r1 */
-    assert(alloc_n > 0 && alloc_n < 0x20);
-    alloc_n = (-alloc_n) & 0x3f;
-    tcg_out32(s, 0xa9800000 | alloc_n << 16 | r2 << 10 | addr << 5 | r1);
-}
-
-/* dealloc stack space for a total of alloc_n elements and pop r1, r2.  */
-static inline void tcg_out_pop_pair(TCGContext *s, TCGReg addr,
-                                    TCGReg r1, TCGReg r2, int alloc_n)
-{
-    /* using indexed scaled simm7 LDP 0x28c00000 | (ext) | nothing (post-idx)
-       | alloc_n << 16 | r2 << 10 | addr << 5 | r1 */
-    assert(alloc_n > 0 && alloc_n < 0x20);
-    tcg_out32(s, 0xa8c00000 | alloc_n << 16 | r2 << 10 | addr << 5 | r1);
-}
-
-static inline void tcg_out_store_pair(TCGContext *s, TCGReg addr,
-                                      TCGReg r1, TCGReg r2, int idx)
-{
-    /* using register pair offset simm7 STP 0x29000000 | (ext)
-       | idx << 16 | r2 << 10 | addr << 5 | r1 */
-    assert(idx > 0 && idx < 0x20);
-    tcg_out32(s, 0xa9000000 | idx << 16 | r2 << 10 | addr << 5 | r1);
-}
-
-static inline void tcg_out_load_pair(TCGContext *s, TCGReg addr,
-                                     TCGReg r1, TCGReg r2, int idx)
-{
-    /* using register pair offset simm7 LDP 0x29400000 | (ext)
-       | idx << 16 | r2 << 10 | addr << 5 | r1 */
-    assert(idx > 0 && idx < 0x20);
-    tcg_out32(s, 0xa9400000 | idx << 16 | r2 << 10 | addr << 5 | r1);
-}
-
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                        const TCGArg args[TCG_MAX_OP_ARGS],
                        const int const_args[TCG_MAX_OP_ARGS])
@@ -1867,17 +1821,32 @@ static void tcg_target_init(TCGContext *s)
       + TCG_TARGET_STACK_ALIGN - 1) \
      & ~(TCG_TARGET_STACK_ALIGN - 1))
 
+static void tcg_out_ldst_pair(TCGContext *s, AArch64Insn insn,
+                              TCGReg r1, TCGReg r2, TCGReg base,
+                              tcg_target_long ofs, bool pre, bool w)
+{
+    insn |= 1u << 31; /* ext */
+    insn |= pre << 24;
+    insn |= w << 23;
+
+    assert(ofs >= -0x200 && ofs < 0x200 && (ofs & 7) == 0);
+    insn |= (ofs & (0x7f << 3)) << (15 - 3);
+
+    tcg_out32(s, insn | r2 << 10 | base << 5 | r1);
+}
+
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
     TCGReg r;
 
     /* Push (FP, LR) and allocate space for all saved registers.  */
-    tcg_out_push_pair(s, TCG_REG_SP, TCG_REG_FP, TCG_REG_LR, PUSH_SIZE / 16);
+    tcg_out_ldst_pair(s, INSN_STP, TCG_REG_FP, TCG_REG_LR,
+                      TCG_REG_SP, -PUSH_SIZE, 1, 1);
 
     /* Store callee-preserved regs x19..x28.  */
     for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
-        int idx = (r - TCG_REG_X19) / 2 + 1;
-        tcg_out_store_pair(s, TCG_REG_SP, r, r + 1, idx);
+        int ofs = (r - TCG_REG_X19 + 2) * 8;
+        tcg_out_ldst_pair(s, INSN_STP, r, r + 1, TCG_REG_SP, ofs, 1, 0);
     }
 
     /* Make stack space for TCG locals.  */
@@ -1906,12 +1875,13 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
     /* Restore callee-preserved registers x19..x28.  */
     for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
-        int idx = (r - TCG_REG_X19) / 2 + 1;
-        tcg_out_load_pair(s, TCG_REG_SP, r, r + 1, idx);
+        int ofs = (r - TCG_REG_X19 + 2) * 8;
+        tcg_out_ldst_pair(s, INSN_LDP, r, r + 1, TCG_REG_SP, ofs, 1, 0);
     }
 
-    /* Pop (FP, LR), restore SP to previous frame, return.  */
-    tcg_out_pop_pair(s, TCG_REG_SP, TCG_REG_FP, TCG_REG_LR, PUSH_SIZE / 16);
+    /* Pop (FP, LR), restore SP to previous frame.  */
+    tcg_out_ldst_pair(s, INSN_LDP, TCG_REG_FP, TCG_REG_LR,
+                      TCG_REG_SP, PUSH_SIZE, 0, 1);
     tcg_out_ret(s, TCG_REG_LR);
 }
 
-- 
1.8.3.1

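The offset packing in `tcg_out_ldst_pair` above can be checked in isolation: the LDP/STP pair offset is a signed 7-bit multiple of 8 stored in bits [21:15], so `(ofs & (0x7f << 3)) << (15 - 3)` is just "divide by 8, mask to 7 bits, shift into place" done without an explicit division. `ldst_pair_imm7` below is a local name for this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Pack a byte offset into the scaled simm7 field of LDP/STP
   (bits [21:15]).  The offset must be an 8-byte multiple in
   [-0x200, 0x200), exactly the assertion tcg_out_ldst_pair makes. */
static uint32_t ldst_pair_imm7(int64_t ofs)
{
    assert(ofs >= -0x200 && ofs < 0x200 && (ofs & 7) == 0);
    /* Equivalent to (((ofs / 8) & 0x7f) << 15). */
    return ((uint32_t)ofs & (0x7f << 3)) << (15 - 3);
}
```

For instance, the prologue's `-PUSH_SIZE` (-96) packs to `0x74 << 15`, the two's-complement of 12 in seven bits.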

* [Qemu-devel] [PATCH v3 29/29] tcg-aarch64: Remove redundant CPU_TLB_ENTRY_BITS check
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (27 preceding siblings ...)
  2013-09-02 17:55 ` [Qemu-devel] [PATCH v3 28/29] tcg-aarch64: Introduce tcg_out_ldst_pair Richard Henderson
@ 2013-09-02 17:55 ` Richard Henderson
  2013-09-03  7:37 ` [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard W.M. Jones
  2013-09-09  8:13 ` Claudio Fontana
  30 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, Richard Henderson

Removed from other targets in 56bbc2f967ce185fa1c5c39e1aeb5b68b26242e9.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 3a68821..1bf609c 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1782,12 +1782,6 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
 
 static void tcg_target_init(TCGContext *s)
 {
-#if !defined(CONFIG_USER_ONLY)
-    /* fail safe */
-    if ((1ULL << CPU_TLB_ENTRY_BITS) != sizeof(CPUTLBEntry)) {
-        tcg_abort();
-    }
-#endif
     tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffffffff);
     tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffffffff);
 
-- 
1.8.3.1


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (28 preceding siblings ...)
  2013-09-02 17:55 ` [Qemu-devel] [PATCH v3 29/29] tcg-aarch64: Remove redundant CPU_TLB_ENTRY_BITS check Richard Henderson
@ 2013-09-03  7:37 ` Richard W.M. Jones
  2013-09-03  7:42   ` Laurent Desnogues
  2013-09-03  8:00   ` Peter Maydell
  2013-09-09  8:13 ` Claudio Fontana
  30 siblings, 2 replies; 59+ messages in thread
From: Richard W.M. Jones @ 2013-09-03  7:37 UTC (permalink / raw)
  To: Richard Henderson, Peter Maydell; +Cc: claudio.fontana, qemu-devel

On Mon, Sep 02, 2013 at 10:54:34AM -0700, Richard Henderson wrote:
> I'm not sure if I posted v2 or not, but my branch is named -3,
> therefore this is v3.  ;-)
> 
> The jumbo "fixme" patch from v1 has been split up.  This has been
> updated for the changes in the tlb helpers over the past few weeks.
> For the benefit of trivial conflict resolution, it's relative to a
> tree that contains basically all of my patches.
> 
> See git://github.com/rth7680/qemu.git tcg-aarch-3 for the tree, if
> you find yourself missing any of the dependencies.

Is there a way yet to compile and run a 'qemu-system-aarch64'? [on a
regular x86-64 host]

I tried your git branch above and Peter's v5 patch posted a while back
(which doesn't cleanly apply), but I don't seem to have the right
combination of bits to make a working binary.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-03  7:37 ` [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard W.M. Jones
@ 2013-09-03  7:42   ` Laurent Desnogues
  2013-09-03  8:00   ` Peter Maydell
  1 sibling, 0 replies; 59+ messages in thread
From: Laurent Desnogues @ 2013-09-03  7:42 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Peter Maydell, Claudio Fontana, qemu-devel, Richard Henderson

On Tue, Sep 3, 2013 at 9:37 AM, Richard W.M. Jones <rjones@redhat.com> wrote:
> On Mon, Sep 02, 2013 at 10:54:34AM -0700, Richard Henderson wrote:
>> I'm not sure if I posted v2 or not, but my branch is named -3,
>> therefore this is v3.  ;-)
>>
>> The jumbo "fixme" patch from v1 has been split up.  This has been
>> updated for the changes in the tlb helpers over the past few weeks.
>> For the benefit of trivial conflict resolution, it's relative to a
>> tree that contains basically all of my patches.
>>
>> See git://github.com/rth7680/qemu.git tcg-aarch-3 for the tree, if
>> you find yourself missing any of the dependencies.
>
> Is there a way yet to compile and run a 'qemu-system-aarch64'? [on a
> regular x86-64 host]

The current public work is only to run QEMU on an AArch64 host, not
AArch64 on other hosts ;-)

> I tried your git branch above and Peter's v5 patch posted a while back
> (which doesn't cleanly apply), but I don't seem to have the right
> combination of bits to make a working binary.

You'll need a cross-compiler or ARM foundation model.


Laurent


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-03  7:37 ` [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard W.M. Jones
  2013-09-03  7:42   ` Laurent Desnogues
@ 2013-09-03  8:00   ` Peter Maydell
  1 sibling, 0 replies; 59+ messages in thread
From: Peter Maydell @ 2013-09-03  8:00 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Claudio Fontana, QEMU Developers, Richard Henderson

On 3 September 2013 08:37, Richard W.M. Jones <rjones@redhat.com> wrote:
> Is there a way yet to compile and run a 'qemu-system-aarch64'? [on a
> regular x86-64 host]

The code for this has not yet been written :-)
The patchset I posted will build a qemu-system-aarch64 but
with no actual 64 bit CPUs (you can run all the 32 bit CPUs
if you like). It's foundational work to build the system emulation
on, and also for the 64-bit linux-user emulation which Alex is doing.

As Laurent says, don't confuse this with the tcg-aarch64 code
in tree, which is for emulating MIPS/x86/etc on aarch64 hosts.

> I tried your git branch above and Peter's v5 patch posted a while back
> (which doesn't cleanly apply)

Try the git branch I mention in the cover letter (or its followup),
which I've been rebasing. Or you could wait a day or two for v6.

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v3 19/29] tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 19/29] tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s Richard Henderson
@ 2013-09-05 13:32   ` Claudio Fontana
  2013-09-05 15:41     ` Richard Henderson
  0 siblings, 1 reply; 59+ messages in thread
From: Claudio Fontana @ 2013-09-05 13:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

Hello Richard,

thanks for your prolific work. A few comments below for starters:

On 02.09.2013 19:54, Richard Henderson wrote:
> Cleaning up the implementation of tcg_out_movi at the same time.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 48 ++++++++++++++++++++++--------------------------
>  1 file changed, 22 insertions(+), 26 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index 09ccd67..59e5026 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -274,6 +274,11 @@ typedef enum {
>      INSN_EOR   = 0x4a000000,
>      INSN_EON   = 0x4a200000,
>  
> +    /* Move wide immediate instructions */
> +    INSN_MOVN  = 0x12800000,
> +    INSN_MOVZ  = 0x52800000,
> +    INSN_MOVK  = 0x72800000,
> +
>      /* Add/subtract immediate instructions */
>      INSN_ADDI  = 0x11000000,
>      INSN_ADDSI = 0x31000000,
> @@ -478,6 +483,12 @@ static inline void tcg_fmt_Rdnm_cond(TCGContext *s, AArch64Insn insn,
>                | tcg_cond_to_aarch64[c] << 12);
>  }
>  
> +static inline void tcg_fmt_Rd_uimm_s(TCGContext *s, AArch64Insn insn, bool ext,
> +                                     TCGReg rd, uint16_t half, unsigned shift)
> +{
> +    tcg_out32(s, insn | ext << 31 | shift << 17 | half << 5 | rd);
> +}
> +
>  static inline void tcg_out_ldst_9(TCGContext *s,
>                                    enum aarch64_ldst_op_data op_data,
>                                    enum aarch64_ldst_op_type op_type,
> @@ -522,38 +533,23 @@ static inline void tcg_out_movr_sp(TCGContext *s, bool ext,
>      tcg_fmt_Rdn_aimm(s, INSN_ADDI, ext, rd, rn, 0);
>  }
>  
> -static inline void tcg_out_movi_aux(TCGContext *s,
> -                                    TCGReg rd, uint64_t value)
> +static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
> +                         tcg_target_long value)
>  {
> -    uint32_t half, base, shift, movk = 0;
> -    /* construct halfwords of the immediate with MOVZ/MOVK with LSL */
> -    /* using MOVZ 0x52800000 | extended reg.. */
> -    base = (value > 0xffffffff) ? 0xd2800000 : 0x52800000;
> -    /* count trailing zeros in 16 bit steps, mapping 64 to 0. Emit the
> -       first MOVZ with the half-word immediate skipping the zeros, with a shift
> -       (LSL) equal to this number. Then morph all next instructions into MOVKs.
> -       Zero the processed half-word in the value, continue until empty.
> -       We build the final result 16bits at a time with up to 4 instructions,
> -       but do not emit instructions for 16bit zero holes. */

Please do not remove these comments.
In my judgement this part of the code profits from some verbose clarification.
What is happening might be obvious to you, but not to others trying to step in.

> +    AArch64Insn insn = INSN_MOVZ;
> +
> +    if (type == TCG_TYPE_I32) {
> +        value = (uint32_t)value;
> +    }
> +
>      do {
> -        shift = ctz64(value) & (63 & -16);
> -        half = (value >> shift) & 0xffff;
> -        tcg_out32(s, base | movk | shift << 17 | half << 5 | rd);
> -        movk = 0x20000000; /* morph next MOVZs into MOVKs */
> +        unsigned shift = ctz64(value) & (63 & -16);
> +        tcg_fmt_Rd_uimm_s(s, insn, shift >= 32, rd, value >> shift, shift);
>          value &= ~(0xffffUL << shift);
> +        insn = INSN_MOVK;
>      } while (value);
>  }
>  
> -static inline void tcg_out_movi(TCGContext *s, TCGType type,
> -                                TCGReg rd, tcg_target_long value)
> -{
> -    if (type == TCG_TYPE_I64) {
> -        tcg_out_movi_aux(s, rd, value);
> -    } else {
> -        tcg_out_movi_aux(s, rd, value & 0xffffffff);
> -    }
> -}
> -
>  static inline void tcg_out_ldst_r(TCGContext *s,
>                                    enum aarch64_ldst_op_data op_data,
>                                    enum aarch64_ldst_op_type op_type,
> 


Note that the movi change you introduce with the combination of patches 19 and 20 is not correct; it breaks all targets I tried.
I will dig into the details tomorrow when commenting on patch 20.

In general I'd prefer to keep movi as it was (functionally-wise) for the time being, replacing it with a more efficient version once we can get some numbers (which will be soon) with which to justify (or not) the added code complexity.

But using the INSN_* you introduced instead of inline numbers is of course fine for me.

Claudio

-- 
Claudio Fontana
Server OS Architect
Huawei Technologies Duesseldorf GmbH


* Re: [Qemu-devel] [PATCH v3 19/29] tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s
  2013-09-05 13:32   ` Claudio Fontana
@ 2013-09-05 15:41     ` Richard Henderson
  2013-09-06  9:06       ` Claudio Fontana
  0 siblings, 1 reply; 59+ messages in thread
From: Richard Henderson @ 2013-09-05 15:41 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: qemu-devel

On 09/05/2013 06:32 AM, Claudio Fontana wrote:
>>  {
>> -    uint32_t half, base, shift, movk = 0;
>> -    /* construct halfwords of the immediate with MOVZ/MOVK with LSL */
>> -    /* using MOVZ 0x52800000 | extended reg.. */
>> -    base = (value > 0xffffffff) ? 0xd2800000 : 0x52800000;
>> -    /* count trailing zeros in 16 bit steps, mapping 64 to 0. Emit the
>> -       first MOVZ with the half-word immediate skipping the zeros, with a shift
>> -       (LSL) equal to this number. Then morph all next instructions into MOVKs.
>> -       Zero the processed half-word in the value, continue until empty.
>> -       We build the final result 16bits at a time with up to 4 instructions,
>> -       but do not emit instructions for 16bit zero holes. */
> 
> Please do not remove these comments.
> In my judgement this part of the code profits from some verbose clarification.
> What is happening might be obvious to you, but not to others trying to step in.

Fair enough.

> In general I'd prefer to keep movi as it was (functionally-wise) for the
> time being, replacing it with a more efficient version once we can get some
> numbers (which will be soon) with which to justify (or not) the added code
> complexity.

The most important thing we're not doing at the moment is handling negative
numbers efficiently.  E.g. we're using 4 insns to load -1.



r~


* Re: [Qemu-devel] [PATCH v3 19/29] tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s
  2013-09-05 15:41     ` Richard Henderson
@ 2013-09-06  9:06       ` Claudio Fontana
  0 siblings, 0 replies; 59+ messages in thread
From: Claudio Fontana @ 2013-09-06  9:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 05.09.2013 17:41, Richard Henderson wrote:
> On 09/05/2013 06:32 AM, Claudio Fontana wrote:
>>>  {
>>> -    uint32_t half, base, shift, movk = 0;
>>> -    /* construct halfwords of the immediate with MOVZ/MOVK with LSL */
>>> -    /* using MOVZ 0x52800000 | extended reg.. */
>>> -    base = (value > 0xffffffff) ? 0xd2800000 : 0x52800000;
>>> -    /* count trailing zeros in 16 bit steps, mapping 64 to 0. Emit the
>>> -       first MOVZ with the half-word immediate skipping the zeros, with a shift
>>> -       (LSL) equal to this number. Then morph all next instructions into MOVKs.
>>> -       Zero the processed half-word in the value, continue until empty.
>>> -       We build the final result 16bits at a time with up to 4 instructions,
>>> -       but do not emit instructions for 16bit zero holes. */
>>
>> Please do not remove these comments.
>> In my judgement this part of the code profits from some verbose clarification.
>> What is happening might be obvious to you, but not to others trying to step in.
> 
> Fair enough.
> 
>> In general I'd prefer to keep movi as it was (functionally-wise) for the
>> time being, replacing it with a more efficient version once we can get some
>> numbers (which will be soon) with which to justify (or not) the added code
>> complexity.
> 
> The most important thing we're not doing at the moment is handling negative
> numbers efficiently.  E.g. we're using 4 insns to load -1.

Ok, let's address that point directly then.

> r~
> 

Claudio


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
                   ` (29 preceding siblings ...)
  2013-09-03  7:37 ` [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard W.M. Jones
@ 2013-09-09  8:13 ` Claudio Fontana
  2013-09-09 14:08   ` Richard Henderson
  30 siblings, 1 reply; 59+ messages in thread
From: Claudio Fontana @ 2013-09-09  8:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

Hello Richard,

On 02.09.2013 19:54, Richard Henderson wrote:
> I'm not sure if I posted v2 or not, but my branch is named -3,
> therefore this is v3.  ;-)
> 
> The jumbo "fixme" patch from v1 has been split up.  This has been
> updated for the changes in the tlb helpers over the past few weeks.
> For the benefit of trivial conflict resolution, it's relative to a
> tree that contains basically all of my patches.
> 
> See git://github.com/rth7680/qemu.git tcg-aarch-3 for the tree, if
> you find yourself missing any of the dependencies.
> 
> 
> r~
> Richard Henderson (29):
>   tcg-aarch64: Set ext based on TCG_OPF_64BIT
>   tcg-aarch64: Change all ext variables to bool
>   tcg-aarch64: Don't handle mov/movi in tcg_out_op
>   tcg-aarch64: Hoist common argument loads in tcg_out_op
>   tcg-aarch64: Change enum aarch64_arith_opc to AArch64Insn
>   tcg-aarch64: Merge enum aarch64_srr_opc with AArch64Insn
>   tcg-aarch64: Introduce tcg_fmt_* functions
>   tcg-aarch64: Introduce tcg_fmt_Rdn_aimm
>   tcg-aarch64: Implement mov with tcg_fmt_* functions
>   tcg-aarch64: Handle constant operands to add, sub, and compare
>   tcg-aarch64: Handle constant operands to and, or, xor
>   tcg-aarch64: Support andc, orc, eqv, not
>   tcg-aarch64: Handle zero as first argument to sub
>   tcg-aarch64: Support movcond
>   tcg-aarch64: Support deposit
>   tcg-aarch64: Support add2, sub2
>   tcg-aarch64: Support muluh, mulsh
>   tcg-aarch64: Support div, rem
>   tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s
>   tcg-aarch64: Improve tcg_out_movi
>   tcg-aarch64: Avoid add with zero in tlb load
>   tcg-aarch64: Use adrp in tcg_out_movi
>   tcg-aarch64: Pass return address to load/store helpers directly.
>   tcg-aarch64: Use tcg_out_call for qemu_ld/st
>   tcg-aarch64: Use symbolic names for branches
>   tcg-aarch64: Implement tcg_register_jit
>   tcg-aarch64: Reuse FP and LR in translated code
>   tcg-aarch64: Introduce tcg_out_ldst_pair
>   tcg-aarch64: Remove redundant CPU_TLB_ENTRY_BITS check
> 
>  include/exec/exec-all.h  |   18 -
>  tcg/aarch64/tcg-target.c | 1276 ++++++++++++++++++++++++++++++----------------
>  tcg/aarch64/tcg-target.h |   76 +--
>  3 files changed, 867 insertions(+), 503 deletions(-)
> 

after carefully reading and testing your patches, this is how I suggest to proceed: 

first do the implementation of the new functionality (tcg opcodes, jit) in a way that is consistent with the existing code.
No type changes, no refactoring, no beautification.

Once we agree on those, introduce the meaningful restructuring you want to do,
like the new INSN type, the "don't handle mov/movi in tcg_out_op", the TCG_OPF_64BIT thing, etc.

Last do the cosmetic stuff if you really want to do it, like the change all ext to bool (note that there is no point if the callers still use "1" and "0": adapt them as well) etc.

Right now the patchset is difficult to digest, given the regressions it introduces, coupled with its mixing of functional changes, restructuring, and cosmetics.

I think this will allow us to proceed quicker towards agreement.

Thanks,

Claudio


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-09  8:13 ` Claudio Fontana
@ 2013-09-09 14:08   ` Richard Henderson
  2013-09-09 15:02     ` Claudio Fontana
  0 siblings, 1 reply; 59+ messages in thread
From: Richard Henderson @ 2013-09-09 14:08 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: qemu-devel

On 09/09/2013 01:13 AM, Claudio Fontana wrote:
> after carefully reading and testing your patches, this is how I suggest to proceed: 
> 
> first do the implementation of the new functionality (tcg opcodes, jit) in a way that is consistent with the existing code.
> No type changes, no refactoring, no beautification.
> 
> Once we agree on those, introduce the meaningful restructuring you want to do,
> like the new INSN type, the "don't handle mov/movi in tcg_out_op", the TCG_OPF_64BIT thing, etc.
> 
> Last do the cosmetic stuff if you really want to do it, like the change all ext to bool (note that there is no point if the callers still use "1" and "0": adapt them as well) etc.

No, I don't agree.  Especially with respect to the insn type.

I'd much rather do all the "cosmetic stuff", as you put it, first.  It makes
all of the "real" changes much easier to understand.


r~


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-09 14:08   ` Richard Henderson
@ 2013-09-09 15:02     ` Claudio Fontana
  2013-09-09 15:04       ` Peter Maydell
  2013-09-09 15:07       ` Richard Henderson
  0 siblings, 2 replies; 59+ messages in thread
From: Claudio Fontana @ 2013-09-09 15:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 09.09.2013 16:08, Richard Henderson wrote:
> On 09/09/2013 01:13 AM, Claudio Fontana wrote:
>> after carefully reading and testing your patches, this is how I suggest to proceed: 
>>
>> first do the implementation of the new functionality (tcg opcodes, jit) in a way that is consistent with the existing code.
>> No type changes, no refactoring, no beautification.
>>
>> Once we agree on those, introduce the meaningful restructuring you want to do,
>> like the new INSN type, the "don't handle mov/movi in tcg_out_op", the TCG_OPF_64BIT thing, etc.
>>
>> Last do the cosmetic stuff if you really want to do it, like the change all ext to bool (note that there is no point if the callers still use "1" and "0": adapt them as well) etc.
> 
> No, I don't agree.  Especially with respect to the insn type.
> 
> I'd much rather do all the "cosmetic stuff", as you put it, first.  It makes
> all of the "real" changes much easier to understand.
> 
> 
> r~
> 

I guess we are stuck then. With the cosmetic and restructuring stuff coming before, I cannot cherry pick the good parts later.


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-09 15:02     ` Claudio Fontana
@ 2013-09-09 15:04       ` Peter Maydell
  2013-09-09 15:07       ` Richard Henderson
  1 sibling, 0 replies; 59+ messages in thread
From: Peter Maydell @ 2013-09-09 15:04 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: QEMU Developers, Richard Henderson

On 9 September 2013 16:02, Claudio Fontana <claudio.fontana@huawei.com> wrote:
> I guess we are stuck then. With the cosmetic and restructuring
> stuff coming before, I cannot cherry pick the good parts later.

...what do you need to cherry pick it into?

-- PMM


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-09 15:02     ` Claudio Fontana
  2013-09-09 15:04       ` Peter Maydell
@ 2013-09-09 15:07       ` Richard Henderson
  2013-09-10  8:27         ` Claudio Fontana
  1 sibling, 1 reply; 59+ messages in thread
From: Richard Henderson @ 2013-09-09 15:07 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: qemu-devel

On 09/09/2013 08:02 AM, Claudio Fontana wrote:
> On 09.09.2013 16:08, Richard Henderson wrote:
>> On 09/09/2013 01:13 AM, Claudio Fontana wrote:
>>> after carefully reading and testing your patches, this is how I suggest to proceed: 
>>>
>>> first do the implementation of the new functionality (tcg opcodes, jit) in a way that is consistent with the existing code.
>>> No type changes, no refactoring, no beautification.
>>>
>>> Once we agree on those, introduce the meaningful restructuring you want to do,
>>> like the new INSN type, the "don't handle mov/movi in tcg_out_op", the TCG_OPF_64BIT thing, etc.
>>>
>>> Last do the cosmetic stuff if you really want to do it, like the change all ext to bool (note that there is no point if the callers still use "1" and "0": adapt them as well) etc.
>>
>> No, I don't agree.  Especially with respect to the insn type.
>>
>> I'd much rather do all the "cosmetic stuff", as you put it, first.  It makes
>> all of the "real" changes much easier to understand.
>>
>>
>> r~
>>
> 
> I guess we are stuck then. With the cosmetic and restructuring stuff coming before, I cannot cherry pick the good parts later.
> 
> 

Have you tested the first 9 patches on their own?  I.e.

>   tcg-aarch64: Set ext based on TCG_OPF_64BIT
>   tcg-aarch64: Change all ext variables to bool
>   tcg-aarch64: Don't handle mov/movi in tcg_out_op
>   tcg-aarch64: Hoist common argument loads in tcg_out_op
>   tcg-aarch64: Change enum aarch64_arith_opc to AArch64Insn
>   tcg-aarch64: Merge enum aarch64_srr_opc with AArch64Insn
>   tcg-aarch64: Introduce tcg_fmt_* functions
>   tcg-aarch64: Introduce tcg_fmt_Rdn_aimm
>   tcg-aarch64: Implement mov with tcg_fmt_* functions

There should be no functional change to the backend, producing 100% identical
output code.  There should even be little or no change in tcg.o itself.

This should make it trivial to verify that these patches are not at fault.



r~


* Re: [Qemu-devel] [PATCH v3 14/29] tcg-aarch64: Support movcond
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 14/29] tcg-aarch64: Support movcond Richard Henderson
@ 2013-09-09 15:09   ` Claudio Fontana
  0 siblings, 0 replies; 59+ messages in thread
From: Claudio Fontana @ 2013-09-09 15:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 02.09.2013 19:54, Richard Henderson wrote:
> Also tidy the implementation of setcond in order to share code.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 33 +++++++++++++++++++++++++--------
>  tcg/aarch64/tcg-target.h |  4 ++--
>  2 files changed, 27 insertions(+), 10 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index ea1db85..322660d 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -284,6 +284,10 @@ typedef enum {
>      INSN_LSRV  = 0x1ac02400,
>      INSN_ASRV  = 0x1ac02800,
>      INSN_RORV  = 0x1ac02c00,
> +
> +    /* Conditional select instructions */
> +    INSN_CSEL  = 0x1a800000,
> +    INSN_CSINC = 0x1a800400,
>  } AArch64Insn;
>  
>  static inline enum aarch64_ldst_op_data
> @@ -435,6 +439,14 @@ static void tcg_fmt_Rdn_limm(TCGContext *s, AArch64Insn insn, bool ext,
>      tcg_fmt_Rdn_r_s(s, insn, ext, rd, rn, r, c);
>  }
>  
> +static inline void tcg_fmt_Rdnm_cond(TCGContext *s, AArch64Insn insn,
> +                                     bool ext, TCGReg rd, TCGReg rn,
> +                                     TCGReg rm, TCGCond c)
> +{
> +    tcg_out32(s, insn | ext << 31 | rm << 16 | rn << 5 | rd
> +              | tcg_cond_to_aarch64[c] << 12);
> +}
> +
>  static inline void tcg_out_ldst_9(TCGContext *s,
>                                    enum aarch64_ldst_op_data op_data,
>                                    enum aarch64_ldst_op_type op_type,
> @@ -661,13 +673,6 @@ static void tcg_out_cmp(TCGContext *s, bool ext, TCGReg a,
>      }
>  }
>  
> -static inline void tcg_out_cset(TCGContext *s, bool ext, TCGReg rd, TCGCond c)
> -{
> -    /* Using CSET alias of CSINC 0x1a800400 Xd, XZR, XZR, invert(cond) */
> -    unsigned int base = ext ? 0x9a9f07e0 : 0x1a9f07e0;
> -    tcg_out32(s, base | tcg_cond_to_aarch64[tcg_invert_cond(c)] << 12 | rd);
> -}
> -
>  static inline void tcg_out_goto(TCGContext *s, tcg_target_long target)
>  {
>      tcg_target_long offset;
> @@ -1394,7 +1399,17 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          /* FALLTHRU */
>      case INDEX_op_setcond_i64:
>          tcg_out_cmp(s, ext, a1, a2, c2);
> -        tcg_out_cset(s, 0, a0, args[3]);
> +        /* Using CSET alias of CSINC Xd, XZR, XZR, invert(cond) */
> +        tcg_fmt_Rdnm_cond(s, INSN_CSINC, 0, a0, TCG_REG_XZR,
> +                          TCG_REG_XZR, tcg_invert_cond(args[3]));
> +        break;
> +
> +    case INDEX_op_movcond_i32:
> +        a2 = (int32_t)a2;
> +        /* FALLTHRU */
> +    case INDEX_op_movcond_i64:
> +        tcg_out_cmp(s, ext, a1, a2, c2);
> +        tcg_fmt_Rdnm_cond(s, INSN_CSEL, ext, a0, REG0(3), REG0(4), args[5]);
>          break;
>  
>      case INDEX_op_qemu_ld8u:
> @@ -1553,6 +1568,8 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
>      { INDEX_op_brcond_i64, { "r", "rA" } },
>      { INDEX_op_setcond_i32, { "r", "r", "rwA" } },
>      { INDEX_op_setcond_i64, { "r", "r", "rA" } },
> +    { INDEX_op_movcond_i32, { "r", "r", "rwA", "rZ", "rZ" } },
> +    { INDEX_op_movcond_i64, { "r", "r", "rwA", "rZ", "rZ" } },
>  
>      { INDEX_op_qemu_ld8u, { "r", "l" } },
>      { INDEX_op_qemu_ld8s, { "r", "l" } },
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 6242136..ff073ca 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -56,7 +56,7 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i32         0
>  #define TCG_TARGET_HAS_nor_i32          0
>  #define TCG_TARGET_HAS_deposit_i32      0
> -#define TCG_TARGET_HAS_movcond_i32      0
> +#define TCG_TARGET_HAS_movcond_i32      1
>  #define TCG_TARGET_HAS_add2_i32         0
>  #define TCG_TARGET_HAS_sub2_i32         0
>  #define TCG_TARGET_HAS_mulu2_i32        0
> @@ -84,7 +84,7 @@ typedef enum {
>  #define TCG_TARGET_HAS_nand_i64         0
>  #define TCG_TARGET_HAS_nor_i64          0
>  #define TCG_TARGET_HAS_deposit_i64      0
> -#define TCG_TARGET_HAS_movcond_i64      0
> +#define TCG_TARGET_HAS_movcond_i64      1
>  #define TCG_TARGET_HAS_add2_i64         0
>  #define TCG_TARGET_HAS_sub2_i64         0
>  #define TCG_TARGET_HAS_mulu2_i64        0
> 

This breaks the x86-64 target.
Had you separated the additional operation from the cset change, we would know which part is at fault.


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-09 15:07       ` Richard Henderson
@ 2013-09-10  8:27         ` Claudio Fontana
  2013-09-10  8:45           ` Peter Maydell
  2013-09-10 13:16           ` Richard Henderson
  0 siblings, 2 replies; 59+ messages in thread
From: Claudio Fontana @ 2013-09-10  8:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 09.09.2013 17:07, Richard Henderson wrote:
> On 09/09/2013 08:02 AM, Claudio Fontana wrote:
>> On 09.09.2013 16:08, Richard Henderson wrote:
>>> On 09/09/2013 01:13 AM, Claudio Fontana wrote:
>>>> after carefully reading and testing your patches, this is how I suggest to proceed: 
>>>>
>>>> first do the implementation of the new functionality (tcg opcodes, jit) in a way that is consistent with the existing code.
>>>> No type changes, no refactoring, no beautification.
>>>>
>>>> Once we agree on those, introduce the meaningful restructuring you want to do,
>>>> like the new INSN type, the "don't handle mov/movi in tcg_out_op", the TCG_OPF_64BIT thing, etc.
>>>>
>>>> Last do the cosmetic stuff if you really want to do it, like the change all ext to bool (note that there is no point if the callers still use "1" and "0": adapt them as well) etc.
>>>
>>> No, I don't agree.  Especially with respect to the insn type.
>>>
>>> I'd much rather do all the "cosmetic stuff", as you put it, first.  It makes
>>> all of the "real" changes much easier to understand.
>>>
>>>
>>> r~
>>>
>>
>> I guess we are stuck then. With the cosmetic and restructuring stuff coming before, I cannot cherry pick the good parts later.
>>
>>
> 
> Have you tested the first 9 patches on their own?  I.e.
> 
>>   tcg-aarch64: Set ext based on TCG_OPF_64BIT
>>   tcg-aarch64: Change all ext variables to bool
>>   tcg-aarch64: Don't handle mov/movi in tcg_out_op
>>   tcg-aarch64: Hoist common argument loads in tcg_out_op
>>   tcg-aarch64: Change enum aarch64_arith_opc to AArch64Insn
>>   tcg-aarch64: Merge enum aarch64_srr_opc with AArch64Insn
>>   tcg-aarch64: Introduce tcg_fmt_* functions
>>   tcg-aarch64: Introduce tcg_fmt_Rdn_aimm
>>   tcg-aarch64: Implement mov with tcg_fmt_* functions

yes.

> 
> There should be no functional change to the backend, producing 100% identical
> output code.  There should even be little or no change in tcg.o itself.

There are two aspects.

On one side, although some changes do not break anything, I see some problems in them.
Putting them as a prerequisite for the rest forces us to agree on everything before moving forward, instead of being able to agree on separate chunks (meat first, rest later). In my view, this makes the process longer.

On another side, I end up having to manually revert some parts of these which you put as prerequisites, during bisection when landing after them, which is a huge time drain when tracking regressions introduced in the later part of the series.

> This should make it trivial to verify that these patches are not at fault.
> 
> r~

They don't break the targets, no.

Claudio


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-10  8:27         ` Claudio Fontana
@ 2013-09-10  8:45           ` Peter Maydell
  2013-09-12  8:03             ` Claudio Fontana
  2013-09-10 13:16           ` Richard Henderson
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Maydell @ 2013-09-10  8:45 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: QEMU Developers, Richard Henderson

On 10 September 2013 09:27, Claudio Fontana <claudio.fontana@huawei.com> wrote:
> On another side, I end up having to manually revert some parts
> of these which you put as prerequisites, during bisection when
> landing after them, which is a huge time drain when tracking
> regressions introduced in the later part of the series.

I don't understand this; can you explain? If these early
refactoring patches have bugs then we should just identify
them and fix them. If they don't have bugs why would you
need to manually revert parts of them?

thanks
-- PMM


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-10  8:27         ` Claudio Fontana
  2013-09-10  8:45           ` Peter Maydell
@ 2013-09-10 13:16           ` Richard Henderson
  2013-09-12  8:11             ` Claudio Fontana
  1 sibling, 1 reply; 59+ messages in thread
From: Richard Henderson @ 2013-09-10 13:16 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: qemu-devel

On 09/10/2013 01:27 AM, Claudio Fontana wrote:
> There are two aspects.
> 
> On one side, although some changes do not break anything, I see some problems in them.

Then let us discuss them, sooner rather than later.

> Putting them as a prerequisite for the rest forces us to agree on
> everything before moving forward, instead of being able to agree on separate
> chunks (meat first, rest later). In my view, this makes the process longer.

If we have no common ground on how the port should look, then we simply cannot
move forward full stop.

Having put together a foundation of AArch64Insn and tcg_fmt_*, that I believe
to be clean and easy to understand, I simply refuse on aesthetic grounds to
rewrite later patches to instead use the magic number and open-coded insn
format used throughout the port today.  That way leads to a much greater chance
of error in my opinion.


r~


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-10  8:45           ` Peter Maydell
@ 2013-09-12  8:03             ` Claudio Fontana
  2013-09-12  8:55               ` Peter Maydell
  0 siblings, 1 reply; 59+ messages in thread
From: Claudio Fontana @ 2013-09-12  8:03 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Richard Henderson

On 10.09.2013 10:45, Peter Maydell wrote:
> On 10 September 2013 09:27, Claudio Fontana <claudio.fontana@huawei.com> wrote:
>> On another side, I end up having to manually revert some parts
>> of these which you put as prerequisites, during bisection when
>> landing after them, which is a huge time drain when tracking
>> regressions introduced in the later part of the series.
> 
> I don't understand this; can you explain? If these early
> refactoring patches have bugs then we should just identify
> them and fix them. If they don't have bugs why would you
> need to manually revert parts of them?
> 

To revert the next patches which do introduce bugs.

I could not see bugs in the refactoring patches, but there is stuff to fix regardless of bugs.


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-10 13:16           ` Richard Henderson
@ 2013-09-12  8:11             ` Claudio Fontana
  0 siblings, 0 replies; 59+ messages in thread
From: Claudio Fontana @ 2013-09-12  8:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 10.09.2013 15:16, Richard Henderson wrote:
> On 09/10/2013 01:27 AM, Claudio Fontana wrote:
>> There are two aspects.
>>
>> On one side, although some changes do not break anything, I see some problems in them.
> 
> Then let us discuss them, sooner rather than later.
> 
>> Putting them as a prerequisite for the rest forces us to agree on
>> everything before moving forward, instead of being able to agree on separate
>> chunks (meat first, rest later). In my view, this makes the process longer.
> 
> If we have no common ground on how the port should look, then we simply cannot
> move forward full stop.
> 
> Having put together a foundation of AArch64Insn and tcg_fmt_*, that I believe
> to be clean and easy to understand, I simply refuse on aesthetic grounds to

on aesthetic grounds?

> rewrite later patches to instead use the magic number and open-coded insn
> format used throughout the port today.  That way leads to a much greater chance
> of error in my opinion.
> 

I just asked you to reorder the way you do things, so that I would have less work to do when dissecting problems in the actual functional changes.
If it's really impossible for you to do that, I guess we can move forward anyway; it just creates more work here before we can have a chunk we agree on.

I will put additional comments on the parts that I would like to see improved.

Thanks,

Claudio


* Re: [Qemu-devel] [PATCH v3 01/29] tcg-aarch64: Set ext based on TCG_OPF_64BIT
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 01/29] tcg-aarch64: Set ext based on TCG_OPF_64BIT Richard Henderson
@ 2013-09-12  8:25   ` Claudio Fontana
  2013-09-12  8:58     ` Peter Maydell
  0 siblings, 1 reply; 59+ messages in thread
From: Claudio Fontana @ 2013-09-12  8:25 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 02.09.2013 19:54, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 28 +++++++---------------------
>  1 file changed, 7 insertions(+), 21 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index 55ff700..5b067fe 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -1105,9 +1105,9 @@ static inline void tcg_out_load_pair(TCGContext *s, TCGReg addr,
>  static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>                         const TCGArg *args, const int *const_args)
>  {
> -    /* ext will be set in the switch below, which will fall through to the
> -       common code. It triggers the use of extended regs where appropriate. */
> -    int ext = 0;
> +    /* 99% of the time, we can signal the use of extension registers
> +       by looking to see if the opcode handles 64-bit data.  */
> +    bool ext = (tcg_op_defs[opc].flags & TCG_OPF_64BIT) != 0;
>  
>      switch (opc) {
>      case INDEX_op_exit_tb:
> @@ -1163,7 +1163,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_mov_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_mov_i32:
>          tcg_out_movr(s, ext, args[0], args[1]);
>          break;
> @@ -1176,43 +1175,36 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_add_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_add_i32:
>          tcg_out_arith(s, ARITH_ADD, ext, args[0], args[1], args[2], 0);
>          break;
>  
>      case INDEX_op_sub_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_sub_i32:
>          tcg_out_arith(s, ARITH_SUB, ext, args[0], args[1], args[2], 0);
>          break;
>  
>      case INDEX_op_and_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_and_i32:
>          tcg_out_arith(s, ARITH_AND, ext, args[0], args[1], args[2], 0);
>          break;
>  
>      case INDEX_op_or_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_or_i32:
>          tcg_out_arith(s, ARITH_OR, ext, args[0], args[1], args[2], 0);
>          break;
>  
>      case INDEX_op_xor_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_xor_i32:
>          tcg_out_arith(s, ARITH_XOR, ext, args[0], args[1], args[2], 0);
>          break;
>  
>      case INDEX_op_mul_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_mul_i32:
>          tcg_out_mul(s, ext, args[0], args[1], args[2]);
>          break;
>  
>      case INDEX_op_shl_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_shl_i32:
>          if (const_args[2]) {    /* LSL / UBFM Wd, Wn, (32 - m) */
>              tcg_out_shl(s, ext, args[0], args[1], args[2]);
> @@ -1222,7 +1214,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_shr_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_shr_i32:
>          if (const_args[2]) {    /* LSR / UBFM Wd, Wn, m, 31 */
>              tcg_out_shr(s, ext, args[0], args[1], args[2]);
> @@ -1232,7 +1223,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_sar_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_sar_i32:
>          if (const_args[2]) {    /* ASR / SBFM Wd, Wn, m, 31 */
>              tcg_out_sar(s, ext, args[0], args[1], args[2]);
> @@ -1242,7 +1232,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_rotr_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_rotr_i32:
>          if (const_args[2]) {    /* ROR / EXTR Wd, Wm, Wm, m */
>              tcg_out_rotr(s, ext, args[0], args[1], args[2]);
> @@ -1252,7 +1241,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_rotl_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_rotl_i32:     /* same as rotate right by (32 - m) */
>          if (const_args[2]) {    /* ROR / EXTR Wd, Wm, Wm, 32 - m */
>              tcg_out_rotl(s, ext, args[0], args[1], args[2]);
> @@ -1265,14 +1253,12 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_brcond_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_brcond_i32: /* CMP 0, 1, cond(2), label 3 */
>          tcg_out_cmp(s, ext, args[0], args[1], 0);
>          tcg_out_goto_label_cond(s, args[2], args[3]);
>          break;
>  
>      case INDEX_op_setcond_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_setcond_i32:
>          tcg_out_cmp(s, ext, args[1], args[2], 0);
>          tcg_out_cset(s, 0, args[0], args[3]);

There's no point in changing to 'bool' if you still pass '0': either we keep it as int and pass 0,
or we change to bool and pass 'false'.
There are instances of this in later patches as well; I point out only this one, but it should be checked across the whole series.

> @@ -1315,9 +1301,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          tcg_out_qemu_st(s, args, 3);
>          break;
>  
> -    case INDEX_op_bswap64_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_bswap32_i64:
> +        /* Despite the _i64, this is a 32-bit bswap.  */
> +        ext = 0;
> +        /* FALLTHRU */
> +    case INDEX_op_bswap64_i64:

we waste too much vertical space here, which provides context and is a scarce resource.
What about 

case INDEX_op_bswap32_i64: /* Despite the _i64, this is a 32-bit bswap.  */
    ext = false; /* FALLTHRU */

>      case INDEX_op_bswap32_i32:
>          tcg_out_rev(s, ext, args[0], args[1]);
>          break;
> @@ -1327,12 +1315,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_ext8s_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_ext8s_i32:
>          tcg_out_sxt(s, ext, 0, args[0], args[1]);
>          break;
>      case INDEX_op_ext16s_i64:
> -        ext = 1; /* fall through */
>      case INDEX_op_ext16s_i32:
>          tcg_out_sxt(s, ext, 1, args[0], args[1]);
>          break;
> 

C.


* Re: [Qemu-devel] [PATCH v3 02/29] tcg-aarch64: Change all ext variables to bool
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 02/29] tcg-aarch64: Change all ext variables to bool Richard Henderson
@ 2013-09-12  8:29   ` Claudio Fontana
  2013-09-12 13:45     ` Richard Henderson
  0 siblings, 1 reply; 59+ messages in thread
From: Claudio Fontana @ 2013-09-12  8:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 02.09.2013 19:54, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 44 ++++++++++++++++++++++----------------------
>  1 file changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index 5b067fe..bde4c72 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -326,7 +326,7 @@ static inline void tcg_out_ldst_12(TCGContext *s,
>                | op_type << 20 | scaled_uimm << 10 | rn << 5 | rd);
>  }
>  
> -static inline void tcg_out_movr(TCGContext *s, int ext, TCGReg rd, TCGReg src)
> +static inline void tcg_out_movr(TCGContext *s, bool ext, TCGReg rd, TCGReg src)
>  {
>      /* register to register move using MOV (shifted register with no shift) */
>      /* using MOV 0x2a0003e0 | (shift).. */
> @@ -407,7 +407,7 @@ static inline void tcg_out_ldst(TCGContext *s, enum aarch64_ldst_op_data data,
>  }
>  
>  /* mov alias implemented with add immediate, useful to move to/from SP */
> -static inline void tcg_out_movr_sp(TCGContext *s, int ext, TCGReg rd, TCGReg rn)
> +static inline void tcg_out_movr_sp(TCGContext *s, bool ext, TCGReg rd, TCGReg rn)
>  {
>      /* using ADD 0x11000000 | (ext) | rn << 5 | rd */
>      unsigned int base = ext ? 0x91000000 : 0x11000000;
> @@ -437,7 +437,7 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
>  }
>  
>  static inline void tcg_out_arith(TCGContext *s, enum aarch64_arith_opc opc,
> -                                 int ext, TCGReg rd, TCGReg rn, TCGReg rm,
> +                                 bool ext, TCGReg rd, TCGReg rn, TCGReg rm,
>                                   int shift_imm)
>  {
>      /* Using shifted register arithmetic operations */
> @@ -453,7 +453,7 @@ static inline void tcg_out_arith(TCGContext *s, enum aarch64_arith_opc opc,
>      tcg_out32(s, base | rm << 16 | shift | rn << 5 | rd);
>  }
>  
> -static inline void tcg_out_mul(TCGContext *s, int ext,
> +static inline void tcg_out_mul(TCGContext *s, bool ext,
>                                 TCGReg rd, TCGReg rn, TCGReg rm)
>  {
>      /* Using MADD 0x1b000000 with Ra = wzr alias MUL 0x1b007c00 */
> @@ -462,7 +462,7 @@ static inline void tcg_out_mul(TCGContext *s, int ext,
>  }
>  
>  static inline void tcg_out_shiftrot_reg(TCGContext *s,
> -                                        enum aarch64_srr_opc opc, int ext,
> +                                        enum aarch64_srr_opc opc, bool ext,
>                                          TCGReg rd, TCGReg rn, TCGReg rm)
>  {
>      /* using 2-source data processing instructions 0x1ac02000 */
> @@ -470,7 +470,7 @@ static inline void tcg_out_shiftrot_reg(TCGContext *s,
>      tcg_out32(s, base | rm << 16 | opc << 8 | rn << 5 | rd);
>  }
>  
> -static inline void tcg_out_ubfm(TCGContext *s, int ext, TCGReg rd, TCGReg rn,
> +static inline void tcg_out_ubfm(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
>                                  unsigned int a, unsigned int b)
>  {
>      /* Using UBFM 0x53000000 Wd, Wn, a, b */
> @@ -478,7 +478,7 @@ static inline void tcg_out_ubfm(TCGContext *s, int ext, TCGReg rd, TCGReg rn,
>      tcg_out32(s, base | a << 16 | b << 10 | rn << 5 | rd);
>  }
>  
> -static inline void tcg_out_sbfm(TCGContext *s, int ext, TCGReg rd, TCGReg rn,
> +static inline void tcg_out_sbfm(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
>                                  unsigned int a, unsigned int b)
>  {
>      /* Using SBFM 0x13000000 Wd, Wn, a, b */
> @@ -486,7 +486,7 @@ static inline void tcg_out_sbfm(TCGContext *s, int ext, TCGReg rd, TCGReg rn,
>      tcg_out32(s, base | a << 16 | b << 10 | rn << 5 | rd);
>  }
>  
> -static inline void tcg_out_extr(TCGContext *s, int ext, TCGReg rd,
> +static inline void tcg_out_extr(TCGContext *s, bool ext, TCGReg rd,
>                                  TCGReg rn, TCGReg rm, unsigned int a)
>  {
>      /* Using EXTR 0x13800000 Wd, Wn, Wm, a */
> @@ -494,7 +494,7 @@ static inline void tcg_out_extr(TCGContext *s, int ext, TCGReg rd,
>      tcg_out32(s, base | rm << 16 | a << 10 | rn << 5 | rd);
>  }
>  
> -static inline void tcg_out_shl(TCGContext *s, int ext,
> +static inline void tcg_out_shl(TCGContext *s, bool ext,
>                                 TCGReg rd, TCGReg rn, unsigned int m)
>  {
>      int bits, max;
> @@ -503,28 +503,28 @@ static inline void tcg_out_shl(TCGContext *s, int ext,
>      tcg_out_ubfm(s, ext, rd, rn, bits - (m & max), max - (m & max));
>  }
>  
> -static inline void tcg_out_shr(TCGContext *s, int ext,
> +static inline void tcg_out_shr(TCGContext *s, bool ext,
>                                 TCGReg rd, TCGReg rn, unsigned int m)
>  {
>      int max = ext ? 63 : 31;
>      tcg_out_ubfm(s, ext, rd, rn, m & max, max);
>  }
>  
> -static inline void tcg_out_sar(TCGContext *s, int ext,
> +static inline void tcg_out_sar(TCGContext *s, bool ext,
>                                 TCGReg rd, TCGReg rn, unsigned int m)
>  {
>      int max = ext ? 63 : 31;
>      tcg_out_sbfm(s, ext, rd, rn, m & max, max);
>  }
>  
> -static inline void tcg_out_rotr(TCGContext *s, int ext,
> +static inline void tcg_out_rotr(TCGContext *s, bool ext,
>                                  TCGReg rd, TCGReg rn, unsigned int m)
>  {
>      int max = ext ? 63 : 31;
>      tcg_out_extr(s, ext, rd, rn, rn, m & max);
>  }
>  
> -static inline void tcg_out_rotl(TCGContext *s, int ext,
> +static inline void tcg_out_rotl(TCGContext *s, bool ext,
>                                  TCGReg rd, TCGReg rn, unsigned int m)
>  {
>      int bits, max;
> @@ -533,14 +533,14 @@ static inline void tcg_out_rotl(TCGContext *s, int ext,
>      tcg_out_extr(s, ext, rd, rn, rn, bits - (m & max));
>  }
>  
> -static inline void tcg_out_cmp(TCGContext *s, int ext, TCGReg rn, TCGReg rm,
> +static inline void tcg_out_cmp(TCGContext *s, bool ext, TCGReg rn, TCGReg rm,
>                                 int shift_imm)
>  {
>      /* Using CMP alias SUBS wzr, Wn, Wm */
>      tcg_out_arith(s, ARITH_SUBS, ext, TCG_REG_XZR, rn, rm, shift_imm);
>  }
>  
> -static inline void tcg_out_cset(TCGContext *s, int ext, TCGReg rd, TCGCond c)
> +static inline void tcg_out_cset(TCGContext *s, bool ext, TCGReg rd, TCGCond c)

I see the problem related to the previous patch.
What about continuing to use int in the previous patch, and replacing it with bool in this one?
The previous patch would then only target the way ext is set,
and this one would really contain all the bool changes.

>  {
>      /* Using CSET alias of CSINC 0x1a800400 Xd, XZR, XZR, invert(cond) */
>      unsigned int base = ext ? 0x9a9f07e0 : 0x1a9f07e0;
> @@ -637,7 +637,7 @@ aarch64_limm(unsigned int m, unsigned int r)
>     to test a 32bit reg against 0xff000000, pass M = 8,  R = 8.
>     to test a 32bit reg against 0xff0000ff, pass M = 16, R = 8.
>   */
> -static inline void tcg_out_tst(TCGContext *s, int ext, TCGReg rn,
> +static inline void tcg_out_tst(TCGContext *s, bool ext, TCGReg rn,
>                                 unsigned int m, unsigned int r)
>  {
>      /* using TST alias of ANDS XZR, Xn,#bimm64 0x7200001f */
> @@ -646,7 +646,7 @@ static inline void tcg_out_tst(TCGContext *s, int ext, TCGReg rn,
>  }
>  
>  /* and a register with a bit pattern, similarly to TST, no flags change */
> -static inline void tcg_out_andi(TCGContext *s, int ext, TCGReg rd, TCGReg rn,
> +static inline void tcg_out_andi(TCGContext *s, bool ext, TCGReg rd, TCGReg rn,
>                                  unsigned int m, unsigned int r)
>  {
>      /* using AND 0x12000000 */
> @@ -700,21 +700,21 @@ static inline void tcg_out_goto_label_cond(TCGContext *s,
>      }
>  }
>  
> -static inline void tcg_out_rev(TCGContext *s, int ext, TCGReg rd, TCGReg rm)
> +static inline void tcg_out_rev(TCGContext *s, bool ext, TCGReg rd, TCGReg rm)
>  {
>      /* using REV 0x5ac00800 */
>      unsigned int base = ext ? 0xdac00c00 : 0x5ac00800;
>      tcg_out32(s, base | rm << 5 | rd);
>  }
>  
> -static inline void tcg_out_rev16(TCGContext *s, int ext, TCGReg rd, TCGReg rm)
> +static inline void tcg_out_rev16(TCGContext *s, bool ext, TCGReg rd, TCGReg rm)
>  {
>      /* using REV16 0x5ac00400 */
>      unsigned int base = ext ? 0xdac00400 : 0x5ac00400;
>      tcg_out32(s, base | rm << 5 | rd);
>  }
>  
> -static inline void tcg_out_sxt(TCGContext *s, int ext, int s_bits,
> +static inline void tcg_out_sxt(TCGContext *s, bool ext, int s_bits,
>                                 TCGReg rd, TCGReg rn)
>  {
>      /* using ALIASes SXTB 0x13001c00, SXTH 0x13003c00, SXTW 0x93407c00
> @@ -732,7 +732,7 @@ static inline void tcg_out_uxt(TCGContext *s, int s_bits,
>      tcg_out_ubfm(s, 0, rd, rn, 0, bits);
>  }
>  
> -static inline void tcg_out_addi(TCGContext *s, int ext,
> +static inline void tcg_out_addi(TCGContext *s, bool ext,
>                                  TCGReg rd, TCGReg rn, unsigned int aimm)
>  {
>      /* add immediate aimm unsigned 12bit value (with LSL 0 or 12) */
> @@ -752,7 +752,7 @@ static inline void tcg_out_addi(TCGContext *s, int ext,
>      tcg_out32(s, base | aimm | (rn << 5) | rd);
>  }
>  
> -static inline void tcg_out_subi(TCGContext *s, int ext,
> +static inline void tcg_out_subi(TCGContext *s, bool ext,
>                                  TCGReg rd, TCGReg rn, unsigned int aimm)
>  {
>      /* sub immediate aimm unsigned 12bit value (with LSL 0 or 12) */
> 

C.


* Re: [Qemu-devel] [PATCH v3 03/29] tcg-aarch64: Don't handle mov/movi in tcg_out_op
  2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 03/29] tcg-aarch64: Don't handle mov/movi in tcg_out_op Richard Henderson
@ 2013-09-12  8:30   ` Claudio Fontana
  2013-09-12 14:02     ` Richard Henderson
  0 siblings, 1 reply; 59+ messages in thread
From: Claudio Fontana @ 2013-09-12  8:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 02.09.2013 19:54, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 20 +++++++-------------
>  1 file changed, 7 insertions(+), 13 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index bde4c72..79a447d 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -1162,18 +1162,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>                       args[0], args[1], args[2]);
>          break;
>  
> -    case INDEX_op_mov_i64:
> -    case INDEX_op_mov_i32:
> -        tcg_out_movr(s, ext, args[0], args[1]);
> -        break;
> -
> -    case INDEX_op_movi_i64:
> -        tcg_out_movi(s, TCG_TYPE_I64, args[0], args[1]);
> -        break;
> -    case INDEX_op_movi_i32:
> -        tcg_out_movi(s, TCG_TYPE_I32, args[0], args[1]);
> -        break;
> -
>      case INDEX_op_add_i64:
>      case INDEX_op_add_i32:
>          tcg_out_arith(s, ARITH_ADD, ext, args[0], args[1], args[2], 0);
> @@ -1337,8 +1325,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          tcg_out_movr(s, 0, args[0], args[1]);
>          break;
>  
> +    case INDEX_op_mov_i64:
> +    case INDEX_op_mov_i32:
> +    case INDEX_op_movi_i64:
> +    case INDEX_op_movi_i32:
> +        /* Always implemented with tcg_out_mov/i, never with tcg_out_op.  */
>      default:
> -        tcg_abort(); /* opcode not implemented */
> +        /* Opcode not implemented.  */
> +        tcg_abort();
>      }
>  }
>  

Ok



-- 
Claudio Fontana
Server OS Architect
Huawei Technologies Duesseldorf GmbH
Riesstraße 25 - 80992 München

office: +49 89 158834 4135
mobile: +49 15253060158


* Re: [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
  2013-09-12  8:03             ` Claudio Fontana
@ 2013-09-12  8:55               ` Peter Maydell
  0 siblings, 0 replies; 59+ messages in thread
From: Peter Maydell @ 2013-09-12  8:55 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: QEMU Developers, Richard Henderson

On 12 September 2013 09:03, Claudio Fontana <claudio.fontana@huawei.com> wrote:
> On 10.09.2013 10:45, Peter Maydell wrote:
>> On 10 September 2013 09:27, Claudio Fontana <claudio.fontana@huawei.com> wrote:
>>> On another side, I end up having to manually revert some parts
>>> of these which you put as prerequisites, during bisection when
>>> landing after them, which is a huge time drain when tracking
>>> regressions introduced in the later part of the series.
>>
>> I don't understand this; can you explain? If these early
>> refactoring patches have bugs then we should just identify
>> them and fix them. If they don't have bugs why would you
>> need to manually revert parts of them?

> To revert the next patches which do introduce bugs.

Huh? The next patches would apply on top of the refactoring
patches, so you don't need to remove the refactoring to
revert the functional changes. (On the other hand if we
did things the way round you're suggesting with the
functional changes first then we would need to revert
or manually undo the refactoring parts in order to
revert the functional change patches.)

Personally I think that "first refactor/clean up, then
add new features/improvements" is a fairly standard order
to do things.

-- PMM


* Re: [Qemu-devel] [PATCH v3 01/29] tcg-aarch64: Set ext based on TCG_OPF_64BIT
  2013-09-12  8:25   ` Claudio Fontana
@ 2013-09-12  8:58     ` Peter Maydell
  2013-09-12  9:01       ` Claudio Fontana
  2013-09-12 13:21       ` Richard Henderson
  0 siblings, 2 replies; 59+ messages in thread
From: Peter Maydell @ 2013-09-12  8:58 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: QEMU Developers, Richard Henderson

On 12 September 2013 09:25, Claudio Fontana <claudio.fontana@huawei.com> wrote:
> On 02.09.2013 19:54, Richard Henderson wrote:
>>
>> -    case INDEX_op_bswap64_i64:
>> -        ext = 1; /* fall through */
>>      case INDEX_op_bswap32_i64:
>> +        /* Despite the _i64, this is a 32-bit bswap.  */
>> +        ext = 0;
>> +        /* FALLTHRU */
>> +    case INDEX_op_bswap64_i64:
>
> we waste too much y space here, which gives context and is a scarse resource.
> What about
>
> case INDEX_op_bswap32_i64: /* Despite the _i64, this is a 32-bit bswap.  */
>     ext = false; /* FALLTHRU */

Consensus in the rest of the code is for /* fall through */
rather than /* FALLTHRU */ -- there's only 28 of the
latter compared to 169 of the former.

-- PMM


* Re: [Qemu-devel] [PATCH v3 01/29] tcg-aarch64: Set ext based on TCG_OPF_64BIT
  2013-09-12  8:58     ` Peter Maydell
@ 2013-09-12  9:01       ` Claudio Fontana
  2013-09-12 13:21       ` Richard Henderson
  1 sibling, 0 replies; 59+ messages in thread
From: Claudio Fontana @ 2013-09-12  9:01 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Richard Henderson

On 12.09.2013 10:58, Peter Maydell wrote:
> On 12 September 2013 09:25, Claudio Fontana <claudio.fontana@huawei.com> wrote:
>> On 02.09.2013 19:54, Richard Henderson wrote:
>>>
>>> -    case INDEX_op_bswap64_i64:
>>> -        ext = 1; /* fall through */
>>>      case INDEX_op_bswap32_i64:
>>> +        /* Despite the _i64, this is a 32-bit bswap.  */
>>> +        ext = 0;
>>> +        /* FALLTHRU */
>>> +    case INDEX_op_bswap64_i64:
>>
>> we waste too much y space here, which gives context and is a scarse resource.
>> What about
>>
>> case INDEX_op_bswap32_i64: /* Despite the _i64, this is a 32-bit bswap.  */
>>     ext = false; /* FALLTHRU */
> 
> Consensus in the rest of the code is for /* fall through */
> rather than /* FALLTHRU */ -- there's only 28 of the
> latter compared to 169 of the former.
> 

I like /* fall through */ better as well.


* Re: [Qemu-devel] [PATCH v3 01/29] tcg-aarch64: Set ext based on TCG_OPF_64BIT
  2013-09-12  8:58     ` Peter Maydell
  2013-09-12  9:01       ` Claudio Fontana
@ 2013-09-12 13:21       ` Richard Henderson
  1 sibling, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-12 13:21 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Claudio Fontana, QEMU Developers

On 09/12/2013 01:58 AM, Peter Maydell wrote:
> On 12 September 2013 09:25, Claudio Fontana <claudio.fontana@huawei.com> wrote:
>> On 02.09.2013 19:54, Richard Henderson wrote:
>>>
>>> -    case INDEX_op_bswap64_i64:
>>> -        ext = 1; /* fall through */
>>>      case INDEX_op_bswap32_i64:
>>> +        /* Despite the _i64, this is a 32-bit bswap.  */
>>> +        ext = 0;
>>> +        /* FALLTHRU */
>>> +    case INDEX_op_bswap64_i64:
>>
>> we waste too much y space here, which gives context and is a scarse resource.
>> What about
>>
>> case INDEX_op_bswap32_i64: /* Despite the _i64, this is a 32-bit bswap.  */
>>     ext = false; /* FALLTHRU */
> 
> Consensus in the rest of the code is for /* fall through */
> rather than /* FALLTHRU */ -- there's only 28 of the
> latter compared to 169 of the former.

Those 28 may well be all mine too.  The fingers still remember
that one must use FALLTHRU for lint.   ;-)


r~


* Re: [Qemu-devel] [PATCH v3 02/29] tcg-aarch64: Change all ext variables to bool
  2013-09-12  8:29   ` Claudio Fontana
@ 2013-09-12 13:45     ` Richard Henderson
  0 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-12 13:45 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: qemu-devel

On 09/12/2013 01:29 AM, Claudio Fontana wrote:
> I see the problem related to the previous patch.
> What about continuing to use int in the previous patch,
> and replace it with bool in this one? The previous patch would only target the way ext is set,
> and this one would really contain all bool changes.

For the next edition, I'm planning to swap the order of the first two
patches, and also quit using bool.

I think TCGType is a bit more descriptive here, and also uses the
same 0/1 values that we want.  I added a QEMU_BUILD_BUG_ON to assert
that the enum values don't change.



r~


* Re: [Qemu-devel] [PATCH v3 03/29] tcg-aarch64: Don't handle mov/movi in tcg_out_op
  2013-09-12  8:30   ` Claudio Fontana
@ 2013-09-12 14:02     ` Richard Henderson
  2013-09-12 14:31       ` Claudio Fontana
  0 siblings, 1 reply; 59+ messages in thread
From: Richard Henderson @ 2013-09-12 14:02 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: qemu-devel

On 09/12/2013 01:30 AM, Claudio Fontana wrote:
>> +    case INDEX_op_mov_i64:
>> +    case INDEX_op_mov_i32:
>> +    case INDEX_op_movi_i64:
>> +    case INDEX_op_movi_i32:
>> +        /* Always implemented with tcg_out_mov/i, never with tcg_out_op.  */
>>      default:
>> -        tcg_abort(); /* opcode not implemented */
>> +        /* Opcode not implemented.  */
>> +        tcg_abort();
>>      }
>>  }
>>  
> 
> Ok

Sadly, "Ok" is neither Reviewed-by nor Signed-off-by.


r~


* Re: [Qemu-devel] [PATCH v3 03/29] tcg-aarch64: Don't handle mov/movi in tcg_out_op
  2013-09-12 14:02     ` Richard Henderson
@ 2013-09-12 14:31       ` Claudio Fontana
  2013-09-12 14:35         ` Peter Maydell
  0 siblings, 1 reply; 59+ messages in thread
From: Claudio Fontana @ 2013-09-12 14:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 12.09.2013 16:02, Richard Henderson wrote:
> On 09/12/2013 01:30 AM, Claudio Fontana wrote:
>>> +    case INDEX_op_mov_i64:
>>> +    case INDEX_op_mov_i32:
>>> +    case INDEX_op_movi_i64:
>>> +    case INDEX_op_movi_i32:
>>> +        /* Always implemented with tcg_out_mov/i, never with tcg_out_op.  */
>>>      default:
>>> -        tcg_abort(); /* opcode not implemented */
>>> +        /* Opcode not implemented.  */
>>> +        tcg_abort();
>>>      }
>>>  }
>>>  
>>
>> Ok
> 
> Sadly, "Ok" is neither Reviewed-by nor Signed-off-by.
> 
> 

There is nothing sad about it. When it's reviewed, you will know.


* Re: [Qemu-devel] [PATCH v3 03/29] tcg-aarch64: Don't handle mov/movi in tcg_out_op
  2013-09-12 14:31       ` Claudio Fontana
@ 2013-09-12 14:35         ` Peter Maydell
  0 siblings, 0 replies; 59+ messages in thread
From: Peter Maydell @ 2013-09-12 14:35 UTC (permalink / raw)
  To: Claudio Fontana; +Cc: QEMU Developers, Richard Henderson

On 12 September 2013 15:31, Claudio Fontana <claudio.fontana@huawei.com> wrote:
> On 12.09.2013 16:02, Richard Henderson wrote:
>> On 09/12/2013 01:30 AM, Claudio Fontana wrote:
>>> Ok
>>
>> Sadly, "Ok" is neither Reviewed-by nor Signed-off-by

> There is nothing sad about it. When it's reviewed, you will know.

The point is that "Ok" is not particularly useful feedback
because it looks like a Reviewed-by/Acked-by but isn't. It's
confusing, so it's better to avoid it.

-- PMM


* [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements
@ 2013-09-02 17:54 Richard Henderson
  0 siblings, 0 replies; 59+ messages in thread
From: Richard Henderson @ 2013-09-02 17:54 UTC (permalink / raw)
  To: qemu-devel

I'm not sure if I posted v2 or not, but my branch is named -3,
therefore this is v3.  ;-)

The jumbo "fixme" patch from v1 has been split up.  This has been
updated for the changes in the tlb helpers over the past few weeks.
For the benefit of trivial conflict resolution, it's relative to a
tree that contains basically all of my patches.

See git://github.com/rth7680/qemu.git tcg-aarch-3 for the tree, if
you find yourself missing any of the dependencies.


r~


Richard Henderson (29):
  tcg-aarch64: Set ext based on TCG_OPF_64BIT
  tcg-aarch64: Change all ext variables to bool
  tcg-aarch64: Don't handle mov/movi in tcg_out_op
  tcg-aarch64: Hoist common argument loads in tcg_out_op
  tcg-aarch64: Change enum aarch64_arith_opc to AArch64Insn
  tcg-aarch64: Merge enum aarch64_srr_opc with AArch64Insn
  tcg-aarch64: Introduce tcg_fmt_* functions
  tcg-aarch64: Introduce tcg_fmt_Rdn_aimm
  tcg-aarch64: Implement mov with tcg_fmt_* functions
  tcg-aarch64: Handle constant operands to add, sub, and compare
  tcg-aarch64: Handle constant operands to and, or, xor
  tcg-aarch64: Support andc, orc, eqv, not
  tcg-aarch64: Handle zero as first argument to sub
  tcg-aarch64: Support movcond
  tcg-aarch64: Support deposit
  tcg-aarch64: Support add2, sub2
  tcg-aarch64: Support muluh, mulsh
  tcg-aarch64: Support div, rem
  tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s
  tcg-aarch64: Improve tcg_out_movi
  tcg-aarch64: Avoid add with zero in tlb load
  tcg-aarch64: Use adrp in tcg_out_movi
  tcg-aarch64: Pass return address to load/store helpers directly.
  tcg-aarch64: Use tcg_out_call for qemu_ld/st
  tcg-aarch64: Use symbolic names for branches
  tcg-aarch64: Implement tcg_register_jit
  tcg-aarch64: Reuse FP and LR in translated code
  tcg-aarch64: Introduce tcg_out_ldst_pair
  tcg-aarch64: Remove redundant CPU_TLB_ENTRY_BITS check

 include/exec/exec-all.h  |   18 -
 tcg/aarch64/tcg-target.c | 1276 ++++++++++++++++++++++++++++++----------------
 tcg/aarch64/tcg-target.h |   76 +--
 3 files changed, 867 insertions(+), 503 deletions(-)

-- 
1.8.3.1


end of thread, other threads:[~2013-09-12 14:35 UTC | newest]

Thread overview: 59+ messages
2013-09-02 17:54 [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 01/29] tcg-aarch64: Set ext based on TCG_OPF_64BIT Richard Henderson
2013-09-12  8:25   ` Claudio Fontana
2013-09-12  8:58     ` Peter Maydell
2013-09-12  9:01       ` Claudio Fontana
2013-09-12 13:21       ` Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 02/29] tcg-aarch64: Change all ext variables to bool Richard Henderson
2013-09-12  8:29   ` Claudio Fontana
2013-09-12 13:45     ` Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 03/29] tcg-aarch64: Don't handle mov/movi in tcg_out_op Richard Henderson
2013-09-12  8:30   ` Claudio Fontana
2013-09-12 14:02     ` Richard Henderson
2013-09-12 14:31       ` Claudio Fontana
2013-09-12 14:35         ` Peter Maydell
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 04/29] tcg-aarch64: Hoist common argument loads " Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 05/29] tcg-aarch64: Change enum aarch64_arith_opc to AArch64Insn Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 06/29] tcg-aarch64: Merge enum aarch64_srr_opc with AArch64Insn Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 07/29] tcg-aarch64: Introduce tcg_fmt_* functions Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 08/29] tcg-aarch64: Introduce tcg_fmt_Rdn_aimm Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 09/29] tcg-aarch64: Implement mov with tcg_fmt_* functions Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 10/29] tcg-aarch64: Handle constant operands to add, sub, and compare Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 11/29] tcg-aarch64: Handle constant operands to and, or, xor Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 12/29] tcg-aarch64: Support andc, orc, eqv, not Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 13/29] tcg-aarch64: Handle zero as first argument to sub Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 14/29] tcg-aarch64: Support movcond Richard Henderson
2013-09-09 15:09   ` Claudio Fontana
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 15/29] tcg-aarch64: Support deposit Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 16/29] tcg-aarch64: Support add2, sub2 Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 17/29] tcg-aarch64: Support muluh, mulsh Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 18/29] tcg-aarch64: Support div, rem Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 19/29] tcg-aarch64: Introduce tcg_fmt_Rd_uimm_s Richard Henderson
2013-09-05 13:32   ` Claudio Fontana
2013-09-05 15:41     ` Richard Henderson
2013-09-06  9:06       ` Claudio Fontana
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 20/29] tcg-aarch64: Improve tcg_out_movi Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 21/29] tcg-aarch64: Avoid add with zero in tlb load Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 22/29] tcg-aarch64: Use adrp in tcg_out_movi Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 23/29] tcg-aarch64: Pass return address to load/store helpers directly Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 24/29] tcg-aarch64: Use tcg_out_call for qemu_ld/st Richard Henderson
2013-09-02 17:54 ` [Qemu-devel] [PATCH v3 25/29] tcg-aarch64: Use symbolic names for branches Richard Henderson
2013-09-02 17:55 ` [Qemu-devel] [PATCH v3 26/29] tcg-aarch64: Implement tcg_register_jit Richard Henderson
2013-09-02 17:55 ` [Qemu-devel] [PATCH v3 27/29] tcg-aarch64: Reuse FP and LR in translated code Richard Henderson
2013-09-02 17:55 ` [Qemu-devel] [PATCH v3 28/29] tcg-aarch64: Introduce tcg_out_ldst_pair Richard Henderson
2013-09-02 17:55 ` [Qemu-devel] [PATCH v3 29/29] tcg-aarch64: Remove redundant CPU_TLB_ENTRY_BITS check Richard Henderson
2013-09-03  7:37 ` [Qemu-devel] [PATCH v3 00/29] tcg-aarch64 improvements Richard W.M. Jones
2013-09-03  7:42   ` Laurent Desnogues
2013-09-03  8:00   ` Peter Maydell
2013-09-09  8:13 ` Claudio Fontana
2013-09-09 14:08   ` Richard Henderson
2013-09-09 15:02     ` Claudio Fontana
2013-09-09 15:04       ` Peter Maydell
2013-09-09 15:07       ` Richard Henderson
2013-09-10  8:27         ` Claudio Fontana
2013-09-10  8:45           ` Peter Maydell
2013-09-12  8:03             ` Claudio Fontana
2013-09-12  8:55               ` Peter Maydell
2013-09-10 13:16           ` Richard Henderson
2013-09-12  8:11             ` Claudio Fontana
