* [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3
@ 2014-04-03 19:56 Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 01/26] tcg-aarch64: Properly detect SIGSEGV writes Richard Henderson
                   ` (25 more replies)
  0 siblings, 26 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Changes from v1:
  * Frame pointer backtrace linkage retained.  Not that gdb seems to
    use this at all; the tcg_register_jit patch is still required in
    order to get a proper backtrace.
  * Several patches re-ordered to reduce churn, especially the
    qemu_ld/st related patches.
  * Avoid using ADRP for 32-bit values.  It would be exceedingly rare
    to save an instruction for this case.
  * Don't merge tcg_out_movr{,_sp} with their callers.

I believe I've incorporated all of Claudio's feedback from v1.


r~


Richard Henderson (26):
  tcg-aarch64: Properly detect SIGSEGV writes
  tcg-aarch64: Use intptr_t appropriately
  tcg-aarch64: Use TCGType and TCGMemOp constants
  tcg-aarch64: Use MOVN in tcg_out_movi
  tcg-aarch64: Use ORRI in tcg_out_movi
  tcg-aarch64: Special case small constants in tcg_out_movi
  tcg-aarch64: Use adrp in tcg_out_movi
  tcg-aarch64: Use symbolic names for branches
  tcg-aarch64: Create tcg_out_brcond
  tcg-aarch64: Use CBZ and CBNZ
  tcg-aarch64: Reuse LR in translated code
  tcg-aarch64: Introduce tcg_out_insn_3314
  tcg-aarch64: Implement tcg_register_jit
  tcg-aarch64: Avoid add with zero in tlb load
  tcg-aarch64: Use tcg_out_call for qemu_ld/st
  tcg-aarch64: Use ADR to pass the return address to the ld/st helpers
  tcg-aarch64: Use TCGMemOp in qemu_ld/st
  tcg-aarch64: Pass qemu_ld/st arguments directly
  tcg-aarch64: Implement TCG_TARGET_HAS_new_ldst
  tcg-aarch64: Support stores of zero
  tcg-aarch64: Introduce tcg_out_insn_3507
  tcg-aarch64: Merge aarch64_ldst_get_data/type into tcg_out_op
  tcg-aarch64: Replace aarch64_ldst_op_data with TCGMemOp
  tcg-aarch64: Replace aarch64_ldst_op_data with AArch64LdstType
  tcg-aarch64: Prefer unsigned offsets before signed offsets for ldst
  tcg-aarch64: Use tcg_out_mov in preference to tcg_out_movr

 tcg/aarch64/tcg-target.c | 1076 ++++++++++++++++++++++++----------------------
 tcg/aarch64/tcg-target.h |   34 +-
 user-exec.c              |   29 +-
 3 files changed, 593 insertions(+), 546 deletions(-)

-- 
1.9.0


* [Qemu-devel] [PATCH v3 01/26] tcg-aarch64: Properly detect SIGSEGV writes
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-07  7:58   ` Claudio Fontana
  2014-04-07 16:39   ` Peter Maydell
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 02/26] tcg-aarch64: Use intptr_t appropriately Richard Henderson
                   ` (24 subsequent siblings)
  25 siblings, 2 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Since the kernel doesn't pass any info on the reason for the fault,
disassemble the instruction to detect a store.
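
As a rough standalone sketch (not part of the patch), the same test can be
written as a walk over a table of mask/value pairs; the two patterns shown
are copied from the is_write expression below, and insn_is_store is a
hypothetical helper name used only for illustration.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Classify a faulting AArch64 instruction as a store by masking off the
   variable fields (registers, immediates) and comparing the fixed opcode
   bits.  Only two of the fifteen patterns from the patch are shown.  */
static const struct { uint32_t mask, value; } store_patterns[] = {
    { 0x3bc00000, 0x39000000 },   /* C3.3.13 */
    { 0x3be00c00, 0x38000400 },   /* C3.3.8 */
};

static bool insn_is_store(uint32_t insn)
{
    for (size_t i = 0; i < sizeof(store_patterns) / sizeof(store_patterns[0]); i++) {
        if ((insn & store_patterns[i].mask) == store_patterns[i].value) {
            return true;
        }
    }
    return false;
}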

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 user-exec.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/user-exec.c b/user-exec.c
index bc58056..52f76c9 100644
--- a/user-exec.c
+++ b/user-exec.c
@@ -465,16 +465,33 @@ int cpu_signal_handler(int host_signum, void *pinfo,
 
 #elif defined(__aarch64__)
 
-int cpu_signal_handler(int host_signum, void *pinfo,
-                       void *puc)
+int cpu_signal_handler(int host_signum, void *pinfo, void *puc)
 {
     siginfo_t *info = pinfo;
     struct ucontext *uc = puc;
-    uint64_t pc;
-    int is_write = 0; /* XXX how to determine? */
+    uintptr_t pc = uc->uc_mcontext.pc;
+    uint32_t insn = *(uint32_t *)pc;
+    bool is_write;
 
-    pc = uc->uc_mcontext.pc;
-    return handle_cpu_signal(pc, (uint64_t)info->si_addr,
+    /* XXX: need kernel patch to get write flag faster.  */
+    /* XXX: several of these could be combined.  */
+    is_write = (   (insn & 0xbfff0000) == 0x0c000000   /* C3.3.1 */
+                || (insn & 0xbfe00000) == 0x0c800000   /* C3.3.2 */
+                || (insn & 0xbfdf0000) == 0x0d000000   /* C3.3.3 */
+                || (insn & 0xbfc00000) == 0x0d800000   /* C3.3.4 */
+                || (insn & 0x3f400000) == 0x08000000   /* C3.3.6 */
+                || (insn & 0x3bc00000) == 0x28400000   /* C3.3.7 */
+                || (insn & 0x3be00c00) == 0x38000400   /* C3.3.8 */
+                || (insn & 0x3be00c00) == 0x38000c00   /* C3.3.9 */
+                || (insn & 0x3be00c00) == 0x38200800   /* C3.3.10 */
+                || (insn & 0x3be00c00) == 0x38000800   /* C3.3.11 */
+                || (insn & 0x3be00c00) == 0x38000000   /* C3.3.12 */
+                || (insn & 0x3bc00000) == 0x39000000   /* C3.3.13 */
+                || (insn & 0x3bc00000) == 0x29000000   /* C3.3.14 */
+                || (insn & 0x3bc00000) == 0x28800000   /* C3.3.15 */
+                || (insn & 0x3bc00000) == 0x29800000); /* C3.3.16 */
+
+    return handle_cpu_signal(pc, (uintptr_t)info->si_addr,
                              is_write, &uc->uc_sigmask, puc);
 }
 
-- 
1.9.0


* [Qemu-devel] [PATCH v3 02/26] tcg-aarch64: Use intptr_t appropriately
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 01/26] tcg-aarch64: Properly detect SIGSEGV writes Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 03/26] tcg-aarch64: Use TCGType and TCGMemOp constants Richard Henderson
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

As opposed to tcg_target_long.

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 661a5af..6938248 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -497,7 +497,7 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
 static inline void tcg_out_ldst_9(TCGContext *s,
                                   enum aarch64_ldst_op_data op_data,
                                   enum aarch64_ldst_op_type op_type,
-                                  TCGReg rd, TCGReg rn, tcg_target_long offset)
+                                  TCGReg rd, TCGReg rn, intptr_t offset)
 {
     /* use LDUR with BASE register with 9bit signed unscaled offset */
     tcg_out32(s, op_data << 24 | op_type << 20
@@ -566,7 +566,7 @@ static inline void tcg_out_ldst_r(TCGContext *s,
 /* solve the whole ldst problem */
 static inline void tcg_out_ldst(TCGContext *s, enum aarch64_ldst_op_data data,
                                 enum aarch64_ldst_op_type type,
-                                TCGReg rd, TCGReg rn, tcg_target_long offset)
+                                TCGReg rd, TCGReg rn, intptr_t offset)
 {
     if (offset >= -256 && offset < 256) {
         tcg_out_ldst_9(s, data, type, rd, rn, offset);
@@ -954,9 +954,9 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movr(s, 1, TCG_REG_X0, TCG_AREG0);
     tcg_out_movr(s, (TARGET_LONG_BITS == 64), TCG_REG_X1, lb->addrlo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, lb->mem_index);
-    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_X3, (tcg_target_long)lb->raddr);
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_X3, (intptr_t)lb->raddr);
     tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP,
-                 (tcg_target_long)qemu_ld_helpers[lb->opc & 3]);
+                 (intptr_t)qemu_ld_helpers[lb->opc & 3]);
     tcg_out_callr(s, TCG_REG_TMP);
     if (lb->opc & 0x04) {
         tcg_out_sxt(s, 1, lb->opc & 3, lb->datalo_reg, TCG_REG_X0);
@@ -979,7 +979,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP,
                  (intptr_t)qemu_st_helpers[lb->opc & 3]);
     tcg_out_callr(s, TCG_REG_TMP);
-    tcg_out_goto(s, (tcg_target_long)lb->raddr);
+    tcg_out_goto(s, (intptr_t)lb->raddr);
 }
 
 static void add_qemu_ldst_label(TCGContext *s, int is_ld, int opc,
-- 
1.9.0


* [Qemu-devel] [PATCH v3 03/26] tcg-aarch64: Use TCGType and TCGMemOp constants
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 01/26] tcg-aarch64: Properly detect SIGSEGV writes Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 02/26] tcg-aarch64: Use intptr_t appropriately Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 04/26] tcg-aarch64: Use MOVN in tcg_out_movi Richard Henderson
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Rather than raw constants that could mean anything.

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 73 +++++++++++++++++++++++++-----------------------
 1 file changed, 38 insertions(+), 35 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 6938248..5e6d10b 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -595,7 +595,7 @@ static inline void tcg_out_mov(TCGContext *s,
                                TCGType type, TCGReg ret, TCGReg arg)
 {
     if (ret != arg) {
-        tcg_out_movr(s, type == TCG_TYPE_I64, ret, arg);
+        tcg_out_movr(s, type, ret, arg);
     }
 }
 
@@ -828,19 +828,19 @@ static inline void tcg_out_rev16(TCGContext *s, TCGType ext,
     tcg_out32(s, base | rm << 5 | rd);
 }
 
-static inline void tcg_out_sxt(TCGContext *s, TCGType ext, int s_bits,
+static inline void tcg_out_sxt(TCGContext *s, TCGType ext, TCGMemOp s_bits,
                                TCGReg rd, TCGReg rn)
 {
     /* Using ALIASes SXTB, SXTH, SXTW, of SBFM Xd, Xn, #0, #7|15|31 */
-    int bits = 8 * (1 << s_bits) - 1;
+    int bits = (8 << s_bits) - 1;
     tcg_out_sbfm(s, ext, rd, rn, 0, bits);
 }
 
-static inline void tcg_out_uxt(TCGContext *s, int s_bits,
+static inline void tcg_out_uxt(TCGContext *s, TCGMemOp s_bits,
                                TCGReg rd, TCGReg rn)
 {
     /* Using ALIASes UXTB, UXTH of UBFM Wd, Wn, #0, #7|15 */
-    int bits = 8 * (1 << s_bits) - 1;
+    int bits = (8 << s_bits) - 1;
     tcg_out_ubfm(s, 0, rd, rn, 0, bits);
 }
 
@@ -949,19 +949,21 @@ static const void * const qemu_st_helpers[4] = {
 
 static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
+    TCGMemOp opc = lb->opc;
+    TCGMemOp size = opc & MO_SIZE;
+
     reloc_pc19(lb->label_ptr[0], (intptr_t)s->code_ptr);
 
-    tcg_out_movr(s, 1, TCG_REG_X0, TCG_AREG0);
-    tcg_out_movr(s, (TARGET_LONG_BITS == 64), TCG_REG_X1, lb->addrlo_reg);
+    tcg_out_movr(s, TCG_TYPE_I64, TCG_REG_X0, TCG_AREG0);
+    tcg_out_movr(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, lb->mem_index);
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_X3, (intptr_t)lb->raddr);
-    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP,
-                 (intptr_t)qemu_ld_helpers[lb->opc & 3]);
+    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, (intptr_t)qemu_ld_helpers[size]);
     tcg_out_callr(s, TCG_REG_TMP);
-    if (lb->opc & 0x04) {
-        tcg_out_sxt(s, 1, lb->opc & 3, lb->datalo_reg, TCG_REG_X0);
+    if (opc & MO_SIGN) {
+        tcg_out_sxt(s, TCG_TYPE_I64, size, lb->datalo_reg, TCG_REG_X0);
     } else {
-        tcg_out_movr(s, 1, lb->datalo_reg, TCG_REG_X0);
+        tcg_out_movr(s, TCG_TYPE_I64, lb->datalo_reg, TCG_REG_X0);
     }
 
     tcg_out_goto(s, (intptr_t)lb->raddr);
@@ -969,15 +971,16 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 
 static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
+    TCGMemOp size = lb->opc;
+
     reloc_pc19(lb->label_ptr[0], (intptr_t)s->code_ptr);
 
-    tcg_out_movr(s, 1, TCG_REG_X0, TCG_AREG0);
-    tcg_out_movr(s, (TARGET_LONG_BITS == 64), TCG_REG_X1, lb->addrlo_reg);
-    tcg_out_movr(s, 1, TCG_REG_X2, lb->datalo_reg);
+    tcg_out_movr(s, TCG_TYPE_I64, TCG_REG_X0, TCG_AREG0);
+    tcg_out_movr(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
+    tcg_out_movr(s, size == MO_64, TCG_REG_X2, lb->datalo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, lb->mem_index);
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_X4, (intptr_t)lb->raddr);
-    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP,
-                 (intptr_t)qemu_st_helpers[lb->opc & 3]);
+    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, (intptr_t)qemu_st_helpers[size]);
     tcg_out_callr(s, TCG_REG_TMP);
     tcg_out_goto(s, (intptr_t)lb->raddr);
 }
@@ -1061,14 +1064,14 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data_r,
     case 1:
         tcg_out_ldst_r(s, LDST_16, LDST_LD, data_r, addr_r, off_r);
         if (TCG_LDST_BSWAP) {
-            tcg_out_rev16(s, 0, data_r, data_r);
+            tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
         }
         break;
     case 1 | 4:
         if (TCG_LDST_BSWAP) {
             tcg_out_ldst_r(s, LDST_16, LDST_LD, data_r, addr_r, off_r);
-            tcg_out_rev16(s, 0, data_r, data_r);
-            tcg_out_sxt(s, 1, 1, data_r, data_r);
+            tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
+            tcg_out_sxt(s, TCG_TYPE_I64, MO_16, data_r, data_r);
         } else {
             tcg_out_ldst_r(s, LDST_16, LDST_LD_S_X, data_r, addr_r, off_r);
         }
@@ -1076,14 +1079,14 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data_r,
     case 2:
         tcg_out_ldst_r(s, LDST_32, LDST_LD, data_r, addr_r, off_r);
         if (TCG_LDST_BSWAP) {
-            tcg_out_rev(s, 0, data_r, data_r);
+            tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
         }
         break;
     case 2 | 4:
         if (TCG_LDST_BSWAP) {
             tcg_out_ldst_r(s, LDST_32, LDST_LD, data_r, addr_r, off_r);
-            tcg_out_rev(s, 0, data_r, data_r);
-            tcg_out_sxt(s, 1, 2, data_r, data_r);
+            tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
+            tcg_out_sxt(s, TCG_TYPE_I64, MO_32, data_r, data_r);
         } else {
             tcg_out_ldst_r(s, LDST_32, LDST_LD_S_X, data_r, addr_r, off_r);
         }
@@ -1091,7 +1094,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data_r,
     case 3:
         tcg_out_ldst_r(s, LDST_64, LDST_LD, data_r, addr_r, off_r);
         if (TCG_LDST_BSWAP) {
-            tcg_out_rev(s, 1, data_r, data_r);
+            tcg_out_rev(s, TCG_TYPE_I64, data_r, data_r);
         }
         break;
     default:
@@ -1108,7 +1111,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int opc, TCGReg data_r,
         break;
     case 1:
         if (TCG_LDST_BSWAP) {
-            tcg_out_rev16(s, 0, TCG_REG_TMP, data_r);
+            tcg_out_rev16(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
             tcg_out_ldst_r(s, LDST_16, LDST_ST, TCG_REG_TMP, addr_r, off_r);
         } else {
             tcg_out_ldst_r(s, LDST_16, LDST_ST, data_r, addr_r, off_r);
@@ -1116,7 +1119,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int opc, TCGReg data_r,
         break;
     case 2:
         if (TCG_LDST_BSWAP) {
-            tcg_out_rev(s, 0, TCG_REG_TMP, data_r);
+            tcg_out_rev(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
             tcg_out_ldst_r(s, LDST_32, LDST_ST, TCG_REG_TMP, addr_r, off_r);
         } else {
             tcg_out_ldst_r(s, LDST_32, LDST_ST, data_r, addr_r, off_r);
@@ -1124,7 +1127,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int opc, TCGReg data_r,
         break;
     case 3:
         if (TCG_LDST_BSWAP) {
-            tcg_out_rev(s, 1, TCG_REG_TMP, data_r);
+            tcg_out_rev(s, TCG_TYPE_I64, TCG_REG_TMP, data_r);
             tcg_out_ldst_r(s, LDST_64, LDST_ST, TCG_REG_TMP, addr_r, off_r);
         } else {
             tcg_out_ldst_r(s, LDST_64, LDST_ST, data_r, addr_r, off_r);
@@ -1547,30 +1550,30 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_bswap16_i64:
     case INDEX_op_bswap16_i32:
-        tcg_out_rev16(s, 0, a0, a1);
+        tcg_out_rev16(s, TCG_TYPE_I32, a0, a1);
         break;
 
     case INDEX_op_ext8s_i64:
     case INDEX_op_ext8s_i32:
-        tcg_out_sxt(s, ext, 0, a0, a1);
+        tcg_out_sxt(s, ext, MO_8, a0, a1);
         break;
     case INDEX_op_ext16s_i64:
     case INDEX_op_ext16s_i32:
-        tcg_out_sxt(s, ext, 1, a0, a1);
+        tcg_out_sxt(s, ext, MO_16, a0, a1);
         break;
     case INDEX_op_ext32s_i64:
-        tcg_out_sxt(s, 1, 2, a0, a1);
+        tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
         break;
     case INDEX_op_ext8u_i64:
     case INDEX_op_ext8u_i32:
-        tcg_out_uxt(s, 0, a0, a1);
+        tcg_out_uxt(s, MO_8, a0, a1);
         break;
     case INDEX_op_ext16u_i64:
     case INDEX_op_ext16u_i32:
-        tcg_out_uxt(s, 1, a0, a1);
+        tcg_out_uxt(s, MO_16, a0, a1);
         break;
     case INDEX_op_ext32u_i64:
-        tcg_out_movr(s, 0, a0, a1);
+        tcg_out_movr(s, TCG_TYPE_I32, a0, a1);
         break;
 
     case INDEX_op_deposit_i64:
@@ -1794,7 +1797,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
                       TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
 
     /* FP -> callee_saved */
-    tcg_out_movr_sp(s, 1, TCG_REG_FP, TCG_REG_SP);
+    tcg_out_movr_sp(s, TCG_TYPE_I64, TCG_REG_FP, TCG_REG_SP);
 
     /* store callee-preserved regs x19..x28 using FP -> callee_saved */
     for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
-- 
1.9.0


* [Qemu-devel] [PATCH v3 04/26] tcg-aarch64: Use MOVN in tcg_out_movi
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (2 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 03/26] tcg-aarch64: Use TCGType and TCGMemOp constants Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 05/26] tcg-aarch64: Use ORRI " Richard Henderson
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

When profitable, initialize the register with MOVN instead of MOVZ,
before setting the remaining lanes with MOVK.
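
As a standalone sketch (not part of the patch), the heuristic can be
written in isolation as below; prefer_movn is a hypothetical name, and the
loop mirrors the wantinv computation added to tcg_out_movi.

#include <stdint.h>

/* Decide whether starting with MOVN needs fewer insns than MOVZ, by
   counting the 16-bit lanes that are all-zero in the value versus in its
   inverse.  E.g. for 0xffffffffffff1234 the inverse 0x000000000000edcb
   has three zero lanes, so a single MOVN materializes the value.  */
static int prefer_movn(uint64_t value)
{
    uint64_t ivalue = ~value;
    int wantinv = 0;
    int i;

    for (i = 0; i < 64; i += 16) {
        uint64_t mask = 0xffffull << i;
        if ((value & mask) == 0) {
            wantinv -= 1;       /* a lane MOVZ gets for free */
        }
        if ((ivalue & mask) == 0) {
            wantinv += 1;       /* a lane MOVN gets for free */
        }
    }
    return wantinv > 0;
}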

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 63 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 50 insertions(+), 13 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 5e6d10b..1d7612c 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -531,24 +531,61 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
                          tcg_target_long value)
 {
     AArch64Insn insn;
-
-    if (type == TCG_TYPE_I32) {
+    int i, wantinv, shift;
+    tcg_target_long svalue = value;
+    tcg_target_long ivalue = ~value;
+    tcg_target_long imask;
+
+    /* For 32-bit values, discard potential garbage in value.  For 64-bit
+       values within [2**31, 2**32-1], we can create smaller sequences by
+       interpreting this as a negative 32-bit number, while ensuring that
+       the high 32 bits are cleared by setting SF=0.  */
+    if (type == TCG_TYPE_I32 || (value & ~0xffffffffull) == 0) {
+        svalue = (int32_t)value;
         value = (uint32_t)value;
+        ivalue = (uint32_t)ivalue;
+        type = TCG_TYPE_I32;
+    }
+
+    /* Would it take fewer insns to begin with MOVN?  For the value and its
+       inverse, count the number of 16-bit lanes that are 0.  */
+    for (i = wantinv = imask = 0; i < 64; i += 16) {
+        tcg_target_long mask = 0xffffull << i;
+        if ((value & mask) == 0) {
+            wantinv -= 1;
+        }
+        if ((ivalue & mask) == 0) {
+            wantinv += 1;
+            imask |= mask;
+        }
     }
 
-    /* count trailing zeros in 16 bit steps, mapping 64 to 0. Emit the
-       first MOVZ with the half-word immediate skipping the zeros, with a shift
-       (LSL) equal to this number. Then all next instructions use MOVKs.
-       Zero the processed half-word in the value, continue until empty.
-       We build the final result 16bits at a time with up to 4 instructions,
-       but do not emit instructions for 16bit zero holes. */
+    /* If we had more 0xffff than 0x0000, invert VALUE and use MOVN.  */
     insn = I3405_MOVZ;
-    do {
-        unsigned shift = ctz64(value) & (63 & -16);
-        tcg_out_insn_3405(s, insn, shift >= 32, rd, value >> shift, shift);
+    if (wantinv > 0) {
+        value = ivalue;
+        insn = I3405_MOVN;
+    }
+
+    /* Find the lowest lane that is not 0x0000.  */
+    shift = ctz64(value) & (63 & -16);
+    tcg_out_insn_3405(s, insn, type, rd, value >> shift, shift);
+
+    if (wantinv > 0) {
+        /* Re-invert the value, so MOVK sees non-inverted bits.  */
+        value = ~value;
+        /* Clear out all the 0xffff lanes.  */
+        value ^= imask;
+    }
+    /* Clear out the lane that we just set.  */
+    value &= ~(0xffffUL << shift);
+
+    /* Iterate until all lanes have been set, and thus cleared from VALUE.  */
+    while (value) {
+        shift = ctz64(value) & (63 & -16);
+        tcg_out_insn(s, 3405, MOVK, type, rd, value >> shift, shift);
         value &= ~(0xffffUL << shift);
-        insn = I3405_MOVK;
-    } while (value);
+    }
 }
 
 static inline void tcg_out_ldst_r(TCGContext *s,
-- 
1.9.0


* [Qemu-devel] [PATCH v3 05/26] tcg-aarch64: Use ORRI in tcg_out_movi
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (3 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 04/26] tcg-aarch64: Use MOVN in tcg_out_movi Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 06/26] tcg-aarch64: Special case small constants " Richard Henderson
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

The subset of logical immediates that we support is quick to test,
and such constants are commonly loaded.
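
As a worked sketch (not from the patch itself), here is the decomposition
that tcg_out_logicali performs, checked on one value.  limm_decompose is a
hypothetical name, GCC builtins stand in for QEMU's clz64/ctz64, and C is
the run length minus one as passed to tcg_out_insn_3404.

#include <assert.h>
#include <stdint.h>

/* Decompose a simplified logical immediate (a single contiguous run of
   ones, possibly rotated) into the rotate R and ones-count C operands.  */
static void limm_decompose(uint64_t limm, unsigned *r, unsigned *c)
{
    unsigned h = __builtin_clzll(limm);
    unsigned l = __builtin_ctzll(limm);

    if (l == 0) {
        *r = 0;                            /* form 0....01....1 */
        *c = __builtin_ctzll(~limm) - 1;
        if (h == 0) {
            *r = __builtin_clzll(~limm);   /* form 1..10..01..1 */
            *c += *r;
        }
    } else {
        *r = 64 - l;                       /* form 1....10....0 or 0..01..10..0 */
        *c = *r - h - 1;
    }
}

int main(void)
{
    unsigned r, c;
    limm_decompose(0xff00, &r, &c);
    /* Eight ones (c + 1) rotated right by 56 reproduce 0xff00.  */
    assert(r == 56 && c == 7);
    return 0;
}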

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 70 +++++++++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 31 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 1d7612c..c1d9895 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -527,6 +527,37 @@ static void tcg_out_movr_sp(TCGContext *s, TCGType ext, TCGReg rd, TCGReg rn)
     tcg_out_insn(s, 3401, ADDI, ext, rd, rn, 0);
 }
 
+/* This function is used for the Logical (immediate) instruction group.
+   The value of LIMM must satisfy IS_LIMM.  See the comment above about
+   only supporting simplified logical immediates.  */
+static void tcg_out_logicali(TCGContext *s, AArch64Insn insn, TCGType ext,
+                             TCGReg rd, TCGReg rn, uint64_t limm)
+{
+    unsigned h, l, r, c;
+
+    assert(is_limm(limm));
+
+    h = clz64(limm);
+    l = ctz64(limm);
+    if (l == 0) {
+        r = 0;                  /* form 0....01....1 */
+        c = ctz64(~limm) - 1;
+        if (h == 0) {
+            r = clz64(~limm);   /* form 1..10..01..1 */
+            c += r;
+        }
+    } else {
+        r = 64 - l;             /* form 1....10....0 or 0..01..10..0 */
+        c = r - h - 1;
+    }
+    if (ext == TCG_TYPE_I32) {
+        r &= 31;
+        c &= 31;
+    }
+
+    tcg_out_insn_3404(s, insn, ext, rd, rn, ext, r, c);
+}
+
 static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
                          tcg_target_long value)
 {
@@ -547,6 +578,14 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
         type = TCG_TYPE_I32;
     }
 
+    /* Check for bitfield immediates.  For the benefit of 32-bit quantities,
+       use the sign-extended value.  That lets us match rotated values such
+       as 0xff0000ff with the same 64-bit logic matching 0xffffffffff0000ff. */
+    if (is_limm(svalue)) {
+        tcg_out_logicali(s, I3404_ORRI, type, rd, TCG_REG_XZR, svalue);
+        return;
+    }
+
     /* Would it take fewer insns to begin with MOVN?  For the value and its
        inverse, count the number of 16-bit lanes that are 0.  */
     for (i = wantinv = imask = 0; i < 64; i += 16) {
@@ -891,37 +930,6 @@ static void tcg_out_addsubi(TCGContext *s, int ext, TCGReg rd,
     }
 }
 
-/* This function is used for the Logical (immediate) instruction group.
-   The value of LIMM must satisfy IS_LIMM.  See the comment above about
-   only supporting simplified logical immediates.  */
-static void tcg_out_logicali(TCGContext *s, AArch64Insn insn, TCGType ext,
-                             TCGReg rd, TCGReg rn, uint64_t limm)
-{
-    unsigned h, l, r, c;
-
-    assert(is_limm(limm));
-
-    h = clz64(limm);
-    l = ctz64(limm);
-    if (l == 0) {
-        r = 0;                  /* form 0....01....1 */
-        c = ctz64(~limm) - 1;
-        if (h == 0) {
-            r = clz64(~limm);   /* form 1..10..01..1 */
-            c += r;
-        }
-    } else {
-        r = 64 - l;             /* form 1....10....0 or 0..01..10..0 */
-        c = r - h - 1;
-    }
-    if (ext == TCG_TYPE_I32) {
-        r &= 31;
-        c &= 31;
-    }
-
-    tcg_out_insn_3404(s, insn, ext, rd, rn, ext, r, c);
-}
-
 static inline void tcg_out_addsub2(TCGContext *s, int ext, TCGReg rl,
                                    TCGReg rh, TCGReg al, TCGReg ah,
                                    tcg_target_long bl, tcg_target_long bh,
-- 
1.9.0


* [Qemu-devel] [PATCH v3 06/26] tcg-aarch64: Special case small constants in tcg_out_movi
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (4 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 05/26] tcg-aarch64: Use ORRI " Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 07/26] tcg-aarch64: Use adrp " Richard Henderson
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index c1d9895..a08f6c7 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -578,6 +578,16 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
         type = TCG_TYPE_I32;
     }
 
+    /* Speed things up by handling the common case of small positive
+       and negative values specially.  */
+    if ((value & ~0xffffull) == 0) {
+        tcg_out_insn(s, 3405, MOVZ, type, rd, value, 0);
+        return;
+    } else if ((ivalue & ~0xffffull) == 0) {
+        tcg_out_insn(s, 3405, MOVN, type, rd, ivalue, 0);
+        return;
+    }
+
     /* Check for bitfield immediates.  For the benefit of 32-bit quantities,
        use the sign-extended value.  That lets us match rotated values such
        as 0xff0000ff with the same 64-bit logic matching 0xffffffffff0000ff. */
-- 
1.9.0


* [Qemu-devel] [PATCH v3 07/26] tcg-aarch64: Use adrp in tcg_out_movi
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (5 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 06/26] tcg-aarch64: Special case small constants " Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 08/26] tcg-aarch64: Use symbolic names for branches Richard Henderson
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Loading a QEMU pointer as an immediate happens often.  E.g.

- exit_tb $0x7fa8140013
+ exit_tb $0x7f81ee0013
...
- :  d2800260        mov     x0, #0x13
- :  f2b50280        movk    x0, #0xa814, lsl #16
- :  f2c00fe0        movk    x0, #0x7f, lsl #32
+ :  90ff1000        adrp    x0, 0x7f81ee0000
+ :  91004c00        add     x0, x0, #0x13
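
As a sketch (not part of the patch), the reachability test behind this is
roughly: the value's 4KiB page must lie within a signed 21-bit page
displacement of the current code pointer.  adrp_reachable is a hypothetical
helper name.

#include <stdbool.h>
#include <stdint.h>

/* ADRP materializes value & ~0xfff when its page is within +/-4GiB of
   the PC; the low 12 bits are then supplied by an ADD immediate.  This
   mirrors the test added to tcg_out_movi below.  */
static bool adrp_reachable(int64_t value, intptr_t code_ptr)
{
    int64_t disp = (value >> 12) - ((int64_t)code_ptr >> 12);
    /* The ADRP immediate is a signed 21-bit page count.  */
    return disp >= -(1 << 20) && disp < (1 << 20);
}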

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index a08f6c7..1337a13 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -294,6 +294,10 @@ typedef enum {
     I3405_MOVZ      = 0x52800000,
     I3405_MOVK      = 0x72800000,
 
+    /* PC relative addressing instructions.  */
+    I3406_ADR       = 0x10000000,
+    I3406_ADRP      = 0x90000000,
+
     /* Add/subtract shifted register instructions (without a shift).  */
     I3502_ADD       = 0x0b000000,
     I3502_ADDS      = 0x2b000000,
@@ -457,6 +461,12 @@ static void tcg_out_insn_3405(TCGContext *s, AArch64Insn insn, TCGType ext,
     tcg_out32(s, insn | ext << 31 | shift << (21 - 4) | half << 5 | rd);
 }
 
+static void tcg_out_insn_3406(TCGContext *s, AArch64Insn insn,
+                              TCGReg rd, int64_t disp)
+{
+    tcg_out32(s, insn | (disp & 3) << 29 | (disp & 0x1ffffc) << (5 - 2) | rd);
+}
+
 /* This function is for both 3.5.2 (Add/Subtract shifted register), for
    the rare occasion when we actually want to supply a shift amount.  */
 static inline void tcg_out_insn_3502S(TCGContext *s, AArch64Insn insn,
@@ -596,6 +606,19 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
         return;
     }
 
+    /* Look for host pointer values within 4G of the PC.  This happens
+       often when loading pointers to QEMU's own data structures.  */
+    if (type == TCG_TYPE_I64) {
+        tcg_target_long disp = (value >> 12) - ((intptr_t)s->code_ptr >> 12);
+        if (disp == sextract64(disp, 0, 21)) {
+            tcg_out_insn(s, 3406, ADRP, rd, disp);
+            if (value & 0xfff) {
+                tcg_out_insn(s, 3401, ADDI, type, rd, rd, value & 0xfff);
+            }
+            return;
+        }
+    }
+
     /* Would it take fewer insns to begin with MOVN?  For the value and its
        inverse, count the number of 16-bit lanes that are 0.  */
     for (i = wantinv = imask = 0; i < 64; i += 16) {
-- 
1.9.0


* [Qemu-devel] [PATCH v3 08/26] tcg-aarch64: Use symbolic names for branches
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (6 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 07/26] tcg-aarch64: Use adrp " Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 09/26] tcg-aarch64: Create tcg_out_brcond Richard Henderson
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 74 ++++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 31 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 1337a13..8b15d3b 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -270,6 +270,18 @@ enum aarch64_ldst_op_type { /* type of operation */
    use the section number of the architecture reference manual in which the
    instruction group is described.  */
 typedef enum {
+    /* Conditional branch (immediate).  */
+    I3202_B_C       = 0x54000000,
+
+    /* Unconditional branch (immediate).  */
+    I3206_B         = 0x14000000,
+    I3206_BL        = 0x94000000,
+
+    /* Unconditional branch (register).  */
+    I3207_BR        = 0xd61f0000,
+    I3207_BLR       = 0xd63f0000,
+    I3207_RET       = 0xd65f0000,
+
     /* Add/subtract immediate instructions.  */
     I3401_ADDI      = 0x11000000,
     I3401_ADDSI     = 0x31000000,
@@ -421,6 +433,22 @@ static inline uint32_t tcg_in32(TCGContext *s)
 #define tcg_out_insn(S, FMT, OP, ...) \
     glue(tcg_out_insn_,FMT)(S, glue(glue(glue(I,FMT),_),OP), ## __VA_ARGS__)
 
+static void tcg_out_insn_3202(TCGContext *s, AArch64Insn insn,
+                              TCGCond c, int imm19)
+{
+    tcg_out32(s, insn | tcg_cond_to_aarch64[c] | (imm19 & 0x7ffff) << 5);
+}
+
+static void tcg_out_insn_3206(TCGContext *s, AArch64Insn insn, int imm26)
+{
+    tcg_out32(s, insn | (imm26 & 0x03ffffff));
+}
+
+static void tcg_out_insn_3207(TCGContext *s, AArch64Insn insn, TCGReg rn)
+{
+    tcg_out32(s, insn | rn << 5);
+}
+
 static void tcg_out_insn_3401(TCGContext *s, AArch64Insn insn, TCGType ext,
                               TCGReg rd, TCGReg rn, uint64_t aimm)
 {
@@ -817,28 +845,24 @@ static inline void tcg_out_goto(TCGContext *s, intptr_t target)
         tcg_abort();
     }
 
-    tcg_out32(s, 0x14000000 | (offset & 0x03ffffff));
+    tcg_out_insn(s, 3206, B, offset);
 }
 
 static inline void tcg_out_goto_noaddr(TCGContext *s)
 {
-    /* We pay attention here to not modify the branch target by
-       reading from the buffer. This ensure that caches and memory are
-       kept coherent during retranslation.
-       Mask away possible garbage in the high bits for the first translation,
-       while keeping the offset bits for retranslation. */
-    uint32_t insn;
-    insn = (tcg_in32(s) & 0x03ffffff) | 0x14000000;
-    tcg_out32(s, insn);
+    /* We pay attention here to not modify the branch target by reading from
+       the buffer. This ensure that caches and memory are kept coherent during
+       retranslation.  Mask away possible garbage in the high bits for the
+       first translation, while keeping the offset bits for retranslation. */
+    uint32_t old = tcg_in32(s);
+    tcg_out_insn(s, 3206, B, old);
 }
 
 static inline void tcg_out_goto_cond_noaddr(TCGContext *s, TCGCond c)
 {
-    /* see comments in tcg_out_goto_noaddr */
-    uint32_t insn;
-    insn = tcg_in32(s) & (0x07ffff << 5);
-    insn |= 0x54000000 | tcg_cond_to_aarch64[c];
-    tcg_out32(s, insn);
+    /* See comments in tcg_out_goto_noaddr.  */
+    uint32_t old = tcg_in32(s) >> 5;
+    tcg_out_insn(s, 3202, B_C, c, old);
 }
 
 static inline void tcg_out_goto_cond(TCGContext *s, TCGCond c, intptr_t target)
@@ -850,18 +874,12 @@ static inline void tcg_out_goto_cond(TCGContext *s, TCGCond c, intptr_t target)
         tcg_abort();
     }
 
-    offset &= 0x7ffff;
-    tcg_out32(s, 0x54000000 | tcg_cond_to_aarch64[c] | offset << 5);
+    tcg_out_insn(s, 3202, B_C, c, offset);
 }
 
 static inline void tcg_out_callr(TCGContext *s, TCGReg reg)
 {
-    tcg_out32(s, 0xd63f0000 | reg << 5);
-}
-
-static inline void tcg_out_gotor(TCGContext *s, TCGReg reg)
-{
-    tcg_out32(s, 0xd61f0000 | reg << 5);
+    tcg_out_insn(s, 3207, BLR, reg);
 }
 
 static inline void tcg_out_call(TCGContext *s, intptr_t target)
@@ -872,16 +890,10 @@ static inline void tcg_out_call(TCGContext *s, intptr_t target)
         tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, target);
         tcg_out_callr(s, TCG_REG_TMP);
     } else {
-        tcg_out32(s, 0x94000000 | (offset & 0x03ffffff));
+        tcg_out_insn(s, 3206, BL, offset);
     }
 }
 
-static inline void tcg_out_ret(TCGContext *s)
-{
-    /* emit RET { LR } */
-    tcg_out32(s, 0xd65f03c0);
-}
-
 void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
 {
     intptr_t target = addr;
@@ -1899,7 +1911,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 #endif
 
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
-    tcg_out_gotor(s, tcg_target_call_iarg_regs[1]);
+    tcg_out_insn(s, 3207, BR, tcg_target_call_iarg_regs[1]);
 
     tb_ret_addr = s->code_ptr;
 
@@ -1917,5 +1929,5 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     /* pop (FP, LR), restore SP to previous frame, return */
     tcg_out_pop_pair(s, TCG_REG_SP,
                      TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
-    tcg_out_ret(s);
+    tcg_out_insn(s, 3207, RET, TCG_REG_LR);
 }
-- 
1.9.0


* [Qemu-devel] [PATCH v3 09/26] tcg-aarch64: Create tcg_out_brcond
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (7 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 08/26] tcg-aarch64: Use symbolic names for branches Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 10/26] tcg-aarch64: Use CBZ and CBNZ Richard Henderson
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Rearrange code to put the compare and branch in the same place.

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 8b15d3b..5889a98 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -865,18 +865,6 @@ static inline void tcg_out_goto_cond_noaddr(TCGContext *s, TCGCond c)
     tcg_out_insn(s, 3202, B_C, c, old);
 }
 
-static inline void tcg_out_goto_cond(TCGContext *s, TCGCond c, intptr_t target)
-{
-    intptr_t offset = (target - (intptr_t)s->code_ptr) / 4;
-
-    if (offset < -0x40000 || offset >= 0x40000) {
-        /* out of 19bit range */
-        tcg_abort();
-    }
-
-    tcg_out_insn(s, 3202, B_C, c, offset);
-}
-
 static inline void tcg_out_callr(TCGContext *s, TCGReg reg)
 {
     tcg_out_insn(s, 3207, BLR, reg);
@@ -920,17 +908,24 @@ static inline void tcg_out_goto_label(TCGContext *s, int label_index)
     }
 }
 
-static inline void tcg_out_goto_label_cond(TCGContext *s,
-                                           TCGCond c, int label_index)
+static void tcg_out_brcond(TCGContext *s, TCGMemOp ext, TCGCond c, TCGArg a,
+                           TCGArg b, bool b_const, int label)
 {
-    TCGLabel *l = &s->labels[label_index];
+    TCGLabel *l = &s->labels[label];
+    intptr_t offset;
+
+    tcg_out_cmp(s, ext, a, b, b_const);
 
     if (!l->has_value) {
-        tcg_out_reloc(s, s->code_ptr, R_AARCH64_CONDBR19, label_index, 0);
-        tcg_out_goto_cond_noaddr(s, c);
+        tcg_out_reloc(s, s->code_ptr, R_AARCH64_CONDBR19, label, 0);
+        offset = tcg_in32(s) >> 5;
     } else {
-        tcg_out_goto_cond(s, c, l->u.value);
+        offset = l->u.value - (uintptr_t)s->code_ptr;
+        offset >>= 2;
+        assert(offset >= -0x40000 && offset < 0x40000);
     }
+
+    tcg_out_insn(s, 3202, B_C, c, offset);
 }
 
 static inline void tcg_out_rev(TCGContext *s, TCGType ext,
@@ -1571,8 +1566,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         a1 = (int32_t)a1;
         /* FALLTHRU */
     case INDEX_op_brcond_i64:
-        tcg_out_cmp(s, ext, a0, a1, const_args[1]);
-        tcg_out_goto_label_cond(s, a2, args[3]);
+        tcg_out_brcond(s, ext, a2, a0, a1, const_args[1], args[3]);
         break;
 
     case INDEX_op_setcond_i32:
-- 
1.9.0


* [Qemu-devel] [PATCH v3 10/26] tcg-aarch64: Use CBZ and CBNZ
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (8 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 09/26] tcg-aarch64: Create tcg_out_brcond Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 11/26] tcg-aarch64: Reuse LR in translated code Richard Henderson
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

A compare and branch against zero happens at the start of
every single TB.

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 5889a98..48a246d 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -270,6 +270,10 @@ enum aarch64_ldst_op_type { /* type of operation */
    use the section number of the architecture reference manual in which the
    instruction group is described.  */
 typedef enum {
+    /* Compare and branch (immediate).  */
+    I3201_CBZ       = 0x34000000,
+    I3201_CBNZ      = 0x35000000,
+
     /* Conditional branch (immediate).  */
     I3202_B_C       = 0x54000000,
 
@@ -433,6 +437,12 @@ static inline uint32_t tcg_in32(TCGContext *s)
 #define tcg_out_insn(S, FMT, OP, ...) \
     glue(tcg_out_insn_,FMT)(S, glue(glue(glue(I,FMT),_),OP), ## __VA_ARGS__)
 
+static void tcg_out_insn_3201(TCGContext *s, AArch64Insn insn, TCGType ext,
+                              TCGReg rt, int imm19)
+{
+    tcg_out32(s, insn | ext << 31 | (imm19 & 0x7ffff) << 5 | rt);
+}
+
 static void tcg_out_insn_3202(TCGContext *s, AArch64Insn insn,
                               TCGCond c, int imm19)
 {
@@ -913,8 +923,14 @@ static void tcg_out_brcond(TCGContext *s, TCGMemOp ext, TCGCond c, TCGArg a,
 {
     TCGLabel *l = &s->labels[label];
     intptr_t offset;
+    bool need_cmp;
 
-    tcg_out_cmp(s, ext, a, b, b_const);
+    if (b_const && b == 0 && (c == TCG_COND_EQ || c == TCG_COND_NE)) {
+        need_cmp = false;
+    } else {
+        need_cmp = true;
+        tcg_out_cmp(s, ext, a, b, b_const);
+    }
 
     if (!l->has_value) {
         tcg_out_reloc(s, s->code_ptr, R_AARCH64_CONDBR19, label, 0);
@@ -925,7 +941,13 @@ static void tcg_out_brcond(TCGContext *s, TCGMemOp ext, TCGCond c, TCGArg a,
         assert(offset >= -0x40000 && offset < 0x40000);
     }
 
-    tcg_out_insn(s, 3202, B_C, c, offset);
+    if (need_cmp) {
+        tcg_out_insn(s, 3202, B_C, c, offset);
+    } else if (c == TCG_COND_EQ) {
+        tcg_out_insn(s, 3201, CBZ, ext, a, offset);
+    } else {
+        tcg_out_insn(s, 3201, CBNZ, ext, a, offset);
+    }
 }
 
 static inline void tcg_out_rev(TCGContext *s, TCGType ext,
-- 
1.9.0


* [Qemu-devel] [PATCH v3 11/26] tcg-aarch64: Reuse LR in translated code
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (9 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 10/26] tcg-aarch64: Use CBZ and CBNZ Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-07  8:03   ` Claudio Fontana
  2014-04-11 12:33   ` Claudio Fontana
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 12/26] tcg-aarch64: Introduce tcg_out_insn_3314 Richard Henderson
                   ` (14 subsequent siblings)
  25 siblings, 2 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

It's obviously call-clobbered, but is otherwise unused.
Repurpose it as the TCG temporary.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 34 ++++++++++++++++------------------
 tcg/aarch64/tcg-target.h | 32 +++++++++++++++++---------------
 2 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 48a246d..e36909e 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -23,10 +23,7 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
     "%x0", "%x1", "%x2", "%x3", "%x4", "%x5", "%x6", "%x7",
     "%x8", "%x9", "%x10", "%x11", "%x12", "%x13", "%x14", "%x15",
     "%x16", "%x17", "%x18", "%x19", "%x20", "%x21", "%x22", "%x23",
-    "%x24", "%x25", "%x26", "%x27", "%x28",
-    "%fp", /* frame pointer */
-    "%lr", /* link register */
-    "%sp",  /* stack pointer */
+    "%x24", "%x25", "%x26", "%x27", "%x28", "%fp", "%x30", "%sp",
 };
 #endif /* NDEBUG */
 
@@ -41,16 +38,17 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_X24, TCG_REG_X25, TCG_REG_X26, TCG_REG_X27,
     TCG_REG_X28, /* we will reserve this for GUEST_BASE if configured */
 
-    TCG_REG_X9, TCG_REG_X10, TCG_REG_X11, TCG_REG_X12,
-    TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
+    TCG_REG_X8, TCG_REG_X9, TCG_REG_X10, TCG_REG_X11,
+    TCG_REG_X12, TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
     TCG_REG_X16, TCG_REG_X17,
 
-    TCG_REG_X18, TCG_REG_X19, /* will not use these, see tcg_target_init */
-
     TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3,
     TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7,
 
-    TCG_REG_X8, /* will not use, see tcg_target_init */
+    /* X18 reserved by system */
+    /* X19 reserved for AREG0 */
+    /* X29 reserved as fp */
+    /* X30 reserved as temporary */
 };
 
 static const int tcg_target_call_iarg_regs[8] = {
@@ -61,13 +59,13 @@ static const int tcg_target_call_oarg_regs[1] = {
     TCG_REG_X0
 };
 
-#define TCG_REG_TMP TCG_REG_X8
+#define TCG_REG_TMP TCG_REG_X30
 
 #ifndef CONFIG_SOFTMMU
-# if defined(CONFIG_USE_GUEST_BASE)
-# define TCG_REG_GUEST_BASE TCG_REG_X28
+# ifdef CONFIG_USE_GUEST_BASE
+#  define TCG_REG_GUEST_BASE TCG_REG_X28
 # else
-# define TCG_REG_GUEST_BASE TCG_REG_XZR
+#  define TCG_REG_GUEST_BASE TCG_REG_XZR
 # endif
 #endif
 
@@ -1871,7 +1869,7 @@ static void tcg_target_init(TCGContext *s)
                      (1 << TCG_REG_X12) | (1 << TCG_REG_X13) |
                      (1 << TCG_REG_X14) | (1 << TCG_REG_X15) |
                      (1 << TCG_REG_X16) | (1 << TCG_REG_X17) |
-                     (1 << TCG_REG_X18));
+                     (1 << TCG_REG_X18) | (1 << TCG_REG_X30));
 
     tcg_regset_clear(s->reserved_regs);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
@@ -1902,13 +1900,13 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out_push_pair(s, TCG_REG_SP,
                       TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
 
-    /* FP -> callee_saved */
+    /* Set up frame pointer for canonical unwinding.  */
     tcg_out_movr_sp(s, TCG_TYPE_I64, TCG_REG_FP, TCG_REG_SP);
 
-    /* store callee-preserved regs x19..x28 using FP -> callee_saved */
+    /* Store callee-preserved regs x19..x28.  */
     for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
         int idx = (r - TCG_REG_X19) / 2 + 1;
-        tcg_out_store_pair(s, TCG_REG_FP, r, r + 1, idx);
+        tcg_out_store_pair(s, TCG_REG_SP, r, r + 1, idx);
     }
 
     /* Make stack space for TCG locals.  */
@@ -1939,7 +1937,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
        FP must be preserved, so it still points to callee_saved area */
     for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
         int idx = (r - TCG_REG_X19) / 2 + 1;
-        tcg_out_load_pair(s, TCG_REG_FP, r, r + 1, idx);
+        tcg_out_load_pair(s, TCG_REG_SP, r, r + 1, idx);
     }
 
     /* pop (FP, LR), restore SP to previous frame, return */
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 988983e..faccc36 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -17,17 +17,23 @@
 #undef TCG_TARGET_STACK_GROWSUP
 
 typedef enum {
-    TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3, TCG_REG_X4,
-    TCG_REG_X5, TCG_REG_X6, TCG_REG_X7, TCG_REG_X8, TCG_REG_X9,
-    TCG_REG_X10, TCG_REG_X11, TCG_REG_X12, TCG_REG_X13, TCG_REG_X14,
-    TCG_REG_X15, TCG_REG_X16, TCG_REG_X17, TCG_REG_X18, TCG_REG_X19,
-    TCG_REG_X20, TCG_REG_X21, TCG_REG_X22, TCG_REG_X23, TCG_REG_X24,
-    TCG_REG_X25, TCG_REG_X26, TCG_REG_X27, TCG_REG_X28,
-    TCG_REG_FP,  /* frame pointer */
-    TCG_REG_LR, /* link register */
-    TCG_REG_SP,  /* stack pointer or zero register */
-    TCG_REG_XZR = TCG_REG_SP /* same register number */
-    /* program counter is not directly accessible! */
+    TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3,
+    TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7,
+    TCG_REG_X8, TCG_REG_X9, TCG_REG_X10, TCG_REG_X11,
+    TCG_REG_X12, TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
+    TCG_REG_X16, TCG_REG_X17, TCG_REG_X18, TCG_REG_X19,
+    TCG_REG_X20, TCG_REG_X21, TCG_REG_X22, TCG_REG_X23,
+    TCG_REG_X24, TCG_REG_X25, TCG_REG_X26, TCG_REG_X27,
+    TCG_REG_X28, TCG_REG_X29, TCG_REG_X30,
+
+    /* X31 is either the stack pointer or zero, depending on context.  */
+    TCG_REG_SP = 31,
+    TCG_REG_XZR = 31,
+
+    /* Aliases.  */
+    TCG_REG_FP = TCG_REG_X29,
+    TCG_REG_LR = TCG_REG_X30,
+    TCG_AREG0  = TCG_REG_X19,
 } TCGReg;
 
 #define TCG_TARGET_NB_REGS 32
@@ -92,10 +98,6 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
 
-enum {
-    TCG_AREG0 = TCG_REG_X19,
-};
-
 #define TCG_TARGET_HAS_new_ldst         0
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
-- 
1.9.0


* [Qemu-devel] [PATCH v3 12/26] tcg-aarch64: Introduce tcg_out_insn_3314
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (10 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 11/26] tcg-aarch64: Reuse LR in translated code Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-11 12:34   ` Claudio Fontana
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 13/26] tcg-aarch64: Implement tcg_register_jit Richard Henderson
                   ` (13 subsequent siblings)
  25 siblings, 1 reply; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Combines 4 other inline functions and tidies the prologue.
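
For reference, a standalone sketch (not from the patch) of the bit packing
the new helper performs, checked against the pair-store the reworked
prologue emits.  encode_3314 is a hypothetical name and the expected
encoding is worth cross-checking with a disassembler.

#include <assert.h>
#include <stdint.h>

/* Reproduce tcg_out_insn_3314's packing of a 64-bit load/store pair:
   base opcode, pre-index and writeback flags, a scaled signed 7-bit
   offset, and the three register numbers.  */
static uint32_t encode_3314(uint32_t base, unsigned r1, unsigned r2,
                            unsigned rn, long ofs, int pre, int w)
{
    uint32_t insn = base;

    insn |= 1u << 31;                  /* 64-bit registers */
    insn |= (uint32_t)pre << 24;
    insn |= (uint32_t)w << 23;
    assert(ofs >= -0x200 && ofs < 0x200 && (ofs & 7) == 0);
    insn |= (uint32_t)(ofs & (0x7f << 3)) << (15 - 3);
    return insn | r2 << 10 | rn << 5 | r1;
}

int main(void)
{
    /* The prologue's "stp x29, x30, [sp, #-96]!": frame_size_callee_saved
       is 16 + 10 * 8 = 96 in this series, and sp is register number 31.  */
    uint32_t insn = encode_3314(0x28000000, 29, 30, 31, -96, 1, 1);
    assert(insn == 0xa9ba7bfd);
    return 0;
}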

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 100 ++++++++++++++++-------------------------------
 1 file changed, 33 insertions(+), 67 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index e36909e..5cffe50 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -284,6 +284,10 @@ typedef enum {
     I3207_BLR       = 0xd63f0000,
     I3207_RET       = 0xd65f0000,
 
+    /* Load/store register pair instructions.  */
+    I3314_LDP       = 0x28400000,
+    I3314_STP       = 0x28000000,
+
     /* Add/subtract immediate instructions.  */
     I3401_ADDI      = 0x11000000,
     I3401_ADDSI     = 0x31000000,
@@ -457,6 +461,20 @@ static void tcg_out_insn_3207(TCGContext *s, AArch64Insn insn, TCGReg rn)
     tcg_out32(s, insn | rn << 5);
 }
 
+static void tcg_out_insn_3314(TCGContext *s, AArch64Insn insn,
+                              TCGReg r1, TCGReg r2, TCGReg rn,
+                              tcg_target_long ofs, bool pre, bool w)
+{
+    insn |= 1u << 31; /* ext */
+    insn |= pre << 24;
+    insn |= w << 23;
+
+    assert(ofs >= -0x200 && ofs < 0x200 && (ofs & 7) == 0);
+    insn |= (ofs & (0x7f << 3)) << (15 - 3);
+
+    tcg_out32(s, insn | r2 << 10 | rn << 5 | r1);
+}
+
 static void tcg_out_insn_3401(TCGContext *s, AArch64Insn insn, TCGType ext,
                               TCGReg rd, TCGReg rn, uint64_t aimm)
 {
@@ -1292,56 +1310,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 
 static uint8_t *tb_ret_addr;
 
-/* callee stack use example:
-   stp     x29, x30, [sp,#-32]!
-   mov     x29, sp
-   stp     x1, x2, [sp,#16]
-   ...
-   ldp     x1, x2, [sp,#16]
-   ldp     x29, x30, [sp],#32
-   ret
-*/
-
-/* push r1 and r2, and alloc stack space for a total of
-   alloc_n elements (1 element=16 bytes, must be between 1 and 31. */
-static inline void tcg_out_push_pair(TCGContext *s, TCGReg addr,
-                                     TCGReg r1, TCGReg r2, int alloc_n)
-{
-    /* using indexed scaled simm7 STP 0x28800000 | (ext) | 0x01000000 (pre-idx)
-       | alloc_n * (-1) << 16 | r2 << 10 | addr << 5 | r1 */
-    assert(alloc_n > 0 && alloc_n < 0x20);
-    alloc_n = (-alloc_n) & 0x3f;
-    tcg_out32(s, 0xa9800000 | alloc_n << 16 | r2 << 10 | addr << 5 | r1);
-}
-
-/* dealloc stack space for a total of alloc_n elements and pop r1, r2.  */
-static inline void tcg_out_pop_pair(TCGContext *s, TCGReg addr,
-                                    TCGReg r1, TCGReg r2, int alloc_n)
-{
-    /* using indexed scaled simm7 LDP 0x28c00000 | (ext) | nothing (post-idx)
-       | alloc_n << 16 | r2 << 10 | addr << 5 | r1 */
-    assert(alloc_n > 0 && alloc_n < 0x20);
-    tcg_out32(s, 0xa8c00000 | alloc_n << 16 | r2 << 10 | addr << 5 | r1);
-}
-
-static inline void tcg_out_store_pair(TCGContext *s, TCGReg addr,
-                                      TCGReg r1, TCGReg r2, int idx)
-{
-    /* using register pair offset simm7 STP 0x29000000 | (ext)
-       | idx << 16 | r2 << 10 | addr << 5 | r1 */
-    assert(idx > 0 && idx < 0x20);
-    tcg_out32(s, 0xa9000000 | idx << 16 | r2 << 10 | addr << 5 | r1);
-}
-
-static inline void tcg_out_load_pair(TCGContext *s, TCGReg addr,
-                                     TCGReg r1, TCGReg r2, int idx)
-{
-    /* using register pair offset simm7 LDP 0x29400000 | (ext)
-       | idx << 16 | r2 << 10 | addr << 5 | r1 */
-    assert(idx > 0 && idx < 0x20);
-    tcg_out32(s, 0xa9400000 | idx << 16 | r2 << 10 | addr << 5 | r1);
-}
-
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
                        const TCGArg args[TCG_MAX_OP_ARGS],
                        const int const_args[TCG_MAX_OP_ARGS])
@@ -1887,33 +1855,32 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     TCGReg r;
 
     /* save pairs             (FP, LR) and (X19, X20) .. (X27, X28) */
-    frame_size_callee_saved = (1) + (TCG_REG_X28 - TCG_REG_X19) / 2 + 1;
+    frame_size_callee_saved = 16 + (TCG_REG_X28 - TCG_REG_X19 + 1) * 8;
 
     /* frame size requirement for TCG local variables */
     frame_size_tcg_locals = TCG_STATIC_CALL_ARGS_SIZE
         + CPU_TEMP_BUF_NLONGS * sizeof(long)
         + (TCG_TARGET_STACK_ALIGN - 1);
     frame_size_tcg_locals &= ~(TCG_TARGET_STACK_ALIGN - 1);
-    frame_size_tcg_locals /= TCG_TARGET_STACK_ALIGN;
 
-    /* push (FP, LR) and update sp */
-    tcg_out_push_pair(s, TCG_REG_SP,
-                      TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
+    /* Push (FP, LR) and allocate space for all saved registers.  */
+    tcg_out_insn(s, 3314, STP, TCG_REG_FP, TCG_REG_LR,
+                 TCG_REG_SP, -frame_size_callee_saved, 1, 1);
 
     /* Set up frame pointer for canonical unwinding.  */
     tcg_out_movr_sp(s, TCG_TYPE_I64, TCG_REG_FP, TCG_REG_SP);
 
     /* Store callee-preserved regs x19..x28.  */
     for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
-        int idx = (r - TCG_REG_X19) / 2 + 1;
-        tcg_out_store_pair(s, TCG_REG_SP, r, r + 1, idx);
+        int ofs = (r - TCG_REG_X19 + 2) * 8;
+        tcg_out_insn(s, 3314, STP, r, r + 1, TCG_REG_SP, ofs, 1, 0);
     }
 
     /* Make stack space for TCG locals.  */
     tcg_out_insn(s, 3401, SUBI, TCG_TYPE_I64, TCG_REG_SP, TCG_REG_SP,
-                 frame_size_tcg_locals * TCG_TARGET_STACK_ALIGN);
+                 frame_size_tcg_locals);
 
-    /* inform TCG about how to find TCG locals with register, offset, size */
+    /* Inform TCG about how to find TCG locals with register, offset, size.  */
     tcg_set_frame(s, TCG_REG_SP, TCG_STATIC_CALL_ARGS_SIZE,
                   CPU_TEMP_BUF_NLONGS * sizeof(long));
 
@@ -1931,17 +1898,16 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
     /* Remove TCG locals stack space.  */
     tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_SP, TCG_REG_SP,
-                 frame_size_tcg_locals * TCG_TARGET_STACK_ALIGN);
+                 frame_size_tcg_locals);
 
-    /* restore registers x19..x28.
-       FP must be preserved, so it still points to callee_saved area */
+    /* Restore registers x19..x28.  */
     for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
-        int idx = (r - TCG_REG_X19) / 2 + 1;
-        tcg_out_load_pair(s, TCG_REG_SP, r, r + 1, idx);
+        int ofs = (r - TCG_REG_X19 + 2) * 8;
+        tcg_out_insn(s, 3314, LDP, r, r + 1, TCG_REG_SP, ofs, 1, 0);
     }
 
-    /* pop (FP, LR), restore SP to previous frame, return */
-    tcg_out_pop_pair(s, TCG_REG_SP,
-                     TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
+    /* Pop (FP, LR), restore SP to previous frame.  */
+    tcg_out_insn(s, 3314, LDP, TCG_REG_FP, TCG_REG_LR,
+                 TCG_REG_SP, frame_size_callee_saved, 0, 1);
     tcg_out_insn(s, 3207, RET, TCG_REG_LR);
 }
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 13/26] tcg-aarch64: Implement tcg_register_jit
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (11 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 12/26] tcg-aarch64: Introduce tcg_out_insn_3314 Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-11 12:34   ` Claudio Fontana
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 14/26] tcg-aarch64: Avoid add with zero in tlb load Richard Henderson
                   ` (12 subsequent siblings)
  25 siblings, 1 reply; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 84 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 69 insertions(+), 15 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 5cffe50..4414bd1 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1848,24 +1848,29 @@ static void tcg_target_init(TCGContext *s)
     tcg_add_target_add_op_defs(aarch64_op_defs);
 }
 
+/* Saving pairs: (X19, X20) .. (X27, X28), (X29(fp), X30(lr)).  */
+#define PUSH_SIZE  ((30 - 19 + 1) * 8)
+
+#define FRAME_SIZE \
+    ((PUSH_SIZE \
+      + TCG_STATIC_CALL_ARGS_SIZE \
+      + CPU_TEMP_BUF_NLONGS * sizeof(long) \
+      + TCG_TARGET_STACK_ALIGN - 1) \
+     & ~(TCG_TARGET_STACK_ALIGN - 1))
+
+/* We're expecting a 2 byte uleb128 encoded value.  */
+QEMU_BUILD_BUG_ON(FRAME_SIZE >= (1 << 14));
+
+/* We're expecting to use a single ADDI insn.  */
+QEMU_BUILD_BUG_ON(FRAME_SIZE - PUSH_SIZE > 0xfff);
+
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
-    /* NB: frame sizes are in 16 byte stack units! */
-    int frame_size_callee_saved, frame_size_tcg_locals;
     TCGReg r;
 
-    /* save pairs             (FP, LR) and (X19, X20) .. (X27, X28) */
-    frame_size_callee_saved = 16 + (TCG_REG_X28 - TCG_REG_X19 + 1) * 8;
-
-    /* frame size requirement for TCG local variables */
-    frame_size_tcg_locals = TCG_STATIC_CALL_ARGS_SIZE
-        + CPU_TEMP_BUF_NLONGS * sizeof(long)
-        + (TCG_TARGET_STACK_ALIGN - 1);
-    frame_size_tcg_locals &= ~(TCG_TARGET_STACK_ALIGN - 1);
-
     /* Push (FP, LR) and allocate space for all saved registers.  */
     tcg_out_insn(s, 3314, STP, TCG_REG_FP, TCG_REG_LR,
-                 TCG_REG_SP, -frame_size_callee_saved, 1, 1);
+                 TCG_REG_SP, -PUSH_SIZE, 1, 1);
 
     /* Set up frame pointer for canonical unwinding.  */
     tcg_out_movr_sp(s, TCG_TYPE_I64, TCG_REG_FP, TCG_REG_SP);
@@ -1878,7 +1883,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
     /* Make stack space for TCG locals.  */
     tcg_out_insn(s, 3401, SUBI, TCG_TYPE_I64, TCG_REG_SP, TCG_REG_SP,
-                 frame_size_tcg_locals);
+                 FRAME_SIZE - PUSH_SIZE);
 
     /* Inform TCG about how to find TCG locals with register, offset, size.  */
     tcg_set_frame(s, TCG_REG_SP, TCG_STATIC_CALL_ARGS_SIZE,
@@ -1898,7 +1903,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
     /* Remove TCG locals stack space.  */
     tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_SP, TCG_REG_SP,
-                 frame_size_tcg_locals);
+                 FRAME_SIZE - PUSH_SIZE);
 
     /* Restore registers x19..x28.  */
     for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
@@ -1908,6 +1913,55 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
     /* Pop (FP, LR), restore SP to previous frame.  */
     tcg_out_insn(s, 3314, LDP, TCG_REG_FP, TCG_REG_LR,
-                 TCG_REG_SP, frame_size_callee_saved, 0, 1);
+                 TCG_REG_SP, PUSH_SIZE, 0, 1);
     tcg_out_insn(s, 3207, RET, TCG_REG_LR);
 }
+
+typedef struct {
+    DebugFrameCIE cie;
+    DebugFrameFDEHeader fde;
+    uint8_t fde_def_cfa[4];
+    uint8_t fde_reg_ofs[24];
+} DebugFrame;
+
+#define ELF_HOST_MACHINE EM_AARCH64
+
+static DebugFrame debug_frame = {
+    .cie.len = sizeof(DebugFrameCIE)-4, /* length after .len member */
+    .cie.id = -1,
+    .cie.version = 1,
+    .cie.code_align = 1,
+    .cie.data_align = 0x78,             /* sleb128 -8 */
+    .cie.return_column = TCG_REG_LR,
+
+    /* Total FDE size does not include the "len" member.  */
+    .fde.len = sizeof(DebugFrame) - offsetof(DebugFrame, fde.cie_offset),
+
+    .fde_def_cfa = {
+        12, TCG_REG_SP,                 /* DW_CFA_def_cfa sp, ... */
+        (FRAME_SIZE & 0x7f) | 0x80,     /* ... uleb128 FRAME_SIZE */
+        (FRAME_SIZE >> 7)
+    },
+    .fde_reg_ofs = {
+        0x80 + 28, 1,                   /* DW_CFA_offset, x28,  -8 */
+        0x80 + 27, 2,                   /* DW_CFA_offset, x27, -16 */
+        0x80 + 26, 3,                   /* DW_CFA_offset, x26, -24 */
+        0x80 + 25, 4,                   /* DW_CFA_offset, x25, -32 */
+        0x80 + 24, 5,                   /* DW_CFA_offset, x24, -40 */
+        0x80 + 23, 6,                   /* DW_CFA_offset, x23, -48 */
+        0x80 + 22, 7,                   /* DW_CFA_offset, x22, -56 */
+        0x80 + 21, 8,                   /* DW_CFA_offset, x21, -64 */
+        0x80 + 20, 9,                   /* DW_CFA_offset, x20, -72 */
+        0x80 + 19, 10,                  /* DW_CFA_offset, x19, -80 */
+        0x80 + 30, 11,                  /* DW_CFA_offset,  lr, -88 */
+        0x80 + 29, 12,                  /* DW_CFA_offset,  fp, -96 */
+    }
+};
+
+void tcg_register_jit(void *buf, size_t buf_size)
+{
+    debug_frame.fde.func_start = (intptr_t)buf;
+    debug_frame.fde.func_len = buf_size;
+
+    tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
+}
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 14/26] tcg-aarch64: Avoid add with zero in tlb load
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (12 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 13/26] tcg-aarch64: Implement tcg_register_jit Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 15/26] tcg-aarch64: Use tcg_out_call for qemu_ld/st Richard Henderson
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Some guest envs are small enough to reach the tlb with only a 12-bit addition.
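
As a minimal illustration (not part of the patch; the 0x440 value is a made-up
offset standing in for offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)),
the split below shows when the extra ADDI can be skipped:

    #include <stdio.h>

    int main(void)
    {
        unsigned tlb_offset = 0x440;          /* hypothetical, fits in 12 bits */
        unsigned high = tlb_offset & 0xfff000; /* would need ADDI ..., LSL #12 */
        unsigned low  = tlb_offset & 0xfff;    /* folded into the LDR offset   */

        if (high == 0) {
            printf("ADDI elided; env is the base, LDR offset %#x\n", low);
        } else {
            printf("ADDI #%#x first, then LDR offset %#x\n", high, low);
        }
        return 0;
    }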

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 4414bd1..5186311 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1128,47 +1128,57 @@ static void add_qemu_ldst_label(TCGContext *s, int is_ld, int opc,
    slow path for the failure case, which will be patched later when finalizing
    the slow path. Generated code returns the host addend in X1,
    clobbers X0,X2,X3,TMP. */
-static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg,
-            int s_bits, uint8_t **label_ptr, int mem_index, int is_read)
+static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, int s_bits,
+                             uint8_t **label_ptr, int mem_index, bool is_read)
 {
     TCGReg base = TCG_AREG0;
     int tlb_offset = is_read ?
         offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
         : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write);
+
     /* Extract the TLB index from the address into X0.
        X0<CPU_TLB_BITS:0> =
        addr_reg<TARGET_PAGE_BITS+CPU_TLB_BITS:TARGET_PAGE_BITS> */
-    tcg_out_ubfm(s, (TARGET_LONG_BITS == 64), TCG_REG_X0, addr_reg,
+    tcg_out_ubfm(s, TARGET_LONG_BITS == 64, TCG_REG_X0, addr_reg,
                  TARGET_PAGE_BITS, TARGET_PAGE_BITS + CPU_TLB_BITS);
+
     /* Store the page mask part of the address and the low s_bits into X3.
        Later this allows checking for equality and alignment at the same time.
        X3 = addr_reg & (PAGE_MASK | ((1 << s_bits) - 1)) */
     tcg_out_logicali(s, I3404_ANDI, TARGET_LONG_BITS == 64, TCG_REG_X3,
                      addr_reg, TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+
     /* Add any "high bits" from the tlb offset to the env address into X2,
        to take advantage of the LSL12 form of the ADDI instruction.
        X2 = env + (tlb_offset & 0xfff000) */
-    tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_X2, base,
-                 tlb_offset & 0xfff000);
+    if (tlb_offset & 0xfff000) {
+        tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_X2, base,
+                     tlb_offset & 0xfff000);
+        base = TCG_REG_X2;
+    }
+
     /* Merge the tlb index contribution into X2.
        X2 = X2 + (X0 << CPU_TLB_ENTRY_BITS) */
-    tcg_out_insn(s, 3502S, ADD_LSL, 1, TCG_REG_X2, TCG_REG_X2,
+    tcg_out_insn(s, 3502S, ADD_LSL, TCG_TYPE_I64, TCG_REG_X2, base,
                  TCG_REG_X0, CPU_TLB_ENTRY_BITS);
+
     /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
        X0 = load [X2 + (tlb_offset & 0x000fff)] */
     tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? LDST_64 : LDST_32,
-                 LDST_LD, TCG_REG_X0, TCG_REG_X2,
-                 (tlb_offset & 0xfff));
+                 LDST_LD, TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff);
+
     /* Load the tlb addend. Do that early to avoid stalling.
        X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */
     tcg_out_ldst(s, LDST_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
                  (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) -
                  (is_read ? offsetof(CPUTLBEntry, addr_read)
                   : offsetof(CPUTLBEntry, addr_write)));
+
     /* Perform the address comparison. */
     tcg_out_cmp(s, (TARGET_LONG_BITS == 64), TCG_REG_X0, TCG_REG_X3, 0);
-    *label_ptr = s->code_ptr;
+
     /* If not equal, we jump to the slow path. */
+    *label_ptr = s->code_ptr;
     tcg_out_goto_cond_noaddr(s, TCG_COND_NE);
 }
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 15/26] tcg-aarch64: Use tcg_out_call for qemu_ld/st
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (13 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 14/26] tcg-aarch64: Avoid add with zero in tlb load Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 16/26] tcg-aarch64: Use ADR to pass the return address to the ld/st helpers Richard Henderson
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

In some cases, a direct branch will be in range.
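
A minimal sketch of the kind of range test that lets tcg_out_call emit a
direct BL (the helper name and shape are assumptions, not quoted from the
tree; the bound is BL's 26-bit word-scaled immediate, i.e. +/-128MB):

    #include <stdbool.h>
    #include <stdint.h>

    static bool bl_in_range(intptr_t target, intptr_t pc)
    {
        intptr_t disp = (target - pc) >> 2;   /* displacement in words */
        return disp >= -0x02000000 && disp < 0x02000000;
    }

    int main(void)
    {
        /* e.g. a helper 4MB away is reachable with a direct branch */
        return bl_in_range(0x400000, 0x0) ? 0 : 1;
    }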

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 5186311..4729d11 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1081,8 +1081,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movr(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, lb->mem_index);
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_X3, (intptr_t)lb->raddr);
-    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, (intptr_t)qemu_ld_helpers[size]);
-    tcg_out_callr(s, TCG_REG_TMP);
+    tcg_out_call(s, (intptr_t)qemu_ld_helpers[size]);
     if (opc & MO_SIGN) {
         tcg_out_sxt(s, TCG_TYPE_I64, size, lb->datalo_reg, TCG_REG_X0);
     } else {
@@ -1103,8 +1102,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movr(s, size == MO_64, TCG_REG_X2, lb->datalo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, lb->mem_index);
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_X4, (intptr_t)lb->raddr);
-    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, (intptr_t)qemu_st_helpers[size]);
-    tcg_out_callr(s, TCG_REG_TMP);
+    tcg_out_call(s, (intptr_t)qemu_st_helpers[size]);
     tcg_out_goto(s, (intptr_t)lb->raddr);
 }
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 16/26] tcg-aarch64: Use ADR to pass the return address to the ld/st helpers
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (14 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 15/26] tcg-aarch64: Use tcg_out_call for qemu_ld/st Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 17/26] tcg-aarch64: Use TCGMemOp in qemu_ld/st Richard Henderson
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 4729d11..5d19e27 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1070,6 +1070,13 @@ static const void * const qemu_st_helpers[4] = {
     helper_ret_stq_mmu,
 };
 
+static inline void tcg_out_adr(TCGContext *s, TCGReg rd, uintptr_t addr)
+{
+    addr -= (uintptr_t)s->code_ptr;
+    assert(addr == sextract64(addr, 0, 21));
+    tcg_out_insn(s, 3406, ADR, rd, addr);
+}
+
 static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
     TCGMemOp opc = lb->opc;
@@ -1080,7 +1087,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movr(s, TCG_TYPE_I64, TCG_REG_X0, TCG_AREG0);
     tcg_out_movr(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, lb->mem_index);
-    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_X3, (intptr_t)lb->raddr);
+    tcg_out_adr(s, TCG_REG_X3, (intptr_t)lb->raddr);
     tcg_out_call(s, (intptr_t)qemu_ld_helpers[size]);
     if (opc & MO_SIGN) {
         tcg_out_sxt(s, TCG_TYPE_I64, size, lb->datalo_reg, TCG_REG_X0);
@@ -1101,7 +1108,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movr(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
     tcg_out_movr(s, size == MO_64, TCG_REG_X2, lb->datalo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, lb->mem_index);
-    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_X4, (intptr_t)lb->raddr);
+    tcg_out_adr(s, TCG_REG_X4, (intptr_t)lb->raddr);
     tcg_out_call(s, (intptr_t)qemu_st_helpers[size]);
     tcg_out_goto(s, (intptr_t)lb->raddr);
 }
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 17/26] tcg-aarch64: Use TCGMemOp in qemu_ld/st
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (15 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 16/26] tcg-aarch64: Use ADR to pass the return address to the ld/st helpers Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 18/26] tcg-aarch64: Pass qemu_ld/st arguments directly Richard Henderson
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Make the bswap conditional on the memop instead of on a compile-time test.
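
A minimal sketch (not part of the patch) of the shape of the new test;
MO_BSWAP's value mirrors TCG's definition but is restated locally so the
snippet stands alone, and the helper name is invented:

    #include <stdbool.h>

    #define MO_BSWAP 0x8   /* same value as TCG's MO_BSWAP flag */

    static bool memop_needs_bswap(unsigned memop)
    {
        /* decided per load/store from the memop, not per build */
        return (memop & MO_BSWAP) != 0;
    }

    int main(void)
    {
        return memop_needs_bswap(MO_BSWAP) && !memop_needs_bswap(0) ? 0 : 1;
    }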

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 131 +++++++++++++++++++++++------------------------
 1 file changed, 63 insertions(+), 68 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 5d19e27..68305ea 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -27,12 +27,6 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif /* NDEBUG */
 
-#ifdef TARGET_WORDS_BIGENDIAN
- #define TCG_LDST_BSWAP 1
-#else
- #define TCG_LDST_BSWAP 0
-#endif
-
 static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_X20, TCG_REG_X21, TCG_REG_X22, TCG_REG_X23,
     TCG_REG_X24, TCG_REG_X25, TCG_REG_X26, TCG_REG_X27,
@@ -1113,7 +1107,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_goto(s, (intptr_t)lb->raddr);
 }
 
-static void add_qemu_ldst_label(TCGContext *s, int is_ld, int opc,
+static void add_qemu_ldst_label(TCGContext *s, int is_ld, TCGMemOp opc,
                                 TCGReg data_reg, TCGReg addr_reg,
                                 int mem_index,
                                 uint8_t *raddr, uint8_t *label_ptr)
@@ -1133,7 +1127,7 @@ static void add_qemu_ldst_label(TCGContext *s, int is_ld, int opc,
    slow path for the failure case, which will be patched later when finalizing
    the slow path. Generated code returns the host addend in X1,
    clobbers X0,X2,X3,TMP. */
-static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, int s_bits,
+static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp s_bits,
                              uint8_t **label_ptr, int mem_index, bool is_read)
 {
     TCGReg base = TCG_AREG0;
@@ -1189,24 +1183,26 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, int s_bits,
 
 #endif /* CONFIG_SOFTMMU */
 
-static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data_r,
-                                   TCGReg addr_r, TCGReg off_r)
+static void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp memop,
+                                   TCGReg data_r, TCGReg addr_r, TCGReg off_r)
 {
-    switch (opc) {
-    case 0:
+    const TCGMemOp bswap = memop & MO_BSWAP;
+
+    switch (memop & MO_SSIZE) {
+    case MO_UB:
         tcg_out_ldst_r(s, LDST_8, LDST_LD, data_r, addr_r, off_r);
         break;
-    case 0 | 4:
+    case MO_SB:
         tcg_out_ldst_r(s, LDST_8, LDST_LD_S_X, data_r, addr_r, off_r);
         break;
-    case 1:
+    case MO_UW:
         tcg_out_ldst_r(s, LDST_16, LDST_LD, data_r, addr_r, off_r);
-        if (TCG_LDST_BSWAP) {
+        if (bswap) {
             tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
         }
         break;
-    case 1 | 4:
-        if (TCG_LDST_BSWAP) {
+    case MO_SW:
+        if (bswap) {
             tcg_out_ldst_r(s, LDST_16, LDST_LD, data_r, addr_r, off_r);
             tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
             tcg_out_sxt(s, TCG_TYPE_I64, MO_16, data_r, data_r);
@@ -1214,14 +1210,14 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data_r,
             tcg_out_ldst_r(s, LDST_16, LDST_LD_S_X, data_r, addr_r, off_r);
         }
         break;
-    case 2:
+    case MO_UL:
         tcg_out_ldst_r(s, LDST_32, LDST_LD, data_r, addr_r, off_r);
-        if (TCG_LDST_BSWAP) {
+        if (bswap) {
             tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
         }
         break;
-    case 2 | 4:
-        if (TCG_LDST_BSWAP) {
+    case MO_SL:
+        if (bswap) {
             tcg_out_ldst_r(s, LDST_32, LDST_LD, data_r, addr_r, off_r);
             tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
             tcg_out_sxt(s, TCG_TYPE_I64, MO_32, data_r, data_r);
@@ -1229,9 +1225,9 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data_r,
             tcg_out_ldst_r(s, LDST_32, LDST_LD_S_X, data_r, addr_r, off_r);
         }
         break;
-    case 3:
+    case MO_Q:
         tcg_out_ldst_r(s, LDST_64, LDST_LD, data_r, addr_r, off_r);
-        if (TCG_LDST_BSWAP) {
+        if (bswap) {
             tcg_out_rev(s, TCG_TYPE_I64, data_r, data_r);
         }
         break;
@@ -1240,47 +1236,47 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data_r,
     }
 }
 
-static void tcg_out_qemu_st_direct(TCGContext *s, int opc, TCGReg data_r,
-                                   TCGReg addr_r, TCGReg off_r)
+static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp memop,
+                                   TCGReg data_r, TCGReg addr_r, TCGReg off_r)
 {
-    switch (opc) {
-    case 0:
+    const TCGMemOp bswap = memop & MO_BSWAP;
+
+    switch (memop & MO_SIZE) {
+    case MO_8:
         tcg_out_ldst_r(s, LDST_8, LDST_ST, data_r, addr_r, off_r);
         break;
-    case 1:
-        if (TCG_LDST_BSWAP) {
+    case MO_16:
+        if (bswap) {
             tcg_out_rev16(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
-            tcg_out_ldst_r(s, LDST_16, LDST_ST, TCG_REG_TMP, addr_r, off_r);
-        } else {
-            tcg_out_ldst_r(s, LDST_16, LDST_ST, data_r, addr_r, off_r);
+            data_r = TCG_REG_TMP;
         }
+        tcg_out_ldst_r(s, LDST_16, LDST_ST, data_r, addr_r, off_r);
         break;
-    case 2:
-        if (TCG_LDST_BSWAP) {
+    case MO_32:
+        if (bswap) {
             tcg_out_rev(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
-            tcg_out_ldst_r(s, LDST_32, LDST_ST, TCG_REG_TMP, addr_r, off_r);
-        } else {
-            tcg_out_ldst_r(s, LDST_32, LDST_ST, data_r, addr_r, off_r);
+            data_r = TCG_REG_TMP;
         }
+        tcg_out_ldst_r(s, LDST_32, LDST_ST, data_r, addr_r, off_r);
         break;
-    case 3:
-        if (TCG_LDST_BSWAP) {
+    case MO_64:
+        if (bswap) {
             tcg_out_rev(s, TCG_TYPE_I64, TCG_REG_TMP, data_r);
-            tcg_out_ldst_r(s, LDST_64, LDST_ST, TCG_REG_TMP, addr_r, off_r);
-        } else {
-            tcg_out_ldst_r(s, LDST_64, LDST_ST, data_r, addr_r, off_r);
+            data_r = TCG_REG_TMP;
         }
+        tcg_out_ldst_r(s, LDST_64, LDST_ST, data_r, addr_r, off_r);
         break;
     default:
         tcg_abort();
     }
 }
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp memop)
 {
     TCGReg addr_reg, data_reg;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits;
+    int mem_index;
+    TCGMemOp s_bits;
     uint8_t *label_ptr;
 #endif
     data_reg = args[0];
@@ -1288,22 +1284,23 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = args[2];
-    s_bits = opc & 3;
+    s_bits = memop & MO_SIZE;
     tcg_out_tlb_read(s, addr_reg, s_bits, &label_ptr, mem_index, 1);
-    tcg_out_qemu_ld_direct(s, opc, data_reg, addr_reg, TCG_REG_X1);
-    add_qemu_ldst_label(s, 1, opc, data_reg, addr_reg,
+    tcg_out_qemu_ld_direct(s, memop, data_reg, addr_reg, TCG_REG_X1);
+    add_qemu_ldst_label(s, 1, memop, data_reg, addr_reg,
                         mem_index, s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
-    tcg_out_qemu_ld_direct(s, opc, data_reg, addr_reg,
+    tcg_out_qemu_ld_direct(s, memop, data_reg, addr_reg,
                            GUEST_BASE ? TCG_REG_GUEST_BASE : TCG_REG_XZR);
 #endif /* CONFIG_SOFTMMU */
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp memop)
 {
     TCGReg addr_reg, data_reg;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits;
+    int mem_index;
+    TCGMemOp s_bits;
     uint8_t *label_ptr;
 #endif
     data_reg = args[0];
@@ -1311,14 +1308,14 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = args[2];
-    s_bits = opc & 3;
+    s_bits = memop & MO_SIZE;
 
     tcg_out_tlb_read(s, addr_reg, s_bits, &label_ptr, mem_index, 0);
-    tcg_out_qemu_st_direct(s, opc, data_reg, addr_reg, TCG_REG_X1);
-    add_qemu_ldst_label(s, 0, opc, data_reg, addr_reg,
+    tcg_out_qemu_st_direct(s, memop, data_reg, addr_reg, TCG_REG_X1);
+    add_qemu_ldst_label(s, 0, memop, data_reg, addr_reg,
                         mem_index, s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
-    tcg_out_qemu_st_direct(s, opc, data_reg, addr_reg,
+    tcg_out_qemu_st_direct(s, memop, data_reg, addr_reg,
                            GUEST_BASE ? TCG_REG_GUEST_BASE : TCG_REG_XZR);
 #endif /* CONFIG_SOFTMMU */
 }
@@ -1591,40 +1588,38 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, args, 0 | 0);
+        tcg_out_qemu_ld(s, args, MO_UB);
         break;
     case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, args, 4 | 0);
+        tcg_out_qemu_ld(s, args, MO_SB);
         break;
     case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, args, 0 | 1);
+        tcg_out_qemu_ld(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, args, 4 | 1);
+        tcg_out_qemu_ld(s, args, MO_TESW);
         break;
     case INDEX_op_qemu_ld32u:
-        tcg_out_qemu_ld(s, args, 0 | 2);
+    case INDEX_op_qemu_ld32:
+        tcg_out_qemu_ld(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_ld32s:
-        tcg_out_qemu_ld(s, args, 4 | 2);
-        break;
-    case INDEX_op_qemu_ld32:
-        tcg_out_qemu_ld(s, args, 0 | 2);
+        tcg_out_qemu_ld(s, args, MO_TESL);
         break;
     case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, 0 | 3);
+        tcg_out_qemu_ld(s, args, MO_TEQ);
         break;
     case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args, MO_UB);
         break;
     case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, args, 2);
+        tcg_out_qemu_st(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, 3);
+        tcg_out_qemu_st(s, args, MO_TEQ);
         break;
 
     case INDEX_op_bswap32_i64:
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 18/26] tcg-aarch64: Pass qemu_ld/st arguments directly
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (16 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 17/26] tcg-aarch64: Use TCGMemOp in qemu_ld/st Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-11 12:34   ` Claudio Fontana
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 19/26] tcg-aarch64: Implement TCG_TARGET_HAS_new_ldst Richard Henderson
                   ` (7 subsequent siblings)
  25 siblings, 1 reply; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Instead of passing them the "args" array.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 49 +++++++++++++++++-------------------------------
 1 file changed, 17 insertions(+), 32 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 68305ea..3a2955f 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1271,20 +1271,13 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp memop,
     }
 }
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp memop)
+static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
+                            TCGMemOp memop, int mem_index)
 {
-    TCGReg addr_reg, data_reg;
 #ifdef CONFIG_SOFTMMU
-    int mem_index;
-    TCGMemOp s_bits;
+    TCGMemOp s_bits = memop & MO_SIZE;
     uint8_t *label_ptr;
-#endif
-    data_reg = args[0];
-    addr_reg = args[1];
 
-#ifdef CONFIG_SOFTMMU
-    mem_index = args[2];
-    s_bits = memop & MO_SIZE;
     tcg_out_tlb_read(s, addr_reg, s_bits, &label_ptr, mem_index, 1);
     tcg_out_qemu_ld_direct(s, memop, data_reg, addr_reg, TCG_REG_X1);
     add_qemu_ldst_label(s, 1, memop, data_reg, addr_reg,
@@ -1295,20 +1288,12 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp memop)
 #endif /* CONFIG_SOFTMMU */
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp memop)
+static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
+                            TCGMemOp memop, int mem_index)
 {
-    TCGReg addr_reg, data_reg;
 #ifdef CONFIG_SOFTMMU
-    int mem_index;
-    TCGMemOp s_bits;
+    TCGMemOp s_bits = memop & MO_SIZE;
     uint8_t *label_ptr;
-#endif
-    data_reg = args[0];
-    addr_reg = args[1];
-
-#ifdef CONFIG_SOFTMMU
-    mem_index = args[2];
-    s_bits = memop & MO_SIZE;
 
     tcg_out_tlb_read(s, addr_reg, s_bits, &label_ptr, mem_index, 0);
     tcg_out_qemu_st_direct(s, memop, data_reg, addr_reg, TCG_REG_X1);
@@ -1588,38 +1573,38 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, args, MO_UB);
+        tcg_out_qemu_ld(s, a0, a1, MO_UB, a2);
         break;
     case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, args, MO_SB);
+        tcg_out_qemu_ld(s, a0, a1, MO_SB, a2);
         break;
     case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, args, MO_TEUW);
+        tcg_out_qemu_ld(s, a0, a1, MO_TEUW, a2);
         break;
     case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, args, MO_TESW);
+        tcg_out_qemu_ld(s, a0, a1, MO_TESW, a2);
         break;
     case INDEX_op_qemu_ld32u:
     case INDEX_op_qemu_ld32:
-        tcg_out_qemu_ld(s, args, MO_TEUL);
+        tcg_out_qemu_ld(s, a0, a1, MO_TEUL, a2);
         break;
     case INDEX_op_qemu_ld32s:
-        tcg_out_qemu_ld(s, args, MO_TESL);
+        tcg_out_qemu_ld(s, a0, a1, MO_TESL, a2);
         break;
     case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, MO_TEQ);
+        tcg_out_qemu_ld(s, a0, a1, MO_TEQ, a2);
         break;
     case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, args, MO_UB);
+        tcg_out_qemu_st(s, a0, a1, MO_UB, a2);
         break;
     case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, args, MO_TEUW);
+        tcg_out_qemu_st(s, a0, a1, MO_TEUW, a2);
         break;
     case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, args, MO_TEUL);
+        tcg_out_qemu_st(s, a0, a1, MO_TEUL, a2);
         break;
     case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, MO_TEQ);
+        tcg_out_qemu_st(s, a0, a1, MO_TEQ, a2);
         break;
 
     case INDEX_op_bswap32_i64:
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 19/26] tcg-aarch64: Implement TCG_TARGET_HAS_new_ldst
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (17 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 18/26] tcg-aarch64: Pass qemu_ld/st arguments directly Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 20/26] tcg-aarch64: Support stores of zero Richard Henderson
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 89 ++++++++++++++++--------------------------------
 tcg/aarch64/tcg-target.h |  2 +-
 2 files changed, 31 insertions(+), 60 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 3a2955f..34e477d 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1047,21 +1047,27 @@ static inline void tcg_out_addsub2(TCGContext *s, int ext, TCGReg rl,
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
  */
-static const void * const qemu_ld_helpers[4] = {
-    helper_ret_ldub_mmu,
-    helper_ret_lduw_mmu,
-    helper_ret_ldul_mmu,
-    helper_ret_ldq_mmu,
+static const void * const qemu_ld_helpers[16] = {
+    [MO_UB]   = helper_ret_ldub_mmu,
+    [MO_LEUW] = helper_le_lduw_mmu,
+    [MO_LEUL] = helper_le_ldul_mmu,
+    [MO_LEQ]  = helper_le_ldq_mmu,
+    [MO_BEUW] = helper_be_lduw_mmu,
+    [MO_BEUL] = helper_be_ldul_mmu,
+    [MO_BEQ]  = helper_be_ldq_mmu,
 };
 
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
  *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
  */
-static const void * const qemu_st_helpers[4] = {
-    helper_ret_stb_mmu,
-    helper_ret_stw_mmu,
-    helper_ret_stl_mmu,
-    helper_ret_stq_mmu,
+static const void * const qemu_st_helpers[16] = {
+    [MO_UB]   = helper_ret_stb_mmu,
+    [MO_LEUW] = helper_le_stw_mmu,
+    [MO_LEUL] = helper_le_stl_mmu,
+    [MO_LEQ]  = helper_le_stq_mmu,
+    [MO_BEUW] = helper_be_stw_mmu,
+    [MO_BEUL] = helper_be_stl_mmu,
+    [MO_BEQ]  = helper_be_stq_mmu,
 };
 
 static inline void tcg_out_adr(TCGContext *s, TCGReg rd, uintptr_t addr)
@@ -1082,7 +1088,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movr(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, lb->mem_index);
     tcg_out_adr(s, TCG_REG_X3, (intptr_t)lb->raddr);
-    tcg_out_call(s, (intptr_t)qemu_ld_helpers[size]);
+    tcg_out_call(s, (intptr_t)qemu_ld_helpers[opc & ~MO_SIGN]);
     if (opc & MO_SIGN) {
         tcg_out_sxt(s, TCG_TYPE_I64, size, lb->datalo_reg, TCG_REG_X0);
     } else {
@@ -1094,7 +1100,8 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 
 static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-    TCGMemOp size = lb->opc;
+    TCGMemOp opc = lb->opc;
+    TCGMemOp size = opc & MO_SIZE;
 
     reloc_pc19(lb->label_ptr[0], (intptr_t)s->code_ptr);
 
@@ -1103,7 +1110,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movr(s, size == MO_64, TCG_REG_X2, lb->datalo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, lb->mem_index);
     tcg_out_adr(s, TCG_REG_X4, (intptr_t)lb->raddr);
-    tcg_out_call(s, (intptr_t)qemu_st_helpers[size]);
+    tcg_out_call(s, (intptr_t)qemu_st_helpers[opc]);
     tcg_out_goto(s, (intptr_t)lb->raddr);
 }
 
@@ -1572,39 +1579,13 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_insn(s, 3506, CSEL, ext, a0, REG0(3), REG0(4), args[5]);
         break;
 
-    case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, a0, a1, MO_UB, a2);
-        break;
-    case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, a0, a1, MO_SB, a2);
-        break;
-    case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, a0, a1, MO_TEUW, a2);
-        break;
-    case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, a0, a1, MO_TESW, a2);
-        break;
-    case INDEX_op_qemu_ld32u:
-    case INDEX_op_qemu_ld32:
-        tcg_out_qemu_ld(s, a0, a1, MO_TEUL, a2);
-        break;
-    case INDEX_op_qemu_ld32s:
-        tcg_out_qemu_ld(s, a0, a1, MO_TESL, a2);
+    case INDEX_op_qemu_ld_i32:
+    case INDEX_op_qemu_ld_i64:
+        tcg_out_qemu_ld(s, a0, a1, a2, args[3]);
         break;
-    case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, a0, a1, MO_TEQ, a2);
-        break;
-    case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, a0, a1, MO_UB, a2);
-        break;
-    case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, a0, a1, MO_TEUW, a2);
-        break;
-    case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, a0, a1, MO_TEUL, a2);
-        break;
-    case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, a0, a1, MO_TEQ, a2);
+    case INDEX_op_qemu_st_i32:
+    case INDEX_op_qemu_st_i64:
+        tcg_out_qemu_st(s, a0, a1, a2, args[3]);
         break;
 
     case INDEX_op_bswap32_i64:
@@ -1770,20 +1751,10 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_movcond_i32, { "r", "r", "rwA", "rZ", "rZ" } },
     { INDEX_op_movcond_i64, { "r", "r", "rA", "rZ", "rZ" } },
 
-    { INDEX_op_qemu_ld8u, { "r", "l" } },
-    { INDEX_op_qemu_ld8s, { "r", "l" } },
-    { INDEX_op_qemu_ld16u, { "r", "l" } },
-    { INDEX_op_qemu_ld16s, { "r", "l" } },
-    { INDEX_op_qemu_ld32u, { "r", "l" } },
-    { INDEX_op_qemu_ld32s, { "r", "l" } },
-
-    { INDEX_op_qemu_ld32, { "r", "l" } },
-    { INDEX_op_qemu_ld64, { "r", "l" } },
-
-    { INDEX_op_qemu_st8, { "l", "l" } },
-    { INDEX_op_qemu_st16, { "l", "l" } },
-    { INDEX_op_qemu_st32, { "l", "l" } },
-    { INDEX_op_qemu_st64, { "l", "l" } },
+    { INDEX_op_qemu_ld_i32, { "r", "l" } },
+    { INDEX_op_qemu_ld_i64, { "r", "l" } },
+    { INDEX_op_qemu_st_i32, { "l", "l" } },
+    { INDEX_op_qemu_st_i64, { "l", "l" } },
 
     { INDEX_op_bswap16_i32, { "r", "r" } },
     { INDEX_op_bswap32_i32, { "r", "r" } },
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index faccc36..adf0261 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -98,7 +98,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
 
-#define TCG_TARGET_HAS_new_ldst         0
+#define TCG_TARGET_HAS_new_ldst         1
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 20/26] tcg-aarch64: Support stores of zero
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (18 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 19/26] tcg-aarch64: Implement TCG_TARGET_HAS_new_ldst Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-11 12:34   ` Claudio Fontana
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 21/26] tcg-aarch64: Introduce tcg_out_insn_3507 Richard Henderson
                   ` (5 subsequent siblings)
  25 siblings, 1 reply; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 34e477d..caaf8a2 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -1253,21 +1253,21 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp memop,
         tcg_out_ldst_r(s, LDST_8, LDST_ST, data_r, addr_r, off_r);
         break;
     case MO_16:
-        if (bswap) {
+        if (bswap && data_r != TCG_REG_XZR) {
             tcg_out_rev16(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
             data_r = TCG_REG_TMP;
         }
         tcg_out_ldst_r(s, LDST_16, LDST_ST, data_r, addr_r, off_r);
         break;
     case MO_32:
-        if (bswap) {
+        if (bswap && data_r != TCG_REG_XZR) {
             tcg_out_rev(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
             data_r = TCG_REG_TMP;
         }
         tcg_out_ldst_r(s, LDST_32, LDST_ST, data_r, addr_r, off_r);
         break;
     case MO_64:
-        if (bswap) {
+        if (bswap && data_r != TCG_REG_XZR) {
             tcg_out_rev(s, TCG_TYPE_I64, TCG_REG_TMP, data_r);
             data_r = TCG_REG_TMP;
         }
@@ -1364,8 +1364,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_ld_i32:
     case INDEX_op_ld_i64:
-    case INDEX_op_st_i32:
-    case INDEX_op_st_i64:
     case INDEX_op_ld8u_i32:
     case INDEX_op_ld8s_i32:
     case INDEX_op_ld16u_i32:
@@ -1376,13 +1374,18 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ld16s_i64:
     case INDEX_op_ld32u_i64:
     case INDEX_op_ld32s_i64:
+        tcg_out_ldst(s, aarch64_ldst_get_data(opc), aarch64_ldst_get_type(opc),
+                     a0, a1, a2);
+        break;
+    case INDEX_op_st_i32:
+    case INDEX_op_st_i64:
     case INDEX_op_st8_i32:
     case INDEX_op_st8_i64:
     case INDEX_op_st16_i32:
     case INDEX_op_st16_i64:
     case INDEX_op_st32_i64:
         tcg_out_ldst(s, aarch64_ldst_get_data(opc), aarch64_ldst_get_type(opc),
-                     a0, a1, a2);
+                     REG0(0), a1, a2);
         break;
 
     case INDEX_op_add_i32:
@@ -1585,7 +1588,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
     case INDEX_op_qemu_st_i32:
     case INDEX_op_qemu_st_i64:
-        tcg_out_qemu_st(s, a0, a1, a2, args[3]);
+        tcg_out_qemu_st(s, REG0(0), a1, a2, args[3]);
         break;
 
     case INDEX_op_bswap32_i64:
@@ -1693,13 +1696,13 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_ld32s_i64, { "r", "r" } },
     { INDEX_op_ld_i64, { "r", "r" } },
 
-    { INDEX_op_st8_i32, { "r", "r" } },
-    { INDEX_op_st16_i32, { "r", "r" } },
-    { INDEX_op_st_i32, { "r", "r" } },
-    { INDEX_op_st8_i64, { "r", "r" } },
-    { INDEX_op_st16_i64, { "r", "r" } },
-    { INDEX_op_st32_i64, { "r", "r" } },
-    { INDEX_op_st_i64, { "r", "r" } },
+    { INDEX_op_st8_i32, { "rZ", "r" } },
+    { INDEX_op_st16_i32, { "rZ", "r" } },
+    { INDEX_op_st_i32, { "rZ", "r" } },
+    { INDEX_op_st8_i64, { "rZ", "r" } },
+    { INDEX_op_st16_i64, { "rZ", "r" } },
+    { INDEX_op_st32_i64, { "rZ", "r" } },
+    { INDEX_op_st_i64, { "rZ", "r" } },
 
     { INDEX_op_add_i32, { "r", "r", "rwA" } },
     { INDEX_op_add_i64, { "r", "r", "rA" } },
@@ -1753,8 +1756,8 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
 
     { INDEX_op_qemu_ld_i32, { "r", "l" } },
     { INDEX_op_qemu_ld_i64, { "r", "l" } },
-    { INDEX_op_qemu_st_i32, { "l", "l" } },
-    { INDEX_op_qemu_st_i64, { "l", "l" } },
+    { INDEX_op_qemu_st_i32, { "lZ", "l" } },
+    { INDEX_op_qemu_st_i64, { "lZ", "l" } },
 
     { INDEX_op_bswap16_i32, { "r", "r" } },
     { INDEX_op_bswap32_i32, { "r", "r" } },
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 21/26] tcg-aarch64: Introduce tcg_out_insn_3507
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (19 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 20/26] tcg-aarch64: Support stores of zero Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-09 12:54   ` Claudio Fontana
  2014-04-11 12:36   ` Claudio Fontana
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 22/26] tcg-aarch64: Merge aarch64_ldst_get_data/type into tcg_out_op Richard Henderson
                   ` (4 subsequent siblings)
  25 siblings, 2 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Cleaning up the implementation of REV and REV16 at the same time.
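
As a small worked example (a standalone sketch, not part of the patch),
composing the new I3507_REV16 base with the sf bit reproduces the 64-bit
constant that the old open-coded version used:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t rev16 = 0x5ac00400;        /* I3507_REV16       */
        uint32_t ext = 1;                   /* TCG_TYPE_I64      */
        printf("%#x\n", rev16 | ext << 31); /* prints 0xdac00400 */
        return 0;
    }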

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index caaf8a2..de7490d 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -327,6 +327,10 @@ typedef enum {
     I3506_CSEL      = 0x1a800000,
     I3506_CSINC     = 0x1a800400,
 
+    /* Data-processing (1 source) instructions.  */
+    I3507_REV16     = 0x5ac00400,
+    I3507_REV       = 0x5ac00800,
+
     /* Data-processing (2 source) instructions.  */
     I3508_LSLV      = 0x1ac02000,
     I3508_LSRV      = 0x1ac02400,
@@ -545,6 +549,12 @@ static void tcg_out_insn_3506(TCGContext *s, AArch64Insn insn, TCGType ext,
               | tcg_cond_to_aarch64[c] << 12);
 }
 
+static void tcg_out_insn_3507(TCGContext *s, AArch64Insn insn, TCGType ext,
+                              TCGReg rd, TCGReg rn)
+{
+    tcg_out32(s, insn | ext << 31 | rn << 5 | rd);
+}
+
 static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
                               TCGReg rd, TCGReg rn, TCGReg rm, TCGReg ra)
 {
@@ -961,19 +971,15 @@ static void tcg_out_brcond(TCGContext *s, TCGMemOp ext, TCGCond c, TCGArg a,
 }
 
 static inline void tcg_out_rev(TCGContext *s, TCGType ext,
-                               TCGReg rd, TCGReg rm)
+                               TCGReg rd, TCGReg rn)
 {
-    /* using REV 0x5ac00800 */
-    unsigned int base = ext ? 0xdac00c00 : 0x5ac00800;
-    tcg_out32(s, base | rm << 5 | rd);
+    tcg_out_insn(s, 3507, REV, ext, rd, rn);
 }
 
 static inline void tcg_out_rev16(TCGContext *s, TCGType ext,
-                                 TCGReg rd, TCGReg rm)
+                                 TCGReg rd, TCGReg rn)
 {
-    /* using REV16 0x5ac00400 */
-    unsigned int base = ext ? 0xdac00400 : 0x5ac00400;
-    tcg_out32(s, base | rm << 5 | rd);
+    tcg_out_insn(s, 3507, REV16, ext, rd, rn);
 }
 
 static inline void tcg_out_sxt(TCGContext *s, TCGType ext, TCGMemOp s_bits,
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 22/26] tcg-aarch64: Merge aarch64_ldst_get_data/type into tcg_out_op
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (20 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 21/26] tcg-aarch64: Introduce tcg_out_insn_3507 Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-11 12:34   ` Claudio Fontana
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 23/26] tcg-aarch64: Replace aarch64_ldst_op_data with TCGMemOp Richard Henderson
                   ` (3 subsequent siblings)
  25 siblings, 1 reply; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 115 +++++++++++++----------------------------------
 1 file changed, 32 insertions(+), 83 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index de7490d..5ecc20c 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -355,78 +355,6 @@ typedef enum {
     I3510_ANDS      = 0x6a000000,
 } AArch64Insn;
 
-static inline enum aarch64_ldst_op_data
-aarch64_ldst_get_data(TCGOpcode tcg_op)
-{
-    switch (tcg_op) {
-    case INDEX_op_ld8u_i32:
-    case INDEX_op_ld8s_i32:
-    case INDEX_op_ld8u_i64:
-    case INDEX_op_ld8s_i64:
-    case INDEX_op_st8_i32:
-    case INDEX_op_st8_i64:
-        return LDST_8;
-
-    case INDEX_op_ld16u_i32:
-    case INDEX_op_ld16s_i32:
-    case INDEX_op_ld16u_i64:
-    case INDEX_op_ld16s_i64:
-    case INDEX_op_st16_i32:
-    case INDEX_op_st16_i64:
-        return LDST_16;
-
-    case INDEX_op_ld_i32:
-    case INDEX_op_st_i32:
-    case INDEX_op_ld32u_i64:
-    case INDEX_op_ld32s_i64:
-    case INDEX_op_st32_i64:
-        return LDST_32;
-
-    case INDEX_op_ld_i64:
-    case INDEX_op_st_i64:
-        return LDST_64;
-
-    default:
-        tcg_abort();
-    }
-}
-
-static inline enum aarch64_ldst_op_type
-aarch64_ldst_get_type(TCGOpcode tcg_op)
-{
-    switch (tcg_op) {
-    case INDEX_op_st8_i32:
-    case INDEX_op_st16_i32:
-    case INDEX_op_st8_i64:
-    case INDEX_op_st16_i64:
-    case INDEX_op_st_i32:
-    case INDEX_op_st32_i64:
-    case INDEX_op_st_i64:
-        return LDST_ST;
-
-    case INDEX_op_ld8u_i32:
-    case INDEX_op_ld16u_i32:
-    case INDEX_op_ld8u_i64:
-    case INDEX_op_ld16u_i64:
-    case INDEX_op_ld_i32:
-    case INDEX_op_ld32u_i64:
-    case INDEX_op_ld_i64:
-        return LDST_LD;
-
-    case INDEX_op_ld8s_i32:
-    case INDEX_op_ld16s_i32:
-        return LDST_LD_S_W;
-
-    case INDEX_op_ld8s_i64:
-    case INDEX_op_ld16s_i64:
-    case INDEX_op_ld32s_i64:
-        return LDST_LD_S_X;
-
-    default:
-        tcg_abort();
-    }
-}
-
 static inline uint32_t tcg_in32(TCGContext *s)
 {
     uint32_t v = *(uint32_t *)s->code_ptr;
@@ -1368,30 +1296,51 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_goto_label(s, a0);
         break;
 
-    case INDEX_op_ld_i32:
-    case INDEX_op_ld_i64:
     case INDEX_op_ld8u_i32:
-    case INDEX_op_ld8s_i32:
-    case INDEX_op_ld16u_i32:
-    case INDEX_op_ld16s_i32:
     case INDEX_op_ld8u_i64:
+        tcg_out_ldst(s, LDST_8, LDST_LD, a0, a1, a2);
+        break;
+    case INDEX_op_ld8s_i32:
+        tcg_out_ldst(s, LDST_8, LDST_LD_S_W, a0, a1, a2);
+        break;
     case INDEX_op_ld8s_i64:
+        tcg_out_ldst(s, LDST_8, LDST_LD_S_X, a0, a1, a2);
+        break;
+    case INDEX_op_ld16u_i32:
     case INDEX_op_ld16u_i64:
+        tcg_out_ldst(s, LDST_16, LDST_LD, a0, a1, a2);
+        break;
+    case INDEX_op_ld16s_i32:
+        tcg_out_ldst(s, LDST_16, LDST_LD_S_W, a0, a1, a2);
+        break;
     case INDEX_op_ld16s_i64:
+        tcg_out_ldst(s, LDST_16, LDST_LD_S_X, a0, a1, a2);
+        break;
+    case INDEX_op_ld_i32:
     case INDEX_op_ld32u_i64:
+        tcg_out_ldst(s, LDST_32, LDST_LD, a0, a1, a2);
+        break;
     case INDEX_op_ld32s_i64:
-        tcg_out_ldst(s, aarch64_ldst_get_data(opc), aarch64_ldst_get_type(opc),
-                     a0, a1, a2);
+        tcg_out_ldst(s, LDST_32, LDST_LD_S_X, a0, a1, a2);
         break;
-    case INDEX_op_st_i32:
-    case INDEX_op_st_i64:
+    case INDEX_op_ld_i64:
+        tcg_out_ldst(s, LDST_64, LDST_LD, a0, a1, a2);
+        break;
+
     case INDEX_op_st8_i32:
     case INDEX_op_st8_i64:
+        tcg_out_ldst(s, LDST_8, LDST_ST, REG0(0), a1, a2);
+        break;
     case INDEX_op_st16_i32:
     case INDEX_op_st16_i64:
+        tcg_out_ldst(s, LDST_16, LDST_ST, REG0(0), a1, a2);
+        break;
+    case INDEX_op_st_i32:
     case INDEX_op_st32_i64:
-        tcg_out_ldst(s, aarch64_ldst_get_data(opc), aarch64_ldst_get_type(opc),
-                     REG0(0), a1, a2);
+        tcg_out_ldst(s, LDST_32, LDST_ST, REG0(0), a1, a2);
+        break;
+    case INDEX_op_st_i64:
+        tcg_out_ldst(s, LDST_64, LDST_ST, REG0(0), a1, a2);
         break;
 
     case INDEX_op_add_i32:
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 23/26] tcg-aarch64: Replace aarch64_ldst_op_data with TCGMemOp
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (21 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 22/26] tcg-aarch64: Merge aarch64_ldst_get_data/type into tcg_out_op Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-11 12:35   ` Claudio Fontana
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 24/26] tcg-aarch64: Replace aarch64_ldst_op_data with AArch64LdstType Richard Henderson
                   ` (2 subsequent siblings)
  25 siblings, 1 reply; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

The definition of op_data included opcode bits, not just
the size field of the various ldst instructions.
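
For instance (a standalone sketch, not part of the patch), the old LDST_32
value already bundled the fixed opcode bits with the size, so shifting it
into place gives the same word as the new base-plus-size form:

    #include <assert.h>

    int main(void)
    {
        unsigned old_ldst_32 = 0xb8;                  /* old LDST_32 enum value   */
        unsigned new_base = 0x38000000 | (2u << 30);  /* 0x38000000 | MO_32 << 30 */
        assert((old_ldst_32 << 24) == new_base);      /* both are 0xb8000000      */
        return 0;
    }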

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 111 +++++++++++++++++++++--------------------------
 1 file changed, 49 insertions(+), 62 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 5ecc20c..9a2e4a6 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -242,13 +242,6 @@ static const enum aarch64_cond_code tcg_cond_to_aarch64[] = {
     [TCG_COND_LEU] = COND_LS,
 };
 
-/* opcodes for LDR / STR instructions with base + simm9 addressing */
-enum aarch64_ldst_op_data { /* size of the data moved */
-    LDST_8 = 0x38,
-    LDST_16 = 0x78,
-    LDST_32 = 0xb8,
-    LDST_64 = 0xf8,
-};
 enum aarch64_ldst_op_type { /* type of operation */
     LDST_ST = 0x0,    /* store */
     LDST_LD = 0x4,    /* load */
@@ -490,25 +483,23 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
 }
 
 
-static inline void tcg_out_ldst_9(TCGContext *s,
-                                  enum aarch64_ldst_op_data op_data,
+static inline void tcg_out_ldst_9(TCGContext *s, TCGMemOp size,
                                   enum aarch64_ldst_op_type op_type,
                                   TCGReg rd, TCGReg rn, intptr_t offset)
 {
     /* use LDUR with BASE register with 9bit signed unscaled offset */
-    tcg_out32(s, op_data << 24 | op_type << 20
+    tcg_out32(s, 0x38000000 | size << 30 | op_type << 20
               | (offset & 0x1ff) << 12 | rn << 5 | rd);
 }
 
 /* tcg_out_ldst_12 expects a scaled unsigned immediate offset */
-static inline void tcg_out_ldst_12(TCGContext *s,
-                                   enum aarch64_ldst_op_data op_data,
+static inline void tcg_out_ldst_12(TCGContext *s, TCGMemOp size,
                                    enum aarch64_ldst_op_type op_type,
                                    TCGReg rd, TCGReg rn,
                                    tcg_target_ulong scaled_uimm)
 {
-    tcg_out32(s, (op_data | 1) << 24
-              | op_type << 20 | scaled_uimm << 10 | rn << 5 | rd);
+    tcg_out32(s, 0x39000000 | size << 30 | op_type << 20
+              | scaled_uimm << 10 | rn << 5 | rd);
 }
 
 /* Register to register move using ORR (shifted register with no shift). */
@@ -646,44 +637,40 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
     }
 }
 
-static inline void tcg_out_ldst_r(TCGContext *s,
-                                  enum aarch64_ldst_op_data op_data,
+static inline void tcg_out_ldst_r(TCGContext *s, TCGMemOp size,
                                   enum aarch64_ldst_op_type op_type,
                                   TCGReg rd, TCGReg base, TCGReg regoff)
 {
     /* load from memory to register using base + 64bit register offset */
     /* using f.e. STR Wt, [Xn, Xm] 0xb8600800|(regoff << 16)|(base << 5)|rd */
     /* the 0x6000 is for the "no extend field" */
-    tcg_out32(s, 0x00206800
-              | op_data << 24 | op_type << 20 | regoff << 16 | base << 5 | rd);
+    tcg_out32(s, 0x38206800 | size << 30 | op_type << 20
+              | regoff << 16 | base << 5 | rd);
 }
 
 /* solve the whole ldst problem */
-static inline void tcg_out_ldst(TCGContext *s, enum aarch64_ldst_op_data data,
+static inline void tcg_out_ldst(TCGContext *s, TCGMemOp size,
                                 enum aarch64_ldst_op_type type,
                                 TCGReg rd, TCGReg rn, intptr_t offset)
 {
     if (offset >= -256 && offset < 256) {
-        tcg_out_ldst_9(s, data, type, rd, rn, offset);
+        tcg_out_ldst_9(s, size, type, rd, rn, offset);
         return;
     }
 
-    if (offset >= 256) {
-        /* if the offset is naturally aligned and in range,
-           then we can use the scaled uimm12 encoding */
-        unsigned int s_bits = data >> 6;
-        if (!(offset & ((1 << s_bits) - 1))) {
-            tcg_target_ulong scaled_uimm = offset >> s_bits;
-            if (scaled_uimm <= 0xfff) {
-                tcg_out_ldst_12(s, data, type, rd, rn, scaled_uimm);
-                return;
-            }
+    /* If the offset is naturally aligned and in range, then we can
+       use the scaled uimm12 encoding */
+    if (offset >= 0 && !(offset & ((1 << size) - 1))) {
+        tcg_target_ulong scaled_uimm = offset >> size;
+        if (scaled_uimm <= 0xfff) {
+            tcg_out_ldst_12(s, size, type, rd, rn, scaled_uimm);
+            return;
         }
     }
 
-    /* worst-case scenario, move offset to temp register, use reg offset */
+    /* Worst-case scenario, move offset to temp register, use reg offset.  */
     tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, offset);
-    tcg_out_ldst_r(s, data, type, rd, rn, TCG_REG_TMP);
+    tcg_out_ldst_r(s, size, type, rd, rn, TCG_REG_TMP);
 }
 
 static inline void tcg_out_mov(TCGContext *s,
@@ -697,14 +684,14 @@ static inline void tcg_out_mov(TCGContext *s,
 static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
                               TCGReg arg1, intptr_t arg2)
 {
-    tcg_out_ldst(s, (type == TCG_TYPE_I64) ? LDST_64 : LDST_32, LDST_LD,
+    tcg_out_ldst(s, type == TCG_TYPE_I64 ? MO_64 : MO_32, LDST_LD,
                  arg, arg1, arg2);
 }
 
 static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
                               TCGReg arg1, intptr_t arg2)
 {
-    tcg_out_ldst(s, (type == TCG_TYPE_I64) ? LDST_64 : LDST_32, LDST_ST,
+    tcg_out_ldst(s, type == TCG_TYPE_I64 ? MO_64 : MO_32, LDST_ST,
                  arg, arg1, arg2);
 }
 
@@ -1104,12 +1091,12 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp s_bits,
 
     /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
        X0 = load [X2 + (tlb_offset & 0x000fff)] */
-    tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? LDST_64 : LDST_32,
+    tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? MO_64 : MO_32,
                  LDST_LD, TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff);
 
     /* Load the tlb addend. Do that early to avoid stalling.
        X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */
-    tcg_out_ldst(s, LDST_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
+    tcg_out_ldst(s, MO_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
                  (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) -
                  (is_read ? offsetof(CPUTLBEntry, addr_read)
                   : offsetof(CPUTLBEntry, addr_write)));
@@ -1131,43 +1118,43 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp memop,
 
     switch (memop & MO_SSIZE) {
     case MO_UB:
-        tcg_out_ldst_r(s, LDST_8, LDST_LD, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, MO_8, LDST_LD, data_r, addr_r, off_r);
         break;
     case MO_SB:
-        tcg_out_ldst_r(s, LDST_8, LDST_LD_S_X, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, MO_8, LDST_LD_S_X, data_r, addr_r, off_r);
         break;
     case MO_UW:
-        tcg_out_ldst_r(s, LDST_16, LDST_LD, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, MO_16, LDST_LD, data_r, addr_r, off_r);
         if (bswap) {
             tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
         }
         break;
     case MO_SW:
         if (bswap) {
-            tcg_out_ldst_r(s, LDST_16, LDST_LD, data_r, addr_r, off_r);
+            tcg_out_ldst_r(s, MO_16, LDST_LD, data_r, addr_r, off_r);
             tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
             tcg_out_sxt(s, TCG_TYPE_I64, MO_16, data_r, data_r);
         } else {
-            tcg_out_ldst_r(s, LDST_16, LDST_LD_S_X, data_r, addr_r, off_r);
+            tcg_out_ldst_r(s, MO_16, LDST_LD_S_X, data_r, addr_r, off_r);
         }
         break;
     case MO_UL:
-        tcg_out_ldst_r(s, LDST_32, LDST_LD, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, MO_32, LDST_LD, data_r, addr_r, off_r);
         if (bswap) {
             tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
         }
         break;
     case MO_SL:
         if (bswap) {
-            tcg_out_ldst_r(s, LDST_32, LDST_LD, data_r, addr_r, off_r);
+            tcg_out_ldst_r(s, MO_32, LDST_LD, data_r, addr_r, off_r);
             tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
             tcg_out_sxt(s, TCG_TYPE_I64, MO_32, data_r, data_r);
         } else {
-            tcg_out_ldst_r(s, LDST_32, LDST_LD_S_X, data_r, addr_r, off_r);
+            tcg_out_ldst_r(s, MO_32, LDST_LD_S_X, data_r, addr_r, off_r);
         }
         break;
     case MO_Q:
-        tcg_out_ldst_r(s, LDST_64, LDST_LD, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, MO_64, LDST_LD, data_r, addr_r, off_r);
         if (bswap) {
             tcg_out_rev(s, TCG_TYPE_I64, data_r, data_r);
         }
@@ -1184,28 +1171,28 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp memop,
 
     switch (memop & MO_SIZE) {
     case MO_8:
-        tcg_out_ldst_r(s, LDST_8, LDST_ST, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, MO_8, LDST_ST, data_r, addr_r, off_r);
         break;
     case MO_16:
         if (bswap && data_r != TCG_REG_XZR) {
             tcg_out_rev16(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
             data_r = TCG_REG_TMP;
         }
-        tcg_out_ldst_r(s, LDST_16, LDST_ST, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, MO_16, LDST_ST, data_r, addr_r, off_r);
         break;
     case MO_32:
         if (bswap && data_r != TCG_REG_XZR) {
             tcg_out_rev(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
             data_r = TCG_REG_TMP;
         }
-        tcg_out_ldst_r(s, LDST_32, LDST_ST, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, MO_32, LDST_ST, data_r, addr_r, off_r);
         break;
     case MO_64:
         if (bswap && data_r != TCG_REG_XZR) {
             tcg_out_rev(s, TCG_TYPE_I64, TCG_REG_TMP, data_r);
             data_r = TCG_REG_TMP;
         }
-        tcg_out_ldst_r(s, LDST_64, LDST_ST, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, MO_64, LDST_ST, data_r, addr_r, off_r);
         break;
     default:
         tcg_abort();
@@ -1298,49 +1285,49 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_ld8u_i32:
     case INDEX_op_ld8u_i64:
-        tcg_out_ldst(s, LDST_8, LDST_LD, a0, a1, a2);
+        tcg_out_ldst(s, MO_8, LDST_LD, a0, a1, a2);
         break;
     case INDEX_op_ld8s_i32:
-        tcg_out_ldst(s, LDST_8, LDST_LD_S_W, a0, a1, a2);
+        tcg_out_ldst(s, MO_8, LDST_LD_S_W, a0, a1, a2);
         break;
     case INDEX_op_ld8s_i64:
-        tcg_out_ldst(s, LDST_8, LDST_LD_S_X, a0, a1, a2);
+        tcg_out_ldst(s, MO_8, LDST_LD_S_X, a0, a1, a2);
         break;
     case INDEX_op_ld16u_i32:
     case INDEX_op_ld16u_i64:
-        tcg_out_ldst(s, LDST_16, LDST_LD, a0, a1, a2);
+        tcg_out_ldst(s, MO_16, LDST_LD, a0, a1, a2);
         break;
     case INDEX_op_ld16s_i32:
-        tcg_out_ldst(s, LDST_16, LDST_LD_S_W, a0, a1, a2);
+        tcg_out_ldst(s, MO_16, LDST_LD_S_W, a0, a1, a2);
         break;
     case INDEX_op_ld16s_i64:
-        tcg_out_ldst(s, LDST_16, LDST_LD_S_X, a0, a1, a2);
+        tcg_out_ldst(s, MO_16, LDST_LD_S_X, a0, a1, a2);
         break;
     case INDEX_op_ld_i32:
     case INDEX_op_ld32u_i64:
-        tcg_out_ldst(s, LDST_32, LDST_LD, a0, a1, a2);
+        tcg_out_ldst(s, MO_32, LDST_LD, a0, a1, a2);
         break;
     case INDEX_op_ld32s_i64:
-        tcg_out_ldst(s, LDST_32, LDST_LD_S_X, a0, a1, a2);
+        tcg_out_ldst(s, MO_32, LDST_LD_S_X, a0, a1, a2);
         break;
     case INDEX_op_ld_i64:
-        tcg_out_ldst(s, LDST_64, LDST_LD, a0, a1, a2);
+        tcg_out_ldst(s, MO_64, LDST_LD, a0, a1, a2);
         break;
 
     case INDEX_op_st8_i32:
     case INDEX_op_st8_i64:
-        tcg_out_ldst(s, LDST_8, LDST_ST, REG0(0), a1, a2);
+        tcg_out_ldst(s, MO_8, LDST_ST, REG0(0), a1, a2);
         break;
     case INDEX_op_st16_i32:
     case INDEX_op_st16_i64:
-        tcg_out_ldst(s, LDST_16, LDST_ST, REG0(0), a1, a2);
+        tcg_out_ldst(s, MO_16, LDST_ST, REG0(0), a1, a2);
         break;
     case INDEX_op_st_i32:
     case INDEX_op_st32_i64:
-        tcg_out_ldst(s, LDST_32, LDST_ST, REG0(0), a1, a2);
+        tcg_out_ldst(s, MO_32, LDST_ST, REG0(0), a1, a2);
         break;
     case INDEX_op_st_i64:
-        tcg_out_ldst(s, LDST_64, LDST_ST, REG0(0), a1, a2);
+        tcg_out_ldst(s, MO_64, LDST_ST, REG0(0), a1, a2);
         break;
 
     case INDEX_op_add_i32:
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 24/26] tcg-aarch64: Replace aarch64_ldst_op_data with AArch64LdstType
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (22 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 23/26] tcg-aarch64: Replace aarch64_ldst_op_data with TCGMemOp Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-07 11:45   ` Claudio Fontana
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 25/26] tcg-aarch64: Prefer unsigned offsets before signed offsets for ldst Richard Henderson
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 26/26] tcg-aarch64: Use tcg_out_mov in preference to tcg_out_movr Richard Henderson
  25 siblings, 1 reply; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

The definition of op_type wasn't encoded at the proper shift for
the field, making the implementations confusing.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 42 +++++++++++++++++-------------------------
 1 file changed, 17 insertions(+), 25 deletions(-)
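
For a quick side-by-side of what the re-encoding amounts to, here is a
minimal sketch (the numeric values are copied from the hunks below; the
OLD_/NEW_ names are only for the comparison and do not exist in the patch):

    /* Old scheme: the enum value already carried part of the shift and
       was then shifted by 20 at emission time.  New scheme: a plain
       2-bit field value, shifted by 22 at emission time. */
    enum { OLD_LDST_LD_S_W = 0xc };                 /* emitted as 0xc << 20 */
    enum { NEW_LDST_LD_S_W = 3 };                   /* emitted as 3 << 22 */

    /* Both place the same opc bits in the instruction word. */
    _Static_assert((0xc << 20) == (3 << 22), "identical field placement");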

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 9a2e4a6..a538a87 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -242,12 +242,12 @@ static const enum aarch64_cond_code tcg_cond_to_aarch64[] = {
     [TCG_COND_LEU] = COND_LS,
 };
 
-enum aarch64_ldst_op_type { /* type of operation */
-    LDST_ST = 0x0,    /* store */
-    LDST_LD = 0x4,    /* load */
-    LDST_LD_S_X = 0x8,  /* load and sign-extend into Xt */
-    LDST_LD_S_W = 0xc,  /* load and sign-extend into Wt */
-};
+typedef enum {
+    LDST_ST = 0,    /* store */
+    LDST_LD = 1,    /* load */
+    LDST_LD_S_X = 2,  /* load and sign-extend into Xt */
+    LDST_LD_S_W = 3,  /* load and sign-extend into Wt */
+} AArch64LdstType;
 
 /* We encode the format of the insn into the beginning of the name, so that
    we can have the preprocessor help "typecheck" the insn vs the output
@@ -483,22 +483,19 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
 }
 
 
-static inline void tcg_out_ldst_9(TCGContext *s, TCGMemOp size,
-                                  enum aarch64_ldst_op_type op_type,
-                                  TCGReg rd, TCGReg rn, intptr_t offset)
+static void tcg_out_ldst_9(TCGContext *s, TCGMemOp size, AArch64LdstType type,
+                           TCGReg rd, TCGReg rn, intptr_t offset)
 {
     /* use LDUR with BASE register with 9bit signed unscaled offset */
-    tcg_out32(s, 0x38000000 | size << 30 | op_type << 20
+    tcg_out32(s, 0x38000000 | size << 30 | type << 22
               | (offset & 0x1ff) << 12 | rn << 5 | rd);
 }
 
 /* tcg_out_ldst_12 expects a scaled unsigned immediate offset */
-static inline void tcg_out_ldst_12(TCGContext *s, TCGMemOp size,
-                                   enum aarch64_ldst_op_type op_type,
-                                   TCGReg rd, TCGReg rn,
-                                   tcg_target_ulong scaled_uimm)
+static void tcg_out_ldst_12(TCGContext *s, TCGMemOp size, AArch64LdstType type,
+                            TCGReg rd, TCGReg rn, tcg_target_ulong scaled_uimm)
 {
-    tcg_out32(s, 0x39000000 | size << 30 | op_type << 20
+    tcg_out32(s, 0x39000000 | size << 30 | type << 22
               | scaled_uimm << 10 | rn << 5 | rd);
 }
 
@@ -637,21 +634,16 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
     }
 }
 
-static inline void tcg_out_ldst_r(TCGContext *s, TCGMemOp size,
-                                  enum aarch64_ldst_op_type op_type,
-                                  TCGReg rd, TCGReg base, TCGReg regoff)
+static void tcg_out_ldst_r(TCGContext *s, TCGMemOp size, AArch64LdstType type,
+                           TCGReg rd, TCGReg base, TCGReg regoff)
 {
-    /* load from memory to register using base + 64bit register offset */
-    /* using f.e. STR Wt, [Xn, Xm] 0xb8600800|(regoff << 16)|(base << 5)|rd */
-    /* the 0x6000 is for the "no extend field" */
-    tcg_out32(s, 0x38206800 | size << 30 | op_type << 20
+    tcg_out32(s, 0x38206800 | size << 30 | type << 22
               | regoff << 16 | base << 5 | rd);
 }
 
 /* solve the whole ldst problem */
-static inline void tcg_out_ldst(TCGContext *s, TCGMemOp size,
-                                enum aarch64_ldst_op_type type,
-                                TCGReg rd, TCGReg rn, intptr_t offset)
+static void tcg_out_ldst(TCGContext *s, TCGMemOp size, AArch64LdstType type,
+                         TCGReg rd, TCGReg rn, intptr_t offset)
 {
     if (offset >= -256 && offset < 256) {
         tcg_out_ldst_9(s, size, type, rd, rn, offset);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 25/26] tcg-aarch64: Prefer unsigned offsets before signed offsets for ldst
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (23 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 24/26] tcg-aarch64: Replace aarch64_ldst_op_data with AArch64LdstType Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-11 12:35   ` Claudio Fontana
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 26/26] tcg-aarch64: Use tcg_out_mov in preference to tcg_out_movr Richard Henderson
  25 siblings, 1 reply; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

The assembler seems to prefer them; perhaps we should too.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
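
In code terms, the reordering below just changes which encoding is tried
first.  A sketch of the resulting selection order (mirroring tcg_out_ldst
after this patch; the helper name and return values are illustrative only):

    /* Returns which ARM ARM C3.3.x encoding class the offset ends up in. */
    static int pick_ldst_encoding(long offset, int size_log2)
    {
        if (offset >= 0 && !(offset & ((1 << size_log2) - 1))
            && (offset >> size_log2) <= 0xfff) {
            return 13;      /* C3.3.13: scaled unsigned imm12, now tried first */
        }
        if (offset >= -256 && offset < 256) {
            return 12;      /* C3.3.12: unscaled signed imm9 */
        }
        return 10;          /* C3.3.10: register offset via a scratch register */
    }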

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index a538a87..58597e7 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -645,11 +645,6 @@ static void tcg_out_ldst_r(TCGContext *s, TCGMemOp size, AArch64LdstType type,
 static void tcg_out_ldst(TCGContext *s, TCGMemOp size, AArch64LdstType type,
                          TCGReg rd, TCGReg rn, intptr_t offset)
 {
-    if (offset >= -256 && offset < 256) {
-        tcg_out_ldst_9(s, size, type, rd, rn, offset);
-        return;
-    }
-
     /* If the offset is naturally aligned and in range, then we can
        use the scaled uimm12 encoding */
     if (offset >= 0 && !(offset & ((1 << size) - 1))) {
@@ -660,6 +655,11 @@ static void tcg_out_ldst(TCGContext *s, TCGMemOp size, AArch64LdstType type,
         }
     }
 
+    if (offset >= -256 && offset < 256) {
+        tcg_out_ldst_9(s, size, type, rd, rn, offset);
+        return;
+    }
+
     /* Worst-case scenario, move offset to temp register, use reg offset.  */
     tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, offset);
     tcg_out_ldst_r(s, size, type, rd, rn, TCG_REG_TMP);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH v3 26/26] tcg-aarch64: Use tcg_out_mov in preference to tcg_out_movr
  2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
                   ` (24 preceding siblings ...)
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 25/26] tcg-aarch64: Prefer unsigned offsets before signed offsets for ldst Richard Henderson
@ 2014-04-03 19:56 ` Richard Henderson
  2014-04-11 12:36   ` Claudio Fontana
  25 siblings, 1 reply; 52+ messages in thread
From: Richard Henderson @ 2014-04-03 19:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

It's the more canonical interface.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 58597e7..ab4cd25 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -951,9 +951,7 @@ static inline void tcg_out_addsub2(TCGContext *s, int ext, TCGReg rl,
     }
     tcg_out_insn_3503(s, insn, ext, rh, ah, bh);
 
-    if (rl != orig_rl) {
-        tcg_out_movr(s, ext, orig_rl, rl);
-    }
+    tcg_out_mov(s, ext, orig_rl, rl);
 }
 
 #ifdef CONFIG_SOFTMMU
@@ -997,15 +995,15 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 
     reloc_pc19(lb->label_ptr[0], (intptr_t)s->code_ptr);
 
-    tcg_out_movr(s, TCG_TYPE_I64, TCG_REG_X0, TCG_AREG0);
-    tcg_out_movr(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
+    tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_X0, TCG_AREG0);
+    tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, lb->mem_index);
     tcg_out_adr(s, TCG_REG_X3, (intptr_t)lb->raddr);
     tcg_out_call(s, (intptr_t)qemu_ld_helpers[opc & ~MO_SIGN]);
     if (opc & MO_SIGN) {
         tcg_out_sxt(s, TCG_TYPE_I64, size, lb->datalo_reg, TCG_REG_X0);
     } else {
-        tcg_out_movr(s, TCG_TYPE_I64, lb->datalo_reg, TCG_REG_X0);
+        tcg_out_mov(s, size == MO_64, lb->datalo_reg, TCG_REG_X0);
     }
 
     tcg_out_goto(s, (intptr_t)lb->raddr);
@@ -1018,9 +1016,9 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 
     reloc_pc19(lb->label_ptr[0], (intptr_t)s->code_ptr);
 
-    tcg_out_movr(s, TCG_TYPE_I64, TCG_REG_X0, TCG_AREG0);
-    tcg_out_movr(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
-    tcg_out_movr(s, size == MO_64, TCG_REG_X2, lb->datalo_reg);
+    tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_X0, TCG_AREG0);
+    tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
+    tcg_out_mov(s, size == MO_64, TCG_REG_X2, lb->datalo_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, lb->mem_index);
     tcg_out_adr(s, TCG_REG_X4, (intptr_t)lb->raddr);
     tcg_out_call(s, (intptr_t)qemu_st_helpers[opc]);
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/26] tcg-aarch64: Properly detect SIGSEGV writes
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 01/26] tcg-aarch64: Properly detect SIGSEGV writes Richard Henderson
@ 2014-04-07  7:58   ` Claudio Fontana
  2014-04-07 16:33     ` Richard Henderson
  2014-04-07 16:39   ` Peter Maydell
  1 sibling, 1 reply; 52+ messages in thread
From: Claudio Fontana @ 2014-04-07  7:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Peter Maydell, claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> Since the kernel doesn't pass any info on the reason for the fault,
> disassemble the instruction to detect a store.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  user-exec.c | 29 +++++++++++++++++++++++------
>  1 file changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/user-exec.c b/user-exec.c
> index bc58056..52f76c9 100644
> --- a/user-exec.c
> +++ b/user-exec.c
> @@ -465,16 +465,33 @@ int cpu_signal_handler(int host_signum, void *pinfo,
>  
>  #elif defined(__aarch64__)
>  
> -int cpu_signal_handler(int host_signum, void *pinfo,
> -                       void *puc)
> +int cpu_signal_handler(int host_signum, void *pinfo, void *puc)
>  {
>      siginfo_t *info = pinfo;
>      struct ucontext *uc = puc;
> -    uint64_t pc;
> -    int is_write = 0; /* XXX how to determine? */
> +    uintptr_t pc = uc->uc_mcontext.pc;
> +    uint32_t insn = *(uint32_t *)pc;
> +    bool is_write;
>  
> -    pc = uc->uc_mcontext.pc;
> -    return handle_cpu_signal(pc, (uint64_t)info->si_addr,
> +    /* XXX: need kernel patch to get write flag faster.  */
> +    /* XXX: several of these could be combined.  */
> +    is_write = (   (insn & 0xbfff0000) == 0x0c000000   /* C3.3.1 */
> +                || (insn & 0xbfe00000) == 0x0c800000   /* C3.3.2 */
> +                || (insn & 0xbfdf0000) == 0x0d000000   /* C3.3.3 */
> +                || (insn & 0xbfc00000) == 0x0d800000   /* C3.3.4 */
> +                || (insn & 0x3f400000) == 0x08000000   /* C3.3.6 */
> +                || (insn & 0x3bc00000) == 0x28400000   /* C3.3.7 */

I think the Load (L) bit should be 0 here so

== 0x28000000

> +                || (insn & 0x3be00c00) == 0x38000400   /* C3.3.8 */

With V=1, an opc of 0b10 is also a write, I think. It's the 128-bit FP/SIMD STR.

> +                || (insn & 0x3be00c00) == 0x38000c00   /* C3.3.9 */

Same here.

> +                || (insn & 0x3be00c00) == 0x38200800   /* C3.3.10 */

Same.

> +                || (insn & 0x3be00c00) == 0x38000800   /* C3.3.11 */
> +                || (insn & 0x3be00c00) == 0x38000000   /* C3.3.12 */

Same.

> +                || (insn & 0x3bc00000) == 0x39000000   /* C3.3.13 */

Same.

> +                || (insn & 0x3bc00000) == 0x29000000   /* C3.3.14 */
> +                || (insn & 0x3bc00000) == 0x28800000   /* C3.3.15 */
> +                || (insn & 0x3bc00000) == 0x29800000); /* C3.3.16 */
> +
> +    return handle_cpu_signal(pc, (uintptr_t)info->si_addr,
>                               is_write, &uc->uc_sigmask, puc);
>  }
>  
> 

Thanks,

Claudio

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/26] tcg-aarch64: Reuse LR in translated code
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 11/26] tcg-aarch64: Reuse LR in translated code Richard Henderson
@ 2014-04-07  8:03   ` Claudio Fontana
  2014-04-07  9:49     ` Peter Maydell
  2014-04-11 12:33   ` Claudio Fontana
  1 sibling, 1 reply; 52+ messages in thread
From: Claudio Fontana @ 2014-04-07  8:03 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Laurent Desnogues, Peter Maydell, claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> It's obviously call-clobbered, but is otherwise unused.
> Repurpose it as the TCG temporary.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 34 ++++++++++++++++------------------
>  tcg/aarch64/tcg-target.h | 32 +++++++++++++++++---------------
>  2 files changed, 33 insertions(+), 33 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index 48a246d..e36909e 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -23,10 +23,7 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>      "%x0", "%x1", "%x2", "%x3", "%x4", "%x5", "%x6", "%x7",
>      "%x8", "%x9", "%x10", "%x11", "%x12", "%x13", "%x14", "%x15",
>      "%x16", "%x17", "%x18", "%x19", "%x20", "%x21", "%x22", "%x23",
> -    "%x24", "%x25", "%x26", "%x27", "%x28",
> -    "%fp", /* frame pointer */
> -    "%lr", /* link register */
> -    "%sp",  /* stack pointer */
> +    "%x24", "%x25", "%x26", "%x27", "%x28", "%fp", "%x30", "%sp",
>  };
>  #endif /* NDEBUG */
>  
> @@ -41,16 +38,17 @@ static const int tcg_target_reg_alloc_order[] = {
>      TCG_REG_X24, TCG_REG_X25, TCG_REG_X26, TCG_REG_X27,
>      TCG_REG_X28, /* we will reserve this for GUEST_BASE if configured */
>  
> -    TCG_REG_X9, TCG_REG_X10, TCG_REG_X11, TCG_REG_X12,
> -    TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
> +    TCG_REG_X8, TCG_REG_X9, TCG_REG_X10, TCG_REG_X11,
> +    TCG_REG_X12, TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
>      TCG_REG_X16, TCG_REG_X17,
>  
> -    TCG_REG_X18, TCG_REG_X19, /* will not use these, see tcg_target_init */
> -
>      TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3,
>      TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7,
>  
> -    TCG_REG_X8, /* will not use, see tcg_target_init */
> +    /* X18 reserved by system */
> +    /* X19 reserved for AREG0 */
> +    /* X29 reserved as fp */
> +    /* X30 reserved as temporary */
>  };
>  
>  static const int tcg_target_call_iarg_regs[8] = {
> @@ -61,13 +59,13 @@ static const int tcg_target_call_oarg_regs[1] = {
>      TCG_REG_X0
>  };
>  
> -#define TCG_REG_TMP TCG_REG_X8
> +#define TCG_REG_TMP TCG_REG_X30
>  
>  #ifndef CONFIG_SOFTMMU
> -# if defined(CONFIG_USE_GUEST_BASE)
> -# define TCG_REG_GUEST_BASE TCG_REG_X28
> +# ifdef CONFIG_USE_GUEST_BASE
> +#  define TCG_REG_GUEST_BASE TCG_REG_X28
>  # else
> -# define TCG_REG_GUEST_BASE TCG_REG_XZR
> +#  define TCG_REG_GUEST_BASE TCG_REG_XZR
>  # endif
>  #endif
>  
> @@ -1871,7 +1869,7 @@ static void tcg_target_init(TCGContext *s)
>                       (1 << TCG_REG_X12) | (1 << TCG_REG_X13) |
>                       (1 << TCG_REG_X14) | (1 << TCG_REG_X15) |
>                       (1 << TCG_REG_X16) | (1 << TCG_REG_X17) |
> -                     (1 << TCG_REG_X18));
> +                     (1 << TCG_REG_X18) | (1 << TCG_REG_X30));
>  
>      tcg_regset_clear(s->reserved_regs);
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
> @@ -1902,13 +1900,13 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>      tcg_out_push_pair(s, TCG_REG_SP,
>                        TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
>  
> -    /* FP -> callee_saved */
> +    /* Set up frame pointer for canonical unwinding.  */
>      tcg_out_movr_sp(s, TCG_TYPE_I64, TCG_REG_FP, TCG_REG_SP);
>  
> -    /* store callee-preserved regs x19..x28 using FP -> callee_saved */
> +    /* Store callee-preserved regs x19..x28.  */
>      for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
>          int idx = (r - TCG_REG_X19) / 2 + 1;
> -        tcg_out_store_pair(s, TCG_REG_FP, r, r + 1, idx);
> +        tcg_out_store_pair(s, TCG_REG_SP, r, r + 1, idx);
>      }
>  
>      /* Make stack space for TCG locals.  */
> @@ -1939,7 +1937,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>         FP must be preserved, so it still points to callee_saved area */
>      for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
>          int idx = (r - TCG_REG_X19) / 2 + 1;
> -        tcg_out_load_pair(s, TCG_REG_FP, r, r + 1, idx);
> +        tcg_out_load_pair(s, TCG_REG_SP, r, r + 1, idx);
>      }
>  
>      /* pop (FP, LR), restore SP to previous frame, return */
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 988983e..faccc36 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -17,17 +17,23 @@
>  #undef TCG_TARGET_STACK_GROWSUP
>  
>  typedef enum {
> -    TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3, TCG_REG_X4,
> -    TCG_REG_X5, TCG_REG_X6, TCG_REG_X7, TCG_REG_X8, TCG_REG_X9,
> -    TCG_REG_X10, TCG_REG_X11, TCG_REG_X12, TCG_REG_X13, TCG_REG_X14,
> -    TCG_REG_X15, TCG_REG_X16, TCG_REG_X17, TCG_REG_X18, TCG_REG_X19,
> -    TCG_REG_X20, TCG_REG_X21, TCG_REG_X22, TCG_REG_X23, TCG_REG_X24,
> -    TCG_REG_X25, TCG_REG_X26, TCG_REG_X27, TCG_REG_X28,
> -    TCG_REG_FP,  /* frame pointer */
> -    TCG_REG_LR, /* link register */
> -    TCG_REG_SP,  /* stack pointer or zero register */
> -    TCG_REG_XZR = TCG_REG_SP /* same register number */
> -    /* program counter is not directly accessible! */
> +    TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3,
> +    TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7,
> +    TCG_REG_X8, TCG_REG_X9, TCG_REG_X10, TCG_REG_X11,
> +    TCG_REG_X12, TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
> +    TCG_REG_X16, TCG_REG_X17, TCG_REG_X18, TCG_REG_X19,
> +    TCG_REG_X20, TCG_REG_X21, TCG_REG_X22, TCG_REG_X23,
> +    TCG_REG_X24, TCG_REG_X25, TCG_REG_X26, TCG_REG_X27,
> +    TCG_REG_X28, TCG_REG_X29, TCG_REG_X30,
> +
> +    /* X31 is either the stack pointer or zero, depending on context.  */
> +    TCG_REG_SP = 31,
> +    TCG_REG_XZR = 31,
> +
> +    /* Aliases.  */
> +    TCG_REG_FP = TCG_REG_X29,
> +    TCG_REG_LR = TCG_REG_X30,
> +    TCG_AREG0  = TCG_REG_X19,
>  } TCGReg;
>  
>  #define TCG_TARGET_NB_REGS 32
> @@ -92,10 +98,6 @@ typedef enum {
>  #define TCG_TARGET_HAS_muluh_i64        1
>  #define TCG_TARGET_HAS_mulsh_i64        1
>  
> -enum {
> -    TCG_AREG0 = TCG_REG_X19,
> -};
> -
>  #define TCG_TARGET_HAS_new_ldst         0
>  
>  static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> 

Giving one last chance to the ARM guys to speak up about repurposing LR.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/26] tcg-aarch64: Reuse LR in translated code
  2014-04-07  8:03   ` Claudio Fontana
@ 2014-04-07  9:49     ` Peter Maydell
  2014-04-07 11:11       ` Claudio Fontana
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Maydell @ 2014-04-07  9:49 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Laurent Desnogues, QEMU Developers, claudio.fontana, Richard Henderson

On 7 April 2014 09:03, Claudio Fontana <claudio.fontana@huawei.com> wrote:
> On 03.04.2014 21:56, Richard Henderson wrote:
>> It's obviously call-clobbered, but is otherwise unused.
>> Repurpose it as the TCG temporary.

> Giving one last chance to the ARM guys to speak up about repurposing LR.

Can you clarify what you think the issue is with using LR?
I think you said before but I forget :-(

thanks
-- PMM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/26] tcg-aarch64: Reuse LR in translated code
  2014-04-07  9:49     ` Peter Maydell
@ 2014-04-07 11:11       ` Claudio Fontana
  2014-04-07 11:28         ` Peter Maydell
  0 siblings, 1 reply; 52+ messages in thread
From: Claudio Fontana @ 2014-04-07 11:11 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Laurent Desnogues, QEMU Developers, claudio.fontana, Richard Henderson

On 07.04.2014 11:49, Peter Maydell wrote:
> On 7 April 2014 09:03, Claudio Fontana <claudio.fontana@huawei.com> wrote:
>> On 03.04.2014 21:56, Richard Henderson wrote:
>>> It's obviously call-clobbered, but is otherwise unused.
>>> Repurpose it as the TCG temporary.
> 
>> Giving one last chance to the ARM guys to speak up about repurposing LR.
> 
> Can you clarify what you think the issue is with using LR?
> I think you said before but I forget :-(
> 
My doubt was about the AAPCS64 (Procedure Call standard for the ARM 64-bit Architecture),
and what the platforms in our case dictate regarding FP and LR use.

I think that LR should be ok to use, because basically the whole generated code from the prologue to the end can be seen as a single big subroutine, and there does not seem to be a clear mandate to keep LR's special significance inside subroutines at all times.

The role of the registers is described in 5.1.1, where it is mentioned that "in all variants of the pcs, registers r16,r17,r29 and r30 have special roles" [...]

The standard says at 5.2.3 that "conforming code" shall construct a linked list of stack frames, [...] A platform shall mandate the minimum level of conformance[...]. Options are:

* It may require the frame pointer to address a valid frame record at all times, except that small subroutines which do not modify the link register may elect not to create a frame record

* It may require the frame pointer to address a valid frame record at all times, except that any subroutine may elect not to create a frame record

* It may permit the frame pointer register to be used as a general-purpose callee-saved register, but provide a platform-specific mechanism for external agents to reliably detect this condition

* It may elect not to maintain a frame chain and to use the frame pointer register as a general-purpose callee-saved register.

I think however that since the latest version of RH's patches does not repurpose FP, this is not that relevant anymore, but I think the general question remains topical, i.e., which of these options do our platforms dictate?
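
For concreteness, a minimal sketch of the frame record those conformance
options talk about, assuming the usual {saved FP, saved LR} pair layout
from AAPCS64 5.2.3 (illustrative only, not QEMU code):

    /* A frame record is a pair on the stack, with x29 pointing at it,
       so the records form a linked list of frames. */
    struct frame_record {
        struct frame_record *prev;  /* saved x29: caller's frame record */
        void *return_addr;          /* saved x30: return address */
    };
    /* An unwinder walks prev until it reads zero, which is what
       "maintaining a frame chain" buys external agents. */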

Thanks,

Claudio

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/26] tcg-aarch64: Reuse LR in translated code
  2014-04-07 11:11       ` Claudio Fontana
@ 2014-04-07 11:28         ` Peter Maydell
  0 siblings, 0 replies; 52+ messages in thread
From: Peter Maydell @ 2014-04-07 11:28 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Laurent Desnogues, QEMU Developers, claudio.fontana, Richard Henderson

On 7 April 2014 12:11, Claudio Fontana <claudio.fontana@huawei.com> wrote:
[your mail client is generating very long lines]
> My doubt was about the AAPCS64 (Procedure Call standard for the
> ARM 64-bit Architecture), and what the platforms in our case dictate
> regarding FP and LR use.
>
> I think that LR should be ok to use, because basically the whole
> generated code from the prologue to the end can be seen as a single
> big subroutine, and there does not seem to be a clear mandate to
> keep LR's special significance inside subroutines at all times.

Agreed.

> The role of the registers is described in 5.1.1, where it is
> mentioned that "in all variants of the pcs, registers r16,r17,r29
> and r30 have special roles" [...]
>
> The standard says at 5.2.3 that "conforming code" shall construct
> a linked list of stack frames, [...] A platform shall mandate the
> minimum level of conformance[...].

Since our TCG generated code is not platform specific we should
take the most conservative approach, and make sure we set up
a frame record in the prologue and don't touch FP after that.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 24/26] tcg-aarch64: Replace aarch64_ldst_op_data with AArch64LdstType
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 24/26] tcg-aarch64: Replace aarch64_ldst_op_data with AArch64LdstType Richard Henderson
@ 2014-04-07 11:45   ` Claudio Fontana
  2014-04-07 14:31     ` Richard Henderson
  2014-04-07 18:34     ` [Qemu-devel] [PATCH 27/26] tcg-aarch64: Introduce tcg_out_insn_3312, _3310, _3313 Richard Henderson
  0 siblings, 2 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-07 11:45 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> The definition of op_type wasn't encoded at the proper shift for
> the field, making the implementations confusing.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>

At the end of the day, the magic values remain in the load/store instructions, though.
Can we find a way to replace them with INSN_-something, like for the others?

I think I was doing something of the sort in a now-obsolete patch I suggested some time early this year; see if it helps:

http://lists.gnu.org/archive/html/qemu-devel/2014-02/msg05074.html

Claudio

> ---
>  tcg/aarch64/tcg-target.c | 42 +++++++++++++++++-------------------------
>  1 file changed, 17 insertions(+), 25 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index 9a2e4a6..a538a87 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -242,12 +242,12 @@ static const enum aarch64_cond_code tcg_cond_to_aarch64[] = {
>      [TCG_COND_LEU] = COND_LS,
>  };
>  
> -enum aarch64_ldst_op_type { /* type of operation */
> -    LDST_ST = 0x0,    /* store */
> -    LDST_LD = 0x4,    /* load */
> -    LDST_LD_S_X = 0x8,  /* load and sign-extend into Xt */
> -    LDST_LD_S_W = 0xc,  /* load and sign-extend into Wt */
> -};
> +typedef enum {
> +    LDST_ST = 0,    /* store */
> +    LDST_LD = 1,    /* load */
> +    LDST_LD_S_X = 2,  /* load and sign-extend into Xt */
> +    LDST_LD_S_W = 3,  /* load and sign-extend into Wt */
> +} AArch64LdstType;
>  
>  /* We encode the format of the insn into the beginning of the name, so that
>     we can have the preprocessor help "typecheck" the insn vs the output
> @@ -483,22 +483,19 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
>  }
>  
>  
> -static inline void tcg_out_ldst_9(TCGContext *s, TCGMemOp size,
> -                                  enum aarch64_ldst_op_type op_type,
> -                                  TCGReg rd, TCGReg rn, intptr_t offset)
> +static void tcg_out_ldst_9(TCGContext *s, TCGMemOp size, AArch64LdstType type,
> +                           TCGReg rd, TCGReg rn, intptr_t offset)
>  {
>      /* use LDUR with BASE register with 9bit signed unscaled offset */
> -    tcg_out32(s, 0x38000000 | size << 30 | op_type << 20
> +    tcg_out32(s, 0x38000000 | size << 30 | type << 22
>                | (offset & 0x1ff) << 12 | rn << 5 | rd);
>  }
>  
>  /* tcg_out_ldst_12 expects a scaled unsigned immediate offset */
> -static inline void tcg_out_ldst_12(TCGContext *s, TCGMemOp size,
> -                                   enum aarch64_ldst_op_type op_type,
> -                                   TCGReg rd, TCGReg rn,
> -                                   tcg_target_ulong scaled_uimm)
> +static void tcg_out_ldst_12(TCGContext *s, TCGMemOp size, AArch64LdstType type,
> +                            TCGReg rd, TCGReg rn, tcg_target_ulong scaled_uimm)
>  {
> -    tcg_out32(s, 0x39000000 | size << 30 | op_type << 20
> +    tcg_out32(s, 0x39000000 | size << 30 | type << 22
>                | scaled_uimm << 10 | rn << 5 | rd);
>  }
>  
> @@ -637,21 +634,16 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
>      }
>  }
>  
> -static inline void tcg_out_ldst_r(TCGContext *s, TCGMemOp size,
> -                                  enum aarch64_ldst_op_type op_type,
> -                                  TCGReg rd, TCGReg base, TCGReg regoff)
> +static void tcg_out_ldst_r(TCGContext *s, TCGMemOp size, AArch64LdstType type,
> +                           TCGReg rd, TCGReg base, TCGReg regoff)
>  {
> -    /* load from memory to register using base + 64bit register offset */
> -    /* using f.e. STR Wt, [Xn, Xm] 0xb8600800|(regoff << 16)|(base << 5)|rd */
> -    /* the 0x6000 is for the "no extend field" */
> -    tcg_out32(s, 0x38206800 | size << 30 | op_type << 20
> +    tcg_out32(s, 0x38206800 | size << 30 | type << 22
>                | regoff << 16 | base << 5 | rd);
>  }
>  
>  /* solve the whole ldst problem */
> -static inline void tcg_out_ldst(TCGContext *s, TCGMemOp size,
> -                                enum aarch64_ldst_op_type type,
> -                                TCGReg rd, TCGReg rn, intptr_t offset)
> +static void tcg_out_ldst(TCGContext *s, TCGMemOp size, AArch64LdstType type,
> +                         TCGReg rd, TCGReg rn, intptr_t offset)
>  {
>      if (offset >= -256 && offset < 256) {
>          tcg_out_ldst_9(s, size, type, rd, rn, offset);
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 24/26] tcg-aarch64: Replace aarch64_ldst_op_data with AArch64LdstType
  2014-04-07 11:45   ` Claudio Fontana
@ 2014-04-07 14:31     ` Richard Henderson
  2014-04-11 12:35       ` Claudio Fontana
  2014-04-07 18:34     ` [Qemu-devel] [PATCH 27/26] tcg-aarch64: Introduce tcg_out_insn_3312, _3310, _3313 Richard Henderson
  1 sibling, 1 reply; 52+ messages in thread
From: Richard Henderson @ 2014-04-07 14:31 UTC (permalink / raw)
  To: Claudio Fontana, qemu-devel; +Cc: claudio.fontana

On 04/07/2014 04:45 AM, Claudio Fontana wrote:
> On 03.04.2014 21:56, Richard Henderson wrote:
>> The definition of op_type wasn't encoded at the proper shift for
>> the field, making the implementations confusing.
>>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
> 
> At the end of the day the magic values remain in the load/store instructions though.
> Can we find a way to replace them with INSN_-something like for the others?
> 
> I think I was doing something of the sort in a now obsolete patch I suggested some time early this year, see if it helps:
> 
> http://lists.gnu.org/archive/html/qemu-devel/2014-02/msg05074.html

Yes, we can.  I'll do something for v3.

> 
> Claudio
> 
>> ---
>>  tcg/aarch64/tcg-target.c | 42 +++++++++++++++++-------------------------
>>  1 file changed, 17 insertions(+), 25 deletions(-)
>>
>> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
>> index 9a2e4a6..a538a87 100644
>> --- a/tcg/aarch64/tcg-target.c
>> +++ b/tcg/aarch64/tcg-target.c
>> @@ -242,12 +242,12 @@ static const enum aarch64_cond_code tcg_cond_to_aarch64[] = {
>>      [TCG_COND_LEU] = COND_LS,
>>  };
>>  
>> -enum aarch64_ldst_op_type { /* type of operation */
>> -    LDST_ST = 0x0,    /* store */
>> -    LDST_LD = 0x4,    /* load */
>> -    LDST_LD_S_X = 0x8,  /* load and sign-extend into Xt */
>> -    LDST_LD_S_W = 0xc,  /* load and sign-extend into Wt */
>> -};
>> +typedef enum {
>> +    LDST_ST = 0,    /* store */
>> +    LDST_LD = 1,    /* load */
>> +    LDST_LD_S_X = 2,  /* load and sign-extend into Xt */
>> +    LDST_LD_S_W = 3,  /* load and sign-extend into Wt */
>> +} AArch64LdstType;
>>  
>>  /* We encode the format of the insn into the beginning of the name, so that
>>     we can have the preprocessor help "typecheck" the insn vs the output
>> @@ -483,22 +483,19 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
>>  }
>>  
>>  
>> -static inline void tcg_out_ldst_9(TCGContext *s, TCGMemOp size,
>> -                                  enum aarch64_ldst_op_type op_type,
>> -                                  TCGReg rd, TCGReg rn, intptr_t offset)
>> +static void tcg_out_ldst_9(TCGContext *s, TCGMemOp size, AArch64LdstType type,
>> +                           TCGReg rd, TCGReg rn, intptr_t offset)
>>  {
>>      /* use LDUR with BASE register with 9bit signed unscaled offset */
>> -    tcg_out32(s, 0x38000000 | size << 30 | op_type << 20
>> +    tcg_out32(s, 0x38000000 | size << 30 | type << 22
>>                | (offset & 0x1ff) << 12 | rn << 5 | rd);
>>  }
>>  
>>  /* tcg_out_ldst_12 expects a scaled unsigned immediate offset */
>> -static inline void tcg_out_ldst_12(TCGContext *s, TCGMemOp size,
>> -                                   enum aarch64_ldst_op_type op_type,
>> -                                   TCGReg rd, TCGReg rn,
>> -                                   tcg_target_ulong scaled_uimm)
>> +static void tcg_out_ldst_12(TCGContext *s, TCGMemOp size, AArch64LdstType type,
>> +                            TCGReg rd, TCGReg rn, tcg_target_ulong scaled_uimm)
>>  {
>> -    tcg_out32(s, 0x39000000 | size << 30 | op_type << 20
>> +    tcg_out32(s, 0x39000000 | size << 30 | type << 22
>>                | scaled_uimm << 10 | rn << 5 | rd);
>>  }
>>  
>> @@ -637,21 +634,16 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
>>      }
>>  }
>>  
>> -static inline void tcg_out_ldst_r(TCGContext *s, TCGMemOp size,
>> -                                  enum aarch64_ldst_op_type op_type,
>> -                                  TCGReg rd, TCGReg base, TCGReg regoff)
>> +static void tcg_out_ldst_r(TCGContext *s, TCGMemOp size, AArch64LdstType type,
>> +                           TCGReg rd, TCGReg base, TCGReg regoff)
>>  {
>> -    /* load from memory to register using base + 64bit register offset */
>> -    /* using f.e. STR Wt, [Xn, Xm] 0xb8600800|(regoff << 16)|(base << 5)|rd */
>> -    /* the 0x6000 is for the "no extend field" */
>> -    tcg_out32(s, 0x38206800 | size << 30 | op_type << 20
>> +    tcg_out32(s, 0x38206800 | size << 30 | type << 22
>>                | regoff << 16 | base << 5 | rd);
>>  }
>>  
>>  /* solve the whole ldst problem */
>> -static inline void tcg_out_ldst(TCGContext *s, TCGMemOp size,
>> -                                enum aarch64_ldst_op_type type,
>> -                                TCGReg rd, TCGReg rn, intptr_t offset)
>> +static void tcg_out_ldst(TCGContext *s, TCGMemOp size, AArch64LdstType type,
>> +                         TCGReg rd, TCGReg rn, intptr_t offset)
>>  {
>>      if (offset >= -256 && offset < 256) {
>>          tcg_out_ldst_9(s, size, type, rd, rn, offset);
>>
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/26] tcg-aarch64: Properly detect SIGSEGV writes
  2014-04-07  7:58   ` Claudio Fontana
@ 2014-04-07 16:33     ` Richard Henderson
  0 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-07 16:33 UTC (permalink / raw)
  To: Claudio Fontana, qemu-devel; +Cc: Peter Maydell, claudio.fontana

On 04/07/2014 12:58 AM, Claudio Fontana wrote:
>> +                || (insn & 0x3bc00000) == 0x28400000   /* C3.3.7 */
> 
> I think the Load (L) bit should be 0 here so
> 
> == 0x28000000

Oops.  Fixed.

> 
>> +                || (insn & 0x3be00c00) == 0x38000400   /* C3.3.8 */
> 
> With V=1, an opc of 0b10 is also a write, I think. It's the 128bit FP/SIMD STR.

Exactly; that's why I'm masking it out, to ignore it.

 insn  =  size 1 1   1 v 0 0 ...
 mask  =   0 0 1 1   1 0 1 1 ...  = 0x3b...
 equal =   0 0 1 1   1 0 0 0 ...  = 0x38...
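
To make that concrete, a minimal standalone sketch of this style of
mask/value test (constants copied from the C3.3.8 line quoted above; the
function name is illustrative, and whether further opc encodings must be
accepted is exactly the open question here):

    #include <stdbool.h>
    #include <stdint.h>

    /* Keep only the fixed opcode bits (mask) and compare against the
       store pattern (value); the size field and the V bit are masked
       out, so one test covers every data width of this encoding class. */
    static bool is_c338_store(uint32_t insn)
    {
        return (insn & 0x3be00c00) == 0x38000400;   /* C3.3.8 */
    }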


r~

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/26] tcg-aarch64: Properly detect SIGSEGV writes
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 01/26] tcg-aarch64: Properly detect SIGSEGV writes Richard Henderson
  2014-04-07  7:58   ` Claudio Fontana
@ 2014-04-07 16:39   ` Peter Maydell
  1 sibling, 0 replies; 52+ messages in thread
From: Peter Maydell @ 2014-04-07 16:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, claudio.fontana

On 3 April 2014 20:56, Richard Henderson <rth@twiddle.net> wrote:
> Since the kernel doesn't pass any info on the reason for the fault,

There are now patches proposed to the kernel to supply this:
http://www.spinics.net/lists/arm-kernel/msg320268.html

thanks
-- PMM

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Qemu-devel] [PATCH 27/26] tcg-aarch64: Introduce tcg_out_insn_3312, _3310, _3313
  2014-04-07 11:45   ` Claudio Fontana
  2014-04-07 14:31     ` Richard Henderson
@ 2014-04-07 18:34     ` Richard Henderson
  2014-04-08  9:00       ` Claudio Fontana
  2014-04-11 12:36       ` Claudio Fontana
  1 sibling, 2 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-07 18:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana

Merge TCGMemOp size, AArch64LdstType type and a few stray opcode bits
into a single I3312_* argument, eliminating some magic numbers from
helper functions.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 129 ++++++++++++++++++++++++++++-------------------
 1 file changed, 76 insertions(+), 53 deletions(-)
---

I'm not really sure how much clearer this is, especially since we do
have to re-extract the size within tcg_out_ldst.  But it does at least
eliminate some of the magic numbers within the helpers.

Thoughts?


r~
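
For what it's worth, a standalone sketch of how the I3312_* constants
compose with the adjustment masks (the constants are taken from the hunk
below; ldrx_reg/ldrx_uimm are just illustrative names, not functions in
the patch):

    #include <stdint.h>

    #define I3312_LDRX      (0x38000000u | 1u << 22 | 3u << 30)  /* LDST_LD, MO_64 */
    #define I3312_TO_I3310  0x00206800u
    #define I3312_TO_I3313  0x01000000u

    /* Register-offset form (C3.3.10): LDR Xd, [Xn, Xm]. */
    static uint32_t ldrx_reg(unsigned rd, unsigned rn, unsigned rm)
    {
        return I3312_LDRX | I3312_TO_I3310 | rm << 16 | rn << 5 | rd;
    }

    /* Scaled-uimm12 form (C3.3.13): LDR Xd, [Xn, #uimm * 8]. */
    static uint32_t ldrx_uimm(unsigned rd, unsigned rn, unsigned scaled_uimm)
    {
        return I3312_LDRX | I3312_TO_I3313 | scaled_uimm << 10 | rn << 5 | rd;
    }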



diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index ab4cd25..324a452 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -271,6 +271,28 @@ typedef enum {
     I3207_BLR       = 0xd63f0000,
     I3207_RET       = 0xd65f0000,
 
+    /* Load/store register.  Described here as 3.3.12, but the helper
+       that emits them can transform to 3.3.10 or 3.3.13.  */
+    I3312_STRB      = 0x38000000 | LDST_ST << 22 | MO_8 << 30,
+    I3312_STRH      = 0x38000000 | LDST_ST << 22 | MO_16 << 30,
+    I3312_STRW      = 0x38000000 | LDST_ST << 22 | MO_32 << 30,
+    I3312_STRX      = 0x38000000 | LDST_ST << 22 | MO_64 << 30,
+
+    I3312_LDRB      = 0x38000000 | LDST_LD << 22 | MO_8 << 30,
+    I3312_LDRH      = 0x38000000 | LDST_LD << 22 | MO_16 << 30,
+    I3312_LDRW      = 0x38000000 | LDST_LD << 22 | MO_32 << 30,
+    I3312_LDRX      = 0x38000000 | LDST_LD << 22 | MO_64 << 30,
+
+    I3312_LDRSBW    = 0x38000000 | LDST_LD_S_W << 22 | MO_8 << 30,
+    I3312_LDRSHW    = 0x38000000 | LDST_LD_S_W << 22 | MO_16 << 30,
+
+    I3312_LDRSBX    = 0x38000000 | LDST_LD_S_X << 22 | MO_8 << 30,
+    I3312_LDRSHX    = 0x38000000 | LDST_LD_S_X << 22 | MO_16 << 30,
+    I3312_LDRSWX    = 0x38000000 | LDST_LD_S_X << 22 | MO_32 << 30,
+
+    I3312_TO_I3310  = 0x00206800,
+    I3312_TO_I3313  = 0x01000000,
+
     /* Load/store register pair instructions.  */
     I3314_LDP       = 0x28400000,
     I3314_STP       = 0x28000000,
@@ -482,21 +504,25 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
     tcg_out32(s, insn | ext << 31 | rm << 16 | ra << 10 | rn << 5 | rd);
 }
 
+static void tcg_out_insn_3310(TCGContext *s, AArch64Insn insn,
+                              TCGReg rd, TCGReg base, TCGReg regoff)
+{
+    /* Note the AArch64Insn constants above are for C3.3.12.  Adjust.  */
+    tcg_out32(s, insn | I3312_TO_I3310 | regoff << 16 | base << 5 | rd);
+}
+
 
-static void tcg_out_ldst_9(TCGContext *s, TCGMemOp size, AArch64LdstType type,
-                           TCGReg rd, TCGReg rn, intptr_t offset)
+static void tcg_out_insn_3312(TCGContext *s, AArch64Insn insn,
+                              TCGReg rd, TCGReg rn, intptr_t offset)
 {
-    /* use LDUR with BASE register with 9bit signed unscaled offset */
-    tcg_out32(s, 0x38000000 | size << 30 | type << 22
-              | (offset & 0x1ff) << 12 | rn << 5 | rd);
+    tcg_out32(s, insn | (offset & 0x1ff) << 12 | rn << 5 | rd);
 }
 
-/* tcg_out_ldst_12 expects a scaled unsigned immediate offset */
-static void tcg_out_ldst_12(TCGContext *s, TCGMemOp size, AArch64LdstType type,
-                            TCGReg rd, TCGReg rn, tcg_target_ulong scaled_uimm)
+static void tcg_out_insn_3313(TCGContext *s, AArch64Insn insn,
+                              TCGReg rd, TCGReg rn, uintptr_t scaled_uimm)
 {
-    tcg_out32(s, 0x39000000 | size << 30 | type << 22
-              | scaled_uimm << 10 | rn << 5 | rd);
+    /* Note the AArch64Insn constants above are for C3.3.12.  Adjust.  */
+    tcg_out32(s, insn | I3312_TO_I3313 | scaled_uimm << 10 | rn << 5 | rd);
 }
 
 /* Register to register move using ORR (shifted register with no shift). */
@@ -634,35 +660,32 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
     }
 }
 
-static void tcg_out_ldst_r(TCGContext *s, TCGMemOp size, AArch64LdstType type,
-                           TCGReg rd, TCGReg base, TCGReg regoff)
-{
-    tcg_out32(s, 0x38206800 | size << 30 | type << 22
-              | regoff << 16 | base << 5 | rd);
-}
+/* Define something more legible for general use.  */
+#define tcg_out_ldst_r  tcg_out_insn_3310
 
-/* solve the whole ldst problem */
-static void tcg_out_ldst(TCGContext *s, TCGMemOp size, AArch64LdstType type,
+static void tcg_out_ldst(TCGContext *s, AArch64Insn insn,
                          TCGReg rd, TCGReg rn, intptr_t offset)
 {
+    TCGMemOp size = (uint32_t)insn >> 30;
+
     /* If the offset is naturally aligned and in range, then we can
        use the scaled uimm12 encoding */
     if (offset >= 0 && !(offset & ((1 << size) - 1))) {
-        tcg_target_ulong scaled_uimm = offset >> size;
+        uintptr_t scaled_uimm = offset >> size;
         if (scaled_uimm <= 0xfff) {
-            tcg_out_ldst_12(s, size, type, rd, rn, scaled_uimm);
+            tcg_out_insn_3313(s, insn, rd, rn, scaled_uimm);
             return;
         }
     }
 
     if (offset >= -256 && offset < 256) {
-        tcg_out_ldst_9(s, size, type, rd, rn, offset);
+        tcg_out_insn_3312(s, insn, rd, rn, offset);
         return;
     }
 
     /* Worst-case scenario, move offset to temp register, use reg offset.  */
     tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, offset);
-    tcg_out_ldst_r(s, size, type, rd, rn, TCG_REG_TMP);
+    tcg_out_ldst_r(s, insn, rd, rn, TCG_REG_TMP);
 }
 
 static inline void tcg_out_mov(TCGContext *s,
@@ -676,14 +699,14 @@ static inline void tcg_out_mov(TCGContext *s,
 static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
                               TCGReg arg1, intptr_t arg2)
 {
-    tcg_out_ldst(s, type == TCG_TYPE_I64 ? MO_64 : MO_32, LDST_LD,
+    tcg_out_ldst(s, type == TCG_TYPE_I32 ? I3312_LDRW : I3312_LDRX,
                  arg, arg1, arg2);
 }
 
 static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
                               TCGReg arg1, intptr_t arg2)
 {
-    tcg_out_ldst(s, type == TCG_TYPE_I64 ? MO_64 : MO_32, LDST_ST,
+    tcg_out_ldst(s, type == TCG_TYPE_I32 ? I3312_STRW : I3312_STRX,
                  arg, arg1, arg2);
 }
 
@@ -1081,12 +1104,12 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp s_bits,
 
     /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
        X0 = load [X2 + (tlb_offset & 0x000fff)] */
-    tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? MO_64 : MO_32,
-                 LDST_LD, TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff);
+    tcg_out_ldst(s, TARGET_LONG_BITS == 32 ? I3312_LDRW : I3312_LDRX,
+                 TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff);
 
     /* Load the tlb addend. Do that early to avoid stalling.
        X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */
-    tcg_out_ldst(s, MO_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
+    tcg_out_ldst(s, I3312_LDRX, TCG_REG_X1, TCG_REG_X2,
                  (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) -
                  (is_read ? offsetof(CPUTLBEntry, addr_read)
                   : offsetof(CPUTLBEntry, addr_write)));
@@ -1108,43 +1131,43 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp memop,
 
     switch (memop & MO_SSIZE) {
     case MO_UB:
-        tcg_out_ldst_r(s, MO_8, LDST_LD, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, I3312_LDRB, data_r, addr_r, off_r);
         break;
     case MO_SB:
-        tcg_out_ldst_r(s, MO_8, LDST_LD_S_X, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, I3312_LDRSBX, data_r, addr_r, off_r);
         break;
     case MO_UW:
-        tcg_out_ldst_r(s, MO_16, LDST_LD, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, I3312_LDRH, data_r, addr_r, off_r);
         if (bswap) {
             tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
         }
         break;
     case MO_SW:
         if (bswap) {
-            tcg_out_ldst_r(s, MO_16, LDST_LD, data_r, addr_r, off_r);
+            tcg_out_ldst_r(s, I3312_LDRH, data_r, addr_r, off_r);
             tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
             tcg_out_sxt(s, TCG_TYPE_I64, MO_16, data_r, data_r);
         } else {
-            tcg_out_ldst_r(s, MO_16, LDST_LD_S_X, data_r, addr_r, off_r);
+            tcg_out_ldst_r(s, I3312_LDRSHX, data_r, addr_r, off_r);
         }
         break;
     case MO_UL:
-        tcg_out_ldst_r(s, MO_32, LDST_LD, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, I3312_LDRW, data_r, addr_r, off_r);
         if (bswap) {
             tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
         }
         break;
     case MO_SL:
         if (bswap) {
-            tcg_out_ldst_r(s, MO_32, LDST_LD, data_r, addr_r, off_r);
+            tcg_out_ldst_r(s, I3312_LDRW, data_r, addr_r, off_r);
             tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
             tcg_out_sxt(s, TCG_TYPE_I64, MO_32, data_r, data_r);
         } else {
-            tcg_out_ldst_r(s, MO_32, LDST_LD_S_X, data_r, addr_r, off_r);
+            tcg_out_ldst_r(s, I3312_LDRSWX, data_r, addr_r, off_r);
         }
         break;
     case MO_Q:
-        tcg_out_ldst_r(s, MO_64, LDST_LD, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, I3312_LDRX, data_r, addr_r, off_r);
         if (bswap) {
             tcg_out_rev(s, TCG_TYPE_I64, data_r, data_r);
         }
@@ -1161,28 +1184,28 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp memop,
 
     switch (memop & MO_SIZE) {
     case MO_8:
-        tcg_out_ldst_r(s, MO_8, LDST_ST, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, I3312_STRB, data_r, addr_r, off_r);
         break;
     case MO_16:
         if (bswap && data_r != TCG_REG_XZR) {
             tcg_out_rev16(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
             data_r = TCG_REG_TMP;
         }
-        tcg_out_ldst_r(s, MO_16, LDST_ST, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, I3312_STRH, data_r, addr_r, off_r);
         break;
     case MO_32:
         if (bswap && data_r != TCG_REG_XZR) {
             tcg_out_rev(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
             data_r = TCG_REG_TMP;
         }
-        tcg_out_ldst_r(s, MO_32, LDST_ST, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, I3312_STRW, data_r, addr_r, off_r);
         break;
     case MO_64:
         if (bswap && data_r != TCG_REG_XZR) {
             tcg_out_rev(s, TCG_TYPE_I64, TCG_REG_TMP, data_r);
             data_r = TCG_REG_TMP;
         }
-        tcg_out_ldst_r(s, MO_64, LDST_ST, data_r, addr_r, off_r);
+        tcg_out_ldst_r(s, I3312_STRX, data_r, addr_r, off_r);
         break;
     default:
         tcg_abort();
@@ -1275,49 +1298,49 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_ld8u_i32:
     case INDEX_op_ld8u_i64:
-        tcg_out_ldst(s, MO_8, LDST_LD, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRB, a0, a1, a2);
         break;
     case INDEX_op_ld8s_i32:
-        tcg_out_ldst(s, MO_8, LDST_LD_S_W, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRSBW, a0, a1, a2);
         break;
     case INDEX_op_ld8s_i64:
-        tcg_out_ldst(s, MO_8, LDST_LD_S_X, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRSBX, a0, a1, a2);
         break;
     case INDEX_op_ld16u_i32:
     case INDEX_op_ld16u_i64:
-        tcg_out_ldst(s, MO_16, LDST_LD, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRH, a0, a1, a2);
         break;
     case INDEX_op_ld16s_i32:
-        tcg_out_ldst(s, MO_16, LDST_LD_S_W, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRSHW, a0, a1, a2);
         break;
     case INDEX_op_ld16s_i64:
-        tcg_out_ldst(s, MO_16, LDST_LD_S_X, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRSHX, a0, a1, a2);
         break;
     case INDEX_op_ld_i32:
     case INDEX_op_ld32u_i64:
-        tcg_out_ldst(s, MO_32, LDST_LD, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRW, a0, a1, a2);
         break;
     case INDEX_op_ld32s_i64:
-        tcg_out_ldst(s, MO_32, LDST_LD_S_X, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRSWX, a0, a1, a2);
         break;
     case INDEX_op_ld_i64:
-        tcg_out_ldst(s, MO_64, LDST_LD, a0, a1, a2);
+        tcg_out_ldst(s, I3312_LDRX, a0, a1, a2);
         break;
 
     case INDEX_op_st8_i32:
     case INDEX_op_st8_i64:
-        tcg_out_ldst(s, MO_8, LDST_ST, REG0(0), a1, a2);
+        tcg_out_ldst(s, I3312_STRB, REG0(0), a1, a2);
         break;
     case INDEX_op_st16_i32:
     case INDEX_op_st16_i64:
-        tcg_out_ldst(s, MO_16, LDST_ST, REG0(0), a1, a2);
+        tcg_out_ldst(s, I3312_STRH, REG0(0), a1, a2);
         break;
     case INDEX_op_st_i32:
     case INDEX_op_st32_i64:
-        tcg_out_ldst(s, MO_32, LDST_ST, REG0(0), a1, a2);
+        tcg_out_ldst(s, I3312_STRW, REG0(0), a1, a2);
         break;
     case INDEX_op_st_i64:
-        tcg_out_ldst(s, MO_64, LDST_ST, REG0(0), a1, a2);
+        tcg_out_ldst(s, I3312_STRX, REG0(0), a1, a2);
         break;
 
     case INDEX_op_add_i32:
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 27/26] tcg-aarch64: Introduce tcg_out_insn_3312, _3310, _3313
  2014-04-07 18:34     ` [Qemu-devel] [PATCH 27/26] tcg-aarch64: Introduce tcg_out_insn_3312, _3310, _3313 Richard Henderson
@ 2014-04-08  9:00       ` Claudio Fontana
  2014-04-11 12:36       ` Claudio Fontana
  1 sibling, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-08  9:00 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 07.04.2014 20:34, Richard Henderson wrote:
> Merge TCGMemOp size, AArch64LdstType type and a few stray opcode bits
> into a single I3312_* argument, eliminating some magic numbers from
> helper functions.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 129 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 76 insertions(+), 53 deletions(-)
> ---
> 
> I'm not really sure how much clearer this is, especially since we do
> have to re-extract the size within tcg_out_ldst.  But it does at least
> eliminate some of the magic numbers within the helpers.
> 
> Thoughts?

Looks good to me; I'll get to it in more detail later.

C.

> 
> 
> r~
> 
> 
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index ab4cd25..324a452 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -271,6 +271,28 @@ typedef enum {
>      I3207_BLR       = 0xd63f0000,
>      I3207_RET       = 0xd65f0000,
>  
> +    /* Load/store register.  Described here as 3.3.12, but the helper
> +       that emits them can transform to 3.3.10 or 3.3.13.  */
> +    I3312_STRB      = 0x38000000 | LDST_ST << 22 | MO_8 << 30,
> +    I3312_STRH      = 0x38000000 | LDST_ST << 22 | MO_16 << 30,
> +    I3312_STRW      = 0x38000000 | LDST_ST << 22 | MO_32 << 30,
> +    I3312_STRX      = 0x38000000 | LDST_ST << 22 | MO_64 << 30,
> +
> +    I3312_LDRB      = 0x38000000 | LDST_LD << 22 | MO_8 << 30,
> +    I3312_LDRH      = 0x38000000 | LDST_LD << 22 | MO_16 << 30,
> +    I3312_LDRW      = 0x38000000 | LDST_LD << 22 | MO_32 << 30,
> +    I3312_LDRX      = 0x38000000 | LDST_LD << 22 | MO_64 << 30,
> +
> +    I3312_LDRSBW    = 0x38000000 | LDST_LD_S_W << 22 | MO_8 << 30,
> +    I3312_LDRSHW    = 0x38000000 | LDST_LD_S_W << 22 | MO_16 << 30,
> +
> +    I3312_LDRSBX    = 0x38000000 | LDST_LD_S_X << 22 | MO_8 << 30,
> +    I3312_LDRSHX    = 0x38000000 | LDST_LD_S_X << 22 | MO_16 << 30,
> +    I3312_LDRSWX    = 0x38000000 | LDST_LD_S_X << 22 | MO_32 << 30,
> +
> +    I3312_TO_I3310  = 0x00206800,
> +    I3312_TO_I3313  = 0x01000000,
> +
>      /* Load/store register pair instructions.  */
>      I3314_LDP       = 0x28400000,
>      I3314_STP       = 0x28000000,
> @@ -482,21 +504,25 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
>      tcg_out32(s, insn | ext << 31 | rm << 16 | ra << 10 | rn << 5 | rd);
>  }
>  
> +static void tcg_out_insn_3310(TCGContext *s, AArch64Insn insn,
> +                              TCGReg rd, TCGReg base, TCGReg regoff)
> +{
> +    /* Note the AArch64Insn constants above are for C3.3.12.  Adjust.  */
> +    tcg_out32(s, insn | I3312_TO_I3310 | regoff << 16 | base << 5 | rd);
> +}
> +
>  
> -static void tcg_out_ldst_9(TCGContext *s, TCGMemOp size, AArch64LdstType type,
> -                           TCGReg rd, TCGReg rn, intptr_t offset)
> +static void tcg_out_insn_3312(TCGContext *s, AArch64Insn insn,
> +                              TCGReg rd, TCGReg rn, intptr_t offset)
>  {
> -    /* use LDUR with BASE register with 9bit signed unscaled offset */
> -    tcg_out32(s, 0x38000000 | size << 30 | type << 22
> -              | (offset & 0x1ff) << 12 | rn << 5 | rd);
> +    tcg_out32(s, insn | (offset & 0x1ff) << 12 | rn << 5 | rd);
>  }
>  
> -/* tcg_out_ldst_12 expects a scaled unsigned immediate offset */
> -static void tcg_out_ldst_12(TCGContext *s, TCGMemOp size, AArch64LdstType type,
> -                            TCGReg rd, TCGReg rn, tcg_target_ulong scaled_uimm)
> +static void tcg_out_insn_3313(TCGContext *s, AArch64Insn insn,
> +                              TCGReg rd, TCGReg rn, uintptr_t scaled_uimm)
>  {
> -    tcg_out32(s, 0x39000000 | size << 30 | type << 22
> -              | scaled_uimm << 10 | rn << 5 | rd);
> +    /* Note the AArch64Insn constants above are for C3.3.12.  Adjust.  */
> +    tcg_out32(s, insn | I3312_TO_I3313 | scaled_uimm << 10 | rn << 5 | rd);
>  }
>  
>  /* Register to register move using ORR (shifted register with no shift). */
> @@ -634,35 +660,32 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
>      }
>  }
>  
> -static void tcg_out_ldst_r(TCGContext *s, TCGMemOp size, AArch64LdstType type,
> -                           TCGReg rd, TCGReg base, TCGReg regoff)
> -{
> -    tcg_out32(s, 0x38206800 | size << 30 | type << 22
> -              | regoff << 16 | base << 5 | rd);
> -}
> +/* Define something more legible for general use.  */
> +#define tcg_out_ldst_r  tcg_out_insn_3310
>  
> -/* solve the whole ldst problem */
> -static void tcg_out_ldst(TCGContext *s, TCGMemOp size, AArch64LdstType type,
> +static void tcg_out_ldst(TCGContext *s, AArch64Insn insn,
>                           TCGReg rd, TCGReg rn, intptr_t offset)
>  {
> +    TCGMemOp size = (uint32_t)insn >> 30;
> +
>      /* If the offset is naturally aligned and in range, then we can
>         use the scaled uimm12 encoding */
>      if (offset >= 0 && !(offset & ((1 << size) - 1))) {
> -        tcg_target_ulong scaled_uimm = offset >> size;
> +        uintptr_t scaled_uimm = offset >> size;
>          if (scaled_uimm <= 0xfff) {
> -            tcg_out_ldst_12(s, size, type, rd, rn, scaled_uimm);
> +            tcg_out_insn_3313(s, insn, rd, rn, scaled_uimm);
>              return;
>          }
>      }
>  
>      if (offset >= -256 && offset < 256) {
> -        tcg_out_ldst_9(s, size, type, rd, rn, offset);
> +        tcg_out_insn_3312(s, insn, rd, rn, offset);
>          return;
>      }
>  
>      /* Worst-case scenario, move offset to temp register, use reg offset.  */
>      tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, offset);
> -    tcg_out_ldst_r(s, size, type, rd, rn, TCG_REG_TMP);
> +    tcg_out_ldst_r(s, insn, rd, rn, TCG_REG_TMP);
>  }
>  
>  static inline void tcg_out_mov(TCGContext *s,
> @@ -676,14 +699,14 @@ static inline void tcg_out_mov(TCGContext *s,
>  static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
>                                TCGReg arg1, intptr_t arg2)
>  {
> -    tcg_out_ldst(s, type == TCG_TYPE_I64 ? MO_64 : MO_32, LDST_LD,
> +    tcg_out_ldst(s, type == TCG_TYPE_I32 ? I3312_LDRW : I3312_LDRX,
>                   arg, arg1, arg2);
>  }
>  
>  static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
>                                TCGReg arg1, intptr_t arg2)
>  {
> -    tcg_out_ldst(s, type == TCG_TYPE_I64 ? MO_64 : MO_32, LDST_ST,
> +    tcg_out_ldst(s, type == TCG_TYPE_I32 ? I3312_STRW : I3312_STRX,
>                   arg, arg1, arg2);
>  }
>  
> @@ -1081,12 +1104,12 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp s_bits,
>  
>      /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
>         X0 = load [X2 + (tlb_offset & 0x000fff)] */
> -    tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? MO_64 : MO_32,
> -                 LDST_LD, TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff);
> +    tcg_out_ldst(s, TARGET_LONG_BITS == 32 ? I3312_LDRW : I3312_LDRX,
> +                 TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff);
>  
>      /* Load the tlb addend. Do that early to avoid stalling.
>         X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */
> -    tcg_out_ldst(s, MO_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
> +    tcg_out_ldst(s, I3312_LDRX, TCG_REG_X1, TCG_REG_X2,
>                   (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) -
>                   (is_read ? offsetof(CPUTLBEntry, addr_read)
>                    : offsetof(CPUTLBEntry, addr_write)));
> @@ -1108,43 +1131,43 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp memop,
>  
>      switch (memop & MO_SSIZE) {
>      case MO_UB:
> -        tcg_out_ldst_r(s, MO_8, LDST_LD, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_LDRB, data_r, addr_r, off_r);
>          break;
>      case MO_SB:
> -        tcg_out_ldst_r(s, MO_8, LDST_LD_S_X, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_LDRSBX, data_r, addr_r, off_r);
>          break;
>      case MO_UW:
> -        tcg_out_ldst_r(s, MO_16, LDST_LD, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_LDRH, data_r, addr_r, off_r);
>          if (bswap) {
>              tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
>          }
>          break;
>      case MO_SW:
>          if (bswap) {
> -            tcg_out_ldst_r(s, MO_16, LDST_LD, data_r, addr_r, off_r);
> +            tcg_out_ldst_r(s, I3312_LDRH, data_r, addr_r, off_r);
>              tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
>              tcg_out_sxt(s, TCG_TYPE_I64, MO_16, data_r, data_r);
>          } else {
> -            tcg_out_ldst_r(s, MO_16, LDST_LD_S_X, data_r, addr_r, off_r);
> +            tcg_out_ldst_r(s, I3312_LDRSHX, data_r, addr_r, off_r);
>          }
>          break;
>      case MO_UL:
> -        tcg_out_ldst_r(s, MO_32, LDST_LD, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_LDRW, data_r, addr_r, off_r);
>          if (bswap) {
>              tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
>          }
>          break;
>      case MO_SL:
>          if (bswap) {
> -            tcg_out_ldst_r(s, MO_32, LDST_LD, data_r, addr_r, off_r);
> +            tcg_out_ldst_r(s, I3312_LDRW, data_r, addr_r, off_r);
>              tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
>              tcg_out_sxt(s, TCG_TYPE_I64, MO_32, data_r, data_r);
>          } else {
> -            tcg_out_ldst_r(s, MO_32, LDST_LD_S_X, data_r, addr_r, off_r);
> +            tcg_out_ldst_r(s, I3312_LDRSWX, data_r, addr_r, off_r);
>          }
>          break;
>      case MO_Q:
> -        tcg_out_ldst_r(s, MO_64, LDST_LD, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_LDRX, data_r, addr_r, off_r);
>          if (bswap) {
>              tcg_out_rev(s, TCG_TYPE_I64, data_r, data_r);
>          }
> @@ -1161,28 +1184,28 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp memop,
>  
>      switch (memop & MO_SIZE) {
>      case MO_8:
> -        tcg_out_ldst_r(s, MO_8, LDST_ST, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_STRB, data_r, addr_r, off_r);
>          break;
>      case MO_16:
>          if (bswap && data_r != TCG_REG_XZR) {
>              tcg_out_rev16(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
>              data_r = TCG_REG_TMP;
>          }
> -        tcg_out_ldst_r(s, MO_16, LDST_ST, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_STRH, data_r, addr_r, off_r);
>          break;
>      case MO_32:
>          if (bswap && data_r != TCG_REG_XZR) {
>              tcg_out_rev(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
>              data_r = TCG_REG_TMP;
>          }
> -        tcg_out_ldst_r(s, MO_32, LDST_ST, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_STRW, data_r, addr_r, off_r);
>          break;
>      case MO_64:
>          if (bswap && data_r != TCG_REG_XZR) {
>              tcg_out_rev(s, TCG_TYPE_I64, TCG_REG_TMP, data_r);
>              data_r = TCG_REG_TMP;
>          }
> -        tcg_out_ldst_r(s, MO_64, LDST_ST, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_STRX, data_r, addr_r, off_r);
>          break;
>      default:
>          tcg_abort();
> @@ -1275,49 +1298,49 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>  
>      case INDEX_op_ld8u_i32:
>      case INDEX_op_ld8u_i64:
> -        tcg_out_ldst(s, MO_8, LDST_LD, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRB, a0, a1, a2);
>          break;
>      case INDEX_op_ld8s_i32:
> -        tcg_out_ldst(s, MO_8, LDST_LD_S_W, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRSBW, a0, a1, a2);
>          break;
>      case INDEX_op_ld8s_i64:
> -        tcg_out_ldst(s, MO_8, LDST_LD_S_X, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRSBX, a0, a1, a2);
>          break;
>      case INDEX_op_ld16u_i32:
>      case INDEX_op_ld16u_i64:
> -        tcg_out_ldst(s, MO_16, LDST_LD, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRH, a0, a1, a2);
>          break;
>      case INDEX_op_ld16s_i32:
> -        tcg_out_ldst(s, MO_16, LDST_LD_S_W, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRSHW, a0, a1, a2);
>          break;
>      case INDEX_op_ld16s_i64:
> -        tcg_out_ldst(s, MO_16, LDST_LD_S_X, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRSHX, a0, a1, a2);
>          break;
>      case INDEX_op_ld_i32:
>      case INDEX_op_ld32u_i64:
> -        tcg_out_ldst(s, MO_32, LDST_LD, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRW, a0, a1, a2);
>          break;
>      case INDEX_op_ld32s_i64:
> -        tcg_out_ldst(s, MO_32, LDST_LD_S_X, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRSWX, a0, a1, a2);
>          break;
>      case INDEX_op_ld_i64:
> -        tcg_out_ldst(s, MO_64, LDST_LD, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRX, a0, a1, a2);
>          break;
>  
>      case INDEX_op_st8_i32:
>      case INDEX_op_st8_i64:
> -        tcg_out_ldst(s, MO_8, LDST_ST, REG0(0), a1, a2);
> +        tcg_out_ldst(s, I3312_STRB, REG0(0), a1, a2);
>          break;
>      case INDEX_op_st16_i32:
>      case INDEX_op_st16_i64:
> -        tcg_out_ldst(s, MO_16, LDST_ST, REG0(0), a1, a2);
> +        tcg_out_ldst(s, I3312_STRH, REG0(0), a1, a2);
>          break;
>      case INDEX_op_st_i32:
>      case INDEX_op_st32_i64:
> -        tcg_out_ldst(s, MO_32, LDST_ST, REG0(0), a1, a2);
> +        tcg_out_ldst(s, I3312_STRW, REG0(0), a1, a2);
>          break;
>      case INDEX_op_st_i64:
> -        tcg_out_ldst(s, MO_64, LDST_ST, REG0(0), a1, a2);
> +        tcg_out_ldst(s, I3312_STRX, REG0(0), a1, a2);
>          break;
>  
>      case INDEX_op_add_i32:
> 
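
One note for anyone else following the encoding (my own illustration, not
something from the patch): each I3312_* constant carries the MO_* size in its
top two bits, which is why tcg_out_ldst can recover it with a single shift
before choosing between the scaled uimm12 and the signed 9-bit forms.

    /* Sketch only -- the helper name is mine, not in the patch.  */
    static inline TCGMemOp i3312_size(AArch64Insn insn)
    {
        return (TCGMemOp)((uint32_t)insn >> 30);    /* bits [31:30] */
    }
    /* e.g. i3312_size(I3312_LDRX) == MO_64, i3312_size(I3312_STRB) == MO_8.  */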


-- 
Claudio Fontana
Server Virtualization Architect
Huawei Technologies Duesseldorf GmbH
Riesstraße 25 - 80992 München

office: +49 89 158834 4135
mobile: +49 15253060158

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 21/26] tcg-aarch64: Introduce tcg_out_insn_3507
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 21/26] tcg-aarch64: Introduce tcg_out_insn_3507 Richard Henderson
@ 2014-04-09 12:54   ` Claudio Fontana
  2014-04-09 17:17     ` Richard Henderson
  2014-04-11 12:36   ` Claudio Fontana
  1 sibling, 1 reply; 52+ messages in thread
From: Claudio Fontana @ 2014-04-09 12:54 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> Cleaning up the implementation of REV and REV16 at the same time.
> 
> Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)

During testing I found this patch causes a regression for big endian targets (sparc).

Can you take a look?
I think it might be related to the extended form of the REV instruction needing
an additional 0x400. See below.

> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index caaf8a2..de7490d 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -327,6 +327,10 @@ typedef enum {
>      I3506_CSEL      = 0x1a800000,
>      I3506_CSINC     = 0x1a800400,
>  
> +    /* Data-processing (1 source) instructions.  */
> +    I3507_REV16     = 0x5ac00400,
> +    I3507_REV       = 0x5ac00800,
> +
>      /* Data-processing (2 source) instructions.  */
>      I3508_LSLV      = 0x1ac02000,
>      I3508_LSRV      = 0x1ac02400,
> @@ -545,6 +549,12 @@ static void tcg_out_insn_3506(TCGContext *s, AArch64Insn insn, TCGType ext,
>                | tcg_cond_to_aarch64[c] << 12);
>  }
>  
> +static void tcg_out_insn_3507(TCGContext *s, AArch64Insn insn, TCGType ext,
> +                              TCGReg rd, TCGReg rn)
> +{
> +    tcg_out32(s, insn | ext << 31 | rn << 5 | rd);
> +}
> +
>  static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
>                                TCGReg rd, TCGReg rn, TCGReg rm, TCGReg ra)
>  {
> @@ -961,19 +971,15 @@ static void tcg_out_brcond(TCGContext *s, TCGMemOp ext, TCGCond c, TCGArg a,
>  }
>  
>  static inline void tcg_out_rev(TCGContext *s, TCGType ext,
> -                               TCGReg rd, TCGReg rm)
> +                               TCGReg rd, TCGReg rn)
>  {
> -    /* using REV 0x5ac00800 */
> -    unsigned int base = ext ? 0xdac00c00 : 0x5ac00800;

see the extended form 0xdac00c00 <-


> -    tcg_out32(s, base | rm << 5 | rd);
> +    tcg_out_insn(s, 3507, REV, ext, rd, rn);
>  }
>  
>  static inline void tcg_out_rev16(TCGContext *s, TCGType ext,
> -                                 TCGReg rd, TCGReg rm)
> +                                 TCGReg rd, TCGReg rn)
>  {
> -    /* using REV16 0x5ac00400 */
> -    unsigned int base = ext ? 0xdac00400 : 0x5ac00400;
> -    tcg_out32(s, base | rm << 5 | rd);
> +    tcg_out_insn(s, 3507, REV16, ext, rd, rn);

while this does not have it.

>  }
>  
>  static inline void tcg_out_sxt(TCGContext *s, TCGType ext, TCGMemOp s_bits,
> 

Ciao,

Claudio

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 21/26] tcg-aarch64: Introduce tcg_out_insn_3507
  2014-04-09 12:54   ` Claudio Fontana
@ 2014-04-09 17:17     ` Richard Henderson
  0 siblings, 0 replies; 52+ messages in thread
From: Richard Henderson @ 2014-04-09 17:17 UTC (permalink / raw)
  To: Claudio Fontana, qemu-devel; +Cc: claudio.fontana

On 04/09/2014 05:54 AM, Claudio Fontana wrote:
> During testing I found this patch causes a regression for big endian targets (sparc).
> 
> Can you take a look?
> I think it might be related to the extended form of the REV instruction needing
> an additional 0x400. See below.

You're right.  It's disassembling as "rev32 x0, x0".

Bizarrely, the sparc32 bios was working.  I guess it only uses 64-bit
load/store for ldd/std register pair save and restore.  And since we rev'ed
them the same way for both load and store, it worked.

Uploading a full mipseb system image to test now.
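
In case it helps review, the fix I'm testing looks roughly like this
(sketch only, the final patch may differ):

    /* Split the constant by operand size, so the 64-bit form gets the
       extra 0x400 opc bit that you point out above.  */
    I3507_REV16     = 0x5ac00400,
    I3507_REV32     = 0x5ac00800,
    I3507_REV64     = 0x5ac00c00,

    static inline void tcg_out_rev(TCGContext *s, TCGType ext,
                                   TCGReg rd, TCGReg rn)
    {
        /* ext still sets bit 31, restoring the old 0xdac00c00 encoding.  */
        tcg_out_insn_3507(s, ext ? I3507_REV64 : I3507_REV32, ext, rd, rn);
    }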


r~

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 11/26] tcg-aarch64: Reuse LR in translated code
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 11/26] tcg-aarch64: Reuse LR in translated code Richard Henderson
  2014-04-07  8:03   ` Claudio Fontana
@ 2014-04-11 12:33   ` Claudio Fontana
  1 sibling, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-11 12:33 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> It's obviously call-clobbered, but is otherwise unused.
> Repurpose it as the TCG temporary.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 34 ++++++++++++++++------------------
>  tcg/aarch64/tcg-target.h | 32 +++++++++++++++++---------------
>  2 files changed, 33 insertions(+), 33 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index 48a246d..e36909e 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -23,10 +23,7 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>      "%x0", "%x1", "%x2", "%x3", "%x4", "%x5", "%x6", "%x7",
>      "%x8", "%x9", "%x10", "%x11", "%x12", "%x13", "%x14", "%x15",
>      "%x16", "%x17", "%x18", "%x19", "%x20", "%x21", "%x22", "%x23",
> -    "%x24", "%x25", "%x26", "%x27", "%x28",
> -    "%fp", /* frame pointer */
> -    "%lr", /* link register */
> -    "%sp",  /* stack pointer */
> +    "%x24", "%x25", "%x26", "%x27", "%x28", "%fp", "%x30", "%sp",
>  };
>  #endif /* NDEBUG */
>  
> @@ -41,16 +38,17 @@ static const int tcg_target_reg_alloc_order[] = {
>      TCG_REG_X24, TCG_REG_X25, TCG_REG_X26, TCG_REG_X27,
>      TCG_REG_X28, /* we will reserve this for GUEST_BASE if configured */
>  
> -    TCG_REG_X9, TCG_REG_X10, TCG_REG_X11, TCG_REG_X12,
> -    TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
> +    TCG_REG_X8, TCG_REG_X9, TCG_REG_X10, TCG_REG_X11,
> +    TCG_REG_X12, TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
>      TCG_REG_X16, TCG_REG_X17,
>  
> -    TCG_REG_X18, TCG_REG_X19, /* will not use these, see tcg_target_init */
> -
>      TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3,
>      TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7,
>  
> -    TCG_REG_X8, /* will not use, see tcg_target_init */
> +    /* X18 reserved by system */
> +    /* X19 reserved for AREG0 */
> +    /* X29 reserved as fp */
> +    /* X30 reserved as temporary */
>  };
>  
>  static const int tcg_target_call_iarg_regs[8] = {
> @@ -61,13 +59,13 @@ static const int tcg_target_call_oarg_regs[1] = {
>      TCG_REG_X0
>  };
>  
> -#define TCG_REG_TMP TCG_REG_X8
> +#define TCG_REG_TMP TCG_REG_X30
>  
>  #ifndef CONFIG_SOFTMMU
> -# if defined(CONFIG_USE_GUEST_BASE)
> -# define TCG_REG_GUEST_BASE TCG_REG_X28
> +# ifdef CONFIG_USE_GUEST_BASE
> +#  define TCG_REG_GUEST_BASE TCG_REG_X28
>  # else
> -# define TCG_REG_GUEST_BASE TCG_REG_XZR
> +#  define TCG_REG_GUEST_BASE TCG_REG_XZR
>  # endif
>  #endif
>  
> @@ -1871,7 +1869,7 @@ static void tcg_target_init(TCGContext *s)
>                       (1 << TCG_REG_X12) | (1 << TCG_REG_X13) |
>                       (1 << TCG_REG_X14) | (1 << TCG_REG_X15) |
>                       (1 << TCG_REG_X16) | (1 << TCG_REG_X17) |
> -                     (1 << TCG_REG_X18));
> +                     (1 << TCG_REG_X18) | (1 << TCG_REG_X30));
>  
>      tcg_regset_clear(s->reserved_regs);
>      tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
> @@ -1902,13 +1900,13 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>      tcg_out_push_pair(s, TCG_REG_SP,
>                        TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
>  
> -    /* FP -> callee_saved */
> +    /* Set up frame pointer for canonical unwinding.  */
>      tcg_out_movr_sp(s, TCG_TYPE_I64, TCG_REG_FP, TCG_REG_SP);
>  
> -    /* store callee-preserved regs x19..x28 using FP -> callee_saved */
> +    /* Store callee-preserved regs x19..x28.  */
>      for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
>          int idx = (r - TCG_REG_X19) / 2 + 1;
> -        tcg_out_store_pair(s, TCG_REG_FP, r, r + 1, idx);
> +        tcg_out_store_pair(s, TCG_REG_SP, r, r + 1, idx);
>      }
>  
>      /* Make stack space for TCG locals.  */
> @@ -1939,7 +1937,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>         FP must be preserved, so it still points to callee_saved area */
>      for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
>          int idx = (r - TCG_REG_X19) / 2 + 1;
> -        tcg_out_load_pair(s, TCG_REG_FP, r, r + 1, idx);
> +        tcg_out_load_pair(s, TCG_REG_SP, r, r + 1, idx);
>      }
>  
>      /* pop (FP, LR), restore SP to previous frame, return */
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 988983e..faccc36 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -17,17 +17,23 @@
>  #undef TCG_TARGET_STACK_GROWSUP
>  
>  typedef enum {
> -    TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3, TCG_REG_X4,
> -    TCG_REG_X5, TCG_REG_X6, TCG_REG_X7, TCG_REG_X8, TCG_REG_X9,
> -    TCG_REG_X10, TCG_REG_X11, TCG_REG_X12, TCG_REG_X13, TCG_REG_X14,
> -    TCG_REG_X15, TCG_REG_X16, TCG_REG_X17, TCG_REG_X18, TCG_REG_X19,
> -    TCG_REG_X20, TCG_REG_X21, TCG_REG_X22, TCG_REG_X23, TCG_REG_X24,
> -    TCG_REG_X25, TCG_REG_X26, TCG_REG_X27, TCG_REG_X28,
> -    TCG_REG_FP,  /* frame pointer */
> -    TCG_REG_LR, /* link register */
> -    TCG_REG_SP,  /* stack pointer or zero register */
> -    TCG_REG_XZR = TCG_REG_SP /* same register number */
> -    /* program counter is not directly accessible! */
> +    TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3,
> +    TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7,
> +    TCG_REG_X8, TCG_REG_X9, TCG_REG_X10, TCG_REG_X11,
> +    TCG_REG_X12, TCG_REG_X13, TCG_REG_X14, TCG_REG_X15,
> +    TCG_REG_X16, TCG_REG_X17, TCG_REG_X18, TCG_REG_X19,
> +    TCG_REG_X20, TCG_REG_X21, TCG_REG_X22, TCG_REG_X23,
> +    TCG_REG_X24, TCG_REG_X25, TCG_REG_X26, TCG_REG_X27,
> +    TCG_REG_X28, TCG_REG_X29, TCG_REG_X30,
> +
> +    /* X31 is either the stack pointer or zero, depending on context.  */
> +    TCG_REG_SP = 31,
> +    TCG_REG_XZR = 31,
> +
> +    /* Aliases.  */
> +    TCG_REG_FP = TCG_REG_X29,
> +    TCG_REG_LR = TCG_REG_X30,
> +    TCG_AREG0  = TCG_REG_X19,
>  } TCGReg;
>  
>  #define TCG_TARGET_NB_REGS 32
> @@ -92,10 +98,6 @@ typedef enum {
>  #define TCG_TARGET_HAS_muluh_i64        1
>  #define TCG_TARGET_HAS_mulsh_i64        1
>  
> -enum {
> -    TCG_AREG0 = TCG_REG_X19,
> -};
> -
>  #define TCG_TARGET_HAS_new_ldst         0
>  
>  static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> 

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 12/26] tcg-aarch64: Introduce tcg_out_insn_3314
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 12/26] tcg-aarch64: Introduce tcg_out_insn_3314 Richard Henderson
@ 2014-04-11 12:34   ` Claudio Fontana
  0 siblings, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-11 12:34 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> Combines 4 other inline functions and tidies the prologue.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 100 ++++++++++++++++-------------------------------
>  1 file changed, 33 insertions(+), 67 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index e36909e..5cffe50 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -284,6 +284,10 @@ typedef enum {
>      I3207_BLR       = 0xd63f0000,
>      I3207_RET       = 0xd65f0000,
>  
> +    /* Load/store register pair instructions.  */
> +    I3314_LDP       = 0x28400000,
> +    I3314_STP       = 0x28000000,
> +
>      /* Add/subtract immediate instructions.  */
>      I3401_ADDI      = 0x11000000,
>      I3401_ADDSI     = 0x31000000,
> @@ -457,6 +461,20 @@ static void tcg_out_insn_3207(TCGContext *s, AArch64Insn insn, TCGReg rn)
>      tcg_out32(s, insn | rn << 5);
>  }
>  
> +static void tcg_out_insn_3314(TCGContext *s, AArch64Insn insn,
> +                              TCGReg r1, TCGReg r2, TCGReg rn,
> +                              tcg_target_long ofs, bool pre, bool w)
> +{
> +    insn |= 1u << 31; /* ext */
> +    insn |= pre << 24;
> +    insn |= w << 23;
> +
> +    assert(ofs >= -0x200 && ofs < 0x200 && (ofs & 7) == 0);
> +    insn |= (ofs & (0x7f << 3)) << (15 - 3);
> +
> +    tcg_out32(s, insn | r2 << 10 | rn << 5 | r1);
> +}
> +
>  static void tcg_out_insn_3401(TCGContext *s, AArch64Insn insn, TCGType ext,
>                                TCGReg rd, TCGReg rn, uint64_t aimm)
>  {
> @@ -1292,56 +1310,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
>  
>  static uint8_t *tb_ret_addr;
>  
> -/* callee stack use example:
> -   stp     x29, x30, [sp,#-32]!
> -   mov     x29, sp
> -   stp     x1, x2, [sp,#16]
> -   ...
> -   ldp     x1, x2, [sp,#16]
> -   ldp     x29, x30, [sp],#32
> -   ret
> -*/
> -
> -/* push r1 and r2, and alloc stack space for a total of
> -   alloc_n elements (1 element=16 bytes, must be between 1 and 31. */
> -static inline void tcg_out_push_pair(TCGContext *s, TCGReg addr,
> -                                     TCGReg r1, TCGReg r2, int alloc_n)
> -{
> -    /* using indexed scaled simm7 STP 0x28800000 | (ext) | 0x01000000 (pre-idx)
> -       | alloc_n * (-1) << 16 | r2 << 10 | addr << 5 | r1 */
> -    assert(alloc_n > 0 && alloc_n < 0x20);
> -    alloc_n = (-alloc_n) & 0x3f;
> -    tcg_out32(s, 0xa9800000 | alloc_n << 16 | r2 << 10 | addr << 5 | r1);
> -}
> -
> -/* dealloc stack space for a total of alloc_n elements and pop r1, r2.  */
> -static inline void tcg_out_pop_pair(TCGContext *s, TCGReg addr,
> -                                    TCGReg r1, TCGReg r2, int alloc_n)
> -{
> -    /* using indexed scaled simm7 LDP 0x28c00000 | (ext) | nothing (post-idx)
> -       | alloc_n << 16 | r2 << 10 | addr << 5 | r1 */
> -    assert(alloc_n > 0 && alloc_n < 0x20);
> -    tcg_out32(s, 0xa8c00000 | alloc_n << 16 | r2 << 10 | addr << 5 | r1);
> -}
> -
> -static inline void tcg_out_store_pair(TCGContext *s, TCGReg addr,
> -                                      TCGReg r1, TCGReg r2, int idx)
> -{
> -    /* using register pair offset simm7 STP 0x29000000 | (ext)
> -       | idx << 16 | r2 << 10 | addr << 5 | r1 */
> -    assert(idx > 0 && idx < 0x20);
> -    tcg_out32(s, 0xa9000000 | idx << 16 | r2 << 10 | addr << 5 | r1);
> -}
> -
> -static inline void tcg_out_load_pair(TCGContext *s, TCGReg addr,
> -                                     TCGReg r1, TCGReg r2, int idx)
> -{
> -    /* using register pair offset simm7 LDP 0x29400000 | (ext)
> -       | idx << 16 | r2 << 10 | addr << 5 | r1 */
> -    assert(idx > 0 && idx < 0x20);
> -    tcg_out32(s, 0xa9400000 | idx << 16 | r2 << 10 | addr << 5 | r1);
> -}
> -
>  static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>                         const TCGArg args[TCG_MAX_OP_ARGS],
>                         const int const_args[TCG_MAX_OP_ARGS])
> @@ -1887,33 +1855,32 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>      TCGReg r;
>  
>      /* save pairs             (FP, LR) and (X19, X20) .. (X27, X28) */
> -    frame_size_callee_saved = (1) + (TCG_REG_X28 - TCG_REG_X19) / 2 + 1;
> +    frame_size_callee_saved = 16 + (TCG_REG_X28 - TCG_REG_X19 + 1) * 8;
>  
>      /* frame size requirement for TCG local variables */
>      frame_size_tcg_locals = TCG_STATIC_CALL_ARGS_SIZE
>          + CPU_TEMP_BUF_NLONGS * sizeof(long)
>          + (TCG_TARGET_STACK_ALIGN - 1);
>      frame_size_tcg_locals &= ~(TCG_TARGET_STACK_ALIGN - 1);
> -    frame_size_tcg_locals /= TCG_TARGET_STACK_ALIGN;
>  
> -    /* push (FP, LR) and update sp */
> -    tcg_out_push_pair(s, TCG_REG_SP,
> -                      TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
> +    /* Push (FP, LR) and allocate space for all saved registers.  */
> +    tcg_out_insn(s, 3314, STP, TCG_REG_FP, TCG_REG_LR,
> +                 TCG_REG_SP, -frame_size_callee_saved, 1, 1);
>  
>      /* Set up frame pointer for canonical unwinding.  */
>      tcg_out_movr_sp(s, TCG_TYPE_I64, TCG_REG_FP, TCG_REG_SP);
>  
>      /* Store callee-preserved regs x19..x28.  */
>      for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
> -        int idx = (r - TCG_REG_X19) / 2 + 1;
> -        tcg_out_store_pair(s, TCG_REG_SP, r, r + 1, idx);
> +        int ofs = (r - TCG_REG_X19 + 2) * 8;
> +        tcg_out_insn(s, 3314, STP, r, r + 1, TCG_REG_SP, ofs, 1, 0);
>      }
>  
>      /* Make stack space for TCG locals.  */
>      tcg_out_insn(s, 3401, SUBI, TCG_TYPE_I64, TCG_REG_SP, TCG_REG_SP,
> -                 frame_size_tcg_locals * TCG_TARGET_STACK_ALIGN);
> +                 frame_size_tcg_locals);
>  
> -    /* inform TCG about how to find TCG locals with register, offset, size */
> +    /* Inform TCG about how to find TCG locals with register, offset, size.  */
>      tcg_set_frame(s, TCG_REG_SP, TCG_STATIC_CALL_ARGS_SIZE,
>                    CPU_TEMP_BUF_NLONGS * sizeof(long));
>  
> @@ -1931,17 +1898,16 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>  
>      /* Remove TCG locals stack space.  */
>      tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_SP, TCG_REG_SP,
> -                 frame_size_tcg_locals * TCG_TARGET_STACK_ALIGN);
> +                 frame_size_tcg_locals);
>  
> -    /* restore registers x19..x28.
> -       FP must be preserved, so it still points to callee_saved area */
> +    /* Restore registers x19..x28.  */
>      for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
> -        int idx = (r - TCG_REG_X19) / 2 + 1;
> -        tcg_out_load_pair(s, TCG_REG_SP, r, r + 1, idx);
> +        int ofs = (r - TCG_REG_X19 + 2) * 8;
> +        tcg_out_insn(s, 3314, LDP, r, r + 1, TCG_REG_SP, ofs, 1, 0);
>      }
>  
> -    /* pop (FP, LR), restore SP to previous frame, return */
> -    tcg_out_pop_pair(s, TCG_REG_SP,
> -                     TCG_REG_FP, TCG_REG_LR, frame_size_callee_saved);
> +    /* Pop (FP, LR), restore SP to previous frame.  */
> +    tcg_out_insn(s, 3314, LDP, TCG_REG_FP, TCG_REG_LR,
> +                 TCG_REG_SP, frame_size_callee_saved, 0, 1);
>      tcg_out_insn(s, 3207, RET, TCG_REG_LR);
>  }
> 
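
Just to double-check the new simm7 encoding for myself (a worked example,
not something from the patch): with frame_size_callee_saved == 16 + 10 * 8
== 96, the prologue's first STP comes out as

    /* tcg_out_insn(s, 3314, STP, FP, LR, SP, -96, 1, 1) */
    insn  = I3314_STP | 1u << 31        /* 64-bit registers */
                      | 1 << 24         /* pre-index */
                      | 1 << 23;        /* writeback */
    insn |= (-96 & (0x7f << 3)) << (15 - 3);    /* imm7 = -96 / 8 = -12 */
    /* with r1 = FP, r2 = LR, rn = SP this is 0xa9ba7bfd,
       i.e. "stp x29, x30, [sp, #-96]!" -- if I decoded it right.  */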

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 13/26] tcg-aarch64: Implement tcg_register_jit
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 13/26] tcg-aarch64: Implement tcg_register_jit Richard Henderson
@ 2014-04-11 12:34   ` Claudio Fontana
  0 siblings, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-11 12:34 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 84 +++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 69 insertions(+), 15 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index 5cffe50..4414bd1 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -1848,24 +1848,29 @@ static void tcg_target_init(TCGContext *s)
>      tcg_add_target_add_op_defs(aarch64_op_defs);
>  }
>  
> +/* Saving pairs: (X19, X20) .. (X27, X28), (X29(fp), X30(lr)).  */
> +#define PUSH_SIZE  ((30 - 19 + 1) * 8)
> +
> +#define FRAME_SIZE \
> +    ((PUSH_SIZE \
> +      + TCG_STATIC_CALL_ARGS_SIZE \
> +      + CPU_TEMP_BUF_NLONGS * sizeof(long) \
> +      + TCG_TARGET_STACK_ALIGN - 1) \
> +     & ~(TCG_TARGET_STACK_ALIGN - 1))
> +
> +/* We're expecting a 2 byte uleb128 encoded value.  */
> +QEMU_BUILD_BUG_ON(FRAME_SIZE >= (1 << 14));
> +
> +/* We're expecting to use a single ADDI insn.  */
> +QEMU_BUILD_BUG_ON(FRAME_SIZE - PUSH_SIZE > 0xfff);
> +
>  static void tcg_target_qemu_prologue(TCGContext *s)
>  {
> -    /* NB: frame sizes are in 16 byte stack units! */
> -    int frame_size_callee_saved, frame_size_tcg_locals;
>      TCGReg r;
>  
> -    /* save pairs             (FP, LR) and (X19, X20) .. (X27, X28) */
> -    frame_size_callee_saved = 16 + (TCG_REG_X28 - TCG_REG_X19 + 1) * 8;
> -
> -    /* frame size requirement for TCG local variables */
> -    frame_size_tcg_locals = TCG_STATIC_CALL_ARGS_SIZE
> -        + CPU_TEMP_BUF_NLONGS * sizeof(long)
> -        + (TCG_TARGET_STACK_ALIGN - 1);
> -    frame_size_tcg_locals &= ~(TCG_TARGET_STACK_ALIGN - 1);
> -
>      /* Push (FP, LR) and allocate space for all saved registers.  */
>      tcg_out_insn(s, 3314, STP, TCG_REG_FP, TCG_REG_LR,
> -                 TCG_REG_SP, -frame_size_callee_saved, 1, 1);
> +                 TCG_REG_SP, -PUSH_SIZE, 1, 1);
>  
>      /* Set up frame pointer for canonical unwinding.  */
>      tcg_out_movr_sp(s, TCG_TYPE_I64, TCG_REG_FP, TCG_REG_SP);
> @@ -1878,7 +1883,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>  
>      /* Make stack space for TCG locals.  */
>      tcg_out_insn(s, 3401, SUBI, TCG_TYPE_I64, TCG_REG_SP, TCG_REG_SP,
> -                 frame_size_tcg_locals);
> +                 FRAME_SIZE - PUSH_SIZE);
>  
>      /* Inform TCG about how to find TCG locals with register, offset, size.  */
>      tcg_set_frame(s, TCG_REG_SP, TCG_STATIC_CALL_ARGS_SIZE,
> @@ -1898,7 +1903,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>  
>      /* Remove TCG locals stack space.  */
>      tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_SP, TCG_REG_SP,
> -                 frame_size_tcg_locals);
> +                 FRAME_SIZE - PUSH_SIZE);
>  
>      /* Restore registers x19..x28.  */
>      for (r = TCG_REG_X19; r <= TCG_REG_X27; r += 2) {
> @@ -1908,6 +1913,55 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>  
>      /* Pop (FP, LR), restore SP to previous frame.  */
>      tcg_out_insn(s, 3314, LDP, TCG_REG_FP, TCG_REG_LR,
> -                 TCG_REG_SP, frame_size_callee_saved, 0, 1);
> +                 TCG_REG_SP, PUSH_SIZE, 0, 1);
>      tcg_out_insn(s, 3207, RET, TCG_REG_LR);
>  }
> +
> +typedef struct {
> +    DebugFrameCIE cie;
> +    DebugFrameFDEHeader fde;
> +    uint8_t fde_def_cfa[4];
> +    uint8_t fde_reg_ofs[24];
> +} DebugFrame;
> +
> +#define ELF_HOST_MACHINE EM_AARCH64
> +
> +static DebugFrame debug_frame = {
> +    .cie.len = sizeof(DebugFrameCIE)-4, /* length after .len member */
> +    .cie.id = -1,
> +    .cie.version = 1,
> +    .cie.code_align = 1,
> +    .cie.data_align = 0x78,             /* sleb128 -8 */
> +    .cie.return_column = TCG_REG_LR,
> +
> +    /* Total FDE size does not include the "len" member.  */
> +    .fde.len = sizeof(DebugFrame) - offsetof(DebugFrame, fde.cie_offset),
> +
> +    .fde_def_cfa = {
> +        12, TCG_REG_SP,                 /* DW_CFA_def_cfa sp, ... */
> +        (FRAME_SIZE & 0x7f) | 0x80,     /* ... uleb128 FRAME_SIZE */
> +        (FRAME_SIZE >> 7)
> +    },
> +    .fde_reg_ofs = {
> +        0x80 + 28, 1,                   /* DW_CFA_offset, x28,  -8 */
> +        0x80 + 27, 2,                   /* DW_CFA_offset, x27, -16 */
> +        0x80 + 26, 3,                   /* DW_CFA_offset, x26, -24 */
> +        0x80 + 25, 4,                   /* DW_CFA_offset, x25, -32 */
> +        0x80 + 24, 5,                   /* DW_CFA_offset, x24, -40 */
> +        0x80 + 23, 6,                   /* DW_CFA_offset, x23, -48 */
> +        0x80 + 22, 7,                   /* DW_CFA_offset, x22, -56 */
> +        0x80 + 21, 8,                   /* DW_CFA_offset, x21, -64 */
> +        0x80 + 20, 9,                   /* DW_CFA_offset, x20, -72 */
> +        0x80 + 19, 10,                  /* DW_CFA_offset, x19, -80 */
> +        0x80 + 30, 11,                  /* DW_CFA_offset,  lr, -88 */
> +        0x80 + 29, 12,                  /* DW_CFA_offset,  fp, -96 */
> +    }
> +};
> +
> +void tcg_register_jit(void *buf, size_t buf_size)
> +{
> +    debug_frame.fde.func_start = (intptr_t)buf;
> +    debug_frame.fde.func_len = buf_size;
> +
> +    tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
> +}
> 
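
One note for other readers (my own check, not part of the patch): the two
fde_def_cfa bytes are simply FRAME_SIZE as a two-byte uleb128, which is why
the QEMU_BUILD_BUG_ON(FRAME_SIZE >= (1 << 14)) above is enough:

    /* Sketch of the encoding used in .fde_def_cfa.  */
    uint8_t lo = (FRAME_SIZE & 0x7f) | 0x80;    /* low 7 bits, continuation bit set */
    uint8_t hi = FRAME_SIZE >> 7;               /* high bits, < 0x80 by the assert */
    /* decoder: (lo & 0x7f) | (hi << 7) == FRAME_SIZE */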

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 18/26] tcg-aarch64: Pass qemu_ld/st arguments directly
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 18/26] tcg-aarch64: Pass qemu_ld/st arguments directly Richard Henderson
@ 2014-04-11 12:34   ` Claudio Fontana
  0 siblings, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-11 12:34 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> Instead of passing them the "args" array.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 49 +++++++++++++++++-------------------------------
>  1 file changed, 17 insertions(+), 32 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index 68305ea..3a2955f 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -1271,20 +1271,13 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp memop,
>      }
>  }
>  
> -static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp memop)
> +static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
> +                            TCGMemOp memop, int mem_index)
>  {
> -    TCGReg addr_reg, data_reg;
>  #ifdef CONFIG_SOFTMMU
> -    int mem_index;
> -    TCGMemOp s_bits;
> +    TCGMemOp s_bits = memop & MO_SIZE;
>      uint8_t *label_ptr;
> -#endif
> -    data_reg = args[0];
> -    addr_reg = args[1];
>  
> -#ifdef CONFIG_SOFTMMU
> -    mem_index = args[2];
> -    s_bits = memop & MO_SIZE;
>      tcg_out_tlb_read(s, addr_reg, s_bits, &label_ptr, mem_index, 1);
>      tcg_out_qemu_ld_direct(s, memop, data_reg, addr_reg, TCG_REG_X1);
>      add_qemu_ldst_label(s, 1, memop, data_reg, addr_reg,
> @@ -1295,20 +1288,12 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp memop)
>  #endif /* CONFIG_SOFTMMU */
>  }
>  
> -static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp memop)
> +static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
> +                            TCGMemOp memop, int mem_index)
>  {
> -    TCGReg addr_reg, data_reg;
>  #ifdef CONFIG_SOFTMMU
> -    int mem_index;
> -    TCGMemOp s_bits;
> +    TCGMemOp s_bits = memop & MO_SIZE;
>      uint8_t *label_ptr;
> -#endif
> -    data_reg = args[0];
> -    addr_reg = args[1];
> -
> -#ifdef CONFIG_SOFTMMU
> -    mem_index = args[2];
> -    s_bits = memop & MO_SIZE;
>  
>      tcg_out_tlb_read(s, addr_reg, s_bits, &label_ptr, mem_index, 0);
>      tcg_out_qemu_st_direct(s, memop, data_reg, addr_reg, TCG_REG_X1);
> @@ -1588,38 +1573,38 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>  
>      case INDEX_op_qemu_ld8u:
> -        tcg_out_qemu_ld(s, args, MO_UB);
> +        tcg_out_qemu_ld(s, a0, a1, MO_UB, a2);
>          break;
>      case INDEX_op_qemu_ld8s:
> -        tcg_out_qemu_ld(s, args, MO_SB);
> +        tcg_out_qemu_ld(s, a0, a1, MO_SB, a2);
>          break;
>      case INDEX_op_qemu_ld16u:
> -        tcg_out_qemu_ld(s, args, MO_TEUW);
> +        tcg_out_qemu_ld(s, a0, a1, MO_TEUW, a2);
>          break;
>      case INDEX_op_qemu_ld16s:
> -        tcg_out_qemu_ld(s, args, MO_TESW);
> +        tcg_out_qemu_ld(s, a0, a1, MO_TESW, a2);
>          break;
>      case INDEX_op_qemu_ld32u:
>      case INDEX_op_qemu_ld32:
> -        tcg_out_qemu_ld(s, args, MO_TEUL);
> +        tcg_out_qemu_ld(s, a0, a1, MO_TEUL, a2);
>          break;
>      case INDEX_op_qemu_ld32s:
> -        tcg_out_qemu_ld(s, args, MO_TESL);
> +        tcg_out_qemu_ld(s, a0, a1, MO_TESL, a2);
>          break;
>      case INDEX_op_qemu_ld64:
> -        tcg_out_qemu_ld(s, args, MO_TEQ);
> +        tcg_out_qemu_ld(s, a0, a1, MO_TEQ, a2);
>          break;
>      case INDEX_op_qemu_st8:
> -        tcg_out_qemu_st(s, args, MO_UB);
> +        tcg_out_qemu_st(s, a0, a1, MO_UB, a2);
>          break;
>      case INDEX_op_qemu_st16:
> -        tcg_out_qemu_st(s, args, MO_TEUW);
> +        tcg_out_qemu_st(s, a0, a1, MO_TEUW, a2);
>          break;
>      case INDEX_op_qemu_st32:
> -        tcg_out_qemu_st(s, args, MO_TEUL);
> +        tcg_out_qemu_st(s, a0, a1, MO_TEUL, a2);
>          break;
>      case INDEX_op_qemu_st64:
> -        tcg_out_qemu_st(s, args, MO_TEQ);
> +        tcg_out_qemu_st(s, a0, a1, MO_TEQ, a2);
>          break;
>  
>      case INDEX_op_bswap32_i64:
> 

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 20/26] tcg-aarch64: Support stores of zero
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 20/26] tcg-aarch64: Support stores of zero Richard Henderson
@ 2014-04-11 12:34   ` Claudio Fontana
  0 siblings, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-11 12:34 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 35 +++++++++++++++++++----------------
>  1 file changed, 19 insertions(+), 16 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index 34e477d..caaf8a2 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -1253,21 +1253,21 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp memop,
>          tcg_out_ldst_r(s, LDST_8, LDST_ST, data_r, addr_r, off_r);
>          break;
>      case MO_16:
> -        if (bswap) {
> +        if (bswap && data_r != TCG_REG_XZR) {
>              tcg_out_rev16(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
>              data_r = TCG_REG_TMP;
>          }
>          tcg_out_ldst_r(s, LDST_16, LDST_ST, data_r, addr_r, off_r);
>          break;
>      case MO_32:
> -        if (bswap) {
> +        if (bswap && data_r != TCG_REG_XZR) {
>              tcg_out_rev(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
>              data_r = TCG_REG_TMP;
>          }
>          tcg_out_ldst_r(s, LDST_32, LDST_ST, data_r, addr_r, off_r);
>          break;
>      case MO_64:
> -        if (bswap) {
> +        if (bswap && data_r != TCG_REG_XZR) {
>              tcg_out_rev(s, TCG_TYPE_I64, TCG_REG_TMP, data_r);
>              data_r = TCG_REG_TMP;
>          }
> @@ -1364,8 +1364,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>  
>      case INDEX_op_ld_i32:
>      case INDEX_op_ld_i64:
> -    case INDEX_op_st_i32:
> -    case INDEX_op_st_i64:
>      case INDEX_op_ld8u_i32:
>      case INDEX_op_ld8s_i32:
>      case INDEX_op_ld16u_i32:
> @@ -1376,13 +1374,18 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>      case INDEX_op_ld16s_i64:
>      case INDEX_op_ld32u_i64:
>      case INDEX_op_ld32s_i64:
> +        tcg_out_ldst(s, aarch64_ldst_get_data(opc), aarch64_ldst_get_type(opc),
> +                     a0, a1, a2);
> +        break;
> +    case INDEX_op_st_i32:
> +    case INDEX_op_st_i64:
>      case INDEX_op_st8_i32:
>      case INDEX_op_st8_i64:
>      case INDEX_op_st16_i32:
>      case INDEX_op_st16_i64:
>      case INDEX_op_st32_i64:
>          tcg_out_ldst(s, aarch64_ldst_get_data(opc), aarch64_ldst_get_type(opc),
> -                     a0, a1, a2);
> +                     REG0(0), a1, a2);
>          break;
>  
>      case INDEX_op_add_i32:
> @@ -1585,7 +1588,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          break;
>      case INDEX_op_qemu_st_i32:
>      case INDEX_op_qemu_st_i64:
> -        tcg_out_qemu_st(s, a0, a1, a2, args[3]);
> +        tcg_out_qemu_st(s, REG0(0), a1, a2, args[3]);
>          break;
>  
>      case INDEX_op_bswap32_i64:
> @@ -1693,13 +1696,13 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
>      { INDEX_op_ld32s_i64, { "r", "r" } },
>      { INDEX_op_ld_i64, { "r", "r" } },
>  
> -    { INDEX_op_st8_i32, { "r", "r" } },
> -    { INDEX_op_st16_i32, { "r", "r" } },
> -    { INDEX_op_st_i32, { "r", "r" } },
> -    { INDEX_op_st8_i64, { "r", "r" } },
> -    { INDEX_op_st16_i64, { "r", "r" } },
> -    { INDEX_op_st32_i64, { "r", "r" } },
> -    { INDEX_op_st_i64, { "r", "r" } },
> +    { INDEX_op_st8_i32, { "rZ", "r" } },
> +    { INDEX_op_st16_i32, { "rZ", "r" } },
> +    { INDEX_op_st_i32, { "rZ", "r" } },
> +    { INDEX_op_st8_i64, { "rZ", "r" } },
> +    { INDEX_op_st16_i64, { "rZ", "r" } },
> +    { INDEX_op_st32_i64, { "rZ", "r" } },
> +    { INDEX_op_st_i64, { "rZ", "r" } },
>  
>      { INDEX_op_add_i32, { "r", "r", "rwA" } },
>      { INDEX_op_add_i64, { "r", "r", "rA" } },
> @@ -1753,8 +1756,8 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
>  
>      { INDEX_op_qemu_ld_i32, { "r", "l" } },
>      { INDEX_op_qemu_ld_i64, { "r", "l" } },
> -    { INDEX_op_qemu_st_i32, { "l", "l" } },
> -    { INDEX_op_qemu_st_i64, { "l", "l" } },
> +    { INDEX_op_qemu_st_i32, { "lZ", "l" } },
> +    { INDEX_op_qemu_st_i64, { "lZ", "l" } },
>  
>      { INDEX_op_bswap16_i32, { "r", "r" } },
>      { INDEX_op_bswap32_i32, { "r", "r" } },
> 

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 22/26] tcg-aarch64: Merge aarch64_ldst_get_data/type into tcg_out_op
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 22/26] tcg-aarch64: Merge aarch64_ldst_get_data/type into tcg_out_op Richard Henderson
@ 2014-04-11 12:34   ` Claudio Fontana
  0 siblings, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-11 12:34 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 115 +++++++++++++----------------------------------
>  1 file changed, 32 insertions(+), 83 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index de7490d..5ecc20c 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -355,78 +355,6 @@ typedef enum {
>      I3510_ANDS      = 0x6a000000,
>  } AArch64Insn;
>  
> -static inline enum aarch64_ldst_op_data
> -aarch64_ldst_get_data(TCGOpcode tcg_op)
> -{
> -    switch (tcg_op) {
> -    case INDEX_op_ld8u_i32:
> -    case INDEX_op_ld8s_i32:
> -    case INDEX_op_ld8u_i64:
> -    case INDEX_op_ld8s_i64:
> -    case INDEX_op_st8_i32:
> -    case INDEX_op_st8_i64:
> -        return LDST_8;
> -
> -    case INDEX_op_ld16u_i32:
> -    case INDEX_op_ld16s_i32:
> -    case INDEX_op_ld16u_i64:
> -    case INDEX_op_ld16s_i64:
> -    case INDEX_op_st16_i32:
> -    case INDEX_op_st16_i64:
> -        return LDST_16;
> -
> -    case INDEX_op_ld_i32:
> -    case INDEX_op_st_i32:
> -    case INDEX_op_ld32u_i64:
> -    case INDEX_op_ld32s_i64:
> -    case INDEX_op_st32_i64:
> -        return LDST_32;
> -
> -    case INDEX_op_ld_i64:
> -    case INDEX_op_st_i64:
> -        return LDST_64;
> -
> -    default:
> -        tcg_abort();
> -    }
> -}
> -
> -static inline enum aarch64_ldst_op_type
> -aarch64_ldst_get_type(TCGOpcode tcg_op)
> -{
> -    switch (tcg_op) {
> -    case INDEX_op_st8_i32:
> -    case INDEX_op_st16_i32:
> -    case INDEX_op_st8_i64:
> -    case INDEX_op_st16_i64:
> -    case INDEX_op_st_i32:
> -    case INDEX_op_st32_i64:
> -    case INDEX_op_st_i64:
> -        return LDST_ST;
> -
> -    case INDEX_op_ld8u_i32:
> -    case INDEX_op_ld16u_i32:
> -    case INDEX_op_ld8u_i64:
> -    case INDEX_op_ld16u_i64:
> -    case INDEX_op_ld_i32:
> -    case INDEX_op_ld32u_i64:
> -    case INDEX_op_ld_i64:
> -        return LDST_LD;
> -
> -    case INDEX_op_ld8s_i32:
> -    case INDEX_op_ld16s_i32:
> -        return LDST_LD_S_W;
> -
> -    case INDEX_op_ld8s_i64:
> -    case INDEX_op_ld16s_i64:
> -    case INDEX_op_ld32s_i64:
> -        return LDST_LD_S_X;
> -
> -    default:
> -        tcg_abort();
> -    }
> -}
> -
>  static inline uint32_t tcg_in32(TCGContext *s)
>  {
>      uint32_t v = *(uint32_t *)s->code_ptr;
> @@ -1368,30 +1296,51 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          tcg_out_goto_label(s, a0);
>          break;
>  
> -    case INDEX_op_ld_i32:
> -    case INDEX_op_ld_i64:
>      case INDEX_op_ld8u_i32:
> -    case INDEX_op_ld8s_i32:
> -    case INDEX_op_ld16u_i32:
> -    case INDEX_op_ld16s_i32:
>      case INDEX_op_ld8u_i64:
> +        tcg_out_ldst(s, LDST_8, LDST_LD, a0, a1, a2);
> +        break;
> +    case INDEX_op_ld8s_i32:
> +        tcg_out_ldst(s, LDST_8, LDST_LD_S_W, a0, a1, a2);
> +        break;
>      case INDEX_op_ld8s_i64:
> +        tcg_out_ldst(s, LDST_8, LDST_LD_S_X, a0, a1, a2);
> +        break;
> +    case INDEX_op_ld16u_i32:
>      case INDEX_op_ld16u_i64:
> +        tcg_out_ldst(s, LDST_16, LDST_LD, a0, a1, a2);
> +        break;
> +    case INDEX_op_ld16s_i32:
> +        tcg_out_ldst(s, LDST_16, LDST_LD_S_W, a0, a1, a2);
> +        break;
>      case INDEX_op_ld16s_i64:
> +        tcg_out_ldst(s, LDST_16, LDST_LD_S_X, a0, a1, a2);
> +        break;
> +    case INDEX_op_ld_i32:
>      case INDEX_op_ld32u_i64:
> +        tcg_out_ldst(s, LDST_32, LDST_LD, a0, a1, a2);
> +        break;
>      case INDEX_op_ld32s_i64:
> -        tcg_out_ldst(s, aarch64_ldst_get_data(opc), aarch64_ldst_get_type(opc),
> -                     a0, a1, a2);
> +        tcg_out_ldst(s, LDST_32, LDST_LD_S_X, a0, a1, a2);
>          break;
> -    case INDEX_op_st_i32:
> -    case INDEX_op_st_i64:
> +    case INDEX_op_ld_i64:
> +        tcg_out_ldst(s, LDST_64, LDST_LD, a0, a1, a2);
> +        break;
> +
>      case INDEX_op_st8_i32:
>      case INDEX_op_st8_i64:
> +        tcg_out_ldst(s, LDST_8, LDST_ST, REG0(0), a1, a2);
> +        break;
>      case INDEX_op_st16_i32:
>      case INDEX_op_st16_i64:
> +        tcg_out_ldst(s, LDST_16, LDST_ST, REG0(0), a1, a2);
> +        break;
> +    case INDEX_op_st_i32:
>      case INDEX_op_st32_i64:
> -        tcg_out_ldst(s, aarch64_ldst_get_data(opc), aarch64_ldst_get_type(opc),
> -                     REG0(0), a1, a2);
> +        tcg_out_ldst(s, LDST_32, LDST_ST, REG0(0), a1, a2);
> +        break;
> +    case INDEX_op_st_i64:
> +        tcg_out_ldst(s, LDST_64, LDST_ST, REG0(0), a1, a2);
>          break;
>  
>      case INDEX_op_add_i32:
> 

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 23/26] tcg-aarch64: Replace aarch64_ldst_op_data with TCGMemOp
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 23/26] tcg-aarch64: Replace aarch64_ldst_op_data with TCGMemOp Richard Henderson
@ 2014-04-11 12:35   ` Claudio Fontana
  0 siblings, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-11 12:35 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> The definition of op_data included opcode bits, not just
> the size field of the various ldst instructions.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 111 +++++++++++++++++++++--------------------------
>  1 file changed, 49 insertions(+), 62 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index 5ecc20c..9a2e4a6 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -242,13 +242,6 @@ static const enum aarch64_cond_code tcg_cond_to_aarch64[] = {
>      [TCG_COND_LEU] = COND_LS,
>  };
>  
> -/* opcodes for LDR / STR instructions with base + simm9 addressing */
> -enum aarch64_ldst_op_data { /* size of the data moved */
> -    LDST_8 = 0x38,
> -    LDST_16 = 0x78,
> -    LDST_32 = 0xb8,
> -    LDST_64 = 0xf8,
> -};
>  enum aarch64_ldst_op_type { /* type of operation */
>      LDST_ST = 0x0,    /* store */
>      LDST_LD = 0x4,    /* load */
> @@ -490,25 +483,23 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
>  }
>  
>  
> -static inline void tcg_out_ldst_9(TCGContext *s,
> -                                  enum aarch64_ldst_op_data op_data,
> +static inline void tcg_out_ldst_9(TCGContext *s, TCGMemOp size,
>                                    enum aarch64_ldst_op_type op_type,
>                                    TCGReg rd, TCGReg rn, intptr_t offset)
>  {
>      /* use LDUR with BASE register with 9bit signed unscaled offset */
> -    tcg_out32(s, op_data << 24 | op_type << 20
> +    tcg_out32(s, 0x38000000 | size << 30 | op_type << 20
>                | (offset & 0x1ff) << 12 | rn << 5 | rd);
>  }
>  
>  /* tcg_out_ldst_12 expects a scaled unsigned immediate offset */
> -static inline void tcg_out_ldst_12(TCGContext *s,
> -                                   enum aarch64_ldst_op_data op_data,
> +static inline void tcg_out_ldst_12(TCGContext *s, TCGMemOp size,
>                                     enum aarch64_ldst_op_type op_type,
>                                     TCGReg rd, TCGReg rn,
>                                     tcg_target_ulong scaled_uimm)
>  {
> -    tcg_out32(s, (op_data | 1) << 24
> -              | op_type << 20 | scaled_uimm << 10 | rn << 5 | rd);
> +    tcg_out32(s, 0x39000000 | size << 30 | op_type << 20
> +              | scaled_uimm << 10 | rn << 5 | rd);
>  }
>  
>  /* Register to register move using ORR (shifted register with no shift). */
> @@ -646,44 +637,40 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
>      }
>  }
>  
> -static inline void tcg_out_ldst_r(TCGContext *s,
> -                                  enum aarch64_ldst_op_data op_data,
> +static inline void tcg_out_ldst_r(TCGContext *s, TCGMemOp size,
>                                    enum aarch64_ldst_op_type op_type,
>                                    TCGReg rd, TCGReg base, TCGReg regoff)
>  {
>      /* load from memory to register using base + 64bit register offset */
>      /* using f.e. STR Wt, [Xn, Xm] 0xb8600800|(regoff << 16)|(base << 5)|rd */
>      /* the 0x6000 is for the "no extend field" */
> -    tcg_out32(s, 0x00206800
> -              | op_data << 24 | op_type << 20 | regoff << 16 | base << 5 | rd);
> +    tcg_out32(s, 0x38206800 | size << 30 | op_type << 20
> +              | regoff << 16 | base << 5 | rd);
>  }
>  
>  /* solve the whole ldst problem */
> -static inline void tcg_out_ldst(TCGContext *s, enum aarch64_ldst_op_data data,
> +static inline void tcg_out_ldst(TCGContext *s, TCGMemOp size,
>                                  enum aarch64_ldst_op_type type,
>                                  TCGReg rd, TCGReg rn, intptr_t offset)
>  {
>      if (offset >= -256 && offset < 256) {
> -        tcg_out_ldst_9(s, data, type, rd, rn, offset);
> +        tcg_out_ldst_9(s, size, type, rd, rn, offset);
>          return;
>      }
>  
> -    if (offset >= 256) {
> -        /* if the offset is naturally aligned and in range,
> -           then we can use the scaled uimm12 encoding */
> -        unsigned int s_bits = data >> 6;
> -        if (!(offset & ((1 << s_bits) - 1))) {
> -            tcg_target_ulong scaled_uimm = offset >> s_bits;
> -            if (scaled_uimm <= 0xfff) {
> -                tcg_out_ldst_12(s, data, type, rd, rn, scaled_uimm);
> -                return;
> -            }
> +    /* If the offset is naturally aligned and in range, then we can
> +       use the scaled uimm12 encoding */
> +    if (offset >= 0 && !(offset & ((1 << size) - 1))) {
> +        tcg_target_ulong scaled_uimm = offset >> size;
> +        if (scaled_uimm <= 0xfff) {
> +            tcg_out_ldst_12(s, size, type, rd, rn, scaled_uimm);
> +            return;
>          }
>      }
>  
> -    /* worst-case scenario, move offset to temp register, use reg offset */
> +    /* Worst-case scenario, move offset to temp register, use reg offset.  */
>      tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, offset);
> -    tcg_out_ldst_r(s, data, type, rd, rn, TCG_REG_TMP);
> +    tcg_out_ldst_r(s, size, type, rd, rn, TCG_REG_TMP);
>  }
>  
>  static inline void tcg_out_mov(TCGContext *s,
> @@ -697,14 +684,14 @@ static inline void tcg_out_mov(TCGContext *s,
>  static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
>                                TCGReg arg1, intptr_t arg2)
>  {
> -    tcg_out_ldst(s, (type == TCG_TYPE_I64) ? LDST_64 : LDST_32, LDST_LD,
> +    tcg_out_ldst(s, type == TCG_TYPE_I64 ? MO_64 : MO_32, LDST_LD,
>                   arg, arg1, arg2);
>  }
>  
>  static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
>                                TCGReg arg1, intptr_t arg2)
>  {
> -    tcg_out_ldst(s, (type == TCG_TYPE_I64) ? LDST_64 : LDST_32, LDST_ST,
> +    tcg_out_ldst(s, type == TCG_TYPE_I64 ? MO_64 : MO_32, LDST_ST,
>                   arg, arg1, arg2);
>  }
>  
> @@ -1104,12 +1091,12 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp s_bits,
>  
>      /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
>         X0 = load [X2 + (tlb_offset & 0x000fff)] */
> -    tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? LDST_64 : LDST_32,
> +    tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? MO_64 : MO_32,
>                   LDST_LD, TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff);
>  
>      /* Load the tlb addend. Do that early to avoid stalling.
>         X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */
> -    tcg_out_ldst(s, LDST_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
> +    tcg_out_ldst(s, MO_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
>                   (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) -
>                   (is_read ? offsetof(CPUTLBEntry, addr_read)
>                    : offsetof(CPUTLBEntry, addr_write)));
> @@ -1131,43 +1118,43 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp memop,
>  
>      switch (memop & MO_SSIZE) {
>      case MO_UB:
> -        tcg_out_ldst_r(s, LDST_8, LDST_LD, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, MO_8, LDST_LD, data_r, addr_r, off_r);
>          break;
>      case MO_SB:
> -        tcg_out_ldst_r(s, LDST_8, LDST_LD_S_X, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, MO_8, LDST_LD_S_X, data_r, addr_r, off_r);
>          break;
>      case MO_UW:
> -        tcg_out_ldst_r(s, LDST_16, LDST_LD, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, MO_16, LDST_LD, data_r, addr_r, off_r);
>          if (bswap) {
>              tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
>          }
>          break;
>      case MO_SW:
>          if (bswap) {
> -            tcg_out_ldst_r(s, LDST_16, LDST_LD, data_r, addr_r, off_r);
> +            tcg_out_ldst_r(s, MO_16, LDST_LD, data_r, addr_r, off_r);
>              tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
>              tcg_out_sxt(s, TCG_TYPE_I64, MO_16, data_r, data_r);
>          } else {
> -            tcg_out_ldst_r(s, LDST_16, LDST_LD_S_X, data_r, addr_r, off_r);
> +            tcg_out_ldst_r(s, MO_16, LDST_LD_S_X, data_r, addr_r, off_r);
>          }
>          break;
>      case MO_UL:
> -        tcg_out_ldst_r(s, LDST_32, LDST_LD, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, MO_32, LDST_LD, data_r, addr_r, off_r);
>          if (bswap) {
>              tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
>          }
>          break;
>      case MO_SL:
>          if (bswap) {
> -            tcg_out_ldst_r(s, LDST_32, LDST_LD, data_r, addr_r, off_r);
> +            tcg_out_ldst_r(s, MO_32, LDST_LD, data_r, addr_r, off_r);
>              tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
>              tcg_out_sxt(s, TCG_TYPE_I64, MO_32, data_r, data_r);
>          } else {
> -            tcg_out_ldst_r(s, LDST_32, LDST_LD_S_X, data_r, addr_r, off_r);
> +            tcg_out_ldst_r(s, MO_32, LDST_LD_S_X, data_r, addr_r, off_r);
>          }
>          break;
>      case MO_Q:
> -        tcg_out_ldst_r(s, LDST_64, LDST_LD, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, MO_64, LDST_LD, data_r, addr_r, off_r);
>          if (bswap) {
>              tcg_out_rev(s, TCG_TYPE_I64, data_r, data_r);
>          }
> @@ -1184,28 +1171,28 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp memop,
>  
>      switch (memop & MO_SIZE) {
>      case MO_8:
> -        tcg_out_ldst_r(s, LDST_8, LDST_ST, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, MO_8, LDST_ST, data_r, addr_r, off_r);
>          break;
>      case MO_16:
>          if (bswap && data_r != TCG_REG_XZR) {
>              tcg_out_rev16(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
>              data_r = TCG_REG_TMP;
>          }
> -        tcg_out_ldst_r(s, LDST_16, LDST_ST, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, MO_16, LDST_ST, data_r, addr_r, off_r);
>          break;
>      case MO_32:
>          if (bswap && data_r != TCG_REG_XZR) {
>              tcg_out_rev(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
>              data_r = TCG_REG_TMP;
>          }
> -        tcg_out_ldst_r(s, LDST_32, LDST_ST, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, MO_32, LDST_ST, data_r, addr_r, off_r);
>          break;
>      case MO_64:
>          if (bswap && data_r != TCG_REG_XZR) {
>              tcg_out_rev(s, TCG_TYPE_I64, TCG_REG_TMP, data_r);
>              data_r = TCG_REG_TMP;
>          }
> -        tcg_out_ldst_r(s, LDST_64, LDST_ST, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, MO_64, LDST_ST, data_r, addr_r, off_r);
>          break;
>      default:
>          tcg_abort();
> @@ -1298,49 +1285,49 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>  
>      case INDEX_op_ld8u_i32:
>      case INDEX_op_ld8u_i64:
> -        tcg_out_ldst(s, LDST_8, LDST_LD, a0, a1, a2);
> +        tcg_out_ldst(s, MO_8, LDST_LD, a0, a1, a2);
>          break;
>      case INDEX_op_ld8s_i32:
> -        tcg_out_ldst(s, LDST_8, LDST_LD_S_W, a0, a1, a2);
> +        tcg_out_ldst(s, MO_8, LDST_LD_S_W, a0, a1, a2);
>          break;
>      case INDEX_op_ld8s_i64:
> -        tcg_out_ldst(s, LDST_8, LDST_LD_S_X, a0, a1, a2);
> +        tcg_out_ldst(s, MO_8, LDST_LD_S_X, a0, a1, a2);
>          break;
>      case INDEX_op_ld16u_i32:
>      case INDEX_op_ld16u_i64:
> -        tcg_out_ldst(s, LDST_16, LDST_LD, a0, a1, a2);
> +        tcg_out_ldst(s, MO_16, LDST_LD, a0, a1, a2);
>          break;
>      case INDEX_op_ld16s_i32:
> -        tcg_out_ldst(s, LDST_16, LDST_LD_S_W, a0, a1, a2);
> +        tcg_out_ldst(s, MO_16, LDST_LD_S_W, a0, a1, a2);
>          break;
>      case INDEX_op_ld16s_i64:
> -        tcg_out_ldst(s, LDST_16, LDST_LD_S_X, a0, a1, a2);
> +        tcg_out_ldst(s, MO_16, LDST_LD_S_X, a0, a1, a2);
>          break;
>      case INDEX_op_ld_i32:
>      case INDEX_op_ld32u_i64:
> -        tcg_out_ldst(s, LDST_32, LDST_LD, a0, a1, a2);
> +        tcg_out_ldst(s, MO_32, LDST_LD, a0, a1, a2);
>          break;
>      case INDEX_op_ld32s_i64:
> -        tcg_out_ldst(s, LDST_32, LDST_LD_S_X, a0, a1, a2);
> +        tcg_out_ldst(s, MO_32, LDST_LD_S_X, a0, a1, a2);
>          break;
>      case INDEX_op_ld_i64:
> -        tcg_out_ldst(s, LDST_64, LDST_LD, a0, a1, a2);
> +        tcg_out_ldst(s, MO_64, LDST_LD, a0, a1, a2);
>          break;
>  
>      case INDEX_op_st8_i32:
>      case INDEX_op_st8_i64:
> -        tcg_out_ldst(s, LDST_8, LDST_ST, REG0(0), a1, a2);
> +        tcg_out_ldst(s, MO_8, LDST_ST, REG0(0), a1, a2);
>          break;
>      case INDEX_op_st16_i32:
>      case INDEX_op_st16_i64:
> -        tcg_out_ldst(s, LDST_16, LDST_ST, REG0(0), a1, a2);
> +        tcg_out_ldst(s, MO_16, LDST_ST, REG0(0), a1, a2);
>          break;
>      case INDEX_op_st_i32:
>      case INDEX_op_st32_i64:
> -        tcg_out_ldst(s, LDST_32, LDST_ST, REG0(0), a1, a2);
> +        tcg_out_ldst(s, MO_32, LDST_ST, REG0(0), a1, a2);
>          break;
>      case INDEX_op_st_i64:
> -        tcg_out_ldst(s, LDST_64, LDST_ST, REG0(0), a1, a2);
> +        tcg_out_ldst(s, MO_64, LDST_ST, REG0(0), a1, a2);
>          break;
>  
>      case INDEX_op_add_i32:
> 

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
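
For reference, the equivalence the commit message relies on can be checked in
isolation: each old LDST_* value was the 0x38 opcode byte with the size field
already folded into its top two bits, so the old "op_data << 24" and the new
"0x38000000 | size << 30" produce identical instruction words.  A minimal
standalone sketch, not code from the patch; the MO_* size codes 0..3 are an
assumption here, consistent with the "data >> 6" arithmetic in the old hunk:

    #include <assert.h>
    #include <stdint.h>

    enum { LDST_8 = 0x38, LDST_16 = 0x78, LDST_32 = 0xb8, LDST_64 = 0xf8 };
    enum { MO_8, MO_16, MO_32, MO_64 };

    int main(void)
    {
        static const uint32_t old[4] = { LDST_8, LDST_16, LDST_32, LDST_64 };
        int size;

        for (size = MO_8; size <= MO_64; size++) {
            /* Old encoding: op_data << 24.  New encoding: base | size << 30. */
            assert((old[size] << 24) == (0x38000000u | ((uint32_t)size << 30)));
        }
        return 0;
    }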

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 24/26] tcg-aarch64: Replace aarch64_ldst_op_data with AArch64LdstType
  2014-04-07 14:31     ` Richard Henderson
@ 2014-04-11 12:35       ` Claudio Fontana
  0 siblings, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-11 12:35 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>

On 07.04.2014 16:31, Richard Henderson wrote:
> On 04/07/2014 04:45 AM, Claudio Fontana wrote:
>> On 03.04.2014 21:56, Richard Henderson wrote:
>>> The definition of op_type wasn't encoded at the proper shift for
>>> its field, making the implementations confusing.
>>>
>>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>>
>> At the end of the day the magic values remain in the load/store instructions though.
>> Can we find a way to replace them with INSN_-something like for the others?
>>
>> I think I was doing something of the sort in a now obsolete patch I suggested some time early this year, see if it helps:
>>
>> http://lists.gnu.org/archive/html/qemu-devel/2014-02/msg05074.html
> 
> Yes, we can.  I'll do something for v3.
> 
>>
>> Claudio
>>
>>> ---
>>>  tcg/aarch64/tcg-target.c | 42 +++++++++++++++++-------------------------
>>>  1 file changed, 17 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
>>> index 9a2e4a6..a538a87 100644
>>> --- a/tcg/aarch64/tcg-target.c
>>> +++ b/tcg/aarch64/tcg-target.c
>>> @@ -242,12 +242,12 @@ static const enum aarch64_cond_code tcg_cond_to_aarch64[] = {
>>>      [TCG_COND_LEU] = COND_LS,
>>>  };
>>>  
>>> -enum aarch64_ldst_op_type { /* type of operation */
>>> -    LDST_ST = 0x0,    /* store */
>>> -    LDST_LD = 0x4,    /* load */
>>> -    LDST_LD_S_X = 0x8,  /* load and sign-extend into Xt */
>>> -    LDST_LD_S_W = 0xc,  /* load and sign-extend into Wt */
>>> -};
>>> +typedef enum {
>>> +    LDST_ST = 0,    /* store */
>>> +    LDST_LD = 1,    /* load */
>>> +    LDST_LD_S_X = 2,  /* load and sign-extend into Xt */
>>> +    LDST_LD_S_W = 3,  /* load and sign-extend into Wt */
>>> +} AArch64LdstType;
>>>  
>>>  /* We encode the format of the insn into the beginning of the name, so that
>>>     we can have the preprocessor help "typecheck" the insn vs the output
>>> @@ -483,22 +483,19 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
>>>  }
>>>  
>>>  
>>> -static inline void tcg_out_ldst_9(TCGContext *s, TCGMemOp size,
>>> -                                  enum aarch64_ldst_op_type op_type,
>>> -                                  TCGReg rd, TCGReg rn, intptr_t offset)
>>> +static void tcg_out_ldst_9(TCGContext *s, TCGMemOp size, AArch64LdstType type,
>>> +                           TCGReg rd, TCGReg rn, intptr_t offset)
>>>  {
>>>      /* use LDUR with BASE register with 9bit signed unscaled offset */
>>> -    tcg_out32(s, 0x38000000 | size << 30 | op_type << 20
>>> +    tcg_out32(s, 0x38000000 | size << 30 | type << 22
>>>                | (offset & 0x1ff) << 12 | rn << 5 | rd);
>>>  }
>>>  
>>>  /* tcg_out_ldst_12 expects a scaled unsigned immediate offset */
>>> -static inline void tcg_out_ldst_12(TCGContext *s, TCGMemOp size,
>>> -                                   enum aarch64_ldst_op_type op_type,
>>> -                                   TCGReg rd, TCGReg rn,
>>> -                                   tcg_target_ulong scaled_uimm)
>>> +static void tcg_out_ldst_12(TCGContext *s, TCGMemOp size, AArch64LdstType type,
>>> +                            TCGReg rd, TCGReg rn, tcg_target_ulong scaled_uimm)
>>>  {
>>> -    tcg_out32(s, 0x39000000 | size << 30 | op_type << 20
>>> +    tcg_out32(s, 0x39000000 | size << 30 | type << 22
>>>                | scaled_uimm << 10 | rn << 5 | rd);
>>>  }
>>>  
>>> @@ -637,21 +634,16 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
>>>      }
>>>  }
>>>  
>>> -static inline void tcg_out_ldst_r(TCGContext *s, TCGMemOp size,
>>> -                                  enum aarch64_ldst_op_type op_type,
>>> -                                  TCGReg rd, TCGReg base, TCGReg regoff)
>>> +static void tcg_out_ldst_r(TCGContext *s, TCGMemOp size, AArch64LdstType type,
>>> +                           TCGReg rd, TCGReg base, TCGReg regoff)
>>>  {
>>> -    /* load from memory to register using base + 64bit register offset */
>>> -    /* using f.e. STR Wt, [Xn, Xm] 0xb8600800|(regoff << 16)|(base << 5)|rd */
>>> -    /* the 0x6000 is for the "no extend field" */
>>> -    tcg_out32(s, 0x38206800 | size << 30 | op_type << 20
>>> +    tcg_out32(s, 0x38206800 | size << 30 | type << 22
>>>                | regoff << 16 | base << 5 | rd);
>>>  }
>>>  
>>>  /* solve the whole ldst problem */
>>> -static inline void tcg_out_ldst(TCGContext *s, TCGMemOp size,
>>> -                                enum aarch64_ldst_op_type type,
>>> -                                TCGReg rd, TCGReg rn, intptr_t offset)
>>> +static void tcg_out_ldst(TCGContext *s, TCGMemOp size, AArch64LdstType type,
>>> +                         TCGReg rd, TCGReg rn, intptr_t offset)
>>>  {
>>>      if (offset >= -256 && offset < 256) {
>>>          tcg_out_ldst_9(s, size, type, rd, rn, offset);
>>>
>>
> 
> 


-- 
Claudio Fontana
Server Virtualization Architect
Huawei Technologies Duesseldorf GmbH
Riesstraße 25 - 80992 München

office: +49 89 158834 4135
mobile: +49 15253060158
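
As a cross-check on the shift change discussed above (a standalone sketch,
not part of the patch): the old aarch64_ldst_op_type values were already
pre-shifted by two bits, so the old "op_type << 20" and the new "type << 22"
select exactly the same instruction field.

    #include <assert.h>

    int main(void)
    {
        /* Old, pre-shifted values vs. new raw field values, from the hunk. */
        enum { OLD_ST = 0x0, OLD_LD = 0x4, OLD_LD_S_X = 0x8, OLD_LD_S_W = 0xc };
        enum { NEW_ST = 0,   NEW_LD = 1,   NEW_LD_S_X = 2,   NEW_LD_S_W = 3 };

        assert((OLD_ST     << 20) == (NEW_ST     << 22));
        assert((OLD_LD     << 20) == (NEW_LD     << 22));
        assert((OLD_LD_S_X << 20) == (NEW_LD_S_X << 22));
        assert((OLD_LD_S_W << 20) == (NEW_LD_S_W << 22));
        return 0;
    }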

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 25/26] tcg-aarch64: Prefer unsigned offsets before signed offsets for ldst
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 25/26] tcg-aarch64: Prefer unsigned offsets before signed offsets for ldst Richard Henderson
@ 2014-04-11 12:35   ` Claudio Fontana
  0 siblings, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-11 12:35 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> The assembler seems to prefer them, perhaps we should too.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index a538a87..58597e7 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -645,11 +645,6 @@ static void tcg_out_ldst_r(TCGContext *s, TCGMemOp size, AArch64LdstType type,
>  static void tcg_out_ldst(TCGContext *s, TCGMemOp size, AArch64LdstType type,
>                           TCGReg rd, TCGReg rn, intptr_t offset)
>  {
> -    if (offset >= -256 && offset < 256) {
> -        tcg_out_ldst_9(s, size, type, rd, rn, offset);
> -        return;
> -    }
> -
>      /* If the offset is naturally aligned and in range, then we can
>         use the scaled uimm12 encoding */
>      if (offset >= 0 && !(offset & ((1 << size) - 1))) {
> @@ -660,6 +655,11 @@ static void tcg_out_ldst(TCGContext *s, TCGMemOp size, AArch64LdstType type,
>          }
>      }
>  
> +    if (offset >= -256 && offset < 256) {
> +        tcg_out_ldst_9(s, size, type, rd, rn, offset);
> +        return;
> +    }
> +
>      /* Worst-case scenario, move offset to temp register, use reg offset.  */
>      tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, offset);
>      tcg_out_ldst_r(s, size, type, rd, rn, TCG_REG_TMP);
> 

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
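
To make the new ordering concrete (an illustration that mirrors the logic of
tcg_out_ldst after this patch, not code taken from it): the scaled unsigned
form is now tried first, so any naturally aligned non-negative offset up to
0xfff scaled by the access size uses it, the 9-bit signed form is left for
negative or unaligned offsets, and everything else still goes through the
temp register.

    #include <stdint.h>
    #include <stdio.h>

    /* size is the TCGMemOp size code, i.e. log2 of the access width. */
    static const char *pick_encoding(int size, intptr_t offset)
    {
        if (offset >= 0 && !(offset & ((1 << size) - 1))
            && (offset >> size) <= 0xfff) {
            return "scaled uimm12";
        }
        if (offset >= -256 && offset < 256) {
            return "signed imm9, unscaled";
        }
        return "register offset via TCG_REG_TMP";
    }

    int main(void)
    {
        printf("%s\n", pick_encoding(3, 8));       /* scaled uimm12 */
        printf("%s\n", pick_encoding(3, -16));     /* signed imm9, unscaled */
        printf("%s\n", pick_encoding(0, 100000));  /* register offset */
        return 0;
    }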

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 26/26] tcg-aarch64: Use tcg_out_mov in preference to tcg_out_movr
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 26/26] tcg-aarch64: Use tcg_out_mov in preference to tcg_out_movr Richard Henderson
@ 2014-04-11 12:36   ` Claudio Fontana
  0 siblings, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-11 12:36 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

On 03.04.2014 21:56, Richard Henderson wrote:
> It's the more canonical interface.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index 58597e7..ab4cd25 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -951,9 +951,7 @@ static inline void tcg_out_addsub2(TCGContext *s, int ext, TCGReg rl,
>      }
>      tcg_out_insn_3503(s, insn, ext, rh, ah, bh);
>  
> -    if (rl != orig_rl) {
> -        tcg_out_movr(s, ext, orig_rl, rl);
> -    }
> +    tcg_out_mov(s, ext, orig_rl, rl);
>  }
>  
>  #ifdef CONFIG_SOFTMMU
> @@ -997,15 +995,15 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
>  
>      reloc_pc19(lb->label_ptr[0], (intptr_t)s->code_ptr);
>  
> -    tcg_out_movr(s, TCG_TYPE_I64, TCG_REG_X0, TCG_AREG0);
> -    tcg_out_movr(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
> +    tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_X0, TCG_AREG0);
> +    tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
>      tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, lb->mem_index);
>      tcg_out_adr(s, TCG_REG_X3, (intptr_t)lb->raddr);
>      tcg_out_call(s, (intptr_t)qemu_ld_helpers[opc & ~MO_SIGN]);
>      if (opc & MO_SIGN) {
>          tcg_out_sxt(s, TCG_TYPE_I64, size, lb->datalo_reg, TCG_REG_X0);
>      } else {
> -        tcg_out_movr(s, TCG_TYPE_I64, lb->datalo_reg, TCG_REG_X0);
> +        tcg_out_mov(s, size == MO_64, lb->datalo_reg, TCG_REG_X0);
>      }
>  
>      tcg_out_goto(s, (intptr_t)lb->raddr);
> @@ -1018,9 +1016,9 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
>  
>      reloc_pc19(lb->label_ptr[0], (intptr_t)s->code_ptr);
>  
> -    tcg_out_movr(s, TCG_TYPE_I64, TCG_REG_X0, TCG_AREG0);
> -    tcg_out_movr(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
> -    tcg_out_movr(s, size == MO_64, TCG_REG_X2, lb->datalo_reg);
> +    tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_X0, TCG_AREG0);
> +    tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
> +    tcg_out_mov(s, size == MO_64, TCG_REG_X2, lb->datalo_reg);
>      tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, lb->mem_index);
>      tcg_out_adr(s, TCG_REG_X4, (intptr_t)lb->raddr);
>      tcg_out_call(s, (intptr_t)qemu_st_helpers[opc]);
> 

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
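
One subtlety in the hunks above: several calls pass a boolean expression
(TARGET_LONG_BITS == 64, size == MO_64) where a TCGType is expected.  That
works only if TCG_TYPE_I32 is 0 and TCG_TYPE_I64 is 1; those values are an
assumption here and are not shown anywhere in this thread.  A throwaway check
of the idiom:

    #include <assert.h>

    /* Assumed enum values: not taken from the patch. */
    enum { TCG_TYPE_I32 = 0, TCG_TYPE_I64 = 1 };
    enum { MO_8, MO_16, MO_32, MO_64 };

    int main(void)
    {
        int size = MO_64;
        assert((size == MO_64) == TCG_TYPE_I64);
        size = MO_32;
        assert((size == MO_64) == TCG_TYPE_I32);
        return 0;
    }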

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH v3 21/26] tcg-aarch64: Introduce tcg_out_insn_3507
  2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 21/26] tcg-aarch64: Introduce tcg_out_insn_3507 Richard Henderson
  2014-04-09 12:54   ` Claudio Fontana
@ 2014-04-11 12:36   ` Claudio Fontana
  1 sibling, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-11 12:36 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: claudio.fontana

Just a reminder that there is an issue with this one, as my previous Reviewed-by tag might otherwise mislead:

Nacked-by: Claudio Fontana <claudio.fontana@huawei.com>

On 03.04.2014 21:56, Richard Henderson wrote:
> Cleaning up the implementation of REV and REV16 at the same time.
> 
> Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index caaf8a2..de7490d 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -327,6 +327,10 @@ typedef enum {
>      I3506_CSEL      = 0x1a800000,
>      I3506_CSINC     = 0x1a800400,
>  
> +    /* Data-processing (1 source) instructions.  */
> +    I3507_REV16     = 0x5ac00400,
> +    I3507_REV       = 0x5ac00800,
> +
>      /* Data-processing (2 source) instructions.  */
>      I3508_LSLV      = 0x1ac02000,
>      I3508_LSRV      = 0x1ac02400,
> @@ -545,6 +549,12 @@ static void tcg_out_insn_3506(TCGContext *s, AArch64Insn insn, TCGType ext,
>                | tcg_cond_to_aarch64[c] << 12);
>  }
>  
> +static void tcg_out_insn_3507(TCGContext *s, AArch64Insn insn, TCGType ext,
> +                              TCGReg rd, TCGReg rn)
> +{
> +    tcg_out32(s, insn | ext << 31 | rn << 5 | rd);
> +}
> +
>  static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
>                                TCGReg rd, TCGReg rn, TCGReg rm, TCGReg ra)
>  {
> @@ -961,19 +971,15 @@ static void tcg_out_brcond(TCGContext *s, TCGMemOp ext, TCGCond c, TCGArg a,
>  }
>  
>  static inline void tcg_out_rev(TCGContext *s, TCGType ext,
> -                               TCGReg rd, TCGReg rm)
> +                               TCGReg rd, TCGReg rn)
>  {
> -    /* using REV 0x5ac00800 */
> -    unsigned int base = ext ? 0xdac00c00 : 0x5ac00800;
> -    tcg_out32(s, base | rm << 5 | rd);
> +    tcg_out_insn(s, 3507, REV, ext, rd, rn);
>  }
>  
>  static inline void tcg_out_rev16(TCGContext *s, TCGType ext,
> -                                 TCGReg rd, TCGReg rm)
> +                                 TCGReg rd, TCGReg rn)
>  {
> -    /* using REV16 0x5ac00400 */
> -    unsigned int base = ext ? 0xdac00400 : 0x5ac00400;
> -    tcg_out32(s, base | rm << 5 | rd);
> +    tcg_out_insn(s, 3507, REV16, ext, rd, rn);
>  }
>  
>  static inline void tcg_out_sxt(TCGContext *s, TCGType ext, TCGMemOp s_bits,
> 


-- 
Claudio Fontana
Server Virtualization Architect
Huawei Technologies Duesseldorf GmbH
Riesstraße 25 - 80992 München

office: +49 89 158834 4135
mobile: +49 15253060158
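
The reason for the nack is not spelled out in this message, but one
discrepancy is visible in the hunk itself (an observation from the diff, not
a statement made in the thread): the old 64-bit REV base was 0xdac00c00,
while composing I3507_REV with only "ext << 31" yields 0xdac00800, which is
the REV32 encoding rather than REV.  A standalone comparison of the two
constants from the diff:

    #include <stdio.h>

    int main(void)
    {
        const unsigned old_rev64 = 0xdac00c00u;             /* base before the patch */
        const unsigned i3507_rev = 0x5ac00800u;             /* I3507_REV in the patch */
        const unsigned new_rev64 = i3507_rev | (1u << 31);  /* insn | ext << 31       */

        /* The two words differ in bit 10, so the emitted 64-bit insn changes. */
        printf("old %#010x  new %#010x  same=%d\n",
               old_rev64, new_rev64, old_rev64 == new_rev64);
        return 0;
    }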

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Qemu-devel] [PATCH 27/26] tcg-aarch64: Introduce tcg_out_insn_3312, _3310, _3313
  2014-04-07 18:34     ` [Qemu-devel] [PATCH 27/26] tcg-aarch64: Introduce tcg_out_insn_3312, _3310, _3313 Richard Henderson
  2014-04-08  9:00       ` Claudio Fontana
@ 2014-04-11 12:36       ` Claudio Fontana
  1 sibling, 0 replies; 52+ messages in thread
From: Claudio Fontana @ 2014-04-11 12:36 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 07.04.2014 20:34, Richard Henderson wrote:
> Merge TCGMemOp size, AArch64LdstType type and a few stray opcode bits
> into a single I3312_* argument, eliminating some magic numbers from
> helper functions.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 129 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 76 insertions(+), 53 deletions(-)
> ---
> 
> I'm not really sure how much clearer this is, especially since we do
> have to re-extract the size within tcg_out_ldst.  But it does at least
> eliminate some of the magic numbers within the helpers.
> 
> Thoughts?
> 
> 
> r~
> 
> 
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index ab4cd25..324a452 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -271,6 +271,28 @@ typedef enum {
>      I3207_BLR       = 0xd63f0000,
>      I3207_RET       = 0xd65f0000,
>  
> +    /* Load/store register.  Described here as 3.3.12, but the helper
> +       that emits them can transform to 3.3.10 or 3.3.13.  */
> +    I3312_STRB      = 0x38000000 | LDST_ST << 22 | MO_8 << 30,
> +    I3312_STRH      = 0x38000000 | LDST_ST << 22 | MO_16 << 30,
> +    I3312_STRW      = 0x38000000 | LDST_ST << 22 | MO_32 << 30,
> +    I3312_STRX      = 0x38000000 | LDST_ST << 22 | MO_64 << 30,
> +
> +    I3312_LDRB      = 0x38000000 | LDST_LD << 22 | MO_8 << 30,
> +    I3312_LDRH      = 0x38000000 | LDST_LD << 22 | MO_16 << 30,
> +    I3312_LDRW      = 0x38000000 | LDST_LD << 22 | MO_32 << 30,
> +    I3312_LDRX      = 0x38000000 | LDST_LD << 22 | MO_64 << 30,
> +
> +    I3312_LDRSBW    = 0x38000000 | LDST_LD_S_W << 22 | MO_8 << 30,
> +    I3312_LDRSHW    = 0x38000000 | LDST_LD_S_W << 22 | MO_16 << 30,
> +
> +    I3312_LDRSBX    = 0x38000000 | LDST_LD_S_X << 22 | MO_8 << 30,
> +    I3312_LDRSHX    = 0x38000000 | LDST_LD_S_X << 22 | MO_16 << 30,
> +    I3312_LDRSWX    = 0x38000000 | LDST_LD_S_X << 22 | MO_32 << 30,
> +
> +    I3312_TO_I3310  = 0x00206800,
> +    I3312_TO_I3313  = 0x01000000,
> +
>      /* Load/store register pair instructions.  */
>      I3314_LDP       = 0x28400000,
>      I3314_STP       = 0x28000000,
> @@ -482,21 +504,25 @@ static void tcg_out_insn_3509(TCGContext *s, AArch64Insn insn, TCGType ext,
>      tcg_out32(s, insn | ext << 31 | rm << 16 | ra << 10 | rn << 5 | rd);
>  }
>  
> +static void tcg_out_insn_3310(TCGContext *s, AArch64Insn insn,
> +                              TCGReg rd, TCGReg base, TCGReg regoff)
> +{
> +    /* Note the AArch64Insn constants above are for C3.3.12.  Adjust.  */
> +    tcg_out32(s, insn | I3312_TO_I3310 | regoff << 16 | base << 5 | rd);
> +}
> +
>  
> -static void tcg_out_ldst_9(TCGContext *s, TCGMemOp size, AArch64LdstType type,
> -                           TCGReg rd, TCGReg rn, intptr_t offset)
> +static void tcg_out_insn_3312(TCGContext *s, AArch64Insn insn,
> +                              TCGReg rd, TCGReg rn, intptr_t offset)
>  {
> -    /* use LDUR with BASE register with 9bit signed unscaled offset */
> -    tcg_out32(s, 0x38000000 | size << 30 | type << 22
> -              | (offset & 0x1ff) << 12 | rn << 5 | rd);
> +    tcg_out32(s, insn | (offset & 0x1ff) << 12 | rn << 5 | rd);
>  }
>  
> -/* tcg_out_ldst_12 expects a scaled unsigned immediate offset */
> -static void tcg_out_ldst_12(TCGContext *s, TCGMemOp size, AArch64LdstType type,
> -                            TCGReg rd, TCGReg rn, tcg_target_ulong scaled_uimm)
> +static void tcg_out_insn_3313(TCGContext *s, AArch64Insn insn,
> +                              TCGReg rd, TCGReg rn, uintptr_t scaled_uimm)
>  {
> -    tcg_out32(s, 0x39000000 | size << 30 | type << 22
> -              | scaled_uimm << 10 | rn << 5 | rd);
> +    /* Note the AArch64Insn constants above are for C3.3.12.  Adjust.  */
> +    tcg_out32(s, insn | I3312_TO_I3313 | scaled_uimm << 10 | rn << 5 | rd);
>  }
>  
>  /* Register to register move using ORR (shifted register with no shift). */
> @@ -634,35 +660,32 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
>      }
>  }
>  
> -static void tcg_out_ldst_r(TCGContext *s, TCGMemOp size, AArch64LdstType type,
> -                           TCGReg rd, TCGReg base, TCGReg regoff)
> -{
> -    tcg_out32(s, 0x38206800 | size << 30 | type << 22
> -              | regoff << 16 | base << 5 | rd);
> -}
> +/* Define something more legible for general use.  */
> +#define tcg_out_ldst_r  tcg_out_insn_3310
>  
> -/* solve the whole ldst problem */
> -static void tcg_out_ldst(TCGContext *s, TCGMemOp size, AArch64LdstType type,
> +static void tcg_out_ldst(TCGContext *s, AArch64Insn insn,
>                           TCGReg rd, TCGReg rn, intptr_t offset)
>  {
> +    TCGMemOp size = (uint32_t)insn >> 30;
> +
>      /* If the offset is naturally aligned and in range, then we can
>         use the scaled uimm12 encoding */
>      if (offset >= 0 && !(offset & ((1 << size) - 1))) {
> -        tcg_target_ulong scaled_uimm = offset >> size;
> +        uintptr_t scaled_uimm = offset >> size;
>          if (scaled_uimm <= 0xfff) {
> -            tcg_out_ldst_12(s, size, type, rd, rn, scaled_uimm);
> +            tcg_out_insn_3313(s, insn, rd, rn, scaled_uimm);
>              return;
>          }
>      }
>  
>      if (offset >= -256 && offset < 256) {
> -        tcg_out_ldst_9(s, size, type, rd, rn, offset);
> +        tcg_out_insn_3312(s, insn, rd, rn, offset);
>          return;
>      }
>  
>      /* Worst-case scenario, move offset to temp register, use reg offset.  */
>      tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP, offset);
> -    tcg_out_ldst_r(s, size, type, rd, rn, TCG_REG_TMP);
> +    tcg_out_ldst_r(s, insn, rd, rn, TCG_REG_TMP);
>  }
>  
>  static inline void tcg_out_mov(TCGContext *s,
> @@ -676,14 +699,14 @@ static inline void tcg_out_mov(TCGContext *s,
>  static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
>                                TCGReg arg1, intptr_t arg2)
>  {
> -    tcg_out_ldst(s, type == TCG_TYPE_I64 ? MO_64 : MO_32, LDST_LD,
> +    tcg_out_ldst(s, type == TCG_TYPE_I32 ? I3312_LDRW : I3312_LDRX,
>                   arg, arg1, arg2);
>  }
>  
>  static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
>                                TCGReg arg1, intptr_t arg2)
>  {
> -    tcg_out_ldst(s, type == TCG_TYPE_I64 ? MO_64 : MO_32, LDST_ST,
> +    tcg_out_ldst(s, type == TCG_TYPE_I32 ? I3312_STRW : I3312_STRX,
>                   arg, arg1, arg2);
>  }
>  
> @@ -1081,12 +1104,12 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp s_bits,
>  
>      /* Merge "low bits" from tlb offset, load the tlb comparator into X0.
>         X0 = load [X2 + (tlb_offset & 0x000fff)] */
> -    tcg_out_ldst(s, TARGET_LONG_BITS == 64 ? MO_64 : MO_32,
> -                 LDST_LD, TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff);
> +    tcg_out_ldst(s, TARGET_LONG_BITS == 32 ? I3312_LDRW : I3312_LDRX,
> +                 TCG_REG_X0, TCG_REG_X2, tlb_offset & 0xfff);
>  
>      /* Load the tlb addend. Do that early to avoid stalling.
>         X1 = load [X2 + (tlb_offset & 0xfff) + offsetof(addend)] */
> -    tcg_out_ldst(s, MO_64, LDST_LD, TCG_REG_X1, TCG_REG_X2,
> +    tcg_out_ldst(s, I3312_LDRX, TCG_REG_X1, TCG_REG_X2,
>                   (tlb_offset & 0xfff) + (offsetof(CPUTLBEntry, addend)) -
>                   (is_read ? offsetof(CPUTLBEntry, addr_read)
>                    : offsetof(CPUTLBEntry, addr_write)));
> @@ -1108,43 +1131,43 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp memop,
>  
>      switch (memop & MO_SSIZE) {
>      case MO_UB:
> -        tcg_out_ldst_r(s, MO_8, LDST_LD, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_LDRB, data_r, addr_r, off_r);
>          break;
>      case MO_SB:
> -        tcg_out_ldst_r(s, MO_8, LDST_LD_S_X, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_LDRSBX, data_r, addr_r, off_r);
>          break;
>      case MO_UW:
> -        tcg_out_ldst_r(s, MO_16, LDST_LD, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_LDRH, data_r, addr_r, off_r);
>          if (bswap) {
>              tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
>          }
>          break;
>      case MO_SW:
>          if (bswap) {
> -            tcg_out_ldst_r(s, MO_16, LDST_LD, data_r, addr_r, off_r);
> +            tcg_out_ldst_r(s, I3312_LDRH, data_r, addr_r, off_r);
>              tcg_out_rev16(s, TCG_TYPE_I32, data_r, data_r);
>              tcg_out_sxt(s, TCG_TYPE_I64, MO_16, data_r, data_r);
>          } else {
> -            tcg_out_ldst_r(s, MO_16, LDST_LD_S_X, data_r, addr_r, off_r);
> +            tcg_out_ldst_r(s, I3312_LDRSHX, data_r, addr_r, off_r);
>          }
>          break;
>      case MO_UL:
> -        tcg_out_ldst_r(s, MO_32, LDST_LD, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_LDRW, data_r, addr_r, off_r);
>          if (bswap) {
>              tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
>          }
>          break;
>      case MO_SL:
>          if (bswap) {
> -            tcg_out_ldst_r(s, MO_32, LDST_LD, data_r, addr_r, off_r);
> +            tcg_out_ldst_r(s, I3312_LDRW, data_r, addr_r, off_r);
>              tcg_out_rev(s, TCG_TYPE_I32, data_r, data_r);
>              tcg_out_sxt(s, TCG_TYPE_I64, MO_32, data_r, data_r);
>          } else {
> -            tcg_out_ldst_r(s, MO_32, LDST_LD_S_X, data_r, addr_r, off_r);
> +            tcg_out_ldst_r(s, I3312_LDRSWX, data_r, addr_r, off_r);
>          }
>          break;
>      case MO_Q:
> -        tcg_out_ldst_r(s, MO_64, LDST_LD, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_LDRX, data_r, addr_r, off_r);
>          if (bswap) {
>              tcg_out_rev(s, TCG_TYPE_I64, data_r, data_r);
>          }
> @@ -1161,28 +1184,28 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp memop,
>  
>      switch (memop & MO_SIZE) {
>      case MO_8:
> -        tcg_out_ldst_r(s, MO_8, LDST_ST, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_STRB, data_r, addr_r, off_r);
>          break;
>      case MO_16:
>          if (bswap && data_r != TCG_REG_XZR) {
>              tcg_out_rev16(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
>              data_r = TCG_REG_TMP;
>          }
> -        tcg_out_ldst_r(s, MO_16, LDST_ST, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_STRH, data_r, addr_r, off_r);
>          break;
>      case MO_32:
>          if (bswap && data_r != TCG_REG_XZR) {
>              tcg_out_rev(s, TCG_TYPE_I32, TCG_REG_TMP, data_r);
>              data_r = TCG_REG_TMP;
>          }
> -        tcg_out_ldst_r(s, MO_32, LDST_ST, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_STRW, data_r, addr_r, off_r);
>          break;
>      case MO_64:
>          if (bswap && data_r != TCG_REG_XZR) {
>              tcg_out_rev(s, TCG_TYPE_I64, TCG_REG_TMP, data_r);
>              data_r = TCG_REG_TMP;
>          }
> -        tcg_out_ldst_r(s, MO_64, LDST_ST, data_r, addr_r, off_r);
> +        tcg_out_ldst_r(s, I3312_STRX, data_r, addr_r, off_r);
>          break;
>      default:
>          tcg_abort();
> @@ -1275,49 +1298,49 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>  
>      case INDEX_op_ld8u_i32:
>      case INDEX_op_ld8u_i64:
> -        tcg_out_ldst(s, MO_8, LDST_LD, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRB, a0, a1, a2);
>          break;
>      case INDEX_op_ld8s_i32:
> -        tcg_out_ldst(s, MO_8, LDST_LD_S_W, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRSBW, a0, a1, a2);
>          break;
>      case INDEX_op_ld8s_i64:
> -        tcg_out_ldst(s, MO_8, LDST_LD_S_X, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRSBX, a0, a1, a2);
>          break;
>      case INDEX_op_ld16u_i32:
>      case INDEX_op_ld16u_i64:
> -        tcg_out_ldst(s, MO_16, LDST_LD, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRH, a0, a1, a2);
>          break;
>      case INDEX_op_ld16s_i32:
> -        tcg_out_ldst(s, MO_16, LDST_LD_S_W, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRSHW, a0, a1, a2);
>          break;
>      case INDEX_op_ld16s_i64:
> -        tcg_out_ldst(s, MO_16, LDST_LD_S_X, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRSHX, a0, a1, a2);
>          break;
>      case INDEX_op_ld_i32:
>      case INDEX_op_ld32u_i64:
> -        tcg_out_ldst(s, MO_32, LDST_LD, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRW, a0, a1, a2);
>          break;
>      case INDEX_op_ld32s_i64:
> -        tcg_out_ldst(s, MO_32, LDST_LD_S_X, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRSWX, a0, a1, a2);
>          break;
>      case INDEX_op_ld_i64:
> -        tcg_out_ldst(s, MO_64, LDST_LD, a0, a1, a2);
> +        tcg_out_ldst(s, I3312_LDRX, a0, a1, a2);
>          break;
>  
>      case INDEX_op_st8_i32:
>      case INDEX_op_st8_i64:
> -        tcg_out_ldst(s, MO_8, LDST_ST, REG0(0), a1, a2);
> +        tcg_out_ldst(s, I3312_STRB, REG0(0), a1, a2);
>          break;
>      case INDEX_op_st16_i32:
>      case INDEX_op_st16_i64:
> -        tcg_out_ldst(s, MO_16, LDST_ST, REG0(0), a1, a2);
> +        tcg_out_ldst(s, I3312_STRH, REG0(0), a1, a2);
>          break;
>      case INDEX_op_st_i32:
>      case INDEX_op_st32_i64:
> -        tcg_out_ldst(s, MO_32, LDST_ST, REG0(0), a1, a2);
> +        tcg_out_ldst(s, I3312_STRW, REG0(0), a1, a2);
>          break;
>      case INDEX_op_st_i64:
> -        tcg_out_ldst(s, MO_64, LDST_ST, REG0(0), a1, a2);
> +        tcg_out_ldst(s, I3312_STRX, REG0(0), a1, a2);
>          break;
>  
>      case INDEX_op_add_i32:
> 

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
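
To make the "re-extract the size" remark concrete (a standalone sketch built
from the constants in the diff above, not code from the patch): each I3312_*
value carries the TCGMemOp size in bits 30-31, so tcg_out_ldst can recover it
with a plain shift, and the same constant is turned into the 3.3.10
(register-offset) or 3.3.13 (scaled-uimm12) form by OR-ing in a fixed
adjustment.  The MO_* and LDST_* values are taken as in the earlier patches
of this thread.

    #include <assert.h>
    #include <stdio.h>

    enum { MO_8, MO_16, MO_32, MO_64 };                   /* TCGMemOp sizes  */
    enum { LDST_ST, LDST_LD, LDST_LD_S_X, LDST_LD_S_W };  /* AArch64LdstType */

    /* Constants as composed in the diff; plain macros here so the sketch
       stays within ISO C (enumeration constants must fit in an int). */
    #define I3312_LDRX     (0x38000000u | (LDST_LD << 22) | ((unsigned)MO_64 << 30))
    #define I3312_STRH     (0x38000000u | (LDST_ST << 22) | ((unsigned)MO_16 << 30))
    #define I3312_TO_I3310 0x00206800u   /* -> register-offset form */
    #define I3312_TO_I3313 0x01000000u   /* -> scaled-uimm12 form   */

    int main(void)
    {
        /* tcg_out_ldst recovers the size exactly as in the patch. */
        assert((I3312_LDRX >> 30) == MO_64);
        assert((I3312_STRH >> 30) == MO_16);

        /* One I3312_* constant serves all three addressing forms. */
        printf("LDRX imm9   %#010x\n", I3312_LDRX);
        printf("LDRX reg    %#010x\n", I3312_LDRX | I3312_TO_I3310);
        printf("LDRX uimm12 %#010x\n", I3312_LDRX | I3312_TO_I3313);
        return 0;
    }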

^ permalink raw reply	[flat|nested] 52+ messages in thread

Thread overview: 52+ messages
2014-04-03 19:56 [Qemu-devel] [PATCH v2 00/26] tcg-aarch64 improvements, part 3 Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 01/26] tcg-aarch64: Properly detect SIGSEGV writes Richard Henderson
2014-04-07  7:58   ` Claudio Fontana
2014-04-07 16:33     ` Richard Henderson
2014-04-07 16:39   ` Peter Maydell
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 02/26] tcg-aarch64: Use intptr_t apropriately Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 03/26] tcg-aarch64: Use TCGType and TCGMemOp constants Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 04/26] tcg-aarch64: Use MOVN in tcg_out_movi Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 05/26] tcg-aarch64: Use ORRI " Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 06/26] tcg-aarch64: Special case small constants " Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 07/26] tcg-aarch64: Use adrp " Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 08/26] tcg-aarch64: Use symbolic names for branches Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 09/26] tcg-aarch64: Create tcg_out_brcond Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 10/26] tcg-aarch64: Use CBZ and CBNZ Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 11/26] tcg-aarch64: Reuse LR in translated code Richard Henderson
2014-04-07  8:03   ` Claudio Fontana
2014-04-07  9:49     ` Peter Maydell
2014-04-07 11:11       ` Claudio Fontana
2014-04-07 11:28         ` Peter Maydell
2014-04-11 12:33   ` Claudio Fontana
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 12/26] tcg-aarch64: Introduce tcg_out_insn_3314 Richard Henderson
2014-04-11 12:34   ` Claudio Fontana
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 13/26] tcg-aarch64: Implement tcg_register_jit Richard Henderson
2014-04-11 12:34   ` Claudio Fontana
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 14/26] tcg-aarch64: Avoid add with zero in tlb load Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 15/26] tcg-aarch64: Use tcg_out_call for qemu_ld/st Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 16/26] tcg-aarch64: Use ADR to pass the return address to the ld/st helpers Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 17/26] tcg-aarch64: Use TCGMemOp in qemu_ld/st Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 18/26] tcg-aarch64: Pass qemu_ld/st arguments directly Richard Henderson
2014-04-11 12:34   ` Claudio Fontana
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 19/26] tcg-aarch64: Implement TCG_TARGET_HAS_new_ldst Richard Henderson
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 20/26] tcg-aarch64: Support stores of zero Richard Henderson
2014-04-11 12:34   ` Claudio Fontana
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 21/26] tcg-aarch64: Introduce tcg_out_insn_3507 Richard Henderson
2014-04-09 12:54   ` Claudio Fontana
2014-04-09 17:17     ` Richard Henderson
2014-04-11 12:36   ` Claudio Fontana
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 22/26] tcg-aarch64: Merge aarch64_ldst_get_data/type into tcg_out_op Richard Henderson
2014-04-11 12:34   ` Claudio Fontana
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 23/26] tcg-aarch64: Replace aarch64_ldst_op_data with TCGMemOp Richard Henderson
2014-04-11 12:35   ` Claudio Fontana
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 24/26] tcg-aarch64: Replace aarch64_ldst_op_data with AArch64LdstType Richard Henderson
2014-04-07 11:45   ` Claudio Fontana
2014-04-07 14:31     ` Richard Henderson
2014-04-11 12:35       ` Claudio Fontana
2014-04-07 18:34     ` [Qemu-devel] [PATCH 27/26] tcg-aarch64: Introduce tcg_out_insn_3312, _3310, _3313 Richard Henderson
2014-04-08  9:00       ` Claudio Fontana
2014-04-11 12:36       ` Claudio Fontana
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 25/26] tcg-aarch64: Prefer unsigned offsets before signed offsets for ldst Richard Henderson
2014-04-11 12:35   ` Claudio Fontana
2014-04-03 19:56 ` [Qemu-devel] [PATCH v3 26/26] tcg-aarch64: Use tcg_out_mov in preference to tcg_out_movr Richard Henderson
2014-04-11 12:36   ` Claudio Fontana
