* [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations
@ 2017-02-02 14:34 Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 01/21] tcg: add support for 128bit vector type Kirill Batuzov
                   ` (22 more replies)
  0 siblings, 23 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

The goal of this patch series is to set up an infrastructure to emulate
guest vector operations using host vector operations. Preliminary
experiments show that simply translating loads and stores increases the
performance of the x264 video codec by 10%. The performance of a
GCC-vectorized for loop increased 2x.

To be able to emulate guest vector operations using host vector operations,
several things need to be done.

1. Corresponding vector types should be added to TCG. This series adds
TCG_v128 and TCG_v64. I've made TCG_v64 a distinct type from TCG_i64
because it usually needs to be allocated to different registers and
supports different operations.

2. Load/store operations for these new types need to be implemented.

3. For a seamless transition from the current model to the new one we need
to handle cases where memory occupied by a global variable can be accessed
via a pointer to the CPUArchState structure. A very simple conservative
alias analysis has been added to do this. It tracks memory loads and
stores that overlap with fields of CPUArchState and provides this
information to the register allocator. The allocator then spills and
reloads affected globals when needed.

4. Allow overlapping globals. For scalar registers this is a rare case, and
overlapping registers can be handled as a single one (ah, al, ax, eax,
rax). On ARM every Q-register consists of two D-registers, each consisting
of two S-registers. Handling four S-registers as one because they are
parts of the same Q-register is way too inefficient.

5. Add a new memory addressing mode to the MMU code for large accesses and
create the needed helpers. Only 128-bit vectors are handled for now.

6. Create TCG opcodes for vector operations. Only addition is handled in
this series. Each operation has a wrapper that checks whether the backend
supports the corresponding operation. If it does, the vector opcode is
generated; if not, the operation is emulated with scalar operations (see
the sketch after this list). The emulation code is generated inline for
performance reasons: there is a huge performance difference between inline
generation and calling a helper. As a positive side effect this will
eventually allow merging similar emulation code for vector instructions
from different frontends into a target-independent implementation.

7. Use the new operations in the frontend (ARM is used in this series).

8. Support the new operations in the backend (x86_64 is used in this
series).
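
For illustration, here is a minimal sketch of the wrapper pattern from
point 6. The identifiers INDEX_op_add_i32x4, TCG_TARGET_HAS_add_i32x4 and
scratch_base are illustrative; the series itself generates this shape with
the GEN_VECT_WRAPPER* macros and tcg_v128_to_ptr() from patch 08.

static inline void tcg_gen_add_i32x4(TCGv_v128 res, TCGv_v128 a, TCGv_v128 b)
{
#ifdef TCG_TARGET_HAS_add_i32x4
    /* The backend implements the opcode: emit it directly.  */
    tcg_gen_op3_v128(INDEX_op_add_i32x4, res, a, b);
#else
    /* Inline scalar fallback: resolve every operand to a (base, offset)
       memory location, then add the four 32-bit lanes one by one.
       scratch_base is assumed to point at a per-TB scratch area.  */
    TCGv_ptr ba, bb, br;
    intptr_t oa, ob, orr;
    TCGv_i32 x = tcg_temp_new_i32();
    TCGv_i32 y = tcg_temp_new_i32();
    int i;

    tcg_v128_to_ptr(a, scratch_base, 0, &ba, &oa, 1);
    tcg_v128_to_ptr(b, scratch_base, 1, &bb, &ob, 1);
    tcg_v128_to_ptr(res, scratch_base, 2, &br, &orr, 0);

    for (i = 0; i < 4; i++) {
        tcg_gen_ld_i32(x, ba, oa + i * 4);
        tcg_gen_ld_i32(y, bb, ob + i * 4);
        tcg_gen_add_i32(x, x, y);
        tcg_gen_st_i32(x, br, orr + i * 4);
    }
    /* Copying the result back into RES when it was redirected to the
       scratch area is omitted here.  */
    tcg_temp_free_i32(x);
    tcg_temp_free_i32(y);
#endif
}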

For experiments I have used an ARM guest on an x86_64 host. I wanted a
pair of different architectures that both have vector extensions, and the
ARM/x86_64 pair fits well.

v1 -> v2:
 - represent v128 type with smaller types when it is not supported by the host
 - detect AVX support and use AVX instructions when available
 - tcg/README updated
 - generate two v64 adds instead of one v128 when applicable
 - rebased to newer master
 - overlap detection for temps added (it needs to be explicitly called from
   <arch>_translate_init)
 - the stack is used to temporarily store 128-bit variables to memory
   (instead of the TCGContext field)

v2 -> v2.1:
 - automatic build failure fixed

Outstanding issues:
 - qemu_ld_v128 and qemu_st_v128 do not generate fallback code if the host
   does not support 128-bit registers. The reason is that I do not know how
   to handle differing host/guest endianness (do we swap only bytes within
   elements, or whole vectors?). Different targets seem to have different
   ideas on how this should be done.

Kirill Batuzov (21):
  tcg: add support for 128bit vector type
  tcg: add support for 64bit vector type
  tcg: support representing vector type with smaller vector or scalar
    types
  tcg: add ld_v128, ld_v64, st_v128 and st_v64 opcodes
  tcg: add simple alias analysis
  tcg: use results of alias analysis in liveness analysis
  tcg: allow globals to overlap
  tcg: add vector addition operations
  target/arm: support access to vector guest registers as globals
  target/arm: use vector opcode to handle vadd.<size> instruction
  tcg/i386: add support for vector opcodes
  tcg/i386: support 64-bit vector operations
  tcg/i386: support remaining vector addition operations
  tcg: do not rely on exact values of MO_BSWAP or MO_SIGN in backend
  target/aarch64: do not check for non-existent TCGMemOp
  tcg: introduce new TCGMemOp - MO_128
  tcg: introduce qemu_ld_v128 and qemu_st_v128 opcodes
  softmmu: create helpers for vector loads
  tcg/i386: add support for qemu_ld_v128/qemu_st_v128 ops
  target/arm: load two consecutive 64-bits vector regs as a 128-bit
    vector reg
  tcg/README: update README to include information about vector opcodes

 cputlb.c                     |   4 +
 softmmu_template_vector.h    | 266 +++++++++++++++++++++++++++++++
 target/arm/translate-a64.c   |   1 -
 target/arm/translate.c       |  76 ++++++++-
 tcg/README                   |  47 +++++-
 tcg/aarch64/tcg-target.inc.c |   4 +-
 tcg/arm/tcg-target.inc.c     |   4 +-
 tcg/i386/tcg-target.h        |  45 +++++-
 tcg/i386/tcg-target.inc.c    | 260 +++++++++++++++++++++++++++++--
 tcg/mips/tcg-target.inc.c    |   4 +-
 tcg/optimize.c               | 165 +++++++++++++++++++-
 tcg/ppc/tcg-target.inc.c     |   4 +-
 tcg/s390/tcg-target.inc.c    |   4 +-
 tcg/sparc/tcg-target.inc.c   |  12 +-
 tcg/tcg-op.c                 |  92 ++++++++++-
 tcg/tcg-op.h                 | 267 +++++++++++++++++++++++++++++++
 tcg/tcg-opc.h                |  34 ++++
 tcg/tcg.c                    | 363 +++++++++++++++++++++++++++++++++++++------
 tcg/tcg.h                    | 163 ++++++++++++++++++-
 19 files changed, 1722 insertions(+), 93 deletions(-)
 create mode 100644 softmmu_template_vector.h

-- 
2.1.4

* [Qemu-devel] [PATCH v2.1 01/21] tcg: add support for 128bit vector type
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 02/21] tcg: add support for 64bit " Kirill Batuzov
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Introduce TCG_TYPE_V128 and the corresponding TCGv_v128 type for TCG temps. Add helper
functions that work with temps of this new type.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
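
Not part of the patch: a usage sketch of the new helpers from frontend
code. The vreg field of CPUArchState is hypothetical; cpu_env is the
usual TCGv_ptr pointing at the CPU state.

    TCGv_v128 t = tcg_temp_new_v128();        /* ordinary temporary */
    TCGv_v128 l = tcg_temp_local_new_v128();  /* survives branches */
    TCGv_v128 g = tcg_global_mem_new_v128(cpu_env,
                      offsetof(CPUArchState, vreg), "vreg");

    /* ... generate ops reading and writing t, l and g ... */

    tcg_temp_free_v128(t);
    tcg_temp_free_v128(l);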
 tcg/tcg-op.h | 24 ++++++++++++++++++++++++
 tcg/tcg.c    | 13 +++++++++++++
 tcg/tcg.h    | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index c68e300..5abf8b2 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -248,6 +248,23 @@ static inline void tcg_gen_op6ii_i64(TCGOpcode opc, TCGv_i64 a1, TCGv_i64 a2,
                 GET_TCGV_I64(a3), GET_TCGV_I64(a4), a5, a6);
 }
 
+static inline void tcg_gen_op1_v128(TCGOpcode opc, TCGv_v128 a1)
+{
+    tcg_gen_op1(&tcg_ctx, opc, GET_TCGV_V128(a1));
+}
+
+static inline void tcg_gen_op2_v128(TCGOpcode opc, TCGv_v128 a1,
+                                    TCGv_v128 a2)
+{
+    tcg_gen_op2(&tcg_ctx, opc, GET_TCGV_V128(a1), GET_TCGV_V128(a2));
+}
+
+static inline void tcg_gen_op3_v128(TCGOpcode opc, TCGv_v128 a1,
+                                    TCGv_v128 a2, TCGv_v128 a3)
+{
+    tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_V128(a1), GET_TCGV_V128(a2),
+                GET_TCGV_V128(a3));
+}
 
 /* Generic ops.  */
 
@@ -454,6 +471,13 @@ static inline void tcg_gen_not_i32(TCGv_i32 ret, TCGv_i32 arg)
     }
 }
 
+/* Vector ops */
+
+static inline void tcg_gen_discard_v128(TCGv_v128 arg)
+{
+    tcg_gen_op1_v128(INDEX_op_discard, arg);
+}
+
 /* 64 bit ops */
 
 void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index cb898f1..2a5e83b 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -641,6 +641,14 @@ TCGv_i64 tcg_temp_new_internal_i64(int temp_local)
     return MAKE_TCGV_I64(idx);
 }
 
+TCGv_v128 tcg_temp_new_internal_v128(int temp_local)
+{
+    int idx;
+
+    idx = tcg_temp_new_internal(TCG_TYPE_V128, temp_local);
+    return MAKE_TCGV_V128(idx);
+}
+
 static void tcg_temp_free_internal(int idx)
 {
     TCGContext *s = &tcg_ctx;
@@ -673,6 +681,11 @@ void tcg_temp_free_i64(TCGv_i64 arg)
     tcg_temp_free_internal(GET_TCGV_I64(arg));
 }
 
+void tcg_temp_free_v128(TCGv_v128 arg)
+{
+    tcg_temp_free_internal(GET_TCGV_V128(arg));
+}
+
 TCGv_i32 tcg_const_i32(int32_t val)
 {
     TCGv_i32 t0;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 631c6f6..56484e7 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -246,6 +246,7 @@ typedef struct TCGPool {
 typedef enum TCGType {
     TCG_TYPE_I32,
     TCG_TYPE_I64,
+    TCG_TYPE_V128,
     TCG_TYPE_COUNT, /* number of different types */
 
     /* An alias for the size of the host register.  */
@@ -421,6 +422,7 @@ typedef tcg_target_ulong TCGArg;
 typedef struct TCGv_i32_d *TCGv_i32;
 typedef struct TCGv_i64_d *TCGv_i64;
 typedef struct TCGv_ptr_d *TCGv_ptr;
+typedef struct TCGv_v128_d *TCGv_v128;
 typedef TCGv_ptr TCGv_env;
 #if TARGET_LONG_BITS == 32
 #define TCGv TCGv_i32
@@ -445,6 +447,11 @@ static inline TCGv_ptr QEMU_ARTIFICIAL MAKE_TCGV_PTR(intptr_t i)
     return (TCGv_ptr)i;
 }
 
+static inline TCGv_v128 QEMU_ARTIFICIAL MAKE_TCGV_V128(intptr_t i)
+{
+    return (TCGv_v128)i;
+}
+
 static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_I32(TCGv_i32 t)
 {
     return (intptr_t)t;
@@ -460,6 +467,11 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
     return (intptr_t)t;
 }
 
+static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_V128(TCGv_v128 t)
+{
+    return (intptr_t)t;
+}
+
 #if TCG_TARGET_REG_BITS == 32
 #define TCGV_LOW(t) MAKE_TCGV_I32(GET_TCGV_I64(t))
 #define TCGV_HIGH(t) MAKE_TCGV_I32(GET_TCGV_I64(t) + 1)
@@ -467,15 +479,18 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
 
 #define TCGV_EQUAL_I32(a, b) (GET_TCGV_I32(a) == GET_TCGV_I32(b))
 #define TCGV_EQUAL_I64(a, b) (GET_TCGV_I64(a) == GET_TCGV_I64(b))
+#define TCGV_EQUAL_V128(a, b) (GET_TCGV_V128(a) == GET_TCGV_V128(b))
 #define TCGV_EQUAL_PTR(a, b) (GET_TCGV_PTR(a) == GET_TCGV_PTR(b))
 
 /* Dummy definition to avoid compiler warnings.  */
 #define TCGV_UNUSED_I32(x) x = MAKE_TCGV_I32(-1)
 #define TCGV_UNUSED_I64(x) x = MAKE_TCGV_I64(-1)
+#define TCGV_UNUSED_V128(x) x = MAKE_TCGV_V128(-1)
 #define TCGV_UNUSED_PTR(x) x = MAKE_TCGV_PTR(-1)
 
 #define TCGV_IS_UNUSED_I32(x) (GET_TCGV_I32(x) == -1)
 #define TCGV_IS_UNUSED_I64(x) (GET_TCGV_I64(x) == -1)
+#define TCGV_IS_UNUSED_V128(x) (GET_TCGV_V128(x) == -1)
 #define TCGV_IS_UNUSED_PTR(x) (GET_TCGV_PTR(x) == -1)
 
 /* call flags */
@@ -798,9 +813,11 @@ TCGv_i64 tcg_global_reg_new_i64(TCGReg reg, const char *name);
 
 TCGv_i32 tcg_temp_new_internal_i32(int temp_local);
 TCGv_i64 tcg_temp_new_internal_i64(int temp_local);
+TCGv_v128 tcg_temp_new_internal_v128(int temp_local);
 
 void tcg_temp_free_i32(TCGv_i32 arg);
 void tcg_temp_free_i64(TCGv_i64 arg);
+void tcg_temp_free_v128(TCGv_v128 arg);
 
 static inline TCGv_i32 tcg_global_mem_new_i32(TCGv_ptr reg, intptr_t offset,
                                               const char *name)
@@ -836,6 +853,23 @@ static inline TCGv_i64 tcg_temp_local_new_i64(void)
     return tcg_temp_new_internal_i64(1);
 }
 
+static inline TCGv_v128 tcg_global_mem_new_v128(TCGv_ptr reg, intptr_t offset,
+                                                const char *name)
+{
+    int idx = tcg_global_mem_new_internal(TCG_TYPE_V128, reg, offset, name);
+    return MAKE_TCGV_V128(idx);
+}
+
+static inline TCGv_v128 tcg_temp_new_v128(void)
+{
+    return tcg_temp_new_internal_v128(0);
+}
+
+static inline TCGv_v128 tcg_temp_local_new_v128(void)
+{
+    return tcg_temp_new_internal_v128(1);
+}
+
 #if defined(CONFIG_DEBUG_TCG)
 /* If you call tcg_clear_temp_count() at the start of a section of
  * code which is not supposed to leak any TCG temporaries, then
-- 
2.1.4

* [Qemu-devel] [PATCH v2.1 02/21] tcg: add support for 64bit vector type
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 01/21] tcg: add support for 128bit vector type Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 03/21] tcg: support representing vector type with smaller vector or scalar types Kirill Batuzov
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Introduce TCG_TYPE_V64 and the corresponding TCGv_v64 type for TCG temps. Add helper
functions that work with temps of this new type.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/tcg-op.h | 23 +++++++++++++++++++++++
 tcg/tcg.c    | 13 +++++++++++++
 tcg/tcg.h    | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 70 insertions(+)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 5abf8b2..517745e 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -266,6 +266,24 @@ static inline void tcg_gen_op3_v128(TCGOpcode opc, TCGv_v128 a1,
                 GET_TCGV_V128(a3));
 }
 
+static inline void tcg_gen_op1_v64(TCGOpcode opc, TCGv_v64 a1)
+{
+    tcg_gen_op1(&tcg_ctx, opc, GET_TCGV_V64(a1));
+}
+
+static inline void tcg_gen_op2_v64(TCGOpcode opc, TCGv_v64 a1,
+                                    TCGv_v64 a2)
+{
+    tcg_gen_op2(&tcg_ctx, opc, GET_TCGV_V64(a1), GET_TCGV_V64(a2));
+}
+
+static inline void tcg_gen_op3_v64(TCGOpcode opc, TCGv_v64 a1,
+                                    TCGv_v64 a2, TCGv_v64 a3)
+{
+    tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_V64(a1), GET_TCGV_V64(a2),
+                GET_TCGV_V64(a3));
+}
+
 /* Generic ops.  */
 
 static inline void gen_set_label(TCGLabel *l)
@@ -478,6 +496,11 @@ static inline void tcg_gen_discard_v128(TCGv_v128 arg)
     tcg_gen_op1_v128(INDEX_op_discard, arg);
 }
 
+static inline void tcg_gen_discard_v64(TCGv_v64 arg)
+{
+    tcg_gen_op1_v64(INDEX_op_discard, arg);
+}
+
 /* 64 bit ops */
 
 void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 2a5e83b..5e69103 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -641,6 +641,14 @@ TCGv_i64 tcg_temp_new_internal_i64(int temp_local)
     return MAKE_TCGV_I64(idx);
 }
 
+TCGv_v64 tcg_temp_new_internal_v64(int temp_local)
+{
+    int idx;
+
+    idx = tcg_temp_new_internal(TCG_TYPE_V64, temp_local);
+    return MAKE_TCGV_V64(idx);
+}
+
 TCGv_v128 tcg_temp_new_internal_v128(int temp_local)
 {
     int idx;
@@ -681,6 +689,11 @@ void tcg_temp_free_i64(TCGv_i64 arg)
     tcg_temp_free_internal(GET_TCGV_I64(arg));
 }
 
+void tcg_temp_free_v64(TCGv_v64 arg)
+{
+    tcg_temp_free_internal(GET_TCGV_V64(arg));
+}
+
 void tcg_temp_free_v128(TCGv_v128 arg)
 {
     tcg_temp_free_internal(GET_TCGV_V128(arg));
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 56484e7..fa455ae 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -246,6 +246,7 @@ typedef struct TCGPool {
 typedef enum TCGType {
     TCG_TYPE_I32,
     TCG_TYPE_I64,
+    TCG_TYPE_V64,
     TCG_TYPE_V128,
     TCG_TYPE_COUNT, /* number of different types */
 
@@ -422,6 +423,7 @@ typedef tcg_target_ulong TCGArg;
 typedef struct TCGv_i32_d *TCGv_i32;
 typedef struct TCGv_i64_d *TCGv_i64;
 typedef struct TCGv_ptr_d *TCGv_ptr;
+typedef struct TCGv_v64_d *TCGv_v64;
 typedef struct TCGv_v128_d *TCGv_v128;
 typedef TCGv_ptr TCGv_env;
 #if TARGET_LONG_BITS == 32
@@ -447,6 +449,11 @@ static inline TCGv_ptr QEMU_ARTIFICIAL MAKE_TCGV_PTR(intptr_t i)
     return (TCGv_ptr)i;
 }
 
+static inline TCGv_v64 QEMU_ARTIFICIAL MAKE_TCGV_V64(intptr_t i)
+{
+    return (TCGv_v64)i;
+}
+
 static inline TCGv_v128 QEMU_ARTIFICIAL MAKE_TCGV_V128(intptr_t i)
 {
     return (TCGv_v128)i;
@@ -467,6 +474,11 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
     return (intptr_t)t;
 }
 
+static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_V64(TCGv_v64 t)
+{
+    return (intptr_t)t;
+}
+
 static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_V128(TCGv_v128 t)
 {
     return (intptr_t)t;
@@ -479,17 +491,20 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_V128(TCGv_v128 t)
 
 #define TCGV_EQUAL_I32(a, b) (GET_TCGV_I32(a) == GET_TCGV_I32(b))
 #define TCGV_EQUAL_I64(a, b) (GET_TCGV_I64(a) == GET_TCGV_I64(b))
+#define TCGV_EQUAL_V64(a, b) (GET_TCGV_V64(a) == GET_TCGV_V64(b))
 #define TCGV_EQUAL_V128(a, b) (GET_TCGV_V128(a) == GET_TCGV_V128(b))
 #define TCGV_EQUAL_PTR(a, b) (GET_TCGV_PTR(a) == GET_TCGV_PTR(b))
 
 /* Dummy definition to avoid compiler warnings.  */
 #define TCGV_UNUSED_I32(x) x = MAKE_TCGV_I32(-1)
 #define TCGV_UNUSED_I64(x) x = MAKE_TCGV_I64(-1)
+#define TCGV_UNUSED_V64(x) x = MAKE_TCGV_V64(-1)
 #define TCGV_UNUSED_V128(x) x = MAKE_TCGV_V128(-1)
 #define TCGV_UNUSED_PTR(x) x = MAKE_TCGV_PTR(-1)
 
 #define TCGV_IS_UNUSED_I32(x) (GET_TCGV_I32(x) == -1)
 #define TCGV_IS_UNUSED_I64(x) (GET_TCGV_I64(x) == -1)
+#define TCGV_IS_UNUSED_V64(x) (GET_TCGV_V64(x) == -1)
 #define TCGV_IS_UNUSED_V128(x) (GET_TCGV_V128(x) == -1)
 #define TCGV_IS_UNUSED_PTR(x) (GET_TCGV_PTR(x) == -1)
 
@@ -813,10 +828,12 @@ TCGv_i64 tcg_global_reg_new_i64(TCGReg reg, const char *name);
 
 TCGv_i32 tcg_temp_new_internal_i32(int temp_local);
 TCGv_i64 tcg_temp_new_internal_i64(int temp_local);
+TCGv_v64 tcg_temp_new_internal_v64(int temp_local);
 TCGv_v128 tcg_temp_new_internal_v128(int temp_local);
 
 void tcg_temp_free_i32(TCGv_i32 arg);
 void tcg_temp_free_i64(TCGv_i64 arg);
+void tcg_temp_free_v64(TCGv_v64 arg);
 void tcg_temp_free_v128(TCGv_v128 arg);
 
 static inline TCGv_i32 tcg_global_mem_new_i32(TCGv_ptr reg, intptr_t offset,
@@ -853,6 +870,23 @@ static inline TCGv_i64 tcg_temp_local_new_i64(void)
     return tcg_temp_new_internal_i64(1);
 }
 
+static inline TCGv_v64 tcg_global_mem_new_v64(TCGv_ptr reg, intptr_t offset,
+                                              const char *name)
+{
+    int idx = tcg_global_mem_new_internal(TCG_TYPE_V64, reg, offset, name);
+    return MAKE_TCGV_V64(idx);
+}
+
+static inline TCGv_v64 tcg_temp_new_v64(void)
+{
+    return tcg_temp_new_internal_v64(0);
+}
+
+static inline TCGv_v64 tcg_temp_local_new_v64(void)
+{
+    return tcg_temp_new_internal_v64(1);
+}
+
 static inline TCGv_v128 tcg_global_mem_new_v128(TCGv_ptr reg, intptr_t offset,
                                                 const char *name)
 {
-- 
2.1.4

* [Qemu-devel] [PATCH v2.1 03/21] tcg: support representing vector type with smaller vector or scalar types
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 01/21] tcg: add support for 128bit vector type Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 02/21] tcg: add support for 64bit " Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 04/21] tcg: add ld_v128, ld_v64, st_v128 and st_v64 opcodes Kirill Batuzov
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---

This is not as bad as I thought it would be.
Only two cases: type == base_type and type != base_type.
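
For example, a TCG_TYPE_V128 global on a 64-bit host without
TCG_TARGET_HAS_REG128 gets type TCG_TYPE_I64 from tcg_choose_type(), so
count = 16 / 8 = 2 and the global is materialized as two consecutive I64
temps, name_0 at offset +0 and name_1 at offset +8 (the order is reversed
on a big-endian host).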

---
 tcg/tcg.c | 136 +++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 91 insertions(+), 45 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5e69103..18d97ec 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -523,12 +523,54 @@ TCGv_i64 tcg_global_reg_new_i64(TCGReg reg, const char *name)
     return MAKE_TCGV_I64(idx);
 }
 
-int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
+static TCGType tcg_choose_type(TCGType type)
+{
+    switch (type) {
+    case TCG_TYPE_I64:
+        if (TCG_TARGET_REG_BITS == 64) {
+            return TCG_TYPE_I64;
+        }
+        /* Fallthrough */
+    case TCG_TYPE_I32:
+        return TCG_TYPE_I32;
+    case TCG_TYPE_V128:
+#ifdef TCG_TARGET_HAS_REG128
+        return TCG_TYPE_V128;
+#endif
+        /* Fallthrough */
+    case TCG_TYPE_V64:
+#ifdef TCG_TARGET_HAS_REGV64
+        return TCG_TYPE_V64;
+#else
+        return tcg_choose_type(TCG_TYPE_I64);
+#endif
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static intptr_t tcg_type_size(TCGType type)
+{
+    switch (type) {
+    case TCG_TYPE_I32:
+        return 4;
+    case TCG_TYPE_I64:
+    case TCG_TYPE_V64:
+        return 8;
+    case TCG_TYPE_V128:
+        return 16;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+int tcg_global_mem_new_internal(TCGType base_type, TCGv_ptr base,
                                 intptr_t offset, const char *name)
 {
     TCGContext *s = &tcg_ctx;
     TCGTemp *base_ts = &s->temps[GET_TCGV_PTR(base)];
     TCGTemp *ts = tcg_global_alloc(s);
+    TCGType type = tcg_choose_type(base_type);
     int indirect_reg = 0, bigendian = 0;
 #ifdef HOST_WORDS_BIGENDIAN
     bigendian = 1;
@@ -543,47 +585,51 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
         indirect_reg = 1;
     }
 
-    if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
-        TCGTemp *ts2 = tcg_global_alloc(s);
-        char buf[64];
-
-        ts->base_type = TCG_TYPE_I64;
-        ts->type = TCG_TYPE_I32;
+    if (type == base_type) {
+        ts->base_type = type;
+        ts->type = type;
         ts->indirect_reg = indirect_reg;
         ts->mem_allocated = 1;
         ts->mem_base = base_ts;
-        ts->mem_offset = offset + bigendian * 4;
-        pstrcpy(buf, sizeof(buf), name);
-        pstrcat(buf, sizeof(buf), "_0");
-        ts->name = strdup(buf);
-
-        tcg_debug_assert(ts2 == ts + 1);
-        ts2->base_type = TCG_TYPE_I64;
-        ts2->type = TCG_TYPE_I32;
-        ts2->indirect_reg = indirect_reg;
-        ts2->mem_allocated = 1;
-        ts2->mem_base = base_ts;
-        ts2->mem_offset = offset + (1 - bigendian) * 4;
-        pstrcpy(buf, sizeof(buf), name);
-        pstrcat(buf, sizeof(buf), "_1");
-        ts2->name = strdup(buf);
+        ts->mem_offset = offset;
+        ts->name = name;
     } else {
-        ts->base_type = type;
+        int i, count = tcg_type_size(base_type) / tcg_type_size(type);
+        TCGTemp *ts2, *ts1 = ts;
+        int cur_offset =
+                bigendian ? tcg_type_size(base_type) - tcg_type_size(type) : 0;
+
+        ts->base_type = base_type;
         ts->type = type;
         ts->indirect_reg = indirect_reg;
         ts->mem_allocated = 1;
         ts->mem_base = base_ts;
-        ts->mem_offset = offset;
-        ts->name = name;
+        ts->mem_offset = offset + cur_offset;
+        ts->name = g_strdup_printf("%s_0", name);
+
+        for (i = 1; i < count; i++) {
+            ts2 = tcg_global_alloc(s);
+            tcg_debug_assert(ts2 == ts1 + 1);
+            cur_offset += (bigendian ? -1 : 1) * tcg_type_size(type);
+            ts2->base_type = base_type;
+            ts2->type = type;
+            ts2->indirect_reg = indirect_reg;
+            ts2->mem_allocated = 1;
+            ts2->mem_base = base_ts;
+            ts2->mem_offset = offset + cur_offset;
+            ts2->name = g_strdup_printf("%s_%d", name, i);
+            ts1 = ts2;
+        }
     }
     return temp_idx(s, ts);
 }
 
-static int tcg_temp_new_internal(TCGType type, int temp_local)
+static int tcg_temp_new_internal(TCGType base_type, int temp_local)
 {
     TCGContext *s = &tcg_ctx;
     TCGTemp *ts;
     int idx, k;
+    TCGType type = tcg_choose_type(base_type);
 
     k = type + (temp_local ? TCG_TYPE_COUNT : 0);
     idx = find_first_bit(s->free_temps[k].l, TCG_MAX_TEMPS);
@@ -593,28 +639,28 @@ static int tcg_temp_new_internal(TCGType type, int temp_local)
 
         ts = &s->temps[idx];
         ts->temp_allocated = 1;
-        tcg_debug_assert(ts->base_type == type);
+        tcg_debug_assert(ts->base_type == base_type);
         tcg_debug_assert(ts->temp_local == temp_local);
     } else {
         ts = tcg_temp_alloc(s);
-        if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
-            TCGTemp *ts2 = tcg_temp_alloc(s);
-
-            ts->base_type = type;
-            ts->type = TCG_TYPE_I32;
-            ts->temp_allocated = 1;
-            ts->temp_local = temp_local;
-
-            tcg_debug_assert(ts2 == ts + 1);
-            ts2->base_type = TCG_TYPE_I64;
-            ts2->type = TCG_TYPE_I32;
-            ts2->temp_allocated = 1;
-            ts2->temp_local = temp_local;
-        } else {
-            ts->base_type = type;
-            ts->type = type;
-            ts->temp_allocated = 1;
-            ts->temp_local = temp_local;
+        ts->base_type = base_type;
+        ts->type = type;
+        ts->temp_allocated = 1;
+        ts->temp_local = temp_local;
+
+        if (type != base_type) {
+            int i, count = tcg_type_size(base_type) / tcg_type_size(type);
+            TCGTemp *ts2, *ts1 = ts;
+
+            for (i = 1; i < count; i++) {
+                ts2 = tcg_temp_alloc(s);
+                tcg_debug_assert(ts2 == ts1 + 1);
+                ts2->base_type = base_type;
+                ts2->type = type;
+                ts2->temp_allocated = 1;
+                ts2->temp_local = temp_local;
+                ts1 = ts2;
+            }
         }
         idx = temp_idx(s, ts);
     }
-- 
2.1.4

* [Qemu-devel] [PATCH v2.1 04/21] tcg: add ld_v128, ld_v64, st_v128 and st_v64 opcodes
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (2 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 03/21] tcg: support representing vector type with smaller vector or scalar types Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 05/21] tcg: add simple alias analysis Kirill Batuzov
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/tcg-op.h  | 38 ++++++++++++++++++++++++++++++++++++++
 tcg/tcg-opc.h | 18 ++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 517745e..250493b 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -501,6 +501,44 @@ static inline void tcg_gen_discard_v64(TCGv_v64 arg)
     tcg_gen_op1_v64(INDEX_op_discard, arg);
 }
 
+static inline void tcg_gen_ldst_op_v128(TCGOpcode opc, TCGv_v128 val,
+                                       TCGv_ptr base, TCGArg offset)
+{
+    tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_V128(val), GET_TCGV_PTR(base),
+                offset);
+}
+
+static inline void tcg_gen_st_v128(TCGv_v128 arg1, TCGv_ptr arg2,
+                                   tcg_target_long offset)
+{
+    tcg_gen_ldst_op_v128(INDEX_op_st_v128, arg1, arg2, offset);
+}
+
+static inline void tcg_gen_ld_v128(TCGv_v128 ret, TCGv_ptr arg2,
+                                   tcg_target_long offset)
+{
+    tcg_gen_ldst_op_v128(INDEX_op_ld_v128, ret, arg2, offset);
+}
+
+static inline void tcg_gen_ldst_op_v64(TCGOpcode opc, TCGv_v64 val,
+                                       TCGv_ptr base, TCGArg offset)
+{
+    tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_V64(val), GET_TCGV_PTR(base),
+                offset);
+}
+
+static inline void tcg_gen_st_v64(TCGv_v64 arg1, TCGv_ptr arg2,
+                                  tcg_target_long offset)
+{
+    tcg_gen_ldst_op_v64(INDEX_op_st_v64, arg1, arg2, offset);
+}
+
+static inline void tcg_gen_ld_v64(TCGv_v64 ret, TCGv_ptr arg2,
+                                  tcg_target_long offset)
+{
+    tcg_gen_ldst_op_v64(INDEX_op_ld_v64, ret, arg2, offset);
+}
+
 /* 64 bit ops */
 
 void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index f06f894..2365c97 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -42,6 +42,18 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
 # define IMPL64  TCG_OPF_64BIT
 #endif
 
+#ifdef TCG_TARGET_HAS_REG128
+# define IMPL128 0
+#else
+# define IMPL128 TCG_OPF_NOT_PRESENT
+#endif
+
+#ifdef TCG_TARGET_HAS_REGV64
+# define IMPLV64 0
+#else
+# define IMPLV64 TCG_OPF_NOT_PRESENT
+#endif
+
 DEF(mb, 0, 0, 1, 0)
 
 DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
@@ -188,6 +200,12 @@ DEF(mulsh_i64, 1, 2, 0, IMPL(TCG_TARGET_HAS_mulsh_i64))
 #define TLADDR_ARGS  (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS ? 1 : 2)
 #define DATA64_ARGS  (TCG_TARGET_REG_BITS == 64 ? 1 : 2)
 
+/* load/store */
+DEF(st_v128, 0, 2, 1, IMPL128)
+DEF(ld_v128, 1, 1, 1, IMPL128)
+DEF(st_v64, 0, 2, 1, IMPLV64)
+DEF(ld_v64, 1, 1, 1, IMPLV64)
+
 /* QEMU specific */
 DEF(insn_start, 0, 0, TLADDR_ARGS * TARGET_INSN_START_WORDS,
     TCG_OPF_NOT_PRESENT)
-- 
2.1.4

* [Qemu-devel] [PATCH v2.1 05/21] tcg: add simple alias analysis
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (3 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 04/21] tcg: add ld_v128, ld_v64, st_v128 and st_v64 opcodes Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 06/21] tcg: use results of alias analysis in liveness analysis Kirill Batuzov
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Add a simple alias analysis to TCG which finds memory loads and stores
that overlap with CPUArchState. This information can be used later in
liveness analysis to ensure correctness of register allocation. In
particular, if a load or store overlaps with the memory location of some
global variable, that variable should be spilled and reloaded at the
appropriate times.

Previously no such analysis was performed, and for correctness it was
required that no load/store operation overlap with the memory locations
of global variables.

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---

I believe the checkpatch warning here is a false positive.
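
As an illustration (not part of the patch), for an op sequence like

    movi_i64  t1, $0x100
    add_i64   t0, env, t1
    st_i32    x, t0, $0x10

the analysis propagates "env + 0x100" into t0 and records the store as
TCG_ALIAS_WRITE with fixed_offset 0x110 and size 4. Liveness analysis
(next patch) can then sync or kill exactly those globals whose memory
locations overlap [0x110, 0x114).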

---
 tcg/optimize.c | 146 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.h      |  17 +++++++
 2 files changed, 163 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index adfc56c..2347ce3 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -34,6 +34,7 @@
 
 struct tcg_temp_info {
     bool is_const;
+    bool is_base;
     uint16_t prev_copy;
     uint16_t next_copy;
     tcg_target_ulong val;
@@ -61,6 +62,7 @@ static void reset_temp(TCGArg temp)
     temps[temp].next_copy = temp;
     temps[temp].prev_copy = temp;
     temps[temp].is_const = false;
+    temps[temp].is_base = false;
     temps[temp].mask = -1;
 }
 
@@ -1429,3 +1431,147 @@ void tcg_optimize(TCGContext *s)
         }
     }
 }
+
+/* Simple alias analysis. It finds out which load/store operations overlap
+   with CPUArchState. The result is stored in TCGContext and can be used
+   during liveness analysis and register allocation. */
+void tcg_alias_analysis(TCGContext *s)
+{
+    int oi, oi_next;
+
+    reset_all_temps(s->nb_temps);
+    temps[GET_TCGV_PTR(s->tcg_env)].is_base = true;
+    temps[GET_TCGV_PTR(s->tcg_env)].val = 0;
+
+    for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
+        int nb_oargs, i;
+        int size;
+        TCGAliasType tp;
+
+        TCGOp * const op = &s->gen_op_buf[oi];
+        TCGArg * const args = &s->gen_opparam_buf[op->args];
+        TCGOpcode opc = op->opc;
+        const TCGOpDef *def = &tcg_op_defs[opc];
+
+        oi_next = op->next;
+
+        if (opc == INDEX_op_call) {
+            nb_oargs = op->callo;
+        } else {
+            nb_oargs = def->nb_oargs;
+        }
+
+        s->alias_info[oi] = (TCGAliasInfo){
+                TCG_NOT_ALIAS,
+                false,
+                0,
+                0
+            };
+
+        switch (opc) {
+        CASE_OP_32_64(movi):
+            temps[args[0]].is_const = 1;
+            temps[args[0]].is_base = false;
+            temps[args[0]].val = args[1];
+            break;
+        CASE_OP_32_64(mov):
+            temps[args[0]].is_const = temps[args[1]].is_const;
+            temps[args[0]].is_base = temps[args[1]].is_base;
+            temps[args[0]].val = temps[args[1]].val;
+            break;
+        CASE_OP_32_64(add):
+        CASE_OP_32_64(sub):
+            if (temps[args[1]].is_base && temps[args[2]].is_const) {
+                temps[args[0]].is_base = true;
+                temps[args[0]].is_const = false;
+                temps[args[0]].val =
+                    do_constant_folding(opc, temps[args[1]].val,
+                                        temps[args[2]].val);
+            } else {
+                reset_temp(args[0]);
+            }
+            break;
+        CASE_OP_32_64(ld8s):
+        CASE_OP_32_64(ld8u):
+            size = 1;
+            tp = TCG_ALIAS_READ;
+            goto do_ldst;
+        CASE_OP_32_64(ld16s):
+        CASE_OP_32_64(ld16u):
+            size = 2;
+            tp = TCG_ALIAS_READ;
+            goto do_ldst;
+        case INDEX_op_ld_i32:
+        case INDEX_op_ld32s_i64:
+        case INDEX_op_ld32u_i64:
+            size = 4;
+            tp = TCG_ALIAS_READ;
+            goto do_ldst;
+        case INDEX_op_ld_i64:
+            size = 8;
+            tp = TCG_ALIAS_READ;
+            goto do_ldst;
+        case INDEX_op_ld_v128:
+            size = 16;
+            tp = TCG_ALIAS_READ;
+            goto do_ldst;
+        CASE_OP_32_64(st8):
+            size = 1;
+            tp = TCG_ALIAS_WRITE;
+            goto do_ldst;
+        CASE_OP_32_64(st16):
+            size = 2;
+            tp = TCG_ALIAS_WRITE;
+            goto do_ldst;
+        case INDEX_op_st_i32:
+        case INDEX_op_st32_i64:
+            size = 4;
+            tp = TCG_ALIAS_WRITE;
+            goto do_ldst;
+        case INDEX_op_st_i64:
+            size = 8;
+            tp = TCG_ALIAS_WRITE;
+            goto do_ldst;
+        case INDEX_op_st_v128:
+            size = 16;
+            tp = TCG_ALIAS_WRITE;
+            goto do_ldst;
+        do_ldst:
+            if (temps[args[1]].is_base) {
+                TCGArg val;
+#if TCG_TARGET_REG_BITS == 32
+                val = do_constant_folding(INDEX_op_add_i32,
+                                          temps[args[1]].val,
+                                          args[2]);
+#else
+                val = do_constant_folding(INDEX_op_add_i64,
+                                          temps[args[1]].val,
+                                          args[2]);
+#endif
+                if ((tcg_target_long)val < sizeof(CPUArchState) &&
+                    (tcg_target_long)val + size > 0) {
+                    s->alias_info[oi].alias_type = tp;
+                    s->alias_info[oi].fixed_offset = true;
+                    s->alias_info[oi].offset = val;
+                    s->alias_info[oi].size = size;
+                } else {
+                    s->alias_info[oi].alias_type = TCG_NOT_ALIAS;
+                }
+            } else {
+                s->alias_info[oi].alias_type = tp;
+                s->alias_info[oi].fixed_offset = false;
+            }
+            goto do_reset_output;
+        default:
+            if (def->flags & TCG_OPF_BB_END) {
+                reset_all_temps(s->nb_temps);
+                temps[GET_TCGV_PTR(s->tcg_env)].is_base = true;
+                temps[GET_TCGV_PTR(s->tcg_env)].val = 0;
+            } else {
+        do_reset_output:
+                for (i = 0; i < nb_oargs; i++) {
+                    reset_temp(args[i]);
+                }
+            }
+            break;
+        }
+    }
+}
diff --git a/tcg/tcg.h b/tcg/tcg.h
index fa455ae..0e1fbe9 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -678,6 +678,20 @@ QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 14));
 /* Make sure that we don't overflow 64 bits without noticing.  */
 QEMU_BUILD_BUG_ON(sizeof(TCGOp) > 8);
 
+typedef enum TCGAliasType {
+    TCG_NOT_ALIAS = 0,
+    TCG_ALIAS_READ = 1,
+    TCG_ALIAS_WRITE = 2,
+    TCG_ALIAS_RW = TCG_ALIAS_READ | TCG_ALIAS_WRITE
+} TCGAliasType;
+
+typedef struct TCGAliasInfo {
+    TCGAliasType alias_type;
+    bool fixed_offset;
+    tcg_target_long offset;
+    tcg_target_long size;
+} TCGAliasInfo;
+
 struct TCGContext {
     uint8_t *pool_cur, *pool_end;
     TCGPool *pool_first, *pool_current, *pool_first_large;
@@ -762,6 +776,8 @@ struct TCGContext {
     TCGOp gen_op_buf[OPC_BUF_SIZE];
     TCGArg gen_opparam_buf[OPPARAM_BUF_SIZE];
 
+    TCGAliasInfo alias_info[OPC_BUF_SIZE];
+
     uint16_t gen_insn_end_off[TCG_MAX_INSNS];
     target_ulong gen_insn_data[TCG_MAX_INSNS][TARGET_INSN_START_WORDS];
 };
@@ -1009,6 +1025,7 @@ TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *op, TCGOpcode opc, int narg);
 TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, TCGOpcode opc, int narg);
 
 void tcg_optimize(TCGContext *s);
+void tcg_alias_analysis(TCGContext *s);
 
 /* only used for debugging purposes */
 void tcg_dump_ops(TCGContext *s);
-- 
2.1.4

* [Qemu-devel] [PATCH v2.1 06/21] tcg: use results of alias analysis in liveness analysis
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (4 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 05/21] tcg: add simple alias analysis Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 07/21] tcg: allow globals to overlap Kirill Batuzov
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/tcg.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 18d97ec..27e5944 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -564,6 +564,11 @@ static intptr_t tcg_type_size(TCGType type)
     }
 }
 
+static intptr_t tcg_temp_size(const TCGTemp *tmp)
+{
+    return tcg_type_size(tmp->type);
+}
+
 int tcg_global_mem_new_internal(TCGType base_type, TCGv_ptr base,
                                 intptr_t offset, const char *name)
 {
@@ -1472,6 +1477,43 @@ static inline void tcg_la_bb_end(TCGContext *s, uint8_t *temp_state)
     }
 }
 
+/* Check if memory write completely overwrites temp's memory location.
+   If this is the case then the temp can be considered dead. */
+static int tcg_temp_overwrite(TCGContext *s, const TCGTemp *tmp,
+                               const TCGAliasInfo *ai)
+{
+    if (!(ai->alias_type & TCG_ALIAS_WRITE) || !ai->fixed_offset) {
+        return 0;
+    }
+    if (tmp->mem_base != &s->temps[GET_TCGV_PTR(s->tcg_env)]) {
+        return 0;
+    }
+    if (ai->offset > tmp->mem_offset
+        || ai->offset + ai->size < tmp->mem_offset + tcg_temp_size(tmp)) {
+            return 0;
+    }
+    return 1;
+}
+
+/* Check if memory read or write overlaps with temp's memory location.
+   If this is the case then the temp must be synced to memory. */
+static int tcg_temp_overlap(TCGContext *s, const TCGTemp *tmp,
+                            const TCGAliasInfo *ai)
+{
+    if (!ai->fixed_offset || tmp->fixed_reg) {
+        return 0;
+    }
+    if (tmp->mem_base != &s->temps[GET_TCGV_PTR(s->tcg_env)]) {
+        return 1;
+    }
+    if (ai->offset >= tmp->mem_offset + tcg_temp_size(tmp)
+        || ai->offset + ai->size <= tmp->mem_offset) {
+            return 0;
+    } else {
+        return 1;
+    }
+}
+
 /* Liveness analysis : update the opc_arg_life array to tell if a
    given input arguments is dead. Instructions updating dead
    temporaries are removed. */
@@ -1674,6 +1716,23 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
                     temp_state[arg] = TS_DEAD;
                 }
 
+                /* record if the operation uses some globals' memory location */
+                if (s->alias_info[oi].alias_type != TCG_NOT_ALIAS) {
+                    for (i = 0; i < s->nb_globals; i++) {
+                        if (tcg_temp_overwrite(s, &s->temps[i],
+                                               &s->alias_info[oi])) {
+                            temp_state[i] = TS_DEAD;
+                        } else if (tcg_temp_overlap(s, &s->temps[i],
+                                                    &s->alias_info[oi])) {
+                            if (s->alias_info[oi].alias_type & TCG_ALIAS_READ) {
+                                temp_state[i] = TS_MEM | TS_DEAD;
+                            } else if (!(temp_state[i] & TS_DEAD)) {
+                                temp_state[i] |= TS_MEM;
+                            }
+                        }
+                    }
+                }
+
                 /* if end of basic block, update */
                 if (def->flags & TCG_OPF_BB_END) {
                     tcg_la_bb_end(s, temp_state);
@@ -2622,6 +2681,8 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
     s->la_time -= profile_getclock();
 #endif
 
+    tcg_alias_analysis(s);
+
     {
         uint8_t *temp_state = tcg_malloc(s->nb_temps + s->nb_indirects);
 
-- 
2.1.4

* [Qemu-devel] [PATCH v2.1 07/21] tcg: allow globals to overlap
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (5 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 06/21] tcg: use results of alias analysis in liveness analysis Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 08/21] tcg: add vector addition operations Kirill Batuzov
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Sometimes the target architecture may allow some parts of a register to be
accessed as a different register. If both of these registers are
implemented as globals in QEMU, their contents will overlap and a change
to one global will also change the value of the other. To handle such
situations properly, some fixes are needed in the register allocator
and liveness analysis.
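
As an illustrative example (an ARM-like layout): if q0 is declared as a
128-bit global and d0/d1 as 64-bit globals covering the same CPUArchState
memory, tcg_detect_overlapping_temps() records d0 and d1 in q0's sub_temps
and q0 in each of d0's and d1's overlap_temps. Liveness analysis then
treats a write to q0 as also killing d0 and d1, while a write to d0 forces
q0 to be synced to memory first.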

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/optimize.c |  19 ++++++++-
 tcg/tcg.c      | 128 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.h      |  20 +++++++++
 3 files changed, 166 insertions(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2347ce3..7a69ff0 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -55,7 +55,7 @@ static inline bool temp_is_copy(TCGArg arg)
 }
 
 /* Reset TEMP's state, possibly removing the temp for the list of copies.  */
-static void reset_temp(TCGArg temp)
+static void reset_this_temp(TCGArg temp)
 {
     temps[temps[temp].next_copy].prev_copy = temps[temp].prev_copy;
     temps[temps[temp].prev_copy].next_copy = temps[temp].next_copy;
@@ -66,6 +66,23 @@ static void reset_temp(TCGArg temp)
     temps[temp].mask = -1;
 }
 
+static void reset_temp(TCGArg temp)
+{
+    int i;
+    TCGTemp *ts = &tcg_ctx.temps[temp];
+    reset_this_temp(temp);
+    if (ts->sub_temps) {
+        for (i = 0; ts->sub_temps[i] != (TCGArg)-1; i++) {
+            reset_this_temp(ts->sub_temps[i]);
+        }
+    }
+    if (ts->overlap_temps) {
+        for (i = 0; ts->overlap_temps[i] != (TCGArg)-1; i++) {
+            reset_this_temp(ts->overlap_temps[i]);
+        }
+    }
+}
+
 /* Reset all temporaries, given that there are NB_TEMPS of them.  */
 static void reset_all_temps(int nb_temps)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 27e5944..a8df040 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -623,9 +623,13 @@ int tcg_global_mem_new_internal(TCGType base_type, TCGv_ptr base,
             ts2->mem_base = base_ts;
             ts2->mem_offset = offset + cur_offset;
             ts2->name = g_strdup_printf("%s_%d", name, i);
+            ts2->sub_temps = NULL;
+            ts2->overlap_temps = NULL;
             ts1 = ts2;
         }
     }
+    ts->sub_temps = NULL;
+    ts->overlap_temps = NULL;
     return temp_idx(s, ts);
 }
 
@@ -1514,6 +1518,35 @@ static int tcg_temp_overlap(TCGContext *s, const TCGTemp *tmp,
     }
 }
 
+static void tcg_temp_arr_apply(const TCGArg *arr, uint8_t *temp_state,
+                               uint8_t temp_val)
+{
+    TCGArg i;
+    if (!arr) {
+        return;
+    }
+    for (i = 0; arr[i] != (TCGArg)-1; i++) {
+        temp_state[arr[i]] = temp_val;
+    }
+}
+
+static void tcg_sub_temps_dead(TCGContext *s, TCGArg tmp, uint8_t *temp_state)
+{
+    tcg_temp_arr_apply(s->temps[tmp].sub_temps, temp_state, TS_DEAD);
+}
+
+static void tcg_sub_temps_sync(TCGContext *s, TCGArg tmp, uint8_t *temp_state)
+{
+    tcg_temp_arr_apply(s->temps[tmp].sub_temps, temp_state, TS_MEM | TS_DEAD);
+}
+
+static void tcg_overlap_temps_sync(TCGContext *s, TCGArg tmp,
+                                   uint8_t *temp_state)
+{
+    tcg_temp_arr_apply(s->temps[tmp].overlap_temps, temp_state,
+                       TS_MEM | TS_DEAD);
+}
+
 /* Liveness analysis : update the opc_arg_life array to tell if a
    given input arguments is dead. Instructions updating dead
    temporaries are removed. */
@@ -1568,6 +1601,11 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
                         if (temp_state[arg] & TS_MEM) {
                             arg_life |= SYNC_ARG << i;
                         }
+                        /* sub_temps are also dead */
+                        tcg_sub_temps_dead(&tcg_ctx, arg, temp_state);
+                        /* overlap_temps need to go to memory */
+                        tcg_overlap_temps_sync(&tcg_ctx, arg, temp_state);
+
                         temp_state[arg] = TS_DEAD;
                     }
 
@@ -1595,6 +1633,11 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
                     for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
                         arg = args[i];
                         if (arg != TCG_CALL_DUMMY_ARG) {
+                            /* both sub_temps and overlap_temps need to go
+                               to memory */
+                            tcg_sub_temps_sync(&tcg_ctx, arg, temp_state);
+                            tcg_overlap_temps_sync(&tcg_ctx, arg, temp_state);
+
                             temp_state[arg] &= ~TS_DEAD;
                         }
                     }
@@ -1713,6 +1756,11 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
                     if (temp_state[arg] & TS_MEM) {
                         arg_life |= SYNC_ARG << i;
                     }
+                    /* sub_temps are also dead */
+                    tcg_sub_temps_dead(&tcg_ctx, arg, temp_state);
+                    /* overlap_temps need to go to memory */
+                    tcg_overlap_temps_sync(&tcg_ctx, arg, temp_state);
+
                     temp_state[arg] = TS_DEAD;
                 }
 
@@ -1753,6 +1801,9 @@ static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
                 /* input arguments are live for preceding opcodes */
                 for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
                     temp_state[args[i]] &= ~TS_DEAD;
+                    /* both sub_temps and overlap_temps need to go to memory */
+                    tcg_sub_temps_sync(&tcg_ctx, args[i], temp_state);
+                    tcg_overlap_temps_sync(&tcg_ctx, args[i], temp_state);
                 }
             }
             break;
@@ -3139,3 +3190,80 @@ void tcg_register_jit(void *buf, size_t buf_size)
 {
 }
 #endif /* ELF_HOST_MACHINE */
+
+static int tcg_temp_is_sub_temp(const TCGTemp *t1, const TCGTemp *t2)
+{
+    if (t2->mem_offset < t1->mem_offset) {
+        return 0;
+    }
+    if (t2->mem_offset + tcg_temp_size(t2) >
+        t1->mem_offset + tcg_temp_size(t1)) {
+        return 0;
+    }
+    return 1;
+}
+
+static int tcg_temp_is_overlap_temp(const TCGTemp *t1, const TCGTemp *t2)
+{
+    if (t2->mem_offset >= t1->mem_offset + tcg_temp_size(t1)) {
+        return 0;
+    }
+    if (t2->mem_offset + tcg_temp_size(t2) <= t1->mem_offset) {
+        return 0;
+    }
+    return 1;
+}
+
+void tcg_detect_overlapping_temps(TCGContext *s)
+{
+    int i, j;
+    int overlap_count, subtemps_count;
+    TCGArg *sub_temps, *overlap_temps;
+    TCGTemp *ts;
+    for (i = 0; i < s->nb_globals; i++) {
+        ts = &s->temps[i];
+        if (ts->fixed_reg ||
+            ts->mem_base != &s->temps[GET_TCGV_PTR(s->tcg_env)]) {
+            continue;
+        }
+        overlap_count = 0;
+        subtemps_count = 0;
+        overlap_temps = NULL;
+        sub_temps = NULL;
+        for (j = 0; j < s->nb_globals; j++) {
+            if (i != j && !s->temps[j].fixed_reg &&
+                s->temps[j].mem_base == &s->temps[GET_TCGV_PTR(s->tcg_env)]) {
+                if (tcg_temp_is_sub_temp(ts, &s->temps[j])) {
+                    subtemps_count++;
+                } else if (tcg_temp_is_overlap_temp(ts, &s->temps[j])) {
+                    overlap_count++;
+                }
+            }
+        }
+        if (subtemps_count == 0 && overlap_count == 0) {
+            continue;
+        }
+        if (subtemps_count > 0) {
+            sub_temps = g_malloc0((subtemps_count + 1) * sizeof(TCGArg));
+            sub_temps[subtemps_count] = (TCGArg)-1;
+            tcg_temp_set_sub_temps(i, sub_temps);
+        }
+        if (overlap_count > 0) {
+            overlap_temps = g_malloc0((overlap_count + 1) * sizeof(TCGArg));
+            overlap_temps[overlap_count] = (TCGArg)-1;
+            tcg_temp_set_overlap_temps(i, overlap_temps);
+        }
+        overlap_count = 0;
+        subtemps_count = 0;
+        for (j = 0; j < s->nb_globals; j++) {
+            if (i != j && !s->temps[j].fixed_reg &&
+                s->temps[j].mem_base == &s->temps[GET_TCGV_PTR(s->tcg_env)]) {
+                if (tcg_temp_is_sub_temp(ts, &s->temps[j])) {
+                    sub_temps[subtemps_count++] = (TCGArg)j;
+                } else if (tcg_temp_is_overlap_temp(ts, &s->temps[j])) {
+                    overlap_temps[overlap_count++] = (TCGArg)j;
+                }
+            }
+        }
+    }
+}
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 0e1fbe9..01299cc 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -634,6 +634,14 @@ typedef struct TCGTemp {
     struct TCGTemp *mem_base;
     intptr_t mem_offset;
     const char *name;
+
+    /* -1 terminated array of temps that are parts of this temp.
+       All bits of them are part of this temp. */
+    const TCGArg *sub_temps;
+    /* -1 terminated array of temps that overlap with this temp.
+       Some bits of them are part of this temp, but some are not. sub_temps
+       are not included here. */
+    const TCGArg *overlap_temps;
 } TCGTemp;
 
 typedef struct TCGContext TCGContext;
@@ -837,6 +845,16 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb);
 
 void tcg_set_frame(TCGContext *s, TCGReg reg, intptr_t start, intptr_t size);
 
+static inline void tcg_temp_set_sub_temps(TCGArg temp, const TCGArg *arr)
+{
+    tcg_ctx.temps[temp].sub_temps = arr;
+}
+
+static inline void tcg_temp_set_overlap_temps(TCGArg temp, const TCGArg *arr)
+{
+    tcg_ctx.temps[temp].overlap_temps = arr;
+}
+
 int tcg_global_mem_new_internal(TCGType, TCGv_ptr, intptr_t, const char *);
 
 TCGv_i32 tcg_global_reg_new_i32(TCGReg reg, const char *name);
@@ -1382,4 +1400,6 @@ void helper_atomic_sto_be_mmu(CPUArchState *env, target_ulong addr, Int128 val,
 
 #endif /* CONFIG_ATOMIC128 */
 
+void tcg_detect_overlapping_temps(TCGContext *s);
+
 #endif /* TCG_H */
-- 
2.1.4

* [Qemu-devel] [PATCH v2.1 08/21] tcg: add vector addition operations
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (6 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 07/21] tcg: allow globals to overlap Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 09/21] target/arm: support access to vector guest registers as globals Kirill Batuzov
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---

Support for representing a v128 addition as two v64 additions has been
added. As a result the GEN_VECT_WRAPPER_HALVES macro was added. It is
larger and more complicated than the original GEN_VECT_WRAPPER (which is
still used for v64 additions because they have no half operations (v32
additions)).

GEN_VECT_WRAPPER_HALVES seems to grow fast (in size and complexity) with
each supported representation. Calling tcg_gen_add_<smaller_size> may not
be desirable because the last-resort fallback code is better generated for
the whole vector, as that requires fewer additional operations.

Some additional performance optimization can be done by creating
hand-written tcg_gen_internal_<operation> helpers for some cases (for
example, add_i8x16). Such a helper would still operate on memory locations
but would use 64-bit scalar additions with some bit masking, as Richard
suggested in the v1 discussion (see the sketch below). This series is
focused on infrastructure (not on optimizing particular instructions), so
I have not included this optimization yet.
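
For reference, a sketch of that trick in plain C (illustrative, not part
of the patch): eight byte lanes are added with 64-bit scalar operations by
keeping carries from crossing byte boundaries, so an i8x16 addition costs
two such operations plus the loads and stores.

#include <stdint.h>

static uint64_t add_i8x8_swar(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8080808080808080ULL;  /* top bit of each byte */
    /* Add the low 7 bits of every byte, then patch the top bits back in
       with a carry-free XOR.  */
    uint64_t low = (a & ~H) + (b & ~H);
    return low ^ ((a ^ b) & H);
}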

---
 tcg/tcg-op.c  |  64 ++++++++++++++++++++++
 tcg/tcg-op.h  | 167 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg-opc.h |  12 +++++
 tcg/tcg.c     |  12 +++++
 tcg/tcg.h     |  43 +++++++++++++++
 5 files changed, 298 insertions(+)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 95a39b7..8a19eee 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -3038,3 +3038,67 @@ static void tcg_gen_mov2_i64(TCGv_i64 r, TCGv_i64 a, TCGv_i64 b)
 GEN_ATOMIC_HELPER(xchg, mov2, 0)
 
 #undef GEN_ATOMIC_HELPER
+
+/* Find a memory location for a 128-bit TCG variable. */
+void tcg_v128_to_ptr(TCGv_v128 tmp, TCGv_ptr base, int slot,
+                     TCGv_ptr *real_base, intptr_t *real_offset, int is_read)
+{
+    int idx = GET_TCGV_V128(tmp);
+    assert(idx >= 0 && idx < tcg_ctx.nb_temps);
+    if (idx < tcg_ctx.nb_globals) {
+        /* Globals use their locations within CPUArchState. */
+        int env = GET_TCGV_PTR(tcg_ctx.tcg_env);
+        TCGTemp *ts_env = &tcg_ctx.temps[env];
+        TCGTemp *ts_arg = &tcg_ctx.temps[idx];
+
+        /* Sanity checks: global's memory locations must be addressed
+           relative to ENV. */
+        assert(ts_env->val_type == TEMP_VAL_REG &&
+               ts_env == ts_arg->mem_base &&
+               ts_arg->mem_allocated);
+
+        *real_base = tcg_ctx.tcg_env;
+        *real_offset = ts_arg->mem_offset;
+    } else {
+        /* Temporaries use swap space in TCGContext. Since we already have
+           a 128-bit temporary we'll assume that the target supports 128-bit
+           loads and stores. */
+        *real_base = base;
+        *real_offset = slot * 16;
+        if (is_read) {
+            tcg_gen_st_v128(tmp, base, slot * 16);
+        }
+    }
+}
+
+/* Find a memory location for a 64-bit vector TCG variable. */
+void tcg_v64_to_ptr(TCGv_v64 tmp, TCGv_ptr base, int slot,
+                    TCGv_ptr *real_base, intptr_t *real_offset, int is_read)
+{
+    int idx = GET_TCGV_V64(tmp);
+    assert(idx >= 0 && idx < tcg_ctx.nb_temps);
+    if (idx < tcg_ctx.nb_globals) {
+        /* Globals use their locations within CPUArchState. */
+        int env = GET_TCGV_PTR(tcg_ctx.tcg_env);
+        TCGTemp *ts_env = &tcg_ctx.temps[env];
+        TCGTemp *ts_arg = &tcg_ctx.temps[idx];
+
+        /* Sanity checks: global's memory locations must be addressed
+           relative to ENV. */
+        assert(ts_env->val_type == TEMP_VAL_REG &&
+               ts_env == ts_arg->mem_base &&
+               ts_arg->mem_allocated);
+
+        *real_base = tcg_ctx.tcg_env;
+        *real_offset = ts_arg->mem_offset;
+    } else {
+        /* Temporaries use swap space in TCGContext. Since we already have
+           a 64-bit vector temporary we'll assume that the target supports
+           64-bit vector loads and stores. */
+        *real_base = base;
+        *real_offset = slot * 16;
+        if (is_read) {
+            tcg_gen_st_v64(tmp, base, slot * 16);
+        }
+    }
+}
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 250493b..3727be7 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -1195,6 +1195,10 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
     tcg_gen_add_i32(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), TCGV_PTR_TO_NAT(B))
 # define tcg_gen_addi_ptr(R, A, B) \
     tcg_gen_addi_i32(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), (B))
+# define tcg_gen_mov_ptr(R, B) \
+    tcg_gen_mov_i32(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(B))
+# define tcg_gen_movi_ptr(R, B) \
+    tcg_gen_movi_i32(TCGV_PTR_TO_NAT(R), (B))
 # define tcg_gen_ext_i32_ptr(R, A) \
     tcg_gen_mov_i32(TCGV_PTR_TO_NAT(R), (A))
 #else
@@ -1206,6 +1210,169 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, TCGMemOp);
     tcg_gen_add_i64(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), TCGV_PTR_TO_NAT(B))
 # define tcg_gen_addi_ptr(R, A, B) \
     tcg_gen_addi_i64(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), (B))
+# define tcg_gen_mov_ptr(R, B) \
+    tcg_gen_mov_i64(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(B))
+# define tcg_gen_movi_ptr(R, B) \
+    tcg_gen_movi_i64(TCGV_PTR_TO_NAT(R), (B))
 # define tcg_gen_ext_i32_ptr(R, A) \
     tcg_gen_ext_i32_i64(TCGV_PTR_TO_NAT(R), (A))
 #endif /* UINTPTR_MAX == UINT32_MAX */
+
+/***************************************/
+/* 64-bit and 128-bit vector arithmetic. */
+
+/* Find a memory location for a 128-bit TCG variable. */
+void tcg_v128_to_ptr(TCGv_v128 tmp, TCGv_ptr base, int slot,
+                     TCGv_ptr *real_base, intptr_t *real_offset, int is_read);
+/* Find a memory location for a 64-bit vector TCG variable. */
+void tcg_v64_to_ptr(TCGv_v64 tmp, TCGv_ptr base, int slot,
+                    TCGv_ptr *real_base, intptr_t *real_offset, int is_read);
+
+#define VTYPE(width) glue(TCG_TYPE_V, width)
+#define TEMP_TYPE(arg, temp_type) \
+            tcg_ctx.temps[glue(GET_TCGV_, temp_type)(arg)].type
+
+#define GEN_VECT_WRAPPER_HALVES(op, width, half_op, half_width, func)        \
+    static inline void glue(tcg_gen_, op)(glue(TCGv_v, width) res,           \
+                                            glue(TCGv_v, width) arg1,        \
+                                            glue(TCGv_v, width) arg2)        \
+    {                                                                        \
+        if (glue(TCG_TARGET_HAS_, op)) {                                     \
+            glue(tcg_gen_op3_v, width)(glue(INDEX_op_, op), res, arg1,       \
+                                       arg2);                                \
+        } else if (TEMP_TYPE(res, glue(V, width)) == VTYPE(half_width) &&    \
+                   glue(TCG_TARGET_HAS_, half_op)) {                         \
+            glue(TCGv_v, half_width) res_lo, res_hi, arg1_lo, arg1_hi,       \
+                                     arg2_lo, arg2_hi;                       \
+            res_lo = glue(tcg_temp_low_half_v, width)(res);                  \
+            res_hi = glue(tcg_temp_high_half_v, width)(res);                 \
+            arg1_lo = glue(tcg_temp_low_half_v, width)(arg1);                \
+            arg1_hi = glue(tcg_temp_high_half_v, width)(arg1);               \
+            arg2_lo = glue(tcg_temp_low_half_v, width)(arg2);                \
+            arg2_hi = glue(tcg_temp_high_half_v, width)(arg2);               \
+            glue(tcg_gen_op3_v, half_width)(glue(INDEX_op_, half_op),        \
+                                            res_lo, arg1_lo, arg2_lo);       \
+            glue(tcg_gen_op3_v, half_width)(glue(INDEX_op_, half_op),        \
+                                            res_hi, arg1_hi, arg2_hi);       \
+        } else {                                                             \
+            TCGv_ptr base =                                                  \
+                        MAKE_TCGV_PTR(tcg_ctx.frame_temp - tcg_ctx.temps);   \
+            TCGv_ptr t1 = tcg_temp_new_ptr();                                \
+            TCGv_ptr t2 = tcg_temp_new_ptr();                                \
+            TCGv_ptr t3 = tcg_temp_new_ptr();                                \
+            TCGv_ptr arg1p, arg2p, resp;                                     \
+            intptr_t arg1of, arg2of, resof;                                  \
+                                                                             \
+            glue(glue(tcg_v, width), _to_ptr)(arg1, base, 1,                 \
+                                            &arg1p, &arg1of, 1);             \
+            glue(glue(tcg_v, width), _to_ptr)(arg2, base, 2,                 \
+                                            &arg2p, &arg2of, 1);             \
+            glue(glue(tcg_v, width), _to_ptr)(res, base, 0, &resp, &resof,   \
+                                              0);                            \
+                                                                             \
+            tcg_gen_addi_ptr(t1, resp, resof);                               \
+            tcg_gen_addi_ptr(t2, arg1p, arg1of);                             \
+            tcg_gen_addi_ptr(t3, arg2p, arg2of);                             \
+            func(t1, t2, t3);                                                \
+                                                                             \
+            if ((intptr_t)res >= tcg_ctx.nb_globals) {                       \
+                glue(tcg_gen_ld_v, width)(res, base, 0);                     \
+            }                                                                \
+                                                                             \
+            tcg_temp_free_ptr(t1);                                           \
+            tcg_temp_free_ptr(t2);                                           \
+            tcg_temp_free_ptr(t3);                                           \
+        }                                                                    \
+    }
+
+#define GEN_VECT_WRAPPER(op, width, func)                                    \
+    static inline void glue(tcg_gen_, op)(glue(TCGv_v, width) res,           \
+                                            glue(TCGv_v, width) arg1,        \
+                                            glue(TCGv_v, width) arg2)        \
+    {                                                                        \
+        if (glue(TCG_TARGET_HAS_, op)) {                                     \
+            glue(tcg_gen_op3_v, width)(glue(INDEX_op_, op), res, arg1,       \
+                                       arg2);                                \
+        } else {                                                             \
+            TCGv_ptr base =                                                  \
+                        MAKE_TCGV_PTR(tcg_ctx.frame_temp - tcg_ctx.temps);   \
+            TCGv_ptr t1 = tcg_temp_new_ptr();                                \
+            TCGv_ptr t2 = tcg_temp_new_ptr();                                \
+            TCGv_ptr t3 = tcg_temp_new_ptr();                                \
+            TCGv_ptr arg1p, arg2p, resp;                                     \
+            intptr_t arg1of, arg2of, resof;                                  \
+                                                                             \
+            glue(glue(tcg_v, width), _to_ptr)(arg1, base, 1,                 \
+                                            &arg1p, &arg1of, 1);             \
+            glue(glue(tcg_v, width), _to_ptr)(arg2, base, 2,                 \
+                                            &arg2p, &arg2of, 1);             \
+            glue(glue(tcg_v, width), _to_ptr)(res, base, 0, &resp, &resof,   \
+                                              0);                            \
+                                                                             \
+            tcg_gen_addi_ptr(t1, resp, resof);                               \
+            tcg_gen_addi_ptr(t2, arg1p, arg1of);                             \
+            tcg_gen_addi_ptr(t3, arg2p, arg2of);                             \
+            func(t1, t2, t3);                                                \
+                                                                             \
+            if ((intptr_t)res >= tcg_ctx.nb_globals) {                       \
+                glue(tcg_gen_ld_v, width)(res, base, 0);                     \
+            }                                                                \
+                                                                             \
+            tcg_temp_free_ptr(t1);                                           \
+            tcg_temp_free_ptr(t2);                                           \
+            tcg_temp_free_ptr(t3);                                           \
+        }                                                                    \
+    }
+#define TCG_INTERNAL_OP(name, N, size, ld, st, op, type)                     \
+    static inline void glue(tcg_internal_, name)(TCGv_ptr resp,              \
+                                                 TCGv_ptr arg1p,             \
+                                                 TCGv_ptr arg2p)             \
+    {                                                                        \
+        int i;                                                               \
+        glue(TCGv_, type) tmp1, tmp2;                                        \
+                                                                             \
+        tmp1 = glue(tcg_temp_new_, type)();                                  \
+        tmp2 = glue(tcg_temp_new_, type)();                                  \
+                                                                             \
+        for (i = 0; i < N; i++) {                                            \
+            glue(tcg_gen_, ld)(tmp1, arg1p, i * size);                       \
+            glue(tcg_gen_, ld)(tmp2, arg2p, i * size);                       \
+            glue(tcg_gen_, op)(tmp1, tmp1, tmp2);                            \
+            glue(tcg_gen_, st)(tmp1, resp, i * size);                        \
+        }                                                                    \
+                                                                             \
+        glue(tcg_temp_free_, type)(tmp1);                                    \
+        glue(tcg_temp_free_, type)(tmp2);                                    \
+    }
+
+#define TCG_INTERNAL_OP_8(name, N, op) \
+    TCG_INTERNAL_OP(name, N, 1, ld8u_i32, st8_i32, op, i32)
+#define TCG_INTERNAL_OP_16(name, N, op) \
+    TCG_INTERNAL_OP(name, N, 2, ld16u_i32, st16_i32, op, i32)
+#define TCG_INTERNAL_OP_32(name, N, op) \
+    TCG_INTERNAL_OP(name, N, 4, ld_i32, st_i32, op, i32)
+#define TCG_INTERNAL_OP_64(name, N, op) \
+    TCG_INTERNAL_OP(name, N, 8, ld_i64, st_i64, op, i64)
+
+TCG_INTERNAL_OP_8(add_i8x16, 16, add_i32)
+TCG_INTERNAL_OP_16(add_i16x8, 8, add_i32)
+TCG_INTERNAL_OP_32(add_i32x4, 4, add_i32)
+TCG_INTERNAL_OP_64(add_i64x2, 2, add_i64)
+
+TCG_INTERNAL_OP_8(add_i8x8, 8, add_i32)
+TCG_INTERNAL_OP_16(add_i16x4, 4, add_i32)
+TCG_INTERNAL_OP_32(add_i32x2, 2, add_i32)
+TCG_INTERNAL_OP_64(add_i64x1, 1, add_i64)
+
+GEN_VECT_WRAPPER_HALVES(add_i8x16, 128, add_i8x8, 64, tcg_internal_add_i8x16)
+GEN_VECT_WRAPPER_HALVES(add_i16x8, 128, add_i16x4, 64, tcg_internal_add_i16x8)
+GEN_VECT_WRAPPER_HALVES(add_i32x4, 128, add_i32x2, 64, tcg_internal_add_i32x4)
+GEN_VECT_WRAPPER_HALVES(add_i64x2, 128, add_i64x1, 64, tcg_internal_add_i64x2)
+
+GEN_VECT_WRAPPER(add_i8x8, 64, tcg_internal_add_i8x8)
+GEN_VECT_WRAPPER(add_i16x4, 64, tcg_internal_add_i16x4)
+GEN_VECT_WRAPPER(add_i32x2, 64, tcg_internal_add_i32x2)
+GEN_VECT_WRAPPER(add_i64x1, 64, tcg_internal_add_i64x1)
+
+#undef VTYPE
+#undef TEMP_TYPE
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 2365c97..4c8f195 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -206,6 +206,18 @@ DEF(ld_v128, 1, 1, 1, IMPL128)
 DEF(st_v64, 0, 2, 1, IMPLV64)
 DEF(ld_v64, 1, 1, 1, IMPLV64)
 
+/* 128-bit vector arith */
+DEF(add_i8x16, 1, 2, 0, IMPL128 | IMPL(TCG_TARGET_HAS_add_i8x16))
+DEF(add_i16x8, 1, 2, 0, IMPL128 | IMPL(TCG_TARGET_HAS_add_i16x8))
+DEF(add_i32x4, 1, 2, 0, IMPL128 | IMPL(TCG_TARGET_HAS_add_i32x4))
+DEF(add_i64x2, 1, 2, 0, IMPL128 | IMPL(TCG_TARGET_HAS_add_i64x2))
+
+/* 64-bit vector arith */
+DEF(add_i8x8, 1, 2, 0, IMPLV64 | IMPL(TCG_TARGET_HAS_add_i8x8))
+DEF(add_i16x4, 1, 2, 0, IMPLV64 | IMPL(TCG_TARGET_HAS_add_i16x4))
+DEF(add_i32x2, 1, 2, 0, IMPLV64 | IMPL(TCG_TARGET_HAS_add_i32x2))
+DEF(add_i64x1, 1, 2, 0, IMPLV64 | IMPL(TCG_TARGET_HAS_add_i64x1))
+
 /* QEMU specific */
 DEF(insn_start, 0, 0, TLADDR_ARGS * TARGET_INSN_START_WORDS,
     TCG_OPF_NOT_PRESENT)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index a8df040..a23f739 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -712,6 +712,18 @@ TCGv_v128 tcg_temp_new_internal_v128(int temp_local)
     return MAKE_TCGV_V128(idx);
 }
 
+int tcg_temp_half_internal(int arg, TCGType type, int is_high)
+{
+    const TCGTemp *ts = &tcg_ctx.temps[arg];
+    tcg_debug_assert(ts->type != ts->base_type);
+    tcg_debug_assert(tcg_type_size(type) > tcg_type_size(ts->type));
+    tcg_debug_assert(tcg_type_size(type) <= tcg_type_size(ts->base_type));
+    if (is_high) {
+        arg += tcg_type_size(type) / tcg_type_size(ts->type) / 2;
+    }
+    return arg;
+}
+
 static void tcg_temp_free_internal(int idx)
 {
     TCGContext *s = &tcg_ctx;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 01299cc..fd43f15 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -156,6 +156,34 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_rem_i64          0
 #endif
 
+/* 64-bit vector */
+#ifndef TCG_TARGET_HAS_add_i8x8
+#define TCG_TARGET_HAS_add_i8x8         0
+#endif
+#ifndef TCG_TARGET_HAS_add_i16x4
+#define TCG_TARGET_HAS_add_i16x4        0
+#endif
+#ifndef TCG_TARGET_HAS_add_i32x2
+#define TCG_TARGET_HAS_add_i32x2        0
+#endif
+#ifndef TCG_TARGET_HAS_add_i64x1
+#define TCG_TARGET_HAS_add_i64x1        0
+#endif
+
+/* 128-bit vector */
+#ifndef TCG_TARGET_HAS_add_i8x16
+#define TCG_TARGET_HAS_add_i8x16        0
+#endif
+#ifndef TCG_TARGET_HAS_add_i16x8
+#define TCG_TARGET_HAS_add_i16x8        0
+#endif
+#ifndef TCG_TARGET_HAS_add_i32x4
+#define TCG_TARGET_HAS_add_i32x4        0
+#endif
+#ifndef TCG_TARGET_HAS_add_i64x2
+#define TCG_TARGET_HAS_add_i64x2        0
+#endif
+
 /* For 32-bit targets, some sort of unsigned widening multiply is required.  */
 #if TCG_TARGET_REG_BITS == 32 \
     && !(defined(TCG_TARGET_HAS_mulu2_i32) \
@@ -761,6 +789,7 @@ struct TCGContext {
     void *code_gen_buffer;
     size_t code_gen_buffer_size;
     void *code_gen_ptr;
+    uint8_t v128_swap[16 * 3];
 
     /* Threshold to flush the translated code buffer.  */
     void *code_gen_highwater;
@@ -938,6 +967,20 @@ static inline TCGv_v128 tcg_temp_local_new_v128(void)
     return tcg_temp_new_internal_v128(1);
 }
 
+int tcg_temp_half_internal(int arg, TCGType type, int is_high);
+
+static inline TCGv_v64 tcg_temp_low_half_v128(TCGv_v128 arg)
+{
+    int idx = tcg_temp_half_internal(GET_TCGV_V128(arg), TCG_TYPE_V128, 0);
+    return MAKE_TCGV_V64(idx);
+}
+
+static inline TCGv_v64 tcg_temp_high_half_v128(TCGv_v128 arg)
+{
+    int idx = tcg_temp_half_internal(GET_TCGV_V128(arg), TCG_TYPE_V128, 1);
+    return MAKE_TCGV_V64(idx);
+}
+
 #if defined(CONFIG_DEBUG_TCG)
 /* If you call tcg_clear_temp_count() at the start of a section of
  * code which is not supposed to leak any TCG temporaries, then
-- 
2.1.4


* [Qemu-devel] [PATCH v2.1 09/21] target/arm: support access to vector guest registers as globals
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (7 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 08/21] tcg: add vector addition operations Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 10/21] target/arm: use vector opcode to handle vadd.<size> instruction Kirill Batuzov
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

To support vector guest registers as globals we need to do two things:

1) create corresponding globals,
2) mark which globals can overlap (see the illustration in the notes below).

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---

For the vector registers I used the same coding style as for the scalar
registers. Should I change the brace placement for them all?
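
To make the overlap marking concrete: for the first Q register the detection
pass effectively attaches arrays like these (the temp indices are invented for
the illustration):

    /* hypothetical temp indices: q0 = 18, d0 = 34, d1 = 35 */
    static const TCGArg q0_parts[] = { 34, 35, (TCGArg)-1 };
    static const TCGArg d0_over[]  = { 18, (TCGArg)-1 };
    static const TCGArg d1_over[]  = { 18, (TCGArg)-1 };

    tcg_temp_set_sub_temps(18, q0_parts);    /* d0 and d1 are parts of q0 */
    tcg_temp_set_overlap_temps(34, d0_over); /* q0 overlaps d0 but is wider */
    tcg_temp_set_overlap_temps(35, d1_over); /* likewise for d1 */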

---
 target/arm/translate.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 493c627..d7578e2 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -65,6 +65,8 @@ static TCGv_i32 cpu_R[16];
 TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
 TCGv_i64 cpu_exclusive_addr;
 TCGv_i64 cpu_exclusive_val;
+static TCGv_v128 cpu_Q[16];
+static TCGv_v64 cpu_D[32];
 
 /* FIXME:  These should be removed.  */
 static TCGv_i32 cpu_F0s, cpu_F1s;
@@ -72,10 +74,20 @@ static TCGv_i64 cpu_F0d, cpu_F1d;
 
 #include "exec/gen-icount.h"
 
-static const char *regnames[] =
+static const char *regnames_r[] =
     { "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
       "r8", "r9", "r10", "r11", "r12", "r13", "r14", "pc" };
 
+static const char *regnames_q[] =
+    { "q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7",
+      "q8", "q9", "q10", "q11", "q12", "q13", "q14", "q15" };
+
+static const char *regnames_d[] =
+    { "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7",
+      "d8", "d9", "d10", "d11", "d12", "d13", "d14", "d15",
+      "d16", "d17", "d18", "d19", "d20", "d21", "d22", "d23",
+      "d24", "d25", "d26", "d27", "d28", "d29", "d30", "d31" };
+
 /* initialize TCG globals.  */
 void arm_translate_init(void)
 {
@@ -87,8 +99,22 @@ void arm_translate_init(void)
     for (i = 0; i < 16; i++) {
         cpu_R[i] = tcg_global_mem_new_i32(cpu_env,
                                           offsetof(CPUARMState, regs[i]),
-                                          regnames[i]);
+                                          regnames_r[i]);
+    }
+    for (i = 0; i < 16; i++) {
+        cpu_Q[i] = tcg_global_mem_new_v128(cpu_env,
+                                           offsetof(CPUARMState,
+                                                    vfp.regs[2 * i]),
+                                           regnames_q[i]);
     }
+    for (i = 0; i < 32; i++) {
+        cpu_D[i] = tcg_global_mem_new_v64(cpu_env,
+                                          offsetof(CPUARMState, vfp.regs[i]),
+                                          regnames_d[i]);
+    }
+
+    tcg_detect_overlapping_temps(&tcg_ctx);
+
     cpu_CF = tcg_global_mem_new_i32(cpu_env, offsetof(CPUARMState, CF), "CF");
     cpu_NF = tcg_global_mem_new_i32(cpu_env, offsetof(CPUARMState, NF), "NF");
     cpu_VF = tcg_global_mem_new_i32(cpu_env, offsetof(CPUARMState, VF), "VF");
-- 
2.1.4


* [Qemu-devel] [PATCH v2.1 10/21] target/arm: use vector opcode to handle vadd.<size> instruction
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (8 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 09/21] target/arm: support access to vector guest registers as globals Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-09 13:19   ` Philippe Mathieu-Daudé
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 11/21] tcg/i386: add support for vector opcodes Kirill Batuzov
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 target/arm/translate.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index d7578e2..90e14df 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5628,6 +5628,37 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 1;
         }
 
+        /* Use vector ops to handle what we can */
+        switch (op) {
+        case NEON_3R_VADD_VSUB:
+            if (!u) {
+                void (* const gen_add_v128[])(TCGv_v128, TCGv_v128,
+                                             TCGv_v128) = {
+                    tcg_gen_add_i8x16,
+                    tcg_gen_add_i16x8,
+                    tcg_gen_add_i32x4,
+                    tcg_gen_add_i64x2
+                };
+                void (* const gen_add_v64[])(TCGv_v64, TCGv_v64,
+                                             TCGv_v64) = {
+                    tcg_gen_add_i8x8,
+                    tcg_gen_add_i16x4,
+                    tcg_gen_add_i32x2,
+                    tcg_gen_add_i64x1
+                };
+                if (q) {
+                    gen_add_v128[size](cpu_Q[rd >> 1], cpu_Q[rn >> 1],
+                                       cpu_Q[rm >> 1]);
+                } else {
+                    gen_add_v64[size](cpu_D[rd], cpu_D[rn], cpu_D[rm]);
+                }
+                return 0;
+            }
+            break;
+        default:
+            break;
+        }
+
         for (pass = 0; pass < (q ? 4 : 2); pass++) {
 
         if (pairwise) {
-- 
2.1.4


* [Qemu-devel] [PATCH v2.1 11/21] tcg/i386: add support for vector opcodes
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (9 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 10/21] target/arm: use vector opcode to handle vadd.<size> instruction Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 12/21] tcg/i386: support 64-bit vector operations Kirill Batuzov
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

To be able to generate vector operations in a TCG backend we need to do
several things.

1. We need to tell the register allocator about the target's vector registers.
   In the case of x86 we'll use xmm0..xmm7. xmm7 is designated as a scratch
   register; the others can be used by the register allocator.

2. We need a new constraint to indicate where to use vector registers. In
   this commit the 'V' constraint is introduced (decoded in the sketch after
   this list).

3. We need to be able to generate the bare minimum: loads, stores and
   reg-to-reg moves. MOVDQU is used for loads and stores. MOVDQA is used for
   reg-to-reg moves.

4. Finally, we need to support any other opcodes we want. INDEX_op_add_i32x4
   is the only one for now; the PADDD instruction handles it perfectly.
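
For clarity: with TCG_TARGET_NB_REGS raised to 32, the xmm registers get
allocator indices 16..31, so the register set behind the new 'V' constraint
is just the byte covering indices 16..23:

    case 'V':
        ct->ct |= TCG_CT_REG;
        /* bits 16..23 of the regset == TCG_REG_XMM0 .. TCG_REG_XMM7 */
        tcg_regset_set32(ct->u.regs, 0, 0xff0000);
        break;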

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/i386/tcg-target.h     |  34 +++++++++++++-
 tcg/i386/tcg-target.inc.c | 111 +++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 137 insertions(+), 8 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 21d96ec..b0704e8 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -29,8 +29,16 @@
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 31
 
 #ifdef __x86_64__
-# define TCG_TARGET_REG_BITS  64
-# define TCG_TARGET_NB_REGS   16
+# if defined(TARGET_WORDS_BIGENDIAN) == defined(HOST_WORDS_BIGENDIAN)
+#  define TCG_TARGET_HAS_REG128 1
+# endif
+# ifdef TCG_TARGET_HAS_REG128
+#  define TCG_TARGET_REG_BITS  64
+#  define TCG_TARGET_NB_REGS   32
+# else
+#  define TCG_TARGET_REG_BITS  64
+#  define TCG_TARGET_NB_REGS   16
+# endif
 #else
 # define TCG_TARGET_REG_BITS  32
 # define TCG_TARGET_NB_REGS    8
@@ -56,6 +64,24 @@ typedef enum {
     TCG_REG_R13,
     TCG_REG_R14,
     TCG_REG_R15,
+
+    TCG_REG_XMM0,
+    TCG_REG_XMM1,
+    TCG_REG_XMM2,
+    TCG_REG_XMM3,
+    TCG_REG_XMM4,
+    TCG_REG_XMM5,
+    TCG_REG_XMM6,
+    TCG_REG_XMM7,
+    TCG_REG_XMM8,
+    TCG_REG_XMM9,
+    TCG_REG_XMM10,
+    TCG_REG_XMM11,
+    TCG_REG_XMM12,
+    TCG_REG_XMM13,
+    TCG_REG_XMM14,
+    TCG_REG_XMM15,
+
     TCG_REG_RAX = TCG_REG_EAX,
     TCG_REG_RCX = TCG_REG_ECX,
     TCG_REG_RDX = TCG_REG_EDX,
@@ -144,6 +170,10 @@ extern bool have_popcnt;
 #define TCG_TARGET_HAS_mulsh_i64        0
 #endif
 
+#ifdef TCG_TARGET_HAS_REG128
+#define TCG_TARGET_HAS_add_i32x4        1
+#endif
+
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
      ((ofs) == 0 && (len) == 16))
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 5918008..3e718f3 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -32,6 +32,11 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 #else
     "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi",
 #endif
+#ifdef TCG_TARGET_HAS_REG128
+    "%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7",
+    "%xmm8", "%xmm9", "%xmm10", "%xmm11", "%xmm12", "%xmm13", "%xmm14",
+    "%xmm15",
+#endif
 };
 #endif
 
@@ -61,6 +66,24 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_EDX,
     TCG_REG_EAX,
 #endif
+#ifdef TCG_TARGET_HAS_REG128
+    TCG_REG_XMM0,
+    TCG_REG_XMM1,
+    TCG_REG_XMM2,
+    TCG_REG_XMM3,
+    TCG_REG_XMM4,
+    TCG_REG_XMM5,
+    TCG_REG_XMM6,
+/*  TCG_REG_XMM7, <- scratch register */
+    TCG_REG_XMM8,
+    TCG_REG_XMM9,
+    TCG_REG_XMM10,
+    TCG_REG_XMM11,
+    TCG_REG_XMM12,
+    TCG_REG_XMM13,
+    TCG_REG_XMM14,
+    TCG_REG_XMM15,
+#endif
 };
 
 static const int tcg_target_call_iarg_regs[] = {
@@ -247,6 +270,10 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
     case 'I':
         ct->ct |= (type == TCG_TYPE_I32 ? TCG_CT_CONST : TCG_CT_CONST_I32);
         break;
+    case 'V':
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, 0xff0000);
+        break;
 
     default:
         return NULL;
@@ -302,6 +329,9 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define P_SIMDF3        0x10000         /* 0xf3 opcode prefix */
 #define P_SIMDF2        0x20000         /* 0xf2 opcode prefix */
 
+#define P_SSE_660F      (P_DATA16 | P_EXT)
+#define P_SSE_F30F      (P_SIMDF3 | P_EXT)
+
 #define OPC_ARITH_EvIz	(0x81)
 #define OPC_ARITH_EvIb	(0x83)
 #define OPC_ARITH_GvEv	(0x03)		/* ... plus (ARITH_FOO << 3) */
@@ -357,6 +387,11 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_GRP3_Ev	(0xf7)
 #define OPC_GRP5	(0xff)
 
+#define OPC_MOVDQU_M2R  (0x6f | P_SSE_F30F)  /* load 128-bit value */
+#define OPC_MOVDQU_R2M  (0x7f | P_SSE_F30F)  /* store 128-bit value */
+#define OPC_MOVDQA_R2R  (0x6f | P_SSE_660F)  /* reg-to-reg 128-bit mov */
+#define OPC_PADDD       (0xfe | P_SSE_660F)
+
 /* Group 1 opcode extensions for 0x80-0x83.
    These are also used as modifiers for OPC_ARITH.  */
 #define ARITH_ADD 0
@@ -434,6 +469,9 @@ static void tcg_out_opc(TCGContext *s, int opc, int r, int rm, int x)
         tcg_debug_assert((opc & P_REXW) == 0);
         tcg_out8(s, 0x66);
     }
+    if (opc & P_SIMDF3) {
+        tcg_out8(s, 0xf3);
+    }
     if (opc & P_ADDR32) {
         tcg_out8(s, 0x67);
     }
@@ -650,9 +688,26 @@ static inline void tgen_arithr(TCGContext *s, int subop, int dest, int src)
 static inline void tcg_out_mov(TCGContext *s, TCGType type,
                                TCGReg ret, TCGReg arg)
 {
+    int opc;
     if (arg != ret) {
-        int opc = OPC_MOVL_GvEv + (type == TCG_TYPE_I64 ? P_REXW : 0);
-        tcg_out_modrm(s, opc, ret, arg);
+        switch (type) {
+        case TCG_TYPE_V128:
+            ret -= TCG_REG_XMM0;
+            arg -= TCG_REG_XMM0;
+            if (have_avx) {
+                tcg_out_vex_modrm(s, OPC_MOVDQA_R2R, ret, 15, arg);
+            } else {
+                tcg_out_modrm(s, OPC_MOVDQA_R2R, ret, arg);
+            }
+            break;
+        case TCG_TYPE_I32:
+        case TCG_TYPE_I64:
+            opc = OPC_MOVL_GvEv + (type == TCG_TYPE_I64 ? P_REXW : 0);
+            tcg_out_modrm(s, opc, ret, arg);
+            break;
+        default:
+            g_assert_not_reached();
+        }
     }
 }
 
@@ -727,15 +782,39 @@ static inline void tcg_out_pop(TCGContext *s, int reg)
 static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
                               TCGReg arg1, intptr_t arg2)
 {
-    int opc = OPC_MOVL_GvEv + (type == TCG_TYPE_I64 ? P_REXW : 0);
-    tcg_out_modrm_offset(s, opc, ret, arg1, arg2);
+    int opc;
+    switch (type) {
+    case TCG_TYPE_V128:
+        ret -= TCG_REG_XMM0;
+        tcg_out_modrm_offset(s, OPC_MOVDQU_M2R, ret, arg1, arg2);
+        break;
+    case TCG_TYPE_I32:
+    case TCG_TYPE_I64:
+        opc = OPC_MOVL_GvEv + (type == TCG_TYPE_I64 ? P_REXW : 0);
+        tcg_out_modrm_offset(s, opc, ret, arg1, arg2);
+        break;
+    default:
+        g_assert_not_reached();
+    }
 }
 
 static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
                               TCGReg arg1, intptr_t arg2)
 {
-    int opc = OPC_MOVL_EvGv + (type == TCG_TYPE_I64 ? P_REXW : 0);
-    tcg_out_modrm_offset(s, opc, arg, arg1, arg2);
+    int opc;
+    switch (type) {
+    case TCG_TYPE_V128:
+        arg -= TCG_REG_XMM0;
+        tcg_out_modrm_offset(s, OPC_MOVDQU_R2M, arg, arg1, arg2);
+        break;
+    case TCG_TYPE_I32:
+    case TCG_TYPE_I64:
+        opc = OPC_MOVL_EvGv + (type == TCG_TYPE_I64 ? P_REXW : 0);
+        tcg_out_modrm_offset(s, opc, arg, arg1, arg2);
+        break;
+    default:
+        g_assert_not_reached();
+    }
 }
 
 static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
@@ -1929,6 +2008,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ld_i32:
         tcg_out_ld(s, TCG_TYPE_I32, a0, a1, a2);
         break;
+    case INDEX_op_ld_v128:
+        tcg_out_ld(s, TCG_TYPE_V128, args[0], args[1], args[2]);
+        break;
 
     OP_32_64(st8):
         if (const_args[0]) {
@@ -1957,6 +2039,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             tcg_out_st(s, TCG_TYPE_I32, a0, a1, a2);
         }
         break;
+    case INDEX_op_st_v128:
+        tcg_out_st(s, TCG_TYPE_V128, args[0], args[1], args[2]);
+        break;
 
     OP_32_64(add):
         /* For 3-operand addition, use LEA.  */
@@ -2263,6 +2348,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_mb:
         tcg_out_mb(s, a0);
         break;
+
+    case INDEX_op_add_i32x4:
+        tcg_out_modrm(s, OPC_PADDD, args[0], args[2]);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2297,6 +2387,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "r", "r", "L", "L" } };
     static const TCGTargetOpDef L_L_L_L
         = { .args_ct_str = { "L", "L", "L", "L" } };
+    static const TCGTargetOpDef V_r = { .args_ct_str  = { "V", "r" } };
+    static const TCGTargetOpDef V_0_V = { .args_ct_str  = { "V", "0", "V" } };
 
     switch (op) {
     case INDEX_op_ld8u_i32:
@@ -2313,6 +2405,10 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_ld_i64:
         return &r_r;
 
+    case INDEX_op_ld_v128:
+    case INDEX_op_st_v128:
+        return &V_r;
+
     case INDEX_op_st8_i32:
     case INDEX_op_st8_i64:
         return &qi_r;
@@ -2495,6 +2591,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
             return &s2;
         }
 
+    case INDEX_op_add_i32x4:
+        return &V_0_V;
+
     default:
         break;
     }
-- 
2.1.4


* [Qemu-devel] [PATCH v2.1 12/21] tcg/i386: support 64-bit vector operations
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (10 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 11/21] tcg/i386: add support for vector opcodes Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 13/21] tcg/i386: support remaining vector addition operations Kirill Batuzov
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/i386/tcg-target.h     |  1 +
 tcg/i386/tcg-target.inc.c | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index b0704e8..755ebaa 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -31,6 +31,7 @@
 #ifdef __x86_64__
 # if defined(TARGET_WORDS_BIGENDIAN) == defined(HOST_WORDS_BIGENDIAN)
 #  define TCG_TARGET_HAS_REG128 1
+#  define TCG_TARGET_HAS_REGV64 1
 # endif
 # ifdef TCG_TARGET_HAS_REG128
 #  define TCG_TARGET_REG_BITS  64
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 3e718f3..208bb81 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -390,6 +390,9 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_MOVDQU_M2R  (0x6f | P_SSE_F30F)  /* store 128-bit value */
 #define OPC_MOVDQU_R2M  (0x7f | P_SSE_F30F)  /* load 128-bit value */
 #define OPC_MOVDQA_R2R  (0x6f | P_SSE_660F)  /* reg-to-reg 128-bit mov */
+#define OPC_MOVQ_M2R    (0x7e | P_SSE_F30F)
+#define OPC_MOVQ_R2M    (0xd6 | P_SSE_660F)
+#define OPC_MOVQ_R2R    (0x7e | P_SSE_F30F)
 #define OPC_PADDD       (0xfe | P_SSE_660F)
 
 /* Group 1 opcode extensions for 0x80-0x83.
@@ -700,6 +703,15 @@ static inline void tcg_out_mov(TCGContext *s, TCGType type,
                 tcg_out_modrm(s, OPC_MOVDQA_R2R, ret, arg);
             }
             break;
+        case TCG_TYPE_V64:
+            ret -= TCG_REG_XMM0;
+            arg -= TCG_REG_XMM0;
+            if (have_avx) {
+                tcg_out_vex_modrm(s, OPC_MOVQ_R2R, ret, 15, arg);
+            } else {
+                tcg_out_modrm(s, OPC_MOVQ_R2R, ret, arg);
+            }
+            break;
         case TCG_TYPE_I32:
         case TCG_TYPE_I64:
             opc = OPC_MOVL_GvEv + (type == TCG_TYPE_I64 ? P_REXW : 0);
@@ -788,6 +800,10 @@ static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
         ret -= TCG_REG_XMM0;
         tcg_out_modrm_offset(s, OPC_MOVDQU_M2R, ret, arg1, arg2);
         break;
+    case TCG_TYPE_V64:
+        ret -= TCG_REG_XMM0;
+        tcg_out_modrm_offset(s, OPC_MOVQ_M2R, ret, arg1, arg2);
+        break;
     case TCG_TYPE_I32:
     case TCG_TYPE_I64:
         opc = OPC_MOVL_GvEv + (type == TCG_TYPE_I64 ? P_REXW : 0);
@@ -807,6 +823,10 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
         arg -= TCG_REG_XMM0;
         tcg_out_modrm_offset(s, OPC_MOVDQU_R2M, arg, arg1, arg2);
         break;
+    case TCG_TYPE_V64:
+        arg -= TCG_REG_XMM0;
+        tcg_out_modrm_offset(s, OPC_MOVQ_R2M, arg, arg1, arg2);
+        break;
     case TCG_TYPE_I32:
     case TCG_TYPE_I64:
         opc = OPC_MOVL_EvGv + (type == TCG_TYPE_I64 ? P_REXW : 0);
@@ -2407,6 +2427,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_ld_v128:
     case INDEX_op_st_v128:
+    case INDEX_op_ld_v64:
+    case INDEX_op_st_v64:
         return &V_r;
 
     case INDEX_op_st8_i32:
-- 
2.1.4


* [Qemu-devel] [PATCH v2.1 13/21] tcg/i386: support remaining vector addition operations
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (11 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 12/21] tcg/i386: support 64-bit vector operations Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
       [not found]   ` <2089cbe3-0e9b-fae2-0e35-224f2765dc28@amsat.org>
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 14/21] tcg: do not rely on exact values of MO_BSWAP or MO_SIGN in backend Kirill Batuzov
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---

I believe the checkpatch warning here to be a false positive.

---
 tcg/i386/tcg-target.h     | 10 +++++++++
 tcg/i386/tcg-target.inc.c | 54 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 755ebaa..bd6cfe1 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -172,7 +172,17 @@ extern bool have_popcnt;
 #endif
 
 #ifdef TCG_TARGET_HAS_REG128
+#define TCG_TARGET_HAS_add_i8x16        1
+#define TCG_TARGET_HAS_add_i16x8        1
 #define TCG_TARGET_HAS_add_i32x4        1
+#define TCG_TARGET_HAS_add_i64x2        1
+#endif
+
+#ifdef TCG_TARGET_HAS_REGV64
+#define TCG_TARGET_HAS_add_i8x8         1
+#define TCG_TARGET_HAS_add_i16x4        1
+#define TCG_TARGET_HAS_add_i32x2        1
+#define TCG_TARGET_HAS_add_i64x1        1
 #endif
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 208bb81..d8f0d81 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -168,6 +168,11 @@ static bool have_lzcnt;
 #else
 # define have_lzcnt 0
 #endif
+#if defined(CONFIG_CPUID_H) && defined(bit_AVX) && defined(bit_OSXSAVE)
+static bool have_avx;
+#else
+# define have_avx 0
+#endif
 
 static tcg_insn_unit *tb_ret_addr;
 
@@ -393,7 +398,10 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_MOVQ_M2R    (0x7e | P_SSE_F30F)
 #define OPC_MOVQ_R2M    (0xd6 | P_SSE_660F)
 #define OPC_MOVQ_R2R    (0x7e | P_SSE_F30F)
+#define OPC_PADDB       (0xfc | P_SSE_660F)
+#define OPC_PADDW       (0xfd | P_SSE_660F)
 #define OPC_PADDD       (0xfe | P_SSE_660F)
+#define OPC_PADDQ       (0xd4 | P_SSE_660F)
 
 /* Group 1 opcode extensions for 0x80-0x83.
    These are also used as modifiers for OPC_ARITH.  */
@@ -1963,6 +1971,19 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     TCGArg a0, a1, a2;
     int c, const_a2, vexop, rexw = 0;
 
+    static const int vect_binop[] = {
+        [INDEX_op_add_i8x16] = OPC_PADDB,
+        [INDEX_op_add_i16x8] = OPC_PADDW,
+        [INDEX_op_add_i32x4] = OPC_PADDD,
+        [INDEX_op_add_i64x2] = OPC_PADDQ,
+
+        [INDEX_op_add_i8x8]  = OPC_PADDB,
+        [INDEX_op_add_i16x4] = OPC_PADDW,
+        [INDEX_op_add_i32x2] = OPC_PADDD,
+        [INDEX_op_add_i64x1] = OPC_PADDQ,
+    };
+
+
 #if TCG_TARGET_REG_BITS == 64
 # define OP_32_64(x) \
         case glue(glue(INDEX_op_, x), _i64): \
@@ -1972,6 +1993,17 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 # define OP_32_64(x) \
         case glue(glue(INDEX_op_, x), _i32)
 #endif
+#define OP_V128_ALL(x) \
+        case glue(glue(INDEX_op_, x), _i8x16): \
+        case glue(glue(INDEX_op_, x), _i16x8): \
+        case glue(glue(INDEX_op_, x), _i32x4): \
+        case glue(glue(INDEX_op_, x), _i64x2)
+
+#define OP_V64_ALL(x) \
+        case glue(glue(INDEX_op_, x), _i8x8):  \
+        case glue(glue(INDEX_op_, x), _i16x4): \
+        case glue(glue(INDEX_op_, x), _i32x2): \
+        case glue(glue(INDEX_op_, x), _i64x1)
 
     /* Hoist the loads of the most common arguments.  */
     a0 = args[0];
@@ -2369,8 +2401,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_mb(s, a0);
         break;
 
-    case INDEX_op_add_i32x4:
-        tcg_out_modrm(s, OPC_PADDD, args[0], args[2]);
+    OP_V128_ALL(add):
+    OP_V64_ALL(add):
+        if (have_avx) {
+            tcg_out_vex_modrm(s, vect_binop[opc], args[0], args[1], args[2]);
+        } else {
+            tcg_out_modrm(s, vect_binop[opc], args[0], args[2]);
+        }
         break;
 
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
@@ -2383,6 +2420,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     }
 
 #undef OP_32_64
+#undef OP_V128_ALL
+#undef OP_V64_ALL
 }
 
 static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
@@ -2613,7 +2652,14 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
             return &s2;
         }
 
+    case INDEX_op_add_i8x16:
+    case INDEX_op_add_i16x8:
     case INDEX_op_add_i32x4:
+    case INDEX_op_add_i64x2:
+    case INDEX_op_add_i8x8:
+    case INDEX_op_add_i16x4:
+    case INDEX_op_add_i32x2:
+    case INDEX_op_add_i64x1:
         return &V_0_V;
 
     default:
@@ -2728,6 +2774,10 @@ static void tcg_target_init(TCGContext *s)
 #ifdef bit_POPCNT
         have_popcnt = (c & bit_POPCNT) != 0;
 #endif
+#if defined(bit_AVX) && defined(bit_OSXSAVE)
+        have_avx = (c & (bit_AVX | bit_OSXSAVE)) == (bit_AVX | bit_OSXSAVE);
+#endif
+
     }
 
     if (max >= 7) {
-- 
2.1.4


* [Qemu-devel] [PATCH v2.1 14/21] tcg: do not rely on exact values of MO_BSWAP or MO_SIGN in backend
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (12 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 13/21] tcg/i386: support remaining vector addition operations Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-05-05 13:59   ` Alex Bennée
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 15/21] target/aarch64: do not check for non-existent TCGMemOp Kirill Batuzov
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/aarch64/tcg-target.inc.c |  4 ++--
 tcg/arm/tcg-target.inc.c     |  4 ++--
 tcg/i386/tcg-target.inc.c    |  4 ++--
 tcg/mips/tcg-target.inc.c    |  4 ++--
 tcg/ppc/tcg-target.inc.c     |  4 ++--
 tcg/s390/tcg-target.inc.c    |  4 ++--
 tcg/sparc/tcg-target.inc.c   | 12 ++++++------
 tcg/tcg-op.c                 |  4 ++--
 tcg/tcg.h                    |  1 +
 9 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 6d227a5..2b0b548 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1032,7 +1032,7 @@ static void tcg_out_cltz(TCGContext *s, TCGType ext, TCGReg d,
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     TCGMemOpIdx oi, uintptr_t ra)
  */
-static void * const qemu_ld_helpers[16] = {
+static void * const qemu_ld_helpers[] = {
     [MO_UB]   = helper_ret_ldub_mmu,
     [MO_LEUW] = helper_le_lduw_mmu,
     [MO_LEUL] = helper_le_ldul_mmu,
@@ -1046,7 +1046,7 @@ static void * const qemu_ld_helpers[16] = {
  *                                     uintxx_t val, TCGMemOpIdx oi,
  *                                     uintptr_t ra)
  */
-static void * const qemu_st_helpers[16] = {
+static void * const qemu_st_helpers[] = {
     [MO_UB]   = helper_ret_stb_mmu,
     [MO_LEUW] = helper_le_stw_mmu,
     [MO_LEUL] = helper_le_stl_mmu,
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index e75a6d4..f603f02 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1058,7 +1058,7 @@ static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_ld_helpers[16] = {
+static void * const qemu_ld_helpers[] = {
     [MO_UB]   = helper_ret_ldub_mmu,
     [MO_SB]   = helper_ret_ldsb_mmu,
 
@@ -1078,7 +1078,7 @@ static void * const qemu_ld_helpers[16] = {
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
  *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_st_helpers[16] = {
+static void * const qemu_st_helpers[] = {
     [MO_UB]   = helper_ret_stb_mmu,
     [MO_LEUW] = helper_le_stw_mmu,
     [MO_LEUL] = helper_le_stl_mmu,
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index d8f0d81..263c15e 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1334,7 +1334,7 @@ static void tcg_out_nopn(TCGContext *s, int n)
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_ld_helpers[16] = {
+static void * const qemu_ld_helpers[] = {
     [MO_UB]   = helper_ret_ldub_mmu,
     [MO_LEUW] = helper_le_lduw_mmu,
     [MO_LEUL] = helper_le_ldul_mmu,
@@ -1347,7 +1347,7 @@ static void * const qemu_ld_helpers[16] = {
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
  *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_st_helpers[16] = {
+static void * const qemu_st_helpers[] = {
     [MO_UB]   = helper_ret_stb_mmu,
     [MO_LEUW] = helper_le_stw_mmu,
     [MO_LEUL] = helper_le_stl_mmu,
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 01ac7b2..4f2d5d1 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -1108,7 +1108,7 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit *arg)
 }
 
 #if defined(CONFIG_SOFTMMU)
-static void * const qemu_ld_helpers[16] = {
+static void * const qemu_ld_helpers[] = {
     [MO_UB]   = helper_ret_ldub_mmu,
     [MO_SB]   = helper_ret_ldsb_mmu,
     [MO_LEUW] = helper_le_lduw_mmu,
@@ -1125,7 +1125,7 @@ static void * const qemu_ld_helpers[16] = {
 #endif
 };
 
-static void * const qemu_st_helpers[16] = {
+static void * const qemu_st_helpers[] = {
     [MO_UB]   = helper_ret_stb_mmu,
     [MO_LEUW] = helper_le_stw_mmu,
     [MO_LEUL] = helper_le_stl_mmu,
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 64f67d2..680050b 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -1419,7 +1419,7 @@ static const uint32_t qemu_exts_opc[4] = {
 /* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
  *                                 int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_ld_helpers[16] = {
+static void * const qemu_ld_helpers[] = {
     [MO_UB]   = helper_ret_ldub_mmu,
     [MO_LEUW] = helper_le_lduw_mmu,
     [MO_LEUL] = helper_le_ldul_mmu,
@@ -1432,7 +1432,7 @@ static void * const qemu_ld_helpers[16] = {
 /* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
  *                                 uintxx_t val, int mmu_idx, uintptr_t ra)
  */
-static void * const qemu_st_helpers[16] = {
+static void * const qemu_st_helpers[] = {
     [MO_UB]   = helper_ret_stb_mmu,
     [MO_LEUW] = helper_le_stw_mmu,
     [MO_LEUL] = helper_le_stl_mmu,
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index a679280..ec3491a 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -309,7 +309,7 @@ static const uint8_t tcg_cond_to_ltr_cond[] = {
 };
 
 #ifdef CONFIG_SOFTMMU
-static void * const qemu_ld_helpers[16] = {
+static void * const qemu_ld_helpers[] = {
     [MO_UB]   = helper_ret_ldub_mmu,
     [MO_SB]   = helper_ret_ldsb_mmu,
     [MO_LEUW] = helper_le_lduw_mmu,
@@ -324,7 +324,7 @@ static void * const qemu_ld_helpers[16] = {
     [MO_BEQ]  = helper_be_ldq_mmu,
 };
 
-static void * const qemu_st_helpers[16] = {
+static void * const qemu_st_helpers[] = {
     [MO_UB]   = helper_ret_stb_mmu,
     [MO_LEUW] = helper_le_stw_mmu,
     [MO_LEUL] = helper_le_stl_mmu,
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index d1f4c0d..1b115d2 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -840,12 +840,12 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
 }
 
 #ifdef CONFIG_SOFTMMU
-static tcg_insn_unit *qemu_ld_trampoline[16];
-static tcg_insn_unit *qemu_st_trampoline[16];
+static tcg_insn_unit *qemu_ld_trampoline[MO_ALL];
+static tcg_insn_unit *qemu_st_trampoline[MO_ALL];
 
 static void build_trampolines(TCGContext *s)
 {
-    static void * const qemu_ld_helpers[16] = {
+    static void * const qemu_ld_helpers[MO_ALL] = {
         [MO_UB]   = helper_ret_ldub_mmu,
         [MO_SB]   = helper_ret_ldsb_mmu,
         [MO_LEUW] = helper_le_lduw_mmu,
@@ -857,7 +857,7 @@ static void build_trampolines(TCGContext *s)
         [MO_BEUL] = helper_be_ldul_mmu,
         [MO_BEQ]  = helper_be_ldq_mmu,
     };
-    static void * const qemu_st_helpers[16] = {
+    static void * const qemu_st_helpers[MO_ALL] = {
         [MO_UB]   = helper_ret_stb_mmu,
         [MO_LEUW] = helper_le_stw_mmu,
         [MO_LEUL] = helper_le_stl_mmu,
@@ -870,7 +870,7 @@ static void build_trampolines(TCGContext *s)
     int i;
     TCGReg ra;
 
-    for (i = 0; i < 16; ++i) {
+    for (i = 0; i < MO_ALL; ++i) {
         if (qemu_ld_helpers[i] == NULL) {
             continue;
         }
@@ -898,7 +898,7 @@ static void build_trampolines(TCGContext *s)
         tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_O7, ra);
     }
 
-    for (i = 0; i < 16; ++i) {
+    for (i = 0; i < MO_ALL; ++i) {
         if (qemu_st_helpers[i] == NULL) {
             continue;
         }
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 8a19eee..0dfe611 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2767,7 +2767,7 @@ typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv, TCGv_i64);
 # define WITH_ATOMIC64(X)
 #endif
 
-static void * const table_cmpxchg[16] = {
+static void * const table_cmpxchg[] = {
     [MO_8] = gen_helper_atomic_cmpxchgb,
     [MO_16 | MO_LE] = gen_helper_atomic_cmpxchgw_le,
     [MO_16 | MO_BE] = gen_helper_atomic_cmpxchgw_be,
@@ -2985,7 +2985,7 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
 }
 
 #define GEN_ATOMIC_HELPER(NAME, OP, NEW)                                \
-static void * const table_##NAME[16] = {                                \
+static void * const table_##NAME[] = {                                  \
     [MO_8] = gen_helper_atomic_##NAME##b,                               \
     [MO_16 | MO_LE] = gen_helper_atomic_##NAME##w_le,                   \
     [MO_16 | MO_BE] = gen_helper_atomic_##NAME##w_be,                   \
diff --git a/tcg/tcg.h b/tcg/tcg.h
index fd43f15..5e0c6da 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -386,6 +386,7 @@ typedef enum TCGMemOp {
     MO_TEQ   = MO_TE | MO_Q,
 
     MO_SSIZE = MO_SIZE | MO_SIGN,
+    MO_ALL   = MO_SIZE | MO_SIGN | MO_BSWAP | MO_AMASK,
 } TCGMemOp;
 
 /**
-- 
2.1.4


* [Qemu-devel] [PATCH v2.1 15/21] target/aarch64: do not check for non-existent TCGMemOp
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (13 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 14/21] tcg: do not rely on exact values of MO_BSWAP or MO_SIGN in backend Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 16/21] tcg: introduce new TCGMemOp - MO_128 Kirill Batuzov
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

MO_64|MO_SIGN is not a valid TCGMemOp. This code compiles only because, by
coincidence, this value equals the MO_SSIZE mask defined in the same enum.
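
Concretely, with the values currently in tcg/tcg.h:

    /* MO_64 == 3, MO_SIZE == 3, MO_SIGN == 4, therefore: */
    MO_64 | MO_SIGN   /* == 7 == MO_SSIZE, the size+sign mask */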

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---

A bugfix only indirectly related to this series; other changes in the series
exposed the problem.

---
 target/arm/translate-a64.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index d0352e2..8a1f70e 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -990,7 +990,6 @@ static void read_vec_element(DisasContext *s, TCGv_i64 tcg_dest, int srcidx,
         tcg_gen_ld32s_i64(tcg_dest, cpu_env, vect_off);
         break;
     case MO_64:
-    case MO_64|MO_SIGN:
         tcg_gen_ld_i64(tcg_dest, cpu_env, vect_off);
         break;
     default:
-- 
2.1.4


* [Qemu-devel] [PATCH v2.1 16/21] tcg: introduce new TCGMemOp - MO_128
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (14 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 15/21] target/aarch64: do not check for non-existent TCGMemOp Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 17/21] tcg: introduce qemu_ld_v128 and qemu_st_v128 opcodes Kirill Batuzov
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/tcg.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 5e0c6da..63a83f9 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -306,11 +306,12 @@ typedef enum TCGMemOp {
     MO_16    = 1,
     MO_32    = 2,
     MO_64    = 3,
-    MO_SIZE  = 3,   /* Mask for the above.  */
+    MO_128   = 4,
+    MO_SIZE  = 7,   /* Mask for the above.  */
 
-    MO_SIGN  = 4,   /* Sign-extended, otherwise zero-extended.  */
+    MO_SIGN  = 8,   /* Sign-extended, otherwise zero-extended.  */
 
-    MO_BSWAP = 8,   /* Host reverse endian.  */
+    MO_BSWAP = 16,  /* Host reverse endian.  */
 #ifdef HOST_WORDS_BIGENDIAN
     MO_LE    = MO_BSWAP,
     MO_BE    = 0,
@@ -342,7 +343,7 @@ typedef enum TCGMemOp {
      * - an alignment to a specified size, which may be more or less than
      *   the access size (MO_ALIGN_x where 'x' is a size in bytes);
      */
-    MO_ASHIFT = 4,
+    MO_ASHIFT = 5,
     MO_AMASK = 7 << MO_ASHIFT,
 #ifdef ALIGNED_ONLY
     MO_ALIGN = 0,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v2.1 17/21] tcg: introduce qemu_ld_v128 and qemu_st_v128 opcodes
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (15 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 16/21] tcg: introduce new TCGMemOp - MO_128 Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 18/21] softmmu: create helpers for vector loads Kirill Batuzov
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/i386/tcg-target.inc.c |  5 +++++
 tcg/tcg-op.c              | 24 ++++++++++++++++++++++++
 tcg/tcg-op.h              | 15 +++++++++++++++
 tcg/tcg-opc.h             |  4 ++++
 4 files changed, 48 insertions(+)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 263c15e..1e6edc0 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2448,6 +2448,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "L", "L", "L", "L" } };
     static const TCGTargetOpDef V_r = { .args_ct_str  = { "V", "r" } };
     static const TCGTargetOpDef V_0_V = { .args_ct_str  = { "V", "0", "V" } };
+    static const TCGTargetOpDef V_L = { .args_ct_str  = { "V", "L" } };
 
     switch (op) {
     case INDEX_op_ld8u_i32:
@@ -2662,6 +2663,10 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_add_i64x1:
         return &V_0_V;
 
+    case INDEX_op_qemu_ld_v128:
+    case INDEX_op_qemu_st_v128:
+        return &V_L;
+
     default:
         break;
     }
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 0dfe611..db74017 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -3102,3 +3102,27 @@ void tcg_v64_to_ptr(TCGv_v64 tmp, TCGv_ptr base, int slot,
         }
     }
 }
+
+void tcg_gen_qemu_ld_v128(TCGv_v128 val, TCGv addr, TCGArg idx,
+                          TCGMemOp memop)
+{
+#ifdef TCG_TARGET_HAS_REG128
+    tcg_debug_assert((memop & MO_BSWAP) == MO_TE);
+    TCGMemOpIdx oi = make_memop_idx(memop, idx);
+    tcg_gen_op3si_v128(INDEX_op_qemu_ld_v128, val, addr, oi);
+#else
+    g_assert_not_reached();
+#endif
+}
+
+void tcg_gen_qemu_st_v128(TCGv_v128 val, TCGv addr, TCGArg idx,
+                          TCGMemOp memop)
+{
+#ifdef TCG_TARGET_HAS_REG128
+    tcg_debug_assert((memop & MO_BSWAP) == MO_TE);
+    TCGMemOpIdx oi = make_memop_idx(memop, idx);
+    tcg_gen_op3si_v128(INDEX_op_qemu_st_v128, val, addr, oi);
+#else
+    g_assert_not_reached();
+#endif
+}
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 3727be7..dc1d032 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -266,6 +266,19 @@ static inline void tcg_gen_op3_v128(TCGOpcode opc, TCGv_v128 a1,
                 GET_TCGV_V128(a3));
 }
 
+static inline void tcg_gen_op3si_v128(TCGOpcode opc, TCGv_v128 a1,
+                                      TCGv a2, TCGArg a3)
+{
+#if TARGET_LONG_BITS == 64 && TCG_TARGET_REG_BITS == 32
+    tcg_gen_op4(&tcg_ctx, opc, GET_TCGV_V128(a1), GET_TCGV_I32(TCGV_LOW(a2)),
+                GET_TCGV_I32(TCGV_HIGH(a2)), a3);
+#elif TARGET_LONG_BITS == 32
+    tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_V128(a1), GET_TCGV_I32(a2), a3);
+#else
+    tcg_gen_op3(&tcg_ctx, opc, GET_TCGV_V128(a1), GET_TCGV_I64(a2), a3);
+#endif
+}
+
 static inline void tcg_gen_op1_v64(TCGOpcode opc, TCGv_v64 a1)
 {
     tcg_gen_op1(&tcg_ctx, opc, GET_TCGV_V64(a1));
@@ -909,6 +922,8 @@ void tcg_gen_qemu_ld_i32(TCGv_i32, TCGv, TCGArg, TCGMemOp);
 void tcg_gen_qemu_st_i32(TCGv_i32, TCGv, TCGArg, TCGMemOp);
 void tcg_gen_qemu_ld_i64(TCGv_i64, TCGv, TCGArg, TCGMemOp);
 void tcg_gen_qemu_st_i64(TCGv_i64, TCGv, TCGArg, TCGMemOp);
+void tcg_gen_qemu_ld_v128(TCGv_v128, TCGv, TCGArg, TCGMemOp);
+void tcg_gen_qemu_st_v128(TCGv_v128, TCGv, TCGArg, TCGMemOp);
 
 static inline void tcg_gen_qemu_ld8u(TCGv ret, TCGv addr, int mem_index)
 {
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4c8f195..6c2e697 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -232,6 +232,10 @@ DEF(qemu_ld_i64, DATA64_ARGS, TLADDR_ARGS, 1,
     TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS | TCG_OPF_64BIT)
 DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1,
     TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS | TCG_OPF_64BIT)
+DEF(qemu_ld_v128, 1, 1, 1,
+    TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS | IMPL128)
+DEF(qemu_st_v128, 0, 2, 1,
+    TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS | IMPL128)
 
 #undef TLADDR_ARGS
 #undef DATA64_ARGS
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v2.1 18/21] softmmu: create helpers for vector loads
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (16 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 17/21] tcg: introduce qemu_ld_v128 and qemu_st_v128 opcodes Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 19/21] tcg/i386: add support for qemu_ld_v128/qemu_st_v128 ops Kirill Batuzov
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 cputlb.c                  |   4 +
 softmmu_template_vector.h | 266 ++++++++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.h                 |   5 +
 3 files changed, 275 insertions(+)
 create mode 100644 softmmu_template_vector.h

diff --git a/cputlb.c b/cputlb.c
index 6c39927..41c9a01 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -660,6 +660,10 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
 #define DATA_SIZE 8
 #include "softmmu_template.h"
 
+#define SHIFT 4
+#include "softmmu_template_vector.h"
+#undef MMUSUFFIX
+
 /* First set of helpers allows passing in of OI and RETADDR.  This makes
    them callable from other helpers.  */
 
diff --git a/softmmu_template_vector.h b/softmmu_template_vector.h
new file mode 100644
index 0000000..b286d65
--- /dev/null
+++ b/softmmu_template_vector.h
@@ -0,0 +1,266 @@
+/*
+ *  Software MMU support
+ *
+ * Generate helpers used by TCG for qemu_ld/st vector ops and code
+ * load functions.
+ *
+ * Included from target op helpers and exec.c.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "qemu/timer.h"
+#include "exec/address-spaces.h"
+#include "exec/memory.h"
+
+#define DATA_SIZE (1 << SHIFT)
+
+#if DATA_SIZE == 16
+#define SUFFIX v128
+#else
+#error unsupported data size
+#endif
+
+
+#ifdef SOFTMMU_CODE_ACCESS
+#define READ_ACCESS_TYPE MMU_INST_FETCH
+#define ADDR_READ addr_code
+#else
+#define READ_ACCESS_TYPE MMU_DATA_LOAD
+#define ADDR_READ addr_read
+#endif
+
+#define helper_te_ld_name  glue(glue(helper_te_ld, SUFFIX), MMUSUFFIX)
+#define helper_te_st_name  glue(glue(helper_te_st, SUFFIX), MMUSUFFIX)
+
+#ifndef SOFTMMU_CODE_ACCESS
+static inline void glue(io_read, SUFFIX)(CPUArchState *env,
+                                         CPUIOTLBEntry *iotlbentry,
+                                         target_ulong addr,
+                                         uintptr_t retaddr,
+                                         uint8_t *res)
+{
+    CPUState *cpu = ENV_GET_CPU(env);
+    hwaddr physaddr = iotlbentry->addr;
+    MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+    int i;
+
+    assert(0); /* Needs testing */
+
+    physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
+    cpu->mem_io_pc = retaddr;
+    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
+        cpu_io_recompile(cpu, retaddr);
+    }
+
+    cpu->mem_io_vaddr = addr;
+    for (i = 0; i < (1 << SHIFT); i += 8) {
+        memory_region_dispatch_read(mr, physaddr + i, (uint64_t *)(res + i),
+                                    8, iotlbentry->attrs);
+    }
+}
+#endif
+
+void helper_te_ld_name(CPUArchState *env, target_ulong addr,
+                       TCGMemOpIdx oi, uintptr_t retaddr, uint8_t *res)
+{
+    unsigned mmu_idx = get_mmuidx(oi);
+    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
+    uintptr_t haddr;
+    int i;
+
+    /* Adjust the given return address.  */
+    retaddr -= GETPC_ADJ;
+
+    /* If the TLB entry is for a different page, reload and try again.  */
+    if ((addr & TARGET_PAGE_MASK)
+         != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
+        if ((addr & (DATA_SIZE - 1)) != 0
+            && (get_memop(oi) & MO_AMASK) == MO_ALIGN) {
+            cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
+                                 mmu_idx, retaddr);
+        }
+        if (!VICTIM_TLB_HIT(ADDR_READ, addr)) {
+            tlb_fill(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
+                     mmu_idx, retaddr);
+        }
+        tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
+    }
+
+    /* Handle an IO access.  */
+    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
+        CPUIOTLBEntry *iotlbentry;
+        if ((addr & (DATA_SIZE - 1)) != 0) {
+            goto do_unaligned_access;
+        }
+        iotlbentry = &env->iotlb[mmu_idx][index];
+
+        /* ??? Note that the io helpers always read data in the target
+           byte ordering.  We should push the LE/BE request down into io.  */
+        glue(io_read, SUFFIX)(env, iotlbentry, addr, retaddr, res);
+        return;
+    }
+
+    /* Handle slow unaligned access (it spans two pages or IO).  */
+    if (DATA_SIZE > 1
+        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
+                    >= TARGET_PAGE_SIZE)) {
+        target_ulong addr1, addr2;
+        uint8_t res1[DATA_SIZE * 2];
+        unsigned shift;
+    do_unaligned_access:
+        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
+            cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
+                                 mmu_idx, retaddr);
+        }
+        addr1 = addr & ~(DATA_SIZE - 1);
+        addr2 = addr1 + DATA_SIZE;
+        /* Note the adjustment at the beginning of the function.
+           Undo that for the recursion.  */
+        helper_te_ld_name(env, addr1, oi, retaddr + GETPC_ADJ, res1);
+        helper_te_ld_name(env, addr2, oi, retaddr + GETPC_ADJ,
+                          res1 + DATA_SIZE);
+        shift = addr & (DATA_SIZE - 1);
+
+        for (i = 0; i < DATA_SIZE; i++) {
+            res[i] = res1[i + shift];
+        }
+        return;
+    }
+
+    /* Handle aligned access or unaligned access in the same page.  */
+    if ((addr & (DATA_SIZE - 1)) != 0
+        && (get_memop(oi) & MO_AMASK) == MO_ALIGN) {
+        cpu_unaligned_access(ENV_GET_CPU(env), addr, READ_ACCESS_TYPE,
+                             mmu_idx, retaddr);
+    }
+
+    haddr = addr + env->tlb_table[mmu_idx][index].addend;
+    for (i = 0; i < DATA_SIZE; i++) {
+        res[i] = ((uint8_t *)haddr)[i];
+    }
+}
+
+#ifndef SOFTMMU_CODE_ACCESS
+
+static inline void glue(io_write, SUFFIX)(CPUArchState *env,
+                                          CPUIOTLBEntry *iotlbentry,
+                                          uint8_t *val,
+                                          target_ulong addr,
+                                          uintptr_t retaddr)
+{
+    CPUState *cpu = ENV_GET_CPU(env);
+    hwaddr physaddr = iotlbentry->addr;
+    MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+    int i;
+
+    assert(0); /* Needs testing */
+
+    physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
+    if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
+        cpu_io_recompile(cpu, retaddr);
+    }
+
+    cpu->mem_io_vaddr = addr;
+    cpu->mem_io_pc = retaddr;
+    for (i = 0; i < (1 << SHIFT); i += 8) {
+        memory_region_dispatch_write(mr, physaddr + i, *(uint64_t *)(val + i),
+                                     8, iotlbentry->attrs);
+    }
+}
+
+void helper_te_st_name(CPUArchState *env, target_ulong addr, uint8_t *val,
+                       TCGMemOpIdx oi, uintptr_t retaddr)
+{
+    unsigned mmu_idx = get_mmuidx(oi);
+    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+    uintptr_t haddr;
+    int i;
+
+    /* Adjust the given return address.  */
+    retaddr -= GETPC_ADJ;
+
+    /* If the TLB entry is for a different page, reload and try again.  */
+    if ((addr & TARGET_PAGE_MASK)
+        != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
+        if ((addr & (DATA_SIZE - 1)) != 0
+            && (get_memop(oi) & MO_AMASK) == MO_ALIGN) {
+            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
+                                 mmu_idx, retaddr);
+        }
+        if (!VICTIM_TLB_HIT(addr_write, addr)) {
+            tlb_fill(ENV_GET_CPU(env), addr, MMU_DATA_STORE, mmu_idx, retaddr);
+        }
+        tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+    }
+
+    /* Handle an IO access.  */
+    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
+        CPUIOTLBEntry *iotlbentry;
+        if ((addr & (DATA_SIZE - 1)) != 0) {
+            goto do_unaligned_access;
+        }
+        iotlbentry = &env->iotlb[mmu_idx][index];
+
+        /* ??? Note that the io helpers always write data in the target
+           byte ordering.  We should push the LE/BE request down into io.  */
+        glue(io_write, SUFFIX)(env, iotlbentry, val, addr, retaddr);
+        return;
+    }
+
+    /* Handle slow unaligned access (it spans two pages or IO).  */
+    if (DATA_SIZE > 1
+        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
+                     >= TARGET_PAGE_SIZE)) {
+        int i;
+    do_unaligned_access:
+        if ((get_memop(oi) & MO_AMASK) == MO_ALIGN) {
+            cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
+                                 mmu_idx, retaddr);
+        }
+        /* XXX: not efficient, but simple */
+        /* Note: relies on the fact that tlb_fill() does not remove the
+         * previous page from the TLB cache.  */
+        for (i = DATA_SIZE - 1; i >= 0; i--) {
+            /* Note the adjustment at the beginning of the function.
+               Undo that for the recursion.  */
+            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val[i],
+                                            oi, retaddr + GETPC_ADJ);
+        }
+        return;
+    }
+
+    /* Handle aligned access or unaligned access in the same page.  */
+    if ((addr & (DATA_SIZE - 1)) != 0
+        && (get_memop(oi) & MO_AMASK) == MO_ALIGN) {
+        cpu_unaligned_access(ENV_GET_CPU(env), addr, MMU_DATA_STORE,
+                             mmu_idx, retaddr);
+    }
+
+    haddr = addr + env->tlb_table[mmu_idx][index].addend;
+    for (i = 0; i < DATA_SIZE; i++) {
+        ((uint8_t *)haddr)[i] = val[i];
+    }
+}
+
+#endif /* !defined(SOFTMMU_CODE_ACCESS) */
+
+#undef READ_ACCESS_TYPE
+#undef SHIFT
+#undef SUFFIX
+#undef DATA_SIZE
+#undef ADDR_READ
+#undef helper_te_ld_name
+#undef helper_te_st_name
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 63a83f9..8dee5c2 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -1330,6 +1330,11 @@ uint32_t helper_be_ldl_cmmu(CPUArchState *env, target_ulong addr,
 uint64_t helper_be_ldq_cmmu(CPUArchState *env, target_ulong addr,
                             TCGMemOpIdx oi, uintptr_t retaddr);
 
+void helper_te_ldv128_mmu(CPUArchState *env, target_ulong addr,
+                          TCGMemOpIdx oi, uintptr_t retaddr, uint8_t *res);
+void helper_te_stv128_mmu(CPUArchState *env, target_ulong addr, uint8_t *val,
+                          TCGMemOpIdx oi, uintptr_t retaddr);
+
 /* Temporary aliases until backends are converted.  */
 #ifdef TARGET_WORDS_BIGENDIAN
 # define helper_ret_ldsw_mmu  helper_be_ldsw_mmu
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v2.1 19/21] tcg/i386: add support for qemu_ld_v128/qemu_st_v128 ops
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (17 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 18/21] softmmu: create helpers for vector loads Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 20/21] target/arm: load two consecutive 64-bit vector regs as a 128-bit vector reg Kirill Batuzov
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/i386/tcg-target.inc.c | 68 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 61 insertions(+), 7 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 1e6edc0..4647e97 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1342,6 +1342,7 @@ static void * const qemu_ld_helpers[] = {
     [MO_BEUW] = helper_be_lduw_mmu,
     [MO_BEUL] = helper_be_ldul_mmu,
     [MO_BEQ]  = helper_be_ldq_mmu,
+    [MO_128]  = helper_te_ldv128_mmu,
 };
 
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
@@ -1355,6 +1356,7 @@ static void * const qemu_st_helpers[] = {
     [MO_BEUW] = helper_be_stw_mmu,
     [MO_BEUL] = helper_be_stl_mmu,
     [MO_BEQ]  = helper_be_stq_mmu,
+    [MO_128]  = helper_te_stv128_mmu,
 };
 
 /* Perform the TLB load and compare.
@@ -1521,12 +1523,30 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
         ofs += 4;
 
         tcg_out_sti(s, TCG_TYPE_PTR, (uintptr_t)l->raddr, TCG_REG_ESP, ofs);
+
+        if ((opc & MO_SSIZE) == MO_128) {
+            ofs += 4;
+            tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_EAX, TCG_REG_ESP);
+            tcg_out_addi(s, TCG_REG_EAX, TCG_STATIC_CALL_ARGS_SIZE - 16);
+            tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_EAX, TCG_REG_ESP, ofs);
+        }
     } else {
         tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
         /* The second argument is already loaded with addrlo.  */
         tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], oi);
         tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[3],
                      (uintptr_t)l->raddr);
+        if ((opc & MO_SSIZE) == MO_128) {
+            tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_EAX, TCG_REG_ESP);
+            tcg_out_addi(s, TCG_REG_EAX, TCG_STATIC_CALL_ARGS_SIZE - 16);
+            if (ARRAY_SIZE(tcg_target_call_iarg_regs) > 4) {
+                tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[4],
+                            TCG_REG_EAX);
+            } else {
+                tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_EAX,
+                            TCG_REG_ESP, TCG_TARGET_CALL_STACK_OFFSET);
+            }
+        }
     }
 
     tcg_out_call(s, qemu_ld_helpers[opc & (MO_BSWAP | MO_SIZE)]);
@@ -1562,6 +1582,11 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
             tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EDX);
         }
         break;
+    case MO_128:
+        tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_EAX, TCG_REG_ESP);
+        tcg_out_addi(s, TCG_REG_EAX, TCG_STATIC_CALL_ARGS_SIZE - 16);
+        tcg_out_ld(s, TCG_TYPE_V128, l->datalo_reg, TCG_REG_EAX, 0);
+        break;
     default:
         tcg_abort();
     }
@@ -1601,12 +1626,20 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
             ofs += 4;
         }
 
-        tcg_out_st(s, TCG_TYPE_I32, l->datalo_reg, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        if (s_bits == MO_64) {
-            tcg_out_st(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_ESP, ofs);
+        if (s_bits == MO_128) {
+            tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_EAX, TCG_REG_ESP);
+            tcg_out_addi(s, TCG_REG_EAX, TCG_STATIC_CALL_ARGS_SIZE - 16);
+            tcg_out_st(s, TCG_TYPE_V128, l->datalo_reg, TCG_REG_EAX, 0);
+            tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_EAX, TCG_REG_ESP, ofs);
             ofs += 4;
+        } else {
+            tcg_out_st(s, TCG_TYPE_I32, l->datalo_reg, TCG_REG_ESP, ofs);
+            ofs += 4;
+
+            if (s_bits == MO_64) {
+                tcg_out_st(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_ESP, ofs);
+                ofs += 4;
+            }
         }
 
         tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, ofs);
@@ -1618,8 +1651,16 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
     } else {
         tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
         /* The second argument is already loaded with addrlo.  */
-        tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
-                    tcg_target_call_iarg_regs[2], l->datalo_reg);
+        if (s_bits == MO_128) {
+            tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_RAX, TCG_REG_ESP);
+            tcg_out_addi(s, TCG_REG_RAX, TCG_STATIC_CALL_ARGS_SIZE - 16);
+            tcg_out_st(s, TCG_TYPE_V128, l->datalo_reg, TCG_REG_RAX, 0);
+            tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[2],
+                        TCG_REG_RAX);
+        } else {
+            tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
+                        tcg_target_call_iarg_regs[2], l->datalo_reg);
+        }
         tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3], oi);
 
         if (ARRAY_SIZE(tcg_target_call_iarg_regs) > 4) {
@@ -1751,6 +1792,10 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
             }
         }
         break;
+    case MO_128:
+        tcg_out_modrm_sib_offset(s, OPC_MOVDQU_M2R + seg, datalo,
+                                 base, index, 0, ofs);
+        break;
     default:
         tcg_abort();
     }
@@ -1894,6 +1939,9 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
             tcg_out_modrm_offset(s, movop + seg, datahi, base, ofs+4);
         }
         break;
+    case MO_128:
+        tcg_out_modrm_offset(s, OPC_MOVDQU_R2M + seg, datalo, base, ofs);
+        break;
     default:
         tcg_abort();
     }
@@ -2264,12 +2312,18 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_qemu_ld_i64:
         tcg_out_qemu_ld(s, args, 1);
         break;
+    case INDEX_op_qemu_ld_v128:
+        tcg_out_qemu_ld(s, args, 0);
+        break;
     case INDEX_op_qemu_st_i32:
         tcg_out_qemu_st(s, args, 0);
         break;
     case INDEX_op_qemu_st_i64:
         tcg_out_qemu_st(s, args, 1);
         break;
+    case INDEX_op_qemu_st_v128:
+        tcg_out_qemu_st(s, args, 0);
+        break;
 
     OP_32_64(mulu2):
         tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_MUL, args[3]);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v2.1 20/21] target/arm: load two consecutive 64-bit vector regs as a 128-bit vector reg
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (18 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 19/21] tcg/i386: add support for qemu_ld_v128/qemu_st_v128 ops Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 21/21] tcg/README: update README to include information about vector opcodes Kirill Batuzov
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

The ARM instruction set does not have loads to a 128-bit vector register
(q-reg). Instead it can read several consecutive 64-bit vector registers
(d-regs), which is what GCC uses to load 128-bit values from memory.

For vector operations to work we need to detect such loads and transform them
into 128-bit loads to 128-bit temporaries.
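
As an illustration (the mnemonic is only an example; the new code keys on
rd % 2 == 0 && nregs == 2 rather than on a particular instruction):

    /* A guest NEON load that fills one q-register via two d-registers: */
    /*     vld1.64 {d8, d9}, [r0]                                       */
    /* Previously: two 64-bit loads into cpu_D[8] and cpu_D[9].         */
    /* Now, when rd is even and nregs == 2, a single 128-bit load:      */
    /*     tcg_gen_qemu_ld_v128(cpu_Q[rd / 2], aa32addr, ...);          */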

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 target/arm/translate.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 90e14df..5bd0b1c 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4710,6 +4710,21 @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
                 tcg_gen_addi_i32(addr, addr, 1 << size);
             }
             if (size == 3) {
+#ifdef TCG_TARGET_HAS_REG128
+                if (rd % 2 == 0 && nregs == 2) {
+                    TCGv aa32addr = gen_aa32_addr(s, addr, MO_TE | MO_128);
+                    /* 128-bit load */
+                    if (load) {
+                        tcg_gen_qemu_ld_v128(cpu_Q[rd / 2], aa32addr,
+                                             get_mem_index(s), MO_TE | MO_128);
+                    } else {
+                        tcg_gen_qemu_st_v128(cpu_Q[rd / 2], aa32addr,
+                                             get_mem_index(s), MO_TE | MO_128);
+                    }
+                    tcg_temp_free(aa32addr);
+                    break;
+                }
+#endif
                 tmp64 = tcg_temp_new_i64();
                 if (load) {
                     gen_aa32_ld64(s, tmp64, addr, get_mem_index(s));
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [PATCH v2.1 21/21] tcg/README: update README to include information about vector opcodes
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (19 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 20/21] target/arm: load two consecutive 64-bit vector regs as a 128-bit vector reg Kirill Batuzov
@ 2017-02-02 14:34 ` Kirill Batuzov
  2017-02-02 15:25 ` [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations no-reply
  2017-02-21 12:19 ` Kirill Batuzov
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-02 14:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée,
	Kirill Batuzov

Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
---
 tcg/README | 47 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/tcg/README b/tcg/README
index a9858c2..209dbc4 100644
--- a/tcg/README
+++ b/tcg/README
@@ -53,9 +53,18 @@ an "undefined result".
 
 TCG instructions operate on variables which are temporaries, local
 temporaries or globals. TCG instructions and variables are strongly
-typed. Two types are supported: 32 bit integers and 64 bit
-integers. Pointers are defined as an alias to 32 bit or 64 bit
-integers depending on the TCG target word size.
+typed. Several types are supported:
+
+* 32 bit integers,
+
+* 64 bit integers,
+
+* 64 bit vectors,
+
+* 128 bit vectors.
+
+Pointers are defined as an alias to 32 bit or 64 bit integers
+depending on the TCG target word size.
 
 Each instruction has a fixed number of output variable operands, input
 variable operands and always constant operands.
@@ -208,6 +217,22 @@ t0=t1%t2 (signed). Undefined behavior if division by zero or overflow.
 
 t0=t1%t2 (unsigned). Undefined behavior if division by zero.
 
+* add_i8x16 t0, t1, t2
+add_i16x8 t0, t1, t2
+add_i32x4 t0, t1, t2
+add_i64x2 t0, t1, t2
+
+t0=t1+t2 where t0, t1 and t2 are 128 bit vectors of 8, 16, 32 or 64 bit
+integers.
+
+* add_i8x8 t0, t1, t2
+add_i16x4 t0, t1, t2
+add_i32x2 t0, t1, t2
+add_i64x1 t0, t1, t2
+
+t0=t1+t2 where t0, t1 and t2 are 64 bit vectors of 8, 16, 32 or 64 bit
+integers.
+
 ********* Logical
 
 * and_i32/i64 t0, t1, t2
@@ -477,8 +502,8 @@ current TB was linked to this TB. Otherwise execute the next
 instructions. Only indices 0 and 1 are valid and tcg_gen_goto_tb may be issued
 at most once with each slot index per TB.
 
-* qemu_ld_i32/i64 t0, t1, flags, memidx
-* qemu_st_i32/i64 t0, t1, flags, memidx
+* qemu_ld_i32/i64/v128 t0, t1, flags, memidx
+* qemu_st_i32/i64/v128 t0, t1, flags, memidx
 
 Load data at the guest address t1 into t0, or store data in t0 at guest
 address t1.  The _i32/_i64 size applies to the size of the input/output
@@ -488,6 +513,9 @@ and the width of the memory operation is controlled by flags.
 Both t0 and t1 may be split into little-endian ordered pairs of registers
 if dealing with 64-bit quantities on a 32-bit host.
 
+The _v128 size can only be used to access exactly 128 bits. Host and target
+are required to be of the same endianness for it to work.
+
 The memidx selects the qemu tlb index to use (e.g. user or kernel access).
 The flags are the TCGMemOp bits, selecting the sign, width, and endianness
 of the memory access.
@@ -538,6 +566,15 @@ Floating point operations are not supported in this version. A
 previous incarnation of the code generator had full support of them,
 but it is better to concentrate on integer operations first.
 
+To support vector operations, the backend must define:
+- TCG_TARGET_HAS_REGV64 for the 64 bit vector type and/or
+- TCG_TARGET_HAS_REG128 for the 128 bit vector type.
+For supported types, load and store operations must be implemented. An
+arbitrary set of other vector operations may be supported. Vector operations
+not explicitly declared as supported (by defining
+TCG_TARGET_HAS_<operation> to 1) will never appear in the intermediate
+representation; emulation code will be emitted for them instead.
+
 4.2) Constraints
 
 GCC like constraints are used to define the constraints of every
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (20 preceding siblings ...)
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 21/21] tcg/README: update README to include information about vector opcodes Kirill Batuzov
@ 2017-02-02 15:25 ` no-reply
  2017-02-21 12:19 ` Kirill Batuzov
  22 siblings, 0 replies; 28+ messages in thread
From: no-reply @ 2017-02-02 15:25 UTC (permalink / raw)
  To: batuzovk
  Cc: famz, qemu-devel, peter.maydell, crosthwaite.peter, pbonzini,
	alex.bennee, rth

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations
Message-id: 1486046099-17726-1-git-send-email-batuzovk@ispras.ru

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/1486046099-17726-1-git-send-email-batuzovk@ispras.ru -> patchew/1486046099-17726-1-git-send-email-batuzovk@ispras.ru
 * [new tag]         patchew/1486046738-26059-1-git-send-email-abologna@redhat.com -> patchew/1486046738-26059-1-git-send-email-abologna@redhat.com
Switched to a new branch 'test'
64bbc76 tcg/README: update README to include information about vector opcodes
06bc776 target/arm: load two consecutive 64-bit vector regs as a 128-bit vector reg
164b1f6 tcg/i386: add support for qemu_ld_v128/qemu_st_v128 ops
c227f30 softmmu: create helpers for vector loads
084c6df tcg: introduce qemu_ld_v128 and qemu_st_v128 opcodes
a9ef8cf tcg: introduce new TCGMemOp - MO_128
723589b target/aarch64: do not check for non-existent TCGMemOp
1b57606 tcg: do not rely on exact values of MO_BSWAP or MO_SIGN in backend
78bb60f tcg/i386: support remaining vector addition operations
a789efe tcg/i386: support 64-bit vector operations
7c67ff1 tcg/i386: add support for vector opcodes
183aaf5 target/arm: use vector opcode to handle vadd.<size> instruction
565699d target/arm: support access to vector guest registers as globals
777b055 tcg: add vector addition operations
2d56597 tcg: allow globals to overlap
188d844 tcg: use results of alias analysis in liveness analysis
8a0b599 tcg: add simple alias analysis
c8e50bc tcg: add ld_v128, ld_v64, st_v128 and st_v64 opcodes
6211ed3 tcg: support representing vector type with smaller vector or scalar types
98f37fb tcg: add support for 64bit vector type
8928fcf tcg: add support for 128bit vector type

=== OUTPUT BEGIN ===
Checking PATCH 1/21: tcg: add support for 128bit vector type...
Checking PATCH 2/21: tcg: add support for 64bit vector type...
Checking PATCH 3/21: tcg: support representing vector type with smaller vector or scalar types...
Checking PATCH 4/21: tcg: add ld_v128, ld_v64, st_v128 and st_v64 opcodes...
Checking PATCH 5/21: tcg: add simple alias analysis...
ERROR: spaces required around that ':' (ctx:VxE)
#81: FILE: tcg/optimize.c:1472:
+        CASE_OP_32_64(movi):
                            ^

ERROR: spaces required around that ':' (ctx:VxE)
#85: FILE: tcg/optimize.c:1476:
+        CASE_OP_32_64(mov):
                           ^

ERROR: spaces required around that ':' (ctx:VxE)
#90: FILE: tcg/optimize.c:1481:
+        CASE_OP_32_64(add):
                           ^

ERROR: spaces required around that ':' (ctx:VxE)
#91: FILE: tcg/optimize.c:1482:
+        CASE_OP_32_64(sub):
                           ^

ERROR: spaces required around that ':' (ctx:VxE)
#101: FILE: tcg/optimize.c:1492:
+        CASE_OP_32_64(ld8s):
                            ^

ERROR: spaces required around that ':' (ctx:VxE)
#102: FILE: tcg/optimize.c:1493:
+        CASE_OP_32_64(ld8u):
                            ^

ERROR: spaces required around that ':' (ctx:VxE)
#106: FILE: tcg/optimize.c:1497:
+        CASE_OP_32_64(ld16s):
                             ^

ERROR: spaces required around that ':' (ctx:VxE)
#107: FILE: tcg/optimize.c:1498:
+        CASE_OP_32_64(ld16u):
                             ^

ERROR: spaces required around that ':' (ctx:VxE)
#125: FILE: tcg/optimize.c:1516:
+        CASE_OP_32_64(st8):
                           ^

ERROR: spaces required around that ':' (ctx:VxE)
#129: FILE: tcg/optimize.c:1520:
+        CASE_OP_32_64(st16):
                            ^

total: 10 errors, 0 warnings, 196 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 6/21: tcg: use results of alias analysis in liveness analysis...
Checking PATCH 7/21: tcg: allow globals to overlap...
Checking PATCH 8/21: tcg: add vector addition operations...
Checking PATCH 9/21: target/arm: support access to vector guest registers as globals...
ERROR: that open brace { should be on the previous line
#38: FILE: target/arm/translate.c:82:
+static const char *regnames_q[] =
+    { "q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7",

ERROR: that open brace { should be on the previous line
#42: FILE: target/arm/translate.c:86:
+static const char *regnames_d[] =
+    { "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7",

total: 2 errors, 0 warnings, 52 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 10/21: target/arm: use vector opcode to handle vadd.<size> instruction...
Checking PATCH 11/21: tcg/i386: add support for vector opcodes...
Checking PATCH 12/21: tcg/i386: support 64-bit vector operations...
Checking PATCH 13/21: tcg/i386: support remaining vector addition operations...
ERROR: spaces required around that ':' (ctx:VxE)
#102: FILE: tcg/i386/tcg-target.inc.c:2404:
+    OP_V128_ALL(add):
                     ^

ERROR: spaces required around that ':' (ctx:VxE)
#103: FILE: tcg/i386/tcg-target.inc.c:2405:
+    OP_V64_ALL(add):
                    ^

total: 2 errors, 0 warnings, 121 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 14/21: tcg: do not rely on exact values of MO_BSWAP or MO_SIGN in backend...
Checking PATCH 15/21: target/aarch64: do not check for non-existent TCGMemOp...
Checking PATCH 16/21: tcg: introduce new TCGMemOp - MO_128...
Checking PATCH 17/21: tcg: introduce qemu_ld_v128 and qemu_st_v128 opcodes...
Checking PATCH 18/21: softmmu: create helpers for vector loads...
Checking PATCH 19/21: tcg/i386: add support for qemu_ld_v128/qemu_st_v128 ops...
Checking PATCH 20/21: target/arm: load two consecutive 64-bit vector regs as a 128-bit vector reg...
Checking PATCH 21/21: tcg/README: update README to include information about vector opcodes...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2.1 10/21] target/arm: use vector opcode to handle vadd.<size> instruction
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 10/21] target/arm: use vector opcode to handle vadd.<size> instruction Kirill Batuzov
@ 2017-02-09 13:19   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 28+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-02-09 13:19 UTC (permalink / raw)
  To: Kirill Batuzov, qemu-devel
  Cc: Peter Maydell, Peter Crosthwaite, Paolo Bonzini,
	Alex Bennée, Richard Henderson



On 02/02/2017 11:34 AM, Kirill Batuzov wrote:
> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
> ---
>  target/arm/translate.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index d7578e2..90e14df 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -5628,6 +5628,37 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
>              return 1;
>          }
>
> +        /* Use vector ops to handle what we can */
> +        switch (op) {
> +        case NEON_3R_VADD_VSUB:
> +            if (!u) {
> +                void (* const gen_add_v128[])(TCGv_v128, TCGv_v128,
> +                                             TCGv_v128) = {
> +                    tcg_gen_add_i8x16,
> +                    tcg_gen_add_i16x8,
> +                    tcg_gen_add_i32x4,
> +                    tcg_gen_add_i64x2
> +                };

I'd prefer to have gen_add_v128 'static const'.
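
A minimal sketch (untested), built from the initializer quoted above:

    static void (* const gen_add_v128[])(TCGv_v128, TCGv_v128,
                                         TCGv_v128) = {
        tcg_gen_add_i8x16,
        tcg_gen_add_i16x8,
        tcg_gen_add_i32x4,
        tcg_gen_add_i64x2
    };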

> +                void (* const gen_add_v64[])(TCGv_v64, TCGv_v64,
> +                                             TCGv_v64) = {
> +                    tcg_gen_add_i8x8,
> +                    tcg_gen_add_i16x4,
> +                    tcg_gen_add_i32x2,
> +                    tcg_gen_add_i64x1
> +                };

Same for gen_add_v64.

> +                if (q) {
> +                    gen_add_v128[size](cpu_Q[rd >> 1], cpu_Q[rn >> 1],
> +                                       cpu_Q[rm >> 1]);
> +                } else {
> +                    gen_add_v64[size](cpu_D[rd], cpu_D[rn], cpu_D[rm]);
> +                }
> +                return 0;
> +            }
> +            break;
> +        default:
> +            break;
> +        }
> +
>          for (pass = 0; pass < (q ? 4 : 2); pass++) {
>
>          if (pairwise) {
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations
  2017-02-02 14:34 [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations Kirill Batuzov
                   ` (21 preceding siblings ...)
  2017-02-02 15:25 ` [Qemu-devel] [PATCH v2.1 00/20] Emulate guest vector operations with host vector operations no-reply
@ 2017-02-21 12:19 ` Kirill Batuzov
  22 siblings, 0 replies; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-21 12:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski, Alex Bennée

On Thu, 2 Feb 2017, Kirill Batuzov wrote:

> The goal of these patch series is to set up an infrastructure to emulate
> guest vector operations using host vector operations. Preliminary
> experiments show that simply translating loads and stores increases
> performance of x264 video codec by 10%. The performance of a gcc vectorized
> for loop increased 2x.
> 
> To be able to emulate guest vector operations using host vector operations,
> several things need to be done.
> 
> 1. Corresponding vector types should be added to TCG. These series add
> TCG_v128 and TCG_v64. I've made TCG_v64 a different type than TCG_i64
> because it usually needs to be allocated to different registers and
> supports different operations.
> 
> 2. Load/store operations for these new types need to be implemented.
> 
> 3. For seamless transition from current model to a new one we need to
> handle cases where memory occupied by global variable can be accessed via
> pointer to the CPUArchState structure. A very simple conservative alias
> analysis has been added to do it. This analysis tracks memory loads and
> stores that overlap with fields of CPUArchState and provides this
> information to the register allocator. The allocator then spills and
> reloads affected globals when needed.
> 
> 4. Allow overlapping globals. For scalar registers this is a rare case, and
> overlapping registers can ba handled as a single one (ah, al, ax, eax,
> rax). In ARM every Q-register consists of two D-register each consisting of
> two S-registers. Handling 4 S-registers as one because they are parts of
> the same Q-register is way too inefficient.
> 
> 5. Add new memory addressing mode to MMU code for large accesses and create
> needed helpers. Only 128-bit vectors have been handled for now.
> 
> 6. Create TCG opcodes for vector operations. Only addition has beed handled
> in these series. Each operation has a wrapper that checks if the backend
> supports the corresponding operation or not. In one case the vector opcode
> is generated, in the other the operation is emulated with scalar
> operations. The emulation code is generated inline for performance reasons
> (there is a huge performance difference between inline generation
> and calling a helper). As a positive side effect this will eventually allow
>  to merge similar emulation code for vector instructions from different
> frontends to target-independent implementation.
> 
> 7. Use new operations in the frontend (ARM was used in these series).
> 
> 8. Support new operations in the backend (x86_64 was used in these series).
> 
> For experiments I have used ARM guest on x86_64 host. I wanted some pair of
> different architectures with vector extensions both. ARM and x86_64 pair
> fits well.
> 
> v1 -> v2:
>  - represent v128 type with smaller types when it is not supported by the host
>  - detect AVX support and use AVX instructions when available
>  - tcg/README updated
>  - generate two v64 adds instead of one v128 when applicable
>  - rebased to newer master
>  - overlap detection for temps added (it needs to be explicitly called from
>    <arch>_translate_init)
>  - the stack is used to temporary store 128 bit variables to memory
>    (instead of the TCGContext field)
> 
> v2 -> v2.1
>  - automatic build failure fixed
> 
> Outstanding issues:
>  - qemu_ld_v128 and qemu_st_v128 do not generate fallback code if the host
>    does not support 128 bit registers. The reason is that I do not know how to
>    handle the host/guest different endianness (whether do we swap only bytes
>    in elements or whole vectors?). Different targets seem to have different
>    ideas on how this should be done.
>

Ping?

-- 
Kirill

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2.1 13/21] tcg/i386: support remaining vector addition operations
       [not found]     ` <32a902a1-e8c7-c2f7-ac66-148e02ee0b2d@amsat.org>
@ 2017-02-21 13:29       ` Kirill Batuzov
  2017-02-21 16:21         ` Alex Bennée
  0 siblings, 1 reply; 28+ messages in thread
From: Kirill Batuzov @ 2017-02-21 13:29 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: qemu-devel, Peter Maydell, Peter Crosthwaite, Paolo Bonzini,
	Alex Bennée, Richard Henderson

On Tue, 21 Feb 2017, Philippe Mathieu-Daudé wrote:

> Hi Kirill,
> 
> could you check my previous comment?
>

Hi Philippe,

thank you for your comments. I've seen them and I'll apply the changes
you suggested in the next version of the series. I was just hoping to get
a bit more feedback before I proceed to v3.

-- 
Kirill

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2.1 13/21] tcg/i386: support remaining vector addition operations
  2017-02-21 13:29       ` Kirill Batuzov
@ 2017-02-21 16:21         ` Alex Bennée
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2017-02-21 16:21 UTC (permalink / raw)
  To: Kirill Batuzov
  Cc: Philippe Mathieu-Daudé,
	qemu-devel, Peter Maydell, Peter Crosthwaite, Paolo Bonzini,
	Richard Henderson


Kirill Batuzov <batuzovk@ispras.ru> writes:

> On Tue, 21 Feb 2017, Philippe Mathieu-Daudé wrote:
>
>> Hi Kirill,
>>
>> could you check my previous comment?
>>
>
> Hi Philippe,
>
> thank you for your comments. I've seen them and I'll apply changes you
> suggested in the next version of the series. I was just hoping to get
> a bit more feedback before I proceed to v3.

It is on my list to look at - however I'm in a bit of a crunch getting
the MTTCG stuff prepared before code freeze as well as preparing for a
company conference. Once that's out of the way I'll have a bit more
review time!

--
Alex Bennée

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [PATCH v2.1 14/21] tcg: do not rely on exact values of MO_BSWAP or MO_SIGN in backend
  2017-02-02 14:34 ` [Qemu-devel] [PATCH v2.1 14/21] tcg: do not rely on exact values of MO_BSWAP or MO_SIGN in backend Kirill Batuzov
@ 2017-05-05 13:59   ` Alex Bennée
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Bennée @ 2017-05-05 13:59 UTC (permalink / raw)
  To: Kirill Batuzov
  Cc: qemu-devel, Richard Henderson, Paolo Bonzini, Peter Crosthwaite,
	Peter Maydell, Andrzej Zaborowski


Kirill Batuzov <batuzovk@ispras.ru> writes:

> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
> ---
>  tcg/aarch64/tcg-target.inc.c |  4 ++--
>  tcg/arm/tcg-target.inc.c     |  4 ++--
>  tcg/i386/tcg-target.inc.c    |  4 ++--
>  tcg/mips/tcg-target.inc.c    |  4 ++--
>  tcg/ppc/tcg-target.inc.c     |  4 ++--
>  tcg/s390/tcg-target.inc.c    |  4 ++--
>  tcg/sparc/tcg-target.inc.c   | 12 ++++++------
>  tcg/tcg-op.c                 |  4 ++--
>  tcg/tcg.h                    |  1 +
>  9 files changed, 21 insertions(+), 20 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 6d227a5..2b0b548 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -1032,7 +1032,7 @@ static void tcg_out_cltz(TCGContext *s, TCGType ext, TCGReg d,
>  /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
>   *                                     TCGMemOpIdx oi, uintptr_t ra)
>   */
> -static void * const qemu_ld_helpers[16] = {
> +static void * const qemu_ld_helpers[] = {
>      [MO_UB]   = helper_ret_ldub_mmu,
>      [MO_LEUW] = helper_le_lduw_mmu,
>      [MO_LEUL] = helper_le_ldul_mmu,
> @@ -1046,7 +1046,7 @@ static void * const qemu_ld_helpers[16] = {
>   *                                     uintxx_t val, TCGMemOpIdx oi,
>   *                                     uintptr_t ra)
>   */
> -static void * const qemu_st_helpers[16] = {
> +static void * const qemu_st_helpers[] = {
>      [MO_UB]   = helper_ret_stb_mmu,
>      [MO_LEUW] = helper_le_stw_mmu,
>      [MO_LEUL] = helper_le_stl_mmu,
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index e75a6d4..f603f02 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -1058,7 +1058,7 @@ static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
>  /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
>   *                                     int mmu_idx, uintptr_t ra)
>   */
> -static void * const qemu_ld_helpers[16] = {
> +static void * const qemu_ld_helpers[] = {
>      [MO_UB]   = helper_ret_ldub_mmu,
>      [MO_SB]   = helper_ret_ldsb_mmu,
>
> @@ -1078,7 +1078,7 @@ static void * const qemu_ld_helpers[16] = {
>  /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
>   *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
>   */
> -static void * const qemu_st_helpers[16] = {
> +static void * const qemu_st_helpers[] = {
>      [MO_UB]   = helper_ret_stb_mmu,
>      [MO_LEUW] = helper_le_stw_mmu,
>      [MO_LEUL] = helper_le_stl_mmu,
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index d8f0d81..263c15e 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -1334,7 +1334,7 @@ static void tcg_out_nopn(TCGContext *s, int n)
>  /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
>   *                                     int mmu_idx, uintptr_t ra)
>   */
> -static void * const qemu_ld_helpers[16] = {
> +static void * const qemu_ld_helpers[] = {
>      [MO_UB]   = helper_ret_ldub_mmu,
>      [MO_LEUW] = helper_le_lduw_mmu,
>      [MO_LEUL] = helper_le_ldul_mmu,
> @@ -1347,7 +1347,7 @@ static void * const qemu_ld_helpers[16] = {
>  /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
>   *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
>   */
> -static void * const qemu_st_helpers[16] = {
> +static void * const qemu_st_helpers[] = {
>      [MO_UB]   = helper_ret_stb_mmu,
>      [MO_LEUW] = helper_le_stw_mmu,
>      [MO_LEUL] = helper_le_stl_mmu,
> diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
> index 01ac7b2..4f2d5d1 100644
> --- a/tcg/mips/tcg-target.inc.c
> +++ b/tcg/mips/tcg-target.inc.c
> @@ -1108,7 +1108,7 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit *arg)
>  }
>
>  #if defined(CONFIG_SOFTMMU)
> -static void * const qemu_ld_helpers[16] = {
> +static void * const qemu_ld_helpers[] = {
>      [MO_UB]   = helper_ret_ldub_mmu,
>      [MO_SB]   = helper_ret_ldsb_mmu,
>      [MO_LEUW] = helper_le_lduw_mmu,
> @@ -1125,7 +1125,7 @@ static void * const qemu_ld_helpers[16] = {
>  #endif
>  };
>
> -static void * const qemu_st_helpers[16] = {
> +static void * const qemu_st_helpers[] = {
>      [MO_UB]   = helper_ret_stb_mmu,
>      [MO_LEUW] = helper_le_stw_mmu,
>      [MO_LEUL] = helper_le_stl_mmu,
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index 64f67d2..680050b 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -1419,7 +1419,7 @@ static const uint32_t qemu_exts_opc[4] = {
>  /* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
>   *                                 int mmu_idx, uintptr_t ra)
>   */
> -static void * const qemu_ld_helpers[16] = {
> +static void * const qemu_ld_helpers[] = {
>      [MO_UB]   = helper_ret_ldub_mmu,
>      [MO_LEUW] = helper_le_lduw_mmu,
>      [MO_LEUL] = helper_le_ldul_mmu,
> @@ -1432,7 +1432,7 @@ static void * const qemu_ld_helpers[16] = {
>  /* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
>   *                                 uintxx_t val, int mmu_idx, uintptr_t ra)
>   */
> -static void * const qemu_st_helpers[16] = {
> +static void * const qemu_st_helpers[] = {
>      [MO_UB]   = helper_ret_stb_mmu,
>      [MO_LEUW] = helper_le_stw_mmu,
>      [MO_LEUL] = helper_le_stl_mmu,
> diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
> index a679280..ec3491a 100644
> --- a/tcg/s390/tcg-target.inc.c
> +++ b/tcg/s390/tcg-target.inc.c
> @@ -309,7 +309,7 @@ static const uint8_t tcg_cond_to_ltr_cond[] = {
>  };
>
>  #ifdef CONFIG_SOFTMMU
> -static void * const qemu_ld_helpers[16] = {
> +static void * const qemu_ld_helpers[] = {
>      [MO_UB]   = helper_ret_ldub_mmu,
>      [MO_SB]   = helper_ret_ldsb_mmu,
>      [MO_LEUW] = helper_le_lduw_mmu,
> @@ -324,7 +324,7 @@ static void * const qemu_ld_helpers[16] = {
>      [MO_BEQ]  = helper_be_ldq_mmu,
>  };
>
> -static void * const qemu_st_helpers[16] = {
> +static void * const qemu_st_helpers[] = {
>      [MO_UB]   = helper_ret_stb_mmu,
>      [MO_LEUW] = helper_le_stw_mmu,
>      [MO_LEUL] = helper_le_stl_mmu,
> diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
> index d1f4c0d..1b115d2 100644
> --- a/tcg/sparc/tcg-target.inc.c
> +++ b/tcg/sparc/tcg-target.inc.c
> @@ -840,12 +840,12 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
>  }
>
>  #ifdef CONFIG_SOFTMMU
> -static tcg_insn_unit *qemu_ld_trampoline[16];
> -static tcg_insn_unit *qemu_st_trampoline[16];
> +static tcg_insn_unit *qemu_ld_trampoline[MO_ALL];
> +static tcg_insn_unit *qemu_st_trampoline[MO_ALL];

Minor merge conflict here since
709a340d679d95a0c6cbb9b5f654498f04345b50.

>
>  static void build_trampolines(TCGContext *s)
>  {
> -    static void * const qemu_ld_helpers[16] = {
> +    static void * const qemu_ld_helpers[MO_ALL] = {

Why bother bounding the array with MO_ALL here, when for the other
MO_-indexed arrays in the other backends you changed the bound to []?
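
With C99 designated initializers the unsized form still has a
well-defined length -- the largest initialized index plus one -- so the
two spellings differ only in trailing padding. A standalone sketch to
illustrate, not QEMU code:

  #include <stdio.h>

  enum { IDX_LOW = 1, IDX_HIGH = 9 };

  /* Unsized: length is the largest designated index plus one (10). */
  static const int unsized[] = {
      [IDX_LOW]  = 11,
      [IDX_HIGH] = 99,
  };

  /* Explicitly bounded: length stays 16 whichever slots are set. */
  static const int bounded[16] = {
      [IDX_LOW]  = 11,
      [IDX_HIGH] = 99,
  };

  int main(void)
  {
      printf("unsized: %zu entries, bounded: %zu entries\n",
             sizeof(unsized) / sizeof(unsized[0]),
             sizeof(bounded) / sizeof(bounded[0]));
      return 0;
  }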

>          [MO_UB]   = helper_ret_ldub_mmu,
>          [MO_SB]   = helper_ret_ldsb_mmu,
>          [MO_LEUW] = helper_le_lduw_mmu,
> @@ -857,7 +857,7 @@ static void build_trampolines(TCGContext *s)
>          [MO_BEUL] = helper_be_ldul_mmu,
>          [MO_BEQ]  = helper_be_ldq_mmu,
>      };
> -    static void * const qemu_st_helpers[16] = {
> +    static void * const qemu_st_helpers[MO_ALL] = {
>          [MO_UB]   = helper_ret_stb_mmu,
>          [MO_LEUW] = helper_le_stw_mmu,
>          [MO_LEUL] = helper_le_stl_mmu,
> @@ -870,7 +870,7 @@ static void build_trampolines(TCGContext *s)
>      int i;
>      TCGReg ra;
>
> -    for (i = 0; i < 16; ++i) {
> +    for (i = 0; i < MO_ALL; ++i) {

You could just use ARRAY_SIZE(qemu_ld_helpers) here instead.
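
Something like this untested sketch -- the loop bound then tracks the
declaration automatically, so it cannot drift if MO_ALL ever changes.
ARRAY_SIZE is the usual sizeof-based macro from qemu/osdep.h, redefined
here only to keep the example standalone, and the helper table is a
dummy stand-in for qemu_ld_helpers:

  #include <stdio.h>

  #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

  static void dummy_ld(void) { }

  /* Sparse table, like qemu_ld_helpers: gaps stay NULL. */
  static void (* const ld_helpers[])(void) = {
      [1] = dummy_ld,
      [3] = dummy_ld,
  };

  int main(void)
  {
      size_t i;

      for (i = 0; i < ARRAY_SIZE(ld_helpers); ++i) {
          if (ld_helpers[i] == NULL) {
              continue;      /* skip unpopulated slots */
          }
          printf("building trampoline for slot %zu\n", i);
      }
      return 0;
  }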

>          if (qemu_ld_helpers[i] == NULL) {
>              continue;
>          }
> @@ -898,7 +898,7 @@ static void build_trampolines(TCGContext *s)
>          tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_O7, ra);
>      }
>
> -    for (i = 0; i < 16; ++i) {
> +    for (i = 0; i < MO_ALL; ++i) {

And ARRAY_SIZE(qemu_st_helpers) here again.

>          if (qemu_st_helpers[i] == NULL) {
>              continue;
>          }
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 8a19eee..0dfe611 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -2767,7 +2767,7 @@ typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv, TCGv_i64);
>  # define WITH_ATOMIC64(X)
>  #endif
>
> -static void * const table_cmpxchg[16] = {
> +static void * const table_cmpxchg[] = {
>      [MO_8] = gen_helper_atomic_cmpxchgb,
>      [MO_16 | MO_LE] = gen_helper_atomic_cmpxchgw_le,
>      [MO_16 | MO_BE] = gen_helper_atomic_cmpxchgw_be,
> @@ -2985,7 +2985,7 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGv addr, TCGv_i64 val,
>  }
>
>  #define GEN_ATOMIC_HELPER(NAME, OP, NEW)                                \
> -static void * const table_##NAME[16] = {                                \
> +static void * const table_##NAME[] = {                                  \
>      [MO_8] = gen_helper_atomic_##NAME##b,                               \
>      [MO_16 | MO_LE] = gen_helper_atomic_##NAME##w_le,                   \
>      [MO_16 | MO_BE] = gen_helper_atomic_##NAME##w_be,                   \
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index fd43f15..5e0c6da 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -386,6 +386,7 @@ typedef enum TCGMemOp {
>      MO_TEQ   = MO_TE | MO_Q,
>
>      MO_SSIZE = MO_SIZE | MO_SIGN,
> +    MO_ALL   = MO_SIZE | MO_SIGN | MO_BSWAP | MO_AMASK,
>  } TCGMemOp;
>
>  /**
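
One last note on the new MO_ALL bound itself. As a sanity check, here
is a standalone sketch with the mask values I believe tcg.h uses
(MO_SIZE = 3, MO_SIGN = 4, MO_BSWAP = 8, MO_AMASK = 7 << 4 -- worth
double-checking against your tree):

  #include <stdio.h>

  /* Assumed TCGMemOp mask values -- verify against tcg/tcg.h. */
  enum {
      MO_SIZE  = 3,       /* two size bits */
      MO_SIGN  = 4,       /* sign-extension bit */
      MO_BSWAP = 8,       /* byte-swap bit */
      MO_AMASK = 7 << 4,  /* alignment bits */
      MO_ALL   = MO_SIZE | MO_SIGN | MO_BSWAP | MO_AMASK,
  };

  int main(void)
  {
      /* An MO_ALL-bounded array has valid indices 0 .. MO_ALL - 1,
       * which is enough as long as the helper lookups mask out
       * everything above the bswap/ssize bits. */
      printf("MO_ALL = %#x -> %d-entry tables\n", MO_ALL, MO_ALL);
      return 0;
  }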


--
Alex Bennée


