* [Qemu-devel] [PULL 00/16] tcg queued patches
@ 2019-05-22 22:28 Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 01/16] tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts Richard Henderson
                   ` (18 more replies)
  0 siblings, 19 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The following changes since commit a4f667b6714916683408b983cfe0a615a725775f:

  Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging (2019-05-21 16:30:13 +0100)

are available in the Git repository at:

  https://github.com/rth7680/qemu.git tags/pull-tcg-20190522

for you to fetch changes up to 11e2bfef799024be4a08fcf6797fe0b22fb16b58:

  tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store (2019-05-22 15:09:43 -0400)

----------------------------------------------------------------
Misc gvec improvements

----------------------------------------------------------------
Richard Henderson (16):
      tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts
      tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem
      tcg: Add support for vector bitwise select
      tcg: Add support for vector compare select
      tcg: Introduce do_op3_nofail for vector expansion
      tcg: Expand vector minmax using cmp+cmpsel
      tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative
      tcg/i386: Support vector comparison select value
      tcg/i386: Remove expansion for missing minmax
      tcg/i386: Use umin/umax in expanding unsigned compare
      tcg/aarch64: Support vector bitwise select value
      tcg/aarch64: Split up is_fimm
      tcg/aarch64: Use MVNI in tcg_out_dupi_vec
      tcg/aarch64: Build vector immediates with two insns
      tcg/aarch64: Allow immediates for vector ORR and BIC
      tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store

 accel/tcg/tcg-runtime.h      |   2 +
 tcg/aarch64/tcg-target.h     |   2 +
 tcg/i386/tcg-target.h        |   2 +
 tcg/tcg-op-gvec.h            |   7 +
 tcg/tcg-op.h                 |   5 +
 tcg/tcg-opc.h                |   5 +-
 tcg/tcg.h                    |   2 +
 accel/tcg/tcg-runtime-gvec.c |  14 ++
 tcg/aarch64/tcg-target.inc.c | 371 ++++++++++++++++++++++++++++++++-----------
 tcg/i386/tcg-target.inc.c    | 169 +++++++++++++-------
 tcg/tcg-op-gvec.c            |  71 ++++++---
 tcg/tcg-op-vec.c             | 142 ++++++++++++++---
 tcg/tcg.c                    |   5 +
 tcg/README                   |  11 ++
 14 files changed, 620 insertions(+), 188 deletions(-)



* [Qemu-devel] [PULL 01/16] tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 02/16] tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem Richard Henderson
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The VBROADCASTSD instruction only allows %ymm registers as destination.
Rather than forcing VEX.L and writing to the entire 256-bit register,
revert to using MOVDDUP with an %xmm register.  This is sufficient for
an avx1 host since we do not support TCG_TYPE_V256 for that case.

Also fix the 32-bit avx2, which should have used VPBROADCASTW.

Fixes: 1e262b49b533
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
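Note: as a rough illustration of what the MO_64 dup must produce in a
128-bit register, here is a portable C model. It is only a sketch; the
V128Model type and dup64_from_mem name are hypothetical and are not part
of QEMU. It mirrors the effect MOVDDUP has on an %xmm destination, where
one 64-bit element from memory is copied into both lanes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register: two 64-bit lanes. */
typedef struct {
    uint64_t lane[2];
} V128Model;

/*
 * Model of a MO_64 "dup from memory" into a 128-bit register:
 * load one 64-bit element and copy it into both lanes.
 */
static V128Model dup64_from_mem(const void *mem)
{
    V128Model r;
    uint64_t elt;

    assert(mem != NULL);
    memcpy(&elt, mem, sizeof(elt));   /* alignment-safe 64-bit load */
    r.lane[0] = elt;
    r.lane[1] = elt;
    return r;
}
```

Only 128 bits are written in this model, which matches the point of the
commit message: no 256-bit destination is needed on an avx1 host.
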
 tcg/i386/tcg-target.inc.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index aafd01cb49..b3601446cd 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -358,6 +358,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_MOVBE_MyGy  (0xf1 | P_EXT38)
 #define OPC_MOVD_VyEy   (0x6e | P_EXT | P_DATA16)
 #define OPC_MOVD_EyVy   (0x7e | P_EXT | P_DATA16)
+#define OPC_MOVDDUP     (0x12 | P_EXT | P_SIMDF2)
 #define OPC_MOVDQA_VxWx (0x6f | P_EXT | P_DATA16)
 #define OPC_MOVDQA_WxVx (0x7f | P_EXT | P_DATA16)
 #define OPC_MOVDQU_VxWx (0x6f | P_EXT | P_SIMDF3)
@@ -921,7 +922,7 @@ static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
     } else {
         switch (vece) {
         case MO_64:
-            tcg_out_vex_modrm_offset(s, OPC_VBROADCASTSD, r, 0, base, offset);
+            tcg_out_vex_modrm_offset(s, OPC_MOVDDUP, r, 0, base, offset);
             break;
         case MO_32:
             tcg_out_vex_modrm_offset(s, OPC_VBROADCASTSS, r, 0, base, offset);
@@ -963,12 +964,12 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
         } else if (have_avx2) {
             tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTQ + vex_l, ret);
         } else {
-            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSD, ret);
+            tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret);
         }
         new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
     } else {
         if (have_avx2) {
-            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSD + vex_l, ret);
+            tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTW + vex_l, ret);
         } else {
             tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
         }
-- 
2.17.1




* [Qemu-devel] [PULL 02/16] tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 01/16] tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 03/16] tcg: Add support for vector bitwise select Richard Henderson
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The paths through tcg_gen_dup_mem_vec and through MO_128 were
missing the check_size_align.  The path through MO_128 was also
missing the expand_clr.  The latter omission was not visible because
the only user is ARM SVE, which sets oprsz == maxsz and so does not
require the clear.

Fix by adding the check_size_align and using do_dup directly
instead of duplicating the check in tcg_gen_gvec_dup_{i32,i64}.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
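Note: the oprsz/maxsz contract can be sketched as a byte-level model in
plain C. This is an assumption-laden illustration, not the QEMU code:
gvec_dup_mem_model is a hypothetical name, and the asserts are a
simplified stand-in for check_size_align rather than its exact rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of a gvec "dup from memory": replicate one
 * element of esz bytes across the first oprsz bytes of the
 * destination, then zero the bytes from oprsz up to maxsz.
 */
static void gvec_dup_mem_model(uint8_t *d, const uint8_t *elt, size_t esz,
                               size_t oprsz, size_t maxsz)
{
    size_t i;

    /* Simplified stand-in for the size checks. */
    assert(esz > 0 && oprsz % esz == 0);
    assert(oprsz <= maxsz);

    for (i = 0; i < oprsz; i += esz) {
        memcpy(d + i, elt, esz);
    }
    /* Clear the tail; a no-op when oprsz == maxsz. */
    if (oprsz < maxsz) {
        memset(d + oprsz, 0, maxsz - oprsz);
    }
}
```

When oprsz == maxsz the final step does nothing, which is why a caller
that always passes equal sizes would not notice a missing clear.
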
 tcg/tcg-op-gvec.c | 48 ++++++++++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 338ddd9d9e..bbf70e3cd9 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1446,36 +1446,35 @@ void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t oprsz,
 void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
                           uint32_t oprsz, uint32_t maxsz)
 {
+    check_size_align(oprsz, maxsz, dofs);
     if (vece <= MO_64) {
-        TCGType type = choose_vector_type(0, vece, oprsz, 0);
+        TCGType type = choose_vector_type(NULL, vece, oprsz, 0);
         if (type != 0) {
             TCGv_vec t_vec = tcg_temp_new_vec(type);
             tcg_gen_dup_mem_vec(vece, t_vec, cpu_env, aofs);
             do_dup_store(type, dofs, oprsz, maxsz, t_vec);
             tcg_temp_free_vec(t_vec);
-            return;
+        } else if (vece <= MO_32) {
+            TCGv_i32 in = tcg_temp_new_i32();
+            switch (vece) {
+            case MO_8:
+                tcg_gen_ld8u_i32(in, cpu_env, aofs);
+                break;
+            case MO_16:
+                tcg_gen_ld16u_i32(in, cpu_env, aofs);
+                break;
+            default:
+                tcg_gen_ld_i32(in, cpu_env, aofs);
+                break;
+            }
+            do_dup(vece, dofs, oprsz, maxsz, in, NULL, 0);
+            tcg_temp_free_i32(in);
+        } else {
+            TCGv_i64 in = tcg_temp_new_i64();
+            tcg_gen_ld_i64(in, cpu_env, aofs);
+            do_dup(vece, dofs, oprsz, maxsz, NULL, in, 0);
+            tcg_temp_free_i64(in);
         }
-    }
-    if (vece <= MO_32) {
-        TCGv_i32 in = tcg_temp_new_i32();
-        switch (vece) {
-        case MO_8:
-            tcg_gen_ld8u_i32(in, cpu_env, aofs);
-            break;
-        case MO_16:
-            tcg_gen_ld16u_i32(in, cpu_env, aofs);
-            break;
-        case MO_32:
-            tcg_gen_ld_i32(in, cpu_env, aofs);
-            break;
-        }
-        tcg_gen_gvec_dup_i32(vece, dofs, oprsz, maxsz, in);
-        tcg_temp_free_i32(in);
-    } else if (vece == MO_64) {
-        TCGv_i64 in = tcg_temp_new_i64();
-        tcg_gen_ld_i64(in, cpu_env, aofs);
-        tcg_gen_gvec_dup_i64(MO_64, dofs, oprsz, maxsz, in);
-        tcg_temp_free_i64(in);
     } else {
         /* 128-bit duplicate.  */
         /* ??? Dup to 256-bit vector.  */
@@ -1504,6 +1503,9 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
             tcg_temp_free_i64(in0);
             tcg_temp_free_i64(in1);
         }
+        if (oprsz < maxsz) {
+            expand_clr(dofs + oprsz, maxsz - oprsz);
+        }
     }
 }
 
-- 
2.17.1




* [Qemu-devel] [PULL 03/16] tcg: Add support for vector bitwise select
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 01/16] tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 02/16] tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 04/16] tcg: Add support for vector compare select Richard Henderson
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

This operation performs d = (b & a) | (c & ~a), and is present
on a majority of host vector units.  Include gvec expanders.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
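Note: a minimal scalar sketch of the operation, assuming 64-bit lanes.
The names bitsel64 and bitsel_buf are hypothetical and exist only for
this illustration. Each result bit comes from b where the corresponding
bit of a is set, and from c where it is clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise select on one 64-bit lane: b where a has 1 bits, c elsewhere. */
static uint64_t bitsel64(uint64_t a, uint64_t b, uint64_t c)
{
    return (b & a) | (c & ~a);
}

/* Apply the same select across n 64-bit lanes. */
static void bitsel_buf(uint64_t *d, const uint64_t *a, const uint64_t *b,
                       const uint64_t *c, size_t n)
{
    size_t i;

    assert(d != NULL && a != NULL && b != NULL && c != NULL);
    for (i = 0; i < n; i++) {
        d[i] = bitsel64(a[i], b[i], c[i]);
    }
}
```

For example, with a = 0xFF00, b = 0x1234 and c = 0xABCD the result is
(0x1234 & 0xFF00) | (0xABCD & ~0xFF00) = 0x1200 | 0x00CD = 0x12CD.
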
 accel/tcg/tcg-runtime.h      |  2 ++
 tcg/aarch64/tcg-target.h     |  1 +
 tcg/i386/tcg-target.h        |  1 +
 tcg/tcg-op-gvec.h            |  7 +++++++
 tcg/tcg-op.h                 |  3 +++
 tcg/tcg-opc.h                |  2 ++
 tcg/tcg.h                    |  1 +
 accel/tcg/tcg-runtime-gvec.c | 14 ++++++++++++++
 tcg/tcg-op-gvec.c            | 23 +++++++++++++++++++++++
 tcg/tcg-op-vec.c             | 26 ++++++++++++++++++++++++++
 tcg/tcg.c                    |  2 ++
 tcg/README                   |  4 ++++
 12 files changed, 86 insertions(+)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 6d73dc2d65..4fa61b49b4 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -303,3 +303,5 @@ DEF_HELPER_FLAGS_4(gvec_leu8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_leu16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_leu32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_leu64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(gvec_bitsel, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index e43554c3c7..52ee66424f 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -140,6 +140,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
+#define TCG_TARGET_HAS_bitsel_vec       0
 
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 66f16fbe3c..08a0386433 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -190,6 +190,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
+#define TCG_TARGET_HAS_bitsel_vec       0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 52a398c190..2a9e0c7c0a 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -342,6 +342,13 @@ void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs,
                       uint32_t aofs, uint32_t bofs,
                       uint32_t oprsz, uint32_t maxsz);
 
+/*
+ * Perform vector bit select: d = (b & a) | (c & ~a).
+ */
+void tcg_gen_gvec_bitsel(unsigned vece, uint32_t dofs, uint32_t aofs,
+                         uint32_t bofs, uint32_t cofs,
+                         uint32_t oprsz, uint32_t maxsz);
+
 /*
  * 64-bit vector operations.  Use these when the register has been allocated
  * with tcg_global_mem_new_i64, and so we cannot also address it via pointer.
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 660fe205d0..268860ed2f 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -1000,6 +1000,9 @@ void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r,
                      TCGv_vec a, TCGv_vec b);
 
+void tcg_gen_bitsel_vec(unsigned vece, TCGv_vec r, TCGv_vec a,
+                        TCGv_vec b, TCGv_vec c);
+
 void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
 void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
 void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4a2dd116eb..c05b71427c 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -256,6 +256,8 @@ DEF(sarv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 
 DEF(cmp_vec, 1, 2, 1, IMPLVEC)
 
+DEF(bitsel_vec, 1, 3, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_bitsel_vec))
+
 DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 
 #if TCG_TARGET_MAYBE_vec
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 0e01a70d66..72f9f6c70b 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -187,6 +187,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_mul_vec          0
 #define TCG_TARGET_HAS_sat_vec          0
 #define TCG_TARGET_HAS_minmax_vec       0
+#define TCG_TARGET_HAS_bitsel_vec       0
 #else
 #define TCG_TARGET_MAYBE_vec            1
 #endif
diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index 0f09e0ef38..3b6052fe97 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -1444,3 +1444,17 @@ void HELPER(gvec_umax64)(void *d, void *a, void *b, uint32_t desc)
     }
     clear_high(d, oprsz, desc);
 }
+
+void HELPER(gvec_bitsel)(void *d, void *a, void *b, void *c, uint32_t desc)
+{
+    intptr_t oprsz = simd_oprsz(desc);
+    intptr_t i;
+
+    for (i = 0; i < oprsz; i += sizeof(vec64)) {
+        vec64 aa = *(vec64 *)(a + i);
+        vec64 bb = *(vec64 *)(b + i);
+        vec64 cc = *(vec64 *)(c + i);
+        *(vec64 *)(d + i) = (bb & aa) | (cc & ~aa);
+    }
+    clear_high(d, oprsz, desc);
+}
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index bbf70e3cd9..f18464cf07 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -3195,3 +3195,26 @@ void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs,
         expand_clr(dofs + oprsz, maxsz - oprsz);
     }
 }
+
+static void tcg_gen_bitsel_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b, TCGv_i64 c)
+{
+    TCGv_i64 t = tcg_temp_new_i64();
+
+    tcg_gen_and_i64(t, b, a);
+    tcg_gen_andc_i64(d, c, a);
+    tcg_gen_or_i64(d, d, t);
+    tcg_temp_free_i64(t);
+}
+
+void tcg_gen_gvec_bitsel(unsigned vece, uint32_t dofs, uint32_t aofs,
+                         uint32_t bofs, uint32_t cofs,
+                         uint32_t oprsz, uint32_t maxsz)
+{
+    static const GVecGen4 g = {
+        .fni8 = tcg_gen_bitsel_i64,
+        .fniv = tcg_gen_bitsel_vec,
+        .fno = gen_helper_gvec_bitsel,
+    };
+
+    tcg_gen_gvec_4(dofs, aofs, bofs, cofs, oprsz, maxsz, &g);
+}
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 543508d545..99cbf29e0b 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -88,6 +88,7 @@ bool tcg_can_emit_vecop_list(const TCGOpcode *list,
         case INDEX_op_dup2_vec:
         case INDEX_op_ld_vec:
         case INDEX_op_st_vec:
+        case INDEX_op_bitsel_vec:
             /* These opcodes are mandatory and should not be listed.  */
             g_assert_not_reached();
         default:
@@ -691,3 +692,28 @@ void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
 {
     do_shifts(vece, r, a, b, INDEX_op_sars_vec, INDEX_op_sarv_vec);
 }
+
+void tcg_gen_bitsel_vec(unsigned vece, TCGv_vec r, TCGv_vec a,
+                        TCGv_vec b, TCGv_vec c)
+{
+    TCGTemp *rt = tcgv_vec_temp(r);
+    TCGTemp *at = tcgv_vec_temp(a);
+    TCGTemp *bt = tcgv_vec_temp(b);
+    TCGTemp *ct = tcgv_vec_temp(c);
+    TCGType type = rt->base_type;
+
+    tcg_debug_assert(at->base_type >= type);
+    tcg_debug_assert(bt->base_type >= type);
+    tcg_debug_assert(ct->base_type >= type);
+
+    if (TCG_TARGET_HAS_bitsel_vec) {
+        vec_gen_4(INDEX_op_bitsel_vec, type, MO_8,
+                  temp_arg(rt), temp_arg(at), temp_arg(bt), temp_arg(ct));
+    } else {
+        TCGv_vec t = tcg_temp_new_vec(type);
+        tcg_gen_and_vec(MO_8, t, a, b);
+        tcg_gen_andc_vec(MO_8, r, c, a);
+        tcg_gen_or_vec(MO_8, r, r, t);
+        tcg_temp_free_vec(t);
+    }
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 24083b8c00..5d947dbcb0 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1646,6 +1646,8 @@ bool tcg_op_supported(TCGOpcode op)
     case INDEX_op_smax_vec:
     case INDEX_op_umax_vec:
         return have_vec && TCG_TARGET_HAS_minmax_vec;
+    case INDEX_op_bitsel_vec:
+        return have_vec && TCG_TARGET_HAS_bitsel_vec;
 
     default:
         tcg_debug_assert(op > INDEX_op_last_generic && op < NB_OPS);
diff --git a/tcg/README b/tcg/README
index cbdfd3b6bc..76057ab59f 100644
--- a/tcg/README
+++ b/tcg/README
@@ -627,6 +627,10 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32.
 
   Compare vectors by element, storing -1 for true and 0 for false.
 
+* bitsel_vec v0, v1, v2, v3
+
+  Bitwise select, v0 = (v2 & v1) | (v3 & ~v1), across the entire vector.
+
 *********
 
 Note 1: Some shortcuts are defined when the last operand is known to be
-- 
2.17.1




* [Qemu-devel] [PULL 04/16] tcg: Add support for vector compare select
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (2 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 03/16] tcg: Add support for vector bitwise select Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 05/16] tcg: Introduce do_op3_nofail for vector expansion Richard Henderson
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Perform a per-element conditional move.  This combination operation is
easier to implement on some host vector units than plain cmp+bitsel.
Omit the usual gvec interface, as this is intended to be used by
target-specific gvec expansion call-backs.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
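Note: the generic cmp+bitsel fallback can be sketched per element in
plain C. This is an illustration under assumptions: cmpsel_lt_s32 is a
hypothetical name, the condition is fixed to signed less-than, and the
lanes are 32 bits wide. The compare yields an all-ones or all-zeros
mask, which then drives a bitwise select between c and d.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * r[i] = (a[i] < b[i]) ? c[i] : d[i], built from a compare that
 * produces -1/0 followed by a bitwise select.  The selected values
 * are treated as raw 32-bit patterns.
 */
static void cmpsel_lt_s32(uint32_t *r, const int32_t *a, const int32_t *b,
                          const uint32_t *c, const uint32_t *d, size_t n)
{
    size_t i;

    assert(r != NULL);
    for (i = 0; i < n; i++) {
        /* cmp step: all ones for true, zero for false */
        uint32_t m = (a[i] < b[i]) ? UINT32_MAX : 0;
        /* bitsel step: c where the mask is set, d elsewhere */
        r[i] = (c[i] & m) | (d[i] & ~m);
    }
}
```

A host with a native conditional-move style vector instruction could
replace both steps with one operation, which is the motivation given in
the commit message.
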
 tcg/aarch64/tcg-target.h |  1 +
 tcg/i386/tcg-target.h    |  1 +
 tcg/tcg-op.h             |  2 ++
 tcg/tcg-opc.h            |  1 +
 tcg/tcg.h                |  1 +
 tcg/tcg-op-vec.c         | 59 ++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c                |  3 ++
 tcg/README               |  7 +++++
 8 files changed, 75 insertions(+)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 52ee66424f..b4a9d36bbc 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -141,6 +141,7 @@ typedef enum {
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
 #define TCG_TARGET_HAS_bitsel_vec       0
+#define TCG_TARGET_HAS_cmpsel_vec       0
 
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP     1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 08a0386433..16a83a7f7b 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -191,6 +191,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
 #define TCG_TARGET_HAS_bitsel_vec       0
+#define TCG_TARGET_HAS_cmpsel_vec       0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 268860ed2f..2d4dd5cd7d 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -1002,6 +1002,8 @@ void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r,
 
 void tcg_gen_bitsel_vec(unsigned vece, TCGv_vec r, TCGv_vec a,
                         TCGv_vec b, TCGv_vec c);
+void tcg_gen_cmpsel_vec(TCGCond cond, unsigned vece, TCGv_vec r,
+                        TCGv_vec a, TCGv_vec b, TCGv_vec c, TCGv_vec d);
 
 void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
 void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index c05b71427c..c7d971fa3d 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -257,6 +257,7 @@ DEF(sarv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 DEF(cmp_vec, 1, 2, 1, IMPLVEC)
 
 DEF(bitsel_vec, 1, 3, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_bitsel_vec))
+DEF(cmpsel_vec, 1, 4, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_cmpsel_vec))
 
 DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 72f9f6c70b..21cd6f1249 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -188,6 +188,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_sat_vec          0
 #define TCG_TARGET_HAS_minmax_vec       0
 #define TCG_TARGET_HAS_bitsel_vec       0
+#define TCG_TARGET_HAS_cmpsel_vec       0
 #else
 #define TCG_TARGET_MAYBE_vec            1
 #endif
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 99cbf29e0b..a888c02df8 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -119,6 +119,11 @@ bool tcg_can_emit_vecop_list(const TCGOpcode *list,
                 continue;
             }
             break;
+        case INDEX_op_cmpsel_vec:
+            if (tcg_can_emit_vec_op(INDEX_op_cmp_vec, type, vece)) {
+                continue;
+            }
+            break;
         default:
             break;
         }
@@ -159,6 +164,20 @@ void vec_gen_4(TCGOpcode opc, TCGType type, unsigned vece,
     op->args[3] = c;
 }
 
+static void vec_gen_6(TCGOpcode opc, TCGType type, unsigned vece, TCGArg r,
+                      TCGArg a, TCGArg b, TCGArg c, TCGArg d, TCGArg e)
+{
+    TCGOp *op = tcg_emit_op(opc);
+    TCGOP_VECL(op) = type - TCG_TYPE_V64;
+    TCGOP_VECE(op) = vece;
+    op->args[0] = r;
+    op->args[1] = a;
+    op->args[2] = b;
+    op->args[3] = c;
+    op->args[4] = d;
+    op->args[5] = e;
+}
+
 static void vec_gen_op2(TCGOpcode opc, unsigned vece, TCGv_vec r, TCGv_vec a)
 {
     TCGTemp *rt = tcgv_vec_temp(r);
@@ -717,3 +736,43 @@ void tcg_gen_bitsel_vec(unsigned vece, TCGv_vec r, TCGv_vec a,
         tcg_temp_free_vec(t);
     }
 }
+
+void tcg_gen_cmpsel_vec(TCGCond cond, unsigned vece, TCGv_vec r,
+                        TCGv_vec a, TCGv_vec b, TCGv_vec c, TCGv_vec d)
+{
+    TCGTemp *rt = tcgv_vec_temp(r);
+    TCGTemp *at = tcgv_vec_temp(a);
+    TCGTemp *bt = tcgv_vec_temp(b);
+    TCGTemp *ct = tcgv_vec_temp(c);
+    TCGTemp *dt = tcgv_vec_temp(d);
+    TCGArg ri = temp_arg(rt);
+    TCGArg ai = temp_arg(at);
+    TCGArg bi = temp_arg(bt);
+    TCGArg ci = temp_arg(ct);
+    TCGArg di = temp_arg(dt);
+    TCGType type = rt->base_type;
+    const TCGOpcode *hold_list;
+    int can;
+
+    tcg_debug_assert(at->base_type >= type);
+    tcg_debug_assert(bt->base_type >= type);
+    tcg_debug_assert(ct->base_type >= type);
+    tcg_debug_assert(dt->base_type >= type);
+
+    tcg_assert_listed_vecop(INDEX_op_cmpsel_vec);
+    hold_list = tcg_swap_vecop_list(NULL);
+    can = tcg_can_emit_vec_op(INDEX_op_cmpsel_vec, type, vece);
+
+    if (can > 0) {
+        vec_gen_6(INDEX_op_cmpsel_vec, type, vece, ri, ai, bi, ci, di, cond);
+    } else if (can < 0) {
+        tcg_expand_vec_op(INDEX_op_cmpsel_vec, type, vece,
+                          ri, ai, bi, ci, di, cond);
+    } else {
+        TCGv_vec t = tcg_temp_new_vec(type);
+        tcg_gen_cmp_vec(cond, vece, t, a, b);
+        tcg_gen_bitsel_vec(vece, r, t, c, d);
+        tcg_temp_free_vec(t);
+    }
+    tcg_swap_vecop_list(hold_list);
+}
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5d947dbcb0..02a2680169 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1648,6 +1648,8 @@ bool tcg_op_supported(TCGOpcode op)
         return have_vec && TCG_TARGET_HAS_minmax_vec;
     case INDEX_op_bitsel_vec:
         return have_vec && TCG_TARGET_HAS_bitsel_vec;
+    case INDEX_op_cmpsel_vec:
+        return have_vec && TCG_TARGET_HAS_cmpsel_vec;
 
     default:
         tcg_debug_assert(op > INDEX_op_last_generic && op < NB_OPS);
@@ -2028,6 +2030,7 @@ static void tcg_dump_ops(TCGContext *s, bool have_prefs)
             case INDEX_op_setcond_i64:
             case INDEX_op_movcond_i64:
             case INDEX_op_cmp_vec:
+            case INDEX_op_cmpsel_vec:
                 if (op->args[k] < ARRAY_SIZE(cond_name)
                     && cond_name[op->args[k]]) {
                     col += qemu_log(",%s", cond_name[op->args[k++]]);
diff --git a/tcg/README b/tcg/README
index 76057ab59f..21fcdf737f 100644
--- a/tcg/README
+++ b/tcg/README
@@ -631,6 +631,13 @@ E.g. VECL=1 -> 64 << 1 -> v128, and VECE=2 -> 1 << 2 -> i32.
 
   Bitwise select, v0 = (v2 & v1) | (v3 & ~v1), across the entire vector.
 
+* cmpsel_vec v0, c1, c2, v3, v4, cond
+
+  Select elements based on comparison results:
+  for (i = 0; i < n; ++i) {
+    v0[i] = (c1[i] cond c2[i]) ? v3[i] : v4[i].
+  }
+
 *********
 
 Note 1: Some shortcuts are defined when the last operand is known to be
-- 
2.17.1




* [Qemu-devel] [PULL 05/16] tcg: Introduce do_op3_nofail for vector expansion
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (3 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 04/16] tcg: Add support for vector compare select Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 06/16] tcg: Expand vector minmax using cmp+cmpsel Richard Henderson
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

This makes do_op3 match do_op2 in allowing for failure,
and thus for fallback expansions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
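Note: the "may fail" versus "must not fail" split can be shown as a
small standalone pattern. Everything here is hypothetical (EmitPath,
try_op3, op3_nofail); it only models the three-way answer used in the
patch, where a positive value means emit directly, a negative value
means the backend expands the op, and zero means unsupported.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of which path was taken. */
typedef enum { PATH_NONE, PATH_DIRECT, PATH_EXPAND } EmitPath;

/* Returns false when the operation is not supported at all. */
static bool try_op3(int can, EmitPath *path)
{
    if (can > 0) {
        *path = PATH_DIRECT;
    } else if (can < 0) {
        *path = PATH_EXPAND;
    } else {
        *path = PATH_NONE;
        return false;
    }
    return true;
}

/* Wrapper for callers that have no fallback: failure is a bug. */
static void op3_nofail(int can, EmitPath *path)
{
    bool ok = try_op3(can, path);
    assert(ok);
    (void)ok;   /* avoid an unused warning when asserts are disabled */
}
```

Callers with a generic fallback use the bool-returning form and branch
on the result; callers without one keep the assertion.
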
 tcg/tcg-op-vec.c | 45 +++++++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index a888c02df8..004a34935b 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -562,7 +562,7 @@ void tcg_gen_cmp_vec(TCGCond cond, unsigned vece,
     }
 }
 
-static void do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
+static bool do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
                    TCGv_vec b, TCGOpcode opc)
 {
     TCGTemp *rt = tcgv_vec_temp(r);
@@ -580,82 +580,91 @@ static void do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
     can = tcg_can_emit_vec_op(opc, type, vece);
     if (can > 0) {
         vec_gen_3(opc, type, vece, ri, ai, bi);
-    } else {
+    } else if (can < 0) {
         const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
-        tcg_debug_assert(can < 0);
         tcg_expand_vec_op(opc, type, vece, ri, ai, bi);
         tcg_swap_vecop_list(hold_list);
+    } else {
+        return false;
     }
+    return true;
+}
+
+static void do_op3_nofail(unsigned vece, TCGv_vec r, TCGv_vec a,
+                          TCGv_vec b, TCGOpcode opc)
+{
+    bool ok = do_op3(vece, r, a, b, opc);
+    tcg_debug_assert(ok);
 }
 
 void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_add_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_add_vec);
 }
 
 void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_sub_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_sub_vec);
 }
 
 void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_mul_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_mul_vec);
 }
 
 void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_ssadd_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_ssadd_vec);
 }
 
 void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_usadd_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_usadd_vec);
 }
 
 void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_sssub_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_sssub_vec);
 }
 
 void tcg_gen_ussub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_ussub_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_ussub_vec);
 }
 
 void tcg_gen_smin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_smin_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_smin_vec);
 }
 
 void tcg_gen_umin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_umin_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_umin_vec);
 }
 
 void tcg_gen_smax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_smax_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_smax_vec);
 }
 
 void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_umax_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_umax_vec);
 }
 
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_shlv_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_shlv_vec);
 }
 
 void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_shrv_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_shrv_vec);
 }
 
 void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3(vece, r, a, b, INDEX_op_sarv_vec);
+    do_op3_nofail(vece, r, a, b, INDEX_op_sarv_vec);
 }
 
 static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
@@ -691,7 +700,7 @@ static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
         } else {
             tcg_gen_dup_i32_vec(vece, vec_s, s);
         }
-        do_op3(vece, r, a, vec_s, opc_v);
+        do_op3_nofail(vece, r, a, vec_s, opc_v);
         tcg_temp_free_vec(vec_s);
     }
     tcg_swap_vecop_list(hold_list);
-- 
2.17.1




* [Qemu-devel] [PULL 06/16] tcg: Expand vector minmax using cmp+cmpsel
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (4 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 05/16] tcg: Introduce do_op3_nofail for vector expansion Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 07/16] tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative Richard Henderson
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Provide a generic fallback for the min/max operations: when the
backend cannot emit the opcode directly, expand it as cmp+cmpsel.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op-vec.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)
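
As a rough scalar model of this fallback (not the TCG code itself), assume
cmpsel(cond, a, b, c, d) yields c where the comparison of a and b holds and
d otherwise.  Each min/max is then cmpsel with the compare operands reused
as the select operands.  The names below (model_cond, model_cmpsel, ...)
are hypothetical, and one 32-bit lane stands in for a vector element.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TCG_COND_LT/LTU/GT/GTU. */
enum model_cond { MODEL_LT, MODEL_LTU, MODEL_GT, MODEL_GTU };

/* Evaluate one comparison on a single 32-bit lane. */
static inline bool model_cmp(enum model_cond cond, uint32_t a, uint32_t b)
{
    switch (cond) {
    case MODEL_LT:  return (int32_t)a < (int32_t)b;
    case MODEL_LTU: return a < b;
    case MODEL_GT:  return (int32_t)a > (int32_t)b;
    case MODEL_GTU: return a > b;
    }
    return false;
}

/* r = (a cond b) ? c : d */
static inline uint32_t model_cmpsel(enum model_cond cond, uint32_t a,
                                    uint32_t b, uint32_t c, uint32_t d)
{
    return model_cmp(cond, a, b) ? c : d;
}

/* The four fallbacks, using the same conditions as do_minmax's callers. */
static inline uint32_t model_smin(uint32_t a, uint32_t b)
{
    return model_cmpsel(MODEL_LT, a, b, a, b);
}

static inline uint32_t model_umin(uint32_t a, uint32_t b)
{
    return model_cmpsel(MODEL_LTU, a, b, a, b);
}

static inline uint32_t model_smax(uint32_t a, uint32_t b)
{
    return model_cmpsel(MODEL_GT, a, b, a, b);
}

static inline uint32_t model_umax(uint32_t a, uint32_t b)
{
    return model_cmpsel(MODEL_GTU, a, b, a, b);
}
```

With a = 0xffffffff and b = 1, the signed variants treat a as -1 while the
unsigned variants treat it as the largest value, which is why each opcode
needs its own condition.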

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 004a34935b..501d9630a2 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -120,6 +120,10 @@ bool tcg_can_emit_vecop_list(const TCGOpcode *list,
             }
             break;
         case INDEX_op_cmpsel_vec:
+        case INDEX_op_smin_vec:
+        case INDEX_op_smax_vec:
+        case INDEX_op_umin_vec:
+        case INDEX_op_umax_vec:
             if (tcg_can_emit_vec_op(INDEX_op_cmp_vec, type, vece)) {
                 continue;
             }
@@ -632,24 +636,32 @@ void tcg_gen_ussub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
     do_op3_nofail(vece, r, a, b, INDEX_op_ussub_vec);
 }
 
+static void do_minmax(unsigned vece, TCGv_vec r, TCGv_vec a,
+                      TCGv_vec b, TCGOpcode opc, TCGCond cond)
+{
+    if (!do_op3(vece, r, a, b, opc)) {
+        tcg_gen_cmpsel_vec(cond, vece, r, a, b, a, b);
+    }
+}
+
 void tcg_gen_smin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3_nofail(vece, r, a, b, INDEX_op_smin_vec);
+    do_minmax(vece, r, a, b, INDEX_op_smin_vec, TCG_COND_LT);
 }
 
 void tcg_gen_umin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3_nofail(vece, r, a, b, INDEX_op_umin_vec);
+    do_minmax(vece, r, a, b, INDEX_op_umin_vec, TCG_COND_LTU);
 }
 
 void tcg_gen_smax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3_nofail(vece, r, a, b, INDEX_op_smax_vec);
+    do_minmax(vece, r, a, b, INDEX_op_smax_vec, TCG_COND_GT);
 }
 
 void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-    do_op3_nofail(vece, r, a, b, INDEX_op_umax_vec);
+    do_minmax(vece, r, a, b, INDEX_op_umax_vec, TCG_COND_GTU);
 }
 
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
-- 
2.17.1




* [Qemu-devel] [PULL 07/16] tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (5 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 06/16] tcg: Expand vector minmax using cmp+cmpsel Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 08/16] tcg/i386: Support vector comparison select value Richard Henderson
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

If INDEX_op_foo is always expanded by tcg_expand_vec_op, then
there may be no reasonable set of constraints to return from
tcg_target_op_def for that opcode.

Let TCG_TARGET_HAS_foo be specified as -1 in that case.  Thus a
boolean test for TCG_TARGET_HAS_foo is true, but we will not
assert within process_op_defs when no constraints are specified.

Compare this with tcg_can_emit_vec_op, which already uses this
tri-state indication.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-opc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
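
To illustrate the tri-state, here is a small sketch comparing the old and
new forms of the macro.  MODEL_OPF_NOT_PRESENT is a placeholder value chosen
for the example, and IMPL_OLD/IMPL_NEW are hypothetical names for the two
versions of IMPL; __builtin_constant_p requires GCC or Clang.

```c
/* Placeholder flag value for illustration only. */
#define MODEL_OPF_NOT_PRESENT 0x20

/* Before: only a constant 0 marks the opcode as not present. */
#define IMPL_OLD(X) \
    (__builtin_constant_p(X) && !(X) ? MODEL_OPF_NOT_PRESENT : 0)

/* After: a constant 0 or a negative constant marks it as not present. */
#define IMPL_NEW(X) \
    (__builtin_constant_p(X) && (X) <= 0 ? MODEL_OPF_NOT_PRESENT : 0)
```

With the new form, a target that defines TCG_TARGET_HAS_foo as -1 still
passes a plain boolean test on the macro, yet the opcode is flagged as not
present in the opcode table.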

diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index c7d971fa3d..242d608e6d 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -35,7 +35,7 @@ DEF(call, 0, 0, 3, TCG_OPF_CALL_CLOBBER | TCG_OPF_NOT_PRESENT)
 
 DEF(br, 0, 0, 1, TCG_OPF_BB_END)
 
-#define IMPL(X) (__builtin_constant_p(X) && !(X) ? TCG_OPF_NOT_PRESENT : 0)
+#define IMPL(X) (__builtin_constant_p(X) && (X) <= 0 ? TCG_OPF_NOT_PRESENT : 0)
 #if TCG_TARGET_REG_BITS == 32
 # define IMPL64  TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT
 #else
-- 
2.17.1




* [Qemu-devel] [PULL 08/16] tcg/i386: Support vector comparison select value
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (6 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 07/16] tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-30 11:26   ` Peter Maydell
  2019-05-22 22:28 ` [Qemu-devel] [PULL 09/16] tcg/i386: Remove expansion for missing minmax Richard Henderson
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

We already had backend support for this feature.  Expand the new
cmpsel opcode using vpblendvb.  The combination allows us to avoid
an extra NOT for some comparison codes.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.h     |  2 +-
 tcg/i386/tcg-target.inc.c | 39 +++++++++++++++++++++++++++++++++++----
 2 files changed, 36 insertions(+), 5 deletions(-)
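
A minimal scalar sketch of why the NOT can be avoided, assuming only a
"greater than" compare exists natively and that the compare result is an
all-ones or all-zeros mask.  All names here are hypothetical and a single
64-bit lane stands in for the vector.

```c
#include <stdint.h>

/* Native compare in this model: all-ones if a > b, else zero. */
static inline uint64_t model_cmpgt(int64_t a, int64_t b)
{
    return a > b ? UINT64_MAX : 0;
}

/* Blend: take on_true where the mask is set, on_false elsewhere. */
static inline uint64_t model_blend(uint64_t on_false, uint64_t on_true,
                                   uint64_t mask)
{
    return (on_true & mask) | (on_false & ~mask);
}

/*
 * r = (c1 <= c2) ? v3 : v4, computed without inverting the mask:
 * evaluate the opposite condition (GT) and swap the select operands.
 */
static inline uint64_t model_cmpsel_le(int64_t c1, int64_t c2,
                                       uint64_t v3, uint64_t v4)
{
    uint64_t mask = model_cmpgt(c1, c2);   /* inverted sense of LE */
    return model_blend(v3, v4, mask);      /* v4 where GT holds, else v3 */
}
```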

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 16a83a7f7b..928e8b87bb 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -191,7 +191,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
 #define TCG_TARGET_HAS_bitsel_vec       0
-#define TCG_TARGET_HAS_cmpsel_vec       0
+#define TCG_TARGET_HAS_cmpsel_vec       -1
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index b3601446cd..ffcafb1e14 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3246,6 +3246,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_andc_vec:
         return 1;
     case INDEX_op_cmp_vec:
+    case INDEX_op_cmpsel_vec:
         return -1;
 
     case INDEX_op_shli_vec:
@@ -3464,8 +3465,8 @@ static void expand_vec_mul(TCGType type, unsigned vece,
     }
 }
 
-static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
-                           TCGv_vec v1, TCGv_vec v2, TCGCond cond)
+static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
+                                 TCGv_vec v1, TCGv_vec v2, TCGCond cond)
 {
     enum {
         NEED_SWAP = 1,
@@ -3522,11 +3523,34 @@ static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
             tcg_temp_free_vec(t2);
         }
     }
-    if (fixup & NEED_INV) {
+    return fixup & NEED_INV;
+}
+
+static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
+                           TCGv_vec v1, TCGv_vec v2, TCGCond cond)
+{
+    if (expand_vec_cmp_noinv(type, vece, v0, v1, v2, cond)) {
         tcg_gen_not_vec(vece, v0, v0);
     }
 }
 
+static void expand_vec_cmpsel(TCGType type, unsigned vece, TCGv_vec v0,
+                              TCGv_vec c1, TCGv_vec c2,
+                              TCGv_vec v3, TCGv_vec v4, TCGCond cond)
+{
+    TCGv_vec t = tcg_temp_new_vec(type);
+
+    if (expand_vec_cmp_noinv(type, vece, t, c1, c2, cond)) {
+        /* Invert the sense of the compare by swapping arguments.  */
+        TCGv_vec x;
+        x = v3, v3 = v4, v4 = x;
+    }
+    vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, vece,
+              tcgv_vec_arg(v0), tcgv_vec_arg(v4),
+              tcgv_vec_arg(v3), tcgv_vec_arg(t));
+    tcg_temp_free_vec(t);
+}
+
 static void expand_vec_minmax(TCGType type, unsigned vece,
                               TCGCond cond, bool min,
                               TCGv_vec v0, TCGv_vec v1, TCGv_vec v2)
@@ -3551,7 +3575,7 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
 {
     va_list va;
     TCGArg a2;
-    TCGv_vec v0, v1, v2;
+    TCGv_vec v0, v1, v2, v3, v4;
 
     va_start(va, a0);
     v0 = temp_tcgv_vec(arg_temp(a0));
@@ -3578,6 +3602,13 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
         expand_vec_cmp(type, vece, v0, v1, v2, va_arg(va, TCGArg));
         break;
 
+    case INDEX_op_cmpsel_vec:
+        v2 = temp_tcgv_vec(arg_temp(a2));
+        v3 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+        v4 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+        expand_vec_cmpsel(type, vece, v0, v1, v2, v3, v4, va_arg(va, TCGArg));
+        break;
+
     case INDEX_op_smin_vec:
         v2 = temp_tcgv_vec(arg_temp(a2));
         expand_vec_minmax(type, vece, TCG_COND_GT, true, v0, v1, v2);
-- 
2.17.1




* [Qemu-devel] [PULL 09/16] tcg/i386: Remove expansion for missing minmax
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (7 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 08/16] tcg/i386: Support vector comparison select value Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 10/16] tcg/i386: Use umin/umax in expanding unsigned compare Richard Henderson
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

This is now handled by code within tcg-op-vec.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 37 -------------------------------------
 1 file changed, 37 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index ffcafb1e14..569a2c2120 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3297,7 +3297,6 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_smax_vec:
     case INDEX_op_umin_vec:
     case INDEX_op_umax_vec:
-        return vece <= MO_32 ? 1 : -1;
     case INDEX_op_abs_vec:
         return vece <= MO_32;
 
@@ -3551,25 +3550,6 @@ static void expand_vec_cmpsel(TCGType type, unsigned vece, TCGv_vec v0,
     tcg_temp_free_vec(t);
 }
 
-static void expand_vec_minmax(TCGType type, unsigned vece,
-                              TCGCond cond, bool min,
-                              TCGv_vec v0, TCGv_vec v1, TCGv_vec v2)
-{
-    TCGv_vec t1 = tcg_temp_new_vec(type);
-
-    tcg_debug_assert(vece == MO_64);
-
-    tcg_gen_cmp_vec(cond, vece, t1, v1, v2);
-    if (min) {
-        TCGv_vec t2;
-        t2 = v1, v1 = v2, v2 = t2;
-    }
-    vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, vece,
-              tcgv_vec_arg(v0), tcgv_vec_arg(v1),
-              tcgv_vec_arg(v2), tcgv_vec_arg(t1));
-    tcg_temp_free_vec(t1);
-}
-
 void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
                        TCGArg a0, ...)
 {
@@ -3609,23 +3589,6 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
         expand_vec_cmpsel(type, vece, v0, v1, v2, v3, v4, va_arg(va, TCGArg));
         break;
 
-    case INDEX_op_smin_vec:
-        v2 = temp_tcgv_vec(arg_temp(a2));
-        expand_vec_minmax(type, vece, TCG_COND_GT, true, v0, v1, v2);
-        break;
-    case INDEX_op_smax_vec:
-        v2 = temp_tcgv_vec(arg_temp(a2));
-        expand_vec_minmax(type, vece, TCG_COND_GT, false, v0, v1, v2);
-        break;
-    case INDEX_op_umin_vec:
-        v2 = temp_tcgv_vec(arg_temp(a2));
-        expand_vec_minmax(type, vece, TCG_COND_GTU, true, v0, v1, v2);
-        break;
-    case INDEX_op_umax_vec:
-        v2 = temp_tcgv_vec(arg_temp(a2));
-        expand_vec_minmax(type, vece, TCG_COND_GTU, false, v0, v1, v2);
-        break;
-
     default:
         break;
     }
-- 
2.17.1




* [Qemu-devel] [PULL 10/16] tcg/i386: Use umin/umax in expanding unsigned compare
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (8 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 09/16] tcg/i386: Remove expansion for missing minmax Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 11/16] tcg/aarch64: Support vector bitwise select value Richard Henderson
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

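Using umin(a, b) == a as an expansion for TCG_COND_LEU is a
better alternative to (a - INT_MIN) <= (b - INT_MIN).  Likewise use
umax(a, b) == a for TCG_COND_GEU.  Keep the bias expansion for
elements wider than 32 bits.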

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 80 +++++++++++++++++++++++++++++----------
 1 file changed, 61 insertions(+), 19 deletions(-)
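
The two expansions can be checked against each other on 8-bit elements.
The sketch below is only a scalar model with hypothetical helper names; the
bias form subtracts 0x80 (the INT_MIN of an 8-bit element) and then
compares as signed.

```c
#include <stdbool.h>
#include <stdint.h>

static inline uint8_t model_umin8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

static inline uint8_t model_umax8(uint8_t a, uint8_t b)
{
    return a > b ? a : b;
}

/* LEU as umin followed by an equality compare. */
static inline bool leu_via_umin(uint8_t a, uint8_t b)
{
    return model_umin8(a, b) == a;
}

/* GEU as umax followed by an equality compare. */
static inline bool geu_via_umax(uint8_t a, uint8_t b)
{
    return model_umax8(a, b) == a;
}

/* LEU via the bias trick: shift both operands, then compare signed. */
static inline bool leu_via_bias(uint8_t a, uint8_t b)
{
    return (int8_t)(a - 0x80) <= (int8_t)(b - 0x80);
}

/* Count disagreements with the plain unsigned operators over all pairs. */
static inline int model_count_mismatches(void)
{
    int bad = 0;

    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            bad += leu_via_umin(a, b) != (a <= b);
            bad += leu_via_bias(a, b) != (a <= b);
            bad += geu_via_umax(a, b) != (a >= b);
        }
    }
    return bad;
}
```

GTU and LTU follow by inverting the LEU and GEU results, which matches the
NEED_UMIN | NEED_INV and NEED_UMAX | NEED_INV cases in the diff.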

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 569a2c2120..6ec5e60448 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3468,28 +3468,61 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
                                  TCGv_vec v1, TCGv_vec v2, TCGCond cond)
 {
     enum {
-        NEED_SWAP = 1,
-        NEED_INV  = 2,
-        NEED_BIAS = 4
-    };
-    static const uint8_t fixups[16] = {
-        [0 ... 15] = -1,
-        [TCG_COND_EQ] = 0,
-        [TCG_COND_NE] = NEED_INV,
-        [TCG_COND_GT] = 0,
-        [TCG_COND_LT] = NEED_SWAP,
-        [TCG_COND_LE] = NEED_INV,
-        [TCG_COND_GE] = NEED_SWAP | NEED_INV,
-        [TCG_COND_GTU] = NEED_BIAS,
-        [TCG_COND_LTU] = NEED_BIAS | NEED_SWAP,
-        [TCG_COND_LEU] = NEED_BIAS | NEED_INV,
-        [TCG_COND_GEU] = NEED_BIAS | NEED_SWAP | NEED_INV,
+        NEED_INV  = 1,
+        NEED_SWAP = 2,
+        NEED_BIAS = 4,
+        NEED_UMIN = 8,
+        NEED_UMAX = 16,
     };
     TCGv_vec t1, t2;
     uint8_t fixup;
 
-    fixup = fixups[cond & 15];
-    tcg_debug_assert(fixup != 0xff);
+    switch (cond) {
+    case TCG_COND_EQ:
+    case TCG_COND_GT:
+        fixup = 0;
+        break;
+    case TCG_COND_NE:
+    case TCG_COND_LE:
+        fixup = NEED_INV;
+        break;
+    case TCG_COND_LT:
+        fixup = NEED_SWAP;
+        break;
+    case TCG_COND_GE:
+        fixup = NEED_SWAP | NEED_INV;
+        break;
+    case TCG_COND_LEU:
+        if (vece <= MO_32) {
+            fixup = NEED_UMIN;
+        } else {
+            fixup = NEED_BIAS | NEED_INV;
+        }
+        break;
+    case TCG_COND_GTU:
+        if (vece <= MO_32) {
+            fixup = NEED_UMIN | NEED_INV;
+        } else {
+            fixup = NEED_BIAS;
+        }
+        break;
+    case TCG_COND_GEU:
+        if (vece <= MO_32) {
+            fixup = NEED_UMAX;
+        } else {
+            fixup = NEED_BIAS | NEED_SWAP | NEED_INV;
+        }
+        break;
+    case TCG_COND_LTU:
+        if (vece <= MO_32) {
+            fixup = NEED_UMAX | NEED_INV;
+        } else {
+            fixup = NEED_BIAS | NEED_SWAP;
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
 
     if (fixup & NEED_INV) {
         cond = tcg_invert_cond(cond);
@@ -3500,7 +3533,16 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
     }
 
     t1 = t2 = NULL;
-    if (fixup & NEED_BIAS) {
+    if (fixup & (NEED_UMIN | NEED_UMAX)) {
+        t1 = tcg_temp_new_vec(type);
+        if (fixup & NEED_UMIN) {
+            tcg_gen_umin_vec(vece, t1, v1, v2);
+        } else {
+            tcg_gen_umax_vec(vece, t1, v1, v2);
+        }
+        v2 = t1;
+        cond = TCG_COND_EQ;
+    } else if (fixup & NEED_BIAS) {
         t1 = tcg_temp_new_vec(type);
         t2 = tcg_temp_new_vec(type);
         tcg_gen_dupi_vec(vece, t2, 1ull << ((8 << vece) - 1));
-- 
2.17.1




* [Qemu-devel] [PULL 11/16] tcg/aarch64: Support vector bitwise select value
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (9 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 10/16] tcg/i386: Use umin/umax in expanding unsigned compare Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 12/16] tcg/aarch64: Split up is_fimm Richard Henderson
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The instruction set has 3 insns (BSL, BIT, BIF) that perform the same
operation, only varying in which operand must overlap the destination.
We can represent the operation without overlap and choose based on
the operands seen.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.h     |  2 +-
 tcg/aarch64/tcg-target.inc.c | 24 +++++++++++++++++++++++-
 2 files changed, 24 insertions(+), 2 deletions(-)
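
A scalar sketch of the three insns, assuming the usual AdvSIMD semantics
and modelling the overwritten destination as a pointer.  The function names
are hypothetical; the reference result is (b & mask) | (c & ~mask).

```c
#include <stdint.h>

/* Reference bitwise select: take b where mask is set, c elsewhere. */
static inline uint64_t model_bitsel(uint64_t mask, uint64_t b, uint64_t c)
{
    return (b & mask) | (c & ~mask);
}

/* BSL: destination holds the mask on entry. */
static inline void model_bsl(uint64_t *rd, uint64_t rn, uint64_t rm)
{
    *rd = (rn & *rd) | (rm & ~*rd);
}

/* BIT: insert rn into the destination where rm is set. */
static inline void model_bit(uint64_t *rd, uint64_t rn, uint64_t rm)
{
    *rd = (rn & rm) | (*rd & ~rm);
}

/* BIF: insert rn into the destination where rm is clear. */
static inline void model_bif(uint64_t *rd, uint64_t rn, uint64_t rm)
{
    *rd = (*rd & rm) | (rn & ~rm);
}
```

Which insn applies depends on what the destination register already holds:
the "false" input selects BIT, the "true" input selects BIF, and otherwise
the mask is moved into the destination and BSL is used.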

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index b4a9d36bbc..ca214f6909 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -140,7 +140,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
-#define TCG_TARGET_HAS_bitsel_vec       0
+#define TCG_TARGET_HAS_bitsel_vec       1
 #define TCG_TARGET_HAS_cmpsel_vec       0
 
 #define TCG_TARGET_DEFAULT_MO (0)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 40bf35079a..e99149cda7 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -523,6 +523,9 @@ typedef enum {
     I3616_ADD       = 0x0e208400,
     I3616_AND       = 0x0e201c00,
     I3616_BIC       = 0x0e601c00,
+    I3616_BIF       = 0x2ee01c00,
+    I3616_BIT       = 0x2ea01c00,
+    I3616_BSL       = 0x2e601c00,
     I3616_EOR       = 0x2e201c00,
     I3616_MUL       = 0x0e209c00,
     I3616_ORR       = 0x0ea01c00,
@@ -2181,7 +2184,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 
     TCGType type = vecl + TCG_TYPE_V64;
     unsigned is_q = vecl;
-    TCGArg a0, a1, a2;
+    TCGArg a0, a1, a2, a3;
 
     a0 = args[0];
     a1 = args[1];
@@ -2304,6 +2307,20 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_bitsel_vec:
+        a3 = args[3];
+        if (a0 == a3) {
+            tcg_out_insn(s, 3616, BIT, is_q, 0, a0, a2, a1);
+        } else if (a0 == a2) {
+            tcg_out_insn(s, 3616, BIF, is_q, 0, a0, a3, a1);
+        } else {
+            if (a0 != a1) {
+                tcg_out_mov(s, type, a0, a1);
+            }
+            tcg_out_insn(s, 3616, BSL, is_q, 0, a0, a2, a3);
+        }
+        break;
+
     case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
@@ -2334,6 +2351,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_usadd_vec:
     case INDEX_op_ussub_vec:
     case INDEX_op_shlv_vec:
+    case INDEX_op_bitsel_vec:
         return 1;
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
@@ -2408,6 +2426,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "r", "r", "rA", "rZ", "rZ" } };
     static const TCGTargetOpDef add2
         = { .args_ct_str = { "r", "r", "rZ", "rZ", "rA", "rMZ" } };
+    static const TCGTargetOpDef w_w_w_w
+        = { .args_ct_str = { "w", "w", "w", "w" } };
 
     switch (op) {
     case INDEX_op_goto_ptr:
@@ -2580,6 +2600,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &w_wr;
     case INDEX_op_cmp_vec:
         return &w_w_wZ;
+    case INDEX_op_bitsel_vec:
+        return &w_w_w_w;
 
     default:
         return NULL;
-- 
2.17.1




* [Qemu-devel] [PULL 12/16] tcg/aarch64: Split up is_fimm
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (10 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 11/16] tcg/aarch64: Support vector bitwise select value Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 13/16] tcg/aarch64: Use MVNI in tcg_out_dupi_vec Richard Henderson
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

There are several sub-classes of vector immediate, and only MOVI
can use them all.  This will enable usage of MVNI and ORRI, which
use progressively fewer sub-classes.

This patch adds no new functionality, merely splits the function
and moves part of the logic into tcg_out_dupi_vec.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 205 ++++++++++++++++++++---------------
 1 file changed, 120 insertions(+), 85 deletions(-)
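
As an example of one sub-class that only MOVI accepts (op=1, cmode=0xe in
the diff), here is a standalone sketch of the "every byte is 0x00 or 0xff"
test.  The helper name is hypothetical; the logic follows the loop that
this patch moves into tcg_out_dupi_vec.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * If every byte of v64 is 0x00 or 0xff, store one bit per byte in *imm8
 * (bit i set means byte i is 0xff) and return true.
 */
static inline bool model_bytemask_imm8(uint64_t v64, int *imm8)
{
    int bits = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t byte = v64 >> (i * 8);

        if (byte == 0xff) {
            bits |= 1 << i;
        } else if (byte != 0) {
            return false;
        }
    }
    *imm8 = bits;
    return true;
}
```

For 0xff0000ff00ff0000, bytes 2, 4 and 7 are 0xff, so the encoded imm8 is
0x94.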

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index e99149cda7..1422dfebe2 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -190,103 +190,86 @@ static inline bool is_limm(uint64_t val)
     return (val & (val - 1)) == 0;
 }
 
-/* Match a constant that is valid for vectors.  */
-static bool is_fimm(uint64_t v64, int *op, int *cmode, int *imm8)
+/* Return true if v16 is a valid 16-bit shifted immediate.  */
+static bool is_shimm16(uint16_t v16, int *cmode, int *imm8)
 {
-    int i;
-
-    *op = 0;
-    /* Match replication across 8 bits.  */
-    if (v64 == dup_const(MO_8, v64)) {
-        *cmode = 0xe;
-        *imm8 = v64 & 0xff;
+    if (v16 == (v16 & 0xff)) {
+        *cmode = 0x8;
+        *imm8 = v16 & 0xff;
+        return true;
+    } else if (v16 == (v16 & 0xff00)) {
+        *cmode = 0xa;
+        *imm8 = v16 >> 8;
         return true;
     }
-    /* Match replication across 16 bits.  */
-    if (v64 == dup_const(MO_16, v64)) {
-        uint16_t v16 = v64;
+    return false;
+}
 
-        if (v16 == (v16 & 0xff)) {
-            *cmode = 0x8;
-            *imm8 = v16 & 0xff;
-            return true;
-        } else if (v16 == (v16 & 0xff00)) {
-            *cmode = 0xa;
-            *imm8 = v16 >> 8;
-            return true;
-        }
+/* Return true if v32 is a valid 32-bit shifted immediate.  */
+static bool is_shimm32(uint32_t v32, int *cmode, int *imm8)
+{
+    if (v32 == (v32 & 0xff)) {
+        *cmode = 0x0;
+        *imm8 = v32 & 0xff;
+        return true;
+    } else if (v32 == (v32 & 0xff00)) {
+        *cmode = 0x2;
+        *imm8 = (v32 >> 8) & 0xff;
+        return true;
+    } else if (v32 == (v32 & 0xff0000)) {
+        *cmode = 0x4;
+        *imm8 = (v32 >> 16) & 0xff;
+        return true;
+    } else if (v32 == (v32 & 0xff000000)) {
+        *cmode = 0x6;
+        *imm8 = v32 >> 24;
+        return true;
     }
-    /* Match replication across 32 bits.  */
-    if (v64 == dup_const(MO_32, v64)) {
-        uint32_t v32 = v64;
+    return false;
+}
 
-        if (v32 == (v32 & 0xff)) {
-            *cmode = 0x0;
-            *imm8 = v32 & 0xff;
-            return true;
-        } else if (v32 == (v32 & 0xff00)) {
-            *cmode = 0x2;
-            *imm8 = (v32 >> 8) & 0xff;
-            return true;
-        } else if (v32 == (v32 & 0xff0000)) {
-            *cmode = 0x4;
-            *imm8 = (v32 >> 16) & 0xff;
-            return true;
-        } else if (v32 == (v32 & 0xff000000)) {
-            *cmode = 0x6;
-            *imm8 = v32 >> 24;
-            return true;
-        } else if ((v32 & 0xffff00ff) == 0xff) {
-            *cmode = 0xc;
-            *imm8 = (v32 >> 8) & 0xff;
-            return true;
-        } else if ((v32 & 0xff00ffff) == 0xffff) {
-            *cmode = 0xd;
-            *imm8 = (v32 >> 16) & 0xff;
-            return true;
-        }
-        /* Match forms of a float32.  */
-        if (extract32(v32, 0, 19) == 0
-            && (extract32(v32, 25, 6) == 0x20
-                || extract32(v32, 25, 6) == 0x1f)) {
-            *cmode = 0xf;
-            *imm8 = (extract32(v32, 31, 1) << 7)
-                  | (extract32(v32, 25, 1) << 6)
-                  | extract32(v32, 19, 6);
-            return true;
-        }
+/* Return true if v32 is a valid 32-bit shifting ones immediate.  */
+static bool is_soimm32(uint32_t v32, int *cmode, int *imm8)
+{
+    if ((v32 & 0xffff00ff) == 0xff) {
+        *cmode = 0xc;
+        *imm8 = (v32 >> 8) & 0xff;
+        return true;
+    } else if ((v32 & 0xff00ffff) == 0xffff) {
+        *cmode = 0xd;
+        *imm8 = (v32 >> 16) & 0xff;
+        return true;
     }
-    /* Match forms of a float64.  */
+    return false;
+}
+
+/* Return true if v32 is a valid float32 immediate.  */
+static bool is_fimm32(uint32_t v32, int *cmode, int *imm8)
+{
+    if (extract32(v32, 0, 19) == 0
+        && (extract32(v32, 25, 6) == 0x20
+            || extract32(v32, 25, 6) == 0x1f)) {
+        *cmode = 0xf;
+        *imm8 = (extract32(v32, 31, 1) << 7)
+              | (extract32(v32, 25, 1) << 6)
+              | extract32(v32, 19, 6);
+        return true;
+    }
+    return false;
+}
+
+/* Return true if v64 is a valid float64 immediate.  */
+static bool is_fimm64(uint64_t v64, int *cmode, int *imm8)
+{
     if (extract64(v64, 0, 48) == 0
         && (extract64(v64, 54, 9) == 0x100
             || extract64(v64, 54, 9) == 0x0ff)) {
         *cmode = 0xf;
-        *op = 1;
         *imm8 = (extract64(v64, 63, 1) << 7)
               | (extract64(v64, 54, 1) << 6)
               | extract64(v64, 48, 6);
         return true;
     }
-    /* Match bytes of 0x00 and 0xff.  */
-    for (i = 0; i < 64; i += 8) {
-        uint64_t byte = extract64(v64, i, 8);
-        if (byte != 0 && byte != 0xff) {
-            break;
-        }
-    }
-    if (i == 64) {
-        *cmode = 0xe;
-        *op = 1;
-        *imm8 = (extract64(v64, 0, 1) << 0)
-              | (extract64(v64, 8, 1) << 1)
-              | (extract64(v64, 16, 1) << 2)
-              | (extract64(v64, 24, 1) << 3)
-              | (extract64(v64, 32, 1) << 4)
-              | (extract64(v64, 40, 1) << 5)
-              | (extract64(v64, 48, 1) << 6)
-              | (extract64(v64, 56, 1) << 7);
-        return true;
-    }
     return false;
 }
 
@@ -817,11 +800,63 @@ static void tcg_out_logicali(TCGContext *s, AArch64Insn insn, TCGType ext,
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
                              TCGReg rd, tcg_target_long v64)
 {
-    int op, cmode, imm8;
+    bool q = type == TCG_TYPE_V128;
+    int cmode, imm8, i;
 
-    if (is_fimm(v64, &op, &cmode, &imm8)) {
-        tcg_out_insn(s, 3606, MOVI, type == TCG_TYPE_V128, rd, op, cmode, imm8);
-    } else if (type == TCG_TYPE_V128) {
+    /* Test all bytes equal first.  */
+    if (v64 == dup_const(MO_8, v64)) {
+        imm8 = (uint8_t)v64;
+        tcg_out_insn(s, 3606, MOVI, q, rd, 0, 0xe, imm8);
+        return;
+    }
+
+    /*
+     * Test all bytes 0x00 or 0xff second.  This can match cases that
+     * might otherwise take 2 or 3 insns for MO_16 or MO_32 below.
+     */
+    for (i = imm8 = 0; i < 8; i++) {
+        uint8_t byte = v64 >> (i * 8);
+        if (byte == 0xff) {
+            imm8 |= 1 << i;
+        } else if (byte != 0) {
+            goto fail_bytes;
+        }
+    }
+    tcg_out_insn(s, 3606, MOVI, q, rd, 1, 0xe, imm8);
+    return;
+ fail_bytes:
+
+    /*
+     * Tests for various replications.  For each element width, if we
+     * cannot find an expansion there's no point checking a larger
+     * width because we already know by replication it cannot match.
+     */
+    if (v64 == dup_const(MO_16, v64)) {
+        uint16_t v16 = v64;
+
+        if (is_shimm16(v16, &cmode, &imm8)) {
+            tcg_out_insn(s, 3606, MOVI, q, rd, 0, cmode, imm8);
+            return;
+        }
+    } else if (v64 == dup_const(MO_32, v64)) {
+        uint32_t v32 = v64;
+
+        if (is_shimm32(v32, &cmode, &imm8) ||
+            is_soimm32(v32, &cmode, &imm8) ||
+            is_fimm32(v32, &cmode, &imm8)) {
+            tcg_out_insn(s, 3606, MOVI, q, rd, 0, cmode, imm8);
+            return;
+        }
+    } else if (is_fimm64(v64, &cmode, &imm8)) {
+        tcg_out_insn(s, 3606, MOVI, q, rd, 1, cmode, imm8);
+        return;
+    }
+
+    /*
+     * As a last resort, load from the constant pool.  Sadly there
+     * is no LD1R (literal), so store the full 16-byte vector.
+     */
+    if (type == TCG_TYPE_V128) {
         new_pool_l2(s, R_AARCH64_CONDBR19, s->code_ptr, 0, v64, v64);
         tcg_out_insn(s, 3305, LDR_v128, 0, rd);
     } else {
-- 
2.17.1




* [Qemu-devel] [PULL 13/16] tcg/aarch64: Use MVNI in tcg_out_dupi_vec
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (11 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 12/16] tcg/aarch64: Split up is_fimm Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 14/16] tcg/aarch64: Build vector immediates with two insns Richard Henderson
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The complement of a subset of immediates can be computed
with a single instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
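
A minimal sketch of the selection, restricted to the 32-bit shifted
immediates.  model_is_shimm32 is a loop-based rewrite of is_shimm32 from
the previous patch, and model_pick and model_insn are hypothetical names
used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { MODEL_NONE, MODEL_MOVI, MODEL_MVNI } model_insn;

/* True if v32 has a single non-zero byte; cmode is 0x0, 0x2, 0x4 or 0x6. */
static inline bool model_is_shimm32(uint32_t v32, int *cmode, int *imm8)
{
    for (int shift = 0; shift < 32; shift += 8) {
        if ((v32 & ~(0xffu << shift)) == 0) {
            *cmode = shift / 4;
            *imm8 = v32 >> shift;
            return true;
        }
    }
    return false;
}

/* Try the value itself with MOVI first, then its complement with MVNI. */
static inline model_insn model_pick(uint32_t v32, int *cmode, int *imm8)
{
    if (model_is_shimm32(v32, cmode, imm8)) {
        return MODEL_MOVI;
    }
    if (model_is_shimm32(~v32, cmode, imm8)) {
        return MODEL_MVNI;
    }
    return MODEL_NONE;
}
```

For example, 0xffffff12 is not a shifted immediate, but its complement
0x000000ed is, so MVNI with imm8 = 0xed produces it in one insn.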

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1422dfebe2..0b8b733805 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -494,6 +494,7 @@ typedef enum {
 
     /* AdvSIMD modified immediate */
     I3606_MOVI      = 0x0f000400,
+    I3606_MVNI      = 0x2f000400,
 
     /* AdvSIMD shift by immediate */
     I3614_SSHR      = 0x0f000400,
@@ -838,8 +839,13 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
             tcg_out_insn(s, 3606, MOVI, q, rd, 0, cmode, imm8);
             return;
         }
+        if (is_shimm16(~v16, &cmode, &imm8)) {
+            tcg_out_insn(s, 3606, MVNI, q, rd, 0, cmode, imm8);
+            return;
+        }
     } else if (v64 == dup_const(MO_32, v64)) {
         uint32_t v32 = v64;
+        uint32_t n32 = ~v32;
 
         if (is_shimm32(v32, &cmode, &imm8) ||
             is_soimm32(v32, &cmode, &imm8) ||
@@ -847,6 +853,11 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
             tcg_out_insn(s, 3606, MOVI, q, rd, 0, cmode, imm8);
             return;
         }
+        if (is_shimm32(n32, &cmode, &imm8) ||
+            is_soimm32(n32, &cmode, &imm8)) {
+            tcg_out_insn(s, 3606, MVNI, q, rd, 0, cmode, imm8);
+            return;
+        }
     } else if (is_fimm64(v64, &cmode, &imm8)) {
         tcg_out_insn(s, 3606, MOVI, q, rd, 1, cmode, imm8);
         return;
-- 
2.17.1




* [Qemu-devel] [PULL 14/16] tcg/aarch64: Build vector immediates with two insns
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (12 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 13/16] tcg/aarch64: Use MVNI in tcg_out_dupi_vec Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 15/16] tcg/aarch64: Allow immediates for vector ORR and BIC Richard Henderson
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Use MOVI+ORR or MVNI+BIC in order to build some vector constants,
as opposed to dropping them to the constant pool.  This includes
all 16-bit constants and a similar set of 32-bit constants.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 47 ++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 0b8b733805..52c18074ae 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -273,6 +273,26 @@ static bool is_fimm64(uint64_t v64, int *cmode, int *imm8)
     return false;
 }
 
+/*
+ * Return non-zero if v32 can be formed by MOVI+ORR.
+ * Place the parameters for MOVI in (cmode, imm8).
+ * Return the cmode for ORR; the imm8 can be had via extraction from v32.
+ */
+static int is_shimm32_pair(uint32_t v32, int *cmode, int *imm8)
+{
+    int i;
+
+    for (i = 6; i > 0; i -= 2) {
+        /* Mask out one byte we can add with ORR.  */
+        uint32_t tmp = v32 & ~(0xffu << (i * 4));
+        if (is_shimm32(tmp, cmode, imm8) ||
+            is_soimm32(tmp, cmode, imm8)) {
+            break;
+        }
+    }
+    return i;
+}
+
 static int tcg_target_const_match(tcg_target_long val, TCGType type,
                                   const TCGArgConstraint *arg_ct)
 {
@@ -495,6 +515,8 @@ typedef enum {
     /* AdvSIMD modified immediate */
     I3606_MOVI      = 0x0f000400,
     I3606_MVNI      = 0x2f000400,
+    I3606_BIC       = 0x2f001400,
+    I3606_ORR       = 0x0f001400,
 
     /* AdvSIMD shift by immediate */
     I3614_SSHR      = 0x0f000400,
@@ -843,6 +865,14 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
             tcg_out_insn(s, 3606, MVNI, q, rd, 0, cmode, imm8);
             return;
         }
+
+        /*
+         * Otherwise, all remaining constants can be loaded in two insns:
+         * rd = v16 & 0xff, rd |= v16 & 0xff00.
+         */
+        tcg_out_insn(s, 3606, MOVI, q, rd, 0, 0x8, v16 & 0xff);
+        tcg_out_insn(s, 3606, ORR, q, rd, 0, 0xa, v16 >> 8);
+        return;
     } else if (v64 == dup_const(MO_32, v64)) {
         uint32_t v32 = v64;
         uint32_t n32 = ~v32;
@@ -858,6 +888,23 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
             tcg_out_insn(s, 3606, MVNI, q, rd, 0, cmode, imm8);
             return;
         }
+
+        /*
+         * Restrict the set of constants to those we can load with
+         * two instructions.  Others we load from the pool.
+         */
+        i = is_shimm32_pair(v32, &cmode, &imm8);
+        if (i) {
+            tcg_out_insn(s, 3606, MOVI, q, rd, 0, cmode, imm8);
+            tcg_out_insn(s, 3606, ORR, q, rd, 0, i, extract32(v32, i * 4, 8));
+            return;
+        }
+        i = is_shimm32_pair(n32, &cmode, &imm8);
+        if (i) {
+            tcg_out_insn(s, 3606, MVNI, q, rd, 0, cmode, imm8);
+            tcg_out_insn(s, 3606, BIC, q, rd, 0, i, extract32(n32, i * 4, 8));
+            return;
+        }
     } else if (is_fimm64(v64, &cmode, &imm8)) {
         tcg_out_insn(s, 3606, MOVI, q, rd, 1, cmode, imm8);
         return;
-- 
2.17.1
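
The search loop in is_shimm32_pair() above can be tried out in isolation. The sketch below mirrors the loop but replaces the is_shimm32()/is_soimm32() tests with a simplified predicate, single_byte32(), that accepts only values with at most one non-zero byte; the "shifted ones" forms are left out. single_byte32(), pair_split32() and the two output parameters are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for is_shimm32(): at most one non-zero byte. */
bool single_byte32(uint32_t v)
{
    return (v & ~0x000000ffu) == 0 || (v & ~0x0000ff00u) == 0 ||
           (v & ~0x00ff0000u) == 0 || (v & ~0xff000000u) == 0;
}

/*
 * Try to peel off byte 3, 2 or 1 of v32 so that the remainder is a
 * single-byte value.  Returns 6, 4 or 2 (bit offset of the peeled
 * byte divided by 4), or 0 if no split was found.  On success *base
 * is the first-instruction value and *orr_byte the byte to OR in.
 */
int pair_split32(uint32_t v32, uint32_t *base, uint32_t *orr_byte)
{
    int i;

    for (i = 6; i > 0; i -= 2) {
        uint32_t tmp = v32 & ~(0xffu << (i * 4));
        if (single_byte32(tmp)) {
            *base = tmp;
            *orr_byte = (v32 >> (i * 4)) & 0xff;
            break;
        }
    }
    return i;
}

void self_check(void)
{
    uint32_t base = 0, byte = 0;
    int i = pair_split32(0x00120034u, &base, &byte);

    assert(i == 4 && base == 0x34u && byte == 0x12u);
    assert((base | (byte << (i * 4))) == 0x00120034u);
    assert(pair_split32(0x12345678u, &base, &byte) == 0);
}
```

A value with three or more non-zero bytes, such as 0x12345678, yields 0 here, which corresponds to the case the patch leaves to the constant pool.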




* [Qemu-devel] [PULL 15/16] tcg/aarch64: Allow immediates for vector ORR and BIC
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (13 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 14/16] tcg/aarch64: Build vector immediates with two insns Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-22 22:28 ` [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store Richard Henderson
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

This allows immediates to be used for ORR and BIC,
as well as the trivial inversions, ORC and AND.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/aarch64/tcg-target.inc.c | 90 +++++++++++++++++++++++++++++++++---
 1 file changed, 83 insertions(+), 7 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 52c18074ae..9e1dad9696 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -119,6 +119,8 @@ static inline bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_LIMM 0x200
 #define TCG_CT_CONST_ZERO 0x400
 #define TCG_CT_CONST_MONE 0x800
+#define TCG_CT_CONST_ORRI 0x1000
+#define TCG_CT_CONST_ANDI 0x2000
 
 /* parse target specific constraints */
 static const char *target_parse_constraint(TCGArgConstraint *ct,
@@ -154,6 +156,12 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
     case 'M': /* minus one */
         ct->ct |= TCG_CT_CONST_MONE;
         break;
+    case 'O': /* vector orr/bic immediate */
+        ct->ct |= TCG_CT_CONST_ORRI;
+        break;
+    case 'N': /* vector orr/bic immediate, inverted */
+        ct->ct |= TCG_CT_CONST_ANDI;
+        break;
     case 'Z': /* zero */
         ct->ct |= TCG_CT_CONST_ZERO;
         break;
@@ -293,6 +301,16 @@ static int is_shimm32_pair(uint32_t v32, int *cmode, int *imm8)
     return i;
 }
 
+/* Return true if V is a valid 16-bit or 32-bit shifted immediate.  */
+static bool is_shimm1632(uint32_t v32, int *cmode, int *imm8)
+{
+    if (v32 == deposit32(v32, 16, 16, v32)) {
+        return is_shimm16(v32, cmode, imm8);
+    } else {
+        return is_shimm32(v32, cmode, imm8);
+    }
+}
+
 static int tcg_target_const_match(tcg_target_long val, TCGType type,
                                   const TCGArgConstraint *arg_ct)
 {
@@ -317,6 +335,23 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
         return 1;
     }
 
+    switch (ct & (TCG_CT_CONST_ORRI | TCG_CT_CONST_ANDI)) {
+    case 0:
+        break;
+    case TCG_CT_CONST_ANDI:
+        val = ~val;
+        /* fallthru */
+    case TCG_CT_CONST_ORRI:
+        if (val == deposit64(val, 32, 32, val)) {
+            int cmode, imm8;
+            return is_shimm1632(val, &cmode, &imm8);
+        }
+        break;
+    default:
+        /* Both bits should not be set for the same insn.  */
+        g_assert_not_reached();
+    }
+
     return 0;
 }
 
@@ -2278,6 +2313,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     TCGType type = vecl + TCG_TYPE_V64;
     unsigned is_q = vecl;
     TCGArg a0, a1, a2, a3;
+    int cmode, imm8;
 
     a0 = args[0];
     a1 = args[1];
@@ -2309,20 +2345,56 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         tcg_out_insn(s, 3617, ABS, is_q, vece, a0, a1);
         break;
     case INDEX_op_and_vec:
+        if (const_args[2]) {
+            is_shimm1632(~a2, &cmode, &imm8);
+            if (a0 == a1) {
+                tcg_out_insn(s, 3606, BIC, is_q, a0, 0, cmode, imm8);
+                return;
+            }
+            tcg_out_insn(s, 3606, MVNI, is_q, a0, 0, cmode, imm8);
+            a2 = a0;
+        }
         tcg_out_insn(s, 3616, AND, is_q, 0, a0, a1, a2);
         break;
     case INDEX_op_or_vec:
+        if (const_args[2]) {
+            is_shimm1632(a2, &cmode, &imm8);
+            if (a0 == a1) {
+                tcg_out_insn(s, 3606, ORR, is_q, a0, 0, cmode, imm8);
+                return;
+            }
+            tcg_out_insn(s, 3606, MOVI, is_q, a0, 0, cmode, imm8);
+            a2 = a0;
+        }
         tcg_out_insn(s, 3616, ORR, is_q, 0, a0, a1, a2);
         break;
-    case INDEX_op_xor_vec:
-        tcg_out_insn(s, 3616, EOR, is_q, 0, a0, a1, a2);
-        break;
     case INDEX_op_andc_vec:
+        if (const_args[2]) {
+            is_shimm1632(a2, &cmode, &imm8);
+            if (a0 == a1) {
+                tcg_out_insn(s, 3606, BIC, is_q, a0, 0, cmode, imm8);
+                return;
+            }
+            tcg_out_insn(s, 3606, MOVI, is_q, a0, 0, cmode, imm8);
+            a2 = a0;
+        }
         tcg_out_insn(s, 3616, BIC, is_q, 0, a0, a1, a2);
         break;
     case INDEX_op_orc_vec:
+        if (const_args[2]) {
+            is_shimm1632(~a2, &cmode, &imm8);
+            if (a0 == a1) {
+                tcg_out_insn(s, 3606, ORR, is_q, a0, 0, cmode, imm8);
+                return;
+            }
+            tcg_out_insn(s, 3606, MVNI, is_q, a0, 0, cmode, imm8);
+            a2 = a0;
+        }
         tcg_out_insn(s, 3616, ORN, is_q, 0, a0, a1, a2);
         break;
+    case INDEX_op_xor_vec:
+        tcg_out_insn(s, 3616, EOR, is_q, 0, a0, a1, a2);
+        break;
     case INDEX_op_ssadd_vec:
         tcg_out_insn(s, 3616, SQADD, is_q, vece, a0, a1, a2);
         break;
@@ -2505,6 +2577,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     static const TCGTargetOpDef lZ_l = { .args_ct_str = { "lZ", "l" } };
     static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };
     static const TCGTargetOpDef w_w_w = { .args_ct_str = { "w", "w", "w" } };
+    static const TCGTargetOpDef w_w_wO = { .args_ct_str = { "w", "w", "wO" } };
+    static const TCGTargetOpDef w_w_wN = { .args_ct_str = { "w", "w", "wN" } };
     static const TCGTargetOpDef w_w_wZ = { .args_ct_str = { "w", "w", "wZ" } };
     static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } };
     static const TCGTargetOpDef r_r_rA = { .args_ct_str = { "r", "r", "rA" } };
@@ -2660,11 +2734,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     case INDEX_op_add_vec:
     case INDEX_op_sub_vec:
     case INDEX_op_mul_vec:
-    case INDEX_op_and_vec:
-    case INDEX_op_or_vec:
     case INDEX_op_xor_vec:
-    case INDEX_op_andc_vec:
-    case INDEX_op_orc_vec:
     case INDEX_op_ssadd_vec:
     case INDEX_op_sssub_vec:
     case INDEX_op_usadd_vec:
@@ -2691,6 +2761,12 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         return &w_r;
     case INDEX_op_dup_vec:
         return &w_wr;
+    case INDEX_op_or_vec:
+    case INDEX_op_andc_vec:
+        return &w_w_wO;
+    case INDEX_op_and_vec:
+    case INDEX_op_orc_vec:
+        return &w_w_wN;
     case INDEX_op_cmp_vec:
         return &w_w_wZ;
     case INDEX_op_bitsel_vec:
-- 
2.17.1
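
The "trivial inversions" in the patch above rest on two bitwise identities: AND with a constant c is BIC with ~c, and ORC with c is ORR with ~c. The sketch below checks them on a single 32-bit lane, together with the replicated-constant test that tcg_target_const_match() expresses with deposit64(). All function names here are hypothetical helpers for illustration, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-lane effect of the immediate forms: ORR sets bits, BIC clears them. */
uint32_t orr_imm(uint32_t a, uint32_t imm) { return a | imm; }
uint32_t bic_imm(uint32_t a, uint32_t imm) { return a & ~imm; }

/* The four TCG ops with a constant second operand c. */
uint32_t or_const(uint32_t a, uint32_t c)   { return orr_imm(a, c);  } /* a | c  */
uint32_t andc_const(uint32_t a, uint32_t c) { return bic_imm(a, c);  } /* a & ~c */
uint32_t and_const(uint32_t a, uint32_t c)  { return bic_imm(a, ~c); } /* a & c  */
uint32_t orc_const(uint32_t a, uint32_t c)  { return orr_imm(a, ~c); } /* a | ~c */

/* True if both 32-bit halves of val are equal (a replicated constant). */
bool is_dup32(uint64_t val)
{
    return (uint32_t)(val >> 32) == (uint32_t)val;
}

void self_check(void)
{
    uint32_t a = 0x12345678u, c = 0xffff00ffu;

    assert(and_const(a, c) == (a & c));
    assert(orc_const(a, c) == (a | ~c));
    assert(andc_const(a, c) == (a & ~c));
    assert(or_const(a, c) == (a | c));
    assert(is_dup32(0x0000ff000000ff00ull));
    assert(!is_dup32(0x0000000100000000ull));
}
```

This is why or_vec and andc_vec share one constraint while and_vec and orc_vec share the inverted one: in the second pair it is ~c, not c, that has to be encodable.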




* [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (14 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 15/16] tcg/aarch64: Allow immediates for vector ORR and BIC Richard Henderson
@ 2019-05-22 22:28 ` Richard Henderson
  2019-05-28 17:28   ` David Hildenbrand
  2019-05-23  8:17 ` [Qemu-devel] [PULL 00/16] tcg queued patches Aleksandar Markovic
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2019-05-22 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

MOVDQA raises #GP, aka SIGSEGV, if the effective address
is not aligned to 16 bytes.

We have assertions in tcg-op-gvec.c that the offset from ENV is
aligned, for vector types <= V128.  But the offset itself does not
validate that the final pointer is aligned -- one must also remember
to use the QEMU_ALIGNED() attribute on the vector member within ENV.

PowerPC Altivec has vector load/store instructions that silently
discard the low 4 bits of the address, making alignment mistakes
difficult to discover.  Aid that by making the most popular host
visibly signal the error.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target.inc.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 6ec5e60448..c0443da4af 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1082,14 +1082,24 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
         }
         /* FALLTHRU */
     case TCG_TYPE_V64:
+        /* There is no instruction that can validate 8-byte alignment.  */
         tcg_debug_assert(ret >= 16);
         tcg_out_vex_modrm_offset(s, OPC_MOVQ_VqWq, ret, 0, arg1, arg2);
         break;
     case TCG_TYPE_V128:
+        /*
+         * The gvec infrastructure asserts that v128 vector loads
+         * and stores use a 16-byte aligned offset.  Validate that the
+         * final pointer is aligned by using an insn that will SIGSEGV.
+         */
         tcg_debug_assert(ret >= 16);
-        tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx, ret, 0, arg1, arg2);
+        tcg_out_vex_modrm_offset(s, OPC_MOVDQA_VxWx, ret, 0, arg1, arg2);
         break;
     case TCG_TYPE_V256:
+        /*
+         * The gvec infrastructure only requires 16-byte alignment,
+         * so here we must use an unaligned load.
+         */
         tcg_debug_assert(ret >= 16);
         tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx | P_VEXL,
                                  ret, 0, arg1, arg2);
@@ -1117,14 +1127,24 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
         }
         /* FALLTHRU */
     case TCG_TYPE_V64:
+        /* There is no instruction that can validate 8-byte alignment.  */
         tcg_debug_assert(arg >= 16);
         tcg_out_vex_modrm_offset(s, OPC_MOVQ_WqVq, arg, 0, arg1, arg2);
         break;
     case TCG_TYPE_V128:
+        /*
+         * The gvec infrastructure asserts that v128 vector loads
+         * and stores use a 16-byte aligned offset.  Validate that the
+         * final pointer is aligned by using an insn that will SIGSEGV.
+         */
         tcg_debug_assert(arg >= 16);
-        tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx, arg, 0, arg1, arg2);
+        tcg_out_vex_modrm_offset(s, OPC_MOVDQA_WxVx, arg, 0, arg1, arg2);
         break;
     case TCG_TYPE_V256:
+        /*
+         * The gvec infrastructure only requires 16-byte alignment,
+         * so here we must use an unaligned store.
+         */
         tcg_debug_assert(arg >= 16);
         tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx | P_VEXL,
                                  arg, 0, arg1, arg2);
-- 
2.17.1




* Re: [Qemu-devel] [PULL 00/16] tcg queued patches
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (15 preceding siblings ...)
  2019-05-22 22:28 ` [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store Richard Henderson
@ 2019-05-23  8:17 ` Aleksandar Markovic
  2019-05-23 12:42   ` Richard Henderson
  2019-05-24 10:43 ` Peter Maydell
  2019-05-28 16:58 ` David Hildenbrand
  18 siblings, 1 reply; 30+ messages in thread
From: Aleksandar Markovic @ 2019-05-23  8:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: peter.maydell, qemu-devel

On May 23, 2019 12:32 AM, "Richard Henderson" <richard.henderson@linaro.org> wrote:
>
> The following changes since commit a4f667b6714916683408b983cfe0a615a725775f:
>
>   Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging (2019-05-21 16:30:13 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/rth7680/qemu.git tags/pull-tcg-20190522
>
> for you to fetch changes up to 11e2bfef799024be4a08fcf6797fe0b22fb16b58:
>
>   tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store (2019-05-22 15:09:43 -0400)
>
> ----------------------------------------------------------------
> Misc gvec improvements
>
> ----------------------------------------------------------------

Why are “Reviewed-by:” lines missing from all patches of this pull request?

Regards,
Aleksandar

> Richard Henderson (16):
>       tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts
>       tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem
>       tcg: Add support for vector bitwise select
>       tcg: Add support for vector compare select
>       tcg: Introduce do_op3_nofail for vector expansion
>       tcg: Expand vector minmax using cmp+cmpsel
>       tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative
>       tcg/i386: Support vector comparison select value
>       tcg/i386: Remove expansion for missing minmax
>       tcg/i386: Use umin/umax in expanding unsigned compare
>       tcg/aarch64: Support vector bitwise select value
>       tcg/aarch64: Split up is_fimm
>       tcg/aarch64: Use MVNI in tcg_out_dupi_vec
>       tcg/aarch64: Build vector immediates with two insns
>       tcg/aarch64: Allow immediates for vector ORR and BIC
>       tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store
>
>  accel/tcg/tcg-runtime.h      |   2 +
>  tcg/aarch64/tcg-target.h     |   2 +
>  tcg/i386/tcg-target.h        |   2 +
>  tcg/tcg-op-gvec.h            |   7 +
>  tcg/tcg-op.h                 |   5 +
>  tcg/tcg-opc.h                |   5 +-
>  tcg/tcg.h                    |   2 +
>  accel/tcg/tcg-runtime-gvec.c |  14 ++
>  tcg/aarch64/tcg-target.inc.c | 371 ++++++++++++++++++++++++++++++++-----------
>  tcg/i386/tcg-target.inc.c    | 169 +++++++++++++-------
>  tcg/tcg-op-gvec.c            |  71 ++++++---
>  tcg/tcg-op-vec.c             | 142 ++++++++++++++---
>  tcg/tcg.c                    |   5 +
>  tcg/README                   |  11 ++
>  14 files changed, 620 insertions(+), 188 deletions(-)
>


* Re: [Qemu-devel] [PULL 00/16] tcg queued patches
  2019-05-23  8:17 ` [Qemu-devel] [PULL 00/16] tcg queued patches Aleksandar Markovic
@ 2019-05-23 12:42   ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-23 12:42 UTC (permalink / raw)
  To: Aleksandar Markovic; +Cc: peter.maydell, qemu-devel

On 5/23/19 4:17 AM, Aleksandar Markovic wrote:
> Why are “Reviewed-by:” lines missing from all patches of this pull request?
> 

Because it's hard to get people to review code under tcg/.
I post patches and wait a few days to a week and then give a pull.

This has been true since forever, when I was the only one giving
reviews to Aurelien, and thus wound up with this job when he left.


r~



* Re: [Qemu-devel] [PULL 00/16] tcg queued patches
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (16 preceding siblings ...)
  2019-05-23  8:17 ` [Qemu-devel] [PULL 00/16] tcg queued patches Aleksandar Markovic
@ 2019-05-24 10:43 ` Peter Maydell
  2019-05-28 16:58 ` David Hildenbrand
  18 siblings, 0 replies; 30+ messages in thread
From: Peter Maydell @ 2019-05-24 10:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Wed, 22 May 2019 at 23:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The following changes since commit a4f667b6714916683408b983cfe0a615a725775f:
>
>   Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging (2019-05-21 16:30:13 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/rth7680/qemu.git tags/pull-tcg-20190522
>
> for you to fetch changes up to 11e2bfef799024be4a08fcf6797fe0b22fb16b58:
>
>   tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store (2019-05-22 15:09:43 -0400)
>
> ----------------------------------------------------------------
> Misc gvec improvements
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.1
for any user-visible changes.

-- PMM



* Re: [Qemu-devel] [PULL 00/16] tcg queued patches
  2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
                   ` (17 preceding siblings ...)
  2019-05-24 10:43 ` Peter Maydell
@ 2019-05-28 16:58 ` David Hildenbrand
  18 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2019-05-28 16:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: peter.maydell

On 23.05.19 00:28, Richard Henderson wrote:
> The following changes since commit a4f667b6714916683408b983cfe0a615a725775f:
> 
>   Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190521-3' into staging (2019-05-21 16:30:13 +0100)
> 
> are available in the Git repository at:
> 
>   https://github.com/rth7680/qemu.git tags/pull-tcg-20190522
> 
> for you to fetch changes up to 11e2bfef799024be4a08fcf6797fe0b22fb16b58:
> 
>   tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store (2019-05-22 15:09:43 -0400)
> 
> ----------------------------------------------------------------
> Misc gvec improvements
> 
> ----------------------------------------------------------------
> Richard Henderson (16):
>       tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts
>       tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem
>       tcg: Add support for vector bitwise select
>       tcg: Add support for vector compare select
>       tcg: Introduce do_op3_nofail for vector expansion
>       tcg: Expand vector minmax using cmp+cmpsel
>       tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative
>       tcg/i386: Support vector comparison select value
>       tcg/i386: Remove expansion for missing minmax
>       tcg/i386: Use umin/umax in expanding unsigned compare
>       tcg/aarch64: Support vector bitwise select value
>       tcg/aarch64: Split up is_fimm
>       tcg/aarch64: Use MVNI in tcg_out_dupi_vec
>       tcg/aarch64: Build vector immediates with two insns
>       tcg/aarch64: Allow immediates for vector ORR and BIC
>       tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store
> 
>  accel/tcg/tcg-runtime.h      |   2 +
>  tcg/aarch64/tcg-target.h     |   2 +
>  tcg/i386/tcg-target.h        |   2 +
>  tcg/tcg-op-gvec.h            |   7 +
>  tcg/tcg-op.h                 |   5 +
>  tcg/tcg-opc.h                |   5 +-
>  tcg/tcg.h                    |   2 +
>  accel/tcg/tcg-runtime-gvec.c |  14 ++
>  tcg/aarch64/tcg-target.inc.c | 371 ++++++++++++++++++++++++++++++++-----------
>  tcg/i386/tcg-target.inc.c    | 169 +++++++++++++-------
>  tcg/tcg-op-gvec.c            |  71 ++++++---
>  tcg/tcg-op-vec.c             | 142 ++++++++++++++---
>  tcg/tcg.c                    |   5 +
>  tcg/README                   |  11 ++
>  14 files changed, 620 insertions(+), 188 deletions(-)
> 

Rebasing my vx branch to latest qemu/master, I get segfaults when trying
to boot a Linux kernel:

[    2.652368] Unpacking initramfs...
Segmentation fault (core dumped)


(gdb) bt
#0  0x00007feb460409d0 in code_gen_buffer ()
#1  0x000055679d5322d3 in cpu_tb_exec (itb=<optimized out>, cpu=0x7feb46040600 <code_gen_buffer+100926931>)
    at /home/dhildenb/git/qemu/accel/tcg/cpu-exec.c:171
#2  cpu_loop_exec_tb (tb_exit=<synthetic pointer>, last_tb=<synthetic pointer>, tb=<optimized out>,
    cpu=0x7feb46040600 <code_gen_buffer+100926931>) at /home/dhildenb/git/qemu/accel/tcg/cpu-exec.c:618
#3  cpu_exec (cpu=cpu@entry=0x55679fb37330) at /home/dhildenb/git/qemu/accel/tcg/cpu-exec.c:729
#4  0x000055679d4f0ecf in tcg_cpu_exec (cpu=0x55679fb37330) at /home/dhildenb/git/qemu/cpus.c:1434
#5  0x000055679d4f302b in qemu_tcg_cpu_thread_fn (arg=arg@entry=0x55679fb37330)
    at /home/dhildenb/git/qemu/cpus.c:1743
#6  0x000055679d79a26a in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:502
#7  0x00007febd07a458e in ?? ()
#8  0x0000000000000000 in ?? ()

Any idea what this could be? (this series?)

-- 

Thanks,

David / dhildenb



* Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store
  2019-05-22 22:28 ` [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store Richard Henderson
@ 2019-05-28 17:28   ` David Hildenbrand
  2019-05-28 18:33     ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2019-05-28 17:28 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: peter.maydell

On 23.05.19 00:28, Richard Henderson wrote:
> This instruction raises #GP, aka SIGSEGV, if the effective address
> is not aligned to 16-bytes.
> 
> We have assertions in tcg-op-gvec.c that the offset from ENV is
> aligned, for vector types <= V128.  But the offset itself does not
> validate that the final pointer is aligned -- one must also remember
> to use the QEMU_ALIGNED() attribute on the vector member within ENV.
> 
> PowerPC Altivec has vector load/store instructions that silently
> discard the low 4 bits of the address, making alignment mistakes
> difficult to discover.  Aid that by making the most popular host
> visibly signal the error.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/i386/tcg-target.inc.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 6ec5e60448..c0443da4af 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -1082,14 +1082,24 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
>          }
>          /* FALLTHRU */
>      case TCG_TYPE_V64:
> +        /* There is no instruction that can validate 8-byte alignment.  */
>          tcg_debug_assert(ret >= 16);
>          tcg_out_vex_modrm_offset(s, OPC_MOVQ_VqWq, ret, 0, arg1, arg2);
>          break;
>      case TCG_TYPE_V128:
> +        /*
> +         * The gvec infrastructure asserts that v128 vector loads
> +         * and stores use a 16-byte aligned offset.  Validate that the
> +         * final pointer is aligned by using an insn that will SIGSEGV.
> +         */
>          tcg_debug_assert(ret >= 16);
> -        tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx, ret, 0, arg1, arg2);
> +        tcg_out_vex_modrm_offset(s, OPC_MOVDQA_VxWx, ret, 0, arg1, arg2);
>          break;
>      case TCG_TYPE_V256:
> +        /*
> +         * The gvec infrastructure only requires 16-byte alignment,
> +         * so here we must use an unaligned load.
> +         */
>          tcg_debug_assert(ret >= 16);
>          tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx | P_VEXL,
>                                   ret, 0, arg1, arg2);
> @@ -1117,14 +1127,24 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
>          }
>          /* FALLTHRU */
>      case TCG_TYPE_V64:
> +        /* There is no instruction that can validate 8-byte alignment.  */
>          tcg_debug_assert(arg >= 16);
>          tcg_out_vex_modrm_offset(s, OPC_MOVQ_WqVq, arg, 0, arg1, arg2);
>          break;
>      case TCG_TYPE_V128:
> +        /*
> +         * The gvec infrastructure asserts that v128 vector loads
> +         * and stores use a 16-byte aligned offset.  Validate that the
> +         * final pointer is aligned by using an insn that will SIGSEGV.
> +         */
>          tcg_debug_assert(arg >= 16);
> -        tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx, arg, 0, arg1, arg2);
> +        tcg_out_vex_modrm_offset(s, OPC_MOVDQA_WxVx, arg, 0, arg1, arg2);
>          break;
>      case TCG_TYPE_V256:
> +        /*
> +         * The gvec infrastructure only requires 16-byte alignment,
> +         * so here we must use an unaligned store.
> +         */
>          tcg_debug_assert(arg >= 16);
>          tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx | P_VEXL,
>                                   arg, 0, arg1, arg2);
> 

This is the problematic patch. I haven't looked into the details yet, so I
can't tell what's wrong. Maybe it really is an alignment issue?

-- 

Thanks,

David / dhildenb



* Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store
  2019-05-28 17:28   ` David Hildenbrand
@ 2019-05-28 18:33     ` David Hildenbrand
  2019-05-28 18:46       ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2019-05-28 18:33 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: peter.maydell

On 28.05.19 19:28, David Hildenbrand wrote:
> On 23.05.19 00:28, Richard Henderson wrote:
>> This instruction raises #GP, aka SIGSEGV, if the effective address
>> is not aligned to 16-bytes.
>>
>> We have assertions in tcg-op-gvec.c that the offset from ENV is
>> aligned, for vector types <= V128.  But the offset itself does not
>> validate that the final pointer is aligned -- one must also remember
>> to use the QEMU_ALIGNED() attribute on the vector member within ENV.
>>
>> PowerPC Altivec has vector load/store instructions that silently
>> discard the low 4 bits of the address, making alignment mistakes
>> difficult to discover.  Aid that by making the most popular host
>> visibly signal the error.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  tcg/i386/tcg-target.inc.c | 24 ++++++++++++++++++++++--
>>  1 file changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
>> index 6ec5e60448..c0443da4af 100644
>> --- a/tcg/i386/tcg-target.inc.c
>> +++ b/tcg/i386/tcg-target.inc.c
>> @@ -1082,14 +1082,24 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
>>          }
>>          /* FALLTHRU */
>>      case TCG_TYPE_V64:
>> +        /* There is no instruction that can validate 8-byte alignment.  */
>>          tcg_debug_assert(ret >= 16);
>>          tcg_out_vex_modrm_offset(s, OPC_MOVQ_VqWq, ret, 0, arg1, arg2);
>>          break;
>>      case TCG_TYPE_V128:
>> +        /*
>> +         * The gvec infrastructure asserts that v128 vector loads
>> +         * and stores use a 16-byte aligned offset.  Validate that the
>> +         * final pointer is aligned by using an insn that will SIGSEGV.
>> +         */
>>          tcg_debug_assert(ret >= 16);
>> -        tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx, ret, 0, arg1, arg2);
>> +        tcg_out_vex_modrm_offset(s, OPC_MOVDQA_VxWx, ret, 0, arg1, arg2);
>>          break;
>>      case TCG_TYPE_V256:
>> +        /*
>> +         * The gvec infrastructure only requires 16-byte alignment,
>> +         * so here we must use an unaligned load.
>> +         */
>>          tcg_debug_assert(ret >= 16);
>>          tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx | P_VEXL,
>>                                   ret, 0, arg1, arg2);
>> @@ -1117,14 +1127,24 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
>>          }
>>          /* FALLTHRU */
>>      case TCG_TYPE_V64:
>> +        /* There is no instruction that can validate 8-byte alignment.  */
>>          tcg_debug_assert(arg >= 16);
>>          tcg_out_vex_modrm_offset(s, OPC_MOVQ_WqVq, arg, 0, arg1, arg2);
>>          break;
>>      case TCG_TYPE_V128:
>> +        /*
>> +         * The gvec infrastructure asserts that v128 vector loads
>> +         * and stores use a 16-byte aligned offset.  Validate that the
>> +         * final pointer is aligned by using an insn that will SIGSEGV.
>> +         */
>>          tcg_debug_assert(arg >= 16);
>> -        tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx, arg, 0, arg1, arg2);
>> +        tcg_out_vex_modrm_offset(s, OPC_MOVDQA_WxVx, arg, 0, arg1, arg2);
>>          break;
>>      case TCG_TYPE_V256:
>> +        /*
>> +         * The gvec infrastructure only requires 16-byte alignment,
>> +         * so here we must use an unaligned store.
>> +         */
>>          tcg_debug_assert(arg >= 16);
>>          tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx | P_VEXL,
>>                                   arg, 0, arg1, arg2);
>>
> 
> This is the problematic patch. I haven't looked into the details yet, so I
> can't tell what's wrong. Maybe it really is an alignment issue?
> 

Okay, looks like "vregs" in "struct CPUS390XState" is always aligned to
8, but not to 16 bytes.

And that, in turn, is the case because "CPUS390XState env" is not
aligned to 16 bytes in "struct S390CPU".


!!!!!!!! CPU: 0x55a5e3046ef0
!!!!!!!! ENV: 0x55a5e304f1a8
!!!!!!!! VREGS: 0x55a5e304f228
!!!!!!!! CPU: 0x55a5e3070bb0
!!!!!!!! ENV: 0x55a5e3078e68
!!!!!!!! VREGS: 0x55a5e3078ee8
!!!!!!!! CPU: 0x55a5e3098310
!!!!!!!! ENV: 0x55a5e30a05c8
!!!!!!!! VREGS: 0x55a5e30a0648
!!!!!!!! CPU: 0x55a5e30c0730
!!!!!!!! ENV: 0x55a5e30c89e8
!!!!!!!! VREGS: 0x55a5e30c8a68
!!!!!!!! CPU: 0x55a5e30e7c90
!!!!!!!! ENV: 0x55a5e30eff48
!!!!!!!! VREGS: 0x55a5e30effc8
!!!!!!!! CPU: 0x55a5e310eea0
!!!!!!!! ENV: 0x55a5e3117158
!!!!!!!! VREGS: 0x55a5e31171d8
!!!!!!!! CPU: 0x55a5e31361e0
!!!!!!!! ENV: 0x55a5e313e498
!!!!!!!! VREGS: 0x55a5e313e518
!!!!!!!! CPU: 0x55a5e315d520
!!!!!!!! ENV: 0x55a5e31657d8
!!!!!!!! VREGS: 0x55a5e3165858

vregs is defined as:

CPU_DoubleU vregs[32][2];

We either have to switch to a type that has a natural alignment of 16
bytes, or enforce alignment of "CPUS390XState env" to 16 bytes.

What do you suggest?

-- 

Thanks,

David / dhildenb



* Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store
  2019-05-28 18:33     ` David Hildenbrand
@ 2019-05-28 18:46       ` David Hildenbrand
  2019-05-28 21:34         ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2019-05-28 18:46 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: peter.maydell

On 28.05.19 20:33, David Hildenbrand wrote:
> On 28.05.19 19:28, David Hildenbrand wrote:
>> On 23.05.19 00:28, Richard Henderson wrote:
>>> This instruction raises #GP, aka SIGSEGV, if the effective address
>>> is not aligned to 16-bytes.
>>>
>>> We have assertions in tcg-op-gvec.c that the offset from ENV is
>>> aligned, for vector types <= V128.  But the offset itself does not
>>> validate that the final pointer is aligned -- one must also remember
>>> to use the QEMU_ALIGNED() attribute on the vector member within ENV.
>>>
>>> PowerPC Altivec has vector load/store instructions that silently
>>> discard the low 4 bits of the address, making alignment mistakes
>>> difficult to discover.  Aid that by making the most popular host
>>> visibly signal the error.
>>>
>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>>> ---
>>>  tcg/i386/tcg-target.inc.c | 24 ++++++++++++++++++++++--
>>>  1 file changed, 22 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
>>> index 6ec5e60448..c0443da4af 100644
>>> --- a/tcg/i386/tcg-target.inc.c
>>> +++ b/tcg/i386/tcg-target.inc.c
>>> @@ -1082,14 +1082,24 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
>>>          }
>>>          /* FALLTHRU */
>>>      case TCG_TYPE_V64:
>>> +        /* There is no instruction that can validate 8-byte alignment.  */
>>>          tcg_debug_assert(ret >= 16);
>>>          tcg_out_vex_modrm_offset(s, OPC_MOVQ_VqWq, ret, 0, arg1, arg2);
>>>          break;
>>>      case TCG_TYPE_V128:
>>> +        /*
>>> +         * The gvec infrastructure asserts that v128 vector loads
>>> +         * and stores use a 16-byte aligned offset.  Validate that the
>>> +         * final pointer is aligned by using an insn that will SIGSEGV.
>>> +         */
>>>          tcg_debug_assert(ret >= 16);
>>> -        tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx, ret, 0, arg1, arg2);
>>> +        tcg_out_vex_modrm_offset(s, OPC_MOVDQA_VxWx, ret, 0, arg1, arg2);
>>>          break;
>>>      case TCG_TYPE_V256:
>>> +        /*
>>> +         * The gvec infrastructure only requires 16-byte alignment,
>>> +         * so here we must use an unaligned load.
>>> +         */
>>>          tcg_debug_assert(ret >= 16);
>>>          tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx | P_VEXL,
>>>                                   ret, 0, arg1, arg2);
>>> @@ -1117,14 +1127,24 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
>>>          }
>>>          /* FALLTHRU */
>>>      case TCG_TYPE_V64:
>>> +        /* There is no instruction that can validate 8-byte alignment.  */
>>>          tcg_debug_assert(arg >= 16);
>>>          tcg_out_vex_modrm_offset(s, OPC_MOVQ_WqVq, arg, 0, arg1, arg2);
>>>          break;
>>>      case TCG_TYPE_V128:
>>> +        /*
>>> +         * The gvec infrastructure asserts that v128 vector loads
>>> +         * and stores use a 16-byte aligned offset.  Validate that the
>>> +         * final pointer is aligned by using an insn that will SIGSEGV.
>>> +         */
>>>          tcg_debug_assert(arg >= 16);
>>> -        tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx, arg, 0, arg1, arg2);
>>> +        tcg_out_vex_modrm_offset(s, OPC_MOVDQA_WxVx, arg, 0, arg1, arg2);
>>>          break;
>>>      case TCG_TYPE_V256:
>>> +        /*
>>> +         * The gvec infrastructure only requires 16-byte alignment,
>>> +         * so here we must use an unaligned store.
>>> +         */
>>>          tcg_debug_assert(arg >= 16);
>>>          tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx | P_VEXL,
>>>                                   arg, 0, arg1, arg2);
>>>
>>
>> This is the problematic patch. Haven't looked into the details yet, so I
>> can't tell what's wrong. Maybe it really is an alignment issue?
>>
> 
> Okay, looks like "vregs" in "struct CPUS390XState" is always aligned to
> 8, but not to 16 bytes.
> 
> And that, in turn, is the case because "CPUS390XState env" is not
> aligned to 16 bytes in "struct S390CPU".
> 
> 
> !!!!!!!! CPU: 0x55a5e3046ef0
> !!!!!!!! ENV: 0x55a5e304f1a8
> !!!!!!!! VREGS: 0x55a5e304f228
> !!!!!!!! CPU: 0x55a5e3070bb0
> !!!!!!!! ENV: 0x55a5e3078e68
> !!!!!!!! VREGS: 0x55a5e3078ee8
> !!!!!!!! CPU: 0x55a5e3098310
> !!!!!!!! ENV: 0x55a5e30a05c8
> !!!!!!!! VREGS: 0x55a5e30a0648
> !!!!!!!! CPU: 0x55a5e30c0730
> !!!!!!!! ENV: 0x55a5e30c89e8
> !!!!!!!! VREGS: 0x55a5e30c8a68
> !!!!!!!! CPU: 0x55a5e30e7c90
> !!!!!!!! ENV: 0x55a5e30eff48
> !!!!!!!! VREGS: 0x55a5e30effc8
> !!!!!!!! CPU: 0x55a5e310eea0
> !!!!!!!! ENV: 0x55a5e3117158
> !!!!!!!! VREGS: 0x55a5e31171d8
> !!!!!!!! CPU: 0x55a5e31361e0
> !!!!!!!! ENV: 0x55a5e313e498
> !!!!!!!! VREGS: 0x55a5e313e518
> !!!!!!!! CPU: 0x55a5e315d520
> !!!!!!!! ENV: 0x55a5e31657d8
> !!!!!!!! VREGS: 0x55a5e3165858
> 
> vregs is defined as:
> 
> CPU_DoubleU vregs[32][2];
> 
> We either have to switch to a type that has a natural alignment of 16
> bytes, or enforce alignment of "CPUS390XState env" to 16 bytes.
> 
> What do you suggest?

FWIW, this seems to be the easiest way:

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index f0d9a6a36d..d363ae0fb3 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -66,7 +66,7 @@ struct CPUS390XState {
      * The floating point registers are part of the vector registers.
      * vregs[0][0] -> vregs[15][0] are 16 floating point registers
      */
-    CPU_DoubleU vregs[32][2];  /* vector registers */
+    CPU_DoubleU vregs[32][2] QEMU_ALIGNED(16);  /* vector registers */
     uint32_t aregs[16];    /* access registers */
     uint8_t riccb[64];     /* runtime instrumentation control */
     uint64_t gscb[4];      /* guarded storage control */


Makes it work for me again.
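
For illustration, here is a minimal, hypothetical sketch of why the attribute
helps. The Demo* types and the DEMO_ALIGNED macro are stand-ins invented for
this example (assuming GCC or Clang), not the real QEMU definitions. The layout
only mimics the addresses printed earlier in the thread, where ENV sits at 8
modulo 16 and VREGS at ENV + 0x80.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU_ALIGNED(); assumes GCC or Clang. */
#define DEMO_ALIGNED(n) __attribute__((aligned(n)))

/* Stand-in for CPU_DoubleU: natural alignment of at most 8 bytes. */
typedef union {
    double d;
    uint64_t ll;
} DemoDoubleU;

/* Before the fix: nothing in the env struct asks for 16-byte alignment. */
struct DemoEnvPlain {
    uint64_t regs[16];              /* 128 bytes, so vregs sits at 0x80 */
    DemoDoubleU vregs[32][2];
};

struct DemoCpuPlain {
    uint64_t parent;                /* pushes env to offset 8 */
    struct DemoEnvPlain env;
};

/*
 * After the fix: the attribute raises the alignment of the member and
 * therefore of every struct that contains it.
 */
struct DemoEnvFixed {
    uint64_t regs[16];
    DemoDoubleU vregs[32][2] DEMO_ALIGNED(16);
};

struct DemoCpuFixed {
    uint64_t parent;
    struct DemoEnvFixed env;        /* compiler pads env out to offset 16 */
};

static_assert(_Alignof(struct DemoEnvPlain) <= 8, "plain env is not 16-aligned");
static_assert(_Alignof(struct DemoEnvFixed) == 16, "attribute propagates outward");

/* Offset of vregs from the start of the CPU object, modulo 16. */
size_t demo_vregs_misalignment(int fixed)
{
    size_t off = fixed
        ? offsetof(struct DemoCpuFixed, env) + offsetof(struct DemoEnvFixed, vregs)
        : offsetof(struct DemoCpuPlain, env) + offsetof(struct DemoEnvPlain, vregs);
    return off % 16;
}
```

With these definitions demo_vregs_misalignment(0) is 8 and
demo_vregs_misalignment(1) is 0. Note that the attribute only fixes the
offsets; the CPU object itself still has to be allocated at a 16-byte aligned
address for the final pointer to be aligned.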

-- 

Thanks,

David / dhildenb



* Re: [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store
  2019-05-28 18:46       ` David Hildenbrand
@ 2019-05-28 21:34         ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2019-05-28 21:34 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: peter.maydell

On 5/28/19 1:46 PM, David Hildenbrand wrote:
> FWIW, this seems to be the easiest way:
> 
> diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
> index f0d9a6a36d..d363ae0fb3 100644
> --- a/target/s390x/cpu.h
> +++ b/target/s390x/cpu.h
> @@ -66,7 +66,7 @@ struct CPUS390XState {
>       * The floating point registers are part of the vector registers.
>       * vregs[0][0] -> vregs[15][0] are 16 floating point registers
>       */
> -    CPU_DoubleU vregs[32][2];  /* vector registers */
> +    CPU_DoubleU vregs[32][2] QEMU_ALIGNED(16);  /* vector registers */
>      uint32_t aregs[16];    /* access registers */
>      uint8_t riccb[64];     /* runtime instrumentation control */
>      uint64_t gscb[4];      /* guarded storage control */
> 
> 
> Makes it work for me again.

That's the right fix, and exactly the bug that I was hoping to find with
11e2bfef7990 ("tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store").


r~



* Re: [Qemu-devel] [PULL 08/16] tcg/i386: Support vector comparison select value
  2019-05-22 22:28 ` [Qemu-devel] [PULL 08/16] tcg/i386: Support vector comparison select value Richard Henderson
@ 2019-05-30 11:26   ` Peter Maydell
  2019-05-30 12:50     ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Maydell @ 2019-05-30 11:26 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Wed, 22 May 2019 at 23:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> We already had backend support for this feature.  Expand the new
> cmpsel opcode using vpblendvb.  The combination allows us to avoid
> an extra NOT for some comparison codes.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/i386/tcg-target.h     |  2 +-
>  tcg/i386/tcg-target.inc.c | 39 +++++++++++++++++++++++++++++++++++----
>  2 files changed, 36 insertions(+), 5 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 16a83a7f7b..928e8b87bb 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -191,7 +191,7 @@ extern bool have_avx2;
>  #define TCG_TARGET_HAS_sat_vec          1
>  #define TCG_TARGET_HAS_minmax_vec       1
>  #define TCG_TARGET_HAS_bitsel_vec       0
> -#define TCG_TARGET_HAS_cmpsel_vec       0
> +#define TCG_TARGET_HAS_cmpsel_vec       -1

This is the only place where we define a TCG_TARGET_HAS_* macro
to something other than 0 or 1, which means that Coverity
complains (CID 1401702) when we use it in a logical boolean expression
  return have_vec && TCG_TARGET_HAS_cmpsel_vec;
later on.

Should it really be -1, or is this a typo for 1 ?

thanks
-- PMM



* Re: [Qemu-devel] [PULL 08/16] tcg/i386: Support vector comparison select value
  2019-05-30 11:26   ` Peter Maydell
@ 2019-05-30 12:50     ` Richard Henderson
  2019-05-30 14:54       ` Aleksandar Markovic
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2019-05-30 12:50 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On 5/30/19 6:26 AM, Peter Maydell wrote:
>> -#define TCG_TARGET_HAS_cmpsel_vec       0
>> +#define TCG_TARGET_HAS_cmpsel_vec       -1
> 
> This is the only place where we define a TCG_TARGET_HAS_* macro
> to something other than 0 or 1, which means that Coverity
> complains (CID 1401702) when we use it in a logical boolean expression
>   return have_vec && TCG_TARGET_HAS_cmpsel_vec;
> later on.
> 
> Should it really be -1, or is this a typo for 1 ?

It really should be -1.
See commit 25c012b4009256505be3430480954a0233de343e,
which contains the rationale.
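
To make the tri-state convention concrete, here is a minimal sketch inferred
only from the title of patch 07/16 ("tcg: Add TCG_OPF_NOT_PRESENT if
TCG_TARGET_HAS_foo is negative") and the boolean test quoted above. All DEMO_*
and demo_* names and the flag value are hypothetical, and the real macros in
QEMU may well be written differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state feature macros, modelled on TCG_TARGET_HAS_*. */
#define DEMO_HAS_native_vec     1   /* backend emits the opcode directly */
#define DEMO_HAS_missing_vec    0   /* feature not available at all */
#define DEMO_HAS_expanded_vec  -1   /* available, but never emitted as an opcode */

/* Hypothetical flag value standing in for TCG_OPF_NOT_PRESENT. */
#define DEMO_OPF_NOT_PRESENT 0x10

static_assert(DEMO_HAS_expanded_vec < 0, "expanded-only ops use a negative value");

/* Any nonzero value, including -1, means the front end may request the op. */
static inline bool demo_can_use(int has)
{
    return has != 0;
}

/* Assumption: only a positive value means the backend has a real opcode. */
static inline int demo_opcode_flags(int has)
{
    return has > 0 ? 0 : DEMO_OPF_NOT_PRESENT;
}
```

Under these assumptions a value of -1 still tests true in an expression such as
"have_vec && TCG_TARGET_HAS_cmpsel_vec", while the opcode itself is flagged as
not present in the backend.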


r~



* Re: [Qemu-devel] [PULL 08/16] tcg/i386: Support vector comparison select value
  2019-05-30 12:50     ` Richard Henderson
@ 2019-05-30 14:54       ` Aleksandar Markovic
  2019-05-30 17:45         ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Aleksandar Markovic @ 2019-05-30 14:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On May 30, 2019 2:50 PM, "Richard Henderson" <richard.henderson@linaro.org>
wrote:
>
> On 5/30/19 6:26 AM, Peter Maydell wrote:
> >> -#define TCG_TARGET_HAS_cmpsel_vec       0
> >> +#define TCG_TARGET_HAS_cmpsel_vec       -1
> >
> > This is the only place where we define a TCG_TARGET_HAS_* macro
> > to something other than 0 or 1, which means that Coverity
> > complains (CID 1401702) when we use it in a logical boolean expression
> >   return have_vec && TCG_TARGET_HAS_cmpsel_vec;
> > later on.
> >
> > Should it really be -1, or is this a typo for 1 ?
>
> It really should be -1.
> See commit 25c012b4009256505be3430480954a0233de343e,
> which contains the rationale.
>

How about extending the commit message so that it explains the -1
introduced in this very patch, so that future developers do not need to
reverse engineer the whole git history to (maybe) find the explanation?

Sincerely,
Aleksandar

>
> r~
>


* Re: [Qemu-devel] [PULL 08/16] tcg/i386: Support vector comparison select value
  2019-05-30 14:54       ` Aleksandar Markovic
@ 2019-05-30 17:45         ` Richard Henderson
  2019-05-30 23:18           ` Aleksandar Markovic
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2019-05-30 17:45 UTC (permalink / raw)
  To: Aleksandar Markovic; +Cc: Peter Maydell, QEMU Developers

On 5/30/19 9:54 AM, Aleksandar Markovic wrote:
> 
> On May 30, 2019 2:50 PM, "Richard Henderson" <richard.henderson@linaro.org
> <mailto:richard.henderson@linaro.org>> wrote:
>>
>> On 5/30/19 6:26 AM, Peter Maydell wrote:
>> >> -#define TCG_TARGET_HAS_cmpsel_vec       0
>> >> +#define TCG_TARGET_HAS_cmpsel_vec       -1
>> >
>> > This is the only place where we define a TCG_TARGET_HAS_* macro
>> > to something other than 0 or 1, which means that Coverity
>> > complains (CID 1401702) when we use it in a logical boolean expression
>> >   return have_vec && TCG_TARGET_HAS_cmpsel_vec;
>> > later on.
>> >
>> > Should it really be -1, or is this a typo for 1 ?
>>
>> It really should be -1.
>> See commit 25c012b4009256505be3430480954a0233de343e,
>> which contains the rationale.
>>
> 
> How about extending the commit message so that it explains the -1
> introduced in this very patch, so that future developers do not need to
> reverse engineer the whole git history to (maybe) find the explanation?

No.

There seems to be no point at which you would stop, and not include the entire
git history of the project into each and every commit message.

I will not be drawn into such a discussion further.


r~



* Re: [Qemu-devel] [PULL 08/16] tcg/i386: Support vector comparison select value
  2019-05-30 17:45         ` Richard Henderson
@ 2019-05-30 23:18           ` Aleksandar Markovic
  0 siblings, 0 replies; 30+ messages in thread
From: Aleksandar Markovic @ 2019-05-30 23:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On May 30, 2019 7:45 PM, "Richard Henderson" <richard.henderson@linaro.org>
wrote:
>
> On 5/30/19 9:54 AM, Aleksandar Markovic wrote:
> >
> > On May 30, 2019 2:50 PM, "Richard Henderson" <richard.henderson@linaro.org
> > <mailto:richard.henderson@linaro.org>> wrote:
> >>
> >> On 5/30/19 6:26 AM, Peter Maydell wrote:
> >> >> -#define TCG_TARGET_HAS_cmpsel_vec       0
> >> >> +#define TCG_TARGET_HAS_cmpsel_vec       -1
> >> >
> >> > This is the only place where we define a TCG_TARGET_HAS_* macro
> >> > to something other than 0 or 1, which means that Coverity
> >> > complains (CID 1401702) when we use it in a logical boolean expression
> >> >   return have_vec && TCG_TARGET_HAS_cmpsel_vec;
> >> > later on.
> >> >
> >> > Should it really be -1, or is this a typo for 1 ?
> >>
> >> It really should be -1.
> >> See commit 25c012b4009256505be3430480954a0233de343e,
> >> which contains the rationale.
> >>
> >
> > How about extending the commit message so that it explains the -1
> > introduced in this very patch, so that future developers do not need to
> > reverse engineer the whole git history to (maybe) find the explanation?
>
> No.
>
> There seems to be no point at which you would stop, and not include the entire
> git history of the project into each and every commit message.
>
> I will not be drawn into such a discussion further.

Your commit messages are often disconnected from the content of the code
change, and sometimes even look like cryptic puzzles. You can do a much
better job there, and look not for what is good and clear for you, but for
what is good and clear for others.

Regards,
Aleksandar

>
> r~


end of thread, other threads:[~2019-05-30 23:19 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
2019-05-22 22:28 [Qemu-devel] [PULL 00/16] tcg queued patches Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 01/16] tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 02/16] tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 03/16] tcg: Add support for vector bitwise select Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 04/16] tcg: Add support for vector compare select Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 05/16] tcg: Introduce do_op3_nofail for vector expansion Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 06/16] tcg: Expand vector minmax using cmp+cmpsel Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 07/16] tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 08/16] tcg/i386: Support vector comparison select value Richard Henderson
2019-05-30 11:26   ` Peter Maydell
2019-05-30 12:50     ` Richard Henderson
2019-05-30 14:54       ` Aleksandar Markovic
2019-05-30 17:45         ` Richard Henderson
2019-05-30 23:18           ` Aleksandar Markovic
2019-05-22 22:28 ` [Qemu-devel] [PULL 09/16] tcg/i386: Remove expansion for missing minmax Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 10/16] tcg/i386: Use umin/umax in expanding unsigned compare Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 11/16] tcg/aarch64: Support vector bitwise select value Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 12/16] tcg/aarch64: Split up is_fimm Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 13/16] tcg/aarch64: Use MVNI in tcg_out_dupi_vec Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 14/16] tcg/aarch64: Build vector immediates with two insns Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 15/16] tcg/aarch64: Allow immediates for vector ORR and BIC Richard Henderson
2019-05-22 22:28 ` [Qemu-devel] [PULL 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store Richard Henderson
2019-05-28 17:28   ` David Hildenbrand
2019-05-28 18:33     ` David Hildenbrand
2019-05-28 18:46       ` David Hildenbrand
2019-05-28 21:34         ` Richard Henderson
2019-05-23  8:17 ` [Qemu-devel] [PULL 00/16] tcg queued patches Aleksandar Markovic
2019-05-23 12:42   ` Richard Henderson
2019-05-24 10:43 ` Peter Maydell
2019-05-28 16:58 ` David Hildenbrand
