qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation
@ 2019-08-21 17:28 Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 01/75] target/i386: Push rex_r into DisasContext Jan Bobek
                   ` (73 more replies)
  0 siblings, 74 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Here comes the next version of my patch series, this time v4; the
previous version can be found at [1].

This version can decode all vector instructions up to AVX2. However,
note that this does not mean all of these instructions are implemented
correctly: to achieve that, the old ad-hoc style helpers will need to
be rewritten into the style used by gvec helpers (initial effort
towards that goal can be seen in patches 65-75). There is also a
number of instructions which have not been previously implemented at
all, for which the helpers need to be written from scratch.

Cheers,
 -Jan

Changes from v3:
  - introduce disas_insn_prefix ([2])
  - rename ck_cpuid to check_cpuid ([3])
  - clarify this series deals with vector instructions only ([4])
  - add a list of instructions per each vector ISA extension ([5])
  - change return type of check_cpuid and insnop_init to bool ([6])

References:
  1. https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg02616.html
  2. https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg02686.html
  3. https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg04120.html
  4. https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg04119.html
  5. https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg02689.html
  6. https://lists.nongnu.org/archive/html/qemu-devel/2019-08/msg02701.html

Jan Bobek (72):
  target/i386: use dflag from DisasContext
  target/i386: use prefix from DisasContext
  target/i386: introduce disas_insn_prefix
  target/i386: use pc_start from DisasContext
  target/i386: make variable b1 const
  target/i386: make variable is_xmm const
  target/i386: add vector register file alignment constraints
  target/i386: introduce gen_sse_ng
  target/i386: introduce CASES_* macros in gen_sse_ng
  target/i386: decode the 0F38/0F3A prefix in gen_sse_ng
  target/i386: introduce aliases for some tcg_gvec operations
  target/i386: introduce function check_cpuid
  target/i386: disable AVX/AVX2 cpuid bitchecks
  target/i386: introduce instruction operand infrastructure
  target/i386: introduce generic operand alias
  target/i386: introduce generic either-or operand
  target/i386: introduce generic load-store operand
  target/i386: introduce tcg register operands
  target/i386: introduce modrm operand
  target/i386: introduce operands for decoding modrm fields
  target/i386: introduce operand for direct-only r/m field
  target/i386: introduce Ib (immediate) operand
  target/i386: introduce M* (memptr) operands
  target/i386: introduce G*, R*, E* (general register) operands
  target/i386: introduce P*, N*, Q* (MMX) operands
  target/i386: introduce H*, L*, V*, U*, W* (SSE/AVX) operands
  target/i386: alias H* operands with the V* operands
  target/i386: introduce code generators
  target/i386: introduce helper-based code generator macros
  target/i386: introduce gvec-based code generator macros
  target/i386: introduce sse-opcode.inc.h
  target/i386: introduce instruction translator macros
  target/i386: introduce MMX translators
  target/i386: introduce MMX code generators
  target/i386: introduce MMX vector instructions to sse-opcode.inc.h
  target/i386: introduce SSE translators
  target/i386: introduce SSE code generators
  target/i386: introduce SSE vector instructions to sse-opcode.inc.h
  target/i386: introduce SSE2 translators
  target/i386: introduce SSE2 code generators
  target/i386: introduce SSE2 vector instructions to sse-opcode.inc.h
  target/i386: introduce SSE3 translators
  target/i386: introduce SSE3 code generators
  target/i386: introduce SSE3 vector instructions to sse-opcode.inc.h
  target/i386: introduce SSSE3 translators
  target/i386: introduce SSSE3 code generators
  target/i386: introduce SSSE3 vector instructions to sse-opcode.inc.h
  target/i386: introduce SSE4.1 translators
  target/i386: introduce SSE4.1 code generators
  target/i386: introduce SSE4.1 vector instructions to sse-opcode.inc.h
  target/i386: introduce SSE4.2 code generators
  target/i386: introduce SSE4.2 vector instructions to sse-opcode.inc.h
  target/i386: introduce AES and PCLMULQDQ translators
  target/i386: introduce AES and PCLMULQDQ code generators
  target/i386: introduce AES and PCLMULQDQ vector instructions to
    sse-opcode.inc.h
  target/i386: introduce AVX translators
  target/i386: introduce AVX code generators
  target/i386: introduce AVX vector instructions to sse-opcode.inc.h
  target/i386: introduce AVX2 translators
  target/i386: introduce AVX2 code generators
  target/i386: introduce AVX2 vector instructions to sse-opcode.inc.h
  target/i386: remove obsoleted helpers
  target/i386: cleanup leftovers in ops_sse_header.h
  target/i386: introduce aliases for helper-based tcg_gen_gvec_*
    functions
  target/i386: convert ps((l,r)l(w,d,q),ra(w,d)) to helpers to gvec
    style
  target/i386: convert pmullw/pmulhw/pmulhuw helpers to gvec style
  target/i386: convert pavgb/pavgw helpers to gvec style
  target/i386: convert pmuludq/pmaddwd helpers to gvec style
  target/i386: convert psadbw helper to gvec style
  target/i386: remove obsoleted helper_mov(l,q)_mm_T0
  target/i386: convert pshuf(w,lw,hw,d),shuf(pd,ps) helpers to gvec
    style
  target/i386: convert pmovmskb/movmskps/movmskpd helpers to gvec style

Richard Henderson (3):
  target/i386: Push rex_r into DisasContext
  target/i386: Push rex_w into DisasContext
  target/i386: Simplify gen_exception arguments

 target/i386/cpu.h            |    6 +-
 target/i386/ops_sse.h        |  812 +++---
 target/i386/ops_sse_header.h |  131 +-
 target/i386/sse-opcode.inc.h | 2069 +++++++++++++++
 target/i386/translate.c      | 4777 ++++++++++++++++++++++++++++++----
 5 files changed, 6842 insertions(+), 953 deletions(-)
 create mode 100644 target/i386/sse-opcode.inc.h

-- 
2.20.1



^ permalink raw reply	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 01/75] target/i386: Push rex_r into DisasContext
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 02/75] target/i386: Push rex_w " Jan Bobek
                   ` (72 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson, Richard Henderson

From: Richard Henderson <rth@twiddle.net>

Treat this value the same as we do for rex_b and rex_x.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/i386/translate.c | 85 +++++++++++++++++++++--------------------
 1 file changed, 44 insertions(+), 41 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 5cd74ad639..3aac84e5b0 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -43,10 +43,12 @@
 #define CODE64(s) ((s)->code64)
 #define REX_X(s) ((s)->rex_x)
 #define REX_B(s) ((s)->rex_b)
+#define REX_R(s) ((s)->rex_r)
 #else
 #define CODE64(s) 0
 #define REX_X(s) 0
 #define REX_B(s) 0
+#define REX_R(s) 0
 #endif
 
 #ifdef TARGET_X86_64
@@ -98,7 +100,7 @@ typedef struct DisasContext {
 #ifdef TARGET_X86_64
     int lma;    /* long mode active */
     int code64; /* 64 bit code segment */
-    int rex_x, rex_b;
+    int rex_x, rex_b, rex_r;
 #endif
     int vex_l;  /* vex vector length */
     int vex_v;  /* vex vvvv register, without 1's complement.  */
@@ -3037,7 +3039,7 @@ static const struct SSEOpHelper_eppi sse_op_table7[256] = {
 };
 
 static void gen_sse(CPUX86State *env, DisasContext *s, int b,
-                    target_ulong pc_start, int rex_r)
+                    target_ulong pc_start)
 {
     int b1, op1_offset, op2_offset, is_xmm, val;
     int modrm, mod, rm, reg;
@@ -3107,8 +3109,9 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 
     modrm = x86_ldub_code(env, s);
     reg = ((modrm >> 3) & 7);
-    if (is_xmm)
-        reg |= rex_r;
+    if (is_xmm) {
+        reg |= REX_R(s);
+    }
     mod = (modrm >> 6) & 3;
     if (sse_fn_epp == SSE_SPECIAL) {
         b |= (b1 << 8);
@@ -3642,7 +3645,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 tcg_gen_ld16u_tl(s->T0, cpu_env,
                                 offsetof(CPUX86State,fpregs[rm].mmx.MMX_W(val)));
             }
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
             gen_op_mov_reg_v(s, ot, reg, s->T0);
             break;
         case 0x1d6: /* movq ea, xmm */
@@ -3686,7 +3689,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                                  offsetof(CPUX86State, fpregs[rm].mmx));
                 gen_helper_pmovmskb_mmx(s->tmp2_i32, cpu_env, s->ptr0);
             }
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
             tcg_gen_extu_i32_tl(cpu_regs[reg], s->tmp2_i32);
             break;
 
@@ -3698,7 +3701,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             }
             modrm = x86_ldub_code(env, s);
             rm = modrm & 7;
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
             mod = (modrm >> 6) & 3;
             if (b1 >= 2) {
                 goto unknown_op;
@@ -3774,7 +3777,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             /* Various integer extensions at 0f 38 f[0-f].  */
             b = modrm | (b1 << 8);
             modrm = x86_ldub_code(env, s);
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
 
             switch (b) {
             case 0x3f0: /* crc32 Gd,Eb */
@@ -4128,7 +4131,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             b = modrm;
             modrm = x86_ldub_code(env, s);
             rm = modrm & 7;
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
             mod = (modrm >> 6) & 3;
             if (b1 >= 2) {
                 goto unknown_op;
@@ -4148,7 +4151,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 rm = (modrm & 7) | REX_B(s);
                 if (mod != 3)
                     gen_lea_modrm(env, s, modrm);
-                reg = ((modrm >> 3) & 7) | rex_r;
+                reg = ((modrm >> 3) & 7) | REX_R(s);
                 val = x86_ldub_code(env, s);
                 switch (b) {
                 case 0x14: /* pextrb */
@@ -4317,7 +4320,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             /* Various integer extensions at 0f 3a f[0-f].  */
             b = modrm | (b1 << 8);
             modrm = x86_ldub_code(env, s);
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
 
             switch (b) {
             case 0x3f0: /* rorx Gy,Ey, Ib */
@@ -4491,14 +4494,15 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     TCGMemOp ot, aflag, dflag;
     int modrm, reg, rm, mod, op, opreg, val;
     target_ulong next_eip, tval;
-    int rex_w, rex_r;
     target_ulong pc_start = s->base.pc_next;
+    int rex_w;
 
     s->pc_start = s->pc = pc_start;
     s->override = -1;
 #ifdef TARGET_X86_64
     s->rex_x = 0;
     s->rex_b = 0;
+    s->rex_r = 0;
     s->x86_64_hregs = false;
 #endif
     s->rip_offset = 0; /* for relative ip address */
@@ -4511,7 +4515,6 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
     prefixes = 0;
     rex_w = -1;
-    rex_r = 0;
 
  next_byte:
     b = x86_ldub_code(env, s);
@@ -4555,9 +4558,9 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (CODE64(s)) {
             /* REX prefix */
             rex_w = (b >> 3) & 1;
-            rex_r = (b & 0x4) << 1;
+            s->rex_r = (b & 0x4) << 1;
             s->rex_x = (b & 0x2) << 2;
-            REX_B(s) = (b & 0x1) << 3;
+            s->rex_b = (b & 0x1) << 3;
             /* select uniform byte register addressing */
             s->x86_64_hregs = true;
             goto next_byte;
@@ -4590,8 +4593,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             if (s->x86_64_hregs) {
                 goto illegal_op;
             }
+            s->rex_r = (~vex2 >> 4) & 8;
 #endif
-            rex_r = (~vex2 >> 4) & 8;
             if (b == 0xc5) {
                 /* 2-byte VEX prefix: RVVVVlpp, implied 0f leading opcode byte */
                 vex3 = vex2;
@@ -4681,7 +4684,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             switch(f) {
             case 0: /* OP Ev, Gv */
                 modrm = x86_ldub_code(env, s);
-                reg = ((modrm >> 3) & 7) | rex_r;
+                reg = ((modrm >> 3) & 7) | REX_R(s);
                 mod = (modrm >> 6) & 3;
                 rm = (modrm & 7) | REX_B(s);
                 if (mod != 3) {
@@ -4703,7 +4706,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             case 1: /* OP Gv, Ev */
                 modrm = x86_ldub_code(env, s);
                 mod = (modrm >> 6) & 3;
-                reg = ((modrm >> 3) & 7) | rex_r;
+                reg = ((modrm >> 3) & 7) | REX_R(s);
                 rm = (modrm & 7) | REX_B(s);
                 if (mod != 3) {
                     gen_lea_modrm(env, s, modrm);
@@ -5123,7 +5126,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         ot = mo_b_d(b, dflag);
 
         modrm = x86_ldub_code(env, s);
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
 
         gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
         gen_op_mov_v_reg(s, ot, s->T1, reg);
@@ -5195,7 +5198,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x6b:
         ot = dflag;
         modrm = x86_ldub_code(env, s);
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
         if (b == 0x69)
             s->rip_offset = insn_const_size(ot);
         else if (b == 0x6b)
@@ -5247,7 +5250,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x1c1: /* xadd Ev, Gv */
         ot = mo_b_d(b, dflag);
         modrm = x86_ldub_code(env, s);
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
         mod = (modrm >> 6) & 3;
         gen_op_mov_v_reg(s, ot, s->T0, reg);
         if (mod == 3) {
@@ -5279,7 +5282,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
             ot = mo_b_d(b, dflag);
             modrm = x86_ldub_code(env, s);
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
             mod = (modrm >> 6) & 3;
             oldv = tcg_temp_new();
             newv = tcg_temp_new();
@@ -5501,7 +5504,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x89: /* mov Gv, Ev */
         ot = mo_b_d(b, dflag);
         modrm = x86_ldub_code(env, s);
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
 
         /* generate a generic store */
         gen_ldst_modrm(env, s, modrm, ot, reg, 1);
@@ -5527,7 +5530,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x8b: /* mov Ev, Gv */
         ot = mo_b_d(b, dflag);
         modrm = x86_ldub_code(env, s);
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
 
         gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
         gen_op_mov_reg_v(s, ot, reg, s->T0);
@@ -5577,7 +5580,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             s_ot = b & 8 ? MO_SIGN | ot : ot;
 
             modrm = x86_ldub_code(env, s);
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
             mod = (modrm >> 6) & 3;
             rm = (modrm & 7) | REX_B(s);
 
@@ -5616,7 +5619,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         mod = (modrm >> 6) & 3;
         if (mod == 3)
             goto illegal_op;
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
         {
             AddressParts a = gen_lea_modrm_0(env, s, modrm);
             TCGv ea = gen_lea_modrm_1(s, a);
@@ -5698,7 +5701,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x87: /* xchg Ev, Gv */
         ot = mo_b_d(b, dflag);
         modrm = x86_ldub_code(env, s);
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
         mod = (modrm >> 6) & 3;
         if (mod == 3) {
             rm = (modrm & 7) | REX_B(s);
@@ -5735,7 +5738,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     do_lxx:
         ot = dflag != MO_16 ? MO_32 : MO_16;
         modrm = x86_ldub_code(env, s);
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
         mod = (modrm >> 6) & 3;
         if (mod == 3)
             goto illegal_op;
@@ -5818,7 +5821,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         modrm = x86_ldub_code(env, s);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
         if (mod != 3) {
             gen_lea_modrm(env, s, modrm);
             opreg = OR_TMP0;
@@ -6669,7 +6672,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         }
         ot = dflag;
         modrm = x86_ldub_code(env, s);
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
         gen_cmovcc1(env, s, ot, b, modrm, reg);
         break;
 
@@ -6819,7 +6822,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     do_btx:
         ot = dflag;
         modrm = x86_ldub_code(env, s);
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
         gen_op_mov_v_reg(s, MO_32, s->T1, reg);
@@ -6924,7 +6927,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x1bd: /* bsr / lzcnt */
         ot = dflag;
         modrm = x86_ldub_code(env, s);
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
         gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
         gen_extu(ot, s->T0);
 
@@ -7686,7 +7689,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             d_ot = dflag;
 
             modrm = x86_ldub_code(env, s);
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
             mod = (modrm >> 6) & 3;
             rm = (modrm & 7) | REX_B(s);
 
@@ -7760,7 +7763,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             ot = dflag != MO_16 ? MO_32 : MO_16;
             modrm = x86_ldub_code(env, s);
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
             gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 0);
             t0 = tcg_temp_local_new();
             gen_update_cc_op(s);
@@ -7801,7 +7804,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         modrm = x86_ldub_code(env, s);
         if (s->flags & HF_MPX_EN_MASK) {
             mod = (modrm >> 6) & 3;
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
             if (prefixes & PREFIX_REPZ) {
                 /* bndcl */
                 if (reg >= 4
@@ -7891,7 +7894,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         modrm = x86_ldub_code(env, s);
         if (s->flags & HF_MPX_EN_MASK) {
             mod = (modrm >> 6) & 3;
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
             if (mod != 3 && (prefixes & PREFIX_REPZ)) {
                 /* bndmk */
                 if (reg >= 4
@@ -8005,7 +8008,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
              * are assumed to be 1's, regardless of actual values.
              */
             rm = (modrm & 7) | REX_B(s);
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
             if (CODE64(s))
                 ot = MO_64;
             else
@@ -8059,7 +8062,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
              * are assumed to be 1's, regardless of actual values.
              */
             rm = (modrm & 7) | REX_B(s);
-            reg = ((modrm >> 3) & 7) | rex_r;
+            reg = ((modrm >> 3) & 7) | REX_R(s);
             if (CODE64(s))
                 ot = MO_64;
             else
@@ -8102,7 +8105,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         mod = (modrm >> 6) & 3;
         if (mod == 3)
             goto illegal_op;
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
         /* generate a generic store */
         gen_ldst_modrm(env, s, modrm, ot, reg, 1);
         break;
@@ -8328,7 +8331,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             goto illegal_op;
 
         modrm = x86_ldub_code(env, s);
-        reg = ((modrm >> 3) & 7) | rex_r;
+        reg = ((modrm >> 3) & 7) | REX_R(s);
 
         if (s->prefix & PREFIX_DATA) {
             ot = MO_16;
@@ -8356,7 +8359,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x1c2:
     case 0x1c4 ... 0x1c6:
     case 0x1d0 ... 0x1fe:
-        gen_sse(env, s, b, pc_start, rex_r);
+        gen_sse(env, s, b, pc_start);
         break;
     default:
         goto unknown_op;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 02/75] target/i386: Push rex_w into DisasContext
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 01/75] target/i386: Push rex_r into DisasContext Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-22  4:07   ` Aleksandar Markovic
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 03/75] target/i386: use dflag from DisasContext Jan Bobek
                   ` (71 subsequent siblings)
  73 siblings, 1 reply; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson, Richard Henderson

From: Richard Henderson <rth@twiddle.net>

Treat this the same as we already do for other rex bits.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/i386/translate.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 3aac84e5b0..9ec1c79371 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -44,11 +44,13 @@
 #define REX_X(s) ((s)->rex_x)
 #define REX_B(s) ((s)->rex_b)
 #define REX_R(s) ((s)->rex_r)
+#define REX_W(s) ((s)->rex_w)
 #else
 #define CODE64(s) 0
 #define REX_X(s) 0
 #define REX_B(s) 0
 #define REX_R(s) 0
+#define REX_W(s) -1
 #endif
 
 #ifdef TARGET_X86_64
@@ -100,7 +102,7 @@ typedef struct DisasContext {
 #ifdef TARGET_X86_64
     int lma;    /* long mode active */
     int code64; /* 64 bit code segment */
-    int rex_x, rex_b, rex_r;
+    int rex_x, rex_b, rex_r, rex_w;
 #endif
     int vex_l;  /* vex vector length */
     int vex_v;  /* vex vvvv register, without 1's complement.  */
@@ -4495,7 +4497,6 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     int modrm, reg, rm, mod, op, opreg, val;
     target_ulong next_eip, tval;
     target_ulong pc_start = s->base.pc_next;
-    int rex_w;
 
     s->pc_start = s->pc = pc_start;
     s->override = -1;
@@ -4503,6 +4504,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     s->rex_x = 0;
     s->rex_b = 0;
     s->rex_r = 0;
+    s->rex_w = -1;
     s->x86_64_hregs = false;
 #endif
     s->rip_offset = 0; /* for relative ip address */
@@ -4514,7 +4516,6 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     }
 
     prefixes = 0;
-    rex_w = -1;
 
  next_byte:
     b = x86_ldub_code(env, s);
@@ -4557,7 +4558,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x40 ... 0x4f:
         if (CODE64(s)) {
             /* REX prefix */
-            rex_w = (b >> 3) & 1;
+            s->rex_w = (b >> 3) & 1;
             s->rex_r = (b & 0x4) << 1;
             s->rex_x = (b & 0x2) << 2;
             s->rex_b = (b & 0x1) << 3;
@@ -4606,7 +4607,9 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 s->rex_b = (~vex2 >> 2) & 8;
 #endif
                 vex3 = x86_ldub_code(env, s);
-                rex_w = (vex3 >> 7) & 1;
+#ifdef TARGET_X86_64
+                s->rex_w = (vex3 >> 7) & 1;
+#endif
                 switch (vex2 & 0x1f) {
                 case 0x01: /* Implied 0f leading opcode bytes.  */
                     b = x86_ldub_code(env, s) | 0x100;
@@ -4631,9 +4634,9 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     /* Post-process prefixes.  */
     if (CODE64(s)) {
         /* In 64-bit mode, the default data size is 32-bit.  Select 64-bit
-           data with rex_w, and 16-bit data with 0x66; rex_w takes precedence
+           data with REX_W, and 16-bit data with 0x66; REX_W takes precedence
            over 0x66 if both are present.  */
-        dflag = (rex_w > 0 ? MO_64 : prefixes & PREFIX_DATA ? MO_16 : MO_32);
+        dflag = (REX_W(s) > 0 ? MO_64 : prefixes & PREFIX_DATA ? MO_16 : MO_32);
         /* In 64-bit mode, 0x67 selects 32-bit addressing.  */
         aflag = (prefixes & PREFIX_ADR ? MO_32 : MO_64);
     } else {
@@ -5029,7 +5032,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 /* operand size for jumps is 64 bit */
                 ot = MO_64;
             } else if (op == 3 || op == 5) {
-                ot = dflag != MO_16 ? MO_32 + (rex_w == 1) : MO_16;
+                ot = dflag != MO_16 ? MO_32 + (REX_W(s) == 1) : MO_16;
             } else if (op == 6) {
                 /* default push size is 64 bit */
                 ot = mo_pushpop(s, dflag);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 03/75] target/i386: use dflag from DisasContext
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 01/75] target/i386: Push rex_r into DisasContext Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 02/75] target/i386: Push rex_w " Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 04/75] target/i386: use prefix " Jan Bobek
                   ` (70 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Jan Bobek, Alex Bennée, Richard Henderson, Richard Henderson

There already is a variable dflag in DisasContext, so use that one
rather than the identically-named local helper variable.

Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 180 ++++++++++++++++++++--------------------
 1 file changed, 90 insertions(+), 90 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 9ec1c79371..7226c67d2e 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4682,7 +4682,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             op = (b >> 3) & 7;
             f = (b >> 1) & 3;
 
-            ot = mo_b_d(b, dflag);
+            ot = mo_b_d(b, s->dflag);
 
             switch(f) {
             case 0: /* OP Ev, Gv */
@@ -4740,7 +4740,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         {
             int val;
 
-            ot = mo_b_d(b, dflag);
+            ot = mo_b_d(b, s->dflag);
 
             modrm = x86_ldub_code(env, s);
             mod = (modrm >> 6) & 3;
@@ -4777,16 +4777,16 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         /**************************/
         /* inc, dec, and other misc arith */
     case 0x40 ... 0x47: /* inc Gv */
-        ot = dflag;
+        ot = s->dflag;
         gen_inc(s, ot, OR_EAX + (b & 7), 1);
         break;
     case 0x48 ... 0x4f: /* dec Gv */
-        ot = dflag;
+        ot = s->dflag;
         gen_inc(s, ot, OR_EAX + (b & 7), -1);
         break;
     case 0xf6: /* GRP3 */
     case 0xf7:
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
 
         modrm = x86_ldub_code(env, s);
         mod = (modrm >> 6) & 3;
@@ -5018,7 +5018,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
     case 0xfe: /* GRP4 */
     case 0xff: /* GRP5 */
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
 
         modrm = x86_ldub_code(env, s);
         mod = (modrm >> 6) & 3;
@@ -5032,10 +5032,10 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 /* operand size for jumps is 64 bit */
                 ot = MO_64;
             } else if (op == 3 || op == 5) {
-                ot = dflag != MO_16 ? MO_32 + (REX_W(s) == 1) : MO_16;
+                ot = s->dflag != MO_16 ? MO_32 + (REX_W(s) == 1) : MO_16;
             } else if (op == 6) {
                 /* default push size is 64 bit */
-                ot = mo_pushpop(s, dflag);
+                ot = mo_pushpop(s, s->dflag);
             }
         }
         if (mod != 3) {
@@ -5063,7 +5063,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             break;
         case 2: /* call Ev */
             /* XXX: optimize if memory (no 'and' is necessary) */
-            if (dflag == MO_16) {
+            if (s->dflag == MO_16) {
                 tcg_gen_ext16u_tl(s->T0, s->T0);
             }
             next_eip = s->pc - s->cs_base;
@@ -5081,19 +5081,19 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             if (s->pe && !s->vm86) {
                 tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
                 gen_helper_lcall_protected(cpu_env, s->tmp2_i32, s->T1,
-                                           tcg_const_i32(dflag - 1),
+                                           tcg_const_i32(s->dflag - 1),
                                            tcg_const_tl(s->pc - s->cs_base));
             } else {
                 tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
                 gen_helper_lcall_real(cpu_env, s->tmp2_i32, s->T1,
-                                      tcg_const_i32(dflag - 1),
+                                      tcg_const_i32(s->dflag - 1),
                                       tcg_const_i32(s->pc - s->cs_base));
             }
             tcg_gen_ld_tl(s->tmp4, cpu_env, offsetof(CPUX86State, eip));
             gen_jr(s, s->tmp4);
             break;
         case 4: /* jmp Ev */
-            if (dflag == MO_16) {
+            if (s->dflag == MO_16) {
                 tcg_gen_ext16u_tl(s->T0, s->T0);
             }
             gen_op_jmp_v(s->T0);
@@ -5126,7 +5126,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
     case 0x84: /* test Ev, Gv */
     case 0x85:
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
 
         modrm = x86_ldub_code(env, s);
         reg = ((modrm >> 3) & 7) | REX_R(s);
@@ -5139,7 +5139,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
     case 0xa8: /* test eAX, Iv */
     case 0xa9:
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
         val = insn_get(env, s, ot);
 
         gen_op_mov_v_reg(s, ot, s->T0, OR_EAX);
@@ -5149,7 +5149,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
 
     case 0x98: /* CWDE/CBW */
-        switch (dflag) {
+        switch (s->dflag) {
 #ifdef TARGET_X86_64
         case MO_64:
             gen_op_mov_v_reg(s, MO_32, s->T0, R_EAX);
@@ -5172,7 +5172,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         }
         break;
     case 0x99: /* CDQ/CWD */
-        switch (dflag) {
+        switch (s->dflag) {
 #ifdef TARGET_X86_64
         case MO_64:
             gen_op_mov_v_reg(s, MO_64, s->T0, R_EAX);
@@ -5199,7 +5199,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x1af: /* imul Gv, Ev */
     case 0x69: /* imul Gv, Ev, I */
     case 0x6b:
-        ot = dflag;
+        ot = s->dflag;
         modrm = x86_ldub_code(env, s);
         reg = ((modrm >> 3) & 7) | REX_R(s);
         if (b == 0x69)
@@ -5251,7 +5251,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0x1c0:
     case 0x1c1: /* xadd Ev, Gv */
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
         modrm = x86_ldub_code(env, s);
         reg = ((modrm >> 3) & 7) | REX_R(s);
         mod = (modrm >> 6) & 3;
@@ -5283,7 +5283,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         {
             TCGv oldv, newv, cmpv;
 
-            ot = mo_b_d(b, dflag);
+            ot = mo_b_d(b, s->dflag);
             modrm = x86_ldub_code(env, s);
             reg = ((modrm >> 3) & 7) | REX_R(s);
             mod = (modrm >> 6) & 3;
@@ -5344,7 +5344,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
 #ifdef TARGET_X86_64
-            if (dflag == MO_64) {
+            if (s->dflag == MO_64) {
                 if (!(s->cpuid_ext_features & CPUID_EXT_CX16)) {
                     goto illegal_op;
                 }
@@ -5384,7 +5384,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             }
             gen_helper_rdrand(s->T0, cpu_env);
             rm = (modrm & 7) | REX_B(s);
-            gen_op_mov_reg_v(s, dflag, rm, s->T0);
+            gen_op_mov_reg_v(s, s->dflag, rm, s->T0);
             set_cc_op(s, CC_OP_EFLAGS);
             if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
                 gen_jmp(s, s->pc - s->cs_base);
@@ -5420,7 +5420,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0x68: /* push Iv */
     case 0x6a:
-        ot = mo_pushpop(s, dflag);
+        ot = mo_pushpop(s, s->dflag);
         if (b == 0x68)
             val = insn_get(env, s, ot);
         else
@@ -5505,7 +5505,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         /* mov */
     case 0x88:
     case 0x89: /* mov Gv, Ev */
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
         modrm = x86_ldub_code(env, s);
         reg = ((modrm >> 3) & 7) | REX_R(s);
 
@@ -5514,7 +5514,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0xc6:
     case 0xc7: /* mov Ev, Iv */
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
         modrm = x86_ldub_code(env, s);
         mod = (modrm >> 6) & 3;
         if (mod != 3) {
@@ -5531,7 +5531,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0x8a:
     case 0x8b: /* mov Ev, Gv */
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
         modrm = x86_ldub_code(env, s);
         reg = ((modrm >> 3) & 7) | REX_R(s);
 
@@ -5563,7 +5563,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (reg >= 6)
             goto illegal_op;
         gen_op_movl_T0_seg(s, reg);
-        ot = mod == 3 ? dflag : MO_16;
+        ot = mod == 3 ? s->dflag : MO_16;
         gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 1);
         break;
 
@@ -5576,7 +5576,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             TCGMemOp s_ot;
 
             /* d_ot is the size of destination */
-            d_ot = dflag;
+            d_ot = s->dflag;
             /* ot is the size of source */
             ot = (b & 1) + MO_8;
             /* s_ot is the sign+size of source */
@@ -5627,7 +5627,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             AddressParts a = gen_lea_modrm_0(env, s, modrm);
             TCGv ea = gen_lea_modrm_1(s, a);
             gen_lea_v_seg(s, s->aflag, ea, -1, -1);
-            gen_op_mov_reg_v(s, dflag, reg, s->A0);
+            gen_op_mov_reg_v(s, s->dflag, reg, s->A0);
         }
         break;
 
@@ -5638,7 +5638,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         {
             target_ulong offset_addr;
 
-            ot = mo_b_d(b, dflag);
+            ot = mo_b_d(b, s->dflag);
             switch (s->aflag) {
 #ifdef TARGET_X86_64
             case MO_64:
@@ -5676,7 +5676,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0xb8 ... 0xbf: /* mov R, Iv */
 #ifdef TARGET_X86_64
-        if (dflag == MO_64) {
+        if (s->dflag == MO_64) {
             uint64_t tmp;
             /* 64 bit case */
             tmp = x86_ldq_code(env, s);
@@ -5686,7 +5686,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         } else
 #endif
         {
-            ot = dflag;
+            ot = s->dflag;
             val = insn_get(env, s, ot);
             reg = (b & 7) | REX_B(s);
             tcg_gen_movi_tl(s->T0, val);
@@ -5696,13 +5696,13 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
     case 0x91 ... 0x97: /* xchg R, EAX */
     do_xchg_reg_eax:
-        ot = dflag;
+        ot = s->dflag;
         reg = (b & 7) | REX_B(s);
         rm = R_EAX;
         goto do_xchg_reg;
     case 0x86:
     case 0x87: /* xchg Ev, Gv */
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
         modrm = x86_ldub_code(env, s);
         reg = ((modrm >> 3) & 7) | REX_R(s);
         mod = (modrm >> 6) & 3;
@@ -5739,7 +5739,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x1b5: /* lgs Gv */
         op = R_GS;
     do_lxx:
-        ot = dflag != MO_16 ? MO_32 : MO_16;
+        ot = s->dflag != MO_16 ? MO_32 : MO_16;
         modrm = x86_ldub_code(env, s);
         reg = ((modrm >> 3) & 7) | REX_R(s);
         mod = (modrm >> 6) & 3;
@@ -5767,7 +5767,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         shift = 2;
     grp2:
         {
-            ot = mo_b_d(b, dflag);
+            ot = mo_b_d(b, s->dflag);
             modrm = x86_ldub_code(env, s);
             mod = (modrm >> 6) & 3;
             op = (modrm >> 3) & 7;
@@ -5820,7 +5820,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         op = 1;
         shift = 0;
     do_shiftd:
-        ot = dflag;
+        ot = s->dflag;
         modrm = x86_ldub_code(env, s);
         mod = (modrm >> 6) & 3;
         rm = (modrm & 7) | REX_B(s);
@@ -5982,7 +5982,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 }
                 break;
             case 0x0c: /* fldenv mem */
-                gen_helper_fldenv(cpu_env, s->A0, tcg_const_i32(dflag - 1));
+                gen_helper_fldenv(cpu_env, s->A0, tcg_const_i32(s->dflag - 1));
                 break;
             case 0x0d: /* fldcw mem */
                 tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
@@ -5990,7 +5990,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 gen_helper_fldcw(cpu_env, s->tmp2_i32);
                 break;
             case 0x0e: /* fnstenv mem */
-                gen_helper_fstenv(cpu_env, s->A0, tcg_const_i32(dflag - 1));
+                gen_helper_fstenv(cpu_env, s->A0, tcg_const_i32(s->dflag - 1));
                 break;
             case 0x0f: /* fnstcw mem */
                 gen_helper_fnstcw(s->tmp2_i32, cpu_env);
@@ -6005,10 +6005,10 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 gen_helper_fpop(cpu_env);
                 break;
             case 0x2c: /* frstor mem */
-                gen_helper_frstor(cpu_env, s->A0, tcg_const_i32(dflag - 1));
+                gen_helper_frstor(cpu_env, s->A0, tcg_const_i32(s->dflag - 1));
                 break;
             case 0x2e: /* fnsave mem */
-                gen_helper_fsave(cpu_env, s->A0, tcg_const_i32(dflag - 1));
+                gen_helper_fsave(cpu_env, s->A0, tcg_const_i32(s->dflag - 1));
                 break;
             case 0x2f: /* fnstsw mem */
                 gen_helper_fnstsw(s->tmp2_i32, cpu_env);
@@ -6350,7 +6350,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
     case 0xa4: /* movsS */
     case 0xa5:
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
         if (prefixes & (PREFIX_REPZ | PREFIX_REPNZ)) {
             gen_repz_movs(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
@@ -6360,7 +6360,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
     case 0xaa: /* stosS */
     case 0xab:
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
         if (prefixes & (PREFIX_REPZ | PREFIX_REPNZ)) {
             gen_repz_stos(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
@@ -6369,7 +6369,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0xac: /* lodsS */
     case 0xad:
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
         if (prefixes & (PREFIX_REPZ | PREFIX_REPNZ)) {
             gen_repz_lods(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
@@ -6378,7 +6378,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0xae: /* scasS */
     case 0xaf:
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
         if (prefixes & PREFIX_REPNZ) {
             gen_repz_scas(s, ot, pc_start - s->cs_base, s->pc - s->cs_base, 1);
         } else if (prefixes & PREFIX_REPZ) {
@@ -6390,7 +6390,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
     case 0xa6: /* cmpsS */
     case 0xa7:
-        ot = mo_b_d(b, dflag);
+        ot = mo_b_d(b, s->dflag);
         if (prefixes & PREFIX_REPNZ) {
             gen_repz_cmps(s, ot, pc_start - s->cs_base, s->pc - s->cs_base, 1);
         } else if (prefixes & PREFIX_REPZ) {
@@ -6401,7 +6401,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0x6c: /* insS */
     case 0x6d:
-        ot = mo_b_d32(b, dflag);
+        ot = mo_b_d32(b, s->dflag);
         tcg_gen_ext16u_tl(s->T0, cpu_regs[R_EDX]);
         gen_check_io(s, ot, pc_start - s->cs_base, 
                      SVM_IOIO_TYPE_MASK | svm_is_rep(prefixes) | 4);
@@ -6416,7 +6416,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0x6e: /* outsS */
     case 0x6f:
-        ot = mo_b_d32(b, dflag);
+        ot = mo_b_d32(b, s->dflag);
         tcg_gen_ext16u_tl(s->T0, cpu_regs[R_EDX]);
         gen_check_io(s, ot, pc_start - s->cs_base,
                      svm_is_rep(prefixes) | 4);
@@ -6435,7 +6435,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
     case 0xe4:
     case 0xe5:
-        ot = mo_b_d32(b, dflag);
+        ot = mo_b_d32(b, s->dflag);
         val = x86_ldub_code(env, s);
         tcg_gen_movi_tl(s->T0, val);
         gen_check_io(s, ot, pc_start - s->cs_base,
@@ -6453,7 +6453,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0xe6:
     case 0xe7:
-        ot = mo_b_d32(b, dflag);
+        ot = mo_b_d32(b, s->dflag);
         val = x86_ldub_code(env, s);
         tcg_gen_movi_tl(s->T0, val);
         gen_check_io(s, ot, pc_start - s->cs_base,
@@ -6473,7 +6473,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0xec:
     case 0xed:
-        ot = mo_b_d32(b, dflag);
+        ot = mo_b_d32(b, s->dflag);
         tcg_gen_ext16u_tl(s->T0, cpu_regs[R_EDX]);
         gen_check_io(s, ot, pc_start - s->cs_base,
                      SVM_IOIO_TYPE_MASK | svm_is_rep(prefixes));
@@ -6490,7 +6490,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0xee:
     case 0xef:
-        ot = mo_b_d32(b, dflag);
+        ot = mo_b_d32(b, s->dflag);
         tcg_gen_ext16u_tl(s->T0, cpu_regs[R_EDX]);
         gen_check_io(s, ot, pc_start - s->cs_base,
                      svm_is_rep(prefixes));
@@ -6533,21 +6533,21 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (s->pe && !s->vm86) {
             gen_update_cc_op(s);
             gen_jmp_im(s, pc_start - s->cs_base);
-            gen_helper_lret_protected(cpu_env, tcg_const_i32(dflag - 1),
+            gen_helper_lret_protected(cpu_env, tcg_const_i32(s->dflag - 1),
                                       tcg_const_i32(val));
         } else {
             gen_stack_A0(s);
             /* pop offset */
-            gen_op_ld_v(s, dflag, s->T0, s->A0);
+            gen_op_ld_v(s, s->dflag, s->T0, s->A0);
             /* NOTE: keeping EIP updated is not a problem in case of
                exception */
             gen_op_jmp_v(s->T0);
             /* pop selector */
-            gen_add_A0_im(s, 1 << dflag);
-            gen_op_ld_v(s, dflag, s->T0, s->A0);
+            gen_add_A0_im(s, 1 << s->dflag);
+            gen_op_ld_v(s, s->dflag, s->T0, s->A0);
             gen_op_movl_seg_T0_vm(s, R_CS);
             /* add stack offset */
-            gen_stack_update(s, val + (2 << dflag));
+            gen_stack_update(s, val + (2 << s->dflag));
         }
         gen_eob(s);
         break;
@@ -6558,17 +6558,17 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         gen_svm_check_intercept(s, pc_start, SVM_EXIT_IRET);
         if (!s->pe) {
             /* real mode */
-            gen_helper_iret_real(cpu_env, tcg_const_i32(dflag - 1));
+            gen_helper_iret_real(cpu_env, tcg_const_i32(s->dflag - 1));
             set_cc_op(s, CC_OP_EFLAGS);
         } else if (s->vm86) {
             if (s->iopl != 3) {
                 gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
             } else {
-                gen_helper_iret_real(cpu_env, tcg_const_i32(dflag - 1));
+                gen_helper_iret_real(cpu_env, tcg_const_i32(s->dflag - 1));
                 set_cc_op(s, CC_OP_EFLAGS);
             }
         } else {
-            gen_helper_iret_protected(cpu_env, tcg_const_i32(dflag - 1),
+            gen_helper_iret_protected(cpu_env, tcg_const_i32(s->dflag - 1),
                                       tcg_const_i32(s->pc - s->cs_base));
             set_cc_op(s, CC_OP_EFLAGS);
         }
@@ -6576,14 +6576,14 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0xe8: /* call im */
         {
-            if (dflag != MO_16) {
+            if (s->dflag != MO_16) {
                 tval = (int32_t)insn_get(env, s, MO_32);
             } else {
                 tval = (int16_t)insn_get(env, s, MO_16);
             }
             next_eip = s->pc - s->cs_base;
             tval += next_eip;
-            if (dflag == MO_16) {
+            if (s->dflag == MO_16) {
                 tval &= 0xffff;
             } else if (!CODE64(s)) {
                 tval &= 0xffffffff;
@@ -6600,7 +6600,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
             if (CODE64(s))
                 goto illegal_op;
-            ot = dflag;
+            ot = s->dflag;
             offset = insn_get(env, s, ot);
             selector = insn_get(env, s, MO_16);
 
@@ -6609,13 +6609,13 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         }
         goto do_lcall;
     case 0xe9: /* jmp im */
-        if (dflag != MO_16) {
+        if (s->dflag != MO_16) {
             tval = (int32_t)insn_get(env, s, MO_32);
         } else {
             tval = (int16_t)insn_get(env, s, MO_16);
         }
         tval += s->pc - s->cs_base;
-        if (dflag == MO_16) {
+        if (s->dflag == MO_16) {
             tval &= 0xffff;
         } else if (!CODE64(s)) {
             tval &= 0xffffffff;
@@ -6629,7 +6629,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
             if (CODE64(s))
                 goto illegal_op;
-            ot = dflag;
+            ot = s->dflag;
             offset = insn_get(env, s, ot);
             selector = insn_get(env, s, MO_16);
 
@@ -6640,7 +6640,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xeb: /* jmp Jb */
         tval = (int8_t)insn_get(env, s, MO_8);
         tval += s->pc - s->cs_base;
-        if (dflag == MO_16) {
+        if (s->dflag == MO_16) {
             tval &= 0xffff;
         }
         gen_jmp(s, tval);
@@ -6649,7 +6649,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         tval = (int8_t)insn_get(env, s, MO_8);
         goto do_jcc;
     case 0x180 ... 0x18f: /* jcc Jv */
-        if (dflag != MO_16) {
+        if (s->dflag != MO_16) {
             tval = (int32_t)insn_get(env, s, MO_32);
         } else {
             tval = (int16_t)insn_get(env, s, MO_16);
@@ -6657,7 +6657,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     do_jcc:
         next_eip = s->pc - s->cs_base;
         tval += next_eip;
-        if (dflag == MO_16) {
+        if (s->dflag == MO_16) {
             tval &= 0xffff;
         }
         gen_bnd_jmp(s);
@@ -6673,7 +6673,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (!(s->cpuid_features & CPUID_CMOV)) {
             goto illegal_op;
         }
-        ot = dflag;
+        ot = s->dflag;
         modrm = x86_ldub_code(env, s);
         reg = ((modrm >> 3) & 7) | REX_R(s);
         gen_cmovcc1(env, s, ot, b, modrm, reg);
@@ -6698,7 +6698,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         } else {
             ot = gen_pop_T0(s);
             if (s->cpl == 0) {
-                if (dflag != MO_16) {
+                if (s->dflag != MO_16) {
                     gen_helper_write_eflags(cpu_env, s->T0,
                                             tcg_const_i32((TF_MASK | AC_MASK |
                                                            ID_MASK | NT_MASK |
@@ -6713,7 +6713,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 }
             } else {
                 if (s->cpl <= s->iopl) {
-                    if (dflag != MO_16) {
+                    if (s->dflag != MO_16) {
                         gen_helper_write_eflags(cpu_env, s->T0,
                                                 tcg_const_i32((TF_MASK |
                                                                AC_MASK |
@@ -6730,7 +6730,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                                                               & 0xffff));
                     }
                 } else {
-                    if (dflag != MO_16) {
+                    if (s->dflag != MO_16) {
                         gen_helper_write_eflags(cpu_env, s->T0,
                                            tcg_const_i32((TF_MASK | AC_MASK |
                                                           ID_MASK | NT_MASK)));
@@ -6790,7 +6790,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         /************************/
         /* bit operations */
     case 0x1ba: /* bt/bts/btr/btc Gv, im */
-        ot = dflag;
+        ot = s->dflag;
         modrm = x86_ldub_code(env, s);
         op = (modrm >> 3) & 7;
         mod = (modrm >> 6) & 3;
@@ -6823,7 +6823,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x1bb: /* btc */
         op = 3;
     do_btx:
-        ot = dflag;
+        ot = s->dflag;
         modrm = x86_ldub_code(env, s);
         reg = ((modrm >> 3) & 7) | REX_R(s);
         mod = (modrm >> 6) & 3;
@@ -6928,7 +6928,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0x1bc: /* bsf / tzcnt */
     case 0x1bd: /* bsr / lzcnt */
-        ot = dflag;
+        ot = s->dflag;
         modrm = x86_ldub_code(env, s);
         reg = ((modrm >> 3) & 7) | REX_R(s);
         gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
@@ -7102,7 +7102,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x62: /* bound */
         if (CODE64(s))
             goto illegal_op;
-        ot = dflag;
+        ot = s->dflag;
         modrm = x86_ldub_code(env, s);
         reg = (modrm >> 3) & 7;
         mod = (modrm >> 6) & 3;
@@ -7120,7 +7120,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x1c8 ... 0x1cf: /* bswap reg */
         reg = (b & 7) | REX_B(s);
 #ifdef TARGET_X86_64
-        if (dflag == MO_64) {
+        if (s->dflag == MO_64) {
             gen_op_mov_v_reg(s, MO_64, s->T0, reg);
             tcg_gen_bswap64_i64(s->T0, s->T0);
             gen_op_mov_reg_v(s, MO_64, reg, s->T0);
@@ -7150,7 +7150,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             tval = (int8_t)insn_get(env, s, MO_8);
             next_eip = s->pc - s->cs_base;
             tval += next_eip;
-            if (dflag == MO_16) {
+            if (s->dflag == MO_16) {
                 tval &= 0xffff;
             }
 
@@ -7233,7 +7233,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (!s->pe) {
             gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
         } else {
-            gen_helper_sysexit(cpu_env, tcg_const_i32(dflag - 1));
+            gen_helper_sysexit(cpu_env, tcg_const_i32(s->dflag - 1));
             gen_eob(s);
         }
         break;
@@ -7252,7 +7252,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (!s->pe) {
             gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
         } else {
-            gen_helper_sysret(cpu_env, tcg_const_i32(dflag - 1));
+            gen_helper_sysret(cpu_env, tcg_const_i32(s->dflag - 1));
             /* condition codes are modified only in long mode */
             if (s->lma) {
                 set_cc_op(s, CC_OP_EFLAGS);
@@ -7291,7 +7291,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             gen_svm_check_intercept(s, pc_start, SVM_EXIT_LDTR_READ);
             tcg_gen_ld32u_tl(s->T0, cpu_env,
                              offsetof(CPUX86State, ldt.selector));
-            ot = mod == 3 ? dflag : MO_16;
+            ot = mod == 3 ? s->dflag : MO_16;
             gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 1);
             break;
         case 2: /* lldt */
@@ -7312,7 +7312,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             gen_svm_check_intercept(s, pc_start, SVM_EXIT_TR_READ);
             tcg_gen_ld32u_tl(s->T0, cpu_env,
                              offsetof(CPUX86State, tr.selector));
-            ot = mod == 3 ? dflag : MO_16;
+            ot = mod == 3 ? s->dflag : MO_16;
             gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 1);
             break;
         case 3: /* ltr */
@@ -7356,7 +7356,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             gen_op_st_v(s, MO_16, s->T0, s->A0);
             gen_add_A0_im(s, 2);
             tcg_gen_ld_tl(s->T0, cpu_env, offsetof(CPUX86State, gdt.base));
-            if (dflag == MO_16) {
+            if (s->dflag == MO_16) {
                 tcg_gen_andi_tl(s->T0, s->T0, 0xffffff);
             }
             gen_op_st_v(s, CODE64(s) + MO_32, s->T0, s->A0);
@@ -7411,7 +7411,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             gen_op_st_v(s, MO_16, s->T0, s->A0);
             gen_add_A0_im(s, 2);
             tcg_gen_ld_tl(s->T0, cpu_env, offsetof(CPUX86State, idt.base));
-            if (dflag == MO_16) {
+            if (s->dflag == MO_16) {
                 tcg_gen_andi_tl(s->T0, s->T0, 0xffffff);
             }
             gen_op_st_v(s, CODE64(s) + MO_32, s->T0, s->A0);
@@ -7561,7 +7561,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             gen_op_ld_v(s, MO_16, s->T1, s->A0);
             gen_add_A0_im(s, 2);
             gen_op_ld_v(s, CODE64(s) + MO_32, s->T0, s->A0);
-            if (dflag == MO_16) {
+            if (s->dflag == MO_16) {
                 tcg_gen_andi_tl(s->T0, s->T0, 0xffffff);
             }
             tcg_gen_st_tl(s->T0, cpu_env, offsetof(CPUX86State, gdt.base));
@@ -7578,7 +7578,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             gen_op_ld_v(s, MO_16, s->T1, s->A0);
             gen_add_A0_im(s, 2);
             gen_op_ld_v(s, CODE64(s) + MO_32, s->T0, s->A0);
-            if (dflag == MO_16) {
+            if (s->dflag == MO_16) {
                 tcg_gen_andi_tl(s->T0, s->T0, 0xffffff);
             }
             tcg_gen_st_tl(s->T0, cpu_env, offsetof(CPUX86State, idt.base));
@@ -7689,7 +7689,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (CODE64(s)) {
             int d_ot;
             /* d_ot is the size of destination */
-            d_ot = dflag;
+            d_ot = s->dflag;
 
             modrm = x86_ldub_code(env, s);
             reg = ((modrm >> 3) & 7) | REX_R(s);
@@ -7764,7 +7764,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             TCGv t0;
             if (!s->pe || s->vm86)
                 goto illegal_op;
-            ot = dflag != MO_16 ? MO_32 : MO_16;
+            ot = s->dflag != MO_16 ? MO_32 : MO_16;
             modrm = x86_ldub_code(env, s);
             reg = ((modrm >> 3) & 7) | REX_R(s);
             gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 0);
@@ -8103,7 +8103,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x1c3: /* MOVNTI reg, mem */
         if (!(s->cpuid_features & CPUID_SSE2))
             goto illegal_op;
-        ot = mo_64_32(dflag);
+        ot = mo_64_32(s->dflag);
         modrm = x86_ldub_code(env, s);
         mod = (modrm >> 6) & 3;
         if (mod == 3)
@@ -8339,7 +8339,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (s->prefix & PREFIX_DATA) {
             ot = MO_16;
         } else {
-            ot = mo_64_32(dflag);
+            ot = mo_64_32(s->dflag);
         }
 
         gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 04/75] target/i386: use prefix from DisasContext
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (2 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 03/75] target/i386: use dflag from DisasContext Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 05/75] target/i386: introduce disas_insn_prefix Jan Bobek
                   ` (69 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Jan Bobek, Alex Bennée, Richard Henderson, Richard Henderson

Use prefix from DisasContext instead of the local helper variable
prefixes.

Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 110 ++++++++++++++++++++--------------------
 1 file changed, 55 insertions(+), 55 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 7226c67d2e..99a9097c49 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -6351,7 +6351,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xa4: /* movsS */
     case 0xa5:
         ot = mo_b_d(b, s->dflag);
-        if (prefixes & (PREFIX_REPZ | PREFIX_REPNZ)) {
+        if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
             gen_repz_movs(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
             gen_movs(s, ot);
@@ -6361,7 +6361,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xaa: /* stosS */
     case 0xab:
         ot = mo_b_d(b, s->dflag);
-        if (prefixes & (PREFIX_REPZ | PREFIX_REPNZ)) {
+        if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
             gen_repz_stos(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
             gen_stos(s, ot);
@@ -6370,7 +6370,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xac: /* lodsS */
     case 0xad:
         ot = mo_b_d(b, s->dflag);
-        if (prefixes & (PREFIX_REPZ | PREFIX_REPNZ)) {
+        if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
             gen_repz_lods(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
             gen_lods(s, ot);
@@ -6379,9 +6379,9 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xae: /* scasS */
     case 0xaf:
         ot = mo_b_d(b, s->dflag);
-        if (prefixes & PREFIX_REPNZ) {
+        if (s->prefix & PREFIX_REPNZ) {
             gen_repz_scas(s, ot, pc_start - s->cs_base, s->pc - s->cs_base, 1);
-        } else if (prefixes & PREFIX_REPZ) {
+        } else if (s->prefix & PREFIX_REPZ) {
             gen_repz_scas(s, ot, pc_start - s->cs_base, s->pc - s->cs_base, 0);
         } else {
             gen_scas(s, ot);
@@ -6391,9 +6391,9 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xa6: /* cmpsS */
     case 0xa7:
         ot = mo_b_d(b, s->dflag);
-        if (prefixes & PREFIX_REPNZ) {
+        if (s->prefix & PREFIX_REPNZ) {
             gen_repz_cmps(s, ot, pc_start - s->cs_base, s->pc - s->cs_base, 1);
-        } else if (prefixes & PREFIX_REPZ) {
+        } else if (s->prefix & PREFIX_REPZ) {
             gen_repz_cmps(s, ot, pc_start - s->cs_base, s->pc - s->cs_base, 0);
         } else {
             gen_cmps(s, ot);
@@ -6404,8 +6404,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         ot = mo_b_d32(b, s->dflag);
         tcg_gen_ext16u_tl(s->T0, cpu_regs[R_EDX]);
         gen_check_io(s, ot, pc_start - s->cs_base, 
-                     SVM_IOIO_TYPE_MASK | svm_is_rep(prefixes) | 4);
-        if (prefixes & (PREFIX_REPZ | PREFIX_REPNZ)) {
+                     SVM_IOIO_TYPE_MASK | svm_is_rep(s->prefix) | 4);
+        if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
             gen_repz_ins(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
             gen_ins(s, ot);
@@ -6419,8 +6419,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         ot = mo_b_d32(b, s->dflag);
         tcg_gen_ext16u_tl(s->T0, cpu_regs[R_EDX]);
         gen_check_io(s, ot, pc_start - s->cs_base,
-                     svm_is_rep(prefixes) | 4);
-        if (prefixes & (PREFIX_REPZ | PREFIX_REPNZ)) {
+                     svm_is_rep(s->prefix) | 4);
+        if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
             gen_repz_outs(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
             gen_outs(s, ot);
@@ -6439,7 +6439,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         val = x86_ldub_code(env, s);
         tcg_gen_movi_tl(s->T0, val);
         gen_check_io(s, ot, pc_start - s->cs_base,
-                     SVM_IOIO_TYPE_MASK | svm_is_rep(prefixes));
+                     SVM_IOIO_TYPE_MASK | svm_is_rep(s->prefix));
         if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
             gen_io_start();
         }
@@ -6457,7 +6457,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         val = x86_ldub_code(env, s);
         tcg_gen_movi_tl(s->T0, val);
         gen_check_io(s, ot, pc_start - s->cs_base,
-                     svm_is_rep(prefixes));
+                     svm_is_rep(s->prefix));
         gen_op_mov_v_reg(s, ot, s->T1, R_EAX);
 
         if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
@@ -6476,7 +6476,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         ot = mo_b_d32(b, s->dflag);
         tcg_gen_ext16u_tl(s->T0, cpu_regs[R_EDX]);
         gen_check_io(s, ot, pc_start - s->cs_base,
-                     SVM_IOIO_TYPE_MASK | svm_is_rep(prefixes));
+                     SVM_IOIO_TYPE_MASK | svm_is_rep(s->prefix));
         if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
             gen_io_start();
         }
@@ -6493,7 +6493,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         ot = mo_b_d32(b, s->dflag);
         tcg_gen_ext16u_tl(s->T0, cpu_regs[R_EDX]);
         gen_check_io(s, ot, pc_start - s->cs_base,
-                     svm_is_rep(prefixes));
+                     svm_is_rep(s->prefix));
         gen_op_mov_v_reg(s, ot, s->T1, R_EAX);
 
         if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
@@ -6935,7 +6935,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         gen_extu(ot, s->T0);
 
         /* Note that lzcnt and tzcnt are in different extensions.  */
-        if ((prefixes & PREFIX_REPZ)
+        if ((s->prefix & PREFIX_REPZ)
             && (b & 1
                 ? s->cpuid_ext3_features & CPUID_EXT3_ABM
                 : s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI1)) {
@@ -7028,14 +7028,14 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         /* misc */
     case 0x90: /* nop */
         /* XXX: correct lock test for all insn */
-        if (prefixes & PREFIX_LOCK) {
+        if (s->prefix & PREFIX_LOCK) {
             goto illegal_op;
         }
         /* If REX_B is set, then this is xchg eax, r8d, not a nop.  */
         if (REX_B(s)) {
             goto do_xchg_reg_eax;
         }
-        if (prefixes & PREFIX_REPZ) {
+        if (s->prefix & PREFIX_REPZ) {
             gen_update_cc_op(s);
             gen_jmp_im(s, pc_start - s->cs_base);
             gen_helper_pause(cpu_env, tcg_const_i32(s->pc - pc_start));
@@ -7597,7 +7597,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 1);
             break;
         case 0xee: /* rdpkru */
-            if (prefixes & PREFIX_LOCK) {
+            if (s->prefix & PREFIX_LOCK) {
                 goto illegal_op;
             }
             tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
@@ -7605,7 +7605,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], s->tmp1_i64);
             break;
         case 0xef: /* wrpkru */
-            if (prefixes & PREFIX_LOCK) {
+            if (s->prefix & PREFIX_LOCK) {
                 goto illegal_op;
             }
             tcg_gen_concat_tl_i64(s->tmp1_i64, cpu_regs[R_EAX],
@@ -7808,18 +7808,18 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (s->flags & HF_MPX_EN_MASK) {
             mod = (modrm >> 6) & 3;
             reg = ((modrm >> 3) & 7) | REX_R(s);
-            if (prefixes & PREFIX_REPZ) {
+            if (s->prefix & PREFIX_REPZ) {
                 /* bndcl */
                 if (reg >= 4
-                    || (prefixes & PREFIX_LOCK)
+                    || (s->prefix & PREFIX_LOCK)
                     || s->aflag == MO_16) {
                     goto illegal_op;
                 }
                 gen_bndck(env, s, modrm, TCG_COND_LTU, cpu_bndl[reg]);
-            } else if (prefixes & PREFIX_REPNZ) {
+            } else if (s->prefix & PREFIX_REPNZ) {
                 /* bndcu */
                 if (reg >= 4
-                    || (prefixes & PREFIX_LOCK)
+                    || (s->prefix & PREFIX_LOCK)
                     || s->aflag == MO_16) {
                     goto illegal_op;
                 }
@@ -7827,14 +7827,14 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 tcg_gen_not_i64(notu, cpu_bndu[reg]);
                 gen_bndck(env, s, modrm, TCG_COND_GTU, notu);
                 tcg_temp_free_i64(notu);
-            } else if (prefixes & PREFIX_DATA) {
+            } else if (s->prefix & PREFIX_DATA) {
                 /* bndmov -- from reg/mem */
                 if (reg >= 4 || s->aflag == MO_16) {
                     goto illegal_op;
                 }
                 if (mod == 3) {
                     int reg2 = (modrm & 7) | REX_B(s);
-                    if (reg2 >= 4 || (prefixes & PREFIX_LOCK)) {
+                    if (reg2 >= 4 || (s->prefix & PREFIX_LOCK)) {
                         goto illegal_op;
                     }
                     if (s->flags & HF_MPX_IU_MASK) {
@@ -7863,7 +7863,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 /* bndldx */
                 AddressParts a = gen_lea_modrm_0(env, s, modrm);
                 if (reg >= 4
-                    || (prefixes & PREFIX_LOCK)
+                    || (s->prefix & PREFIX_LOCK)
                     || s->aflag == MO_16
                     || a.base < -1) {
                     goto illegal_op;
@@ -7898,10 +7898,10 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (s->flags & HF_MPX_EN_MASK) {
             mod = (modrm >> 6) & 3;
             reg = ((modrm >> 3) & 7) | REX_R(s);
-            if (mod != 3 && (prefixes & PREFIX_REPZ)) {
+            if (mod != 3 && (s->prefix & PREFIX_REPZ)) {
                 /* bndmk */
                 if (reg >= 4
-                    || (prefixes & PREFIX_LOCK)
+                    || (s->prefix & PREFIX_LOCK)
                     || s->aflag == MO_16) {
                     goto illegal_op;
                 }
@@ -7926,22 +7926,22 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 /* bnd registers are now in-use */
                 gen_set_hflag(s, HF_MPX_IU_MASK);
                 break;
-            } else if (prefixes & PREFIX_REPNZ) {
+            } else if (s->prefix & PREFIX_REPNZ) {
                 /* bndcn */
                 if (reg >= 4
-                    || (prefixes & PREFIX_LOCK)
+                    || (s->prefix & PREFIX_LOCK)
                     || s->aflag == MO_16) {
                     goto illegal_op;
                 }
                 gen_bndck(env, s, modrm, TCG_COND_GTU, cpu_bndu[reg]);
-            } else if (prefixes & PREFIX_DATA) {
+            } else if (s->prefix & PREFIX_DATA) {
                 /* bndmov -- to reg/mem */
                 if (reg >= 4 || s->aflag == MO_16) {
                     goto illegal_op;
                 }
                 if (mod == 3) {
                     int reg2 = (modrm & 7) | REX_B(s);
-                    if (reg2 >= 4 || (prefixes & PREFIX_LOCK)) {
+                    if (reg2 >= 4 || (s->prefix & PREFIX_LOCK)) {
                         goto illegal_op;
                     }
                     if (s->flags & HF_MPX_IU_MASK) {
@@ -7968,7 +7968,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 /* bndstx */
                 AddressParts a = gen_lea_modrm_0(env, s, modrm);
                 if (reg >= 4
-                    || (prefixes & PREFIX_LOCK)
+                    || (s->prefix & PREFIX_LOCK)
                     || s->aflag == MO_16
                     || a.base < -1) {
                     goto illegal_op;
@@ -8016,7 +8016,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 ot = MO_64;
             else
                 ot = MO_32;
-            if ((prefixes & PREFIX_LOCK) && (reg == 0) &&
+            if ((s->prefix & PREFIX_LOCK) && (reg == 0) &&
                 (s->cpuid_ext3_features & CPUID_EXT3_CR8LEG)) {
                 reg = 8;
             }
@@ -8117,7 +8117,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         switch (modrm) {
         CASE_MODRM_MEM_OP(0): /* fxsave */
             if (!(s->cpuid_features & CPUID_FXSR)
-                || (prefixes & PREFIX_LOCK)) {
+                || (s->prefix & PREFIX_LOCK)) {
                 goto illegal_op;
             }
             if ((s->flags & HF_EM_MASK) || (s->flags & HF_TS_MASK)) {
@@ -8130,7 +8130,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
         CASE_MODRM_MEM_OP(1): /* fxrstor */
             if (!(s->cpuid_features & CPUID_FXSR)
-                || (prefixes & PREFIX_LOCK)) {
+                || (s->prefix & PREFIX_LOCK)) {
                 goto illegal_op;
             }
             if ((s->flags & HF_EM_MASK) || (s->flags & HF_TS_MASK)) {
@@ -8169,8 +8169,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
         CASE_MODRM_MEM_OP(4): /* xsave */
             if ((s->cpuid_ext_features & CPUID_EXT_XSAVE) == 0
-                || (prefixes & (PREFIX_LOCK | PREFIX_DATA
-                                | PREFIX_REPZ | PREFIX_REPNZ))) {
+                || (s->prefix & (PREFIX_LOCK | PREFIX_DATA
+                                 | PREFIX_REPZ | PREFIX_REPNZ))) {
                 goto illegal_op;
             }
             gen_lea_modrm(env, s, modrm);
@@ -8181,8 +8181,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
         CASE_MODRM_MEM_OP(5): /* xrstor */
             if ((s->cpuid_ext_features & CPUID_EXT_XSAVE) == 0
-                || (prefixes & (PREFIX_LOCK | PREFIX_DATA
-                                | PREFIX_REPZ | PREFIX_REPNZ))) {
+                || (s->prefix & (PREFIX_LOCK | PREFIX_DATA
+                                 | PREFIX_REPZ | PREFIX_REPNZ))) {
                 goto illegal_op;
             }
             gen_lea_modrm(env, s, modrm);
@@ -8197,10 +8197,10 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             break;
 
         CASE_MODRM_MEM_OP(6): /* xsaveopt / clwb */
-            if (prefixes & PREFIX_LOCK) {
+            if (s->prefix & PREFIX_LOCK) {
                 goto illegal_op;
             }
-            if (prefixes & PREFIX_DATA) {
+            if (s->prefix & PREFIX_DATA) {
                 /* clwb */
                 if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_CLWB)) {
                     goto illegal_op;
@@ -8210,7 +8210,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 /* xsaveopt */
                 if ((s->cpuid_ext_features & CPUID_EXT_XSAVE) == 0
                     || (s->cpuid_xsave_features & CPUID_XSAVE_XSAVEOPT) == 0
-                    || (prefixes & (PREFIX_REPZ | PREFIX_REPNZ))) {
+                    || (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ))) {
                     goto illegal_op;
                 }
                 gen_lea_modrm(env, s, modrm);
@@ -8221,10 +8221,10 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             break;
 
         CASE_MODRM_MEM_OP(7): /* clflush / clflushopt */
-            if (prefixes & PREFIX_LOCK) {
+            if (s->prefix & PREFIX_LOCK) {
                 goto illegal_op;
             }
-            if (prefixes & PREFIX_DATA) {
+            if (s->prefix & PREFIX_DATA) {
                 /* clflushopt */
                 if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_CLFLUSHOPT)) {
                     goto illegal_op;
@@ -8244,8 +8244,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         case 0xd0 ... 0xd7: /* wrfsbase (f3 0f ae /2) */
         case 0xd8 ... 0xdf: /* wrgsbase (f3 0f ae /3) */
             if (CODE64(s)
-                && (prefixes & PREFIX_REPZ)
-                && !(prefixes & PREFIX_LOCK)
+                && (s->prefix & PREFIX_REPZ)
+                && !(s->prefix & PREFIX_LOCK)
                 && (s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_FSGSBASE)) {
                 TCGv base, treg, src, dst;
 
@@ -8274,10 +8274,10 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             goto unknown_op;
 
         case 0xf8: /* sfence / pcommit */
-            if (prefixes & PREFIX_DATA) {
+            if (s->prefix & PREFIX_DATA) {
                 /* pcommit */
                 if (!(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_PCOMMIT)
-                    || (prefixes & PREFIX_LOCK)) {
+                    || (s->prefix & PREFIX_LOCK)) {
                     goto illegal_op;
                 }
                 break;
@@ -8285,21 +8285,21 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             /* fallthru */
         case 0xf9 ... 0xff: /* sfence */
             if (!(s->cpuid_features & CPUID_SSE)
-                || (prefixes & PREFIX_LOCK)) {
+                || (s->prefix & PREFIX_LOCK)) {
                 goto illegal_op;
             }
             tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
             break;
         case 0xe8 ... 0xef: /* lfence */
             if (!(s->cpuid_features & CPUID_SSE)
-                || (prefixes & PREFIX_LOCK)) {
+                || (s->prefix & PREFIX_LOCK)) {
                 goto illegal_op;
             }
             tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
             break;
         case 0xf0 ... 0xf7: /* mfence */
             if (!(s->cpuid_features & CPUID_SSE2)
-                || (prefixes & PREFIX_LOCK)) {
+                || (s->prefix & PREFIX_LOCK)) {
                 goto illegal_op;
             }
             tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
@@ -8327,8 +8327,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         gen_eob(s);
         break;
     case 0x1b8: /* SSE4.2 popcnt */
-        if ((prefixes & (PREFIX_REPZ | PREFIX_LOCK | PREFIX_REPNZ)) !=
-             PREFIX_REPZ)
+        if ((s->prefix & (PREFIX_REPZ | PREFIX_LOCK | PREFIX_REPNZ)) !=
+            PREFIX_REPZ)
             goto illegal_op;
         if (!(s->cpuid_ext_features & CPUID_EXT_POPCNT))
             goto illegal_op;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 05/75] target/i386: introduce disas_insn_prefix
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (3 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 04/75] target/i386: use prefix " Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 06/75] target/i386: Simplify gen_exception arguments Jan Bobek
                   ` (68 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Move the code for decoding an instruction prefix into a separate
function.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 48 +++++++++++++++++++++++++++++------------
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 99a9097c49..410aa89c75 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4486,19 +4486,11 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
     }
 }
 
-/* convert one instruction. s->base.is_jmp is set if the translation must
-   be stopped. Return the next pc value */
-static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
+static int disas_insn_prefix(DisasContext *s, CPUX86State *env)
 {
-    CPUX86State *env = cpu->env_ptr;
     int b, prefixes;
-    int shift;
-    TCGMemOp ot, aflag, dflag;
-    int modrm, reg, rm, mod, op, opreg, val;
-    target_ulong next_eip, tval;
-    target_ulong pc_start = s->base.pc_next;
+    TCGMemOp aflag, dflag;
 
-    s->pc_start = s->pc = pc_start;
     s->override = -1;
 #ifdef TARGET_X86_64
     s->rex_x = 0;
@@ -4510,10 +4502,6 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     s->rip_offset = 0; /* for relative ip address */
     s->vex_l = 0;
     s->vex_v = 0;
-    if (sigsetjmp(s->jmpbuf, 0) != 0) {
-        gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
-        return s->pc;
-    }
 
     prefixes = 0;
 
@@ -4657,6 +4645,38 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     s->prefix = prefixes;
     s->aflag = aflag;
     s->dflag = dflag;
+    return b;
+illegal_op:
+    gen_illegal_opcode(s);
+    return -1;
+unknown_op:
+    gen_unknown_opcode(env, s);
+    return -1;
+}
+
+/*
+ * convert one instruction. s->base.is_jmp is set if the translation must
+ * be stopped. Return the next pc value.
+ */
+static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
+{
+    CPUX86State *env = cpu->env_ptr;
+    int b, shift;
+    TCGMemOp ot;
+    int modrm, reg, rm, mod, op, opreg, val;
+    target_ulong next_eip, tval;
+    target_ulong pc_start = s->base.pc_next;
+
+    s->pc_start = s->pc = pc_start;
+    if (sigsetjmp(s->jmpbuf, 0) != 0) {
+        gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+        return s->pc;
+    }
+
+    b = disas_insn_prefix(s, env);
+    if (b < 0) {
+        return s->pc;
+    }
 
     /* now check op code */
  reswitch:
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 06/75] target/i386: Simplify gen_exception arguments
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (4 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 05/75] target/i386: introduce disas_insn_prefix Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 07/75] target/i386: use pc_start from DisasContext Jan Bobek
                   ` (67 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Richard Henderson, Richard Henderson

From: Richard Henderson <rth@twiddle.net>

We can compute cur_eip from values present within DisasContext.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target/i386/translate.c | 89 ++++++++++++++++++++---------------------
 1 file changed, 44 insertions(+), 45 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 410aa89c75..b067323962 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -1272,10 +1272,10 @@ static void gen_helper_fp_arith_STN_ST0(int op, int opreg)
     }
 }
 
-static void gen_exception(DisasContext *s, int trapno, target_ulong cur_eip)
+static void gen_exception(DisasContext *s, int trapno)
 {
     gen_update_cc_op(s);
-    gen_jmp_im(s, cur_eip);
+    gen_jmp_im(s, s->pc_start - s->cs_base);
     gen_helper_raise_exception(cpu_env, tcg_const_i32(trapno));
     s->base.is_jmp = DISAS_NORETURN;
 }
@@ -1284,7 +1284,7 @@ static void gen_exception(DisasContext *s, int trapno, target_ulong cur_eip)
    the instruction is known, but it isn't allowed in the current cpu mode.  */
 static void gen_illegal_opcode(DisasContext *s)
 {
-    gen_exception(s, EXCP06_ILLOP, s->pc_start - s->cs_base);
+    gen_exception(s, EXCP06_ILLOP);
 }
 
 /* if d == OR_TMP0, it means memory operand (address in A0) */
@@ -3040,8 +3040,7 @@ static const struct SSEOpHelper_eppi sse_op_table7[256] = {
     [0xdf] = AESNI_OP(aeskeygenassist),
 };
 
-static void gen_sse(CPUX86State *env, DisasContext *s, int b,
-                    target_ulong pc_start)
+static void gen_sse(CPUX86State *env, DisasContext *s, int b)
 {
     int b1, op1_offset, op2_offset, is_xmm, val;
     int modrm, mod, rm, reg;
@@ -3076,7 +3075,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
     }
     /* simple MMX/SSE operation */
     if (s->flags & HF_TS_MASK) {
-        gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
+        gen_exception(s, EXCP07_PREX);
         return;
     }
     if (s->flags & HF_EM_MASK) {
@@ -4669,7 +4668,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
     s->pc_start = s->pc = pc_start;
     if (sigsetjmp(s->jmpbuf, 0) != 0) {
-        gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+        gen_exception(s, EXCP0D_GPF);
         return s->pc;
     }
 
@@ -5868,7 +5867,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (s->flags & (HF_EM_MASK | HF_TS_MASK)) {
             /* if CR0.EM or CR0.TS are set, generate an FPU exception */
             /* XXX: what to do if illegal op ? */
-            gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
+            gen_exception(s, EXCP07_PREX);
             break;
         }
         modrm = x86_ldub_code(env, s);
@@ -6582,7 +6581,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             set_cc_op(s, CC_OP_EFLAGS);
         } else if (s->vm86) {
             if (s->iopl != 3) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
             } else {
                 gen_helper_iret_real(cpu_env, tcg_const_i32(s->dflag - 1));
                 set_cc_op(s, CC_OP_EFLAGS);
@@ -6704,7 +6703,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x9c: /* pushf */
         gen_svm_check_intercept(s, pc_start, SVM_EXIT_PUSHF);
         if (s->vm86 && s->iopl != 3) {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         } else {
             gen_update_cc_op(s);
             gen_helper_read_eflags(s->T0, cpu_env);
@@ -6714,7 +6713,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x9d: /* popf */
         gen_svm_check_intercept(s, pc_start, SVM_EXIT_POPF);
         if (s->vm86 && s->iopl != 3) {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         } else {
             ot = gen_pop_T0(s);
             if (s->cpl == 0) {
@@ -7031,7 +7030,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             goto illegal_op;
         val = x86_ldub_code(env, s);
         if (val == 0) {
-            gen_exception(s, EXCP00_DIVZ, pc_start - s->cs_base);
+            gen_exception(s, EXCP00_DIVZ);
         } else {
             gen_helper_aam(cpu_env, tcg_const_i32(val));
             set_cc_op(s, CC_OP_LOGICB);
@@ -7065,7 +7064,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x9b: /* fwait */
         if ((s->flags & (HF_MP_MASK | HF_TS_MASK)) ==
             (HF_MP_MASK | HF_TS_MASK)) {
-            gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
+            gen_exception(s, EXCP07_PREX);
         } else {
             gen_helper_fwait(cpu_env);
         }
@@ -7076,7 +7075,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xcd: /* int N */
         val = x86_ldub_code(env, s);
         if (s->vm86 && s->iopl != 3) {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         } else {
             gen_interrupt(s, val, pc_start - s->cs_base, s->pc - s->cs_base);
         }
@@ -7099,13 +7098,13 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             if (s->cpl <= s->iopl) {
                 gen_helper_cli(cpu_env);
             } else {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
             }
         } else {
             if (s->iopl == 3) {
                 gen_helper_cli(cpu_env);
             } else {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
             }
         }
         break;
@@ -7116,7 +7115,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             gen_jmp_im(s, s->pc - s->cs_base);
             gen_eob_inhibit_irq(s, true);
         } else {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         }
         break;
     case 0x62: /* bound */
@@ -7208,7 +7207,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x130: /* wrmsr */
     case 0x132: /* rdmsr */
         if (s->cpl != 0) {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         } else {
             gen_update_cc_op(s);
             gen_jmp_im(s, pc_start - s->cs_base);
@@ -7240,7 +7239,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (CODE64(s) && env->cpuid_vendor1 != CPUID_VENDOR_INTEL_1)
             goto illegal_op;
         if (!s->pe) {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         } else {
             gen_helper_sysenter(cpu_env);
             gen_eob(s);
@@ -7251,7 +7250,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (CODE64(s) && env->cpuid_vendor1 != CPUID_VENDOR_INTEL_1)
             goto illegal_op;
         if (!s->pe) {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         } else {
             gen_helper_sysexit(cpu_env, tcg_const_i32(s->dflag - 1));
             gen_eob(s);
@@ -7270,7 +7269,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0x107: /* sysret */
         if (!s->pe) {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         } else {
             gen_helper_sysret(cpu_env, tcg_const_i32(s->dflag - 1));
             /* condition codes are modified only in long mode */
@@ -7292,7 +7291,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0xf4: /* hlt */
         if (s->cpl != 0) {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         } else {
             gen_update_cc_op(s);
             gen_jmp_im(s, pc_start - s->cs_base);
@@ -7318,7 +7317,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             if (!s->pe || s->vm86)
                 goto illegal_op;
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
             } else {
                 gen_svm_check_intercept(s, pc_start, SVM_EXIT_LDTR_WRITE);
                 gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 0);
@@ -7339,7 +7338,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             if (!s->pe || s->vm86)
                 goto illegal_op;
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
             } else {
                 gen_svm_check_intercept(s, pc_start, SVM_EXIT_TR_WRITE);
                 gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 0);
@@ -7455,7 +7454,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
                 break;
             }
             tcg_gen_concat_tl_i64(s->tmp1_i64, cpu_regs[R_EAX],
@@ -7472,7 +7471,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
                 break;
             }
             gen_update_cc_op(s);
@@ -7497,7 +7496,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
                 break;
             }
             gen_update_cc_op(s);
@@ -7510,7 +7509,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
                 break;
             }
             gen_update_cc_op(s);
@@ -7525,7 +7524,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
                 break;
             }
             gen_update_cc_op(s);
@@ -7539,7 +7538,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
                 break;
             }
             gen_update_cc_op(s);
@@ -7563,7 +7562,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
                 break;
             }
             gen_update_cc_op(s);
@@ -7573,7 +7572,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
         CASE_MODRM_MEM_OP(2): /* lgdt */
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
                 break;
             }
             gen_svm_check_intercept(s, pc_start, SVM_EXIT_GDTR_WRITE);
@@ -7590,7 +7589,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
         CASE_MODRM_MEM_OP(3): /* lidt */
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
                 break;
             }
             gen_svm_check_intercept(s, pc_start, SVM_EXIT_IDTR_WRITE);
@@ -7635,7 +7634,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             break;
         CASE_MODRM_OP(6): /* lmsw */
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
                 break;
             }
             gen_svm_check_intercept(s, pc_start, SVM_EXIT_WRITE_CR0);
@@ -7647,7 +7646,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 
         CASE_MODRM_MEM_OP(7): /* invlpg */
             if (s->cpl != 0) {
-                gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                gen_exception(s, EXCP0D_GPF);
                 break;
             }
             gen_update_cc_op(s);
@@ -7662,7 +7661,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 #ifdef TARGET_X86_64
             if (CODE64(s)) {
                 if (s->cpl != 0) {
-                    gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+                    gen_exception(s, EXCP0D_GPF);
                 } else {
                     tcg_gen_mov_tl(s->T0, cpu_seg_base[R_GS]);
                     tcg_gen_ld_tl(cpu_seg_base[R_GS], cpu_env,
@@ -7698,7 +7697,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x108: /* invd */
     case 0x109: /* wbinvd */
         if (s->cpl != 0) {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         } else {
             gen_svm_check_intercept(s, pc_start, (b & 2) ? SVM_EXIT_INVD : SVM_EXIT_WBINVD);
             /* nothing to do */
@@ -8022,7 +8021,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x120: /* mov reg, crN */
     case 0x122: /* mov crN, reg */
         if (s->cpl != 0) {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         } else {
             modrm = x86_ldub_code(env, s);
             /* Ignore the mod bits (assume (modrm&0xc0)==0xc0).
@@ -8076,7 +8075,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x121: /* mov reg, drN */
     case 0x123: /* mov drN, reg */
         if (s->cpl != 0) {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         } else {
             modrm = x86_ldub_code(env, s);
             /* Ignore the mod bits (assume (modrm&0xc0)==0xc0).
@@ -8110,7 +8109,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0x106: /* clts */
         if (s->cpl != 0) {
-            gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
+            gen_exception(s, EXCP0D_GPF);
         } else {
             gen_svm_check_intercept(s, pc_start, SVM_EXIT_WRITE_CR0);
             gen_helper_clts(cpu_env);
@@ -8141,7 +8140,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             if ((s->flags & HF_EM_MASK) || (s->flags & HF_TS_MASK)) {
-                gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
+                gen_exception(s, EXCP07_PREX);
                 break;
             }
             gen_lea_modrm(env, s, modrm);
@@ -8154,7 +8153,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             if ((s->flags & HF_EM_MASK) || (s->flags & HF_TS_MASK)) {
-                gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
+                gen_exception(s, EXCP07_PREX);
                 break;
             }
             gen_lea_modrm(env, s, modrm);
@@ -8166,7 +8165,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             if (s->flags & HF_TS_MASK) {
-                gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
+                gen_exception(s, EXCP07_PREX);
                 break;
             }
             gen_lea_modrm(env, s, modrm);
@@ -8179,7 +8178,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             if (s->flags & HF_TS_MASK) {
-                gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
+                gen_exception(s, EXCP07_PREX);
                 break;
             }
             gen_lea_modrm(env, s, modrm);
@@ -8382,7 +8381,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x1c2:
     case 0x1c4 ... 0x1c6:
     case 0x1d0 ... 0x1fe:
-        gen_sse(env, s, b, pc_start);
+        gen_sse(env, s, b);
         break;
     default:
         goto unknown_op;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 07/75] target/i386: use pc_start from DisasContext
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (5 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 06/75] target/i386: Simplify gen_exception arguments Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 08/75] target/i386: make variable b1 const Jan Bobek
                   ` (66 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

The variable pc_start is already a member of DisasContext. Remove the
superfluous local variable.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 131 ++++++++++++++++++++--------------------
 1 file changed, 65 insertions(+), 66 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index b067323962..62cce20735 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4664,9 +4664,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     TCGMemOp ot;
     int modrm, reg, rm, mod, op, opreg, val;
     target_ulong next_eip, tval;
-    target_ulong pc_start = s->base.pc_next;
 
-    s->pc_start = s->pc = pc_start;
+    s->pc_start = s->pc = s->base.pc_next;
     if (sigsetjmp(s->jmpbuf, 0) != 0) {
         gen_exception(s, EXCP0D_GPF);
         return s->pc;
@@ -6371,7 +6370,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xa5:
         ot = mo_b_d(b, s->dflag);
         if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
-            gen_repz_movs(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
+            gen_repz_movs(s, ot, s->pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
             gen_movs(s, ot);
         }
@@ -6381,7 +6380,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xab:
         ot = mo_b_d(b, s->dflag);
         if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
-            gen_repz_stos(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
+            gen_repz_stos(s, ot, s->pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
             gen_stos(s, ot);
         }
@@ -6390,7 +6389,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xad:
         ot = mo_b_d(b, s->dflag);
         if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
-            gen_repz_lods(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
+            gen_repz_lods(s, ot, s->pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
             gen_lods(s, ot);
         }
@@ -6399,9 +6398,9 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xaf:
         ot = mo_b_d(b, s->dflag);
         if (s->prefix & PREFIX_REPNZ) {
-            gen_repz_scas(s, ot, pc_start - s->cs_base, s->pc - s->cs_base, 1);
+            gen_repz_scas(s, ot, s->pc_start - s->cs_base, s->pc - s->cs_base, 1);
         } else if (s->prefix & PREFIX_REPZ) {
-            gen_repz_scas(s, ot, pc_start - s->cs_base, s->pc - s->cs_base, 0);
+            gen_repz_scas(s, ot, s->pc_start - s->cs_base, s->pc - s->cs_base, 0);
         } else {
             gen_scas(s, ot);
         }
@@ -6411,9 +6410,9 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xa7:
         ot = mo_b_d(b, s->dflag);
         if (s->prefix & PREFIX_REPNZ) {
-            gen_repz_cmps(s, ot, pc_start - s->cs_base, s->pc - s->cs_base, 1);
+            gen_repz_cmps(s, ot, s->pc_start - s->cs_base, s->pc - s->cs_base, 1);
         } else if (s->prefix & PREFIX_REPZ) {
-            gen_repz_cmps(s, ot, pc_start - s->cs_base, s->pc - s->cs_base, 0);
+            gen_repz_cmps(s, ot, s->pc_start - s->cs_base, s->pc - s->cs_base, 0);
         } else {
             gen_cmps(s, ot);
         }
@@ -6422,10 +6421,10 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x6d:
         ot = mo_b_d32(b, s->dflag);
         tcg_gen_ext16u_tl(s->T0, cpu_regs[R_EDX]);
-        gen_check_io(s, ot, pc_start - s->cs_base, 
+        gen_check_io(s, ot, s->pc_start - s->cs_base,
                      SVM_IOIO_TYPE_MASK | svm_is_rep(s->prefix) | 4);
         if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
-            gen_repz_ins(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
+            gen_repz_ins(s, ot, s->pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
             gen_ins(s, ot);
             if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
@@ -6437,10 +6436,10 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x6f:
         ot = mo_b_d32(b, s->dflag);
         tcg_gen_ext16u_tl(s->T0, cpu_regs[R_EDX]);
-        gen_check_io(s, ot, pc_start - s->cs_base,
+        gen_check_io(s, ot, s->pc_start - s->cs_base,
                      svm_is_rep(s->prefix) | 4);
         if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
-            gen_repz_outs(s, ot, pc_start - s->cs_base, s->pc - s->cs_base);
+            gen_repz_outs(s, ot, s->pc_start - s->cs_base, s->pc - s->cs_base);
         } else {
             gen_outs(s, ot);
             if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
@@ -6457,7 +6456,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         ot = mo_b_d32(b, s->dflag);
         val = x86_ldub_code(env, s);
         tcg_gen_movi_tl(s->T0, val);
-        gen_check_io(s, ot, pc_start - s->cs_base,
+        gen_check_io(s, ot, s->pc_start - s->cs_base,
                      SVM_IOIO_TYPE_MASK | svm_is_rep(s->prefix));
         if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
             gen_io_start();
@@ -6475,7 +6474,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         ot = mo_b_d32(b, s->dflag);
         val = x86_ldub_code(env, s);
         tcg_gen_movi_tl(s->T0, val);
-        gen_check_io(s, ot, pc_start - s->cs_base,
+        gen_check_io(s, ot, s->pc_start - s->cs_base,
                      svm_is_rep(s->prefix));
         gen_op_mov_v_reg(s, ot, s->T1, R_EAX);
 
@@ -6494,7 +6493,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xed:
         ot = mo_b_d32(b, s->dflag);
         tcg_gen_ext16u_tl(s->T0, cpu_regs[R_EDX]);
-        gen_check_io(s, ot, pc_start - s->cs_base,
+        gen_check_io(s, ot, s->pc_start - s->cs_base,
                      SVM_IOIO_TYPE_MASK | svm_is_rep(s->prefix));
         if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
             gen_io_start();
@@ -6511,7 +6510,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0xef:
         ot = mo_b_d32(b, s->dflag);
         tcg_gen_ext16u_tl(s->T0, cpu_regs[R_EDX]);
-        gen_check_io(s, ot, pc_start - s->cs_base,
+        gen_check_io(s, ot, s->pc_start - s->cs_base,
                      svm_is_rep(s->prefix));
         gen_op_mov_v_reg(s, ot, s->T1, R_EAX);
 
@@ -6551,7 +6550,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     do_lret:
         if (s->pe && !s->vm86) {
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
+            gen_jmp_im(s, s->pc_start - s->cs_base);
             gen_helper_lret_protected(cpu_env, tcg_const_i32(s->dflag - 1),
                                       tcg_const_i32(val));
         } else {
@@ -6574,7 +6573,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         val = 0;
         goto do_lret;
     case 0xcf: /* iret */
-        gen_svm_check_intercept(s, pc_start, SVM_EXIT_IRET);
+        gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_IRET);
         if (!s->pe) {
             /* real mode */
             gen_helper_iret_real(cpu_env, tcg_const_i32(s->dflag - 1));
@@ -6701,7 +6700,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         /************************/
         /* flags */
     case 0x9c: /* pushf */
-        gen_svm_check_intercept(s, pc_start, SVM_EXIT_PUSHF);
+        gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_PUSHF);
         if (s->vm86 && s->iopl != 3) {
             gen_exception(s, EXCP0D_GPF);
         } else {
@@ -6711,7 +6710,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         }
         break;
     case 0x9d: /* popf */
-        gen_svm_check_intercept(s, pc_start, SVM_EXIT_POPF);
+        gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_POPF);
         if (s->vm86 && s->iopl != 3) {
             gen_exception(s, EXCP0D_GPF);
         } else {
@@ -7056,8 +7055,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         }
         if (s->prefix & PREFIX_REPZ) {
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
-            gen_helper_pause(cpu_env, tcg_const_i32(s->pc - pc_start));
+            gen_jmp_im(s, s->pc_start - s->cs_base);
+            gen_helper_pause(cpu_env, tcg_const_i32(s->pc - s->pc_start));
             s->base.is_jmp = DISAS_NORETURN;
         }
         break;
@@ -7070,27 +7069,27 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         }
         break;
     case 0xcc: /* int3 */
-        gen_interrupt(s, EXCP03_INT3, pc_start - s->cs_base, s->pc - s->cs_base);
+        gen_interrupt(s, EXCP03_INT3, s->pc_start - s->cs_base, s->pc - s->cs_base);
         break;
     case 0xcd: /* int N */
         val = x86_ldub_code(env, s);
         if (s->vm86 && s->iopl != 3) {
             gen_exception(s, EXCP0D_GPF);
         } else {
-            gen_interrupt(s, val, pc_start - s->cs_base, s->pc - s->cs_base);
+            gen_interrupt(s, val, s->pc_start - s->cs_base, s->pc - s->cs_base);
         }
         break;
     case 0xce: /* into */
         if (CODE64(s))
             goto illegal_op;
         gen_update_cc_op(s);
-        gen_jmp_im(s, pc_start - s->cs_base);
-        gen_helper_into(cpu_env, tcg_const_i32(s->pc - pc_start));
+        gen_jmp_im(s, s->pc_start - s->cs_base);
+        gen_helper_into(cpu_env, tcg_const_i32(s->pc - s->pc_start));
         break;
 #ifdef WANT_ICEBP
     case 0xf1: /* icebp (undocumented, exits to external debugger) */
-        gen_svm_check_intercept(s, pc_start, SVM_EXIT_ICEBP);
-        gen_debug(s, pc_start - s->cs_base);
+        gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_ICEBP);
+        gen_debug(s, s->pc_start - s->cs_base);
         break;
 #endif
     case 0xfa: /* cli */
@@ -7210,7 +7209,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             gen_exception(s, EXCP0D_GPF);
         } else {
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
+            gen_jmp_im(s, s->pc_start - s->cs_base);
             if (b & 2) {
                 gen_helper_rdmsr(cpu_env);
             } else {
@@ -7220,7 +7219,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0x131: /* rdtsc */
         gen_update_cc_op(s);
-        gen_jmp_im(s, pc_start - s->cs_base);
+        gen_jmp_im(s, s->pc_start - s->cs_base);
         if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
             gen_io_start();
         }
@@ -7231,7 +7230,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         break;
     case 0x133: /* rdpmc */
         gen_update_cc_op(s);
-        gen_jmp_im(s, pc_start - s->cs_base);
+        gen_jmp_im(s, s->pc_start - s->cs_base);
         gen_helper_rdpmc(cpu_env);
         break;
     case 0x134: /* sysenter */
@@ -7260,8 +7259,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x105: /* syscall */
         /* XXX: is it usable in real mode ? */
         gen_update_cc_op(s);
-        gen_jmp_im(s, pc_start - s->cs_base);
-        gen_helper_syscall(cpu_env, tcg_const_i32(s->pc - pc_start));
+        gen_jmp_im(s, s->pc_start - s->cs_base);
+        gen_helper_syscall(cpu_env, tcg_const_i32(s->pc - s->pc_start));
         /* TF handling for the syscall insn is different. The TF bit is  checked
            after the syscall insn completes. This allows #DB to not be
            generated after one has entered CPL0 if TF is set in FMASK.  */
@@ -7286,7 +7285,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
 #endif
     case 0x1a2: /* cpuid */
         gen_update_cc_op(s);
-        gen_jmp_im(s, pc_start - s->cs_base);
+        gen_jmp_im(s, s->pc_start - s->cs_base);
         gen_helper_cpuid(cpu_env);
         break;
     case 0xf4: /* hlt */
@@ -7294,8 +7293,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             gen_exception(s, EXCP0D_GPF);
         } else {
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
-            gen_helper_hlt(cpu_env, tcg_const_i32(s->pc - pc_start));
+            gen_jmp_im(s, s->pc_start - s->cs_base);
+            gen_helper_hlt(cpu_env, tcg_const_i32(s->pc - s->pc_start));
             s->base.is_jmp = DISAS_NORETURN;
         }
         break;
@@ -7307,7 +7306,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         case 0: /* sldt */
             if (!s->pe || s->vm86)
                 goto illegal_op;
-            gen_svm_check_intercept(s, pc_start, SVM_EXIT_LDTR_READ);
+            gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_LDTR_READ);
             tcg_gen_ld32u_tl(s->T0, cpu_env,
                              offsetof(CPUX86State, ldt.selector));
             ot = mod == 3 ? s->dflag : MO_16;
@@ -7319,7 +7318,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             if (s->cpl != 0) {
                 gen_exception(s, EXCP0D_GPF);
             } else {
-                gen_svm_check_intercept(s, pc_start, SVM_EXIT_LDTR_WRITE);
+                gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_LDTR_WRITE);
                 gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 0);
                 tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
                 gen_helper_lldt(cpu_env, s->tmp2_i32);
@@ -7328,7 +7327,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         case 1: /* str */
             if (!s->pe || s->vm86)
                 goto illegal_op;
-            gen_svm_check_intercept(s, pc_start, SVM_EXIT_TR_READ);
+            gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_TR_READ);
             tcg_gen_ld32u_tl(s->T0, cpu_env,
                              offsetof(CPUX86State, tr.selector));
             ot = mod == 3 ? s->dflag : MO_16;
@@ -7340,7 +7339,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             if (s->cpl != 0) {
                 gen_exception(s, EXCP0D_GPF);
             } else {
-                gen_svm_check_intercept(s, pc_start, SVM_EXIT_TR_WRITE);
+                gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_TR_WRITE);
                 gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 0);
                 tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
                 gen_helper_ltr(cpu_env, s->tmp2_i32);
@@ -7368,7 +7367,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         modrm = x86_ldub_code(env, s);
         switch (modrm) {
         CASE_MODRM_MEM_OP(0): /* sgdt */
-            gen_svm_check_intercept(s, pc_start, SVM_EXIT_GDTR_READ);
+            gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_GDTR_READ);
             gen_lea_modrm(env, s, modrm);
             tcg_gen_ld32u_tl(s->T0,
                              cpu_env, offsetof(CPUX86State, gdt.limit));
@@ -7386,7 +7385,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
+            gen_jmp_im(s, s->pc_start - s->cs_base);
             tcg_gen_mov_tl(s->A0, cpu_regs[R_EAX]);
             gen_extu(s->aflag, s->A0);
             gen_add_A0_ds_seg(s);
@@ -7398,8 +7397,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
-            gen_helper_mwait(cpu_env, tcg_const_i32(s->pc - pc_start));
+            gen_jmp_im(s, s->pc_start - s->cs_base);
+            gen_helper_mwait(cpu_env, tcg_const_i32(s->pc - s->pc_start));
             gen_eob(s);
             break;
 
@@ -7424,7 +7423,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             break;
 
         CASE_MODRM_MEM_OP(1): /* sidt */
-            gen_svm_check_intercept(s, pc_start, SVM_EXIT_IDTR_READ);
+            gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_IDTR_READ);
             gen_lea_modrm(env, s, modrm);
             tcg_gen_ld32u_tl(s->T0, cpu_env, offsetof(CPUX86State, idt.limit));
             gen_op_st_v(s, MO_16, s->T0, s->A0);
@@ -7475,9 +7474,9 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 break;
             }
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
+            gen_jmp_im(s, s->pc_start - s->cs_base);
             gen_helper_vmrun(cpu_env, tcg_const_i32(s->aflag - 1),
-                             tcg_const_i32(s->pc - pc_start));
+                             tcg_const_i32(s->pc - s->pc_start));
             tcg_gen_exit_tb(NULL, 0);
             s->base.is_jmp = DISAS_NORETURN;
             break;
@@ -7487,7 +7486,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
+            gen_jmp_im(s, s->pc_start - s->cs_base);
             gen_helper_vmmcall(cpu_env);
             break;
 
@@ -7500,7 +7499,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 break;
             }
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
+            gen_jmp_im(s, s->pc_start - s->cs_base);
             gen_helper_vmload(cpu_env, tcg_const_i32(s->aflag - 1));
             break;
 
@@ -7513,7 +7512,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 break;
             }
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
+            gen_jmp_im(s, s->pc_start - s->cs_base);
             gen_helper_vmsave(cpu_env, tcg_const_i32(s->aflag - 1));
             break;
 
@@ -7542,7 +7541,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 break;
             }
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
+            gen_jmp_im(s, s->pc_start - s->cs_base);
             gen_helper_clgi(cpu_env);
             break;
 
@@ -7553,7 +7552,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
+            gen_jmp_im(s, s->pc_start - s->cs_base);
             gen_helper_skinit(cpu_env);
             break;
 
@@ -7566,7 +7565,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 break;
             }
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
+            gen_jmp_im(s, s->pc_start - s->cs_base);
             gen_helper_invlpga(cpu_env, tcg_const_i32(s->aflag - 1));
             break;
 
@@ -7575,7 +7574,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 gen_exception(s, EXCP0D_GPF);
                 break;
             }
-            gen_svm_check_intercept(s, pc_start, SVM_EXIT_GDTR_WRITE);
+            gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_GDTR_WRITE);
             gen_lea_modrm(env, s, modrm);
             gen_op_ld_v(s, MO_16, s->T1, s->A0);
             gen_add_A0_im(s, 2);
@@ -7592,7 +7591,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 gen_exception(s, EXCP0D_GPF);
                 break;
             }
-            gen_svm_check_intercept(s, pc_start, SVM_EXIT_IDTR_WRITE);
+            gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_IDTR_WRITE);
             gen_lea_modrm(env, s, modrm);
             gen_op_ld_v(s, MO_16, s->T1, s->A0);
             gen_add_A0_im(s, 2);
@@ -7605,7 +7604,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             break;
 
         CASE_MODRM_OP(4): /* smsw */
-            gen_svm_check_intercept(s, pc_start, SVM_EXIT_READ_CR0);
+            gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_READ_CR0);
             tcg_gen_ld_tl(s->T0, cpu_env, offsetof(CPUX86State, cr[0]));
             if (CODE64(s)) {
                 mod = (modrm >> 6) & 3;
@@ -7637,7 +7636,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 gen_exception(s, EXCP0D_GPF);
                 break;
             }
-            gen_svm_check_intercept(s, pc_start, SVM_EXIT_WRITE_CR0);
+            gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_WRITE_CR0);
             gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 0);
             gen_helper_lmsw(cpu_env, s->T0);
             gen_jmp_im(s, s->pc - s->cs_base);
@@ -7650,7 +7649,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 break;
             }
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
+            gen_jmp_im(s, s->pc_start - s->cs_base);
             gen_lea_modrm(env, s, modrm);
             gen_helper_invlpg(cpu_env, s->A0);
             gen_jmp_im(s, s->pc - s->cs_base);
@@ -7679,7 +7678,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             gen_update_cc_op(s);
-            gen_jmp_im(s, pc_start - s->cs_base);
+            gen_jmp_im(s, s->pc_start - s->cs_base);
             if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
                 gen_io_start();
             }
@@ -7699,7 +7698,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (s->cpl != 0) {
             gen_exception(s, EXCP0D_GPF);
         } else {
-            gen_svm_check_intercept(s, pc_start, (b & 2) ? SVM_EXIT_INVD : SVM_EXIT_WBINVD);
+            gen_svm_check_intercept(s, s->pc_start, (b & 2) ? SVM_EXIT_INVD : SVM_EXIT_WBINVD);
             /* nothing to do */
         }
         break;
@@ -8046,7 +8045,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             case 4:
             case 8:
                 gen_update_cc_op(s);
-                gen_jmp_im(s, pc_start - s->cs_base);
+                gen_jmp_im(s, s->pc_start - s->cs_base);
                 if (b & 2) {
                     if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
                         gen_io_start();
@@ -8093,14 +8092,14 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 goto illegal_op;
             }
             if (b & 2) {
-                gen_svm_check_intercept(s, pc_start, SVM_EXIT_WRITE_DR0 + reg);
+                gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_WRITE_DR0 + reg);
                 gen_op_mov_v_reg(s, ot, s->T0, rm);
                 tcg_gen_movi_i32(s->tmp2_i32, reg);
                 gen_helper_set_dr(cpu_env, s->tmp2_i32, s->T0);
                 gen_jmp_im(s, s->pc - s->cs_base);
                 gen_eob(s);
             } else {
-                gen_svm_check_intercept(s, pc_start, SVM_EXIT_READ_DR0 + reg);
+                gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_READ_DR0 + reg);
                 tcg_gen_movi_i32(s->tmp2_i32, reg);
                 gen_helper_get_dr(s->T0, cpu_env, s->tmp2_i32);
                 gen_op_mov_reg_v(s, ot, rm, s->T0);
@@ -8111,7 +8110,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         if (s->cpl != 0) {
             gen_exception(s, EXCP0D_GPF);
         } else {
-            gen_svm_check_intercept(s, pc_start, SVM_EXIT_WRITE_CR0);
+            gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_WRITE_CR0);
             gen_helper_clts(cpu_env);
             /* abort block because static cpu state changed */
             gen_jmp_im(s, s->pc - s->cs_base);
@@ -8337,7 +8336,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
         gen_nop_modrm(env, s, modrm);
         break;
     case 0x1aa: /* rsm */
-        gen_svm_check_intercept(s, pc_start, SVM_EXIT_RSM);
+        gen_svm_check_intercept(s, s->pc_start, SVM_EXIT_RSM);
         if (!(s->flags & HF_SMM_MASK))
             goto illegal_op;
         gen_update_cc_op(s);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 08/75] target/i386: make variable b1 const
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (6 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 07/75] target/i386: use pc_start from DisasContext Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 09/75] target/i386: make variable is_xmm const Jan Bobek
                   ` (65 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

The variable b1 does not change value once assigned. Make this fact
explicit by marking it const.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 62cce20735..3be869a8e3 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -3042,7 +3042,7 @@ static const struct SSEOpHelper_eppi sse_op_table7[256] = {
 
 static void gen_sse(CPUX86State *env, DisasContext *s, int b)
 {
-    int b1, op1_offset, op2_offset, is_xmm, val;
+    int op1_offset, op2_offset, is_xmm, val;
     int modrm, mod, rm, reg;
     SSEFunc_0_epp sse_fn_epp;
     SSEFunc_0_eppi sse_fn_eppi;
@@ -3051,14 +3051,11 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b)
     TCGMemOp ot;
 
     b &= 0xff;
-    if (s->prefix & PREFIX_DATA)
-        b1 = 1;
-    else if (s->prefix & PREFIX_REPZ)
-        b1 = 2;
-    else if (s->prefix & PREFIX_REPNZ)
-        b1 = 3;
-    else
-        b1 = 0;
+    const int b1 =
+        s->prefix & PREFIX_DATA ? 1
+        : s->prefix & PREFIX_REPZ ? 2
+        : s->prefix & PREFIX_REPNZ ? 3
+        : 0;
     sse_fn_epp = sse_op_table1[b][b1];
     if (!sse_fn_epp) {
         goto unknown_op;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 09/75] target/i386: make variable is_xmm const
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (7 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 08/75] target/i386: make variable b1 const Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 10/75] target/i386: add vector register file alignment constraints Jan Bobek
                   ` (64 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

The variable is_xmm does not change value after assignment, so make
this fact explicit by marking it const.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 3be869a8e3..44ad55f9ee 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -3042,7 +3042,7 @@ static const struct SSEOpHelper_eppi sse_op_table7[256] = {
 
 static void gen_sse(CPUX86State *env, DisasContext *s, int b)
 {
-    int op1_offset, op2_offset, is_xmm, val;
+    int op1_offset, op2_offset, val;
     int modrm, mod, rm, reg;
     SSEFunc_0_epp sse_fn_epp;
     SSEFunc_0_eppi sse_fn_eppi;
@@ -3056,20 +3056,15 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b)
         : s->prefix & PREFIX_REPZ ? 2
         : s->prefix & PREFIX_REPNZ ? 3
         : 0;
+    const int is_xmm =
+        (0x10 <= b && b <= 0x5f)
+        || b == 0xc6
+        || b == 0xc2
+        || !!b1;
     sse_fn_epp = sse_op_table1[b][b1];
     if (!sse_fn_epp) {
         goto unknown_op;
     }
-    if ((b <= 0x5f && b >= 0x10) || b == 0xc6 || b == 0xc2) {
-        is_xmm = 1;
-    } else {
-        if (b1 == 0) {
-            /* MMX case */
-            is_xmm = 0;
-        } else {
-            is_xmm = 1;
-        }
-    }
     /* simple MMX/SSE operation */
     if (s->flags & HF_TS_MASK) {
         gen_exception(s, EXCP07_PREX);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 10/75] target/i386: add vector register file alignment constraints
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (8 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 09/75] target/i386: make variable is_xmm const Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 11/75] target/i386: introduce gen_sse_ng Jan Bobek
                   ` (63 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

gvec operations require that all vectors be aligned on 16-byte
boundary; make sure the MM/XMM/YMM/ZMM register file is aligned as
neccessary.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/cpu.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5f6e3a029a..f226e61724 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1202,9 +1202,9 @@ typedef struct CPUX86State {
     float_status mmx_status; /* for 3DNow! float ops */
     float_status sse_status;
     uint32_t mxcsr;
-    ZMMReg xmm_regs[CPU_NB_REGS == 8 ? 8 : 32];
-    ZMMReg xmm_t0;
-    MMXReg mmx_t0;
+    ZMMReg xmm_regs[CPU_NB_REGS == 8 ? 8 : 32] QEMU_ALIGNED(16);
+    ZMMReg xmm_t0 QEMU_ALIGNED(16);
+    MMXReg mmx_t0 QEMU_ALIGNED(8);
 
     XMMReg ymmh_regs[CPU_NB_REGS];
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 11/75] target/i386: introduce gen_sse_ng
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (9 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 10/75] target/i386: add vector register file alignment constraints Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 12/75] target/i386: introduce CASES_* macros in gen_sse_ng Jan Bobek
                   ` (62 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

This function serves as the point-of-intercept for all newly
implemented instructions. If no new implementation exists, fall back
to gen_sse.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 46 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 44ad55f9ee..8045ce3ce0 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4477,6 +4477,50 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b)
     }
 }
 
+static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
+{
+    enum {
+        P_NP = 0,
+        P_66 = 1 << 8,
+        P_F3 = 1 << 9,
+        P_F2 = 1 << 10,
+
+        M_NA   = 0,
+        M_0F   = 1 << 11,
+        M_0F38 = 1 << 12,
+        M_0F3A = 1 << 13,
+
+        W_0 = 0,
+        W_1 = 1 << 14,
+
+        VEX_128 = 1 << 15,
+        VEX_256 = 1 << 16,
+    };
+
+    const int p =
+        (s->prefix & PREFIX_DATA ? P_66 : 0)
+        | (s->prefix & PREFIX_REPZ ? P_F3 : 0)
+        | (s->prefix & PREFIX_REPNZ ? P_F2 : 0)
+        | (!(s->prefix & PREFIX_VEX) ? 0 :
+           (s->vex_l ? VEX_256 : VEX_128));
+    int m =
+        M_0F;
+    const int w =
+        REX_W(s) > 0 ? W_1 : W_0;
+    int op =
+        b & 0xff;
+
+    while (1) {
+        switch (p | m | w | op) {
+
+        default: {
+            gen_sse(env, s, b);
+        } return;
+
+        }
+    }
+}
+
 static int disas_insn_prefix(DisasContext *s, CPUX86State *env)
 {
     int b, prefixes;
@@ -8372,7 +8416,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
     case 0x1c2:
     case 0x1c4 ... 0x1c6:
     case 0x1d0 ... 0x1fe:
-        gen_sse(env, s, b);
+        gen_sse_ng(env, s, b);
         break;
     default:
         goto unknown_op;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 12/75] target/i386: introduce CASES_* macros in gen_sse_ng
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (10 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 11/75] target/i386: introduce gen_sse_ng Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 13/75] target/i386: decode the 0F38/0F3A prefix " Jan Bobek
                   ` (61 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

In case one or more fields should be ignored during instruction
disambiguation, we need to generate multiple case labels. Introduce
CASES_* macros for this purpose.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 54 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 8045ce3ce0..661010973b 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4513,10 +4513,64 @@ static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
     while (1) {
         switch (p | m | w | op) {
 
+#define CASES_0(e)         case (e):
+#define CASES_1(e, A, ...) CASES_ ## A(e, 0, ## __VA_ARGS__)
+#define CASES_2(e, A, ...) CASES_ ## A(e, 1, ## __VA_ARGS__)
+#define CASES_3(e, A, ...) CASES_ ## A(e, 2, ## __VA_ARGS__)
+#define CASES_4(e, A, ...) CASES_ ## A(e, 3, ## __VA_ARGS__)
+#define CASES(e, N, ...)   CASES_ ## N(e, ## __VA_ARGS__)
+
+#define CASES_P(e, N, p, ...) CASES_P ## p(e, N, ## __VA_ARGS__)
+#define CASES_PNP(e, N, ...)  CASES_ ## N(P_NP | e, ## __VA_ARGS__)
+#define CASES_P66(e, N, ...)  CASES_ ## N(P_66 | e, ## __VA_ARGS__)
+#define CASES_PF3(e, N, ...)  CASES_ ## N(P_F3 | e, ## __VA_ARGS__)
+#define CASES_PF2(e, N, ...)  CASES_ ## N(P_F2 | e, ## __VA_ARGS__)
+#define CASES_PIG(e, N, ...)  CASES_PNP(e, N, ## __VA_ARGS__)   \
+                              CASES_P66(e, N, ## __VA_ARGS__)   \
+                              CASES_PF3(e, N, ## __VA_ARGS__)   \
+                              CASES_PF2(e, N, ## __VA_ARGS__)
+
+#define CASES_M(e, N, m, ...) CASES_ ## N(M_ ## m | e, ## __VA_ARGS__)
+
+#define CASES_W(e, N, w, ...) CASES_W ## w(e, N, ## __VA_ARGS__)
+#define CASES_W0(e, N, ...)   CASES_ ## N(W_0 | e, ## __VA_ARGS__)
+#define CASES_W1(e, N, ...)   CASES_ ## N(W_1 | e, ## __VA_ARGS__)
+#define CASES_WIG(e, N, ...)  CASES_W0(e, N, ## __VA_ARGS__)    \
+                              CASES_W1(e, N, ## __VA_ARGS__)
+
+#define CASES_VEX_L(e, N, l, ...) CASES_VEX_L ## l(e, N, ## __VA_ARGS__)
+#define CASES_VEX_L128(e, N, ...) CASES_ ## N(VEX_128 | e, ## __VA_ARGS__)
+#define CASES_VEX_L256(e, N, ...) CASES_ ## N(VEX_256 | e, ## __VA_ARGS__)
+#define CASES_VEX_LZ              CASES_VEX_L128
+#define CASES_VEX_LIG(e, N, ...)  CASES_VEX_L128(e, N, ## __VA_ARGS__)  \
+                                  CASES_VEX_L256(e, N, ## __VA_ARGS__)
+
         default: {
             gen_sse(env, s, b);
         } return;
 
+#undef CASES_0
+#undef CASES_1
+#undef CASES_2
+#undef CASES_3
+#undef CASES_4
+#undef CASES
+#undef CASES_P
+#undef CASES_PNP
+#undef CASES_P66
+#undef CASES_PF3
+#undef CASES_PF2
+#undef CASES_PIG
+#undef CASES_M
+#undef CASES_W
+#undef CASES_W0
+#undef CASES_W1
+#undef CASES_WIG
+#undef CASES_VEX_L
+#undef CASES_VEX_L128
+#undef CASES_VEX_L256
+#undef CASES_VEX_LZ
+#undef CASES_VEX_LIG
         }
     }
 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 13/75] target/i386: decode the 0F38/0F3A prefix in gen_sse_ng
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (11 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 12/75] target/i386: introduce CASES_* macros in gen_sse_ng Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 14/75] target/i386: introduce aliases for some tcg_gvec operations Jan Bobek
                   ` (60 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

In order to decode 0F38/0F3A-prefixed instructions, we need to load an
additional byte. This poses a problem if the instruction is not
implemented yet; implement a rewind in this (default) case.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 661010973b..bd9c62dc54 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4545,7 +4545,23 @@ static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 #define CASES_VEX_LIG(e, N, ...)  CASES_VEX_L128(e, N, ## __VA_ARGS__)  \
                                   CASES_VEX_L256(e, N, ## __VA_ARGS__)
 
+        CASES(0x38, 3, W, IG, M, 0F, P, IG)
+        CASES(0x38, 4, W, IG, M, 0F, P, IG, VEX_L, IG) {
+            m = M_0F38;
+            op = x86_ldub_code(env, s);
+        } break;
+
+        CASES(0x3a, 3, W, IG, M, 0F, P, IG)
+        CASES(0x3a, 4, W, IG, M, 0F, P, IG, VEX_L, IG) {
+            m = M_0F3A;
+            op = x86_ldub_code(env, s);
+        } break;
+
         default: {
+            if (m == M_0F38 || m == M_0F3A) {
+                /* rewind the advance_pc() x86_ldub_code() did */
+                advance_pc(env, s, -1);
+            }
             gen_sse(env, s, b);
         } return;
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 14/75] target/i386: introduce aliases for some tcg_gvec operations
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (12 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 13/75] target/i386: decode the 0F38/0F3A prefix " Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 15/75] target/i386: introduce function check_cpuid Jan Bobek
                   ` (59 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

The aliases create a thin layer above the raw tcg_gvec functions,
making room for us to permute the arguments, pass additional constant
values etc., which will prove highly useful later.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index bd9c62dc54..467ecf15ba 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -22,6 +22,7 @@
 #include "cpu.h"
 #include "disas/disas.h"
 #include "exec/exec-all.h"
+#include "tcg-gvec-desc.h"
 #include "tcg-op.h"
 #include "exec/cpu_ldst.h"
 #include "exec/translator.h"
@@ -4477,6 +4478,44 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b)
     }
 }
 
+#define gen_gvec_mov(dofs, aofs, oprsz, maxsz, vece)    \
+    tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz)
+
+#define gen_gvec_add(dofs, aofs, bofs, oprsz, maxsz, vece)      \
+    tcg_gen_gvec_add(vece, dofs, aofs, bofs, oprsz, maxsz)
+#define gen_gvec_sub(dofs, aofs, bofs, oprsz, maxsz, vece)      \
+    tcg_gen_gvec_sub(vece, dofs, aofs, bofs, oprsz, maxsz)
+
+#define gen_gvec_ssadd(dofs, aofs, bofs, oprsz, maxsz, vece)    \
+    tcg_gen_gvec_ssadd(vece, dofs, aofs, bofs, oprsz, maxsz)
+#define gen_gvec_sssub(dofs, aofs, bofs, oprsz, maxsz, vece)    \
+    tcg_gen_gvec_sssub(vece, dofs, aofs, bofs, oprsz, maxsz)
+#define gen_gvec_usadd(dofs, aofs, bofs, oprsz, maxsz, vece)    \
+    tcg_gen_gvec_usadd(vece, dofs, aofs, bofs, oprsz, maxsz)
+#define gen_gvec_ussub(dofs, aofs, bofs, oprsz, maxsz, vece)    \
+    tcg_gen_gvec_ussub(vece, dofs, aofs, bofs, oprsz, maxsz)
+
+#define gen_gvec_smin(dofs, aofs, bofs, oprsz, maxsz, vece)     \
+    tcg_gen_gvec_smin(vece, dofs, aofs, bofs, oprsz, maxsz)
+#define gen_gvec_umin(dofs, aofs, bofs, oprsz, maxsz, vece)     \
+    tcg_gen_gvec_umin(vece, dofs, aofs, bofs, oprsz, maxsz)
+#define gen_gvec_smax(dofs, aofs, bofs, oprsz, maxsz, vece)     \
+    tcg_gen_gvec_smax(vece, dofs, aofs, bofs, oprsz, maxsz)
+#define gen_gvec_umax(dofs, aofs, bofs, oprsz, maxsz, vece)     \
+    tcg_gen_gvec_umax(vece, dofs, aofs, bofs, oprsz, maxsz)
+
+#define gen_gvec_and(dofs, aofs, bofs, oprsz, maxsz, vece)      \
+    tcg_gen_gvec_and(vece, dofs, aofs, bofs, oprsz, maxsz)
+#define gen_gvec_andn(dofs, aofs, bofs, oprsz, maxsz, vece)     \
+    tcg_gen_gvec_andc(vece, dofs, bofs, aofs, oprsz, maxsz)
+#define gen_gvec_or(dofs, aofs, bofs, oprsz, maxsz, vece)       \
+    tcg_gen_gvec_or(vece, dofs, aofs, bofs, oprsz, maxsz)
+#define gen_gvec_xor(dofs, aofs, bofs, oprsz, maxsz, vece)      \
+    tcg_gen_gvec_xor(vece, dofs, aofs, bofs, oprsz, maxsz)
+
+#define gen_gvec_cmp(dofs, aofs, bofs, oprsz, maxsz, vece, cond)        \
+    tcg_gen_gvec_cmp(cond, vece, dofs, aofs, bofs, oprsz, maxsz)
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 15/75] target/i386: introduce function check_cpuid
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (13 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 14/75] target/i386: introduce aliases for some tcg_gvec operations Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 16/75] target/i386: disable AVX/AVX2 cpuid bitchecks Jan Bobek
                   ` (58 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Introduce a helper function to take care of instruction CPUID checks.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 62 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 467ecf15ba..3e54443d99 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4516,6 +4516,68 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b)
 #define gen_gvec_cmp(dofs, aofs, bofs, oprsz, maxsz, vece, cond)        \
     tcg_gen_gvec_cmp(cond, vece, dofs, aofs, bofs, oprsz, maxsz)
 
+typedef enum {
+    CHECK_CPUID_MMX = 1,
+    CHECK_CPUID_3DNOW,
+    CHECK_CPUID_SSE,
+    CHECK_CPUID_SSE2,
+    CHECK_CPUID_CLFLUSH,
+    CHECK_CPUID_SSE3,
+    CHECK_CPUID_SSSE3,
+    CHECK_CPUID_SSE4_1,
+    CHECK_CPUID_SSE4_2,
+    CHECK_CPUID_SSE4A,
+    CHECK_CPUID_AES,
+    CHECK_CPUID_PCLMULQDQ,
+    CHECK_CPUID_AVX,
+    CHECK_CPUID_AES_AVX,
+    CHECK_CPUID_PCLMULQDQ_AVX,
+    CHECK_CPUID_AVX2,
+} CheckCpuidFeat;
+
+static bool check_cpuid(CPUX86State *env, DisasContext *s, CheckCpuidFeat feat)
+{
+    switch (feat) {
+    case CHECK_CPUID_MMX:
+        return (s->cpuid_features & CPUID_MMX)
+            && (s->cpuid_ext2_features & CPUID_EXT2_MMX);
+    case CHECK_CPUID_3DNOW:
+        return s->cpuid_ext2_features & CPUID_EXT2_3DNOW;
+    case CHECK_CPUID_SSE:
+        return s->cpuid_features & CPUID_SSE;
+    case CHECK_CPUID_SSE2:
+        return s->cpuid_features & CPUID_SSE2;
+    case CHECK_CPUID_CLFLUSH:
+        return s->cpuid_features & CPUID_CLFLUSH;
+    case CHECK_CPUID_SSE3:
+        return s->cpuid_ext_features & CPUID_EXT_SSE3;
+    case CHECK_CPUID_SSSE3:
+        return s->cpuid_ext_features & CPUID_EXT_SSSE3;
+    case CHECK_CPUID_SSE4_1:
+        return s->cpuid_ext_features & CPUID_EXT_SSE41;
+    case CHECK_CPUID_SSE4_2:
+        return s->cpuid_ext_features & CPUID_EXT_SSE42;
+    case CHECK_CPUID_SSE4A:
+        return s->cpuid_ext3_features & CPUID_EXT3_SSE4A;
+    case CHECK_CPUID_AES:
+        return s->cpuid_ext_features & CPUID_EXT_AES;
+    case CHECK_CPUID_PCLMULQDQ:
+        return s->cpuid_ext_features & CPUID_EXT_PCLMULQDQ;
+    case CHECK_CPUID_AVX:
+        return s->cpuid_ext_features & CPUID_EXT_AVX;
+    case CHECK_CPUID_AES_AVX:
+        return (s->cpuid_ext_features & CPUID_EXT_AES)
+            && (s->cpuid_ext_features & CPUID_EXT_AVX);
+    case CHECK_CPUID_PCLMULQDQ_AVX:
+        return (s->cpuid_ext_features & CPUID_EXT_PCLMULQDQ)
+            && (s->cpuid_ext_features & CPUID_EXT_AVX);
+    case CHECK_CPUID_AVX2:
+        return s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_AVX2;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 16/75] target/i386: disable AVX/AVX2 cpuid bitchecks
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (14 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 15/75] target/i386: introduce function check_cpuid Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 17/75] target/i386: introduce instruction operand infrastructure Jan Bobek
                   ` (57 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Ignore the AVX/AVX2 cpuid bits when checking for availability of the
relevant instructions. This is clearly incorrect, but it preserves the
old behavior, which is useful during development.

Note: This changeset is intended for development only and shall not be
included in the final patch series.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 3e54443d99..e7c2ad41bf 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4564,15 +4564,15 @@ static bool check_cpuid(CPUX86State *env, DisasContext *s, CheckCpuidFeat feat)
     case CHECK_CPUID_PCLMULQDQ:
         return s->cpuid_ext_features & CPUID_EXT_PCLMULQDQ;
     case CHECK_CPUID_AVX:
-        return s->cpuid_ext_features & CPUID_EXT_AVX;
+        return true /* s->cpuid_ext_features & CPUID_EXT_AVX */;
     case CHECK_CPUID_AES_AVX:
-        return (s->cpuid_ext_features & CPUID_EXT_AES)
-            && (s->cpuid_ext_features & CPUID_EXT_AVX);
+        return s->cpuid_ext_features & CPUID_EXT_AES
+            /* && (s->cpuid_ext_features & CPUID_EXT_AVX) */;
     case CHECK_CPUID_PCLMULQDQ_AVX:
-        return (s->cpuid_ext_features & CPUID_EXT_PCLMULQDQ)
-            && (s->cpuid_ext_features & CPUID_EXT_AVX);
+        return s->cpuid_ext_features & CPUID_EXT_PCLMULQDQ
+            /* && (s->cpuid_ext_features & CPUID_EXT_AVX) */;
     case CHECK_CPUID_AVX2:
-        return s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_AVX2;
+        return true /* s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_AVX2 */;
     default:
         g_assert_not_reached();
     }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 17/75] target/i386: introduce instruction operand infrastructure
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (15 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 16/75] target/i386: disable AVX/AVX2 cpuid bitchecks Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 18/75] target/i386: introduce generic operand alias Jan Bobek
                   ` (56 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

insnop_arg_t, insnop_ctxt_t and init, prepare and finalize functions
form the basis of instruction operand decoding. Introduce macros for
defining a generic instruction operand; use cases for operand decoding
will be introduced later.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index e7c2ad41bf..78ef0e6608 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4578,6 +4578,34 @@ static bool check_cpuid(CPUX86State *env, DisasContext *s, CheckCpuidFeat feat)
     }
 }
 
+/*
+ * Instruction operand
+ */
+#define insnop_arg_t(opT)    insnop_ ## opT ## _arg_t
+#define insnop_ctxt_t(opT)   insnop_ ## opT ## _ctxt_t
+#define insnop_init(opT)     insnop_ ## opT ## _init
+#define insnop_prepare(opT)  insnop_ ## opT ## _prepare
+#define insnop_finalize(opT) insnop_ ## opT ## _finalize
+
+#define INSNOP_INIT(opT)                                        \
+    static bool insnop_init(opT)(insnop_ctxt_t(opT) *ctxt,      \
+                                 CPUX86State *env,              \
+                                 DisasContext *s,               \
+                                 int modrm, bool is_write)
+
+#define INSNOP_PREPARE(opT)                                             \
+    static insnop_arg_t(opT) insnop_prepare(opT)(insnop_ctxt_t(opT) *ctxt, \
+                                                 CPUX86State *env,      \
+                                                 DisasContext *s,       \
+                                                 int modrm, bool is_write)
+
+#define INSNOP_FINALIZE(opT)                                    \
+    static void insnop_finalize(opT)(insnop_ctxt_t(opT) *ctxt,  \
+                                     CPUX86State *env,          \
+                                     DisasContext *s,           \
+                                     int modrm, bool is_write,  \
+                                     insnop_arg_t(opT) arg)
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 18/75] target/i386: introduce generic operand alias
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (16 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 17/75] target/i386: introduce instruction operand infrastructure Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 19/75] target/i386: introduce generic either-or operand Jan Bobek
                   ` (55 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

It turns out it is useful to be able to declare operand name
aliases. Introduce a macro to capture this functionality.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 78ef0e6608..a6f23bae4e 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4606,6 +4606,26 @@ static bool check_cpuid(CPUX86State *env, DisasContext *s, CheckCpuidFeat feat)
                                      int modrm, bool is_write,  \
                                      insnop_arg_t(opT) arg)
 
+/*
+ * Operand alias
+ */
+#define DEF_INSNOP_ALIAS(opT, opT2)                                     \
+    typedef insnop_arg_t(opT2) insnop_arg_t(opT);                       \
+    typedef insnop_ctxt_t(opT2) insnop_ctxt_t(opT);                     \
+                                                                        \
+    INSNOP_INIT(opT)                                                    \
+    {                                                                   \
+        return insnop_init(opT2)(ctxt, env, s, modrm, is_write);        \
+    }                                                                   \
+    INSNOP_PREPARE(opT)                                                 \
+    {                                                                   \
+        return insnop_prepare(opT2)(ctxt, env, s, modrm, is_write);     \
+    }                                                                   \
+    INSNOP_FINALIZE(opT)                                                \
+    {                                                                   \
+        insnop_finalize(opT2)(ctxt, env, s, modrm, is_write, arg);      \
+    }
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 19/75] target/i386: introduce generic either-or operand
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (17 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 18/75] target/i386: introduce generic operand alias Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 20/75] target/i386: introduce generic load-store operand Jan Bobek
                   ` (54 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

The either-or operand attempts to decode one operand, and if it fails,
it falls back to a second operand. It is unifying, meaning that
insnop_arg_t of the second operand must be implicitly castable to
insnop_arg_t of the first operand.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 44 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index a6f23bae4e..68c6b7d16b 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4626,6 +4626,50 @@ static bool check_cpuid(CPUX86State *env, DisasContext *s, CheckCpuidFeat feat)
         insnop_finalize(opT2)(ctxt, env, s, modrm, is_write, arg);      \
     }
 
+/*
+ * Generic unifying either-or operand
+ */
+#define DEF_INSNOP_EITHER(opT, opT1, opT2)                              \
+    typedef insnop_arg_t(opT1) insnop_arg_t(opT);                       \
+    typedef struct {                                                    \
+        bool is_ ## opT1;                                               \
+        union {                                                         \
+            insnop_ctxt_t(opT1) ctxt_ ## opT1;                          \
+            insnop_ctxt_t(opT2) ctxt_ ## opT2;                          \
+        };                                                              \
+    } insnop_ctxt_t(opT);                                               \
+                                                                        \
+    INSNOP_INIT(opT)                                                    \
+    {                                                                   \
+        if (insnop_init(opT1)(&ctxt->ctxt_ ## opT1,                     \
+                              env, s, modrm, is_write)) {               \
+            ctxt->is_ ## opT1 = true;                                   \
+            return true;                                                \
+        }                                                               \
+        if (insnop_init(opT2)(&ctxt->ctxt_ ## opT2,                     \
+                              env, s, modrm, is_write)) {               \
+            ctxt->is_ ## opT1 = false;                                  \
+            return true;                                                \
+        }                                                               \
+        return false;                                                   \
+    }                                                                   \
+    INSNOP_PREPARE(opT)                                                 \
+    {                                                                   \
+        return (ctxt->is_ ## opT1                                       \
+                ? insnop_prepare(opT1)(&ctxt->ctxt_ ## opT1,            \
+                                       env, s, modrm, is_write)         \
+                : insnop_prepare(opT2)(&ctxt->ctxt_ ## opT2,            \
+                                       env, s, modrm, is_write));       \
+    }                                                                   \
+    INSNOP_FINALIZE(opT)                                                \
+    {                                                                   \
+        (ctxt->is_ ## opT1                                              \
+         ? insnop_finalize(opT1)(&ctxt->ctxt_ ## opT1,                  \
+                                 env, s, modrm, is_write, arg)          \
+         : insnop_finalize(opT2)(&ctxt->ctxt_ ## opT2,                  \
+                                 env, s, modrm, is_write, arg));        \
+    }
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 20/75] target/i386: introduce generic load-store operand
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (18 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 19/75] target/i386: introduce generic either-or operand Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 21/75] target/i386: introduce tcg register operands Jan Bobek
                   ` (53 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

This operand attempts to capture the "indirect" or "memory" operand in
a generic way. It significatly reduces the amount code that needs to
be written in order to read operands from memory to temporary storage
and write them back.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 57 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 68c6b7d16b..1be6176934 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4670,6 +4670,63 @@ static bool check_cpuid(CPUX86State *env, DisasContext *s, CheckCpuidFeat feat)
                                  env, s, modrm, is_write, arg));        \
     }
 
+/*
+ * Generic load-store operand
+ */
+#define insnop_ldst(opTarg, opTptr)             \
+    insnop_ldst_ ## opTarg ## opTptr
+
+#define INSNOP_LDST(opTarg, opTptr)                                     \
+    static void insnop_ldst(opTarg, opTptr)(CPUX86State *env,           \
+                                            DisasContext *s,            \
+                                            bool is_write,              \
+                                            insnop_arg_t(opTarg) arg,   \
+                                            insnop_arg_t(opTptr) ptr)
+
+#define DEF_INSNOP_LDST(opT, opTarg, opTptr)                            \
+    typedef insnop_arg_t(opTarg) insnop_arg_t(opT);                     \
+    typedef struct {                                                    \
+        insnop_ctxt_t(opTarg) arg_ctxt;                                 \
+        insnop_ctxt_t(opTptr) ptr_ctxt;                                 \
+        insnop_arg_t(opTptr) ptr;                                       \
+    } insnop_ctxt_t(opT);                                               \
+                                                                        \
+    /* forward declaration */                                           \
+    INSNOP_LDST(opTarg, opTptr);                                        \
+                                                                        \
+    INSNOP_INIT(opT)                                                    \
+    {                                                                   \
+        return insnop_init(opTarg)(&ctxt->arg_ctxt,                     \
+                                   env, s, modrm, is_write)             \
+            && insnop_init(opTptr)(&ctxt->ptr_ctxt,                     \
+                                   env, s, modrm, is_write);            \
+    }                                                                   \
+    INSNOP_PREPARE(opT)                                                 \
+    {                                                                   \
+        const insnop_arg_t(opTarg) arg =                                \
+            insnop_prepare(opTarg)(&ctxt->arg_ctxt,                     \
+                                   env, s, modrm, is_write);            \
+        ctxt->ptr =                                                     \
+            insnop_prepare(opTptr)(&ctxt->ptr_ctxt,                     \
+                                   env, s, modrm, is_write);            \
+        if (!is_write) {                                                \
+            insnop_ldst(opTarg, opTptr)(env, s, is_write,               \
+                                        arg, ctxt->ptr);                \
+        }                                                               \
+        return arg;                                                     \
+    }                                                                   \
+    INSNOP_FINALIZE(opT)                                                \
+    {                                                                   \
+        if (is_write) {                                                 \
+            insnop_ldst(opTarg, opTptr)(env, s, is_write,               \
+                                        arg, ctxt->ptr);                \
+        }                                                               \
+        insnop_finalize(opTptr)(&ctxt->ptr_ctxt,                        \
+                                env, s, modrm, is_write, ctxt->ptr);    \
+        insnop_finalize(opTarg)(&ctxt->arg_ctxt,                        \
+                                env, s, modrm, is_write, arg);          \
+    }
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 21/75] target/i386: introduce tcg register operands
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (19 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 20/75] target/i386: introduce generic load-store operand Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 22/75] target/i386: introduce modrm operand Jan Bobek
                   ` (52 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

TCG operands allocate a 32-bit or 64-bit TCG temporary and later
automatically free it.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 44 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 1be6176934..80cfb59978 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4727,6 +4727,50 @@ static bool check_cpuid(CPUX86State *env, DisasContext *s, CheckCpuidFeat feat)
                                 env, s, modrm, is_write, arg);          \
     }
 
+/*
+ * tcg_i32
+ *
+ * Operand which allocates a 32-bit TCG temporary and frees it
+ * automatically after use.
+ */
+typedef TCGv_i32 insnop_arg_t(tcg_i32);
+typedef struct {} insnop_ctxt_t(tcg_i32);
+
+INSNOP_INIT(tcg_i32)
+{
+    return true;
+}
+INSNOP_PREPARE(tcg_i32)
+{
+    return tcg_temp_new_i32();
+}
+INSNOP_FINALIZE(tcg_i32)
+{
+    tcg_temp_free_i32(arg);
+}
+
+/*
+ * tcg_i64
+ *
+ * Operand which allocates a 64-bit TCG temporary and frees it
+ * automatically after use.
+ */
+typedef TCGv_i64 insnop_arg_t(tcg_i64);
+typedef struct {} insnop_ctxt_t(tcg_i64);
+
+INSNOP_INIT(tcg_i64)
+{
+    return true;
+}
+INSNOP_PREPARE(tcg_i64)
+{
+    return tcg_temp_new_i64();
+}
+INSNOP_FINALIZE(tcg_i64)
+{
+    tcg_temp_free_i64(arg);
+}
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 22/75] target/i386: introduce modrm operand
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (20 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 21/75] target/i386: introduce tcg register operands Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 23/75] target/i386: introduce operands for decoding modrm fields Jan Bobek
                   ` (51 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

This permits the ModR/M byte to be passed raw into the code generator,
effectively allowing to short-circuit the operand decoding mechanism
and do the decoding work manually in the code generator.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 80cfb59978..a0a9f64ff3 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4771,6 +4771,26 @@ INSNOP_FINALIZE(tcg_i64)
     tcg_temp_free_i64(arg);
 }
 
+/*
+ * modrm
+ *
+ * Operand whose value is the ModR/M byte.
+ */
+typedef int insnop_arg_t(modrm);
+typedef struct {} insnop_ctxt_t(modrm);
+
+INSNOP_INIT(modrm)
+{
+    return true;
+}
+INSNOP_PREPARE(modrm)
+{
+    return modrm;
+}
+INSNOP_FINALIZE(modrm)
+{
+}
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 23/75] target/i386: introduce operands for decoding modrm fields
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (21 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 22/75] target/i386: introduce modrm operand Jan Bobek
@ 2019-08-21 17:28 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 24/75] target/i386: introduce operand for direct-only r/m field Jan Bobek
                   ` (50 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

The old code uses bitshifts and bitwise-and all over the place for
decoding ModR/M fields. Avoid doing that by introducing proper
decoding operands.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 62 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index a0a9f64ff3..b3b316d389 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4791,6 +4791,68 @@ INSNOP_FINALIZE(modrm)
 {
 }
 
+/*
+ * modrm_mod
+ *
+ * Operand whose value is the MOD field of the ModR/M byte.
+ */
+typedef int insnop_arg_t(modrm_mod);
+typedef struct {} insnop_ctxt_t(modrm_mod);
+
+INSNOP_INIT(modrm_mod)
+{
+    return true;
+}
+INSNOP_PREPARE(modrm_mod)
+{
+    return (modrm >> 6) & 3;
+}
+INSNOP_FINALIZE(modrm_mod)
+{
+}
+
+/*
+ * modrm_reg
+ *
+ * Operand whose value is the REG field of the ModR/M byte, extended
+ * with the REX.R bit if REX prefix is present.
+ */
+typedef int insnop_arg_t(modrm_reg);
+typedef struct {} insnop_ctxt_t(modrm_reg);
+
+INSNOP_INIT(modrm_reg)
+{
+    return true;
+}
+INSNOP_PREPARE(modrm_reg)
+{
+    return ((modrm >> 3) & 7) | REX_R(s);
+}
+INSNOP_FINALIZE(modrm_reg)
+{
+}
+
+/*
+ * modrm_rm
+ *
+ * Operand whose value is the RM field of the ModR/M byte, extended
+ * with the REX.B bit if REX prefix is present.
+ */
+typedef int insnop_arg_t(modrm_rm);
+typedef struct {} insnop_ctxt_t(modrm_rm);
+
+INSNOP_INIT(modrm_rm)
+{
+    return true;
+}
+INSNOP_PREPARE(modrm_rm)
+{
+    return (modrm & 7) | REX_B(s);
+}
+INSNOP_FINALIZE(modrm_rm)
+{
+}
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 24/75] target/i386: introduce operand for direct-only r/m field
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (22 preceding siblings ...)
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 23/75] target/i386: introduce operands for decoding modrm fields Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 25/75] target/i386: introduce Ib (immediate) operand Jan Bobek
                   ` (49 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Many operands can only decode successfully if the ModR/M byte has the
direct form (i.e. MOD=3). Capture this common aspect by introducing a
special direct-only r/m field operand.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index b3b316d389..886f64a58f 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4853,6 +4853,43 @@ INSNOP_FINALIZE(modrm_rm)
 {
 }
 
+/*
+ * modrm_rm_direct
+ *
+ * Equivalent of modrm_rm, but only decodes successfully if
+ * the ModR/M byte has the direct form (i.e. MOD=3).
+ */
+typedef insnop_arg_t(modrm_rm) insnop_arg_t(modrm_rm_direct);
+typedef struct {
+    insnop_ctxt_t(modrm_rm) rm;
+} insnop_ctxt_t(modrm_rm_direct);
+
+INSNOP_INIT(modrm_rm_direct)
+{
+    bool ret;
+    insnop_ctxt_t(modrm_mod) modctxt;
+
+    ret = insnop_init(modrm_mod)(&modctxt, env, s, modrm, 0);
+    if (ret) {
+        const int mod = insnop_prepare(modrm_mod)(&modctxt, env, s, modrm, 0);
+        if (mod == 3) {
+            ret = insnop_init(modrm_rm)(&ctxt->rm, env, s, modrm, is_write);
+        } else {
+            ret = false;
+        }
+        insnop_finalize(modrm_mod)(&modctxt, env, s, modrm, 0, mod);
+    }
+    return ret;
+}
+INSNOP_PREPARE(modrm_rm_direct)
+{
+    return insnop_prepare(modrm_rm)(&ctxt->rm, env, s, modrm, is_write);
+}
+INSNOP_FINALIZE(modrm_rm_direct)
+{
+    insnop_finalize(modrm_rm)(&ctxt->rm, env, s, modrm, is_write, arg);
+}
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 25/75] target/i386: introduce Ib (immediate) operand
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (23 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 24/75] target/i386: introduce operand for direct-only r/m field Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 26/75] target/i386: introduce M* (memptr) operands Jan Bobek
                   ` (48 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Introduce the immediate-byte operand, which loads a byte from the
instruction stream and passes its value as the operand.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 886f64a58f..9f02ea58d1 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4890,6 +4890,24 @@ INSNOP_FINALIZE(modrm_rm_direct)
     insnop_finalize(modrm_rm)(&ctxt->rm, env, s, modrm, is_write, arg);
 }
 
+/*
+ * Immediate operand
+ */
+typedef uint8_t insnop_arg_t(Ib);
+typedef struct {} insnop_ctxt_t(Ib);
+
+INSNOP_INIT(Ib)
+{
+    return true;
+}
+INSNOP_PREPARE(Ib)
+{
+    return x86_ldub_code(env, s);
+}
+INSNOP_FINALIZE(Ib)
+{
+}
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 26/75] target/i386: introduce M* (memptr) operands
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (24 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 25/75] target/i386: introduce Ib (immediate) operand Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 27/75] target/i386: introduce G*, R*, E* (general register) operands Jan Bobek
                   ` (47 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

The memory-pointer operand decodes the indirect form of ModR/M byte,
loads the effective address into a register and passes that register
as the operand.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 9f02ea58d1..46c41cc3be 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4908,6 +4908,42 @@ INSNOP_FINALIZE(Ib)
 {
 }
 
+/*
+ * Memory-pointer operand
+ */
+typedef TCGv insnop_arg_t(M);
+typedef struct {} insnop_ctxt_t(M);
+
+INSNOP_INIT(M)
+{
+    bool ret;
+    insnop_ctxt_t(modrm_mod) modctxt;
+
+    ret = insnop_init(modrm_mod)(&modctxt, env, s, modrm, is_write);
+    if (ret) {
+        const int mod =
+            insnop_prepare(modrm_mod)(&modctxt, env, s, modrm, is_write);
+        ret = (mod != 3);
+        insnop_finalize(modrm_mod)(&modctxt, env, s, modrm, is_write, mod);
+    }
+    return ret;
+}
+INSNOP_PREPARE(M)
+{
+    gen_lea_modrm(env, s, modrm);
+    return s->A0;
+}
+INSNOP_FINALIZE(M)
+{
+}
+
+DEF_INSNOP_ALIAS(Mb, M)
+DEF_INSNOP_ALIAS(Mw, M)
+DEF_INSNOP_ALIAS(Md, M)
+DEF_INSNOP_ALIAS(Mq, M)
+DEF_INSNOP_ALIAS(Mdq, M)
+DEF_INSNOP_ALIAS(Mqq, M)
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 27/75] target/i386: introduce G*, R*, E* (general register) operands
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (25 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 26/75] target/i386: introduce M* (memptr) operands Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 28/75] target/i386: introduce P*, N*, Q* (MMX) operands Jan Bobek
                   ` (46 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

These address the general-purpose register file. The corresponding
32-bit or 64-bit register is passed as the operand value.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 88 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 46c41cc3be..d6d32c7f06 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4944,6 +4944,94 @@ DEF_INSNOP_ALIAS(Mq, M)
 DEF_INSNOP_ALIAS(Mdq, M)
 DEF_INSNOP_ALIAS(Mqq, M)
 
+/*
+ * 32-bit general register operands
+ */
+DEF_INSNOP_LDST(Gd, tcg_i32, modrm_reg)
+DEF_INSNOP_LDST(Rd, tcg_i32, modrm_rm_direct)
+
+INSNOP_LDST(tcg_i32, modrm_reg)
+{
+    assert(0 <= ptr && ptr < CPU_NB_REGS);
+    if (is_write) {
+        tcg_gen_extu_i32_tl(cpu_regs[ptr], arg);
+    } else {
+        tcg_gen_trunc_tl_i32(arg, cpu_regs[ptr]);
+    }
+}
+INSNOP_LDST(tcg_i32, modrm_rm_direct)
+{
+    insnop_ldst(tcg_i32, modrm_reg)(env, s, is_write, arg, ptr);
+}
+
+DEF_INSNOP_LDST(MEd, tcg_i32, Md)
+DEF_INSNOP_EITHER(Ed, Rd, MEd)
+DEF_INSNOP_LDST(MRdMw, tcg_i32, Mw)
+DEF_INSNOP_EITHER(RdMw, Rd, MRdMw)
+DEF_INSNOP_LDST(MRdMb, tcg_i32, Mb)
+DEF_INSNOP_EITHER(RdMb, Rd, MRdMb)
+
+INSNOP_LDST(tcg_i32, Md)
+{
+    if (is_write) {
+        tcg_gen_qemu_st_i32(arg, ptr, s->mem_index, MO_LEUL);
+    } else {
+        tcg_gen_qemu_ld_i32(arg, ptr, s->mem_index, MO_LEUL);
+    }
+}
+INSNOP_LDST(tcg_i32, Mw)
+{
+    if (is_write) {
+        tcg_gen_qemu_st_i32(arg, ptr, s->mem_index, MO_LEUW);
+    } else {
+        tcg_gen_qemu_ld_i32(arg, ptr, s->mem_index, MO_LEUW);
+    }
+}
+INSNOP_LDST(tcg_i32, Mb)
+{
+    if (is_write) {
+        tcg_gen_qemu_st_i32(arg, ptr, s->mem_index, MO_UB);
+    } else {
+        tcg_gen_qemu_ld_i32(arg, ptr, s->mem_index, MO_UB);
+    }
+}
+
+/*
+ * 64-bit general register operands
+ */
+DEF_INSNOP_LDST(Gq, tcg_i64, modrm_reg)
+DEF_INSNOP_LDST(Rq, tcg_i64, modrm_rm_direct)
+
+INSNOP_LDST(tcg_i64, modrm_reg)
+{
+#ifdef TARGET_X86_64
+    assert(0 <= ptr && ptr < CPU_NB_REGS);
+    if (is_write) {
+        tcg_gen_mov_i64(cpu_regs[ptr], arg);
+    } else {
+        tcg_gen_mov_i64(arg, cpu_regs[ptr]);
+    }
+#else /* !TARGET_X86_64 */
+    g_assert_not_reached();
+#endif /* !TARGET_X86_64 */
+}
+INSNOP_LDST(tcg_i64, modrm_rm_direct)
+{
+    insnop_ldst(tcg_i64, modrm_reg)(env, s, is_write, arg, ptr);
+}
+
+DEF_INSNOP_LDST(MEq, tcg_i64, Mq)
+DEF_INSNOP_EITHER(Eq, Rq, MEq)
+
+INSNOP_LDST(tcg_i64, Mq)
+{
+    if (is_write) {
+        tcg_gen_qemu_st_i64(arg, ptr, s->mem_index, MO_LEQ);
+    } else {
+        tcg_gen_qemu_ld_i64(arg, ptr, s->mem_index, MO_LEQ);
+    }
+}
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 28/75] target/i386: introduce P*, N*, Q* (MMX) operands
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (26 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 27/75] target/i386: introduce G*, R*, E* (general register) operands Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 29/75] target/i386: introduce H*, L*, V*, U*, W* (SSE/AVX) operands Jan Bobek
                   ` (45 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

These address the MMX-technology register file; the corresponding
cpu_env offset is passed as the operand value. Notably, offset of the
entire register is pased at all times, regardless of the operand-size
suffix.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 80 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index d6d32c7f06..815354f12b 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5032,6 +5032,86 @@ INSNOP_LDST(tcg_i64, Mq)
     }
 }
 
+/*
+ * MMX-technology register operands
+ */
+typedef unsigned int insnop_arg_t(mm);
+typedef struct {} insnop_ctxt_t(mm);
+
+INSNOP_INIT(mm)
+{
+    return true;
+}
+INSNOP_PREPARE(mm)
+{
+    return offsetof(CPUX86State, mmx_t0);
+}
+INSNOP_FINALIZE(mm)
+{
+}
+
+#define DEF_INSNOP_MM(opT, opTmmid)                                     \
+    typedef insnop_arg_t(mm) insnop_arg_t(opT);                         \
+    typedef struct {                                                    \
+        insnop_ctxt_t(opTmmid) mmid;                                    \
+    } insnop_ctxt_t(opT);                                               \
+                                                                        \
+    INSNOP_INIT(opT)                                                    \
+    {                                                                   \
+        return insnop_init(opTmmid)(&ctxt->mmid, env, s, modrm, is_write); \
+    }                                                                   \
+    INSNOP_PREPARE(opT)                                                 \
+    {                                                                   \
+        const insnop_arg_t(opTmmid) mmid =                              \
+            insnop_prepare(opTmmid)(&ctxt->mmid, env, s, modrm, is_write); \
+        const insnop_arg_t(opT) arg =                                   \
+            offsetof(CPUX86State, fpregs[mmid & 7].mmx);                \
+        insnop_finalize(opTmmid)(&ctxt->mmid, env, s, modrm, is_write, mmid); \
+        return arg;                                                     \
+    }                                                                   \
+    INSNOP_FINALIZE(opT)                                                \
+    {                                                                   \
+    }
+
+DEF_INSNOP_MM(P, modrm_reg)
+DEF_INSNOP_ALIAS(Pq, P)
+
+DEF_INSNOP_MM(N, modrm_rm_direct)
+DEF_INSNOP_ALIAS(Nd, N)
+DEF_INSNOP_ALIAS(Nq, N)
+
+DEF_INSNOP_LDST(MQd, mm, Md)
+DEF_INSNOP_LDST(MQq, mm, Mq)
+DEF_INSNOP_EITHER(Qd, Nd, MQd)
+DEF_INSNOP_EITHER(Qq, Nq, MQq)
+
+INSNOP_LDST(mm, Md)
+{
+    const insnop_arg_t(mm) ofs = offsetof(MMXReg, MMX_L(0));
+    const TCGv_i32 r32 = tcg_temp_new_i32();
+    if (is_write) {
+        tcg_gen_ld_i32(r32, cpu_env, arg + ofs);
+        tcg_gen_qemu_st_i32(r32, ptr, s->mem_index, MO_LEUL);
+    } else {
+        tcg_gen_qemu_ld_i32(r32, ptr, s->mem_index, MO_LEUL);
+        tcg_gen_st_i32(r32, cpu_env, arg + ofs);
+    }
+    tcg_temp_free_i32(r32);
+}
+INSNOP_LDST(mm, Mq)
+{
+    const insnop_arg_t(mm) ofs = offsetof(MMXReg, MMX_Q(0));
+    const TCGv_i64 r64 = tcg_temp_new_i64();
+    if (is_write) {
+        tcg_gen_ld_i64(r64, cpu_env, arg + ofs);
+        tcg_gen_qemu_st_i64(r64, ptr, s->mem_index, MO_LEQ);
+    } else {
+        tcg_gen_qemu_ld_i64(r64, ptr, s->mem_index, MO_LEQ);
+        tcg_gen_st_i64(r64, cpu_env, arg + ofs);
+    }
+    tcg_temp_free_i64(r64);
+}
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 29/75] target/i386: introduce H*, L*, V*, U*, W* (SSE/AVX) operands
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (27 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 28/75] target/i386: introduce P*, N*, Q* (MMX) operands Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 30/75] target/i386: alias H* operands with the V* operands Jan Bobek
                   ` (44 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

These address the SSE/AVX-technology register file. Offset of the
entire corresponding register is passed as the operand value,
regardless of operand-size suffix.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 220 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 220 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 815354f12b..23ba1d5edd 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4941,6 +4941,7 @@ DEF_INSNOP_ALIAS(Mb, M)
 DEF_INSNOP_ALIAS(Mw, M)
 DEF_INSNOP_ALIAS(Md, M)
 DEF_INSNOP_ALIAS(Mq, M)
+DEF_INSNOP_ALIAS(Mhq, M)
 DEF_INSNOP_ALIAS(Mdq, M)
 DEF_INSNOP_ALIAS(Mqq, M)
 
@@ -5112,6 +5113,225 @@ INSNOP_LDST(mm, Mq)
     tcg_temp_free_i64(r64);
 }
 
+/*
+ * vex_v
+ *
+ * Operand whose value is the VVVV field of the VEX prefix.
+ */
+typedef int insnop_arg_t(vex_v);
+typedef struct {} insnop_ctxt_t(vex_v);
+
+INSNOP_INIT(vex_v)
+{
+    return s->prefix & PREFIX_VEX;
+}
+INSNOP_PREPARE(vex_v)
+{
+    return s->vex_v;
+}
+INSNOP_FINALIZE(vex_v)
+{
+}
+
+/*
+ * is4
+ *
+ * Upper 4 bits of the 8-bit immediate selector, used with the SSE/AVX
+ * register file in some instructions.
+ */
+typedef int insnop_arg_t(is4);
+typedef struct {
+    insnop_ctxt_t(Ib) ctxt_Ib;
+    insnop_arg_t(Ib) arg_Ib;
+} insnop_ctxt_t(is4);
+
+INSNOP_INIT(is4)
+{
+    return insnop_init(Ib)(&ctxt->ctxt_Ib, env, s, modrm, is_write);
+}
+INSNOP_PREPARE(is4)
+{
+    ctxt->arg_Ib = insnop_prepare(Ib)(&ctxt->ctxt_Ib, env, s, modrm, is_write);
+    return (ctxt->arg_Ib >> 4) & 0xf;
+}
+INSNOP_FINALIZE(is4)
+{
+    insnop_finalize(Ib)(&ctxt->ctxt_Ib, env, s, modrm, is_write, ctxt->arg_Ib);
+}
+
+/*
+ * SSE/AVX-technology registers
+ */
+typedef unsigned int insnop_arg_t(xmm);
+typedef struct {} insnop_ctxt_t(xmm);
+
+INSNOP_INIT(xmm)
+{
+    return true;
+}
+INSNOP_PREPARE(xmm)
+{
+    return offsetof(CPUX86State, xmm_t0);
+}
+INSNOP_FINALIZE(xmm)
+{
+}
+
+#define DEF_INSNOP_XMM(opT, opTxmmid)                                   \
+    typedef insnop_arg_t(xmm) insnop_arg_t(opT);                        \
+    typedef struct {                                                    \
+        insnop_ctxt_t(opTxmmid) xmmid;                                  \
+    } insnop_ctxt_t(opT);                                               \
+                                                                        \
+    INSNOP_INIT(opT)                                                    \
+    {                                                                   \
+        return insnop_init(opTxmmid)(&ctxt->xmmid, env, s, modrm, is_write); \
+    }                                                                   \
+    INSNOP_PREPARE(opT)                                                 \
+    {                                                                   \
+        const insnop_arg_t(opTxmmid) xmmid =                            \
+            insnop_prepare(opTxmmid)(&ctxt->xmmid, env, s, modrm, is_write); \
+        const insnop_arg_t(opT) arg =                                   \
+            offsetof(CPUX86State, xmm_regs[xmmid]);                     \
+        insnop_finalize(opTxmmid)(&ctxt->xmmid, env, s,                 \
+                                  modrm, is_write, xmmid);              \
+        return arg;                                                     \
+    }                                                                   \
+    INSNOP_FINALIZE(opT)                                                \
+    {                                                                   \
+    }
+
+DEF_INSNOP_XMM(V, modrm_reg)
+DEF_INSNOP_ALIAS(Vd, V)
+DEF_INSNOP_ALIAS(Vq, V)
+DEF_INSNOP_ALIAS(Vdq, V)
+DEF_INSNOP_ALIAS(Vqq, V)
+
+DEF_INSNOP_XMM(U, modrm_rm_direct)
+DEF_INSNOP_ALIAS(Ub, U)
+DEF_INSNOP_ALIAS(Uw, U)
+DEF_INSNOP_ALIAS(Ud, U)
+DEF_INSNOP_ALIAS(Uq, U)
+DEF_INSNOP_ALIAS(Udq, U)
+DEF_INSNOP_ALIAS(Uqq, U)
+
+DEF_INSNOP_XMM(H, vex_v)
+DEF_INSNOP_ALIAS(Hd, H)
+DEF_INSNOP_ALIAS(Hq, H)
+DEF_INSNOP_ALIAS(Hdq, H)
+DEF_INSNOP_ALIAS(Hqq, H)
+
+DEF_INSNOP_XMM(L, is4)
+DEF_INSNOP_ALIAS(Ldq, L)
+DEF_INSNOP_ALIAS(Lqq, L)
+
+DEF_INSNOP_LDST(MWb, xmm, Mb)
+DEF_INSNOP_LDST(MWw, xmm, Mw)
+DEF_INSNOP_LDST(MWd, xmm, Md)
+DEF_INSNOP_LDST(MWq, xmm, Mq)
+DEF_INSNOP_LDST(MWdq, xmm, Mdq)
+DEF_INSNOP_LDST(MWqq, xmm, Mqq)
+DEF_INSNOP_LDST(MUdqMhq, xmm, Mhq)
+DEF_INSNOP_EITHER(Wb, Ub, MWb)
+DEF_INSNOP_EITHER(Ww, Uw, MWw)
+DEF_INSNOP_EITHER(Wd, Ud, MWd)
+DEF_INSNOP_EITHER(Wq, Uq, MWq)
+DEF_INSNOP_EITHER(Wdq, Udq, MWdq)
+DEF_INSNOP_EITHER(Wqq, Uqq, MWqq)
+DEF_INSNOP_EITHER(UdqMq, Udq, MWq)
+DEF_INSNOP_EITHER(UdqMhq, Udq, MUdqMhq)
+
+INSNOP_LDST(xmm, Mb)
+{
+    const insnop_arg_t(xmm) ofs = offsetof(ZMMReg, ZMM_B(0));
+    const TCGv_i32 r32 = tcg_temp_new_i32();
+    if (is_write) {
+        tcg_gen_ld_i32(r32, cpu_env, arg + ofs);
+        tcg_gen_qemu_st_i32(r32, ptr, s->mem_index, MO_UB);
+    } else {
+        tcg_gen_qemu_ld_i32(r32, ptr, s->mem_index, MO_UB);
+        tcg_gen_st_i32(r32, cpu_env, arg + ofs);
+    }
+    tcg_temp_free_i32(r32);
+}
+INSNOP_LDST(xmm, Mw)
+{
+    const insnop_arg_t(xmm) ofs = offsetof(ZMMReg, ZMM_W(0));
+    const TCGv_i32 r32 = tcg_temp_new_i32();
+    if (is_write) {
+        tcg_gen_ld_i32(r32, cpu_env, arg + ofs);
+        tcg_gen_qemu_st_i32(r32, ptr, s->mem_index, MO_LEUW);
+    } else {
+        tcg_gen_qemu_ld_i32(r32, ptr, s->mem_index, MO_LEUW);
+        tcg_gen_st_i32(r32, cpu_env, arg + ofs);
+    }
+    tcg_temp_free_i32(r32);
+}
+INSNOP_LDST(xmm, Md)
+{
+    const insnop_arg_t(xmm) ofs = offsetof(ZMMReg, ZMM_L(0));
+    const TCGv_i32 r32 = tcg_temp_new_i32();
+    if (is_write) {
+        tcg_gen_ld_i32(r32, cpu_env, arg + ofs);
+        tcg_gen_qemu_st_i32(r32, ptr, s->mem_index, MO_LEUL);
+    } else {
+        tcg_gen_qemu_ld_i32(r32, ptr, s->mem_index, MO_LEUL);
+        tcg_gen_st_i32(r32, cpu_env, arg + ofs);
+    }
+    tcg_temp_free_i32(r32);
+}
+INSNOP_LDST(xmm, Mq)
+{
+    const insnop_arg_t(xmm) ofs = offsetof(ZMMReg, ZMM_Q(0));
+    const TCGv_i64 r64 = tcg_temp_new_i64();
+    if (is_write) {
+        tcg_gen_ld_i64(r64, cpu_env, arg + ofs);
+        tcg_gen_qemu_st_i64(r64, ptr, s->mem_index, MO_LEQ);
+    } else {
+        tcg_gen_qemu_ld_i64(r64, ptr, s->mem_index, MO_LEQ);
+        tcg_gen_st_i64(r64, cpu_env, arg + ofs);
+    }
+    tcg_temp_free_i64(r64);
+}
+INSNOP_LDST(xmm, Mdq)
+{
+    const insnop_arg_t(xmm) ofs = offsetof(ZMMReg, ZMM_Q(0));
+    const insnop_arg_t(xmm) ofs1 = offsetof(ZMMReg, ZMM_Q(1));
+    const TCGv_i64 r64 = tcg_temp_new_i64();
+    const TCGv ptr1 = tcg_temp_new();
+    tcg_gen_addi_tl(ptr1, ptr, sizeof(uint64_t));
+    if (is_write) {
+        tcg_gen_ld_i64(r64, cpu_env, arg + ofs);
+        tcg_gen_qemu_st_i64(r64, ptr, s->mem_index, MO_LEQ);
+        tcg_gen_ld_i64(r64, cpu_env, arg + ofs1);
+        tcg_gen_qemu_st_i64(r64, ptr1, s->mem_index, MO_LEQ);
+    } else {
+        tcg_gen_qemu_ld_i64(r64, ptr, s->mem_index, MO_LEQ);
+        tcg_gen_st_i64(r64, cpu_env, arg + ofs);
+        tcg_gen_qemu_ld_i64(r64, ptr1, s->mem_index, MO_LEQ);
+        tcg_gen_st_i64(r64, cpu_env, arg + ofs1);
+    }
+    tcg_temp_free_i64(r64);
+    tcg_temp_free(ptr1);
+}
+INSNOP_LDST(xmm, Mqq)
+{
+    insnop_ldst(xmm, Mdq)(env, s, is_write, arg, ptr);
+}
+INSNOP_LDST(xmm, Mhq)
+{
+    const insnop_arg_t(xmm) ofs = offsetof(ZMMReg, ZMM_Q(1));
+    const TCGv_i64 r64 = tcg_temp_new_i64();
+    if (is_write) {
+        tcg_gen_ld_i64(r64, cpu_env, arg + ofs);
+        tcg_gen_qemu_st_i64(r64, ptr, s->mem_index, MO_LEQ);
+    } else {
+        tcg_gen_qemu_ld_i64(r64, ptr, s->mem_index, MO_LEQ);
+        tcg_gen_st_i64(r64, cpu_env, arg + ofs);
+    }
+    tcg_temp_free_i64(r64);
+}
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 30/75] target/i386: alias H* operands with the V* operands
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (28 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 29/75] target/i386: introduce H*, L*, V*, U*, W* (SSE/AVX) operands Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 31/75] target/i386: introduce code generators Jan Bobek
                   ` (43 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

This effectively means that all thee-operand VEX-encoded instructions
using the VEX.V field will be decoded as their non-VEX-encoded
two-operand equivalents. While this is incorrect, it is the old
behavior, which is useful during development.

Note: This changeset is intended for development only and shall not be
included in the final patch series.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 23ba1d5edd..2e78bed78f 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5215,11 +5215,10 @@ DEF_INSNOP_ALIAS(Uq, U)
 DEF_INSNOP_ALIAS(Udq, U)
 DEF_INSNOP_ALIAS(Uqq, U)
 
-DEF_INSNOP_XMM(H, vex_v)
-DEF_INSNOP_ALIAS(Hd, H)
-DEF_INSNOP_ALIAS(Hq, H)
-DEF_INSNOP_ALIAS(Hdq, H)
-DEF_INSNOP_ALIAS(Hqq, H)
+DEF_INSNOP_ALIAS(Hd, Vd)
+DEF_INSNOP_ALIAS(Hq, Vq)
+DEF_INSNOP_ALIAS(Hdq, Vdq)
+DEF_INSNOP_ALIAS(Hqq, Vqq)
 
 DEF_INSNOP_XMM(L, is4)
 DEF_INSNOP_ALIAS(Ldq, L)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 31/75] target/i386: introduce code generators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (29 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 30/75] target/i386: alias H* operands with the V* operands Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-22  4:33   ` Aleksandar Markovic
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 32/75] target/i386: introduce helper-based code generator macros Jan Bobek
                   ` (42 subsequent siblings)
  73 siblings, 1 reply; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

In this context, "code generators" are functions that receive decoded
instruction operands and emit TCG ops implementing the correct
instruction functionality. Introduce the naming macros first, actual
generator macros will be added later.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 46 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 2e78bed78f..603a5b80a1 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5331,6 +5331,52 @@ INSNOP_LDST(xmm, Mhq)
     tcg_temp_free_i64(r64);
 }
 
+/*
+ * Code generators
+ */
+#define gen_insn(mnem, argc, ...)               \
+    glue(gen_insn, argc)(mnem, ## __VA_ARGS__)
+#define gen_insn0(mnem)                         \
+    gen_ ## mnem ## _0
+#define gen_insn1(mnem, opT1)                   \
+    gen_ ## mnem ## _1 ## opT1
+#define gen_insn2(mnem, opT1, opT2)             \
+    gen_ ## mnem ## _2 ## opT1 ## opT2
+#define gen_insn3(mnem, opT1, opT2, opT3)       \
+    gen_ ## mnem ## _3 ## opT1 ## opT2 ## opT3
+#define gen_insn4(mnem, opT1, opT2, opT3, opT4)         \
+    gen_ ## mnem ## _4 ## opT1 ## opT2 ## opT3 ## opT4
+#define gen_insn5(mnem, opT1, opT2, opT3, opT4, opT5)           \
+    gen_ ## mnem ## _5 ## opT1 ## opT2 ## opT3 ## opT4 ## opT5
+
+#define GEN_INSN0(mnem)                         \
+    static void gen_insn0(mnem)(                \
+        CPUX86State *env, DisasContext *s)
+#define GEN_INSN1(mnem, opT1)                   \
+    static void gen_insn1(mnem, opT1)(          \
+        CPUX86State *env, DisasContext *s,      \
+        insnop_arg_t(opT1) arg1)
+#define GEN_INSN2(mnem, opT1, opT2)                             \
+    static void gen_insn2(mnem, opT1, opT2)(                    \
+        CPUX86State *env, DisasContext *s,                      \
+        insnop_arg_t(opT1) arg1, insnop_arg_t(opT2) arg2)
+#define GEN_INSN3(mnem, opT1, opT2, opT3)                       \
+    static void gen_insn3(mnem, opT1, opT2, opT3)(              \
+        CPUX86State *env, DisasContext *s,                      \
+        insnop_arg_t(opT1) arg1, insnop_arg_t(opT2) arg2,       \
+        insnop_arg_t(opT3) arg3)
+#define GEN_INSN4(mnem, opT1, opT2, opT3, opT4)                 \
+    static void gen_insn4(mnem, opT1, opT2, opT3, opT4)(        \
+        CPUX86State *env, DisasContext *s,                      \
+        insnop_arg_t(opT1) arg1, insnop_arg_t(opT2) arg2,       \
+        insnop_arg_t(opT3) arg3, insnop_arg_t(opT4) arg4)
+#define GEN_INSN5(mnem, opT1, opT2, opT3, opT4, opT5)           \
+    static void gen_insn5(mnem, opT1, opT2, opT3, opT4, opT5)(  \
+        CPUX86State *env, DisasContext *s,                      \
+        insnop_arg_t(opT1) arg1, insnop_arg_t(opT2) arg2,       \
+        insnop_arg_t(opT3) arg3, insnop_arg_t(opT4) arg4,       \
+        insnop_arg_t(opT5) arg5)
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 32/75] target/i386: introduce helper-based code generator macros
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (30 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 31/75] target/i386: introduce code generators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 33/75] target/i386: introduce gvec-based " Jan Bobek
                   ` (41 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Code generators defined using these macros rely on a helper function
(as emitted by gen_helper_*).

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 160 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 160 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 603a5b80a1..046914578b 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5377,6 +5377,166 @@ INSNOP_LDST(xmm, Mhq)
         insnop_arg_t(opT3) arg3, insnop_arg_t(opT4) arg4,       \
         insnop_arg_t(opT5) arg5)
 
+#define DEF_GEN_INSN0_HELPER(mnem, helper)      \
+    GEN_INSN0(mnem)                             \
+    {                                           \
+        gen_helper_ ## helper(cpu_env);         \
+    }
+
+#define DEF_GEN_INSN2_HELPER_EPD(mnem, helper, opT1, opT2)      \
+    GEN_INSN2(mnem, opT1, opT2)                                 \
+    {                                                           \
+        const TCGv_ptr arg1_ptr = tcg_temp_new_ptr();           \
+                                                                \
+        tcg_gen_addi_ptr(arg1_ptr, cpu_env, arg1);              \
+        gen_helper_ ## helper(cpu_env, arg1_ptr, arg2);         \
+                                                                \
+        tcg_temp_free_ptr(arg1_ptr);                            \
+    }
+#define DEF_GEN_INSN2_HELPER_DEP(mnem, helper, opT1, opT2)      \
+    GEN_INSN2(mnem, opT1, opT2)                                 \
+    {                                                           \
+        const TCGv_ptr arg2_ptr = tcg_temp_new_ptr();           \
+                                                                \
+        tcg_gen_addi_ptr(arg2_ptr, cpu_env, arg2);              \
+        gen_helper_ ## helper(arg1, cpu_env, arg2_ptr);         \
+                                                                \
+        tcg_temp_free_ptr(arg2_ptr);                            \
+    }
+#ifdef TARGET_X86_64
+#define DEF_GEN_INSN2_HELPER_EPQ(mnem, helper, opT1, opT2)      \
+    DEF_GEN_INSN2_HELPER_EPD(mnem, helper, opT1, opT2)
+#define DEF_GEN_INSN2_HELPER_QEP(mnem, helper, opT1, opT2)      \
+    DEF_GEN_INSN2_HELPER_DEP(mnem, helper, opT1, opT2)
+#else /* !TARGET_X86_64 */
+#define DEF_GEN_INSN2_HELPER_EPQ(mnem, helper, opT1, opT2)      \
+    GEN_INSN2(mnem, opT1, opT2)                                 \
+    {                                                           \
+        g_assert_not_reached();                                 \
+    }
+#define DEF_GEN_INSN2_HELPER_QEP(mnem, helper, opT1, opT2)      \
+    GEN_INSN2(mnem, opT1, opT2)                                 \
+    {                                                           \
+        g_assert_not_reached();                                 \
+    }
+#endif /* !TARGET_X86_64 */
+#define DEF_GEN_INSN2_HELPER_EPP(mnem, helper, opT1, opT2)      \
+    GEN_INSN2(mnem, opT1, opT2)                                 \
+    {                                                           \
+        const TCGv_ptr arg1_ptr = tcg_temp_new_ptr();           \
+        const TCGv_ptr arg2_ptr = tcg_temp_new_ptr();           \
+                                                                \
+        tcg_gen_addi_ptr(arg1_ptr, cpu_env, arg1);              \
+        tcg_gen_addi_ptr(arg2_ptr, cpu_env, arg2);              \
+        gen_helper_ ## helper(cpu_env, arg1_ptr, arg2_ptr);     \
+                                                                \
+        tcg_temp_free_ptr(arg1_ptr);                            \
+        tcg_temp_free_ptr(arg2_ptr);                            \
+    }
+
+#define DEF_GEN_INSN3_HELPER_EPD(mnem, helper, opT1, opT2, opT3)        \
+    GEN_INSN3(mnem, opT1, opT2, opT3)                                   \
+    {                                                                   \
+        const TCGv_ptr arg1_ptr = tcg_temp_new_ptr();                   \
+                                                                        \
+        assert(arg1 == arg2);                                           \
+        tcg_gen_addi_ptr(arg1_ptr, cpu_env, arg1);                      \
+        gen_helper_ ## helper(cpu_env, arg1_ptr, arg3);                 \
+                                                                        \
+        tcg_temp_free_ptr(arg1_ptr);                                    \
+    }
+#ifdef TARGET_X86_64
+#define DEF_GEN_INSN3_HELPER_EPQ(mnem, helper, opT1, opT2, opT3)        \
+    DEF_GEN_INSN3_HELPER_EPD(mnem, helper, opT1, opT2, opT3)
+#else /* !TARGET_X86_64 */
+#define DEF_GEN_INSN3_HELPER_EPQ(mnem, helper, opT1, opT2, opT3)        \
+    GEN_INSN3(mnem, opT1, opT2, opT3)                                   \
+    {                                                                   \
+        g_assert_not_reached();                                         \
+    }
+#endif /* !TARGET_X86_64 */
+#define DEF_GEN_INSN3_HELPER_EPP(mnem, helper, opT1, opT2, opT3)        \
+    GEN_INSN3(mnem, opT1, opT2, opT3)                                   \
+    {                                                                   \
+        const TCGv_ptr arg1_ptr = tcg_temp_new_ptr();                   \
+        const TCGv_ptr arg3_ptr = tcg_temp_new_ptr();                   \
+                                                                        \
+        assert(arg1 == arg2);                                           \
+        tcg_gen_addi_ptr(arg1_ptr, cpu_env, arg1);                      \
+        tcg_gen_addi_ptr(arg3_ptr, cpu_env, arg3);                      \
+        gen_helper_ ## helper(cpu_env, arg1_ptr, arg3_ptr);             \
+                                                                        \
+        tcg_temp_free_ptr(arg1_ptr);                                    \
+        tcg_temp_free_ptr(arg3_ptr);                                    \
+    }
+#define DEF_GEN_INSN3_HELPER_PPI(mnem, helper, opT1, opT2, opT3)        \
+    GEN_INSN3(mnem, opT1, opT2, opT3)                                   \
+    {                                                                   \
+        const TCGv_ptr arg1_ptr = tcg_temp_new_ptr();                   \
+        const TCGv_ptr arg2_ptr = tcg_temp_new_ptr();                   \
+        const TCGv_i32 arg3_r32 = tcg_temp_new_i32();                   \
+                                                                        \
+        tcg_gen_addi_ptr(arg1_ptr, cpu_env, arg1);                      \
+        tcg_gen_addi_ptr(arg2_ptr, cpu_env, arg2);                      \
+        tcg_gen_movi_i32(arg3_r32, arg3);                               \
+        gen_helper_ ## helper(arg1_ptr, arg2_ptr, arg3_r32);            \
+                                                                        \
+        tcg_temp_free_ptr(arg1_ptr);                                    \
+        tcg_temp_free_ptr(arg2_ptr);                                    \
+        tcg_temp_free_i32(arg3_r32);                                    \
+    }
+#define DEF_GEN_INSN3_HELPER_EPPI(mnem, helper, opT1, opT2, opT3)       \
+    GEN_INSN3(mnem, opT1, opT2, opT3)                                   \
+    {                                                                   \
+        const TCGv_ptr arg1_ptr = tcg_temp_new_ptr();                   \
+        const TCGv_ptr arg2_ptr = tcg_temp_new_ptr();                   \
+        const TCGv_i32 arg3_r32 = tcg_temp_new_i32();                   \
+                                                                        \
+        tcg_gen_addi_ptr(arg1_ptr, cpu_env, arg1);                      \
+        tcg_gen_addi_ptr(arg2_ptr, cpu_env, arg2);                      \
+        tcg_gen_movi_i32(arg3_r32, arg3);                               \
+        gen_helper_ ## helper(cpu_env, arg1_ptr, arg2_ptr, arg3_r32);   \
+                                                                        \
+        tcg_temp_free_ptr(arg1_ptr);                                    \
+        tcg_temp_free_ptr(arg2_ptr);                                    \
+        tcg_temp_free_i32(arg3_r32);                                    \
+    }
+
+#define DEF_GEN_INSN4_HELPER_PPI(mnem, helper, opT1, opT2, opT3, opT4)  \
+    GEN_INSN4(mnem, opT1, opT2, opT3, opT4)                             \
+    {                                                                   \
+        const TCGv_ptr arg1_ptr = tcg_temp_new_ptr();                   \
+        const TCGv_ptr arg3_ptr = tcg_temp_new_ptr();                   \
+        const TCGv_i32 arg4_r32 = tcg_temp_new_i32();                   \
+                                                                        \
+        assert(arg1 == arg2);                                           \
+        tcg_gen_addi_ptr(arg1_ptr, cpu_env, arg1);                      \
+        tcg_gen_addi_ptr(arg3_ptr, cpu_env, arg3);                      \
+        tcg_gen_movi_i32(arg4_r32, arg4);                               \
+        gen_helper_ ## helper(arg1_ptr, arg3_ptr, arg4_r32);            \
+                                                                        \
+        tcg_temp_free_ptr(arg1_ptr);                                    \
+        tcg_temp_free_ptr(arg3_ptr);                                    \
+        tcg_temp_free_i32(arg4_r32);                                    \
+    }
+#define DEF_GEN_INSN4_HELPER_EPPI(mnem, helper, opT1, opT2, opT3, opT4) \
+    GEN_INSN4(mnem, opT1, opT2, opT3, opT4)                             \
+    {                                                                   \
+        const TCGv_ptr arg1_ptr = tcg_temp_new_ptr();                   \
+        const TCGv_ptr arg3_ptr = tcg_temp_new_ptr();                   \
+        const TCGv_i32 arg4_r32 = tcg_temp_new_i32();                   \
+                                                                        \
+        assert(arg1 == arg2);                                           \
+        tcg_gen_addi_ptr(arg1_ptr, cpu_env, arg1);                      \
+        tcg_gen_addi_ptr(arg3_ptr, cpu_env, arg3);                      \
+        tcg_gen_movi_i32(arg4_r32, arg4);                               \
+        gen_helper_ ## helper(cpu_env, arg1_ptr, arg3_ptr, arg4_r32);   \
+                                                                        \
+        tcg_temp_free_ptr(arg1_ptr);                                    \
+        tcg_temp_free_ptr(arg3_ptr);                                    \
+        tcg_temp_free_i32(arg4_r32);                                    \
+    }
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 33/75] target/i386: introduce gvec-based code generator macros
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (31 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 32/75] target/i386: introduce helper-based code generator macros Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 34/75] target/i386: introduce sse-opcode.inc.h Jan Bobek
                   ` (40 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Code generators defined using these macros rely on a gvec operation
(i.e. tcg_gen_gvec_*).

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 046914578b..eab36963c3 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -24,6 +24,7 @@
 #include "exec/exec-all.h"
 #include "tcg-gvec-desc.h"
 #include "tcg-op.h"
+#include "tcg-op-gvec.h"
 #include "exec/cpu_ldst.h"
 #include "exec/translator.h"
 
@@ -5537,6 +5538,30 @@ INSNOP_LDST(xmm, Mhq)
         tcg_temp_free_i32(arg4_r32);                                    \
     }
 
+#define MM_OPRSZ sizeof(MMXReg)
+#define MM_MAXSZ sizeof(MMXReg)
+#define XMM_OPRSZ (!(s->prefix & PREFIX_VEX) ? sizeof(XMMReg) : \
+                   s->vex_l ? sizeof(XMMReg) :                  \
+                   sizeof(XMMReg))
+#define XMM_MAXSZ (!(s->prefix & PREFIX_VEX) ? sizeof(XMMReg) : \
+                   sizeof(YMMReg))
+
+#define DEF_GEN_INSN2_GVEC(mnem, opT1, opT2, gvec, ...) \
+    GEN_INSN2(mnem, opT1, opT2)                         \
+    {                                                   \
+        gen_gvec_ ## gvec(arg1, arg2, ## __VA_ARGS__);  \
+    }
+#define DEF_GEN_INSN3_GVEC(mnem, opT1, opT2, opT3, gvec, ...)   \
+    GEN_INSN3(mnem, opT1, opT2, opT3)                           \
+    {                                                           \
+        gen_gvec_ ## gvec(arg1, arg2, arg3, ## __VA_ARGS__);    \
+    }
+#define DEF_GEN_INSN4_GVEC(mnem, opT1, opT2, opT3, opT4, gvec, ...)     \
+    GEN_INSN4(mnem, opT1, opT2, opT3, opT4)                             \
+    {                                                                   \
+        gen_gvec_ ## gvec(arg1, arg2, arg3, arg4, ## __VA_ARGS__);      \
+    }
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 34/75] target/i386: introduce sse-opcode.inc.h
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (32 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 33/75] target/i386: introduce gvec-based " Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 35/75] target/i386: introduce instruction translator macros Jan Bobek
                   ` (39 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

This header is intended to eventually list all supported instructions
along with some useful details (e.g. mnemonics, opcode, operands etc.)
It shall be used (along with some preprocessor magic) anytime we need
to automatically generate code for every instruction.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/sse-opcode.inc.h | 75 ++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 target/i386/sse-opcode.inc.h

diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
new file mode 100644
index 0000000000..04d0c49168
--- /dev/null
+++ b/target/i386/sse-opcode.inc.h
@@ -0,0 +1,75 @@
+#define FMTI____      (0, 0, 0, )
+#define FMTI__R__     (1, 1, 0, r)
+#define FMTI__RR__    (2, 2, 0, rr)
+#define FMTI__RRR__   (3, 3, 0, rrr)
+#define FMTI__W__     (1, 0, 1, w)
+#define FMTI__WR__    (2, 1, 1, wr)
+#define FMTI__WRR__   (3, 2, 1, wrr)
+#define FMTI__WRRR__  (4, 3, 1, wrrr)
+#define FMTI__WRRRR__ (5, 4, 1, wrrrr)
+#define FMTI__WWRRR__ (5, 3, 2, wwrrr)
+
+#define FMTI__(prop, fmti) FMTI_ ## prop ## __ fmti
+
+#define FMTI_ARGC__(argc, argc_rd, argc_wr, lower)    argc
+#define FMTI_ARGC_RD__(argc, argc_rd, argc_wr, lower) argc_rd
+#define FMTI_ARGC_WR__(argc, argc_rd, argc_wr, lower) argc_wr
+#define FMTI_LOWER__(argc, argc_rd, argc_wr, lower)   lower
+
+#define FMT_ARGC(fmt)    FMTI__(ARGC, FMTI__ ## fmt ## __)
+#define FMT_ARGC_RD(fmt) FMTI__(ARGC_RD, FMTI__ ## fmt ## __)
+#define FMT_ARGC_WR(fmt) FMTI__(ARGC_WR, FMTI__ ## fmt ## __)
+#define FMT_LOWER(fmt)   FMTI__(LOWER, FMTI__ ## fmt ## __)
+#define FMT_UPPER(fmt)   fmt
+
+#ifndef OPCODE
+#   define OPCODE(mnem, opcode, feat, fmt, ...)
+#endif /* OPCODE */
+
+#ifndef OPCODE_GRP
+#   define OPCODE_GRP(grpname, opcode)
+#endif /* OPCODE_GRP */
+
+#ifndef OPCODE_GRP_BEGIN
+#   define OPCODE_GRP_BEGIN(grpname)
+#endif /* OPCODE_GRP_BEGIN */
+
+#ifndef OPCODE_GRPMEMB
+#   define OPCODE_GRPMEMB(grpname, mnem, opcode, feat, fmt, ...)
+#endif /* OPCODE_GRPMEMB */
+
+#ifndef OPCODE_GRP_END
+#   define OPCODE_GRP_END(grpname)
+#endif /* OPCODE_GRP_END */
+
+#undef FMTI____
+#undef FMTI__R__
+#undef FMTI__RR__
+#undef FMTI__RRR__
+#undef FMTI__W__
+#undef FMTI__WR__
+#undef FMTI__WRR__
+#undef FMTI__WRRR__
+#undef FMTI__WRRRR__
+#undef FMTI__WWRRR__
+
+#undef FMTI__
+
+#undef FMTI_ARGC__
+#undef FMTI_ARGC_RD__
+#undef FMTI_ARGC_WR__
+#undef FMTI_LOWER__
+
+#undef FMT_ARGC
+#undef FMT_ARGC_RD
+#undef FMT_ARGC_WR
+#undef FMT_LOWER
+#undef FMT_UPPER
+
+#undef LEG
+#undef VEX
+#undef OPCODE
+#undef OPCODE_GRP
+#undef OPCODE_GRP_BEGIN
+#undef OPCODE_GRPMEMB
+#undef OPCODE_GRP_END
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 35/75] target/i386: introduce instruction translator macros
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (33 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 34/75] target/i386: introduce sse-opcode.inc.h Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 36/75] target/i386: introduce MMX translators Jan Bobek
                   ` (38 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Instruction "translators" are responsible for decoding and loading
instruction operands, calling the passed-in code generator, and
storing the operands back (if applicable). Once a translator returns,
the instruction has been translated to TCG ops, hence the name.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 272 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 272 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index eab36963c3..1c2502ff50 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5562,6 +5562,262 @@ INSNOP_LDST(xmm, Mhq)
         gen_gvec_ ## gvec(arg1, arg2, arg3, arg4, ## __VA_ARGS__);      \
     }
 
+/*
+ * Instruction translators
+ */
+#define translate_insn(argc, ...)               \
+    glue(translate_insn, argc)(__VA_ARGS__)
+#define translate_insn0()                       \
+    translate_insn_0
+#define translate_insn1(opT1)                   \
+    translate_insn_1 ## opT1
+#define translate_insn2(opT1, opT2)             \
+    translate_insn_2 ## opT1 ## opT2
+#define translate_insn3(opT1, opT2, opT3)       \
+    translate_insn_3 ## opT1 ## opT2 ## opT3
+#define translate_insn4(opT1, opT2, opT3, opT4)         \
+    translate_insn_4 ## opT1 ## opT2 ## opT3 ## opT4
+#define translate_insn5(opT1, opT2, opT3, opT4, opT5)           \
+    translate_insn_5 ## opT1 ## opT2 ## opT3 ## opT4 ## opT5
+#define translate_group(grpname)                \
+    translate_group_ ## grpname
+
+static void translate_insn0()(
+    CPUX86State *env, DisasContext *s, int modrm,
+    CheckCpuidFeat feat, unsigned int argc_wr,
+    void (*gen_insn_fp)(CPUX86State *, DisasContext *))
+{
+    if (!check_cpuid(env, s, feat)) {
+        gen_illegal_opcode(s);
+        return;
+    }
+
+    (*gen_insn_fp)(env, s);
+}
+
+#define DEF_TRANSLATE_INSN1(opT1)                                       \
+    static void translate_insn1(opT1)(                                  \
+        CPUX86State *env, DisasContext *s, int modrm,                   \
+        CheckCpuidFeat feat, unsigned int argc_wr,                      \
+        void (*gen_insn1_fp)(CPUX86State *, DisasContext *,             \
+                             insnop_arg_t(opT1)))                       \
+    {                                                                   \
+        insnop_ctxt_t(opT1) ctxt1;                                      \
+                                                                        \
+        const bool is_write1 = (1 <= argc_wr);                          \
+                                                                        \
+        if (check_cpuid(env, s, feat)                                   \
+            && insnop_init(opT1)(&ctxt1, env, s, modrm, is_write1)) {   \
+                                                                        \
+            const insnop_arg_t(opT1) arg1 =                             \
+                insnop_prepare(opT1)(&ctxt1, env, s, modrm, is_write1); \
+                                                                        \
+            (*gen_insn1_fp)(env, s, arg1);                              \
+                                                                        \
+            insnop_finalize(opT1)(&ctxt1, env, s, modrm, is_write1, arg1); \
+        } else {                                                        \
+            gen_illegal_opcode(s);                                      \
+        }                                                               \
+    }
+
+#define DEF_TRANSLATE_INSN2(opT1, opT2)                                 \
+    static void translate_insn2(opT1, opT2)(                            \
+        CPUX86State *env, DisasContext *s, int modrm,                   \
+        CheckCpuidFeat feat, unsigned int argc_wr,                      \
+        void (*gen_insn2_fp)(CPUX86State *, DisasContext *,             \
+                             insnop_arg_t(opT1), insnop_arg_t(opT2)))   \
+    {                                                                   \
+        insnop_ctxt_t(opT1) ctxt1;                                      \
+        insnop_ctxt_t(opT2) ctxt2;                                      \
+                                                                        \
+        const bool is_write1 = (1 <= argc_wr);                          \
+        const bool is_write2 = (2 <= argc_wr);                          \
+                                                                        \
+        if (check_cpuid(env, s, feat)                                   \
+            && insnop_init(opT1)(&ctxt1, env, s, modrm, is_write1)      \
+            && insnop_init(opT2)(&ctxt2, env, s, modrm, is_write2)) {   \
+                                                                        \
+            const insnop_arg_t(opT1) arg1 =                             \
+                insnop_prepare(opT1)(&ctxt1, env, s, modrm, is_write1); \
+            const insnop_arg_t(opT2) arg2 =                             \
+                insnop_prepare(opT2)(&ctxt2, env, s, modrm, is_write2); \
+                                                                        \
+            (*gen_insn2_fp)(env, s, arg1, arg2);                        \
+                                                                        \
+            insnop_finalize(opT1)(&ctxt1, env, s, modrm, is_write1, arg1); \
+            insnop_finalize(opT2)(&ctxt2, env, s, modrm, is_write2, arg2); \
+        } else {                                                        \
+            gen_illegal_opcode(s);                                      \
+        }                                                               \
+    }
+
+#define DEF_TRANSLATE_INSN3(opT1, opT2, opT3)                           \
+    static void translate_insn3(opT1, opT2, opT3)(                      \
+        CPUX86State *env, DisasContext *s, int modrm,                   \
+        CheckCpuidFeat feat, unsigned int argc_wr,                      \
+        void (*gen_insn3_fp)(CPUX86State *, DisasContext *,             \
+                             insnop_arg_t(opT1), insnop_arg_t(opT2),    \
+                             insnop_arg_t(opT3)))                       \
+    {                                                                   \
+        insnop_ctxt_t(opT1) ctxt1;                                      \
+        insnop_ctxt_t(opT2) ctxt2;                                      \
+        insnop_ctxt_t(opT3) ctxt3;                                      \
+                                                                        \
+        const bool is_write1 = (1 <= argc_wr);                          \
+        const bool is_write2 = (2 <= argc_wr);                          \
+        const bool is_write3 = (3 <= argc_wr);                          \
+                                                                        \
+        if (check_cpuid(env, s, feat)                                   \
+            && insnop_init(opT1)(&ctxt1, env, s, modrm, is_write1)      \
+            && insnop_init(opT2)(&ctxt2, env, s, modrm, is_write2)      \
+            && insnop_init(opT3)(&ctxt3, env, s, modrm, is_write3)) {   \
+                                                                        \
+            const insnop_arg_t(opT1) arg1 =                             \
+                insnop_prepare(opT1)(&ctxt1, env, s, modrm, is_write1); \
+            const insnop_arg_t(opT2) arg2 =                             \
+                insnop_prepare(opT2)(&ctxt2, env, s, modrm, is_write2); \
+            const insnop_arg_t(opT3) arg3 =                             \
+                insnop_prepare(opT3)(&ctxt3, env, s, modrm, is_write3); \
+                                                                        \
+            (*gen_insn3_fp)(env, s, arg1, arg2, arg3);                  \
+                                                                        \
+            insnop_finalize(opT1)(&ctxt1, env, s, modrm, is_write1, arg1); \
+            insnop_finalize(opT2)(&ctxt2, env, s, modrm, is_write2, arg2); \
+            insnop_finalize(opT3)(&ctxt3, env, s, modrm, is_write3, arg3); \
+        } else {                                                        \
+            gen_illegal_opcode(s);                                      \
+        }                                                               \
+    }
+
+#define DEF_TRANSLATE_INSN4(opT1, opT2, opT3, opT4)                     \
+    static void translate_insn4(opT1, opT2, opT3, opT4)(                \
+        CPUX86State *env, DisasContext *s, int modrm,                   \
+        CheckCpuidFeat feat, unsigned int argc_wr,                      \
+        void (*gen_insn4_fp)(CPUX86State *, DisasContext *,             \
+                             insnop_arg_t(opT1), insnop_arg_t(opT2),    \
+                             insnop_arg_t(opT3), insnop_arg_t(opT4)))   \
+    {                                                                   \
+        insnop_ctxt_t(opT1) ctxt1;                                      \
+        insnop_ctxt_t(opT2) ctxt2;                                      \
+        insnop_ctxt_t(opT3) ctxt3;                                      \
+        insnop_ctxt_t(opT4) ctxt4;                                      \
+                                                                        \
+        const bool is_write1 = (1 <= argc_wr);                          \
+        const bool is_write2 = (2 <= argc_wr);                          \
+        const bool is_write3 = (3 <= argc_wr);                          \
+        const bool is_write4 = (4 <= argc_wr);                          \
+                                                                        \
+        if (check_cpuid(env, s, feat)                                   \
+            && insnop_init(opT1)(&ctxt1, env, s, modrm, is_write1)      \
+            && insnop_init(opT2)(&ctxt2, env, s, modrm, is_write2)      \
+            && insnop_init(opT3)(&ctxt3, env, s, modrm, is_write3)      \
+            && insnop_init(opT4)(&ctxt4, env, s, modrm, is_write4)) {   \
+                                                                        \
+            const insnop_arg_t(opT1) arg1 =                             \
+                insnop_prepare(opT1)(&ctxt1, env, s, modrm, is_write1); \
+            const insnop_arg_t(opT2) arg2 =                             \
+                insnop_prepare(opT2)(&ctxt2, env, s, modrm, is_write2); \
+            const insnop_arg_t(opT3) arg3 =                             \
+                insnop_prepare(opT3)(&ctxt3, env, s, modrm, is_write3); \
+            const insnop_arg_t(opT4) arg4 =                             \
+                insnop_prepare(opT4)(&ctxt4, env, s, modrm, is_write4); \
+                                                                        \
+            (*gen_insn4_fp)(env, s, arg1, arg2, arg3, arg4);            \
+                                                                        \
+            insnop_finalize(opT1)(&ctxt1, env, s, modrm, is_write1, arg1); \
+            insnop_finalize(opT2)(&ctxt2, env, s, modrm, is_write2, arg2); \
+            insnop_finalize(opT3)(&ctxt3, env, s, modrm, is_write3, arg3); \
+            insnop_finalize(opT4)(&ctxt4, env, s, modrm, is_write4, arg4); \
+        } else {                                                        \
+            gen_illegal_opcode(s);                                      \
+        }                                                               \
+    }
+
+#define DEF_TRANSLATE_INSN5(opT1, opT2, opT3, opT4, opT5)               \
+    static void translate_insn5(opT1, opT2, opT3, opT4, opT5)(          \
+        CPUX86State *env, DisasContext *s, int modrm,                   \
+        CheckCpuidFeat feat, unsigned int argc_wr,                      \
+        void (*gen_insn5_fp)(CPUX86State *, DisasContext *,             \
+                             insnop_arg_t(opT1), insnop_arg_t(opT2),    \
+                             insnop_arg_t(opT3), insnop_arg_t(opT4),    \
+                             insnop_arg_t(opT5)))                       \
+    {                                                                   \
+        insnop_ctxt_t(opT1) ctxt1;                                      \
+        insnop_ctxt_t(opT2) ctxt2;                                      \
+        insnop_ctxt_t(opT3) ctxt3;                                      \
+        insnop_ctxt_t(opT4) ctxt4;                                      \
+        insnop_ctxt_t(opT5) ctxt5;                                      \
+                                                                        \
+        const bool is_write1 = (1 <= argc_wr);                          \
+        const bool is_write2 = (2 <= argc_wr);                          \
+        const bool is_write3 = (3 <= argc_wr);                          \
+        const bool is_write4 = (4 <= argc_wr);                          \
+        const bool is_write5 = (5 <= argc_wr);                          \
+                                                                        \
+        if (check_cpuid(env, s, feat)                                   \
+            && insnop_init(opT1)(&ctxt1, env, s, modrm, is_write1)      \
+            && insnop_init(opT2)(&ctxt2, env, s, modrm, is_write2)      \
+            && insnop_init(opT3)(&ctxt3, env, s, modrm, is_write3)      \
+            && insnop_init(opT4)(&ctxt4, env, s, modrm, is_write4)      \
+            && insnop_init(opT5)(&ctxt5, env, s, modrm, is_write5)) {   \
+                                                                        \
+            const insnop_arg_t(opT1) arg1 =                             \
+                insnop_prepare(opT1)(&ctxt1, env, s, modrm, is_write1); \
+            const insnop_arg_t(opT2) arg2 =                             \
+                insnop_prepare(opT2)(&ctxt2, env, s, modrm, is_write2); \
+            const insnop_arg_t(opT3) arg3 =                             \
+                insnop_prepare(opT3)(&ctxt3, env, s, modrm, is_write3); \
+            const insnop_arg_t(opT4) arg4 =                             \
+                insnop_prepare(opT4)(&ctxt4, env, s, modrm, is_write4); \
+            const insnop_arg_t(opT5) arg5 =                             \
+                insnop_prepare(opT5)(&ctxt5, env, s, modrm, is_write5); \
+                                                                        \
+            (*gen_insn5_fp)(env, s, arg1, arg2, arg3, arg4, arg5);      \
+                                                                        \
+            insnop_finalize(opT1)(&ctxt1, env, s, modrm, is_write1, arg1); \
+            insnop_finalize(opT2)(&ctxt2, env, s, modrm, is_write2, arg2); \
+            insnop_finalize(opT3)(&ctxt3, env, s, modrm, is_write3, arg3); \
+            insnop_finalize(opT4)(&ctxt4, env, s, modrm, is_write4, arg4); \
+            insnop_finalize(opT5)(&ctxt5, env, s, modrm, is_write5, arg5); \
+        } else {                                                        \
+            gen_illegal_opcode(s);                                      \
+        }                                                               \
+    }
+
+#define OPCODE_GRP_BEGIN(grpname)                                       \
+    static void translate_group(grpname)(                               \
+        CPUX86State *env, DisasContext *s, int modrm)                   \
+    {                                                                   \
+        bool ret;                                                       \
+        insnop_ctxt_t(modrm_reg) regctxt;                               \
+                                                                        \
+        ret = insnop_init(modrm_reg)(&regctxt, env, s, modrm, 0);       \
+        if (ret) {                                                      \
+            const insnop_arg_t(modrm_reg) reg =                         \
+                insnop_prepare(modrm_reg)(&regctxt, env, s, modrm, 0);  \
+                                                                        \
+            switch (reg & 7) {
+#define OPCODE_GRPMEMB(grpname, mnem, opcode, feat, fmt, ...)           \
+            case opcode:                                                \
+                translate_insn(FMT_ARGC(fmt), ## __VA_ARGS__)(          \
+                    env, s, modrm, CHECK_CPUID_ ## feat, FMT_ARGC_WR(fmt), \
+                    gen_insn(mnem, FMT_ARGC(fmt), ## __VA_ARGS__));     \
+                break;
+#define OPCODE_GRP_END(grpname)                                         \
+            default:                                                    \
+                ret = false;                                            \
+                break;                                                  \
+            }                                                           \
+                                                                        \
+            insnop_finalize(modrm_reg)(&regctxt, env, s, modrm, 0, reg); \
+        }                                                               \
+                                                                        \
+        if (!ret) {                                                     \
+            gen_illegal_opcode(s);                                      \
+        }                                                               \
+    }
+#include "sse-opcode.inc.h"
+
 static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
 {
     enum {
@@ -5642,6 +5898,22 @@ static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
             op = x86_ldub_code(env, s);
         } break;
 
+#define LEG(p, m, w, op)    CASES(op, 3, W, w, M, m, P, p)
+#define VEX(l, p, m, w, op) CASES(op, 4, W, w, M, m, P, p, VEX_L, l)
+#define OPCODE(mnem, cases, feat, fmt, ...)                             \
+        cases {                                                         \
+            const int modrm = 0 < FMT_ARGC(fmt) ? x86_ldub_code(env, s) : -1; \
+            translate_insn(FMT_ARGC(fmt), ## __VA_ARGS__)(              \
+                env, s, modrm, CHECK_CPUID_ ## feat, FMT_ARGC_WR(fmt),  \
+                gen_insn(mnem, FMT_ARGC(fmt), ## __VA_ARGS__));         \
+        } return;
+#define OPCODE_GRP(grpname, cases)                      \
+        cases {                                         \
+            const int modrm = x86_ldub_code(env, s);    \
+            translate_group(grpname)(env, s, modrm);    \
+        } return;
+#include "sse-opcode.inc.h"
+
         default: {
             if (m == M_0F38 || m == M_0F3A) {
                 /* rewind the advance_pc() x86_ldub_code() did */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 36/75] target/i386: introduce MMX translators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (34 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 35/75] target/i386: introduce instruction translator macros Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 37/75] target/i386: introduce MMX code generators Jan Bobek
                   ` (37 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Use the translator macros to define instruction translators required
by MMX instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 1c2502ff50..96ba0f5704 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5651,6 +5651,13 @@ static void translate_insn0()(
         }                                                               \
     }
 
+DEF_TRANSLATE_INSN2(Ed, Pq)
+DEF_TRANSLATE_INSN2(Eq, Pq)
+DEF_TRANSLATE_INSN2(Pq, Ed)
+DEF_TRANSLATE_INSN2(Pq, Eq)
+DEF_TRANSLATE_INSN2(Pq, Qq)
+DEF_TRANSLATE_INSN2(Qq, Pq)
+
 #define DEF_TRANSLATE_INSN3(opT1, opT2, opT3)                           \
     static void translate_insn3(opT1, opT2, opT3)(                      \
         CPUX86State *env, DisasContext *s, int modrm,                   \
@@ -5689,6 +5696,10 @@ static void translate_insn0()(
         }                                                               \
     }
 
+DEF_TRANSLATE_INSN3(Nq, Nq, Ib)
+DEF_TRANSLATE_INSN3(Pq, Pq, Qd)
+DEF_TRANSLATE_INSN3(Pq, Pq, Qq)
+
 #define DEF_TRANSLATE_INSN4(opT1, opT2, opT3, opT4)                     \
     static void translate_insn4(opT1, opT2, opT3, opT4)(                \
         CPUX86State *env, DisasContext *s, int modrm,                   \
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 37/75] target/i386: introduce MMX code generators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (35 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 36/75] target/i386: introduce MMX translators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 38/75] target/i386: introduce MMX vector instructions to sse-opcode.inc.h Jan Bobek
                   ` (36 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Define code generators required for MMX instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 100 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 96ba0f5704..fdfca03071 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5562,6 +5562,106 @@ INSNOP_LDST(xmm, Mhq)
         gen_gvec_ ## gvec(arg1, arg2, arg3, arg4, ## __VA_ARGS__);      \
     }
 
+GEN_INSN2(movq, Pq, Eq);        /* forward declaration */
+GEN_INSN2(movd, Pq, Ed)
+{
+    const insnop_arg_t(Eq) arg2_r64 = tcg_temp_new_i64();
+    tcg_gen_extu_i32_i64(arg2_r64, arg2);
+    gen_insn2(movq, Pq, Eq)(env, s, arg1, arg2_r64);
+    tcg_temp_free_i64(arg2_r64);
+}
+GEN_INSN2(movd, Ed, Pq)
+{
+    tcg_gen_ld_i32(arg1, cpu_env, arg2 + offsetof(MMXReg, MMX_L(0)));
+}
+
+GEN_INSN2(movq, Pq, Eq)
+{
+    tcg_gen_st_i64(arg2, cpu_env, arg1 + offsetof(MMXReg, MMX_Q(0)));
+}
+GEN_INSN2(movq, Eq, Pq)
+{
+    tcg_gen_ld_i64(arg1, cpu_env, arg2 + offsetof(MMXReg, MMX_Q(0)));
+}
+
+DEF_GEN_INSN2_GVEC(movq, Pq, Qq, mov, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movq, Qq, Pq, mov, MM_OPRSZ, MM_MAXSZ, MO_64)
+
+DEF_GEN_INSN3_GVEC(paddb, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(paddw, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(paddd, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(paddsb, Pq, Pq, Qq, ssadd, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(paddsw, Pq, Pq, Qq, ssadd, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(paddusb, Pq, Pq, Qq, usadd, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(paddusw, Pq, Pq, Qq, usadd, MM_OPRSZ, MM_MAXSZ, MO_16)
+
+DEF_GEN_INSN3_GVEC(psubb, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(psubw, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(psubd, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(psubsb, Pq, Pq, Qq, sssub, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(psubsw, Pq, Pq, Qq, sssub, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(psubusb, Pq, Pq, Qq, ussub, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(psubusw, Pq, Pq, Qq, ussub, MM_OPRSZ, MM_MAXSZ, MO_16)
+
+DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pmulhw, pmulhw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pmaddwd, pmaddwd_mmx, Pq, Pq, Qq)
+
+DEF_GEN_INSN3_GVEC(pcmpeqb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(pcmpeqw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(pcmpeqd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(pcmpgtb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(pcmpgtw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(pcmpgtd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_GT)
+
+DEF_GEN_INSN3_GVEC(pand, Pq, Pq, Qq, and, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(pandn, Pq, Pq, Qq, andn, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(por, Pq, Pq, Qq, or, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(pxor, Pq, Pq, Qq, xor, MM_OPRSZ, MM_MAXSZ, MO_64)
+
+DEF_GEN_INSN3_HELPER_EPP(psllw, psllw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pslld, pslld_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psllq, psllq_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psrlw, psrlw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psrld, psrld_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psrlq, psrlq_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psraw, psraw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psrad, psrad_mmx, Pq, Pq, Qq)
+
+#define DEF_GEN_PSHIFT_IMM_MM(mnem, opT1, opT2)                         \
+    GEN_INSN3(mnem, opT1, opT2, Ib)                                     \
+    {                                                                   \
+        const uint64_t arg3_ui64 = (uint8_t)arg3;                       \
+        const insnop_arg_t(Eq) arg3_r64 = s->tmp1_i64;                  \
+        const insnop_arg_t(Qq) arg3_mm =                                \
+            offsetof(CPUX86State, mmx_t0.MMX_Q(0));                     \
+                                                                        \
+        tcg_gen_movi_i64(arg3_r64, arg3_ui64);                          \
+        gen_insn2(movq, Pq, Eq)(env, s, arg3_mm, arg3_r64);             \
+        gen_insn3(mnem, Pq, Pq, Qq)(env, s, arg1, arg2, arg3_mm);       \
+    }
+
+DEF_GEN_PSHIFT_IMM_MM(psllw, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_MM(pslld, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_MM(psllq, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_MM(psrlw, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_MM(psrld, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_MM(psrlq, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_MM(psraw, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_MM(psrad, Nq, Nq)
+
+DEF_GEN_INSN3_HELPER_EPP(packsswb, packsswb_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(packssdw, packssdw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(packuswb, packuswb_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(punpcklbw, punpcklbw_mmx, Pq, Pq, Qd)
+DEF_GEN_INSN3_HELPER_EPP(punpcklwd, punpcklwd_mmx, Pq, Pq, Qd)
+DEF_GEN_INSN3_HELPER_EPP(punpckldq, punpckldq_mmx, Pq, Pq, Qd)
+DEF_GEN_INSN3_HELPER_EPP(punpckhbw, punpckhbw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(punpckhwd, punpckhwd_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(punpckhdq, punpckhdq_mmx, Pq, Pq, Qq)
+
+DEF_GEN_INSN0_HELPER(emms, emms)
+
 /*
  * Instruction translators
  */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 38/75] target/i386: introduce MMX vector instructions to sse-opcode.inc.h
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (36 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 37/75] target/i386: introduce MMX code generators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 39/75] target/i386: introduce SSE translators Jan Bobek
                   ` (35 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add all MMX vector instruction entries to sse-opcode.inc.h.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/sse-opcode.inc.h | 136 +++++++++++++++++++++++++++++++++++
 1 file changed, 136 insertions(+)

diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
index 04d0c49168..e570d449fc 100644
--- a/target/i386/sse-opcode.inc.h
+++ b/target/i386/sse-opcode.inc.h
@@ -42,6 +42,142 @@
 #   define OPCODE_GRP_END(grpname)
 #endif /* OPCODE_GRP_END */
 
+/*
+ * MMX Instructions
+ * -----------------
+ * NP 0F 6E /r          MOVD mm,r/m32
+ * NP 0F 7E /r          MOVD r/m32,mm
+ * NP REX.W + 0F 6E /r  MOVQ mm,r/m64
+ * NP REX.W + 0F 7E /r  MOVQ r/m64,mm
+ * NP 0F 6F /r          MOVQ mm, mm/m64
+ * NP 0F 7F /r          MOVQ mm/m64, mm
+ * NP 0F FC /r          PADDB mm, mm/m64
+ * NP 0F FD /r          PADDW mm, mm/m64
+ * NP 0F FE /r          PADDD mm, mm/m64
+ * NP 0F EC /r          PADDSB mm, mm/m64
+ * NP 0F ED /r          PADDSW mm, mm/m64
+ * NP 0F DC /r          PADDUSB mm,mm/m64
+ * NP 0F DD /r          PADDUSW mm,mm/m64
+ * NP 0F F8 /r          PSUBB mm, mm/m64
+ * NP 0F F9 /r          PSUBW mm, mm/m64
+ * NP 0F FA /r          PSUBD mm, mm/m64
+ * NP 0F E8 /r          PSUBSB mm, mm/m64
+ * NP 0F E9 /r          PSUBSW mm, mm/m64
+ * NP 0F D8 /r          PSUBUSB mm, mm/m64
+ * NP 0F D9 /r          PSUBUSW mm, mm/m64
+ * NP 0F D5 /r          PMULLW mm, mm/m64
+ * NP 0F E5 /r          PMULHW mm, mm/m64
+ * NP 0F F5 /r          PMADDWD mm, mm/m64
+ * NP 0F 74 /r          PCMPEQB mm,mm/m64
+ * NP 0F 75 /r          PCMPEQW mm,mm/m64
+ * NP 0F 76 /r          PCMPEQD mm,mm/m64
+ * NP 0F 64 /r          PCMPGTB mm,mm/m64
+ * NP 0F 65 /r          PCMPGTW mm,mm/m64
+ * NP 0F 66 /r          PCMPGTD mm,mm/m64
+ * NP 0F DB /r          PAND mm, mm/m64
+ * NP 0F DF /r          PANDN mm, mm/m64
+ * NP 0F EB /r          POR mm, mm/m64
+ * NP 0F EF /r          PXOR mm, mm/m64
+ * NP 0F F1 /r          PSLLW mm, mm/m64
+ * NP 0F F2 /r          PSLLD mm, mm/m64
+ * NP 0F F3 /r          PSLLQ mm, mm/m64
+ * NP 0F D1 /r          PSRLW mm, mm/m64
+ * NP 0F D2 /r          PSRLD mm, mm/m64
+ * NP 0F D3 /r          PSRLQ mm, mm/m64
+ * NP 0F E1 /r          PSRAW mm,mm/m64
+ * NP 0F E2 /r          PSRAD mm,mm/m64
+ * NP 0F 63 /r          PACKSSWB mm1, mm2/m64
+ * NP 0F 6B /r          PACKSSDW mm1, mm2/m64
+ * NP 0F 67 /r          PACKUSWB mm, mm/m64
+ * NP 0F 68 /r          PUNPCKHBW mm, mm/m64
+ * NP 0F 69 /r          PUNPCKHWD mm, mm/m64
+ * NP 0F 6A /r          PUNPCKHDQ mm, mm/m64
+ * NP 0F 60 /r          PUNPCKLBW mm, mm/m32
+ * NP 0F 61 /r          PUNPCKLWD mm, mm/m32
+ * NP 0F 62 /r          PUNPCKLDQ mm, mm/m32
+ * NP 0F 77             EMMS
+ * NP 0F 71 /6 ib       PSLLW mm1, imm8
+ * NP 0F 71 /2 ib       PSRLW mm, imm8
+ * NP 0F 71 /4 ib       PSRAW mm,imm8
+ * NP 0F 72 /6 ib       PSLLD mm, imm8
+ * NP 0F 72 /2 ib       PSRLD mm, imm8
+ * NP 0F 72 /4 ib       PSRAD mm,imm8
+ * NP 0F 73 /6 ib       PSLLQ mm, imm8
+ * NP 0F 73 /2 ib       PSRLQ mm, imm8
+ */
+
+OPCODE(movd, LEG(NP, 0F, 0, 0x6e), MMX, WR, Pq, Ed)
+OPCODE(movd, LEG(NP, 0F, 0, 0x7e), MMX, WR, Ed, Pq)
+OPCODE(movq, LEG(NP, 0F, 1, 0x6e), MMX, WR, Pq, Eq)
+OPCODE(movq, LEG(NP, 0F, 1, 0x7e), MMX, WR, Eq, Pq)
+OPCODE(movq, LEG(NP, 0F, 0, 0x6f), MMX, WR, Pq, Qq)
+OPCODE(movq, LEG(NP, 0F, 0, 0x7f), MMX, WR, Qq, Pq)
+OPCODE(paddb, LEG(NP, 0F, 0, 0xfc), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddw, LEG(NP, 0F, 0, 0xfd), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddd, LEG(NP, 0F, 0, 0xfe), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddsb, LEG(NP, 0F, 0, 0xec), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddsw, LEG(NP, 0F, 0, 0xed), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddusb, LEG(NP, 0F, 0, 0xdc), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddusw, LEG(NP, 0F, 0, 0xdd), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubb, LEG(NP, 0F, 0, 0xf8), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubw, LEG(NP, 0F, 0, 0xf9), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubd, LEG(NP, 0F, 0, 0xfa), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubsb, LEG(NP, 0F, 0, 0xe8), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubsw, LEG(NP, 0F, 0, 0xe9), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubusb, LEG(NP, 0F, 0, 0xd8), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubusw, LEG(NP, 0F, 0, 0xd9), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pmullw, LEG(NP, 0F, 0, 0xd5), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pmulhw, LEG(NP, 0F, 0, 0xe5), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pmaddwd, LEG(NP, 0F, 0, 0xf5), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pcmpeqb, LEG(NP, 0F, 0, 0x74), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pcmpeqw, LEG(NP, 0F, 0, 0x75), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pcmpeqd, LEG(NP, 0F, 0, 0x76), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pcmpgtb, LEG(NP, 0F, 0, 0x64), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pcmpgtw, LEG(NP, 0F, 0, 0x65), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pcmpgtd, LEG(NP, 0F, 0, 0x66), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pand, LEG(NP, 0F, 0, 0xdb), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pandn, LEG(NP, 0F, 0, 0xdf), MMX, WRR, Pq, Pq, Qq)
+OPCODE(por, LEG(NP, 0F, 0, 0xeb), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pxor, LEG(NP, 0F, 0, 0xef), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psllw, LEG(NP, 0F, 0, 0xf1), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pslld, LEG(NP, 0F, 0, 0xf2), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psllq, LEG(NP, 0F, 0, 0xf3), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psrlw, LEG(NP, 0F, 0, 0xd1), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psrld, LEG(NP, 0F, 0, 0xd2), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psrlq, LEG(NP, 0F, 0, 0xd3), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psraw, LEG(NP, 0F, 0, 0xe1), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psrad, LEG(NP, 0F, 0, 0xe2), MMX, WRR, Pq, Pq, Qq)
+OPCODE(packsswb, LEG(NP, 0F, 0, 0x63), MMX, WRR, Pq, Pq, Qq)
+OPCODE(packssdw, LEG(NP, 0F, 0, 0x6b), MMX, WRR, Pq, Pq, Qq)
+OPCODE(packuswb, LEG(NP, 0F, 0, 0x67), MMX, WRR, Pq, Pq, Qq)
+OPCODE(punpckhbw, LEG(NP, 0F, 0, 0x68), MMX, WRR, Pq, Pq, Qq)
+OPCODE(punpckhwd, LEG(NP, 0F, 0, 0x69), MMX, WRR, Pq, Pq, Qq)
+OPCODE(punpckhdq, LEG(NP, 0F, 0, 0x6a), MMX, WRR, Pq, Pq, Qq)
+OPCODE(punpcklbw, LEG(NP, 0F, 0, 0x60), MMX, WRR, Pq, Pq, Qd)
+OPCODE(punpcklwd, LEG(NP, 0F, 0, 0x61), MMX, WRR, Pq, Pq, Qd)
+OPCODE(punpckldq, LEG(NP, 0F, 0, 0x62), MMX, WRR, Pq, Pq, Qd)
+OPCODE(emms, LEG(NP, 0F, 0, 0x77), MMX, )
+
+OPCODE_GRP(grp12_LEG_NP, LEG(NP, 0F, 0, 0x71))
+OPCODE_GRP_BEGIN(grp12_LEG_NP)
+    OPCODE_GRPMEMB(grp12_LEG_NP, psllw, 6, MMX, WRR, Nq, Nq, Ib)
+    OPCODE_GRPMEMB(grp12_LEG_NP, psrlw, 2, MMX, WRR, Nq, Nq, Ib)
+    OPCODE_GRPMEMB(grp12_LEG_NP, psraw, 4, MMX, WRR, Nq, Nq, Ib)
+OPCODE_GRP_END(grp12_LEG_NP)
+
+OPCODE_GRP(grp13_LEG_NP, LEG(NP, 0F, 0, 0x72))
+OPCODE_GRP_BEGIN(grp13_LEG_NP)
+    OPCODE_GRPMEMB(grp13_LEG_NP, pslld, 6, MMX, WRR, Nq, Nq, Ib)
+    OPCODE_GRPMEMB(grp13_LEG_NP, psrld, 2, MMX, WRR, Nq, Nq, Ib)
+    OPCODE_GRPMEMB(grp13_LEG_NP, psrad, 4, MMX, WRR, Nq, Nq, Ib)
+OPCODE_GRP_END(grp13_LEG_NP)
+
+OPCODE_GRP(grp14_LEG_NP, LEG(NP, 0F, 0, 0x73))
+OPCODE_GRP_BEGIN(grp14_LEG_NP)
+    OPCODE_GRPMEMB(grp14_LEG_NP, psllq, 6, MMX, WRR, Nq, Nq, Ib)
+    OPCODE_GRPMEMB(grp14_LEG_NP, psrlq, 2, MMX, WRR, Nq, Nq, Ib)
+OPCODE_GRP_END(grp14_LEG_NP)
+
 #undef FMTI____
 #undef FMTI__R__
 #undef FMTI__RR__
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 39/75] target/i386: introduce SSE translators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (37 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 38/75] target/i386: introduce MMX vector instructions to sse-opcode.inc.h Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 40/75] target/i386: introduce SSE code generators Jan Bobek
                   ` (34 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Use the translator macros to define translators required by SSE
instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index fdfca03071..d77c08b7db 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5720,6 +5720,9 @@ static void translate_insn0()(
         }                                                               \
     }
 
+DEF_TRANSLATE_INSN1(Mb)
+DEF_TRANSLATE_INSN1(Md)
+
 #define DEF_TRANSLATE_INSN2(opT1, opT2)                                 \
     static void translate_insn2(opT1, opT2)(                            \
         CPUX86State *env, DisasContext *s, int modrm,                   \
@@ -5753,10 +5756,29 @@ static void translate_insn0()(
 
 DEF_TRANSLATE_INSN2(Ed, Pq)
 DEF_TRANSLATE_INSN2(Eq, Pq)
+DEF_TRANSLATE_INSN2(Gd, Nq)
+DEF_TRANSLATE_INSN2(Gd, Udq)
+DEF_TRANSLATE_INSN2(Gd, Wd)
+DEF_TRANSLATE_INSN2(Gq, Nq)
+DEF_TRANSLATE_INSN2(Gq, Udq)
+DEF_TRANSLATE_INSN2(Gq, Wd)
+DEF_TRANSLATE_INSN2(Mdq, Vdq)
+DEF_TRANSLATE_INSN2(Mq, Pq)
+DEF_TRANSLATE_INSN2(Mq, Vdq)
+DEF_TRANSLATE_INSN2(Mq, Vq)
 DEF_TRANSLATE_INSN2(Pq, Ed)
 DEF_TRANSLATE_INSN2(Pq, Eq)
+DEF_TRANSLATE_INSN2(Pq, Nq)
 DEF_TRANSLATE_INSN2(Pq, Qq)
+DEF_TRANSLATE_INSN2(Pq, Wq)
 DEF_TRANSLATE_INSN2(Qq, Pq)
+DEF_TRANSLATE_INSN2(Vd, Ed)
+DEF_TRANSLATE_INSN2(Vd, Eq)
+DEF_TRANSLATE_INSN2(Vd, Wd)
+DEF_TRANSLATE_INSN2(Vdq, Qq)
+DEF_TRANSLATE_INSN2(Vdq, Wdq)
+DEF_TRANSLATE_INSN2(Wd, Vd)
+DEF_TRANSLATE_INSN2(Wdq, Vdq)
 
 #define DEF_TRANSLATE_INSN3(opT1, opT2, opT3)                           \
     static void translate_insn3(opT1, opT2, opT3)(                      \
@@ -5796,9 +5818,16 @@ DEF_TRANSLATE_INSN2(Qq, Pq)
         }                                                               \
     }
 
+DEF_TRANSLATE_INSN3(Gd, Nq, Ib)
+DEF_TRANSLATE_INSN3(Gq, Nq, Ib)
 DEF_TRANSLATE_INSN3(Nq, Nq, Ib)
 DEF_TRANSLATE_INSN3(Pq, Pq, Qd)
 DEF_TRANSLATE_INSN3(Pq, Pq, Qq)
+DEF_TRANSLATE_INSN3(Pq, Qq, Ib)
+DEF_TRANSLATE_INSN3(Vd, Vd, Wd)
+DEF_TRANSLATE_INSN3(Vdq, Vdq, UdqMhq)
+DEF_TRANSLATE_INSN3(Vdq, Vdq, Wdq)
+DEF_TRANSLATE_INSN3(Vdq, Vq, Wq)
 
 #define DEF_TRANSLATE_INSN4(opT1, opT2, opT3, opT4)                     \
     static void translate_insn4(opT1, opT2, opT3, opT4)(                \
@@ -5844,6 +5873,11 @@ DEF_TRANSLATE_INSN3(Pq, Pq, Qq)
         }                                                               \
     }
 
+DEF_TRANSLATE_INSN4(Pq, Pq, RdMw, Ib)
+DEF_TRANSLATE_INSN4(Vd, Vd, Wd, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Vdq, Wd, modrm_mod)
+DEF_TRANSLATE_INSN4(Vdq, Vdq, Wdq, Ib)
+
 #define DEF_TRANSLATE_INSN5(opT1, opT2, opT3, opT4, opT5)               \
     static void translate_insn5(opT1, opT2, opT3, opT4, opT5)(          \
         CPUX86State *env, DisasContext *s, int modrm,                   \
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 40/75] target/i386: introduce SSE code generators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (38 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 39/75] target/i386: introduce SSE translators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 41/75] target/i386: introduce SSE vector instructions to sse-opcode.inc.h Jan Bobek
                   ` (33 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Introduce code generators required by SSE instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 309 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 309 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index d77c08b7db..d1537bc1b7 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5586,6 +5586,90 @@ GEN_INSN2(movq, Eq, Pq)
 
 DEF_GEN_INSN2_GVEC(movq, Pq, Qq, mov, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movq, Qq, Pq, mov, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movaps, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movaps, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movups, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movups, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+
+GEN_INSN4(movss, Vdq, Vdq, Wd, modrm_mod)
+{
+    const TCGv_i32 r32 = tcg_temp_new_i32();
+    const TCGv_i64 r64 = tcg_temp_new_i64();
+    if (arg4 == 3) {
+        /* merging movss */
+        tcg_gen_ld_i32(r32, cpu_env, arg3 + offsetof(ZMMReg, ZMM_L(0)));
+        tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(0)));
+        if (arg1 != arg2) {
+            tcg_gen_ld_i32(r32, cpu_env, arg2 + offsetof(ZMMReg, ZMM_L(1)));
+            tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(1)));
+            tcg_gen_ld_i64(r64, cpu_env, arg2 + offsetof(ZMMReg, ZMM_Q(1)));
+            tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(1)));
+        }
+    } else {
+        /* zero-extending movss */
+        tcg_gen_ld_i32(r32, cpu_env, arg3 + offsetof(ZMMReg, ZMM_L(0)));
+        tcg_gen_extu_i32_i64(r64, r32);
+        tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(0)));
+        tcg_gen_movi_i64(r64, 0);
+        tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(1)));
+    }
+    tcg_temp_free_i32(r32);
+    tcg_temp_free_i64(r64);
+}
+GEN_INSN2(movss, Wd, Vd)
+{
+    gen_insn4(movss, Vdq, Vdq, Wd, modrm_mod)(env, s, arg1, arg1, arg2, 3);
+}
+
+GEN_INSN3(movhlps, Vdq, Vdq, UdqMhq)
+{
+    const TCGv_i64 r64 = tcg_temp_new_i64();
+    tcg_gen_ld_i64(r64, cpu_env, arg3 + offsetof(ZMMReg, ZMM_Q(1)));
+    tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(0)));
+    if (arg1 != arg2) {
+        tcg_gen_ld_i64(r64, cpu_env, arg2 + offsetof(ZMMReg, ZMM_Q(1)));
+        tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(1)));
+    }
+    tcg_temp_free_i64(r64);
+}
+GEN_INSN2(movlps, Mq, Vq)
+{
+    insnop_ldst(xmm, Mq)(env, s, 1, arg2, arg1);
+}
+
+GEN_INSN3(movlhps, Vdq, Vq, Wq)
+{
+    const TCGv_i64 r64 = tcg_temp_new_i64();
+    tcg_gen_ld_i64(r64, cpu_env, arg3 + offsetof(ZMMReg, ZMM_Q(0)));
+    tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(1)));
+    if (arg1 != arg2) {
+        tcg_gen_ld_i64(r64, cpu_env, arg2 + offsetof(ZMMReg, ZMM_Q(0)));
+        tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(0)));
+    }
+    tcg_temp_free_i64(r64);
+}
+GEN_INSN2(movhps, Mq, Vdq)
+{
+    insnop_ldst(xmm, Mhq)(env, s, 1, arg2, arg1);
+}
+
+DEF_GEN_INSN2_HELPER_DEP(pmovmskb, pmovmskb_mmx, Gd, Nq)
+GEN_INSN2(pmovmskb, Gq, Nq)
+{
+    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
+    gen_insn2(pmovmskb, Gd, Nq)(env, s, arg1_r32, arg2);
+    tcg_gen_extu_i32_i64(arg1, arg1_r32);
+    tcg_temp_free_i32(arg1_r32);
+}
+
+DEF_GEN_INSN2_HELPER_DEP(movmskps, movmskps, Gd, Udq)
+GEN_INSN2(movmskps, Gq, Udq)
+{
+    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
+    gen_insn2(movmskps, Gd, Udq)(env, s, arg1_r32, arg2);
+    tcg_gen_extu_i32_i64(arg1, arg1_r32);
+    tcg_temp_free_i32(arg1_r32);
+}
 
 DEF_GEN_INSN3_GVEC(paddb, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddw, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_16)
@@ -5594,6 +5678,8 @@ DEF_GEN_INSN3_GVEC(paddsb, Pq, Pq, Qq, ssadd, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddsw, Pq, Pq, Qq, ssadd, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddusb, Pq, Pq, Qq, usadd, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddusw, Pq, Pq, Qq, usadd, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_HELPER_EPP(addps, addps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(addss, addss, Vd, Vd, Wd)
 
 DEF_GEN_INSN3_GVEC(psubb, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubw, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_16)
@@ -5602,11 +5688,38 @@ DEF_GEN_INSN3_GVEC(psubsb, Pq, Pq, Qq, sssub, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubsw, Pq, Pq, Qq, sssub, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubusb, Pq, Pq, Qq, ussub, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubusw, Pq, Pq, Qq, ussub, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_HELPER_EPP(subps, subps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(subss, subss, Vd, Vd, Wd)
 
 DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhw, pmulhw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(mulps, mulps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(mulss, mulss, Vd, Vd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(pmaddwd, pmaddwd_mmx, Pq, Pq, Qq)
 
+DEF_GEN_INSN3_HELPER_EPP(divps, divps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(divss, divss, Vd, Vd, Wd)
+
+DEF_GEN_INSN2_HELPER_EPP(rcpps, rcpps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(rcpss, rcpss, Vd, Wd)
+DEF_GEN_INSN2_HELPER_EPP(sqrtps, sqrtps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(sqrtss, sqrtss, Vd, Wd)
+DEF_GEN_INSN2_HELPER_EPP(rsqrtps, rsqrtps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(rsqrtss, rsqrtss, Vd, Wd)
+
+DEF_GEN_INSN3_GVEC(pminub, Pq, Pq, Qq, umin, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(pminsw, Pq, Pq, Qq, smin, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_HELPER_EPP(minps, minps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(minss, minss, Vd, Vd, Wd)
+DEF_GEN_INSN3_GVEC(pmaxub, Pq, Pq, Qq, umax, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(pmaxsw, Pq, Pq, Qq, smax, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_HELPER_EPP(maxps, maxps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(maxss, maxss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(pavgb, pavgb_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_mmx, Pq, Pq, Qq)
+
 DEF_GEN_INSN3_GVEC(pcmpeqb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_EQ)
@@ -5614,10 +5727,98 @@ DEF_GEN_INSN3_GVEC(pcmpgtb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_
 DEF_GEN_INSN3_GVEC(pcmpgtw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_GT)
 
+DEF_GEN_INSN3_HELPER_EPP(cmpeqps, cmpeqps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpeqss, cmpeqss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpltps, cmpltps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpltss, cmpltss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpleps, cmpleps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpless, cmpless, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpunordps, cmpunordps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpunordss, cmpunordss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpneqps, cmpneqps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpneqss, cmpneqss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpnltps, cmpnltps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpnltss, cmpnltss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpnleps, cmpnleps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpnless, cmpnless, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpordps, cmpordps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpordss, cmpordss, Vd, Vd, Wd)
+
+GEN_INSN4(cmpps, Vdq, Vdq, Wdq, Ib)
+{
+    switch (arg4 & 7) {
+    case 0:
+        gen_insn3(cmpeqps, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 1:
+        gen_insn3(cmpltps, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 2:
+        gen_insn3(cmpleps, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 3:
+        gen_insn3(cmpunordps, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 4:
+        gen_insn3(cmpneqps, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 5:
+        gen_insn3(cmpnltps, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 6:
+        gen_insn3(cmpnleps, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 7:
+        gen_insn3(cmpordps, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    }
+
+    g_assert_not_reached();
+}
+
+GEN_INSN4(cmpss, Vd, Vd, Wd, Ib)
+{
+    switch (arg4 & 7) {
+    case 0:
+        gen_insn3(cmpeqss, Vd, Vd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 1:
+        gen_insn3(cmpltss, Vd, Vd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 2:
+        gen_insn3(cmpless, Vd, Vd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 3:
+        gen_insn3(cmpunordss, Vd, Vd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 4:
+        gen_insn3(cmpneqss, Vd, Vd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 5:
+        gen_insn3(cmpnltss, Vd, Vd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 6:
+        gen_insn3(cmpnless, Vd, Vd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 7:
+        gen_insn3(cmpordss, Vd, Vd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    }
+
+    g_assert_not_reached();
+}
+
+DEF_GEN_INSN2_HELPER_EPP(comiss, comiss, Vd, Wd)
+DEF_GEN_INSN2_HELPER_EPP(ucomiss, ucomiss, Vd, Wd)
+
 DEF_GEN_INSN3_GVEC(pand, Pq, Pq, Qq, and, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(andps, Vdq, Vdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pandn, Pq, Pq, Qq, andn, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(andnps, Vdq, Vdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(por, Pq, Pq, Qq, or, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(orps, Vdq, Vdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pxor, Pq, Pq, Qq, xor, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(xorps, Vdq, Vdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 
 DEF_GEN_INSN3_HELPER_EPP(psllw, psllw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pslld, pslld_mmx, Pq, Pq, Qq)
@@ -5660,8 +5861,116 @@ DEF_GEN_INSN3_HELPER_EPP(punpckhbw, punpckhbw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhwd, punpckhwd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhdq, punpckhdq_mmx, Pq, Pq, Qq)
 
+DEF_GEN_INSN3_HELPER_EPP(unpcklps, punpckldq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(unpckhps, punpckhdq_xmm, Vdq, Vdq, Wdq)
+
+DEF_GEN_INSN3_HELPER_PPI(pshufw, pshufw_mmx, Pq, Qq, Ib)
+DEF_GEN_INSN4_HELPER_PPI(shufps, shufps, Vdq, Vdq, Wdq, Ib)
+
+GEN_INSN4(pinsrw, Pq, Pq, RdMw, Ib)
+{
+    assert(arg1 == arg2);
+
+    const size_t ofs = offsetof(MMXReg, MMX_W(arg4 & 3));
+    tcg_gen_st16_i32(arg3, cpu_env, arg1 + ofs);
+}
+
+GEN_INSN3(pextrw, Gd, Nq, Ib)
+{
+    const size_t ofs = offsetof(MMXReg, MMX_W(arg3 & 3));
+    tcg_gen_ld16u_i32(arg1, cpu_env, arg2 + ofs);
+}
+GEN_INSN3(pextrw, Gq, Nq, Ib)
+{
+    const size_t ofs = offsetof(MMXReg, MMX_W(arg3 & 3));
+    tcg_gen_ld16u_i64(arg1, cpu_env, arg2 + ofs);
+}
+
+DEF_GEN_INSN2_HELPER_EPP(cvtpi2ps, cvtpi2ps, Vdq, Qq)
+DEF_GEN_INSN2_HELPER_EPD(cvtsi2ss, cvtsi2ss, Vd, Ed)
+DEF_GEN_INSN2_HELPER_EPQ(cvtsi2ss, cvtsq2ss, Vd, Eq)
+DEF_GEN_INSN2_HELPER_EPP(cvtps2pi, cvtps2pi, Pq, Wq)
+DEF_GEN_INSN2_HELPER_DEP(cvtss2si, cvtss2si, Gd, Wd)
+DEF_GEN_INSN2_HELPER_QEP(cvtss2si, cvtss2sq, Gq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(cvttps2pi, cvttps2pi, Pq, Wq)
+DEF_GEN_INSN2_HELPER_DEP(cvttss2si, cvttss2si, Gd, Wd)
+DEF_GEN_INSN2_HELPER_QEP(cvttss2si, cvttss2sq, Gq, Wd)
+
+GEN_INSN2(maskmovq, Pq, Nq)
+{
+    const TCGv_ptr arg1_ptr = tcg_temp_new_ptr();
+    const TCGv_ptr arg2_ptr = tcg_temp_new_ptr();
+
+    tcg_gen_mov_tl(s->A0, cpu_regs[R_EDI]);
+    gen_extu(s->aflag, s->A0);
+    gen_add_A0_ds_seg(s);
+
+    tcg_gen_addi_ptr(arg1_ptr, cpu_env, arg1);
+    tcg_gen_addi_ptr(arg2_ptr, cpu_env, arg2);
+    gen_helper_maskmov_mmx(cpu_env, arg1_ptr, arg2_ptr, s->A0);
+
+    tcg_temp_free_ptr(arg1_ptr);
+    tcg_temp_free_ptr(arg2_ptr);
+}
+
+GEN_INSN2(movntps, Mdq, Vdq)
+{
+    insnop_ldst(xmm, Mdq)(env, s, 1, arg2, arg1);
+}
+
+GEN_INSN2(movntq, Mq, Pq)
+{
+    insnop_ldst(mm, Mq)(env, s, 1, arg2, arg1);
+}
+
 DEF_GEN_INSN0_HELPER(emms, emms)
 
+GEN_INSN0(sfence)
+{
+    if (s->prefix & PREFIX_LOCK) {
+        gen_illegal_opcode(s);
+    } else {
+        tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
+    }
+}
+
+GEN_INSN1(ldmxcsr, Md)
+{
+    if ((s->flags & HF_EM_MASK) || !(s->flags & HF_OSFXSR_MASK)) {
+        gen_illegal_opcode(s);
+    } else if (s->flags & HF_TS_MASK) {
+        gen_exception(s, EXCP07_PREX);
+    } else {
+        tcg_gen_qemu_ld_i32(s->tmp2_i32, arg1, s->mem_index, MO_LEUL);
+        gen_helper_ldmxcsr(cpu_env, s->tmp2_i32);
+    }
+}
+GEN_INSN1(stmxcsr, Md)
+{
+    if ((s->flags & HF_EM_MASK) || !(s->flags & HF_OSFXSR_MASK)) {
+        gen_illegal_opcode(s);
+    } else if (s->flags & HF_TS_MASK) {
+        gen_exception(s, EXCP07_PREX);
+    } else {
+        const size_t ofs = offsetof(CPUX86State, mxcsr);
+        tcg_gen_ld_i32(s->tmp2_i32, cpu_env, ofs);
+        tcg_gen_qemu_st_i32(s->tmp2_i32, arg1, s->mem_index, MO_LEUL);
+    }
+}
+
+GEN_INSN1(prefetcht0, Mb)
+{
+}
+GEN_INSN1(prefetcht1, Mb)
+{
+}
+GEN_INSN1(prefetcht2, Mb)
+{
+}
+GEN_INSN1(prefetchnta, Mb)
+{
+}
+
 /*
  * Instruction translators
  */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 41/75] target/i386: introduce SSE vector instructions to sse-opcode.inc.h
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (39 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 40/75] target/i386: introduce SSE code generators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 42/75] target/i386: introduce SSE2 translators Jan Bobek
                   ` (32 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add all the SSE vector instruction entries to sse-opcode.inc.h.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/sse-opcode.inc.h | 161 +++++++++++++++++++++++++++++++++++
 1 file changed, 161 insertions(+)

diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
index e570d449fc..92581703cd 100644
--- a/target/i386/sse-opcode.inc.h
+++ b/target/i386/sse-opcode.inc.h
@@ -104,6 +104,85 @@
  * NP 0F 72 /4 ib       PSRAD mm,imm8
  * NP 0F 73 /6 ib       PSLLQ mm, imm8
  * NP 0F 73 /2 ib       PSRLQ mm, imm8
+ *
+ * SSE Instructions
+ * -----------------
+ * NP 0F 28 /r          MOVAPS xmm1, xmm2/m128
+ * NP 0F 29 /r          MOVAPS xmm2/m128, xmm1
+ * NP 0F 10 /r          MOVUPS xmm1, xmm2/m128
+ * NP 0F 11 /r          MOVUPS xmm2/m128, xmm1
+ * F3 0F 10 /r          MOVSS xmm1, xmm2/m32
+ * F3 0F 11 /r          MOVSS xmm2/m32, xmm1
+ * NP 0F 12 /r          MOVHLPS xmm1, xmm2
+ * NP 0F 12 /r          MOVLPS xmm1, m64
+ * 0F 13 /r             MOVLPS m64, xmm1
+ * NP 0F 16 /r          MOVLHPS xmm1, xmm2
+ * NP 0F 16 /r          MOVHPS xmm1, m64
+ * NP 0F 17 /r          MOVHPS m64, xmm1
+ * NP 0F D7 /r          PMOVMSKB r32, mm
+ * NP REX.W 0F D7 /r    PMOVMSKB r64, mm
+ * NP 0F 50 /r          MOVMSKPS r32, xmm
+ * NP REX.W 0F 50 /r    MOVMSKPS r64, xmm
+ * NP 0F 58 /r          ADDPS xmm1, xmm2/m128
+ * F3 0F 58 /r          ADDSS xmm1, xmm2/m32
+ * NP 0F 5C /r          SUBPS xmm1, xmm2/m128
+ * F3 0F 5C /r          SUBSS xmm1, xmm2/m32
+ * NP 0F E4 /r          PMULHUW mm1, mm2/m64
+ * NP 0F 59 /r          MULPS xmm1, xmm2/m128
+ * F3 0F 59 /r          MULSS xmm1,xmm2/m32
+ * NP 0F 5E /r          DIVPS xmm1, xmm2/m128
+ * F3 0F 5E /r          DIVSS xmm1, xmm2/m32
+ * NP 0F 53 /r          RCPPS xmm1, xmm2/m128
+ * F3 0F 53 /r          RCPSS xmm1, xmm2/m32
+ * NP 0F 51 /r          SQRTPS xmm1, xmm2/m128
+ * F3 0F 51 /r          SQRTSS xmm1, xmm2/m32
+ * NP 0F 52 /r          RSQRTPS xmm1, xmm2/m128
+ * F3 0F 52 /r          RSQRTSS xmm1, xmm2/m32
+ * NP 0F DA /r          PMINUB mm1, mm2/m64
+ * NP 0F EA /r          PMINSW mm1, mm2/m64
+ * NP 0F 5D /r          MINPS xmm1, xmm2/m128
+ * F3 0F 5D /r          MINSS xmm1,xmm2/m32
+ * NP 0F DE /r          PMAXUB mm1, mm2/m64
+ * NP 0F EE /r          PMAXSW mm1, mm2/m64
+ * NP 0F 5F /r          MAXPS xmm1, xmm2/m128
+ * F3 0F 5F /r          MAXSS xmm1, xmm2/m32
+ * NP 0F E0 /r          PAVGB mm1, mm2/m64
+ * NP 0F E3 /r          PAVGW mm1, mm2/m64
+ * NP 0F F6 /r          PSADBW mm1, mm2/m64
+ * NP 0F C2 /r ib       CMPPS xmm1, xmm2/m128, imm8
+ * F3 0F C2 /r ib       CMPSS xmm1, xmm2/m32, imm8
+ * NP 0F 2E /r          UCOMISS xmm1, xmm2/m32
+ * NP 0F 2F /r          COMISS xmm1, xmm2/m32
+ * NP 0F 54 /r          ANDPS xmm1, xmm2/m128
+ * NP 0F 55 /r          ANDNPS xmm1, xmm2/m128
+ * NP 0F 56 /r          ORPS xmm1, xmm2/m128
+ * NP 0F 57 /r          XORPS xmm1, xmm2/m128
+ * NP 0F 14 /r          UNPCKLPS xmm1, xmm2/m128
+ * NP 0F 15 /r          UNPCKHPS xmm1, xmm2/m128
+ * NP 0F 70 /r ib       PSHUFW mm1, mm2/m64, imm8
+ * NP 0F C6 /r ib       SHUFPS xmm1, xmm3/m128, imm8
+ * NP 0F C4 /r ib       PINSRW mm, r32/m16, imm8
+ * NP 0F C5 /r ib       PEXTRW r32, mm, imm8
+ * NP REX.W 0F C5 /r ib PEXTRW r64, mm, imm8
+ * NP 0F 2A /r          CVTPI2PS xmm, mm/m64
+ * F3 0F 2A /r          CVTSI2SS xmm1,r/m32
+ * F3 REX.W 0F 2A /r    CVTSI2SS xmm1,r/m64
+ * NP 0F 2D /r          CVTPS2PI mm, xmm/m64
+ * F3 0F 2D /r          CVTSS2SI r32,xmm1/m32
+ * F3 REX.W 0F 2D /r    CVTSS2SI r64,xmm1/m32
+ * NP 0F 2C /r          CVTTPS2PI mm, xmm/m64
+ * F3 0F 2C /r          CVTTSS2SI r32,xmm1/m32
+ * F3 REX.W 0F 2C /r    CVTTSS2SI r64,xmm1/m32
+ * NP 0F F7 /r          MASKMOVQ mm1, mm2
+ * NP 0F 2B /r          MOVNTPS m128, xmm1
+ * NP 0F E7 /r          MOVNTQ m64, mm
+ * NP 0F AE /7          SFENCE
+ * NP 0F AE /2          LDMXCSR m32
+ * NP 0F AE /3          STMXCSR m32
+ * 0F 18 /1             PREFETCHT0 m8
+ * 0F 18 /2             PREFETCHT1 m8
+ * 0F 18 /3             PREFETCHT2 m8
+ * 0F 18 /0             PREFETCHNTA m8
  */
 
 OPCODE(movd, LEG(NP, 0F, 0, 0x6e), MMX, WR, Pq, Ed)
@@ -112,6 +191,20 @@ OPCODE(movq, LEG(NP, 0F, 1, 0x6e), MMX, WR, Pq, Eq)
 OPCODE(movq, LEG(NP, 0F, 1, 0x7e), MMX, WR, Eq, Pq)
 OPCODE(movq, LEG(NP, 0F, 0, 0x6f), MMX, WR, Pq, Qq)
 OPCODE(movq, LEG(NP, 0F, 0, 0x7f), MMX, WR, Qq, Pq)
+OPCODE(movaps, LEG(NP, 0F, 0, 0x28), SSE, WR, Vdq, Wdq)
+OPCODE(movaps, LEG(NP, 0F, 0, 0x29), SSE, WR, Wdq, Vdq)
+OPCODE(movups, LEG(NP, 0F, 0, 0x10), SSE, WR, Vdq, Wdq)
+OPCODE(movups, LEG(NP, 0F, 0, 0x11), SSE, WR, Wdq, Vdq)
+OPCODE(movss, LEG(F3, 0F, 0, 0x10), SSE, WRRR, Vdq, Vdq, Wd, modrm_mod)
+OPCODE(movss, LEG(F3, 0F, 0, 0x11), SSE, WR, Wd, Vd)
+OPCODE(movhlps, LEG(NP, 0F, 0, 0x12), SSE, WRR, Vdq, Vdq, UdqMhq)
+OPCODE(movlps, LEG(NP, 0F, 0, 0x13), SSE, WR, Mq, Vq)
+OPCODE(movlhps, LEG(NP, 0F, 0, 0x16), SSE, WRR, Vdq, Vq, Wq)
+OPCODE(movhps, LEG(NP, 0F, 0, 0x17), SSE, WR, Mq, Vdq)
+OPCODE(pmovmskb, LEG(NP, 0F, 0, 0xd7), SSE, WR, Gd, Nq)
+OPCODE(pmovmskb, LEG(NP, 0F, 1, 0xd7), SSE, WR, Gq, Nq)
+OPCODE(movmskps, LEG(NP, 0F, 0, 0x50), SSE, WR, Gd, Udq)
+OPCODE(movmskps, LEG(NP, 0F, 1, 0x50), SSE, WR, Gq, Udq)
 OPCODE(paddb, LEG(NP, 0F, 0, 0xfc), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddw, LEG(NP, 0F, 0, 0xfd), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddd, LEG(NP, 0F, 0, 0xfe), MMX, WRR, Pq, Pq, Qq)
@@ -119,6 +212,8 @@ OPCODE(paddsb, LEG(NP, 0F, 0, 0xec), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddsw, LEG(NP, 0F, 0, 0xed), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddusb, LEG(NP, 0F, 0, 0xdc), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddusw, LEG(NP, 0F, 0, 0xdd), MMX, WRR, Pq, Pq, Qq)
+OPCODE(addps, LEG(NP, 0F, 0, 0x58), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(addss, LEG(F3, 0F, 0, 0x58), SSE, WRR, Vd, Vd, Wd)
 OPCODE(psubb, LEG(NP, 0F, 0, 0xf8), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubw, LEG(NP, 0F, 0, 0xf9), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubd, LEG(NP, 0F, 0, 0xfa), MMX, WRR, Pq, Pq, Qq)
@@ -126,19 +221,51 @@ OPCODE(psubsb, LEG(NP, 0F, 0, 0xe8), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubsw, LEG(NP, 0F, 0, 0xe9), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubusb, LEG(NP, 0F, 0, 0xd8), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubusw, LEG(NP, 0F, 0, 0xd9), MMX, WRR, Pq, Pq, Qq)
+OPCODE(subps, LEG(NP, 0F, 0, 0x5c), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(subss, LEG(F3, 0F, 0, 0x5c), SSE, WRR, Vd, Vd, Wd)
 OPCODE(pmullw, LEG(NP, 0F, 0, 0xd5), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pmulhw, LEG(NP, 0F, 0, 0xe5), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pmulhuw, LEG(NP, 0F, 0, 0xe4), SSE, WRR, Pq, Pq, Qq)
+OPCODE(mulps, LEG(NP, 0F, 0, 0x59), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(mulss, LEG(F3, 0F, 0, 0x59), SSE, WRR, Vd, Vd, Wd)
 OPCODE(pmaddwd, LEG(NP, 0F, 0, 0xf5), MMX, WRR, Pq, Pq, Qq)
+OPCODE(divps, LEG(NP, 0F, 0, 0x5e), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(divss, LEG(F3, 0F, 0, 0x5e), SSE, WRR, Vd, Vd, Wd)
+OPCODE(rcpps, LEG(NP, 0F, 0, 0x53), SSE, WR, Vdq, Wdq)
+OPCODE(rcpss, LEG(F3, 0F, 0, 0x53), SSE, WR, Vd, Wd)
+OPCODE(sqrtps, LEG(NP, 0F, 0, 0x51), SSE, WR, Vdq, Wdq)
+OPCODE(sqrtss, LEG(F3, 0F, 0, 0x51), SSE, WR, Vd, Wd)
+OPCODE(rsqrtps, LEG(NP, 0F, 0, 0x52), SSE, WR, Vdq, Wdq)
+OPCODE(rsqrtss, LEG(F3, 0F, 0, 0x52), SSE, WR, Vd, Wd)
+OPCODE(pminub, LEG(NP, 0F, 0, 0xda), SSE, WRR, Pq, Pq, Qq)
+OPCODE(pminsw, LEG(NP, 0F, 0, 0xea), SSE, WRR, Pq, Pq, Qq)
+OPCODE(minps, LEG(NP, 0F, 0, 0x5d), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(minss, LEG(F3, 0F, 0, 0x5d), SSE, WRR, Vd, Vd, Wd)
+OPCODE(pmaxub, LEG(NP, 0F, 0, 0xde), SSE, WRR, Pq, Pq, Qq)
+OPCODE(pmaxsw, LEG(NP, 0F, 0, 0xee), SSE, WRR, Pq, Pq, Qq)
+OPCODE(maxps, LEG(NP, 0F, 0, 0x5f), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(maxss, LEG(F3, 0F, 0, 0x5f), SSE, WRR, Vd, Vd, Wd)
+OPCODE(pavgb, LEG(NP, 0F, 0, 0xe0), SSE, WRR, Pq, Pq, Qq)
+OPCODE(pavgw, LEG(NP, 0F, 0, 0xe3), SSE, WRR, Pq, Pq, Qq)
+OPCODE(psadbw, LEG(NP, 0F, 0, 0xf6), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqb, LEG(NP, 0F, 0, 0x74), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqw, LEG(NP, 0F, 0, 0x75), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqd, LEG(NP, 0F, 0, 0x76), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtb, LEG(NP, 0F, 0, 0x64), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtw, LEG(NP, 0F, 0, 0x65), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtd, LEG(NP, 0F, 0, 0x66), MMX, WRR, Pq, Pq, Qq)
+OPCODE(cmpps, LEG(NP, 0F, 0, 0xc2), SSE, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(cmpss, LEG(F3, 0F, 0, 0xc2), SSE, WRRR, Vd, Vd, Wd, Ib)
+OPCODE(ucomiss, LEG(NP, 0F, 0, 0x2e), SSE, RR, Vd, Wd)
+OPCODE(comiss, LEG(NP, 0F, 0, 0x2f), SSE, RR, Vd, Wd)
 OPCODE(pand, LEG(NP, 0F, 0, 0xdb), MMX, WRR, Pq, Pq, Qq)
+OPCODE(andps, LEG(NP, 0F, 0, 0x54), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(pandn, LEG(NP, 0F, 0, 0xdf), MMX, WRR, Pq, Pq, Qq)
+OPCODE(andnps, LEG(NP, 0F, 0, 0x55), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(por, LEG(NP, 0F, 0, 0xeb), MMX, WRR, Pq, Pq, Qq)
+OPCODE(orps, LEG(NP, 0F, 0, 0x56), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(pxor, LEG(NP, 0F, 0, 0xef), MMX, WRR, Pq, Pq, Qq)
+OPCODE(xorps, LEG(NP, 0F, 0, 0x57), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(psllw, LEG(NP, 0F, 0, 0xf1), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pslld, LEG(NP, 0F, 0, 0xf2), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psllq, LEG(NP, 0F, 0, 0xf3), MMX, WRR, Pq, Pq, Qq)
@@ -156,6 +283,25 @@ OPCODE(punpckhdq, LEG(NP, 0F, 0, 0x6a), MMX, WRR, Pq, Pq, Qq)
 OPCODE(punpcklbw, LEG(NP, 0F, 0, 0x60), MMX, WRR, Pq, Pq, Qd)
 OPCODE(punpcklwd, LEG(NP, 0F, 0, 0x61), MMX, WRR, Pq, Pq, Qd)
 OPCODE(punpckldq, LEG(NP, 0F, 0, 0x62), MMX, WRR, Pq, Pq, Qd)
+OPCODE(unpcklps, LEG(NP, 0F, 0, 0x14), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(unpckhps, LEG(NP, 0F, 0, 0x15), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(pshufw, LEG(NP, 0F, 0, 0x70), SSE, WRR, Pq, Qq, Ib)
+OPCODE(shufps, LEG(NP, 0F, 0, 0xc6), SSE, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(pinsrw, LEG(NP, 0F, 0, 0xc4), SSE, WRRR, Pq, Pq, RdMw, Ib)
+OPCODE(pextrw, LEG(NP, 0F, 0, 0xc5), SSE, WRR, Gd, Nq, Ib)
+OPCODE(pextrw, LEG(NP, 0F, 1, 0xc5), SSE, WRR, Gq, Nq, Ib)
+OPCODE(cvtpi2ps, LEG(NP, 0F, 0, 0x2a), SSE, WR, Vdq, Qq)
+OPCODE(cvtsi2ss, LEG(F3, 0F, 0, 0x2a), SSE, WR, Vd, Ed)
+OPCODE(cvtsi2ss, LEG(F3, 0F, 1, 0x2a), SSE, WR, Vd, Eq)
+OPCODE(cvtps2pi, LEG(NP, 0F, 0, 0x2d), SSE, WR, Pq, Wq)
+OPCODE(cvtss2si, LEG(F3, 0F, 0, 0x2d), SSE, WR, Gd, Wd)
+OPCODE(cvtss2si, LEG(F3, 0F, 1, 0x2d), SSE, WR, Gq, Wd)
+OPCODE(cvttps2pi, LEG(NP, 0F, 0, 0x2c), SSE, WR, Pq, Wq)
+OPCODE(cvttss2si, LEG(F3, 0F, 0, 0x2c), SSE, WR, Gd, Wd)
+OPCODE(cvttss2si, LEG(F3, 0F, 1, 0x2c), SSE, WR, Gq, Wd)
+OPCODE(maskmovq, LEG(NP, 0F, 0, 0xf7), SSE, RR, Pq, Nq)
+OPCODE(movntps, LEG(NP, 0F, 0, 0x2b), SSE, WR, Mdq, Vdq)
+OPCODE(movntq, LEG(NP, 0F, 0, 0xe7), SSE, WR, Mq, Pq)
 OPCODE(emms, LEG(NP, 0F, 0, 0x77), MMX, )
 
 OPCODE_GRP(grp12_LEG_NP, LEG(NP, 0F, 0, 0x71))
@@ -178,6 +324,21 @@ OPCODE_GRP_BEGIN(grp14_LEG_NP)
     OPCODE_GRPMEMB(grp14_LEG_NP, psrlq, 2, MMX, WRR, Nq, Nq, Ib)
 OPCODE_GRP_END(grp14_LEG_NP)
 
+OPCODE_GRP(grp15_LEG_NP, LEG(NP, 0F, 0, 0xae))
+OPCODE_GRP_BEGIN(grp15_LEG_NP)
+    OPCODE_GRPMEMB(grp15_LEG_NP, sfence, 7, SSE, )
+    OPCODE_GRPMEMB(grp15_LEG_NP, ldmxcsr, 2, SSE, R, Md)
+    OPCODE_GRPMEMB(grp15_LEG_NP, stmxcsr, 3, SSE, W, Md)
+OPCODE_GRP_END(grp15_LEG_NP)
+
+OPCODE_GRP(grp16_LEG_NP, LEG(NP, 0F, 0, 0x18))
+OPCODE_GRP_BEGIN(grp16_LEG_NP)
+    OPCODE_GRPMEMB(grp16_LEG_NP, prefetcht0, 1, SSE, R, Mb)
+    OPCODE_GRPMEMB(grp16_LEG_NP, prefetcht1, 2, SSE, R, Mb)
+    OPCODE_GRPMEMB(grp16_LEG_NP, prefetcht2, 3, SSE, R, Mb)
+    OPCODE_GRPMEMB(grp16_LEG_NP, prefetchnta, 0, SSE, R, Mb)
+OPCODE_GRP_END(grp16_LEG_NP)
+
 #undef FMTI____
 #undef FMTI__R__
 #undef FMTI__RR__
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 42/75] target/i386: introduce SSE2 translators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (40 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 41/75] target/i386: introduce SSE vector instructions to sse-opcode.inc.h Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 43/75] target/i386: introduce SSE2 code generators Jan Bobek
                   ` (31 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Use the translator macros to define translators required by SSE2
instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index d1537bc1b7..43917edc76 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -6064,14 +6064,20 @@ DEF_TRANSLATE_INSN1(Md)
     }
 
 DEF_TRANSLATE_INSN2(Ed, Pq)
+DEF_TRANSLATE_INSN2(Ed, Vdq)
 DEF_TRANSLATE_INSN2(Eq, Pq)
+DEF_TRANSLATE_INSN2(Eq, Vdq)
 DEF_TRANSLATE_INSN2(Gd, Nq)
 DEF_TRANSLATE_INSN2(Gd, Udq)
 DEF_TRANSLATE_INSN2(Gd, Wd)
+DEF_TRANSLATE_INSN2(Gd, Wq)
 DEF_TRANSLATE_INSN2(Gq, Nq)
 DEF_TRANSLATE_INSN2(Gq, Udq)
 DEF_TRANSLATE_INSN2(Gq, Wd)
+DEF_TRANSLATE_INSN2(Gq, Wq)
+DEF_TRANSLATE_INSN2(Md, Gd)
 DEF_TRANSLATE_INSN2(Mdq, Vdq)
+DEF_TRANSLATE_INSN2(Mq, Gq)
 DEF_TRANSLATE_INSN2(Mq, Pq)
 DEF_TRANSLATE_INSN2(Mq, Vdq)
 DEF_TRANSLATE_INSN2(Mq, Vq)
@@ -6079,15 +6085,30 @@ DEF_TRANSLATE_INSN2(Pq, Ed)
 DEF_TRANSLATE_INSN2(Pq, Eq)
 DEF_TRANSLATE_INSN2(Pq, Nq)
 DEF_TRANSLATE_INSN2(Pq, Qq)
+DEF_TRANSLATE_INSN2(Pq, Uq)
+DEF_TRANSLATE_INSN2(Pq, Wdq)
 DEF_TRANSLATE_INSN2(Pq, Wq)
 DEF_TRANSLATE_INSN2(Qq, Pq)
+DEF_TRANSLATE_INSN2(UdqMq, Vq)
 DEF_TRANSLATE_INSN2(Vd, Ed)
 DEF_TRANSLATE_INSN2(Vd, Eq)
 DEF_TRANSLATE_INSN2(Vd, Wd)
+DEF_TRANSLATE_INSN2(Vd, Wq)
+DEF_TRANSLATE_INSN2(Vdq, Ed)
+DEF_TRANSLATE_INSN2(Vdq, Eq)
+DEF_TRANSLATE_INSN2(Vdq, Nq)
 DEF_TRANSLATE_INSN2(Vdq, Qq)
+DEF_TRANSLATE_INSN2(Vdq, Udq)
 DEF_TRANSLATE_INSN2(Vdq, Wdq)
+DEF_TRANSLATE_INSN2(Vdq, Wq)
+DEF_TRANSLATE_INSN2(Vq, Ed)
+DEF_TRANSLATE_INSN2(Vq, Eq)
+DEF_TRANSLATE_INSN2(Vq, Wd)
+DEF_TRANSLATE_INSN2(Vq, Wq)
 DEF_TRANSLATE_INSN2(Wd, Vd)
 DEF_TRANSLATE_INSN2(Wdq, Vdq)
+DEF_TRANSLATE_INSN2(Wq, Vq)
+DEF_TRANSLATE_INSN2(modrm_mod, modrm)
 
 #define DEF_TRANSLATE_INSN3(opT1, opT2, opT3)                           \
     static void translate_insn3(opT1, opT2, opT3)(                      \
@@ -6128,15 +6149,22 @@ DEF_TRANSLATE_INSN2(Wdq, Vdq)
     }
 
 DEF_TRANSLATE_INSN3(Gd, Nq, Ib)
+DEF_TRANSLATE_INSN3(Gd, Udq, Ib)
 DEF_TRANSLATE_INSN3(Gq, Nq, Ib)
+DEF_TRANSLATE_INSN3(Gq, Udq, Ib)
 DEF_TRANSLATE_INSN3(Nq, Nq, Ib)
 DEF_TRANSLATE_INSN3(Pq, Pq, Qd)
 DEF_TRANSLATE_INSN3(Pq, Pq, Qq)
 DEF_TRANSLATE_INSN3(Pq, Qq, Ib)
+DEF_TRANSLATE_INSN3(Udq, Udq, Ib)
 DEF_TRANSLATE_INSN3(Vd, Vd, Wd)
+DEF_TRANSLATE_INSN3(Vdq, Vdq, Mq)
 DEF_TRANSLATE_INSN3(Vdq, Vdq, UdqMhq)
 DEF_TRANSLATE_INSN3(Vdq, Vdq, Wdq)
+DEF_TRANSLATE_INSN3(Vdq, Vq, Mq)
 DEF_TRANSLATE_INSN3(Vdq, Vq, Wq)
+DEF_TRANSLATE_INSN3(Vdq, Wdq, Ib)
+DEF_TRANSLATE_INSN3(Vq, Vq, Wq)
 
 #define DEF_TRANSLATE_INSN4(opT1, opT2, opT3, opT4)                     \
     static void translate_insn4(opT1, opT2, opT3, opT4)(                \
@@ -6184,8 +6212,11 @@ DEF_TRANSLATE_INSN3(Vdq, Vq, Wq)
 
 DEF_TRANSLATE_INSN4(Pq, Pq, RdMw, Ib)
 DEF_TRANSLATE_INSN4(Vd, Vd, Wd, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Vdq, RdMw, Ib)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, Wd, modrm_mod)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, Wdq, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Vdq, Wq, modrm_mod)
+DEF_TRANSLATE_INSN4(Vq, Vq, Wq, Ib)
 
 #define DEF_TRANSLATE_INSN5(opT1, opT2, opT3, opT4, opT5)               \
     static void translate_insn5(opT1, opT2, opT3, opT4, opT5)(          \
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 43/75] target/i386: introduce SSE2 code generators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (41 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 42/75] target/i386: introduce SSE2 translators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 44/75] target/i386: introduce SSE2 vector instructions to sse-opcode.inc.h Jan Bobek
                   ` (30 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Introduce code generators required by SSE2 instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 427 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 425 insertions(+), 2 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 43917edc76..3445b4cff1 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5563,6 +5563,7 @@ INSNOP_LDST(xmm, Mhq)
     }
 
 GEN_INSN2(movq, Pq, Eq);        /* forward declaration */
+GEN_INSN2(movq, Vdq, Eq);       /* forward declaration */
 GEN_INSN2(movd, Pq, Ed)
 {
     const insnop_arg_t(Eq) arg2_r64 = tcg_temp_new_i64();
@@ -5574,6 +5575,17 @@ GEN_INSN2(movd, Ed, Pq)
 {
     tcg_gen_ld_i32(arg1, cpu_env, arg2 + offsetof(MMXReg, MMX_L(0)));
 }
+GEN_INSN2(movd, Vdq, Ed)
+{
+    const insnop_arg_t(Eq) arg2_r64 = tcg_temp_new_i64();
+    tcg_gen_extu_i32_i64(arg2_r64, arg2);
+    gen_insn2(movq, Vdq, Eq)(env, s, arg1, arg2_r64);
+    tcg_temp_free_i64(arg2_r64);
+}
+GEN_INSN2(movd, Ed, Vdq)
+{
+    tcg_gen_ld_i32(arg1, cpu_env, arg2 + offsetof(ZMMReg, ZMM_L(0)));
+}
 
 GEN_INSN2(movq, Pq, Eq)
 {
@@ -5583,13 +5595,45 @@ GEN_INSN2(movq, Eq, Pq)
 {
     tcg_gen_ld_i64(arg1, cpu_env, arg2 + offsetof(MMXReg, MMX_Q(0)));
 }
+GEN_INSN2(movq, Vdq, Eq)
+{
+    const TCGv_i64 r64 = tcg_temp_new_i64();
+    tcg_gen_st_i64(arg2, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(0)));
+    tcg_gen_movi_i64(r64, 0);
+    tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(1)));
+    tcg_temp_free_i64(r64);
+}
+GEN_INSN2(movq, Eq, Vdq)
+{
+    tcg_gen_ld_i64(arg1, cpu_env, arg2 + offsetof(ZMMReg, ZMM_Q(0)));
+}
 
 DEF_GEN_INSN2_GVEC(movq, Pq, Qq, mov, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movq, Qq, Pq, mov, MM_OPRSZ, MM_MAXSZ, MO_64)
+GEN_INSN2(movq, Vdq, Wq)
+{
+    const TCGv_i64 r64 = tcg_temp_new_i64();
+    tcg_gen_ld_i64(r64, cpu_env, arg2 + offsetof(ZMMReg, ZMM_Q(0)));
+    gen_insn2(movq, Vdq, Eq)(env, s, arg1, r64);
+    tcg_temp_free_i64(r64);
+}
+GEN_INSN2(movq, UdqMq, Vq)
+{
+    gen_insn2(movq, Vdq, Wq)(env, s, arg1, arg2);
+}
+
 DEF_GEN_INSN2_GVEC(movaps, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movaps, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movapd, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movapd, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movdqa, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movdqa, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movups, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movups, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movupd, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movupd, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movdqu, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(movdqu, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 
 GEN_INSN4(movss, Vdq, Vdq, Wd, modrm_mod)
 {
@@ -5621,6 +5665,46 @@ GEN_INSN2(movss, Wd, Vd)
     gen_insn4(movss, Vdq, Vdq, Wd, modrm_mod)(env, s, arg1, arg1, arg2, 3);
 }
 
+GEN_INSN4(movsd, Vdq, Vdq, Wq, modrm_mod)
+{
+    const TCGv_i64 r64 = tcg_temp_new_i64();
+    tcg_gen_ld_i64(r64, cpu_env, arg3 + offsetof(ZMMReg, ZMM_Q(0)));
+    tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(0)));
+    if (arg4 == 3) {
+        /* merging movsd */
+        if (arg1 != arg2) {
+            tcg_gen_ld_i64(r64, cpu_env, arg2 + offsetof(ZMMReg, ZMM_Q(1)));
+            tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(1)));
+        }
+    } else {
+        /* zero-extending movsd */
+        tcg_gen_movi_i64(r64, 0);
+        tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(1)));
+    }
+    tcg_temp_free_i64(r64);
+}
+GEN_INSN2(movsd, Wq, Vq)
+{
+    gen_insn4(movsd, Vdq, Vdq, Wq, modrm_mod)(env, s, arg1, arg1, arg2, 3);
+}
+
+GEN_INSN2(movq2dq, Vdq, Nq)
+{
+    const insnop_arg_t(Vdq) dofs = offsetof(ZMMReg, ZMM_Q(0));
+    const insnop_arg_t(Nq) aofs = offsetof(MMXReg, MMX_Q(0));
+    gen_op_movq(s, arg1 + dofs, arg2 + aofs);
+
+    const TCGv_i64 r64z = tcg_const_i64(0);
+    tcg_gen_st_i64(r64z, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(1)));
+    tcg_temp_free_i64(r64z);
+}
+GEN_INSN2(movdq2q, Pq, Uq)
+{
+    const insnop_arg_t(Pq) dofs = offsetof(MMXReg, MMX_Q(0));
+    const insnop_arg_t(Uq) aofs = offsetof(ZMMReg, ZMM_Q(0));
+    gen_op_movq(s, arg1 + dofs, arg2 + aofs);
+}
+
 GEN_INSN3(movhlps, Vdq, Vdq, UdqMhq)
 {
     const TCGv_i64 r64 = tcg_temp_new_i64();
@@ -5637,6 +5721,21 @@ GEN_INSN2(movlps, Mq, Vq)
     insnop_ldst(xmm, Mq)(env, s, 1, arg2, arg1);
 }
 
+GEN_INSN3(movlpd, Vdq, Vdq, Mq)
+{
+    insnop_ldst(xmm, Mq)(env, s, 0, arg1, arg3);
+    if (arg1 != arg2) {
+        const TCGv_i64 r64 = tcg_temp_new_i64();
+        tcg_gen_ld_i64(r64, cpu_env, arg2 + offsetof(ZMMReg, ZMM_Q(1)));
+        tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(1)));
+        tcg_temp_free_i64(r64);
+    }
+}
+GEN_INSN2(movlpd, Mq, Vq)
+{
+    insnop_ldst(xmm, Mq)(env, s, 1, arg2, arg1);
+}
+
 GEN_INSN3(movlhps, Vdq, Vq, Wq)
 {
     const TCGv_i64 r64 = tcg_temp_new_i64();
@@ -5653,6 +5752,21 @@ GEN_INSN2(movhps, Mq, Vdq)
     insnop_ldst(xmm, Mhq)(env, s, 1, arg2, arg1);
 }
 
+GEN_INSN3(movhpd, Vdq, Vq, Mq)
+{
+    if (arg1 != arg2) {
+        const TCGv_i64 r64 = tcg_temp_new_i64();
+        tcg_gen_ld_i64(r64, cpu_env, arg2 + offsetof(ZMMReg, ZMM_Q(0)));
+        tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(0)));
+        tcg_temp_free_i64(r64);
+    }
+    insnop_ldst(xmm, Mhq)(env, s, 0, arg1, arg3);
+}
+GEN_INSN2(movhpd, Mq, Vdq)
+{
+    insnop_ldst(xmm, Mhq)(env, s, 1, arg2, arg1);
+}
+
 DEF_GEN_INSN2_HELPER_DEP(pmovmskb, pmovmskb_mmx, Gd, Nq)
 GEN_INSN2(pmovmskb, Gq, Nq)
 {
@@ -5661,6 +5775,14 @@ GEN_INSN2(pmovmskb, Gq, Nq)
     tcg_gen_extu_i32_i64(arg1, arg1_r32);
     tcg_temp_free_i32(arg1_r32);
 }
+DEF_GEN_INSN2_HELPER_DEP(pmovmskb, pmovmskb_xmm, Gd, Udq)
+GEN_INSN2(pmovmskb, Gq, Udq)
+{
+    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
+    gen_insn2(pmovmskb, Gd, Udq)(env, s, arg1_r32, arg2);
+    tcg_gen_extu_i32_i64(arg1, arg1_r32);
+    tcg_temp_free_i32(arg1_r32);
+}
 
 DEF_GEN_INSN2_HELPER_DEP(movmskps, movmskps, Gd, Udq)
 GEN_INSN2(movmskps, Gq, Udq)
@@ -5671,78 +5793,154 @@ GEN_INSN2(movmskps, Gq, Udq)
     tcg_temp_free_i32(arg1_r32);
 }
 
+DEF_GEN_INSN2_HELPER_DEP(movmskpd, movmskpd, Gd, Udq)
+GEN_INSN2(movmskpd, Gq, Udq)
+{
+    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
+    gen_insn2(movmskpd, Gd, Udq)(env, s, arg1_r32, arg2);
+    tcg_gen_extu_i32_i64(arg1, arg1_r32);
+    tcg_temp_free_i32(arg1_r32);
+}
+
 DEF_GEN_INSN3_GVEC(paddb, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(paddb, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddw, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(paddw, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddd, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(paddd, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(paddq, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(paddq, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(paddsb, Pq, Pq, Qq, ssadd, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(paddsb, Vdq, Vdq, Wdq, ssadd, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddsw, Pq, Pq, Qq, ssadd, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(paddsw, Vdq, Vdq, Wdq, ssadd, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddusb, Pq, Pq, Qq, usadd, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(paddusb, Vdq, Vdq, Wdq, usadd, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddusw, Pq, Pq, Qq, usadd, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(paddusw, Vdq, Vdq, Wdq, usadd, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_HELPER_EPP(addps, addps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(addpd, addpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(addss, addss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(addsd, addsd, Vq, Vq, Wq)
 
 DEF_GEN_INSN3_GVEC(psubb, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(psubb, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubw, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(psubw, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubd, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(psubd, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(psubq, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(psubq, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(psubsb, Pq, Pq, Qq, sssub, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(psubsb, Vdq, Vdq, Wdq, sssub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubsw, Pq, Pq, Qq, sssub, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(psubsw, Vdq, Vdq, Wdq, sssub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubusb, Pq, Pq, Qq, ussub, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(psubusb, Vdq, Vdq, Wdq, ussub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubusw, Pq, Pq, Qq, ussub, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(psubusw, Vdq, Vdq, Wdq, ussub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_HELPER_EPP(subps, subps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(subpd, subpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(subss, subss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(subsd, subsd, Vq, Vq, Wq)
 
 DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhw, pmulhw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pmulhw, pmulhw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(pmuludq, pmuludq_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pmuludq, pmuludq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(mulps, mulps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(mulpd, mulpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(mulss, mulss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(mulsd, mulsd, Vq, Vq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(pmaddwd, pmaddwd_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pmaddwd, pmaddwd_xmm, Vdq, Vdq, Wdq)
 
 DEF_GEN_INSN3_HELPER_EPP(divps, divps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(divpd, divpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(divss, divss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(divsd, divsd, Vq, Vq, Wq)
 
 DEF_GEN_INSN2_HELPER_EPP(rcpps, rcpps, Vdq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(rcpss, rcpss, Vd, Wd)
 DEF_GEN_INSN2_HELPER_EPP(sqrtps, sqrtps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(sqrtpd, sqrtpd, Vdq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(sqrtss, sqrtss, Vd, Wd)
+DEF_GEN_INSN2_HELPER_EPP(sqrtsd, sqrtsd, Vq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(rsqrtps, rsqrtps, Vdq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(rsqrtss, rsqrtss, Vd, Wd)
 
 DEF_GEN_INSN3_GVEC(pminub, Pq, Pq, Qq, umin, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(pminub, Vdq, Vdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pminsw, Pq, Pq, Qq, smin, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(pminsw, Vdq, Vdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_HELPER_EPP(minps, minps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(minpd, minpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(minss, minss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(minsd, minsd, Vq, Vq, Wq)
 DEF_GEN_INSN3_GVEC(pmaxub, Pq, Pq, Qq, umax, MM_OPRSZ, MM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(pmaxub, Vdq, Vdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pmaxsw, Pq, Pq, Qq, smax, MM_OPRSZ, MM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(pmaxsw, Vdq, Vdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_HELPER_EPP(maxps, maxps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(maxpd, maxpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(maxss, maxss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(maxsd, maxsd, Vq, Vq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(pavgb, pavgb_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pavgb, pavgb_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_xmm, Vdq, Vdq, Wdq)
 
 DEF_GEN_INSN3_GVEC(pcmpeqb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(pcmpeqb, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(pcmpeqw, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(pcmpeqd, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpgtb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(pcmpgtb, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(pcmpgtw, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(pcmpgtd, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_GT)
 
 DEF_GEN_INSN3_HELPER_EPP(cmpeqps, cmpeqps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpeqpd, cmpeqpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(cmpeqss, cmpeqss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpeqsd, cmpeqsd, Vq, Vq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpltps, cmpltps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpltpd, cmpltpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(cmpltss, cmpltss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpltsd, cmpltsd, Vq, Vq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpleps, cmpleps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmplepd, cmplepd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(cmpless, cmpless, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmplesd, cmplesd, Vq, Vq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpunordps, cmpunordps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpunordpd, cmpunordpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(cmpunordss, cmpunordss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpunordsd, cmpunordsd, Vq, Vq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpneqps, cmpneqps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpneqpd, cmpneqpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(cmpneqss, cmpneqss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpneqsd, cmpneqsd, Vq, Vq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpnltps, cmpnltps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpnltpd, cmpnltpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(cmpnltss, cmpnltss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpnltsd, cmpnltsd, Vq, Vq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpnleps, cmpnleps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpnlepd, cmpnlepd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(cmpnless, cmpnless, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpnlesd, cmpnlesd, Vq, Vq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpordps, cmpordps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(cmpordpd, cmpordpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(cmpordss, cmpordss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(cmpordsd, cmpordsd, Vq, Vq, Wq)
 
 GEN_INSN4(cmpps, Vdq, Vdq, Wdq, Ib)
 {
@@ -5776,6 +5974,38 @@ GEN_INSN4(cmpps, Vdq, Vdq, Wdq, Ib)
     g_assert_not_reached();
 }
 
+GEN_INSN4(cmppd, Vdq, Vdq, Wdq, Ib)
+{
+    switch (arg4 & 7) {
+    case 0:
+        gen_insn3(cmpeqpd, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 1:
+        gen_insn3(cmpltpd, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 2:
+        gen_insn3(cmplepd, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 3:
+        gen_insn3(cmpunordpd, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 4:
+        gen_insn3(cmpneqpd, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 5:
+        gen_insn3(cmpnltpd, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 6:
+        gen_insn3(cmpnlepd, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 7:
+        gen_insn3(cmpordpd, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    }
+
+    g_assert_not_reached();
+}
+
 GEN_INSN4(cmpss, Vd, Vd, Wd, Ib)
 {
     switch (arg4 & 7) {
@@ -5808,26 +6038,78 @@ GEN_INSN4(cmpss, Vd, Vd, Wd, Ib)
     g_assert_not_reached();
 }
 
+GEN_INSN4(cmpsd, Vq, Vq, Wq, Ib)
+{
+    switch (arg4 & 7) {
+    case 0:
+        gen_insn3(cmpeqsd, Vq, Vq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 1:
+        gen_insn3(cmpltsd, Vq, Vq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 2:
+        gen_insn3(cmplesd, Vq, Vq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 3:
+        gen_insn3(cmpunordsd, Vq, Vq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 4:
+        gen_insn3(cmpneqsd, Vq, Vq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 5:
+        gen_insn3(cmpnltsd, Vq, Vq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 6:
+        gen_insn3(cmpnlesd, Vq, Vq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 7:
+        gen_insn3(cmpordsd, Vq, Vq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    }
+
+    g_assert_not_reached();
+}
+
 DEF_GEN_INSN2_HELPER_EPP(comiss, comiss, Vd, Wd)
+DEF_GEN_INSN2_HELPER_EPP(comisd, comisd, Vq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(ucomiss, ucomiss, Vd, Wd)
+DEF_GEN_INSN2_HELPER_EPP(ucomisd, ucomisd, Vq, Wq)
 
 DEF_GEN_INSN3_GVEC(pand, Pq, Pq, Qq, and, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(pand, Vdq, Vdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(andps, Vdq, Vdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(andpd, Vdq, Vdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pandn, Pq, Pq, Qq, andn, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(pandn, Vdq, Vdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(andnps, Vdq, Vdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(andnpd, Vdq, Vdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(por, Pq, Pq, Qq, or, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(por, Vdq, Vdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(orps, Vdq, Vdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(orpd, Vdq, Vdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pxor, Pq, Pq, Qq, xor, MM_OPRSZ, MM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(pxor, Vdq, Vdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(xorps, Vdq, Vdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(xorpd, Vdq, Vdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 
 DEF_GEN_INSN3_HELPER_EPP(psllw, psllw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psllw, psllw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pslld, pslld_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pslld, pslld_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psllq, psllq_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psllq, psllq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(pslldq, pslldq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrlw, psrlw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psrlw, psrlw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrld, psrld_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psrld, psrld_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrlq, psrlq_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psrlq, psrlq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(psrldq, psrldq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psraw, psraw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psraw, psraw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrad, psrad_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psrad, psrad_xmm, Vdq, Vdq, Wdq)
 
 #define DEF_GEN_PSHIFT_IMM_MM(mnem, opT1, opT2)                         \
     GEN_INSN3(mnem, opT1, opT2, Ib)                                     \
@@ -5841,31 +6123,70 @@ DEF_GEN_INSN3_HELPER_EPP(psrad, psrad_mmx, Pq, Pq, Qq)
         gen_insn2(movq, Pq, Eq)(env, s, arg3_mm, arg3_r64);             \
         gen_insn3(mnem, Pq, Pq, Qq)(env, s, arg1, arg2, arg3_mm);       \
     }
+#define DEF_GEN_PSHIFT_IMM_XMM(mnem, opT1, opT2)                        \
+    GEN_INSN3(mnem, opT1, opT2, Ib)                                     \
+    {                                                                   \
+        const uint64_t arg3_ui64 = (uint8_t)arg3;                       \
+        const insnop_arg_t(Eq) arg3_r64 = s->tmp1_i64;                  \
+        const insnop_arg_t(Wdq) arg3_xmm =                              \
+            offsetof(CPUX86State, xmm_t0.ZMM_Q(0));                     \
+                                                                        \
+        tcg_gen_movi_i64(arg3_r64, arg3_ui64);                          \
+        gen_insn2(movq, Vdq, Eq)(env, s, arg3_xmm, arg3_r64);           \
+        gen_insn3(mnem, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3_xmm);   \
+    }
 
 DEF_GEN_PSHIFT_IMM_MM(psllw, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_XMM(psllw, Udq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(pslld, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_XMM(pslld, Udq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psllq, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_XMM(psllq, Udq, Udq)
+DEF_GEN_PSHIFT_IMM_XMM(pslldq, Udq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psrlw, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_XMM(psrlw, Udq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psrld, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_XMM(psrld, Udq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psrlq, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_XMM(psrlq, Udq, Udq)
+DEF_GEN_PSHIFT_IMM_XMM(psrldq, Udq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psraw, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_XMM(psraw, Udq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psrad, Nq, Nq)
+DEF_GEN_PSHIFT_IMM_XMM(psrad, Udq, Udq)
 
 DEF_GEN_INSN3_HELPER_EPP(packsswb, packsswb_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(packsswb, packsswb_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(packssdw, packssdw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(packssdw, packssdw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(packuswb, packuswb_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(packuswb, packuswb_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpcklbw, punpcklbw_mmx, Pq, Pq, Qd)
+DEF_GEN_INSN3_HELPER_EPP(punpcklbw, punpcklbw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpcklwd, punpcklwd_mmx, Pq, Pq, Qd)
+DEF_GEN_INSN3_HELPER_EPP(punpcklwd, punpcklwd_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpckldq, punpckldq_mmx, Pq, Pq, Qd)
+DEF_GEN_INSN3_HELPER_EPP(punpckldq, punpckldq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(punpcklqdq, punpcklqdq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhbw, punpckhbw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(punpckhbw, punpckhbw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhwd, punpckhwd_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(punpckhwd, punpckhwd_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhdq, punpckhdq_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(punpckhdq, punpckhdq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(punpckhqdq, punpckhqdq_xmm, Vdq, Vdq, Wdq)
 
 DEF_GEN_INSN3_HELPER_EPP(unpcklps, punpckldq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(unpcklpd, punpcklqdq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(unpckhps, punpckhdq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(unpckhpd, punpckhqdq_xmm, Vdq, Vdq, Wdq)
 
 DEF_GEN_INSN3_HELPER_PPI(pshufw, pshufw_mmx, Pq, Qq, Ib)
+DEF_GEN_INSN3_HELPER_PPI(pshuflw, pshuflw_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_PPI(pshufhw, pshufhw_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_PPI(pshufd, pshufd_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_PPI(shufps, shufps, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_PPI(shufpd, shufpd, Vdq, Vdq, Wdq, Ib)
 
 GEN_INSN4(pinsrw, Pq, Pq, RdMw, Ib)
 {
@@ -5874,6 +6195,13 @@ GEN_INSN4(pinsrw, Pq, Pq, RdMw, Ib)
     const size_t ofs = offsetof(MMXReg, MMX_W(arg4 & 3));
     tcg_gen_st16_i32(arg3, cpu_env, arg1 + ofs);
 }
+GEN_INSN4(pinsrw, Vdq, Vdq, RdMw, Ib)
+{
+    assert(arg1 == arg2);
+
+    const size_t ofs = offsetof(ZMMReg, ZMM_W(arg4 & 7));
+    tcg_gen_st16_i32(arg3, cpu_env, arg1 + ofs);
+}
 
 GEN_INSN3(pextrw, Gd, Nq, Ib)
 {
@@ -5885,16 +6213,46 @@ GEN_INSN3(pextrw, Gq, Nq, Ib)
     const size_t ofs = offsetof(MMXReg, MMX_W(arg3 & 3));
     tcg_gen_ld16u_i64(arg1, cpu_env, arg2 + ofs);
 }
+GEN_INSN3(pextrw, Gd, Udq, Ib)
+{
+    const size_t ofs = offsetof(ZMMReg, ZMM_W(arg3 & 7));
+    tcg_gen_ld16u_i32(arg1, cpu_env, arg2 + ofs);
+}
+GEN_INSN3(pextrw, Gq, Udq, Ib)
+{
+    const size_t ofs = offsetof(ZMMReg, ZMM_W(arg3 & 7));
+    tcg_gen_ld16u_i64(arg1, cpu_env, arg2 + ofs);
+}
 
 DEF_GEN_INSN2_HELPER_EPP(cvtpi2ps, cvtpi2ps, Vdq, Qq)
 DEF_GEN_INSN2_HELPER_EPD(cvtsi2ss, cvtsi2ss, Vd, Ed)
 DEF_GEN_INSN2_HELPER_EPQ(cvtsi2ss, cvtsq2ss, Vd, Eq)
+DEF_GEN_INSN2_HELPER_EPP(cvtpi2pd, cvtpi2pd, Vdq, Qq)
+DEF_GEN_INSN2_HELPER_EPD(cvtsi2sd, cvtsi2sd, Vq, Ed)
+DEF_GEN_INSN2_HELPER_EPQ(cvtsi2sd, cvtsq2sd, Vq, Eq)
 DEF_GEN_INSN2_HELPER_EPP(cvtps2pi, cvtps2pi, Pq, Wq)
 DEF_GEN_INSN2_HELPER_DEP(cvtss2si, cvtss2si, Gd, Wd)
 DEF_GEN_INSN2_HELPER_QEP(cvtss2si, cvtss2sq, Gq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(cvtpd2pi, cvtpd2pi, Pq, Wdq)
+DEF_GEN_INSN2_HELPER_DEP(cvtsd2si, cvtsd2si, Gd, Wq)
+DEF_GEN_INSN2_HELPER_QEP(cvtsd2si, cvtsd2sq, Gq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(cvttps2pi, cvttps2pi, Pq, Wq)
 DEF_GEN_INSN2_HELPER_DEP(cvttss2si, cvttss2si, Gd, Wd)
 DEF_GEN_INSN2_HELPER_QEP(cvttss2si, cvttss2sq, Gq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(cvttpd2pi, cvttpd2pi, Pq, Wdq)
+DEF_GEN_INSN2_HELPER_DEP(cvttsd2si, cvttsd2si, Gd, Wq)
+DEF_GEN_INSN2_HELPER_QEP(cvttsd2si, cvttsd2sq, Gq, Wq)
+
+DEF_GEN_INSN2_HELPER_EPP(cvtpd2dq, cvtpd2dq, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(cvttpd2dq, cvttpd2dq, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(cvtdq2pd, cvtdq2pd, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(cvtps2pd, cvtps2pd, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(cvtpd2ps, cvtpd2ps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(cvtss2sd, cvtss2sd, Vq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(cvtsd2ss, cvtsd2ss, Vd, Wq)
+DEF_GEN_INSN2_HELPER_EPP(cvtdq2ps, cvtdq2ps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(cvtps2dq, cvtps2dq, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(cvttps2dq, cvttps2dq, Vdq, Wdq)
 
 GEN_INSN2(maskmovq, Pq, Nq)
 {
@@ -5912,25 +6270,90 @@ GEN_INSN2(maskmovq, Pq, Nq)
     tcg_temp_free_ptr(arg1_ptr);
     tcg_temp_free_ptr(arg2_ptr);
 }
+GEN_INSN2(maskmovdqu, Vdq, Udq)
+{
+    const TCGv_ptr arg1_ptr = tcg_temp_new_ptr();
+    const TCGv_ptr arg2_ptr = tcg_temp_new_ptr();
+
+    tcg_gen_mov_tl(s->A0, cpu_regs[R_EDI]);
+    gen_extu(s->aflag, s->A0);
+    gen_add_A0_ds_seg(s);
+
+    tcg_gen_addi_ptr(arg1_ptr, cpu_env, arg1);
+    tcg_gen_addi_ptr(arg2_ptr, cpu_env, arg2);
+    gen_helper_maskmov_xmm(cpu_env, arg1_ptr, arg2_ptr, s->A0);
+
+    tcg_temp_free_ptr(arg1_ptr);
+    tcg_temp_free_ptr(arg2_ptr);
+}
 
 GEN_INSN2(movntps, Mdq, Vdq)
 {
     insnop_ldst(xmm, Mdq)(env, s, 1, arg2, arg1);
 }
+GEN_INSN2(movntpd, Mdq, Vdq)
+{
+    insnop_ldst(xmm, Mdq)(env, s, 1, arg2, arg1);
+}
+
+GEN_INSN2(movnti, Md, Gd)
+{
+    insnop_ldst(tcg_i32, Md)(env, s, 1, arg2, arg1);
+}
+GEN_INSN2(movnti, Mq, Gq)
+{
+    insnop_ldst(tcg_i64, Mq)(env, s, 1, arg2, arg1);
+}
 
 GEN_INSN2(movntq, Mq, Pq)
 {
     insnop_ldst(mm, Mq)(env, s, 1, arg2, arg1);
 }
+GEN_INSN2(movntdq, Mdq, Vdq)
+{
+    insnop_ldst(xmm, Mdq)(env, s, 1, arg2, arg1);
+}
+
+GEN_INSN0(pause)
+{
+    /* handled in disas_insn at the moment */
+    g_assert_not_reached();
+}
 
 DEF_GEN_INSN0_HELPER(emms, emms)
 
-GEN_INSN0(sfence)
+GEN_INSN2(sfence_clflush, modrm_mod, modrm)
+{
+    if (arg1 == 3) {
+        /* sfence */
+        if (s->prefix & PREFIX_LOCK) {
+            gen_illegal_opcode(s);
+        } else {
+            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
+        }
+    } else {
+        /* clflush */
+        if (!check_cpuid(env, s, CHECK_CPUID_CLFLUSH)) {
+            gen_illegal_opcode(s);
+        } else {
+            gen_nop_modrm(env, s, arg2);
+        }
+    }
+}
+GEN_INSN0(lfence)
+{
+    if (s->prefix & PREFIX_LOCK) {
+        gen_illegal_opcode(s);
+    } else {
+        tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
+    }
+}
+GEN_INSN0(mfence)
 {
     if (s->prefix & PREFIX_LOCK) {
         gen_illegal_opcode(s);
     } else {
-        tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
+        tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
     }
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 44/75] target/i386: introduce SSE2 vector instructions to sse-opcode.inc.h
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (42 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 43/75] target/i386: introduce SSE2 code generators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 45/75] target/i386: introduce SSE3 translators Jan Bobek
                   ` (29 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add all the SSE2 vector instruction entries to sse-opcode.inc.h.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/sse-opcode.inc.h | 326 ++++++++++++++++++++++++++++++++++-
 1 file changed, 325 insertions(+), 1 deletion(-)

diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
index 92581703cd..6df5fda010 100644
--- a/target/i386/sse-opcode.inc.h
+++ b/target/i386/sse-opcode.inc.h
@@ -183,127 +183,434 @@
  * 0F 18 /2             PREFETCHT1 m8
  * 0F 18 /3             PREFETCHT2 m8
  * 0F 18 /0             PREFETCHNTA m8
+ *
+ * SSE2 Instructions
+ * ------------------
+ * 66 0F 6E /r          MOVD xmm,r/m32
+ * 66 0F 7E /r          MOVD r/m32,xmm
+ * 66 REX.W 0F 6E /r    MOVQ xmm,r/m64
+ * 66 REX.W 0F 7E /r    MOVQ r/m64,xmm
+ * F3 0F 7E /r          MOVQ xmm1, xmm2/m64
+ * 66 0F D6 /r          MOVQ xmm2/m64, xmm1
+ * 66 0F 28 /r          MOVAPD xmm1, xmm2/m128
+ * 66 0F 29 /r          MOVAPD xmm2/m128, xmm1
+ * 66 0F 6F /r          MOVDQA xmm1, xmm2/m128
+ * 66 0F 7F /r          MOVDQA xmm2/m128, xmm1
+ * 66 0F 10 /r          MOVUPD xmm1, xmm2/m128
+ * 66 0F 11 /r          MOVUPD xmm2/m128, xmm1
+ * F3 0F 6F /r          MOVDQU xmm1,xmm2/m128
+ * F3 0F 7F /r          MOVDQU xmm2/m128,xmm1
+ * F2 0F 10 /r          MOVSD xmm1, xmm2/m64
+ * F2 0F 11 /r          MOVSD xmm1/m64, xmm2
+ * F3 0F D6 /r          MOVQ2DQ xmm, mm
+ * F2 0F D6 /r          MOVDQ2Q mm, xmm
+ * 66 0F 12 /r          MOVLPD xmm1,m64
+ * 66 0F 13 /r          MOVLPD m64,xmm1
+ * 66 0F 16 /r          MOVHPD xmm1, m64
+ * 66 0F 17 /r          MOVHPD m64, xmm1
+ * 66 0F D7 /r          PMOVMSKB r32, xmm
+ * 66 REX.W 0F D7 /r    PMOVMSKB r64, xmm
+ * 66 0F 50 /r          MOVMSKPD r32, xmm
+ * 66 REX.W 0F 50 /r    MOVMSKPD r64, xmm
+ * 66 0F FC /r          PADDB xmm1, xmm2/m128
+ * 66 0F FD /r          PADDW xmm1, xmm2/m128
+ * 66 0F FE /r          PADDD xmm1, xmm2/m128
+ * NP 0F D4 /r          PADDQ mm, mm/m64
+ * 66 0F D4 /r          PADDQ xmm1, xmm2/m128
+ * 66 0F EC /r          PADDSB xmm1, xmm2/m128
+ * 66 0F ED /r          PADDSW xmm1, xmm2/m128
+ * 66 0F DC /r          PADDUSB xmm1,xmm2/m128
+ * 66 0F DD /r          PADDUSW xmm1,xmm2/m128
+ * 66 0F 58 /r          ADDPD xmm1, xmm2/m128
+ * F2 0F 58 /r          ADDSD xmm1, xmm2/m64
+ * 66 0F F8 /r          PSUBB xmm1, xmm2/m128
+ * 66 0F F9 /r          PSUBW xmm1, xmm2/m128
+ * 66 0F FA /r          PSUBD xmm1, xmm2/m128
+ * NP 0F FB /r          PSUBQ mm1, mm2/m64
+ * 66 0F FB /r          PSUBQ xmm1, xmm2/m128
+ * 66 0F E8 /r          PSUBSB xmm1, xmm2/m128
+ * 66 0F E9 /r          PSUBSW xmm1, xmm2/m128
+ * 66 0F D8 /r          PSUBUSB xmm1, xmm2/m128
+ * 66 0F D9 /r          PSUBUSW xmm1, xmm2/m128
+ * 66 0F 5C /r          SUBPD xmm1, xmm2/m128
+ * F2 0F 5C /r          SUBSD xmm1, xmm2/m64
+ * 66 0F D5 /r          PMULLW xmm1, xmm2/m128
+ * 66 0F E5 /r          PMULHW xmm1, xmm2/m128
+ * 66 0F E4 /r          PMULHUW xmm1, xmm2/m128
+ * NP 0F F4 /r          PMULUDQ mm1, mm2/m64
+ * 66 0F F4 /r          PMULUDQ xmm1, xmm2/m128
+ * 66 0F 59 /r          MULPD xmm1, xmm2/m128
+ * F2 0F 59 /r          MULSD xmm1,xmm2/m64
+ * 66 0F F5 /r          PMADDWD xmm1, xmm2/m128
+ * 66 0F 5E /r          DIVPD xmm1, xmm2/m128
+ * F2 0F 5E /r          DIVSD xmm1, xmm2/m64
+ * 66 0F 51 /r          SQRTPD xmm1, xmm2/m128
+ * F2 0F 51 /r          SQRTSD xmm1,xmm2/m64
+ * 66 0F DA /r          PMINUB xmm1, xmm2/m128
+ * 66 0F EA /r          PMINSW xmm1, xmm2/m128
+ * 66 0F 5D /r          MINPD xmm1, xmm2/m128
+ * F2 0F 5D /r          MINSD xmm1, xmm2/m64
+ * 66 0F DE /r          PMAXUB xmm1, xmm2/m128
+ * 66 0F EE /r          PMAXSW xmm1, xmm2/m128
+ * 66 0F 5F /r          MAXPD xmm1, xmm2/m128
+ * F2 0F 5F /r          MAXSD xmm1, xmm2/m64
+ * 66 0F E0 /r          PAVGB xmm1, xmm2/m128
+ * 66 0F E3 /r          PAVGW xmm1, xmm2/m128
+ * 66 0F F6 /r          PSADBW xmm1, xmm2/m128
+ * 66 0F 74 /r          PCMPEQB xmm1,xmm2/m128
+ * 66 0F 75 /r          PCMPEQW xmm1,xmm2/m128
+ * 66 0F 76 /r          PCMPEQD xmm1,xmm2/m128
+ * 66 0F 64 /r          PCMPGTB xmm1,xmm2/m128
+ * 66 0F 65 /r          PCMPGTW xmm1,xmm2/m128
+ * 66 0F 66 /r          PCMPGTD xmm1,xmm2/m128
+ * 66 0F C2 /r ib       CMPPD xmm1, xmm2/m128, imm8
+ * F2 0F C2 /r ib       CMPSD xmm1, xmm2/m64, imm8
+ * 66 0F 2E /r          UCOMISD xmm1, xmm2/m64
+ * 66 0F 2F /r          COMISD xmm1, xmm2/m64
+ * 66 0F DB /r          PAND xmm1, xmm2/m128
+ * 66 0F 54 /r          ANDPD xmm1, xmm2/m128
+ * 66 0F DF /r          PANDN xmm1, xmm2/m128
+ * 66 0F 55 /r          ANDNPD xmm1, xmm2/m128
+ * 66 0F EB /r          POR xmm1, xmm2/m128
+ * 66 0F 56 /r          ORPD xmm1, xmm2/m128
+ * 66 0F EF /r          PXOR xmm1, xmm2/m128
+ * 66 0F 57 /r          XORPD xmm1, xmm2/m128
+ * 66 0F F1 /r          PSLLW xmm1, xmm2/m128
+ * 66 0F F2 /r          PSLLD xmm1, xmm2/m128
+ * 66 0F F3 /r          PSLLQ xmm1, xmm2/m128
+ * 66 0F D1 /r          PSRLW xmm1, xmm2/m128
+ * 66 0F D2 /r          PSRLD xmm1, xmm2/m128
+ * 66 0F D3 /r          PSRLQ xmm1, xmm2/m128
+ * 66 0F E1 /r          PSRAW xmm1,xmm2/m128
+ * 66 0F E2 /r          PSRAD xmm1,xmm2/m128
+ * 66 0F 63 /r          PACKSSWB xmm1, xmm2/m128
+ * 66 0F 6B /r          PACKSSDW xmm1, xmm2/m128
+ * 66 0F 67 /r          PACKUSWB xmm1, xmm2/m128
+ * 66 0F 68 /r          PUNPCKHBW xmm1, xmm2/m128
+ * 66 0F 69 /r          PUNPCKHWD xmm1, xmm2/m128
+ * 66 0F 6A /r          PUNPCKHDQ xmm1, xmm2/m128
+ * 66 0F 6D /r          PUNPCKHQDQ xmm1, xmm2/m128
+ * 66 0F 60 /r          PUNPCKLBW xmm1, xmm2/m128
+ * 66 0F 61 /r          PUNPCKLWD xmm1, xmm2/m128
+ * 66 0F 62 /r          PUNPCKLDQ xmm1, xmm2/m128
+ * 66 0F 6C /r          PUNPCKLQDQ xmm1, xmm2/m128
+ * 66 0F 14 /r          UNPCKLPD xmm1, xmm2/m128
+ * 66 0F 15 /r          UNPCKHPD xmm1, xmm2/m128
+ * F2 0F 70 /r ib       PSHUFLW xmm1, xmm2/m128, imm8
+ * F3 0F 70 /r ib       PSHUFHW xmm1, xmm2/m128, imm8
+ * 66 0F 70 /r ib       PSHUFD xmm1, xmm2/m128, imm8
+ * 66 0F C6 /r ib       SHUFPD xmm1, xmm2/m128, imm8
+ * 66 0F C4 /r ib       PINSRW xmm, r32/m16, imm8
+ * 66 0F C5 /r ib       PEXTRW r32, xmm, imm8
+ * 66 REX.W 0F C5 /r ib PEXTRW r64, xmm, imm8
+ * 66 0F 2A /r          CVTPI2PD xmm, mm/m64
+ * F2 0F 2A /r          CVTSI2SD xmm1,r32/m32
+ * F2 REX.W 0F 2A /r    CVTSI2SD xmm1,r/m64
+ * 66 0F 2D /r          CVTPD2PI mm, xmm/m128
+ * F2 0F 2D /r          CVTSD2SI r32,xmm1/m64
+ * F2 REX.W 0F 2D /r    CVTSD2SI r64,xmm1/m64
+ * 66 0F 2C /r          CVTTPD2PI mm, xmm/m128
+ * F2 0F 2C /r          CVTTSD2SI r32,xmm1/m64
+ * F2 REX.W 0F 2C /r    CVTTSD2SI r64,xmm1/m64
+ * F2 0F E6 /r          CVTPD2DQ xmm1, xmm2/m128
+ * 66 0F E6 /r          CVTTPD2DQ xmm1, xmm2/m128
+ * F3 0F E6 /r          CVTDQ2PD xmm1, xmm2/m64
+ * NP 0F 5A /r          CVTPS2PD xmm1, xmm2/m64
+ * 66 0F 5A /r          CVTPD2PS xmm1, xmm2/m128
+ * F3 0F 5A /r          CVTSS2SD xmm1, xmm2/m32
+ * F2 0F 5A /r          CVTSD2SS xmm1, xmm2/m64
+ * NP 0F 5B /r          CVTDQ2PS xmm1, xmm2/m128
+ * 66 0F 5B /r          CVTPS2DQ xmm1, xmm2/m128
+ * F3 0F 5B /r          CVTTPS2DQ xmm1, xmm2/m128
+ * 66 0F F7 /r          MASKMOVDQU xmm1, xmm2
+ * 66 0F 2B /r          MOVNTPD m128, xmm1
+ * NP 0F C3 /r          MOVNTI m32, r32
+ * NP REX.W + 0F C3 /r  MOVNTI m64, r64
+ * 66 0F E7 /r          MOVNTDQ m128, xmm1
+ * F3 90                PAUSE
+ * 66 0F 71 /6 ib       PSLLW xmm1, imm8
+ * 66 0F 71 /2 ib       PSRLW xmm1, imm8
+ * 66 0F 71 /4 ib       PSRAW xmm1,imm8
+ * 66 0F 72 /6 ib       PSLLD xmm1, imm8
+ * 66 0F 72 /2 ib       PSRLD xmm1, imm8
+ * 66 0F 72 /4 ib       PSRAD xmm1,imm8
+ * 66 0F 73 /6 ib       PSLLQ xmm1, imm8
+ * 66 0F 73 /7 ib       PSLLDQ xmm1, imm8
+ * 66 0F 73 /2 ib       PSRLQ xmm1, imm8
+ * 66 0F 73 /3 ib       PSRLDQ xmm1, imm8
+ * NP 0F AE /7          CLFLUSH m8
+ * NP 0F AE /5          LFENCE
+ * NP 0F AE /6          MFENCE
  */
 
 OPCODE(movd, LEG(NP, 0F, 0, 0x6e), MMX, WR, Pq, Ed)
 OPCODE(movd, LEG(NP, 0F, 0, 0x7e), MMX, WR, Ed, Pq)
+OPCODE(movd, LEG(66, 0F, 0, 0x6e), SSE2, WR, Vdq, Ed)
+OPCODE(movd, LEG(66, 0F, 0, 0x7e), SSE2, WR, Ed, Vdq)
 OPCODE(movq, LEG(NP, 0F, 1, 0x6e), MMX, WR, Pq, Eq)
 OPCODE(movq, LEG(NP, 0F, 1, 0x7e), MMX, WR, Eq, Pq)
+OPCODE(movq, LEG(66, 0F, 1, 0x6e), SSE2, WR, Vdq, Eq)
+OPCODE(movq, LEG(66, 0F, 1, 0x7e), SSE2, WR, Eq, Vdq)
 OPCODE(movq, LEG(NP, 0F, 0, 0x6f), MMX, WR, Pq, Qq)
 OPCODE(movq, LEG(NP, 0F, 0, 0x7f), MMX, WR, Qq, Pq)
+OPCODE(movq, LEG(F3, 0F, 0, 0x7e), SSE2, WR, Vdq, Wq)
+OPCODE(movq, LEG(66, 0F, 0, 0xd6), SSE2, WR, UdqMq, Vq)
 OPCODE(movaps, LEG(NP, 0F, 0, 0x28), SSE, WR, Vdq, Wdq)
 OPCODE(movaps, LEG(NP, 0F, 0, 0x29), SSE, WR, Wdq, Vdq)
+OPCODE(movapd, LEG(66, 0F, 0, 0x28), SSE2, WR, Vdq, Wdq)
+OPCODE(movapd, LEG(66, 0F, 0, 0x29), SSE2, WR, Wdq, Vdq)
+OPCODE(movdqa, LEG(66, 0F, 0, 0x6f), SSE2, WR, Vdq, Wdq)
+OPCODE(movdqa, LEG(66, 0F, 0, 0x7f), SSE2, WR, Wdq, Vdq)
 OPCODE(movups, LEG(NP, 0F, 0, 0x10), SSE, WR, Vdq, Wdq)
 OPCODE(movups, LEG(NP, 0F, 0, 0x11), SSE, WR, Wdq, Vdq)
+OPCODE(movupd, LEG(66, 0F, 0, 0x10), SSE2, WR, Vdq, Wdq)
+OPCODE(movupd, LEG(66, 0F, 0, 0x11), SSE2, WR, Wdq, Vdq)
+OPCODE(movdqu, LEG(F3, 0F, 0, 0x6f), SSE2, WR, Vdq, Wdq)
+OPCODE(movdqu, LEG(F3, 0F, 0, 0x7f), SSE2, WR, Wdq, Vdq)
 OPCODE(movss, LEG(F3, 0F, 0, 0x10), SSE, WRRR, Vdq, Vdq, Wd, modrm_mod)
 OPCODE(movss, LEG(F3, 0F, 0, 0x11), SSE, WR, Wd, Vd)
+OPCODE(movsd, LEG(F2, 0F, 0, 0x10), SSE2, WRRR, Vdq, Vdq, Wq, modrm_mod)
+OPCODE(movsd, LEG(F2, 0F, 0, 0x11), SSE2, WR, Wq, Vq)
+OPCODE(movq2dq, LEG(F3, 0F, 0, 0xd6), SSE2, WR, Vdq, Nq)
+OPCODE(movdq2q, LEG(F2, 0F, 0, 0xd6), SSE2, WR, Pq, Uq)
 OPCODE(movhlps, LEG(NP, 0F, 0, 0x12), SSE, WRR, Vdq, Vdq, UdqMhq)
 OPCODE(movlps, LEG(NP, 0F, 0, 0x13), SSE, WR, Mq, Vq)
+OPCODE(movlpd, LEG(66, 0F, 0, 0x12), SSE2, WRR, Vdq, Vdq, Mq)
+OPCODE(movlpd, LEG(66, 0F, 0, 0x13), SSE2, WR, Mq, Vq)
 OPCODE(movlhps, LEG(NP, 0F, 0, 0x16), SSE, WRR, Vdq, Vq, Wq)
 OPCODE(movhps, LEG(NP, 0F, 0, 0x17), SSE, WR, Mq, Vdq)
+OPCODE(movhpd, LEG(66, 0F, 0, 0x16), SSE2, WRR, Vdq, Vq, Mq)
+OPCODE(movhpd, LEG(66, 0F, 0, 0x17), SSE2, WR, Mq, Vdq)
 OPCODE(pmovmskb, LEG(NP, 0F, 0, 0xd7), SSE, WR, Gd, Nq)
 OPCODE(pmovmskb, LEG(NP, 0F, 1, 0xd7), SSE, WR, Gq, Nq)
+OPCODE(pmovmskb, LEG(66, 0F, 0, 0xd7), SSE2, WR, Gd, Udq)
+OPCODE(pmovmskb, LEG(66, 0F, 1, 0xd7), SSE2, WR, Gq, Udq)
 OPCODE(movmskps, LEG(NP, 0F, 0, 0x50), SSE, WR, Gd, Udq)
 OPCODE(movmskps, LEG(NP, 0F, 1, 0x50), SSE, WR, Gq, Udq)
+OPCODE(movmskpd, LEG(66, 0F, 0, 0x50), SSE2, WR, Gd, Udq)
+OPCODE(movmskpd, LEG(66, 0F, 1, 0x50), SSE2, WR, Gq, Udq)
 OPCODE(paddb, LEG(NP, 0F, 0, 0xfc), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddb, LEG(66, 0F, 0, 0xfc), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(paddw, LEG(NP, 0F, 0, 0xfd), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddw, LEG(66, 0F, 0, 0xfd), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(paddd, LEG(NP, 0F, 0, 0xfe), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddd, LEG(66, 0F, 0, 0xfe), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(paddq, LEG(NP, 0F, 0, 0xd4), SSE2, WRR, Pq, Pq, Qq)
+OPCODE(paddq, LEG(66, 0F, 0, 0xd4), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(paddsb, LEG(NP, 0F, 0, 0xec), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddsb, LEG(66, 0F, 0, 0xec), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(paddsw, LEG(NP, 0F, 0, 0xed), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddsw, LEG(66, 0F, 0, 0xed), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(paddusb, LEG(NP, 0F, 0, 0xdc), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddusb, LEG(66, 0F, 0, 0xdc), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(paddusw, LEG(NP, 0F, 0, 0xdd), MMX, WRR, Pq, Pq, Qq)
+OPCODE(paddusw, LEG(66, 0F, 0, 0xdd), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(addps, LEG(NP, 0F, 0, 0x58), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(addpd, LEG(66, 0F, 0, 0x58), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(addss, LEG(F3, 0F, 0, 0x58), SSE, WRR, Vd, Vd, Wd)
+OPCODE(addsd, LEG(F2, 0F, 0, 0x58), SSE2, WRR, Vq, Vq, Wq)
 OPCODE(psubb, LEG(NP, 0F, 0, 0xf8), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubb, LEG(66, 0F, 0, 0xf8), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psubw, LEG(NP, 0F, 0, 0xf9), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubw, LEG(66, 0F, 0, 0xf9), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psubd, LEG(NP, 0F, 0, 0xfa), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubd, LEG(66, 0F, 0, 0xfa), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(psubq, LEG(NP, 0F, 0, 0xfb), SSE2, WRR, Pq, Pq, Qq)
+OPCODE(psubq, LEG(66, 0F, 0, 0xfb), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psubsb, LEG(NP, 0F, 0, 0xe8), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubsb, LEG(66, 0F, 0, 0xe8), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psubsw, LEG(NP, 0F, 0, 0xe9), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubsw, LEG(66, 0F, 0, 0xe9), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psubusb, LEG(NP, 0F, 0, 0xd8), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubusb, LEG(66, 0F, 0, 0xd8), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psubusw, LEG(NP, 0F, 0, 0xd9), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psubusw, LEG(66, 0F, 0, 0xd9), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(subps, LEG(NP, 0F, 0, 0x5c), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(subpd, LEG(66, 0F, 0, 0x5c), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(subss, LEG(F3, 0F, 0, 0x5c), SSE, WRR, Vd, Vd, Wd)
+OPCODE(subsd, LEG(F2, 0F, 0, 0x5c), SSE2, WRR, Vq, Vq, Wq)
 OPCODE(pmullw, LEG(NP, 0F, 0, 0xd5), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pmullw, LEG(66, 0F, 0, 0xd5), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pmulhw, LEG(NP, 0F, 0, 0xe5), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pmulhw, LEG(66, 0F, 0, 0xe5), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pmulhuw, LEG(NP, 0F, 0, 0xe4), SSE, WRR, Pq, Pq, Qq)
+OPCODE(pmulhuw, LEG(66, 0F, 0, 0xe4), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pmuludq, LEG(NP, 0F, 0, 0xf4), SSE2, WRR, Pq, Pq, Qq)
+OPCODE(pmuludq, LEG(66, 0F, 0, 0xf4), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(mulps, LEG(NP, 0F, 0, 0x59), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(mulpd, LEG(66, 0F, 0, 0x59), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(mulss, LEG(F3, 0F, 0, 0x59), SSE, WRR, Vd, Vd, Wd)
+OPCODE(mulsd, LEG(F2, 0F, 0, 0x59), SSE2, WRR, Vq, Vq, Wq)
 OPCODE(pmaddwd, LEG(NP, 0F, 0, 0xf5), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pmaddwd, LEG(66, 0F, 0, 0xf5), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(divps, LEG(NP, 0F, 0, 0x5e), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(divpd, LEG(66, 0F, 0, 0x5e), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(divss, LEG(F3, 0F, 0, 0x5e), SSE, WRR, Vd, Vd, Wd)
+OPCODE(divsd, LEG(F2, 0F, 0, 0x5e), SSE2, WRR, Vq, Vq, Wq)
 OPCODE(rcpps, LEG(NP, 0F, 0, 0x53), SSE, WR, Vdq, Wdq)
 OPCODE(rcpss, LEG(F3, 0F, 0, 0x53), SSE, WR, Vd, Wd)
 OPCODE(sqrtps, LEG(NP, 0F, 0, 0x51), SSE, WR, Vdq, Wdq)
+OPCODE(sqrtpd, LEG(66, 0F, 0, 0x51), SSE2, WR, Vdq, Wdq)
 OPCODE(sqrtss, LEG(F3, 0F, 0, 0x51), SSE, WR, Vd, Wd)
+OPCODE(sqrtsd, LEG(F2, 0F, 0, 0x51), SSE2, WR, Vq, Wq)
 OPCODE(rsqrtps, LEG(NP, 0F, 0, 0x52), SSE, WR, Vdq, Wdq)
 OPCODE(rsqrtss, LEG(F3, 0F, 0, 0x52), SSE, WR, Vd, Wd)
 OPCODE(pminub, LEG(NP, 0F, 0, 0xda), SSE, WRR, Pq, Pq, Qq)
+OPCODE(pminub, LEG(66, 0F, 0, 0xda), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pminsw, LEG(NP, 0F, 0, 0xea), SSE, WRR, Pq, Pq, Qq)
+OPCODE(pminsw, LEG(66, 0F, 0, 0xea), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(minps, LEG(NP, 0F, 0, 0x5d), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(minpd, LEG(66, 0F, 0, 0x5d), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(minss, LEG(F3, 0F, 0, 0x5d), SSE, WRR, Vd, Vd, Wd)
+OPCODE(minsd, LEG(F2, 0F, 0, 0x5d), SSE2, WRR, Vq, Vq, Wq)
 OPCODE(pmaxub, LEG(NP, 0F, 0, 0xde), SSE, WRR, Pq, Pq, Qq)
+OPCODE(pmaxub, LEG(66, 0F, 0, 0xde), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pmaxsw, LEG(NP, 0F, 0, 0xee), SSE, WRR, Pq, Pq, Qq)
+OPCODE(pmaxsw, LEG(66, 0F, 0, 0xee), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(maxps, LEG(NP, 0F, 0, 0x5f), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(maxpd, LEG(66, 0F, 0, 0x5f), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(maxss, LEG(F3, 0F, 0, 0x5f), SSE, WRR, Vd, Vd, Wd)
+OPCODE(maxsd, LEG(F2, 0F, 0, 0x5f), SSE2, WRR, Vq, Vq, Wq)
 OPCODE(pavgb, LEG(NP, 0F, 0, 0xe0), SSE, WRR, Pq, Pq, Qq)
+OPCODE(pavgb, LEG(66, 0F, 0, 0xe0), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pavgw, LEG(NP, 0F, 0, 0xe3), SSE, WRR, Pq, Pq, Qq)
+OPCODE(pavgw, LEG(66, 0F, 0, 0xe3), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psadbw, LEG(NP, 0F, 0, 0xf6), SSE, WRR, Pq, Pq, Qq)
+OPCODE(psadbw, LEG(66, 0F, 0, 0xf6), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpeqb, LEG(NP, 0F, 0, 0x74), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pcmpeqb, LEG(66, 0F, 0, 0x74), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpeqw, LEG(NP, 0F, 0, 0x75), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pcmpeqw, LEG(66, 0F, 0, 0x75), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpeqd, LEG(NP, 0F, 0, 0x76), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pcmpeqd, LEG(66, 0F, 0, 0x76), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpgtb, LEG(NP, 0F, 0, 0x64), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pcmpgtb, LEG(66, 0F, 0, 0x64), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpgtw, LEG(NP, 0F, 0, 0x65), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pcmpgtw, LEG(66, 0F, 0, 0x65), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpgtd, LEG(NP, 0F, 0, 0x66), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pcmpgtd, LEG(66, 0F, 0, 0x66), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(cmpps, LEG(NP, 0F, 0, 0xc2), SSE, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(cmppd, LEG(66, 0F, 0, 0xc2), SSE2, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(cmpss, LEG(F3, 0F, 0, 0xc2), SSE, WRRR, Vd, Vd, Wd, Ib)
+OPCODE(cmpsd, LEG(F2, 0F, 0, 0xc2), SSE2, WRRR, Vq, Vq, Wq, Ib)
 OPCODE(ucomiss, LEG(NP, 0F, 0, 0x2e), SSE, RR, Vd, Wd)
+OPCODE(ucomisd, LEG(66, 0F, 0, 0x2e), SSE2, RR, Vq, Wq)
 OPCODE(comiss, LEG(NP, 0F, 0, 0x2f), SSE, RR, Vd, Wd)
+OPCODE(comisd, LEG(66, 0F, 0, 0x2f), SSE2, RR, Vq, Wq)
 OPCODE(pand, LEG(NP, 0F, 0, 0xdb), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pand, LEG(66, 0F, 0, 0xdb), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(andps, LEG(NP, 0F, 0, 0x54), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(andpd, LEG(66, 0F, 0, 0x54), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pandn, LEG(NP, 0F, 0, 0xdf), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pandn, LEG(66, 0F, 0, 0xdf), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(andnps, LEG(NP, 0F, 0, 0x55), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(andnpd, LEG(66, 0F, 0, 0x55), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(por, LEG(NP, 0F, 0, 0xeb), MMX, WRR, Pq, Pq, Qq)
+OPCODE(por, LEG(66, 0F, 0, 0xeb), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(orps, LEG(NP, 0F, 0, 0x56), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(orpd, LEG(66, 0F, 0, 0x56), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pxor, LEG(NP, 0F, 0, 0xef), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pxor, LEG(66, 0F, 0, 0xef), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(xorps, LEG(NP, 0F, 0, 0x57), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(xorpd, LEG(66, 0F, 0, 0x57), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psllw, LEG(NP, 0F, 0, 0xf1), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psllw, LEG(66, 0F, 0, 0xf1), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pslld, LEG(NP, 0F, 0, 0xf2), MMX, WRR, Pq, Pq, Qq)
+OPCODE(pslld, LEG(66, 0F, 0, 0xf2), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psllq, LEG(NP, 0F, 0, 0xf3), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psllq, LEG(66, 0F, 0, 0xf3), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psrlw, LEG(NP, 0F, 0, 0xd1), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psrlw, LEG(66, 0F, 0, 0xd1), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psrld, LEG(NP, 0F, 0, 0xd2), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psrld, LEG(66, 0F, 0, 0xd2), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psrlq, LEG(NP, 0F, 0, 0xd3), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psrlq, LEG(66, 0F, 0, 0xd3), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psraw, LEG(NP, 0F, 0, 0xe1), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psraw, LEG(66, 0F, 0, 0xe1), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psrad, LEG(NP, 0F, 0, 0xe2), MMX, WRR, Pq, Pq, Qq)
+OPCODE(psrad, LEG(66, 0F, 0, 0xe2), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(packsswb, LEG(NP, 0F, 0, 0x63), MMX, WRR, Pq, Pq, Qq)
+OPCODE(packsswb, LEG(66, 0F, 0, 0x63), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(packssdw, LEG(NP, 0F, 0, 0x6b), MMX, WRR, Pq, Pq, Qq)
+OPCODE(packssdw, LEG(66, 0F, 0, 0x6b), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(packuswb, LEG(NP, 0F, 0, 0x67), MMX, WRR, Pq, Pq, Qq)
+OPCODE(packuswb, LEG(66, 0F, 0, 0x67), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(punpckhbw, LEG(NP, 0F, 0, 0x68), MMX, WRR, Pq, Pq, Qq)
+OPCODE(punpckhbw, LEG(66, 0F, 0, 0x68), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(punpckhwd, LEG(NP, 0F, 0, 0x69), MMX, WRR, Pq, Pq, Qq)
+OPCODE(punpckhwd, LEG(66, 0F, 0, 0x69), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(punpckhdq, LEG(NP, 0F, 0, 0x6a), MMX, WRR, Pq, Pq, Qq)
+OPCODE(punpckhdq, LEG(66, 0F, 0, 0x6a), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(punpckhqdq, LEG(66, 0F, 0, 0x6d), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(punpcklbw, LEG(NP, 0F, 0, 0x60), MMX, WRR, Pq, Pq, Qd)
+OPCODE(punpcklbw, LEG(66, 0F, 0, 0x60), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(punpcklwd, LEG(NP, 0F, 0, 0x61), MMX, WRR, Pq, Pq, Qd)
+OPCODE(punpcklwd, LEG(66, 0F, 0, 0x61), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(punpckldq, LEG(NP, 0F, 0, 0x62), MMX, WRR, Pq, Pq, Qd)
+OPCODE(punpckldq, LEG(66, 0F, 0, 0x62), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(punpcklqdq, LEG(66, 0F, 0, 0x6c), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(unpcklps, LEG(NP, 0F, 0, 0x14), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(unpcklpd, LEG(66, 0F, 0, 0x14), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(unpckhps, LEG(NP, 0F, 0, 0x15), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(unpckhpd, LEG(66, 0F, 0, 0x15), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pshufw, LEG(NP, 0F, 0, 0x70), SSE, WRR, Pq, Qq, Ib)
+OPCODE(pshuflw, LEG(F2, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
+OPCODE(pshufhw, LEG(F3, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
+OPCODE(pshufd, LEG(66, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
 OPCODE(shufps, LEG(NP, 0F, 0, 0xc6), SSE, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(shufpd, LEG(66, 0F, 0, 0xc6), SSE2, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(pinsrw, LEG(NP, 0F, 0, 0xc4), SSE, WRRR, Pq, Pq, RdMw, Ib)
+OPCODE(pinsrw, LEG(66, 0F, 0, 0xc4), SSE2, WRRR, Vdq, Vdq, RdMw, Ib)
 OPCODE(pextrw, LEG(NP, 0F, 0, 0xc5), SSE, WRR, Gd, Nq, Ib)
 OPCODE(pextrw, LEG(NP, 0F, 1, 0xc5), SSE, WRR, Gq, Nq, Ib)
+OPCODE(pextrw, LEG(66, 0F, 0, 0xc5), SSE2, WRR, Gd, Udq, Ib)
+OPCODE(pextrw, LEG(66, 0F, 1, 0xc5), SSE2, WRR, Gq, Udq, Ib)
 OPCODE(cvtpi2ps, LEG(NP, 0F, 0, 0x2a), SSE, WR, Vdq, Qq)
 OPCODE(cvtsi2ss, LEG(F3, 0F, 0, 0x2a), SSE, WR, Vd, Ed)
 OPCODE(cvtsi2ss, LEG(F3, 0F, 1, 0x2a), SSE, WR, Vd, Eq)
+OPCODE(cvtpi2pd, LEG(66, 0F, 0, 0x2a), SSE2, WR, Vdq, Qq)
+OPCODE(cvtsi2sd, LEG(F2, 0F, 0, 0x2a), SSE2, WR, Vq, Ed)
+OPCODE(cvtsi2sd, LEG(F2, 0F, 1, 0x2a), SSE2, WR, Vq, Eq)
 OPCODE(cvtps2pi, LEG(NP, 0F, 0, 0x2d), SSE, WR, Pq, Wq)
 OPCODE(cvtss2si, LEG(F3, 0F, 0, 0x2d), SSE, WR, Gd, Wd)
 OPCODE(cvtss2si, LEG(F3, 0F, 1, 0x2d), SSE, WR, Gq, Wd)
+OPCODE(cvtpd2pi, LEG(66, 0F, 0, 0x2d), SSE2, WR, Pq, Wdq)
+OPCODE(cvtsd2si, LEG(F2, 0F, 0, 0x2d), SSE2, WR, Gd, Wq)
+OPCODE(cvtsd2si, LEG(F2, 0F, 1, 0x2d), SSE2, WR, Gq, Wq)
 OPCODE(cvttps2pi, LEG(NP, 0F, 0, 0x2c), SSE, WR, Pq, Wq)
 OPCODE(cvttss2si, LEG(F3, 0F, 0, 0x2c), SSE, WR, Gd, Wd)
 OPCODE(cvttss2si, LEG(F3, 0F, 1, 0x2c), SSE, WR, Gq, Wd)
+OPCODE(cvttpd2pi, LEG(66, 0F, 0, 0x2c), SSE2, WR, Pq, Wdq)
+OPCODE(cvttsd2si, LEG(F2, 0F, 0, 0x2c), SSE2, WR, Gd, Wq)
+OPCODE(cvttsd2si, LEG(F2, 0F, 1, 0x2c), SSE2, WR, Gq, Wq)
+OPCODE(cvtpd2dq, LEG(F2, 0F, 0, 0xe6), SSE2, WR, Vdq, Wdq)
+OPCODE(cvttpd2dq, LEG(66, 0F, 0, 0xe6), SSE2, WR, Vdq, Wdq)
+OPCODE(cvtdq2pd, LEG(F3, 0F, 0, 0xe6), SSE2, WR, Vdq, Wq)
+OPCODE(cvtps2pd, LEG(NP, 0F, 0, 0x5a), SSE2, WR, Vdq, Wq)
+OPCODE(cvtpd2ps, LEG(66, 0F, 0, 0x5a), SSE2, WR, Vdq, Wdq)
+OPCODE(cvtss2sd, LEG(F3, 0F, 0, 0x5a), SSE2, WR, Vq, Wd)
+OPCODE(cvtsd2ss, LEG(F2, 0F, 0, 0x5a), SSE2, WR, Vd, Wq)
+OPCODE(cvtdq2ps, LEG(NP, 0F, 0, 0x5b), SSE2, WR, Vdq, Wdq)
+OPCODE(cvtps2dq, LEG(66, 0F, 0, 0x5b), SSE2, WR, Vdq, Wdq)
+OPCODE(cvttps2dq, LEG(F3, 0F, 0, 0x5b), SSE2, WR, Vdq, Wdq)
 OPCODE(maskmovq, LEG(NP, 0F, 0, 0xf7), SSE, RR, Pq, Nq)
+OPCODE(maskmovdqu, LEG(66, 0F, 0, 0xf7), SSE2, RR, Vdq, Udq)
 OPCODE(movntps, LEG(NP, 0F, 0, 0x2b), SSE, WR, Mdq, Vdq)
+OPCODE(movntpd, LEG(66, 0F, 0, 0x2b), SSE2, WR, Mdq, Vdq)
+OPCODE(movnti, LEG(NP, 0F, 0, 0xc3), SSE2, WR, Md, Gd)
+OPCODE(movnti, LEG(NP, 0F, 1, 0xc3), SSE2, WR, Mq, Gq)
 OPCODE(movntq, LEG(NP, 0F, 0, 0xe7), SSE, WR, Mq, Pq)
+OPCODE(movntdq, LEG(66, 0F, 0, 0xe7), SSE2, WR, Mdq, Vdq)
+OPCODE(pause, LEG(F3, NA, 0, 0x90), SSE2, )
 OPCODE(emms, LEG(NP, 0F, 0, 0x77), MMX, )
 
+OPCODE_GRP(grp12_LEG_66, LEG(66, 0F, 0, 0x71))
+OPCODE_GRP_BEGIN(grp12_LEG_66)
+    OPCODE_GRPMEMB(grp12_LEG_66, psllw, 6, SSE2, WRR, Udq, Udq, Ib)
+    OPCODE_GRPMEMB(grp12_LEG_66, psrlw, 2, SSE2, WRR, Udq, Udq, Ib)
+    OPCODE_GRPMEMB(grp12_LEG_66, psraw, 4, SSE2, WRR, Udq, Udq, Ib)
+OPCODE_GRP_END(grp12_LEG_66)
+
 OPCODE_GRP(grp12_LEG_NP, LEG(NP, 0F, 0, 0x71))
 OPCODE_GRP_BEGIN(grp12_LEG_NP)
     OPCODE_GRPMEMB(grp12_LEG_NP, psllw, 6, MMX, WRR, Nq, Nq, Ib)
@@ -311,6 +618,13 @@ OPCODE_GRP_BEGIN(grp12_LEG_NP)
     OPCODE_GRPMEMB(grp12_LEG_NP, psraw, 4, MMX, WRR, Nq, Nq, Ib)
 OPCODE_GRP_END(grp12_LEG_NP)
 
+OPCODE_GRP(grp13_LEG_66, LEG(66, 0F, 0, 0x72))
+OPCODE_GRP_BEGIN(grp13_LEG_66)
+    OPCODE_GRPMEMB(grp13_LEG_66, pslld, 6, SSE2, WRR, Udq, Udq, Ib)
+    OPCODE_GRPMEMB(grp13_LEG_66, psrld, 2, SSE2, WRR, Udq, Udq, Ib)
+    OPCODE_GRPMEMB(grp13_LEG_66, psrad, 4, SSE2, WRR, Udq, Udq, Ib)
+OPCODE_GRP_END(grp13_LEG_66)
+
 OPCODE_GRP(grp13_LEG_NP, LEG(NP, 0F, 0, 0x72))
 OPCODE_GRP_BEGIN(grp13_LEG_NP)
     OPCODE_GRPMEMB(grp13_LEG_NP, pslld, 6, MMX, WRR, Nq, Nq, Ib)
@@ -318,6 +632,14 @@ OPCODE_GRP_BEGIN(grp13_LEG_NP)
     OPCODE_GRPMEMB(grp13_LEG_NP, psrad, 4, MMX, WRR, Nq, Nq, Ib)
 OPCODE_GRP_END(grp13_LEG_NP)
 
+OPCODE_GRP(grp14_LEG_66, LEG(66, 0F, 0, 0x73))
+OPCODE_GRP_BEGIN(grp14_LEG_66)
+    OPCODE_GRPMEMB(grp14_LEG_66, psllq, 6, SSE2, WRR, Udq, Udq, Ib)
+    OPCODE_GRPMEMB(grp14_LEG_66, pslldq, 7, SSE2, WRR, Udq, Udq, Ib)
+    OPCODE_GRPMEMB(grp14_LEG_66, psrlq, 2, SSE2, WRR, Udq, Udq, Ib)
+    OPCODE_GRPMEMB(grp14_LEG_66, psrldq, 3, SSE2, WRR, Udq, Udq, Ib)
+OPCODE_GRP_END(grp14_LEG_66)
+
 OPCODE_GRP(grp14_LEG_NP, LEG(NP, 0F, 0, 0x73))
 OPCODE_GRP_BEGIN(grp14_LEG_NP)
     OPCODE_GRPMEMB(grp14_LEG_NP, psllq, 6, MMX, WRR, Nq, Nq, Ib)
@@ -326,7 +648,9 @@ OPCODE_GRP_END(grp14_LEG_NP)
 
 OPCODE_GRP(grp15_LEG_NP, LEG(NP, 0F, 0, 0xae))
 OPCODE_GRP_BEGIN(grp15_LEG_NP)
-    OPCODE_GRPMEMB(grp15_LEG_NP, sfence, 7, SSE, )
+    OPCODE_GRPMEMB(grp15_LEG_NP, sfence_clflush, 7, SSE, RR, modrm_mod, modrm)
+    OPCODE_GRPMEMB(grp15_LEG_NP, lfence, 5, SSE2, )
+    OPCODE_GRPMEMB(grp15_LEG_NP, mfence, 6, SSE2, )
     OPCODE_GRPMEMB(grp15_LEG_NP, ldmxcsr, 2, SSE, R, Md)
     OPCODE_GRPMEMB(grp15_LEG_NP, stmxcsr, 3, SSE, W, Md)
 OPCODE_GRP_END(grp15_LEG_NP)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 45/75] target/i386: introduce SSE3 translators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (43 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 44/75] target/i386: introduce SSE2 vector instructions to sse-opcode.inc.h Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 46/75] target/i386: introduce SSE3 code generators Jan Bobek
                   ` (28 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Use the translator macros to define translators required by SSE3
instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 3445b4cff1..a478f73c17 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -6519,6 +6519,7 @@ DEF_TRANSLATE_INSN2(Vd, Wd)
 DEF_TRANSLATE_INSN2(Vd, Wq)
 DEF_TRANSLATE_INSN2(Vdq, Ed)
 DEF_TRANSLATE_INSN2(Vdq, Eq)
+DEF_TRANSLATE_INSN2(Vdq, Mdq)
 DEF_TRANSLATE_INSN2(Vdq, Nq)
 DEF_TRANSLATE_INSN2(Vdq, Qq)
 DEF_TRANSLATE_INSN2(Vdq, Udq)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 46/75] target/i386: introduce SSE3 code generators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (44 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 45/75] target/i386: introduce SSE3 translators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 47/75] target/i386: introduce SSE3 vector instructions to sse-opcode.inc.h Jan Bobek
                   ` (27 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Introduce code generators required by SSE3 instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 61 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index a478f73c17..d449a64464 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5802,6 +5802,60 @@ GEN_INSN2(movmskpd, Gq, Udq)
     tcg_temp_free_i32(arg1_r32);
 }
 
+GEN_INSN2(lddqu, Vdq, Mdq)
+{
+    insnop_ldst(xmm, Mdq)(env, s, 0, arg1, arg2);
+}
+
+GEN_INSN2(movshdup, Vdq, Wdq)
+{
+    const TCGv_i32 r32 = tcg_temp_new_i32();
+
+    tcg_gen_ld_i32(r32, cpu_env, arg2 + offsetof(ZMMReg, ZMM_L(1)));
+    tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(0)));
+    if (arg1 != arg2) {
+        tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(1)));
+    }
+
+    tcg_gen_ld_i32(r32, cpu_env, arg2 + offsetof(ZMMReg, ZMM_L(3)));
+    tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(2)));
+    if (arg1 != arg2) {
+        tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(3)));
+    }
+
+    tcg_temp_free_i32(r32);
+}
+GEN_INSN2(movsldup, Vdq, Wdq)
+{
+    const TCGv_i32 r32 = tcg_temp_new_i32();
+
+    tcg_gen_ld_i32(r32, cpu_env, arg2 + offsetof(ZMMReg, ZMM_L(0)));
+    if (arg1 != arg2) {
+        tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(0)));
+    }
+    tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(1)));
+
+    tcg_gen_ld_i32(r32, cpu_env, arg2 + offsetof(ZMMReg, ZMM_L(2)));
+    if (arg1 != arg2) {
+        tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(2)));
+    }
+    tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(3)));
+
+    tcg_temp_free_i32(r32);
+}
+GEN_INSN2(movddup, Vdq, Wq)
+{
+    const TCGv_i64 r64 = tcg_temp_new_i64();
+
+    tcg_gen_ld_i64(r64, cpu_env, arg2 + offsetof(ZMMReg, ZMM_Q(0)));
+    if (arg1 != arg2) {
+        tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(0)));
+    }
+    tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(1)));
+
+    tcg_temp_free_i64(r64);
+}
+
 DEF_GEN_INSN3_GVEC(paddb, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddb, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddw, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_16)
@@ -5822,6 +5876,8 @@ DEF_GEN_INSN3_HELPER_EPP(addps, addps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(addpd, addpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(addss, addss, Vd, Vd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(addsd, addsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(haddps, haddps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(haddpd, haddpd, Vdq, Vdq, Wdq)
 
 DEF_GEN_INSN3_GVEC(psubb, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubb, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
@@ -5843,6 +5899,11 @@ DEF_GEN_INSN3_HELPER_EPP(subps, subps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(subpd, subpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(subss, subss, Vd, Vd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(subsd, subsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(hsubps, hsubps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(hsubpd, hsubpd, Vdq, Vdq, Wdq)
+
+DEF_GEN_INSN3_HELPER_EPP(addsubps, addsubps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(addsubpd, addsubpd, Vdq, Vdq, Wdq)
 
 DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_xmm, Vdq, Vdq, Wdq)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 47/75] target/i386: introduce SSE3 vector instructions to sse-opcode.inc.h
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (45 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 46/75] target/i386: introduce SSE3 code generators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 48/75] target/i386: introduce SSSE3 translators Jan Bobek
                   ` (26 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add all the SSE3 vector instruction entries to sse-opcode.inc.h.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/sse-opcode.inc.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
index 6df5fda010..84785a4e04 100644
--- a/target/i386/sse-opcode.inc.h
+++ b/target/i386/sse-opcode.inc.h
@@ -341,6 +341,19 @@
  * NP 0F AE /7          CLFLUSH m8
  * NP 0F AE /5          LFENCE
  * NP 0F AE /6          MFENCE
+ *
+ * SSE3 Instructions
+ * ------------------
+ * F2 0F F0 /r          LDDQU xmm1, m128
+ * F3 0F 16 /r          MOVSHDUP xmm1, xmm2/m128
+ * F3 0F 12 /r          MOVSLDUP xmm1, xmm2/m128
+ * F2 0F 12 /r          MOVDDUP xmm1, xmm2/m64
+ * F2 0F 7C /r          HADDPS xmm1, xmm2/m128
+ * 66 0F 7C /r          HADDPD xmm1, xmm2/m128
+ * F2 0F 7D /r          HSUBPS xmm1, xmm2/m128
+ * 66 0F 7D /r          HSUBPD xmm1, xmm2/m128
+ * F2 0F D0 /r          ADDSUBPS xmm1, xmm2/m128
+ * 66 0F D0 /r          ADDSUBPD xmm1, xmm2/m128
  */
 
 OPCODE(movd, LEG(NP, 0F, 0, 0x6e), MMX, WR, Pq, Ed)
@@ -389,6 +402,10 @@ OPCODE(movmskps, LEG(NP, 0F, 0, 0x50), SSE, WR, Gd, Udq)
 OPCODE(movmskps, LEG(NP, 0F, 1, 0x50), SSE, WR, Gq, Udq)
 OPCODE(movmskpd, LEG(66, 0F, 0, 0x50), SSE2, WR, Gd, Udq)
 OPCODE(movmskpd, LEG(66, 0F, 1, 0x50), SSE2, WR, Gq, Udq)
+OPCODE(lddqu, LEG(F2, 0F, 0, 0xf0), SSE3, WR, Vdq, Mdq)
+OPCODE(movshdup, LEG(F3, 0F, 0, 0x16), SSE3, WR, Vdq, Wdq)
+OPCODE(movsldup, LEG(F3, 0F, 0, 0x12), SSE3, WR, Vdq, Wdq)
+OPCODE(movddup, LEG(F2, 0F, 0, 0x12), SSE3, WR, Vdq, Wq)
 OPCODE(paddb, LEG(NP, 0F, 0, 0xfc), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddb, LEG(66, 0F, 0, 0xfc), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(paddw, LEG(NP, 0F, 0, 0xfd), MMX, WRR, Pq, Pq, Qq)
@@ -409,6 +426,8 @@ OPCODE(addps, LEG(NP, 0F, 0, 0x58), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(addpd, LEG(66, 0F, 0, 0x58), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(addss, LEG(F3, 0F, 0, 0x58), SSE, WRR, Vd, Vd, Wd)
 OPCODE(addsd, LEG(F2, 0F, 0, 0x58), SSE2, WRR, Vq, Vq, Wq)
+OPCODE(haddps, LEG(F2, 0F, 0, 0x7c), SSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(haddpd, LEG(66, 0F, 0, 0x7c), SSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(psubb, LEG(NP, 0F, 0, 0xf8), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubb, LEG(66, 0F, 0, 0xf8), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psubw, LEG(NP, 0F, 0, 0xf9), MMX, WRR, Pq, Pq, Qq)
@@ -429,6 +448,10 @@ OPCODE(subps, LEG(NP, 0F, 0, 0x5c), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(subpd, LEG(66, 0F, 0, 0x5c), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(subss, LEG(F3, 0F, 0, 0x5c), SSE, WRR, Vd, Vd, Wd)
 OPCODE(subsd, LEG(F2, 0F, 0, 0x5c), SSE2, WRR, Vq, Vq, Wq)
+OPCODE(hsubps, LEG(F2, 0F, 0, 0x7d), SSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(hsubpd, LEG(66, 0F, 0, 0x7d), SSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(addsubps, LEG(F2, 0F, 0, 0xd0), SSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(addsubpd, LEG(66, 0F, 0, 0xd0), SSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(pmullw, LEG(NP, 0F, 0, 0xd5), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pmullw, LEG(66, 0F, 0, 0xd5), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pmulhw, LEG(NP, 0F, 0, 0xe5), MMX, WRR, Pq, Pq, Qq)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 48/75] target/i386: introduce SSSE3 translators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (46 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 47/75] target/i386: introduce SSE3 vector instructions to sse-opcode.inc.h Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 49/75] target/i386: introduce SSSE3 code generators Jan Bobek
                   ` (25 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Use the translator macros to define translators required by SSSE3
instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index d449a64464..25d3b969b1 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -6695,6 +6695,7 @@ DEF_TRANSLATE_INSN3(Vq, Vq, Wq)
         }                                                               \
     }
 
+DEF_TRANSLATE_INSN4(Pq, Pq, Qq, Ib)
 DEF_TRANSLATE_INSN4(Pq, Pq, RdMw, Ib)
 DEF_TRANSLATE_INSN4(Vd, Vd, Wd, Ib)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, RdMw, Ib)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 49/75] target/i386: introduce SSSE3 code generators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (47 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 48/75] target/i386: introduce SSSE3 translators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 50/75] target/i386: introduce SSSE3 vector instructions to sse-opcode.inc.h Jan Bobek
                   ` (24 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Introduce code generators required by SSSE3 instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 25d3b969b1..f43e9b1ba4 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5876,6 +5876,12 @@ DEF_GEN_INSN3_HELPER_EPP(addps, addps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(addpd, addpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(addss, addss, Vd, Vd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(addsd, addsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(phaddw, phaddw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(phaddw, phaddw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(phaddd, phaddd_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(phaddd, phaddd_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(phaddsw, phaddsw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(phaddsw, phaddsw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(haddps, haddps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(haddpd, haddpd, Vdq, Vdq, Wdq)
 
@@ -5899,6 +5905,12 @@ DEF_GEN_INSN3_HELPER_EPP(subps, subps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(subpd, subpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(subss, subss, Vd, Vd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(subsd, subsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(phsubw, phsubw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(phsubw, phsubw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(phsubd, phsubd_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(phsubd, phsubd_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(phsubsw, phsubsw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(phsubsw, phsubsw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(hsubps, hsubps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(hsubpd, hsubpd, Vdq, Vdq, Wdq)
 
@@ -5913,12 +5925,16 @@ DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmuludq, pmuludq_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmuludq, pmuludq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(pmulhrsw, pmulhrsw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pmulhrsw, pmulhrsw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(mulps, mulps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(mulpd, mulpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(mulss, mulss, Vd, Vd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(mulsd, mulsd, Vq, Vq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(pmaddwd, pmaddwd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmaddwd, pmaddwd_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(pmaddubsw, pmaddubsw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pmaddubsw, pmaddubsw_xmm, Vdq, Vdq, Wdq)
 
 DEF_GEN_INSN3_HELPER_EPP(divps, divps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(divpd, divpd, Vdq, Vdq, Wdq)
@@ -5956,6 +5972,18 @@ DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(pabsb, pabsb_mmx, Pq, Qq)
+DEF_GEN_INSN2_HELPER_EPP(pabsb, pabsb_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(pabsw, pabsw_mmx, Pq, Qq)
+DEF_GEN_INSN2_HELPER_EPP(pabsw, pabsw_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(pabsd, pabsd_mmx, Pq, Qq)
+DEF_GEN_INSN2_HELPER_EPP(pabsd, pabsd_xmm, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(psignb, psignb_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psignb, psignb_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(psignw, psignw_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psignw, psignw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(psignd, psignd_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(psignd, psignd_xmm, Vdq, Vdq, Wdq)
 
 DEF_GEN_INSN3_GVEC(pcmpeqb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqb, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_EQ)
@@ -6216,6 +6244,9 @@ DEF_GEN_PSHIFT_IMM_XMM(psraw, Udq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psrad, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psrad, Udq, Udq)
 
+DEF_GEN_INSN4_HELPER_EPPI(palignr, palignr_mmx, Pq, Pq, Qq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(palignr, palignr_xmm, Vdq, Vdq, Wdq, Ib)
+
 DEF_GEN_INSN3_HELPER_EPP(packsswb, packsswb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(packsswb, packsswb_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(packssdw, packssdw_mmx, Pq, Pq, Qq)
@@ -6242,6 +6273,8 @@ DEF_GEN_INSN3_HELPER_EPP(unpcklpd, punpcklqdq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(unpckhps, punpckhdq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(unpckhpd, punpckhqdq_xmm, Vdq, Vdq, Wdq)
 
+DEF_GEN_INSN3_HELPER_EPP(pshufb, pshufb_mmx, Pq, Pq, Qq)
+DEF_GEN_INSN3_HELPER_EPP(pshufb, pshufb_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_PPI(pshufw, pshufw_mmx, Pq, Qq, Ib)
 DEF_GEN_INSN3_HELPER_PPI(pshuflw, pshuflw_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN3_HELPER_PPI(pshufhw, pshufhw_xmm, Vdq, Wdq, Ib)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 50/75] target/i386: introduce SSSE3 vector instructions to sse-opcode.inc.h
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (48 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 49/75] target/i386: introduce SSSE3 code generators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 51/75] target/i386: introduce SSE4.1 translators Jan Bobek
                   ` (23 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add all the SSSE3 vector instruction entries to sse-opcode.inc.h.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/sse-opcode.inc.h | 67 ++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
index 84785a4e04..d8ea71aa6c 100644
--- a/target/i386/sse-opcode.inc.h
+++ b/target/i386/sse-opcode.inc.h
@@ -354,6 +354,41 @@
  * 66 0F 7D /r          HSUBPD xmm1, xmm2/m128
  * F2 0F D0 /r          ADDSUBPS xmm1, xmm2/m128
  * 66 0F D0 /r          ADDSUBPD xmm1, xmm2/m128
+ *
+ * SSSE3 Instructions
+ * -------------------
+ * NP 0F 38 01 /r          PHADDW mm1, mm2/m64
+ * 66 0F 38 01 /r          PHADDW xmm1, xmm2/m128
+ * NP 0F 38 02 /r          PHADDD mm1, mm2/m64
+ * 66 0F 38 02 /r          PHADDD xmm1, xmm2/m128
+ * NP 0F 38 03 /r          PHADDSW mm1, mm2/m64
+ * 66 0F 38 03 /r          PHADDSW xmm1, xmm2/m128
+ * NP 0F 38 05 /r          PHSUBW mm1, mm2/m64
+ * 66 0F 38 05 /r          PHSUBW xmm1, xmm2/m128
+ * NP 0F 38 06 /r          PHSUBD mm1, mm2/m64
+ * 66 0F 38 06 /r          PHSUBD xmm1, xmm2/m128
+ * NP 0F 38 07 /r          PHSUBSW mm1, mm2/m64
+ * 66 0F 38 07 /r          PHSUBSW xmm1, xmm2/m128
+ * NP 0F 38 0B /r          PMULHRSW mm1, mm2/m64
+ * 66 0F 38 0B /r          PMULHRSW xmm1, xmm2/m128
+ * NP 0F 38 04 /r          PMADDUBSW mm1, mm2/m64
+ * 66 0F 38 04 /r          PMADDUBSW xmm1, xmm2/m128
+ * NP 0F 38 1C /r          PABSB mm1, mm2/m64
+ * 66 0F 38 1C /r          PABSB xmm1, xmm2/m128
+ * NP 0F 38 1D /r          PABSW mm1, mm2/m64
+ * 66 0F 38 1D /r          PABSW xmm1, xmm2/m128
+ * NP 0F 38 1E /r          PABSD mm1, mm2/m64
+ * 66 0F 38 1E /r          PABSD xmm1, xmm2/m128
+ * NP 0F 38 08 /r          PSIGNB mm1, mm2/m64
+ * 66 0F 38 08 /r          PSIGNB xmm1, xmm2/m128
+ * NP 0F 38 09 /r          PSIGNW mm1, mm2/m64
+ * 66 0F 38 09 /r          PSIGNW xmm1, xmm2/m128
+ * NP 0F 38 0A /r          PSIGND mm1, mm2/m64
+ * 66 0F 38 0A /r          PSIGND xmm1, xmm2/m128
+ * NP 0F 3A 0F /r ib       PALIGNR mm1, mm2/m64, imm8
+ * 66 0F 3A 0F /r ib       PALIGNR xmm1, xmm2/m128, imm8
+ * NP 0F 38 00 /r          PSHUFB mm1, mm2/m64
+ * 66 0F 38 00 /r          PSHUFB xmm1, xmm2/m128
  */
 
 OPCODE(movd, LEG(NP, 0F, 0, 0x6e), MMX, WR, Pq, Ed)
@@ -426,6 +461,12 @@ OPCODE(addps, LEG(NP, 0F, 0, 0x58), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(addpd, LEG(66, 0F, 0, 0x58), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(addss, LEG(F3, 0F, 0, 0x58), SSE, WRR, Vd, Vd, Wd)
 OPCODE(addsd, LEG(F2, 0F, 0, 0x58), SSE2, WRR, Vq, Vq, Wq)
+OPCODE(phaddw, LEG(NP, 0F38, 0, 0x01), SSSE3, WRR, Pq, Pq, Qq)
+OPCODE(phaddw, LEG(66, 0F38, 0, 0x01), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(phaddd, LEG(NP, 0F38, 0, 0x02), SSSE3, WRR, Pq, Pq, Qq)
+OPCODE(phaddd, LEG(66, 0F38, 0, 0x02), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(phaddsw, LEG(NP, 0F38, 0, 0x03), SSSE3, WRR, Pq, Pq, Qq)
+OPCODE(phaddsw, LEG(66, 0F38, 0, 0x03), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(haddps, LEG(F2, 0F, 0, 0x7c), SSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(haddpd, LEG(66, 0F, 0, 0x7c), SSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(psubb, LEG(NP, 0F, 0, 0xf8), MMX, WRR, Pq, Pq, Qq)
@@ -448,6 +489,12 @@ OPCODE(subps, LEG(NP, 0F, 0, 0x5c), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(subpd, LEG(66, 0F, 0, 0x5c), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(subss, LEG(F3, 0F, 0, 0x5c), SSE, WRR, Vd, Vd, Wd)
 OPCODE(subsd, LEG(F2, 0F, 0, 0x5c), SSE2, WRR, Vq, Vq, Wq)
+OPCODE(phsubw, LEG(NP, 0F38, 0, 0x05), SSSE3, WRR, Pq, Pq, Qq)
+OPCODE(phsubw, LEG(66, 0F38, 0, 0x05), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(phsubd, LEG(NP, 0F38, 0, 0x06), SSSE3, WRR, Pq, Pq, Qq)
+OPCODE(phsubd, LEG(66, 0F38, 0, 0x06), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(phsubsw, LEG(NP, 0F38, 0, 0x07), SSSE3, WRR, Pq, Pq, Qq)
+OPCODE(phsubsw, LEG(66, 0F38, 0, 0x07), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(hsubps, LEG(F2, 0F, 0, 0x7d), SSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(hsubpd, LEG(66, 0F, 0, 0x7d), SSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(addsubps, LEG(F2, 0F, 0, 0xd0), SSE3, WRR, Vdq, Vdq, Wdq)
@@ -460,12 +507,16 @@ OPCODE(pmulhuw, LEG(NP, 0F, 0, 0xe4), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pmulhuw, LEG(66, 0F, 0, 0xe4), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pmuludq, LEG(NP, 0F, 0, 0xf4), SSE2, WRR, Pq, Pq, Qq)
 OPCODE(pmuludq, LEG(66, 0F, 0, 0xf4), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pmulhrsw, LEG(NP, 0F38, 0, 0x0b), SSSE3, WRR, Pq, Pq, Qq)
+OPCODE(pmulhrsw, LEG(66, 0F38, 0, 0x0b), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(mulps, LEG(NP, 0F, 0, 0x59), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(mulpd, LEG(66, 0F, 0, 0x59), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(mulss, LEG(F3, 0F, 0, 0x59), SSE, WRR, Vd, Vd, Wd)
 OPCODE(mulsd, LEG(F2, 0F, 0, 0x59), SSE2, WRR, Vq, Vq, Wq)
 OPCODE(pmaddwd, LEG(NP, 0F, 0, 0xf5), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pmaddwd, LEG(66, 0F, 0, 0xf5), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pmaddubsw, LEG(NP, 0F38, 0, 0x04), SSSE3, WRR, Pq, Pq, Qq)
+OPCODE(pmaddubsw, LEG(66, 0F38, 0, 0x04), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(divps, LEG(NP, 0F, 0, 0x5e), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(divpd, LEG(66, 0F, 0, 0x5e), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(divss, LEG(F3, 0F, 0, 0x5e), SSE, WRR, Vd, Vd, Wd)
@@ -500,6 +551,18 @@ OPCODE(pavgw, LEG(NP, 0F, 0, 0xe3), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pavgw, LEG(66, 0F, 0, 0xe3), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psadbw, LEG(NP, 0F, 0, 0xf6), SSE, WRR, Pq, Pq, Qq)
 OPCODE(psadbw, LEG(66, 0F, 0, 0xf6), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pabsb, LEG(NP, 0F38, 0, 0x1c), SSSE3, WR, Pq, Qq)
+OPCODE(pabsb, LEG(66, 0F38, 0, 0x1c), SSSE3, WR, Vdq, Wdq)
+OPCODE(pabsw, LEG(NP, 0F38, 0, 0x1d), SSSE3, WR, Pq, Qq)
+OPCODE(pabsw, LEG(66, 0F38, 0, 0x1d), SSSE3, WR, Vdq, Wdq)
+OPCODE(pabsd, LEG(NP, 0F38, 0, 0x1e), SSSE3, WR, Pq, Qq)
+OPCODE(pabsd, LEG(66, 0F38, 0, 0x1e), SSSE3, WR, Vdq, Wdq)
+OPCODE(psignb, LEG(NP, 0F38, 0, 0x08), SSSE3, WRR, Pq, Pq, Qq)
+OPCODE(psignb, LEG(66, 0F38, 0, 0x08), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(psignw, LEG(NP, 0F38, 0, 0x09), SSSE3, WRR, Pq, Pq, Qq)
+OPCODE(psignw, LEG(66, 0F38, 0, 0x09), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(psignd, LEG(NP, 0F38, 0, 0x0a), SSSE3, WRR, Pq, Pq, Qq)
+OPCODE(psignd, LEG(66, 0F38, 0, 0x0a), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpeqb, LEG(NP, 0F, 0, 0x74), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqb, LEG(66, 0F, 0, 0x74), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpeqw, LEG(NP, 0F, 0, 0x75), MMX, WRR, Pq, Pq, Qq)
@@ -552,6 +615,8 @@ OPCODE(psraw, LEG(NP, 0F, 0, 0xe1), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psraw, LEG(66, 0F, 0, 0xe1), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psrad, LEG(NP, 0F, 0, 0xe2), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psrad, LEG(66, 0F, 0, 0xe2), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(palignr, LEG(NP, 0F3A, 0, 0x0f), SSSE3, WRRR, Pq, Pq, Qq, Ib)
+OPCODE(palignr, LEG(66, 0F3A, 0, 0x0f), SSSE3, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(packsswb, LEG(NP, 0F, 0, 0x63), MMX, WRR, Pq, Pq, Qq)
 OPCODE(packsswb, LEG(66, 0F, 0, 0x63), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(packssdw, LEG(NP, 0F, 0, 0x6b), MMX, WRR, Pq, Pq, Qq)
@@ -576,6 +641,8 @@ OPCODE(unpcklps, LEG(NP, 0F, 0, 0x14), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(unpcklpd, LEG(66, 0F, 0, 0x14), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(unpckhps, LEG(NP, 0F, 0, 0x15), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(unpckhpd, LEG(66, 0F, 0, 0x15), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pshufb, LEG(NP, 0F38, 0, 0x00), SSSE3, WRR, Pq, Pq, Qq)
+OPCODE(pshufb, LEG(66, 0F38, 0, 0x00), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(pshufw, LEG(NP, 0F, 0, 0x70), SSE, WRR, Pq, Qq, Ib)
 OPCODE(pshuflw, LEG(F2, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
 OPCODE(pshufhw, LEG(F3, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 51/75] target/i386: introduce SSE4.1 translators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (49 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 50/75] target/i386: introduce SSSE3 vector instructions to sse-opcode.inc.h Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 52/75] target/i386: introduce SSE4.1 code generators Jan Bobek
                   ` (22 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Use the translator macros to define translators required by SSE4.1
instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index f43e9b1ba4..110b963215 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -6617,8 +6617,10 @@ DEF_TRANSLATE_INSN2(Vdq, Mdq)
 DEF_TRANSLATE_INSN2(Vdq, Nq)
 DEF_TRANSLATE_INSN2(Vdq, Qq)
 DEF_TRANSLATE_INSN2(Vdq, Udq)
+DEF_TRANSLATE_INSN2(Vdq, Wd)
 DEF_TRANSLATE_INSN2(Vdq, Wdq)
 DEF_TRANSLATE_INSN2(Vdq, Wq)
+DEF_TRANSLATE_INSN2(Vdq, Ww)
 DEF_TRANSLATE_INSN2(Vq, Ed)
 DEF_TRANSLATE_INSN2(Vq, Eq)
 DEF_TRANSLATE_INSN2(Vq, Wd)
@@ -6666,6 +6668,8 @@ DEF_TRANSLATE_INSN2(modrm_mod, modrm)
         }                                                               \
     }
 
+DEF_TRANSLATE_INSN3(Ed, Vdq, Ib)
+DEF_TRANSLATE_INSN3(Eq, Vdq, Ib)
 DEF_TRANSLATE_INSN3(Gd, Nq, Ib)
 DEF_TRANSLATE_INSN3(Gd, Udq, Ib)
 DEF_TRANSLATE_INSN3(Gq, Nq, Ib)
@@ -6674,8 +6678,11 @@ DEF_TRANSLATE_INSN3(Nq, Nq, Ib)
 DEF_TRANSLATE_INSN3(Pq, Pq, Qd)
 DEF_TRANSLATE_INSN3(Pq, Pq, Qq)
 DEF_TRANSLATE_INSN3(Pq, Qq, Ib)
+DEF_TRANSLATE_INSN3(RdMb, Vdq, Ib)
+DEF_TRANSLATE_INSN3(RdMw, Vdq, Ib)
 DEF_TRANSLATE_INSN3(Udq, Udq, Ib)
 DEF_TRANSLATE_INSN3(Vd, Vd, Wd)
+DEF_TRANSLATE_INSN3(Vd, Wd, Ib)
 DEF_TRANSLATE_INSN3(Vdq, Vdq, Mq)
 DEF_TRANSLATE_INSN3(Vdq, Vdq, UdqMhq)
 DEF_TRANSLATE_INSN3(Vdq, Vdq, Wdq)
@@ -6683,6 +6690,7 @@ DEF_TRANSLATE_INSN3(Vdq, Vq, Mq)
 DEF_TRANSLATE_INSN3(Vdq, Vq, Wq)
 DEF_TRANSLATE_INSN3(Vdq, Wdq, Ib)
 DEF_TRANSLATE_INSN3(Vq, Vq, Wq)
+DEF_TRANSLATE_INSN3(Vq, Wq, Ib)
 
 #define DEF_TRANSLATE_INSN4(opT1, opT2, opT3, opT4)                     \
     static void translate_insn4(opT1, opT2, opT3, opT4)(                \
@@ -6731,7 +6739,11 @@ DEF_TRANSLATE_INSN3(Vq, Vq, Wq)
 DEF_TRANSLATE_INSN4(Pq, Pq, Qq, Ib)
 DEF_TRANSLATE_INSN4(Pq, Pq, RdMw, Ib)
 DEF_TRANSLATE_INSN4(Vd, Vd, Wd, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Vdq, Ed, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Vdq, Eq, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Vdq, RdMb, Ib)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, RdMw, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Vdq, Wd, Ib)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, Wd, modrm_mod)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, Wdq, Ib)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, Wq, modrm_mod)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 52/75] target/i386: introduce SSE4.1 code generators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (50 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 51/75] target/i386: introduce SSE4.1 translators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 53/75] target/i386: introduce SSE4.1 vector instructions to sse-opcode.inc.h Jan Bobek
                   ` (21 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Introduce code generators required by SSE4.1 instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 101 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 110b963215..3ff063b701 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5919,10 +5919,12 @@ DEF_GEN_INSN3_HELPER_EPP(addsubpd, addsubpd, Vdq, Vdq, Wdq)
 
 DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(pmulld, pmulld_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhw, pmulhw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhw, pmulhw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(pmuldq, pmuldq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmuludq, pmuludq_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmuludq, pmuludq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhrsw, pmulhrsw_mmx, Pq, Pq, Qq)
@@ -5952,16 +5954,25 @@ DEF_GEN_INSN2_HELPER_EPP(rsqrtss, rsqrtss, Vd, Wd)
 
 DEF_GEN_INSN3_GVEC(pminub, Pq, Pq, Qq, umin, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pminub, Vdq, Vdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(pminuw, Vdq, Vdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(pminud, Vdq, Vdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(pminsb, Vdq, Vdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pminsw, Pq, Pq, Qq, smin, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pminsw, Vdq, Vdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(pminsd, Vdq, Vdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_HELPER_EPP(minps, minps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(minpd, minpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(minss, minss, Vd, Vd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(minsd, minsd, Vq, Vq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(phminposuw, phminposuw_xmm, Vdq, Wdq)
 DEF_GEN_INSN3_GVEC(pmaxub, Pq, Pq, Qq, umax, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pmaxub, Vdq, Vdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(pmaxuw, Vdq, Vdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(pmaxud, Vdq, Vdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(pmaxsb, Vdq, Vdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pmaxsw, Pq, Pq, Qq, smax, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pmaxsw, Vdq, Vdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(pmaxsd, Vdq, Vdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_HELPER_EPP(maxps, maxps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(maxpd, maxpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(maxss, maxss, Vd, Vd, Wd)
@@ -5972,6 +5983,7 @@ DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN4_HELPER_EPPI(mpsadbw, mpsadbw_xmm, Vdq, Vdq, Wdq, Ib)
 DEF_GEN_INSN2_HELPER_EPP(pabsb, pabsb_mmx, Pq, Qq)
 DEF_GEN_INSN2_HELPER_EPP(pabsb, pabsb_xmm, Vdq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(pabsw, pabsw_mmx, Pq, Qq)
@@ -5985,12 +5997,20 @@ DEF_GEN_INSN3_HELPER_EPP(psignw, psignw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psignd, psignd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psignd, psignd_xmm, Vdq, Vdq, Wdq)
 
+DEF_GEN_INSN4_HELPER_EPPI(dpps, dpps_xmm, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(dppd, dppd_xmm, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(roundps, roundps_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(roundpd, roundpd_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(roundss, roundss_xmm, Vd, Wd, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(roundsd, roundsd_xmm, Vq, Wq, Ib)
+
 DEF_GEN_INSN3_GVEC(pcmpeqb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqb, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqw, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqd, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(pcmpeqq, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_64, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpgtb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtb, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_GT)
@@ -5998,6 +6018,8 @@ DEF_GEN_INSN3_GVEC(pcmpgtw, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG
 DEF_GEN_INSN3_GVEC(pcmpgtd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtd, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_GT)
 
+DEF_GEN_INSN2_HELPER_EPP(ptest, ptest_xmm, Vdq, Wdq)
+
 DEF_GEN_INSN3_HELPER_EPP(cmpeqps, cmpeqps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(cmpeqpd, cmpeqpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(cmpeqss, cmpeqss, Vd, Vd, Wd)
@@ -6253,6 +6275,7 @@ DEF_GEN_INSN3_HELPER_EPP(packssdw, packssdw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(packssdw, packssdw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(packuswb, packuswb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(packuswb, packuswb_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(packusdw, packusdw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpcklbw, punpcklbw_mmx, Pq, Pq, Qd)
 DEF_GEN_INSN3_HELPER_EPP(punpcklbw, punpcklbw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpcklwd, punpcklwd_mmx, Pq, Pq, Qd)
@@ -6282,6 +6305,28 @@ DEF_GEN_INSN3_HELPER_PPI(pshufd, pshufd_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_PPI(shufps, shufps, Vdq, Vdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_PPI(shufpd, shufpd, Vdq, Vdq, Wdq, Ib)
 
+DEF_GEN_INSN4_HELPER_EPPI(blendps, blendps_xmm, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(blendpd, blendpd_xmm, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPP(blendvps, blendvps_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(blendvpd, blendvpd_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(pblendvb, pblendvb_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN4_HELPER_EPPI(pblendw, pblendw_xmm, Vdq, Vdq, Wdq, Ib)
+
+GEN_INSN4(insertps, Vdq, Vdq, Wd, Ib)
+{
+    assert(arg1 == arg2);
+
+    const size_t dofs = offsetof(ZMMReg, ZMM_L(arg4 & 3));
+    const size_t aofs = offsetof(ZMMReg, ZMM_L(0));
+    gen_op_movl(s, arg1 + dofs, arg3 + aofs);
+}
+GEN_INSN4(pinsrb, Vdq, Vdq, RdMb, Ib)
+{
+    assert(arg1 == arg2);
+
+    const size_t ofs = offsetof(ZMMReg, ZMM_B(arg4 & 15));
+    tcg_gen_st8_i32(arg3, cpu_env, arg1 + ofs);
+}
 GEN_INSN4(pinsrw, Pq, Pq, RdMw, Ib)
 {
     assert(arg1 == arg2);
@@ -6296,7 +6341,46 @@ GEN_INSN4(pinsrw, Vdq, Vdq, RdMw, Ib)
     const size_t ofs = offsetof(ZMMReg, ZMM_W(arg4 & 7));
     tcg_gen_st16_i32(arg3, cpu_env, arg1 + ofs);
 }
+GEN_INSN4(pinsrd, Vdq, Vdq, Ed, Ib)
+{
+    assert(arg1 == arg2);
 
+    const size_t ofs = offsetof(ZMMReg, ZMM_L(arg4 & 3));
+    tcg_gen_st_i32(arg3, cpu_env, arg1 + ofs);
+}
+GEN_INSN4(pinsrq, Vdq, Vdq, Eq, Ib)
+{
+    assert(arg1 == arg2);
+
+    const size_t ofs = offsetof(ZMMReg, ZMM_Q(arg4 & 1));
+    tcg_gen_st_i64(arg3, cpu_env, arg1 + ofs);
+}
+
+GEN_INSN3(extractps, Ed, Vdq, Ib)
+{
+    const size_t ofs = offsetof(ZMMReg, ZMM_L(arg3 & 3));
+    tcg_gen_ld_i32(arg1, cpu_env, arg2 + ofs);
+}
+GEN_INSN3(pextrb, RdMb, Vdq, Ib)
+{
+    const size_t ofs = offsetof(ZMMReg, ZMM_B(arg3 & 15));
+    tcg_gen_ld8u_i32(arg1, cpu_env, arg2 + ofs);
+}
+GEN_INSN3(pextrw, RdMw, Vdq, Ib)
+{
+    const size_t ofs = offsetof(ZMMReg, ZMM_W(arg3 & 7));
+    tcg_gen_ld16u_i32(arg1, cpu_env, arg2 + ofs);
+}
+GEN_INSN3(pextrd, Ed, Vdq, Ib)
+{
+    const size_t ofs = offsetof(ZMMReg, ZMM_L(arg3 & 3));
+    tcg_gen_ld_i32(arg1, cpu_env, arg2 + ofs);
+}
+GEN_INSN3(pextrq, Eq, Vdq, Ib)
+{
+    const size_t ofs = offsetof(ZMMReg, ZMM_Q(arg3 & 1));
+    tcg_gen_ld_i64(arg1, cpu_env, arg2 + ofs);
+}
 GEN_INSN3(pextrw, Gd, Nq, Ib)
 {
     const size_t ofs = offsetof(MMXReg, MMX_W(arg3 & 3));
@@ -6318,6 +6402,19 @@ GEN_INSN3(pextrw, Gq, Udq, Ib)
     tcg_gen_ld16u_i64(arg1, cpu_env, arg2 + ofs);
 }
 
+DEF_GEN_INSN2_HELPER_EPP(pmovsxbw, pmovsxbw_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(pmovsxbd, pmovsxbd_xmm, Vdq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(pmovsxbq, pmovsxbq_xmm, Vdq, Ww)
+DEF_GEN_INSN2_HELPER_EPP(pmovsxwd, pmovsxwd_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(pmovsxwq, pmovsxwq_xmm, Vdq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(pmovsxdq, pmovsxdq_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(pmovzxbw, pmovzxbw_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(pmovzxbd, pmovzxbd_xmm, Vdq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(pmovzxbq, pmovzxbq_xmm, Vdq, Ww)
+DEF_GEN_INSN2_HELPER_EPP(pmovzxwd, pmovzxwd_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(pmovzxwq, pmovzxwq_xmm, Vdq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(pmovzxdq, pmovzxdq_xmm, Vdq, Wq)
+
 DEF_GEN_INSN2_HELPER_EPP(cvtpi2ps, cvtpi2ps, Vdq, Qq)
 DEF_GEN_INSN2_HELPER_EPD(cvtsi2ss, cvtsi2ss, Vd, Ed)
 DEF_GEN_INSN2_HELPER_EPQ(cvtsi2ss, cvtsq2ss, Vd, Eq)
@@ -6407,6 +6504,10 @@ GEN_INSN2(movntdq, Mdq, Vdq)
 {
     insnop_ldst(xmm, Mdq)(env, s, 1, arg2, arg1);
 }
+GEN_INSN2(movntdqa, Vdq, Mdq)
+{
+    insnop_ldst(xmm, Mdq)(env, s, 0, arg1, arg2);
+}
 
 GEN_INSN0(pause)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 53/75] target/i386: introduce SSE4.1 vector instructions to sse-opcode.inc.h
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (51 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 52/75] target/i386: introduce SSE4.1 code generators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 54/75] target/i386: introduce SSE4.2 code generators Jan Bobek
                   ` (20 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add all the SSE4.1 vector instruction entries to sse-opcode.inc.h.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/sse-opcode.inc.h | 101 +++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
index d8ea71aa6c..9682cce7ef 100644
--- a/target/i386/sse-opcode.inc.h
+++ b/target/i386/sse-opcode.inc.h
@@ -389,6 +389,58 @@
  * 66 0F 3A 0F /r ib       PALIGNR xmm1, xmm2/m128, imm8
  * NP 0F 38 00 /r          PSHUFB mm1, mm2/m64
  * 66 0F 38 00 /r          PSHUFB xmm1, xmm2/m128
+ *
+ * SSE4.1 Instructions
+ * --------------------
+ * 66 0F 38 40 /r          PMULLD xmm1, xmm2/m128
+ * 66 0F 38 28 /r          PMULDQ xmm1, xmm2/m128
+ * 66 0F 38 3A /r          PMINUW xmm1, xmm2/m128
+ * 66 0F 38 3B /r          PMINUD xmm1, xmm2/m128
+ * 66 0F 38 38 /r          PMINSB xmm1, xmm2/m128
+ * 66 0F 38 39 /r          PMINSD xmm1, xmm2/m128
+ * 66 0F 38 41 /r          PHMINPOSUW xmm1, xmm2/m128
+ * 66 0F 38 3E /r          PMAXUW xmm1, xmm2/m128
+ * 66 0F 38 3F /r          PMAXUD xmm1, xmm2/m128
+ * 66 0F 38 3C /r          PMAXSB xmm1, xmm2/m128
+ * 66 0F 38 3D /r          PMAXSD xmm1, xmm2/m128
+ * 66 0F 3A 42 /r ib       MPSADBW xmm1, xmm2/m128, imm8
+ * 66 0F 3A 40 /r ib       DPPS xmm1, xmm2/m128, imm8
+ * 66 0F 3A 41 /r ib       DPPD xmm1, xmm2/m128, imm8
+ * 66 0F 3A 08 /r ib       ROUNDPS xmm1, xmm2/m128, imm8
+ * 66 0F 3A 09 /r ib       ROUNDPD xmm1, xmm2/m128, imm8
+ * 66 0F 3A 0A /r ib       ROUNDSS xmm1, xmm2/m32, imm8
+ * 66 0F 3A 0B /r ib       ROUNDSD xmm1, xmm2/m64, imm8
+ * 66 0F 38 29 /r          PCMPEQQ xmm1, xmm2/m128
+ * 66 0F 38 17 /r          PTEST xmm1, xmm2/m128
+ * 66 0F 38 2B /r          PACKUSDW xmm1, xmm2/m128
+ * 66 0F 3A 0C /r ib       BLENDPS xmm1, xmm2/m128, imm8
+ * 66 0F 3A 0D /r ib       BLENDPD xmm1, xmm2/m128, imm8
+ * 66 0F 38 14 /r          BLENDVPS xmm1, xmm2/m128, <XMM0>
+ * 66 0F 38 15 /r          BLENDVPD xmm1, xmm2/m128 , <XMM0>
+ * 66 0F 38 10 /r          PBLENDVB xmm1, xmm2/m128, <XMM0>
+ * 66 0F 3A 0E /r ib       PBLENDW xmm1, xmm2/m128, imm8
+ * 66 0F 3A 21 /r ib       INSERTPS xmm1, xmm2/m32, imm8
+ * 66 0F 3A 20 /r ib       PINSRB xmm1,r32/m8,imm8
+ * 66 0F 3A 22 /r ib       PINSRD xmm1,r/m32,imm8
+ * 66 REX.W 0F 3A 22 /r ib PINSRQ xmm1,r/m64,imm8
+ * 66 0F 3A 17 /r ib       EXTRACTPS reg/m32, xmm1, imm8
+ * 66 0F 3A 14 /r ib       PEXTRB r32/m8,xmm2,imm8
+ * 66 0F 3A 15 /r ib       PEXTRW r32/m16, xmm, imm8
+ * 66 0F 3A 16 /r ib       PEXTRD r/m32,xmm2,imm8
+ * 66 REX.W 0F 3A 16 /r ib PEXTRQ r/m64,xmm2,imm8
+ * 66 0f 38 20 /r          PMOVSXBW xmm1, xmm2/m64
+ * 66 0f 38 21 /r          PMOVSXBD xmm1, xmm2/m32
+ * 66 0f 38 22 /r          PMOVSXBQ xmm1, xmm2/m16
+ * 66 0f 38 23 /r          PMOVSXWD xmm1, xmm2/m64
+ * 66 0f 38 24 /r          PMOVSXWQ xmm1, xmm2/m32
+ * 66 0f 38 25 /r          PMOVSXDQ xmm1, xmm2/m64
+ * 66 0f 38 30 /r          PMOVZXBW xmm1, xmm2/m64
+ * 66 0f 38 31 /r          PMOVZXBD xmm1, xmm2/m32
+ * 66 0f 38 32 /r          PMOVZXBQ xmm1, xmm2/m16
+ * 66 0f 38 33 /r          PMOVZXWD xmm1, xmm2/m64
+ * 66 0f 38 34 /r          PMOVZXWQ xmm1, xmm2/m32
+ * 66 0f 38 35 /r          PMOVZXDQ xmm1, xmm2/m64
+ * 66 0F 38 2A /r          MOVNTDQA xmm1, m128
  */
 
 OPCODE(movd, LEG(NP, 0F, 0, 0x6e), MMX, WR, Pq, Ed)
@@ -501,10 +553,12 @@ OPCODE(addsubps, LEG(F2, 0F, 0, 0xd0), SSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(addsubpd, LEG(66, 0F, 0, 0xd0), SSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(pmullw, LEG(NP, 0F, 0, 0xd5), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pmullw, LEG(66, 0F, 0, 0xd5), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pmulld, LEG(66, 0F38, 0, 0x40), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(pmulhw, LEG(NP, 0F, 0, 0xe5), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pmulhw, LEG(66, 0F, 0, 0xe5), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pmulhuw, LEG(NP, 0F, 0, 0xe4), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pmulhuw, LEG(66, 0F, 0, 0xe4), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pmuldq, LEG(66, 0F38, 0, 0x28), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(pmuludq, LEG(NP, 0F, 0, 0xf4), SSE2, WRR, Pq, Pq, Qq)
 OPCODE(pmuludq, LEG(66, 0F, 0, 0xf4), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pmulhrsw, LEG(NP, 0F38, 0, 0x0b), SSSE3, WRR, Pq, Pq, Qq)
@@ -531,16 +585,25 @@ OPCODE(rsqrtps, LEG(NP, 0F, 0, 0x52), SSE, WR, Vdq, Wdq)
 OPCODE(rsqrtss, LEG(F3, 0F, 0, 0x52), SSE, WR, Vd, Wd)
 OPCODE(pminub, LEG(NP, 0F, 0, 0xda), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pminub, LEG(66, 0F, 0, 0xda), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pminuw, LEG(66, 0F38, 0, 0x3a), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(pminud, LEG(66, 0F38, 0, 0x3b), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(pminsb, LEG(66, 0F38, 0, 0x38), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(pminsw, LEG(NP, 0F, 0, 0xea), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pminsw, LEG(66, 0F, 0, 0xea), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pminsd, LEG(66, 0F38, 0, 0x39), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(minps, LEG(NP, 0F, 0, 0x5d), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(minpd, LEG(66, 0F, 0, 0x5d), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(minss, LEG(F3, 0F, 0, 0x5d), SSE, WRR, Vd, Vd, Wd)
 OPCODE(minsd, LEG(F2, 0F, 0, 0x5d), SSE2, WRR, Vq, Vq, Wq)
+OPCODE(phminposuw, LEG(66, 0F38, 0, 0x41), SSE4_1, WR, Vdq, Wdq)
 OPCODE(pmaxub, LEG(NP, 0F, 0, 0xde), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pmaxub, LEG(66, 0F, 0, 0xde), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pmaxuw, LEG(66, 0F38, 0, 0x3e), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(pmaxud, LEG(66, 0F38, 0, 0x3f), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(pmaxsb, LEG(66, 0F38, 0, 0x3c), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(pmaxsw, LEG(NP, 0F, 0, 0xee), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pmaxsw, LEG(66, 0F, 0, 0xee), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pmaxsd, LEG(66, 0F38, 0, 0x3d), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(maxps, LEG(NP, 0F, 0, 0x5f), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(maxpd, LEG(66, 0F, 0, 0x5f), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(maxss, LEG(F3, 0F, 0, 0x5f), SSE, WRR, Vd, Vd, Wd)
@@ -551,6 +614,7 @@ OPCODE(pavgw, LEG(NP, 0F, 0, 0xe3), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pavgw, LEG(66, 0F, 0, 0xe3), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(psadbw, LEG(NP, 0F, 0, 0xf6), SSE, WRR, Pq, Pq, Qq)
 OPCODE(psadbw, LEG(66, 0F, 0, 0xf6), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(mpsadbw, LEG(66, 0F3A, 0, 0x42), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(pabsb, LEG(NP, 0F38, 0, 0x1c), SSSE3, WR, Pq, Qq)
 OPCODE(pabsb, LEG(66, 0F38, 0, 0x1c), SSSE3, WR, Vdq, Wdq)
 OPCODE(pabsw, LEG(NP, 0F38, 0, 0x1d), SSSE3, WR, Pq, Qq)
@@ -563,18 +627,26 @@ OPCODE(psignw, LEG(NP, 0F38, 0, 0x09), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(psignw, LEG(66, 0F38, 0, 0x09), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(psignd, LEG(NP, 0F38, 0, 0x0a), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(psignd, LEG(66, 0F38, 0, 0x0a), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(dpps, LEG(66, 0F3A, 0, 0x40), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(dppd, LEG(66, 0F3A, 0, 0x41), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(roundps, LEG(66, 0F3A, 0, 0x08), SSE4_1, WRR, Vdq, Wdq, Ib)
+OPCODE(roundpd, LEG(66, 0F3A, 0, 0x09), SSE4_1, WRR, Vdq, Wdq, Ib)
+OPCODE(roundss, LEG(66, 0F3A, 0, 0x0a), SSE4_1, WRR, Vd, Wd, Ib)
+OPCODE(roundsd, LEG(66, 0F3A, 0, 0x0b), SSE4_1, WRR, Vq, Wq, Ib)
 OPCODE(pcmpeqb, LEG(NP, 0F, 0, 0x74), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqb, LEG(66, 0F, 0, 0x74), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpeqw, LEG(NP, 0F, 0, 0x75), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqw, LEG(66, 0F, 0, 0x75), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpeqd, LEG(NP, 0F, 0, 0x76), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqd, LEG(66, 0F, 0, 0x76), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pcmpeqq, LEG(66, 0F38, 0, 0x29), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpgtb, LEG(NP, 0F, 0, 0x64), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtb, LEG(66, 0F, 0, 0x64), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpgtw, LEG(NP, 0F, 0, 0x65), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtw, LEG(66, 0F, 0, 0x65), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpgtd, LEG(NP, 0F, 0, 0x66), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtd, LEG(66, 0F, 0, 0x66), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(ptest, LEG(66, 0F38, 0, 0x17), SSE4_1, RR, Vdq, Wdq)
 OPCODE(cmpps, LEG(NP, 0F, 0, 0xc2), SSE, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(cmppd, LEG(66, 0F, 0, 0xc2), SSE2, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(cmpss, LEG(F3, 0F, 0, 0xc2), SSE, WRRR, Vd, Vd, Wd, Ib)
@@ -623,6 +695,7 @@ OPCODE(packssdw, LEG(NP, 0F, 0, 0x6b), MMX, WRR, Pq, Pq, Qq)
 OPCODE(packssdw, LEG(66, 0F, 0, 0x6b), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(packuswb, LEG(NP, 0F, 0, 0x67), MMX, WRR, Pq, Pq, Qq)
 OPCODE(packuswb, LEG(66, 0F, 0, 0x67), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(packusdw, LEG(66, 0F38, 0, 0x2b), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(punpckhbw, LEG(NP, 0F, 0, 0x68), MMX, WRR, Pq, Pq, Qq)
 OPCODE(punpckhbw, LEG(66, 0F, 0, 0x68), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(punpckhwd, LEG(NP, 0F, 0, 0x69), MMX, WRR, Pq, Pq, Qq)
@@ -649,12 +722,39 @@ OPCODE(pshufhw, LEG(F3, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
 OPCODE(pshufd, LEG(66, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
 OPCODE(shufps, LEG(NP, 0F, 0, 0xc6), SSE, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(shufpd, LEG(66, 0F, 0, 0xc6), SSE2, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(blendps, LEG(66, 0F3A, 0, 0x0c), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(blendpd, LEG(66, 0F3A, 0, 0x0d), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(blendvps, LEG(66, 0F38, 0, 0x14), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(blendvpd, LEG(66, 0F38, 0, 0x15), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(pblendvb, LEG(66, 0F38, 0, 0x10), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(pblendw, LEG(66, 0F3A, 0, 0x0e), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(insertps, LEG(66, 0F3A, 0, 0x21), SSE4_1, WRRR, Vdq, Vdq, Wd, Ib)
+OPCODE(pinsrb, LEG(66, 0F3A, 0, 0x20), SSE4_1, WRRR, Vdq, Vdq, RdMb, Ib)
 OPCODE(pinsrw, LEG(NP, 0F, 0, 0xc4), SSE, WRRR, Pq, Pq, RdMw, Ib)
 OPCODE(pinsrw, LEG(66, 0F, 0, 0xc4), SSE2, WRRR, Vdq, Vdq, RdMw, Ib)
+OPCODE(pinsrd, LEG(66, 0F3A, 0, 0x22), SSE4_1, WRRR, Vdq, Vdq, Ed, Ib)
+OPCODE(pinsrq, LEG(66, 0F3A, 1, 0x22), SSE4_1, WRRR, Vdq, Vdq, Eq, Ib)
+OPCODE(extractps, LEG(66, 0F3A, 0, 0x17), SSE4_1, WRR, Ed, Vdq, Ib)
+OPCODE(pextrb, LEG(66, 0F3A, 0, 0x14), SSE4_1, WRR, RdMb, Vdq, Ib)
+OPCODE(pextrw, LEG(66, 0F3A, 0, 0x15), SSE4_1, WRR, RdMw, Vdq, Ib)
+OPCODE(pextrd, LEG(66, 0F3A, 0, 0x16), SSE4_1, WRR, Ed, Vdq, Ib)
+OPCODE(pextrq, LEG(66, 0F3A, 1, 0x16), SSE4_1, WRR, Eq, Vdq, Ib)
 OPCODE(pextrw, LEG(NP, 0F, 0, 0xc5), SSE, WRR, Gd, Nq, Ib)
 OPCODE(pextrw, LEG(NP, 0F, 1, 0xc5), SSE, WRR, Gq, Nq, Ib)
 OPCODE(pextrw, LEG(66, 0F, 0, 0xc5), SSE2, WRR, Gd, Udq, Ib)
 OPCODE(pextrw, LEG(66, 0F, 1, 0xc5), SSE2, WRR, Gq, Udq, Ib)
+OPCODE(pmovsxbw, LEG(66, 0F38, 0, 0x20), SSE4_1, WR, Vdq, Wq)
+OPCODE(pmovsxbd, LEG(66, 0F38, 0, 0x21), SSE4_1, WR, Vdq, Wd)
+OPCODE(pmovsxbq, LEG(66, 0F38, 0, 0x22), SSE4_1, WR, Vdq, Ww)
+OPCODE(pmovsxwd, LEG(66, 0F38, 0, 0x23), SSE4_1, WR, Vdq, Wq)
+OPCODE(pmovsxwq, LEG(66, 0F38, 0, 0x24), SSE4_1, WR, Vdq, Wd)
+OPCODE(pmovsxdq, LEG(66, 0F38, 0, 0x25), SSE4_1, WR, Vdq, Wq)
+OPCODE(pmovzxbw, LEG(66, 0F38, 0, 0x30), SSE4_1, WR, Vdq, Wq)
+OPCODE(pmovzxbd, LEG(66, 0F38, 0, 0x31), SSE4_1, WR, Vdq, Wd)
+OPCODE(pmovzxbq, LEG(66, 0F38, 0, 0x32), SSE4_1, WR, Vdq, Ww)
+OPCODE(pmovzxwd, LEG(66, 0F38, 0, 0x33), SSE4_1, WR, Vdq, Wq)
+OPCODE(pmovzxwq, LEG(66, 0F38, 0, 0x34), SSE4_1, WR, Vdq, Wd)
+OPCODE(pmovzxdq, LEG(66, 0F38, 0, 0x35), SSE4_1, WR, Vdq, Wq)
 OPCODE(cvtpi2ps, LEG(NP, 0F, 0, 0x2a), SSE, WR, Vdq, Qq)
 OPCODE(cvtsi2ss, LEG(F3, 0F, 0, 0x2a), SSE, WR, Vd, Ed)
 OPCODE(cvtsi2ss, LEG(F3, 0F, 1, 0x2a), SSE, WR, Vd, Eq)
@@ -691,6 +791,7 @@ OPCODE(movnti, LEG(NP, 0F, 0, 0xc3), SSE2, WR, Md, Gd)
 OPCODE(movnti, LEG(NP, 0F, 1, 0xc3), SSE2, WR, Mq, Gq)
 OPCODE(movntq, LEG(NP, 0F, 0, 0xe7), SSE, WR, Mq, Pq)
 OPCODE(movntdq, LEG(66, 0F, 0, 0xe7), SSE2, WR, Mdq, Vdq)
+OPCODE(movntdqa, LEG(66, 0F38, 0, 0x2a), SSE4_1, WR, Vdq, Mdq)
 OPCODE(pause, LEG(F3, NA, 0, 0x90), SSE2, )
 OPCODE(emms, LEG(NP, 0F, 0, 0x77), MMX, )
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 54/75] target/i386: introduce SSE4.2 code generators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (52 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 53/75] target/i386: introduce SSE4.1 vector instructions to sse-opcode.inc.h Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 55/75] target/i386: introduce SSE4.2 vector instructions to sse-opcode.inc.h Jan Bobek
                   ` (19 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Introduce code generators required by SSE4.2 instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 3ff063b701..f3b047c0df 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -6017,6 +6017,12 @@ DEF_GEN_INSN3_GVEC(pcmpgtw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND
 DEF_GEN_INSN3_GVEC(pcmpgtw, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtd, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(pcmpgtq, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_64, TCG_COND_GT)
+
+DEF_GEN_INSN3_HELPER_EPPI(pcmpestrm, pcmpestrm_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(pcmpestri, pcmpestri_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(pcmpistrm, pcmpistrm_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(pcmpistri, pcmpistri_xmm, Vdq, Wdq, Ib)
 
 DEF_GEN_INSN2_HELPER_EPP(ptest, ptest_xmm, Vdq, Wdq)
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 55/75] target/i386: introduce SSE4.2 vector instructions to sse-opcode.inc.h
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (53 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 54/75] target/i386: introduce SSE4.2 code generators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 57/75] target/i386: introduce AES and PCLMULQDQ code generators Jan Bobek
                   ` (18 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add all the SSE4.2 vector instruction entries to sse-opcode.inc.h.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/sse-opcode.inc.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
index 9682cce7ef..f43436213e 100644
--- a/target/i386/sse-opcode.inc.h
+++ b/target/i386/sse-opcode.inc.h
@@ -441,6 +441,14 @@
  * 66 0f 38 34 /r          PMOVZXWQ xmm1, xmm2/m32
  * 66 0f 38 35 /r          PMOVZXDQ xmm1, xmm2/m64
  * 66 0F 38 2A /r          MOVNTDQA xmm1, m128
+ *
+ * SSE4.2 Instructions
+ * --------------------
+ * 66 0F 38 37 /r          PCMPGTQ xmm1,xmm2/m128
+ * 66 0F 3A 60 /r imm8     PCMPESTRM xmm1, xmm2/m128, imm8
+ * 66 0F 3A 61 /r imm8     PCMPESTRI xmm1, xmm2/m128, imm8
+ * 66 0F 3A 62 /r imm8     PCMPISTRM xmm1, xmm2/m128, imm8
+ * 66 0F 3A 63 /r imm8     PCMPISTRI xmm1, xmm2/m128, imm8
  */
 
 OPCODE(movd, LEG(NP, 0F, 0, 0x6e), MMX, WR, Pq, Ed)
@@ -646,6 +654,11 @@ OPCODE(pcmpgtw, LEG(NP, 0F, 0, 0x65), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtw, LEG(66, 0F, 0, 0x65), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpgtd, LEG(NP, 0F, 0, 0x66), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtd, LEG(66, 0F, 0, 0x66), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pcmpgtq, LEG(66, 0F38, 0, 0x37), SSE4_2, WRR, Vdq, Vdq, Wdq)
+OPCODE(pcmpestrm, LEG(66, 0F3A, 0, 0x60), SSE4_2, RRR, Vdq, Wdq, Ib)
+OPCODE(pcmpestri, LEG(66, 0F3A, 0, 0x61), SSE4_2, RRR, Vdq, Wdq, Ib)
+OPCODE(pcmpistrm, LEG(66, 0F3A, 0, 0x62), SSE4_2, RRR, Vdq, Wdq, Ib)
+OPCODE(pcmpistri, LEG(66, 0F3A, 0, 0x63), SSE4_2, RRR, Vdq, Wdq, Ib)
 OPCODE(ptest, LEG(66, 0F38, 0, 0x17), SSE4_1, RR, Vdq, Wdq)
 OPCODE(cmpps, LEG(NP, 0F, 0, 0xc2), SSE, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(cmppd, LEG(66, 0F, 0, 0xc2), SSE2, WRRR, Vdq, Vdq, Wdq, Ib)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 57/75] target/i386: introduce AES and PCLMULQDQ code generators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (54 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 55/75] target/i386: introduce SSE4.2 vector instructions to sse-opcode.inc.h Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 58/75] target/i386: introduce AES and PCLMULQDQ vector instructions to sse-opcode.inc.h Jan Bobek
                   ` (17 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Introduce code generators required by AES and PCLMULQDQ instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 75b0a818f2..14117c2993 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -6004,6 +6004,21 @@ DEF_GEN_INSN3_HELPER_EPPI(roundpd, roundpd_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN3_HELPER_EPPI(roundss, roundss_xmm, Vd, Wd, Ib)
 DEF_GEN_INSN3_HELPER_EPPI(roundsd, roundsd_xmm, Vq, Wq, Ib)
 
+DEF_GEN_INSN3_HELPER_EPP(aesdec, aesdec_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vaesdec, aesdec_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(aesdeclast, aesdeclast_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vaesdeclast, aesdeclast_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(aesenc, aesenc_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vaesenc, aesenc_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(aesenclast, aesenclast_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vaesenclast, aesenclast_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(aesimc, aesimc_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vaesimc, aesimc_xmm, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPPI(aeskeygenassist, aeskeygenassist_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(vaeskeygenassist, aeskeygenassist_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(pclmulqdq, pclmulqdq_xmm, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vpclmulqdq, pclmulqdq_xmm, Vdq, Hdq, Wdq, Ib)
+
 DEF_GEN_INSN3_GVEC(pcmpeqb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqb, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_EQ)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 58/75] target/i386: introduce AES and PCLMULQDQ vector instructions to sse-opcode.inc.h
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (55 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 57/75] target/i386: introduce AES and PCLMULQDQ code generators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-22  4:02   ` Aleksandar Markovic
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 59/75] target/i386: introduce AVX translators Jan Bobek
                   ` (16 subsequent siblings)
  73 siblings, 1 reply; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add all the AES and PCLMULQDQ vector instruction entries to sse-opcode.inc.h.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/sse-opcode.inc.h | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
index f43436213e..1359508424 100644
--- a/target/i386/sse-opcode.inc.h
+++ b/target/i386/sse-opcode.inc.h
@@ -449,6 +449,26 @@
  * 66 0F 3A 61 /r imm8     PCMPESTRI xmm1, xmm2/m128, imm8
  * 66 0F 3A 62 /r imm8     PCMPISTRM xmm1, xmm2/m128, imm8
  * 66 0F 3A 63 /r imm8     PCMPISTRI xmm1, xmm2/m128, imm8
+ *
+ * AES Instructions
+ * -----------------
+ * 66 0F 38 DE /r                  AESDEC xmm1, xmm2/m128
+ * VEX.128.66.0F38.WIG DE /r       VAESDEC xmm1, xmm2, xmm3/m128
+ * 66 0F 38 DF /r                  AESDECLAST xmm1, xmm2/m128
+ * VEX.128.66.0F38.WIG DF /r       VAESDECLAST xmm1, xmm2, xmm3/m128
+ * 66 0F 38 DC /r                  AESENC xmm1, xmm2/m128
+ * VEX.128.66.0F38.WIG DC /r       VAESENC xmm1, xmm2, xmm3/m128
+ * 66 0F 38 DD /r                  AESENCLAST xmm1, xmm2/m128
+ * VEX.128.66.0F38.WIG DD /r       VAESENCLAST xmm1, xmm2, xmm3/m128
+ * 66 0F 38 DB /r                  AESIMC xmm1, xmm2/m128
+ * VEX.128.66.0F38.WIG DB /r       VAESIMC xmm1, xmm2/m128
+ * 66 0F 3A DF /r ib               AESKEYGENASSIST xmm1, xmm2/m128, imm8
+ * VEX.128.66.0F3A.WIG DF /r ib    VAESKEYGENASSIST xmm1, xmm2/m128, imm8
+ *
+ * PCLMULQDQ Instructions
+ * -----------------------
+ * 66 0F 3A 44 /r ib               PCLMULQDQ xmm1, xmm2/m128, imm8
+ * VEX.128.66.0F3A.WIG 44 /r ib    VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8
  */
 
 OPCODE(movd, LEG(NP, 0F, 0, 0x6e), MMX, WR, Pq, Ed)
@@ -641,6 +661,20 @@ OPCODE(roundps, LEG(66, 0F3A, 0, 0x08), SSE4_1, WRR, Vdq, Wdq, Ib)
 OPCODE(roundpd, LEG(66, 0F3A, 0, 0x09), SSE4_1, WRR, Vdq, Wdq, Ib)
 OPCODE(roundss, LEG(66, 0F3A, 0, 0x0a), SSE4_1, WRR, Vd, Wd, Ib)
 OPCODE(roundsd, LEG(66, 0F3A, 0, 0x0b), SSE4_1, WRR, Vq, Wq, Ib)
+OPCODE(aesdec, LEG(66, 0F38, 0, 0xde), AES, WRR, Vdq, Vdq, Wdq)
+OPCODE(vaesdec, VEX(128, 66, 0F38, IG, 0xde), AES_AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(aesdeclast, LEG(66, 0F38, 0, 0xdf), AES, WRR, Vdq, Vdq, Wdq)
+OPCODE(vaesdeclast, VEX(128, 66, 0F38, IG, 0xdf), AES_AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(aesenc, LEG(66, 0F38, 0, 0xdc), AES, WRR, Vdq, Vdq, Wdq)
+OPCODE(vaesenc, VEX(128, 66, 0F38, IG, 0xdc), AES_AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(aesenclast, LEG(66, 0F38, 0, 0xdd), AES, WRR, Vdq, Vdq, Wdq)
+OPCODE(vaesenclast, VEX(128, 66, 0F38, IG, 0xdd), AES_AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(aesimc, LEG(66, 0F38, 0, 0xdb), AES, WR, Vdq, Wdq)
+OPCODE(vaesimc, VEX(128, 66, 0F38, IG, 0xdb), AES_AVX, WR, Vdq, Wdq)
+OPCODE(aeskeygenassist, LEG(66, 0F3A, 0, 0xdf), AES, WRR, Vdq, Wdq, Ib)
+OPCODE(vaeskeygenassist, VEX(128, 66, 0F3A, IG, 0xdf), AES_AVX, WRR, Vdq, Wdq, Ib)
+OPCODE(pclmulqdq, LEG(66, 0F3A, 0, 0x44), PCLMULQDQ, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(vpclmulqdq, VEX(128, 66, 0F3A, IG, 0x44), PCLMULQDQ_AVX, WRRR, Vdq, Hdq, Wdq, Ib)
 OPCODE(pcmpeqb, LEG(NP, 0F, 0, 0x74), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqb, LEG(66, 0F, 0, 0x74), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(pcmpeqw, LEG(NP, 0F, 0, 0x75), MMX, WRR, Pq, Pq, Qq)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 59/75] target/i386: introduce AVX translators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (56 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 58/75] target/i386: introduce AES and PCLMULQDQ vector instructions to sse-opcode.inc.h Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 60/75] target/i386: introduce AVX code generators Jan Bobek
                   ` (15 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Use the translator macros to define translators required by AVX
instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 48 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 14117c2993..9b9f0d4ed1 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -6708,10 +6708,12 @@ DEF_TRANSLATE_INSN2(Eq, Pq)
 DEF_TRANSLATE_INSN2(Eq, Vdq)
 DEF_TRANSLATE_INSN2(Gd, Nq)
 DEF_TRANSLATE_INSN2(Gd, Udq)
+DEF_TRANSLATE_INSN2(Gd, Uqq)
 DEF_TRANSLATE_INSN2(Gd, Wd)
 DEF_TRANSLATE_INSN2(Gd, Wq)
 DEF_TRANSLATE_INSN2(Gq, Nq)
 DEF_TRANSLATE_INSN2(Gq, Udq)
+DEF_TRANSLATE_INSN2(Gq, Uqq)
 DEF_TRANSLATE_INSN2(Gq, Wd)
 DEF_TRANSLATE_INSN2(Gq, Wq)
 DEF_TRANSLATE_INSN2(Md, Gd)
@@ -6720,6 +6722,7 @@ DEF_TRANSLATE_INSN2(Mq, Gq)
 DEF_TRANSLATE_INSN2(Mq, Pq)
 DEF_TRANSLATE_INSN2(Mq, Vdq)
 DEF_TRANSLATE_INSN2(Mq, Vq)
+DEF_TRANSLATE_INSN2(Mqq, Vqq)
 DEF_TRANSLATE_INSN2(Pq, Ed)
 DEF_TRANSLATE_INSN2(Pq, Eq)
 DEF_TRANSLATE_INSN2(Pq, Nq)
@@ -6735,6 +6738,7 @@ DEF_TRANSLATE_INSN2(Vd, Wd)
 DEF_TRANSLATE_INSN2(Vd, Wq)
 DEF_TRANSLATE_INSN2(Vdq, Ed)
 DEF_TRANSLATE_INSN2(Vdq, Eq)
+DEF_TRANSLATE_INSN2(Vdq, Md)
 DEF_TRANSLATE_INSN2(Vdq, Mdq)
 DEF_TRANSLATE_INSN2(Vdq, Nq)
 DEF_TRANSLATE_INSN2(Vdq, Qq)
@@ -6742,14 +6746,22 @@ DEF_TRANSLATE_INSN2(Vdq, Udq)
 DEF_TRANSLATE_INSN2(Vdq, Wd)
 DEF_TRANSLATE_INSN2(Vdq, Wdq)
 DEF_TRANSLATE_INSN2(Vdq, Wq)
+DEF_TRANSLATE_INSN2(Vdq, Wqq)
 DEF_TRANSLATE_INSN2(Vdq, Ww)
 DEF_TRANSLATE_INSN2(Vq, Ed)
 DEF_TRANSLATE_INSN2(Vq, Eq)
 DEF_TRANSLATE_INSN2(Vq, Wd)
 DEF_TRANSLATE_INSN2(Vq, Wq)
+DEF_TRANSLATE_INSN2(Vqq, Md)
+DEF_TRANSLATE_INSN2(Vqq, Mdq)
+DEF_TRANSLATE_INSN2(Vqq, Mq)
+DEF_TRANSLATE_INSN2(Vqq, Mqq)
+DEF_TRANSLATE_INSN2(Vqq, Wdq)
+DEF_TRANSLATE_INSN2(Vqq, Wqq)
 DEF_TRANSLATE_INSN2(Wd, Vd)
 DEF_TRANSLATE_INSN2(Wdq, Vdq)
 DEF_TRANSLATE_INSN2(Wq, Vq)
+DEF_TRANSLATE_INSN2(Wqq, Vqq)
 DEF_TRANSLATE_INSN2(modrm_mod, modrm)
 
 #define DEF_TRANSLATE_INSN3(opT1, opT2, opT3)                           \
@@ -6796,6 +6808,9 @@ DEF_TRANSLATE_INSN3(Gd, Nq, Ib)
 DEF_TRANSLATE_INSN3(Gd, Udq, Ib)
 DEF_TRANSLATE_INSN3(Gq, Nq, Ib)
 DEF_TRANSLATE_INSN3(Gq, Udq, Ib)
+DEF_TRANSLATE_INSN3(Hdq, Udq, Ib)
+DEF_TRANSLATE_INSN3(Mdq, Hdq, Vdq)
+DEF_TRANSLATE_INSN3(Mqq, Hqq, Vqq)
 DEF_TRANSLATE_INSN3(Nq, Nq, Ib)
 DEF_TRANSLATE_INSN3(Pq, Pq, Qd)
 DEF_TRANSLATE_INSN3(Pq, Pq, Qq)
@@ -6803,17 +6818,34 @@ DEF_TRANSLATE_INSN3(Pq, Qq, Ib)
 DEF_TRANSLATE_INSN3(RdMb, Vdq, Ib)
 DEF_TRANSLATE_INSN3(RdMw, Vdq, Ib)
 DEF_TRANSLATE_INSN3(Udq, Udq, Ib)
+DEF_TRANSLATE_INSN3(Vd, Hd, Ed)
+DEF_TRANSLATE_INSN3(Vd, Hd, Eq)
+DEF_TRANSLATE_INSN3(Vd, Hd, Wd)
+DEF_TRANSLATE_INSN3(Vd, Hd, Wq)
 DEF_TRANSLATE_INSN3(Vd, Vd, Wd)
 DEF_TRANSLATE_INSN3(Vd, Wd, Ib)
+DEF_TRANSLATE_INSN3(Vdq, Hdq, Mdq)
+DEF_TRANSLATE_INSN3(Vdq, Hdq, Mq)
+DEF_TRANSLATE_INSN3(Vdq, Hdq, UdqMhq)
 DEF_TRANSLATE_INSN3(Vdq, Hdq, Wdq)
+DEF_TRANSLATE_INSN3(Vdq, Hq, Mq)
+DEF_TRANSLATE_INSN3(Vdq, Hq, Wq)
 DEF_TRANSLATE_INSN3(Vdq, Vdq, Mq)
 DEF_TRANSLATE_INSN3(Vdq, Vdq, UdqMhq)
 DEF_TRANSLATE_INSN3(Vdq, Vdq, Wdq)
 DEF_TRANSLATE_INSN3(Vdq, Vq, Mq)
 DEF_TRANSLATE_INSN3(Vdq, Vq, Wq)
 DEF_TRANSLATE_INSN3(Vdq, Wdq, Ib)
+DEF_TRANSLATE_INSN3(Vq, Hq, Ed)
+DEF_TRANSLATE_INSN3(Vq, Hq, Eq)
+DEF_TRANSLATE_INSN3(Vq, Hq, Wd)
+DEF_TRANSLATE_INSN3(Vq, Hq, Wq)
 DEF_TRANSLATE_INSN3(Vq, Vq, Wq)
 DEF_TRANSLATE_INSN3(Vq, Wq, Ib)
+DEF_TRANSLATE_INSN3(Vqq, Hqq, Mqq)
+DEF_TRANSLATE_INSN3(Vqq, Hqq, Wqq)
+DEF_TRANSLATE_INSN3(Vqq, Wqq, Ib)
+DEF_TRANSLATE_INSN3(Wdq, Vqq, Ib)
 
 #define DEF_TRANSLATE_INSN4(opT1, opT2, opT3, opT4)                     \
     static void translate_insn4(opT1, opT2, opT3, opT4)(                \
@@ -6861,8 +6893,15 @@ DEF_TRANSLATE_INSN3(Vq, Wq, Ib)
 
 DEF_TRANSLATE_INSN4(Pq, Pq, Qq, Ib)
 DEF_TRANSLATE_INSN4(Pq, Pq, RdMw, Ib)
+DEF_TRANSLATE_INSN4(Vd, Hd, Wd, Ib)
 DEF_TRANSLATE_INSN4(Vd, Vd, Wd, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Hdq, Ed, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Hdq, Eq, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Hdq, RdMb, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Hdq, RdMw, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Hdq, Wd, Ib)
 DEF_TRANSLATE_INSN4(Vdq, Hdq, Wdq, Ib)
+DEF_TRANSLATE_INSN4(Vdq, Hdq, Wdq, Ldq)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, Ed, Ib)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, Eq, Ib)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, RdMb, Ib)
@@ -6871,7 +6910,11 @@ DEF_TRANSLATE_INSN4(Vdq, Vdq, Wd, Ib)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, Wd, modrm_mod)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, Wdq, Ib)
 DEF_TRANSLATE_INSN4(Vdq, Vdq, Wq, modrm_mod)
+DEF_TRANSLATE_INSN4(Vq, Hq, Wq, Ib)
 DEF_TRANSLATE_INSN4(Vq, Vq, Wq, Ib)
+DEF_TRANSLATE_INSN4(Vqq, Hqq, Wdq, Ib)
+DEF_TRANSLATE_INSN4(Vqq, Hqq, Wqq, Ib)
+DEF_TRANSLATE_INSN4(Vqq, Hqq, Wqq, Lqq)
 
 #define DEF_TRANSLATE_INSN5(opT1, opT2, opT3, opT4, opT5)               \
     static void translate_insn5(opT1, opT2, opT3, opT4, opT5)(          \
@@ -6924,6 +6967,11 @@ DEF_TRANSLATE_INSN4(Vq, Vq, Wq, Ib)
         }                                                               \
     }
 
+DEF_TRANSLATE_INSN5(Vdq, Hdq, Wd, modrm_mod, vex_v)
+DEF_TRANSLATE_INSN5(Vdq, Hdq, Wq, modrm_mod, vex_v)
+DEF_TRANSLATE_INSN5(Wdq, Hdq, Vd, modrm_mod, vex_v)
+DEF_TRANSLATE_INSN5(Wdq, Hdq, Vq, modrm_mod, vex_v)
+
 #define OPCODE_GRP_BEGIN(grpname)                                       \
     static void translate_group(grpname)(                               \
         CPUX86State *env, DisasContext *s, int modrm)                   \
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 60/75] target/i386: introduce AVX code generators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (57 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 59/75] target/i386: introduce AVX translators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 61/75] target/i386: introduce AVX vector instructions to sse-opcode.inc.h Jan Bobek
                   ` (14 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Introduce code generators required by AVX instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 954 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 954 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 9b9f0d4ed1..50eab9181c 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5586,6 +5586,14 @@ GEN_INSN2(movd, Ed, Vdq)
 {
     tcg_gen_ld_i32(arg1, cpu_env, arg2 + offsetof(ZMMReg, ZMM_L(0)));
 }
+GEN_INSN2(vmovd, Vdq, Ed)
+{
+    gen_insn2(movd, Vdq, Ed)(env, s, arg1, arg2);
+}
+GEN_INSN2(vmovd, Ed, Vdq)
+{
+    gen_insn2(movd, Ed, Vdq)(env, s, arg1, arg2);
+}
 
 GEN_INSN2(movq, Pq, Eq)
 {
@@ -5607,6 +5615,14 @@ GEN_INSN2(movq, Eq, Vdq)
 {
     tcg_gen_ld_i64(arg1, cpu_env, arg2 + offsetof(ZMMReg, ZMM_Q(0)));
 }
+GEN_INSN2(vmovq, Vdq, Eq)
+{
+    gen_insn2(movq, Vdq, Eq)(env, s, arg1, arg2);
+}
+GEN_INSN2(vmovq, Eq, Vdq)
+{
+    gen_insn2(movq, Eq, Vdq)(env, s, arg1, arg2);
+}
 
 DEF_GEN_INSN2_GVEC(movq, Pq, Qq, mov, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movq, Qq, Pq, mov, MM_OPRSZ, MM_MAXSZ, MO_64)
@@ -5621,19 +5637,51 @@ GEN_INSN2(movq, UdqMq, Vq)
 {
     gen_insn2(movq, Vdq, Wq)(env, s, arg1, arg2);
 }
+GEN_INSN2(vmovq, Vdq, Wq)
+{
+    gen_insn2(movq, Vdq, Wq)(env, s, arg1, arg2);
+}
+GEN_INSN2(vmovq, UdqMq, Vq)
+{
+    gen_insn2(movq, UdqMq, Vq)(env, s, arg1, arg2);
+}
 
 DEF_GEN_INSN2_GVEC(movaps, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movaps, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovaps, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovaps, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovaps, Vqq, Wqq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovaps, Wqq, Vqq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movapd, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movapd, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovapd, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovapd, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovapd, Vqq, Wqq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovapd, Wqq, Vqq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movdqa, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movdqa, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovdqa, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovdqa, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovdqa, Vqq, Wqq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovdqa, Wqq, Vqq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movups, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movups, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovups, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovups, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovups, Vqq, Wqq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovups, Wqq, Vqq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movupd, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movupd, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovupd, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovupd, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovupd, Vqq, Wqq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovupd, Wqq, Vqq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movdqu, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN2_GVEC(movdqu, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovdqu, Vdq, Wdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovdqu, Wdq, Vdq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovdqu, Vqq, Wqq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN2_GVEC(vmovdqu, Wqq, Vqq, mov, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 
 GEN_INSN4(movss, Vdq, Vdq, Wd, modrm_mod)
 {
@@ -5664,6 +5712,24 @@ GEN_INSN2(movss, Wd, Vd)
 {
     gen_insn4(movss, Vdq, Vdq, Wd, modrm_mod)(env, s, arg1, arg1, arg2, 3);
 }
+GEN_INSN5(vmovss, Vdq, Hdq, Wd, modrm_mod, vex_v)
+{
+    if (arg4 == 3 || arg5 == 0) {
+        gen_insn4(movss, Vdq, Vdq, Wd, modrm_mod)(env, s, arg1,
+                                                  arg2, arg3, arg4);
+    } else {
+        gen_unknown_opcode(env, s);
+    }
+}
+GEN_INSN5(vmovss, Wdq, Hdq, Vd, modrm_mod, vex_v)
+{
+    if (arg4 == 3 || arg5 == 0) {
+        gen_insn4(movss, Vdq, Vdq, Wd, modrm_mod)(env, s, arg1,
+                                                  arg2, arg3, 3);
+    } else {
+        gen_unknown_opcode(env, s);
+    }
+}
 
 GEN_INSN4(movsd, Vdq, Vdq, Wq, modrm_mod)
 {
@@ -5687,6 +5753,24 @@ GEN_INSN2(movsd, Wq, Vq)
 {
     gen_insn4(movsd, Vdq, Vdq, Wq, modrm_mod)(env, s, arg1, arg1, arg2, 3);
 }
+GEN_INSN5(vmovsd, Vdq, Hdq, Wq, modrm_mod, vex_v)
+{
+    if (arg4 == 3 || arg5 == 0) {
+        gen_insn4(movsd, Vdq, Vdq, Wq, modrm_mod)(env, s, arg1,
+                                                  arg2, arg3, arg4);
+    } else {
+        gen_unknown_opcode(env, s);
+    }
+}
+GEN_INSN5(vmovsd, Wdq, Hdq, Vq, modrm_mod, vex_v)
+{
+    if (arg4 == 3 || arg5 == 0) {
+        gen_insn4(movsd, Vdq, Vdq, Wq, modrm_mod)(env, s, arg1,
+                                                  arg2, arg3, 3);
+    } else {
+        gen_unknown_opcode(env, s);
+    }
+}
 
 GEN_INSN2(movq2dq, Vdq, Nq)
 {
@@ -5716,10 +5800,18 @@ GEN_INSN3(movhlps, Vdq, Vdq, UdqMhq)
     }
     tcg_temp_free_i64(r64);
 }
+GEN_INSN3(vmovhlps, Vdq, Hdq, UdqMhq)
+{
+    gen_insn3(movhlps, Vdq, Vdq, UdqMhq)(env, s, arg1, arg2, arg3);
+}
 GEN_INSN2(movlps, Mq, Vq)
 {
     insnop_ldst(xmm, Mq)(env, s, 1, arg2, arg1);
 }
+GEN_INSN2(vmovlps, Mq, Vq)
+{
+    gen_insn2(movlps, Mq, Vq)(env, s, arg1, arg2);
+}
 
 GEN_INSN3(movlpd, Vdq, Vdq, Mq)
 {
@@ -5731,10 +5823,18 @@ GEN_INSN3(movlpd, Vdq, Vdq, Mq)
         tcg_temp_free_i64(r64);
     }
 }
+GEN_INSN3(vmovlpd, Vdq, Hdq, Mq)
+{
+    gen_insn3(movlpd, Vdq, Vdq, Mq)(env, s, arg1, arg2, arg3);
+}
 GEN_INSN2(movlpd, Mq, Vq)
 {
     insnop_ldst(xmm, Mq)(env, s, 1, arg2, arg1);
 }
+GEN_INSN2(vmovlpd, Mq, Vq)
+{
+    gen_insn2(movlpd, Mq, Vq)(env, s, arg1, arg2);
+}
 
 GEN_INSN3(movlhps, Vdq, Vq, Wq)
 {
@@ -5747,10 +5847,18 @@ GEN_INSN3(movlhps, Vdq, Vq, Wq)
     }
     tcg_temp_free_i64(r64);
 }
+GEN_INSN3(vmovlhps, Vdq, Hq, Wq)
+{
+    gen_insn3(movlhps, Vdq, Vq, Wq)(env, s, arg1, arg2, arg3);
+}
 GEN_INSN2(movhps, Mq, Vdq)
 {
     insnop_ldst(xmm, Mhq)(env, s, 1, arg2, arg1);
 }
+GEN_INSN2(vmovhps, Mq, Vdq)
+{
+    gen_insn2(movhps, Mq, Vdq)(env, s, arg1, arg2);
+}
 
 GEN_INSN3(movhpd, Vdq, Vq, Mq)
 {
@@ -5766,6 +5874,14 @@ GEN_INSN2(movhpd, Mq, Vdq)
 {
     insnop_ldst(xmm, Mhq)(env, s, 1, arg2, arg1);
 }
+GEN_INSN3(vmovhpd, Vdq, Hq, Mq)
+{
+    gen_insn3(movhpd, Vdq, Vq, Mq)(env, s, arg1, arg2, arg3);
+}
+GEN_INSN2(vmovhpd, Mq, Vdq)
+{
+    gen_insn2(movhpd, Mq, Vdq)(env, s, arg1, arg2);
+}
 
 DEF_GEN_INSN2_HELPER_DEP(pmovmskb, pmovmskb_mmx, Gd, Nq)
 GEN_INSN2(pmovmskb, Gq, Nq)
@@ -5783,6 +5899,14 @@ GEN_INSN2(pmovmskb, Gq, Udq)
     tcg_gen_extu_i32_i64(arg1, arg1_r32);
     tcg_temp_free_i32(arg1_r32);
 }
+DEF_GEN_INSN2_HELPER_DEP(vpmovmskb, pmovmskb_xmm, Gd, Udq)
+GEN_INSN2(vpmovmskb, Gq, Udq)
+{
+    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
+    gen_insn2(vpmovmskb, Gd, Udq)(env, s, arg1_r32, arg2);
+    tcg_gen_extu_i32_i64(arg1, arg1_r32);
+    tcg_temp_free_i32(arg1_r32);
+}
 
 DEF_GEN_INSN2_HELPER_DEP(movmskps, movmskps, Gd, Udq)
 GEN_INSN2(movmskps, Gq, Udq)
@@ -5792,6 +5916,22 @@ GEN_INSN2(movmskps, Gq, Udq)
     tcg_gen_extu_i32_i64(arg1, arg1_r32);
     tcg_temp_free_i32(arg1_r32);
 }
+DEF_GEN_INSN2_HELPER_DEP(vmovmskps, movmskps, Gd, Udq)
+GEN_INSN2(vmovmskps, Gq, Udq)
+{
+    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
+    gen_insn2(vmovmskps, Gd, Udq)(env, s, arg1_r32, arg2);
+    tcg_gen_extu_i32_i64(arg1, arg1_r32);
+    tcg_temp_free_i32(arg1_r32);
+}
+DEF_GEN_INSN2_HELPER_DEP(vmovmskps, movmskps, Gd, Uqq)
+GEN_INSN2(vmovmskps, Gq, Uqq)
+{
+    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
+    gen_insn2(vmovmskps, Gd, Uqq)(env, s, arg1_r32, arg2);
+    tcg_gen_extu_i32_i64(arg1, arg1_r32);
+    tcg_temp_free_i32(arg1_r32);
+}
 
 DEF_GEN_INSN2_HELPER_DEP(movmskpd, movmskpd, Gd, Udq)
 GEN_INSN2(movmskpd, Gq, Udq)
@@ -5801,11 +5941,35 @@ GEN_INSN2(movmskpd, Gq, Udq)
     tcg_gen_extu_i32_i64(arg1, arg1_r32);
     tcg_temp_free_i32(arg1_r32);
 }
+DEF_GEN_INSN2_HELPER_DEP(vmovmskpd, movmskpd, Gd, Udq)
+GEN_INSN2(vmovmskpd, Gq, Udq)
+{
+    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
+    gen_insn2(vmovmskpd, Gd, Udq)(env, s, arg1_r32, arg2);
+    tcg_gen_extu_i32_i64(arg1, arg1_r32);
+    tcg_temp_free_i32(arg1_r32);
+}
+DEF_GEN_INSN2_HELPER_DEP(vmovmskpd, movmskpd, Gd, Uqq)
+GEN_INSN2(vmovmskpd, Gq, Uqq)
+{
+    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
+    gen_insn2(vmovmskpd, Gd, Uqq)(env, s, arg1_r32, arg2);
+    tcg_gen_extu_i32_i64(arg1, arg1_r32);
+    tcg_temp_free_i32(arg1_r32);
+}
 
 GEN_INSN2(lddqu, Vdq, Mdq)
 {
     insnop_ldst(xmm, Mdq)(env, s, 0, arg1, arg2);
 }
+GEN_INSN2(vlddqu, Vdq, Mdq)
+{
+    gen_insn2(lddqu, Vdq, Mdq)(env, s, arg1, arg2);
+}
+GEN_INSN2(vlddqu, Vqq, Mqq)
+{
+    gen_insn2(vlddqu, Vdq, Mdq)(env, s, arg1, arg2);
+}
 
 GEN_INSN2(movshdup, Vdq, Wdq)
 {
@@ -5825,6 +5989,15 @@ GEN_INSN2(movshdup, Vdq, Wdq)
 
     tcg_temp_free_i32(r32);
 }
+GEN_INSN2(vmovshdup, Vdq, Wdq)
+{
+    gen_insn2(movshdup, Vdq, Wdq)(env, s, arg1, arg2);
+}
+GEN_INSN2(vmovshdup, Vqq, Wqq)
+{
+    gen_insn2(vmovshdup, Vdq, Wdq)(env, s, arg1, arg2);
+}
+
 GEN_INSN2(movsldup, Vdq, Wdq)
 {
     const TCGv_i32 r32 = tcg_temp_new_i32();
@@ -5843,6 +6016,15 @@ GEN_INSN2(movsldup, Vdq, Wdq)
 
     tcg_temp_free_i32(r32);
 }
+GEN_INSN2(vmovsldup, Vdq, Wdq)
+{
+    gen_insn2(movsldup, Vdq, Wdq)(env, s, arg1, arg2);
+}
+GEN_INSN2(vmovsldup, Vqq, Wqq)
+{
+    gen_insn2(movsldup, Vdq, Wdq)(env, s, arg1, arg2);
+}
+
 GEN_INSN2(movddup, Vdq, Wq)
 {
     const TCGv_i64 r64 = tcg_temp_new_i64();
@@ -5855,154 +6037,285 @@ GEN_INSN2(movddup, Vdq, Wq)
 
     tcg_temp_free_i64(r64);
 }
+GEN_INSN2(vmovddup, Vdq, Wq)
+{
+    gen_insn2(movddup, Vdq, Wq)(env, s, arg1, arg2);
+}
+GEN_INSN2(vmovddup, Vqq, Wqq)
+{
+    gen_insn2(vmovddup, Vdq, Wq)(env, s, arg1, arg2);
+}
 
 DEF_GEN_INSN3_GVEC(paddb, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddb, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpaddb, Vdq, Hdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddw, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddw, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpaddw, Vdq, Hdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddd, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(paddd, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(vpaddd, Vdq, Hdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(paddq, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(paddq, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vpaddq, Vdq, Hdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(paddsb, Pq, Pq, Qq, ssadd, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddsb, Vdq, Vdq, Wdq, ssadd, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpaddsb, Vdq, Hdq, Wdq, ssadd, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddsw, Pq, Pq, Qq, ssadd, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddsw, Vdq, Vdq, Wdq, ssadd, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpaddsw, Vdq, Hdq, Wdq, ssadd, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddusb, Pq, Pq, Qq, usadd, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddusb, Vdq, Vdq, Wdq, usadd, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpaddusb, Vdq, Hdq, Wdq, usadd, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddusw, Pq, Pq, Qq, usadd, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddusw, Vdq, Vdq, Wdq, usadd, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpaddusw, Vdq, Hdq, Wdq, usadd, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_HELPER_EPP(addps, addps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vaddps, addps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vaddps, addps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(addpd, addpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vaddpd, addpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vaddpd, addpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(addss, addss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vaddss, addss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(addsd, addsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vaddsd, addsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(phaddw, phaddw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(phaddw, phaddw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vphaddw, phaddw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(phaddd, phaddd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(phaddd, phaddd_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vphaddd, phaddd_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(phaddsw, phaddsw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(phaddsw, phaddsw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vphaddsw, phaddsw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(haddps, haddps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vhaddps, haddps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vhaddps, haddps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(haddpd, haddpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vhaddpd, haddpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vhaddpd, haddpd, Vqq, Hqq, Wqq)
 
 DEF_GEN_INSN3_GVEC(psubb, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubb, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpsubb, Vdq, Hdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubw, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubw, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpsubw, Vdq, Hdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubd, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(psubd, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(vpsubd, Vdq, Hdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(psubq, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(psubq, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vpsubq, Vdq, Hdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(psubsb, Pq, Pq, Qq, sssub, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubsb, Vdq, Vdq, Wdq, sssub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpsubsb, Vdq, Hdq, Wdq, sssub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubsw, Pq, Pq, Qq, sssub, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubsw, Vdq, Vdq, Wdq, sssub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpsubsw, Vdq, Hdq, Wdq, sssub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubusb, Pq, Pq, Qq, ussub, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubusb, Vdq, Vdq, Wdq, ussub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpsubusb, Vdq, Hdq, Wdq, ussub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubusw, Pq, Pq, Qq, ussub, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubusw, Vdq, Vdq, Wdq, ussub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpsubusw, Vdq, Hdq, Wdq, ussub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_HELPER_EPP(subps, subps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vsubps, subps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vsubps, subps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(subpd, subpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vsubpd, subpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vsubpd, subpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(subss, subss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vsubss, subss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(subsd, subsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vsubsd, subsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(phsubw, phsubw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(phsubw, phsubw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vphsubw, phsubw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(phsubd, phsubd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(phsubd, phsubd_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vphsubd, phsubd_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(phsubsw, phsubsw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(phsubsw, phsubsw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vphsubsw, phsubsw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(hsubps, hsubps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vhsubps, hsubps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vhsubps, hsubps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(hsubpd, hsubpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vhsubpd, hsubpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vhsubpd, hsubpd, Vqq, Hqq, Wqq)
 
 DEF_GEN_INSN3_HELPER_EPP(addsubps, addsubps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vaddsubps, addsubps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vaddsubps, addsubps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(addsubpd, addsubpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vaddsubpd, addsubpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vaddsubpd, addsubpd, Vqq, Hqq, Wqq)
 
 DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmullw, pmullw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmulld, pmulld_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmulld, pmulld_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhw, pmulhw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhw, pmulhw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmulhw, pmulhw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmulhuw, pmulhuw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmuldq, pmuldq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmuldq, pmuldq_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmuludq, pmuludq_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmuludq, pmuludq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmuludq, pmuludq_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhrsw, pmulhrsw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhrsw, pmulhrsw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmulhrsw, pmulhrsw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(mulps, mulps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vmulps, mulps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vmulps, mulps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(mulpd, mulpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vmulpd, mulpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vmulpd, mulpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(mulss, mulss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vmulss, mulss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(mulsd, mulsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vmulsd, mulsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(pmaddwd, pmaddwd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmaddwd, pmaddwd_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmaddwd, pmaddwd_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pmaddubsw, pmaddubsw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmaddubsw, pmaddubsw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmaddubsw, pmaddubsw_xmm, Vdq, Hdq, Wdq)
 
 DEF_GEN_INSN3_HELPER_EPP(divps, divps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vdivps, divps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vdivps, divps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(divpd, divpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vdivpd, divpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vdivpd, divpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(divss, divss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vdivss, divss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(divsd, divsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vdivsd, divsd, Vq, Hq, Wq)
 
 DEF_GEN_INSN2_HELPER_EPP(rcpps, rcpps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vrcpps, rcpps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vrcpps, rcpps, Vqq, Wqq)
 DEF_GEN_INSN2_HELPER_EPP(rcpss, rcpss, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vrcpss, rcpss, Vd, Hd, Wd)
 DEF_GEN_INSN2_HELPER_EPP(sqrtps, sqrtps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vsqrtps, sqrtps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vsqrtps, sqrtps, Vqq, Wqq)
 DEF_GEN_INSN2_HELPER_EPP(sqrtpd, sqrtpd, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vsqrtpd, sqrtpd, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vsqrtpd, sqrtpd, Vqq, Wqq)
 DEF_GEN_INSN2_HELPER_EPP(sqrtss, sqrtss, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vsqrtss, sqrtss, Vd, Hd, Wd)
 DEF_GEN_INSN2_HELPER_EPP(sqrtsd, sqrtsd, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vsqrtsd, sqrtsd, Vq, Hq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(rsqrtps, rsqrtps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vrsqrtps, rsqrtps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vrsqrtps, rsqrtps, Vqq, Wqq)
 DEF_GEN_INSN2_HELPER_EPP(rsqrtss, rsqrtss, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vrsqrtss, rsqrtss, Vd, Hd, Wd)
 
 DEF_GEN_INSN3_GVEC(pminub, Pq, Pq, Qq, umin, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pminub, Vdq, Vdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpminub, Vdq, Hdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pminuw, Vdq, Vdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpminuw, Vdq, Hdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pminud, Vdq, Vdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(vpminud, Vdq, Hdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(pminsb, Vdq, Vdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpminsb, Vdq, Hdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pminsw, Pq, Pq, Qq, smin, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pminsw, Vdq, Vdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpminsw, Vdq, Hdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pminsd, Vdq, Vdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(vpminsd, Vdq, Hdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_HELPER_EPP(minps, minps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vminps, minps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vminps, minps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(minpd, minpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vminpd, minpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vminpd, minpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(minss, minss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vminss, minss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(minsd, minsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vminsd, minsd, Vq, Hq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(phminposuw, phminposuw_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vphminposuw, phminposuw_xmm, Vdq, Wdq)
 DEF_GEN_INSN3_GVEC(pmaxub, Pq, Pq, Qq, umax, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pmaxub, Vdq, Vdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpmaxub, Vdq, Hdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pmaxuw, Vdq, Vdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpmaxuw, Vdq, Hdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pmaxud, Vdq, Vdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(vpmaxud, Vdq, Hdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(pmaxsb, Vdq, Vdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpmaxsb, Vdq, Hdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pmaxsw, Pq, Pq, Qq, smax, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pmaxsw, Vdq, Vdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpmaxsw, Vdq, Hdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pmaxsd, Vdq, Vdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(vpmaxsd, Vdq, Hdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_HELPER_EPP(maxps, maxps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vmaxps, maxps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vmaxps, maxps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(maxpd, maxpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vmaxpd, maxpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vmaxpd, maxpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(maxss, maxss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vmaxss, maxss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(maxsd, maxsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vmaxsd, maxsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(pavgb, pavgb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pavgb, pavgb_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpavgb, pavgb_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpavgw, pavgw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsadbw, psadbw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN4_HELPER_EPPI(mpsadbw, mpsadbw_xmm, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vmpsadbw, mpsadbw_xmm, Vdq, Hdq, Wdq, Ib)
 DEF_GEN_INSN2_HELPER_EPP(pabsb, pabsb_mmx, Pq, Qq)
 DEF_GEN_INSN2_HELPER_EPP(pabsb, pabsb_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vpabsb, pabsb_xmm, Vdq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(pabsw, pabsw_mmx, Pq, Qq)
 DEF_GEN_INSN2_HELPER_EPP(pabsw, pabsw_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vpabsw, pabsw_xmm, Vdq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(pabsd, pabsd_mmx, Pq, Qq)
 DEF_GEN_INSN2_HELPER_EPP(pabsd, pabsd_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vpabsd, pabsd_xmm, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psignb, psignb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psignb, psignb_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsignb, psignb_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psignw, psignw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psignw, psignw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsignw, psignw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psignd, psignd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psignd, psignd_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsignd, psignd_xmm, Vdq, Hdq, Wdq)
 
 DEF_GEN_INSN4_HELPER_EPPI(dpps, dpps_xmm, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vdpps, dpps_xmm, Vdq, Hdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vdpps, dpps_xmm, Vqq, Hqq, Wqq, Ib)
 DEF_GEN_INSN4_HELPER_EPPI(dppd, dppd_xmm, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vdppd, dppd_xmm, Vdq, Hdq, Wdq, Ib)
 DEF_GEN_INSN3_HELPER_EPPI(roundps, roundps_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(vroundps, roundps_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(vroundps, roundps_xmm, Vqq, Wqq, Ib)
 DEF_GEN_INSN3_HELPER_EPPI(roundpd, roundpd_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(vroundpd, roundpd_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(vroundpd, roundpd_xmm, Vqq, Wqq, Ib)
 DEF_GEN_INSN3_HELPER_EPPI(roundss, roundss_xmm, Vd, Wd, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vroundss, roundss_xmm, Vd, Hd, Wd, Ib)
 DEF_GEN_INSN3_HELPER_EPPI(roundsd, roundsd_xmm, Vq, Wq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vroundsd, roundsd_xmm, Vq, Hq, Wq, Ib)
 
 DEF_GEN_INSN3_HELPER_EPP(aesdec, aesdec_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vaesdec, aesdec_xmm, Vdq, Hdq, Wdq)
@@ -6021,58 +6334,124 @@ DEF_GEN_INSN4_HELPER_EPPI(vpclmulqdq, pclmulqdq_xmm, Vdq, Hdq, Wdq, Ib)
 
 DEF_GEN_INSN3_GVEC(pcmpeqb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqb, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(vpcmpeqb, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqw, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(vpcmpeqw, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqd, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(vpcmpeqd, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqq, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_64, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(vpcmpeqq, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_64, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpgtb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtb, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(vpcmpgtb, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtw, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(vpcmpgtw, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtd, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(vpcmpgtd, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtq, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_64, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(vpcmpgtq, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_64, TCG_COND_GT)
 
 DEF_GEN_INSN3_HELPER_EPPI(pcmpestrm, pcmpestrm_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(vpcmpestrm, pcmpestrm_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN3_HELPER_EPPI(pcmpestri, pcmpestri_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(vpcmpestri, pcmpestri_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN3_HELPER_EPPI(pcmpistrm, pcmpistrm_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(vpcmpistrm, pcmpistrm_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN3_HELPER_EPPI(pcmpistri, pcmpistri_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_EPPI(vpcmpistri, pcmpistri_xmm, Vdq, Wdq, Ib)
 
 DEF_GEN_INSN2_HELPER_EPP(ptest, ptest_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vptest, ptest_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vptest, ptest_xmm, Vqq, Wqq)
+DEF_GEN_INSN2_HELPER_EPP(vtestps, ptest_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vtestps, ptest_xmm, Vqq, Wqq)
+DEF_GEN_INSN2_HELPER_EPP(vtestpd, ptest_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vtestpd, ptest_xmm, Vqq, Wqq)
 
 DEF_GEN_INSN3_HELPER_EPP(cmpeqps, cmpeqps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpeqps, cmpeqps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpeqps, cmpeqps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpeqpd, cmpeqpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpeqpd, cmpeqpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpeqpd, cmpeqpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpeqss, cmpeqss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vcmpeqss, cmpeqss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(cmpeqsd, cmpeqsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpeqsd, cmpeqsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpltps, cmpltps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpltps, cmpltps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpltps, cmpltps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpltpd, cmpltpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpltpd, cmpltpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpltpd, cmpltpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpltss, cmpltss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vcmpltss, cmpltss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(cmpltsd, cmpltsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpltsd, cmpltsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpleps, cmpleps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpleps, cmpleps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpleps, cmpleps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmplepd, cmplepd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmplepd, cmplepd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmplepd, cmplepd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpless, cmpless, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vcmpless, cmpless, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(cmplesd, cmplesd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vcmplesd, cmplesd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpunordps, cmpunordps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpunordps, cmpunordps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpunordps, cmpunordps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpunordpd, cmpunordpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpunordpd, cmpunordpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpunordpd, cmpunordpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpunordss, cmpunordss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vcmpunordss, cmpunordss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(cmpunordsd, cmpunordsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpunordsd, cmpunordsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpneqps, cmpneqps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpneqps, cmpneqps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpneqps, cmpneqps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpneqpd, cmpneqpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpneqpd, cmpneqpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpneqpd, cmpneqpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpneqss, cmpneqss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vcmpneqss, cmpneqss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(cmpneqsd, cmpneqsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpneqsd, cmpneqsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpnltps, cmpnltps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpnltps, cmpnltps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpnltps, cmpnltps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpnltpd, cmpnltpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpnltpd, cmpnltpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpnltpd, cmpnltpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpnltss, cmpnltss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vcmpnltss, cmpnltss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(cmpnltsd, cmpnltsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpnltsd, cmpnltsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpnleps, cmpnleps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpnleps, cmpnleps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpnleps, cmpnleps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpnlepd, cmpnlepd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpnlepd, cmpnlepd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpnlepd, cmpnlepd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpnless, cmpnless, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vcmpnless, cmpnless, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(cmpnlesd, cmpnlesd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpnlesd, cmpnlesd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(cmpordps, cmpordps, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpordps, cmpordps, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpordps, cmpordps, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpordpd, cmpordpd, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpordpd, cmpordpd, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpordpd, cmpordpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(cmpordss, cmpordss, Vd, Vd, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vcmpordss, cmpordss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(cmpordsd, cmpordsd, Vq, Vq, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vcmpordsd, cmpordsd, Vq, Hq, Wq)
 
 GEN_INSN4(cmpps, Vdq, Vdq, Wdq, Ib)
 {
@@ -6106,6 +6485,70 @@ GEN_INSN4(cmpps, Vdq, Vdq, Wdq, Ib)
     g_assert_not_reached();
 }
 
+GEN_INSN4(vcmpps, Vdq, Hdq, Wdq, Ib)
+{
+    switch (arg4 & 7) {
+    case 0:
+        gen_insn3(vcmpeqps, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 1:
+        gen_insn3(vcmpltps, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 2:
+        gen_insn3(vcmpleps, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 3:
+        gen_insn3(vcmpunordps, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 4:
+        gen_insn3(vcmpneqps, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 5:
+        gen_insn3(vcmpnltps, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 6:
+        gen_insn3(vcmpnleps, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 7:
+        gen_insn3(vcmpordps, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    }
+
+    g_assert_not_reached();
+}
+
+GEN_INSN4(vcmpps, Vqq, Hqq, Wqq, Ib)
+{
+    switch (arg4 & 7) {
+    case 0:
+        gen_insn3(vcmpeqps, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 1:
+        gen_insn3(vcmpltps, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 2:
+        gen_insn3(vcmpleps, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 3:
+        gen_insn3(vcmpunordps, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 4:
+        gen_insn3(vcmpneqps, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 5:
+        gen_insn3(vcmpnltps, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 6:
+        gen_insn3(vcmpnleps, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 7:
+        gen_insn3(vcmpordps, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    }
+
+    g_assert_not_reached();
+}
+
 GEN_INSN4(cmppd, Vdq, Vdq, Wdq, Ib)
 {
     switch (arg4 & 7) {
@@ -6138,6 +6581,70 @@ GEN_INSN4(cmppd, Vdq, Vdq, Wdq, Ib)
     g_assert_not_reached();
 }
 
+GEN_INSN4(vcmppd, Vdq, Hdq, Wdq, Ib)
+{
+    switch (arg4 & 7) {
+    case 0:
+        gen_insn3(vcmpeqpd, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 1:
+        gen_insn3(vcmpltpd, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 2:
+        gen_insn3(vcmplepd, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 3:
+        gen_insn3(vcmpunordpd, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 4:
+        gen_insn3(vcmpneqpd, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 5:
+        gen_insn3(vcmpnltpd, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 6:
+        gen_insn3(vcmpnlepd, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    case 7:
+        gen_insn3(vcmpordpd, Vdq, Hdq, Wdq)(env, s, arg1, arg2, arg3);
+        return;
+    }
+
+    g_assert_not_reached();
+}
+
+GEN_INSN4(vcmppd, Vqq, Hqq, Wqq, Ib)
+{
+    switch (arg4 & 7) {
+    case 0:
+        gen_insn3(vcmpeqpd, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 1:
+        gen_insn3(vcmpltpd, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 2:
+        gen_insn3(vcmplepd, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 3:
+        gen_insn3(vcmpunordpd, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 4:
+        gen_insn3(vcmpneqpd, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 5:
+        gen_insn3(vcmpnltpd, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 6:
+        gen_insn3(vcmpnlepd, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    case 7:
+        gen_insn3(vcmpordpd, Vqq, Hqq, Wqq)(env, s, arg1, arg2, arg3);
+        return;
+    }
+
+    g_assert_not_reached();
+}
+
 GEN_INSN4(cmpss, Vd, Vd, Wd, Ib)
 {
     switch (arg4 & 7) {
@@ -6170,6 +6677,38 @@ GEN_INSN4(cmpss, Vd, Vd, Wd, Ib)
     g_assert_not_reached();
 }
 
+GEN_INSN4(vcmpss, Vd, Hd, Wd, Ib)
+{
+    switch (arg4 & 7) {
+    case 0:
+        gen_insn3(vcmpeqss, Vd, Hd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 1:
+        gen_insn3(vcmpltss, Vd, Hd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 2:
+        gen_insn3(vcmpless, Vd, Hd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 3:
+        gen_insn3(vcmpunordss, Vd, Hd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 4:
+        gen_insn3(vcmpneqss, Vd, Hd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 5:
+        gen_insn3(vcmpnltss, Vd, Hd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 6:
+        gen_insn3(vcmpnless, Vd, Hd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    case 7:
+        gen_insn3(vcmpordss, Vd, Hd, Wd)(env, s, arg1, arg2, arg3);
+        return;
+    }
+
+    g_assert_not_reached();
+}
+
 GEN_INSN4(cmpsd, Vq, Vq, Wq, Ib)
 {
     switch (arg4 & 7) {
@@ -6202,46 +6741,112 @@ GEN_INSN4(cmpsd, Vq, Vq, Wq, Ib)
     g_assert_not_reached();
 }
 
+GEN_INSN4(vcmpsd, Vq, Hq, Wq, Ib)
+{
+    switch (arg4 & 7) {
+    case 0:
+        gen_insn3(vcmpeqsd, Vq, Hq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 1:
+        gen_insn3(vcmpltsd, Vq, Hq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 2:
+        gen_insn3(vcmplesd, Vq, Hq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 3:
+        gen_insn3(vcmpunordsd, Vq, Hq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 4:
+        gen_insn3(vcmpneqsd, Vq, Hq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 5:
+        gen_insn3(vcmpnltsd, Vq, Hq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 6:
+        gen_insn3(vcmpnlesd, Vq, Hq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    case 7:
+        gen_insn3(vcmpordsd, Vq, Hq, Wq)(env, s, arg1, arg2, arg3);
+        return;
+    }
+
+    g_assert_not_reached();
+}
+
 DEF_GEN_INSN2_HELPER_EPP(comiss, comiss, Vd, Wd)
+DEF_GEN_INSN2_HELPER_EPP(vcomiss, comiss, Vd, Wd)
 DEF_GEN_INSN2_HELPER_EPP(comisd, comisd, Vq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vcomisd, comisd, Vq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(ucomiss, ucomiss, Vd, Wd)
+DEF_GEN_INSN2_HELPER_EPP(vucomiss, ucomiss, Vd, Wd)
 DEF_GEN_INSN2_HELPER_EPP(ucomisd, ucomisd, Vq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vucomisd, ucomisd, Vq, Wq)
 
 DEF_GEN_INSN3_GVEC(pand, Pq, Pq, Qq, and, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pand, Vdq, Vdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vpand, Vdq, Hdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(andps, Vdq, Vdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vandps, Vdq, Hdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vandps, Vqq, Hqq, Wqq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(andpd, Vdq, Vdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vandpd, Vdq, Hdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vandpd, Vqq, Hqq, Wqq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pandn, Pq, Pq, Qq, andn, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pandn, Vdq, Vdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vpandn, Vdq, Hdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(andnps, Vdq, Vdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vandnps, Vdq, Hdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vandnps, Vqq, Hqq, Wqq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(andnpd, Vdq, Vdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vandnpd, Vdq, Hdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vandnpd, Vqq, Hqq, Wqq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(por, Pq, Pq, Qq, or, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(por, Vdq, Vdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vpor, Vdq, Hdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(orps, Vdq, Vdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vorps, Vdq, Hdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vorps, Vqq, Hqq, Wqq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(orpd, Vdq, Vdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vorpd, Vdq, Hdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vorpd, Vqq, Hqq, Wqq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pxor, Pq, Pq, Qq, xor, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pxor, Vdq, Vdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vpxor, Vdq, Hdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(xorps, Vdq, Vdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vxorps, Vdq, Hdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vxorps, Vqq, Hqq, Wqq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(xorpd, Vdq, Vdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vxorpd, Vdq, Hdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vxorpd, Vqq, Hqq, Wqq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 
 DEF_GEN_INSN3_HELPER_EPP(psllw, psllw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psllw, psllw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsllw, psllw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pslld, pslld_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pslld, pslld_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpslld, pslld_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psllq, psllq_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psllq, psllq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsllq, psllq_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pslldq, pslldq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpslldq, pslldq_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrlw, psrlw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psrlw, psrlw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsrlw, psrlw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrld, psrld_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psrld, psrld_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsrld, psrld_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrlq, psrlq_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psrlq, psrlq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsrlq, psrlq_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrldq, psrldq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsrldq, psrldq_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psraw, psraw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psraw, psraw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsraw, psraw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrad, psrad_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psrad, psrad_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsrad, psrad_xmm, Vdq, Hdq, Wdq)
 
 #define DEF_GEN_PSHIFT_IMM_MM(mnem, opT1, opT2)                         \
     GEN_INSN3(mnem, opT1, opT2, Ib)                                     \
@@ -6267,71 +6872,151 @@ DEF_GEN_INSN3_HELPER_EPP(psrad, psrad_xmm, Vdq, Vdq, Wdq)
         gen_insn2(movq, Vdq, Eq)(env, s, arg3_xmm, arg3_r64);           \
         gen_insn3(mnem, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3_xmm);   \
     }
+#define DEF_GEN_VPSHIFT_IMM_XMM(mnem, opT1, opT2)                       \
+    GEN_INSN3(mnem, opT1, opT2, Ib)                                     \
+    {                                                                   \
+        const uint64_t arg3_ui64 = (uint8_t)arg3;                       \
+        const insnop_arg_t(Eq) arg3_r64 = s->tmp1_i64;                  \
+        const insnop_arg_t(Wdq) arg3_xmm =                              \
+            offsetof(CPUX86State, xmm_t0.ZMM_Q(0));                     \
+                                                                        \
+        tcg_gen_movi_i64(arg3_r64, arg3_ui64);                          \
+        gen_insn2(movq, Vdq, Eq)(env, s, arg3_xmm, arg3_r64);           \
+        gen_insn3(mnem, Vdq, Hdq, Wdq)(env, s, arg2, arg2, arg3_xmm);   \
+    }
 
 DEF_GEN_PSHIFT_IMM_MM(psllw, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psllw, Udq, Udq)
+DEF_GEN_VPSHIFT_IMM_XMM(vpsllw, Hdq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(pslld, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(pslld, Udq, Udq)
+DEF_GEN_VPSHIFT_IMM_XMM(vpslld, Hdq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psllq, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psllq, Udq, Udq)
+DEF_GEN_VPSHIFT_IMM_XMM(vpsllq, Hdq, Udq)
 DEF_GEN_PSHIFT_IMM_XMM(pslldq, Udq, Udq)
+DEF_GEN_VPSHIFT_IMM_XMM(vpslldq, Hdq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psrlw, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psrlw, Udq, Udq)
+DEF_GEN_VPSHIFT_IMM_XMM(vpsrlw, Hdq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psrld, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psrld, Udq, Udq)
+DEF_GEN_VPSHIFT_IMM_XMM(vpsrld, Hdq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psrlq, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psrlq, Udq, Udq)
+DEF_GEN_VPSHIFT_IMM_XMM(vpsrlq, Hdq, Udq)
 DEF_GEN_PSHIFT_IMM_XMM(psrldq, Udq, Udq)
+DEF_GEN_VPSHIFT_IMM_XMM(vpsrldq, Hdq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psraw, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psraw, Udq, Udq)
+DEF_GEN_VPSHIFT_IMM_XMM(vpsraw, Hdq, Udq)
 DEF_GEN_PSHIFT_IMM_MM(psrad, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psrad, Udq, Udq)
+DEF_GEN_VPSHIFT_IMM_XMM(vpsrad, Hdq, Udq)
 
 DEF_GEN_INSN4_HELPER_EPPI(palignr, palignr_mmx, Pq, Pq, Qq, Ib)
 DEF_GEN_INSN4_HELPER_EPPI(palignr, palignr_xmm, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vpalignr, palignr_xmm, Vdq, Hdq, Wdq, Ib)
 
 DEF_GEN_INSN3_HELPER_EPP(packsswb, packsswb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(packsswb, packsswb_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpacksswb, packsswb_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(packssdw, packssdw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(packssdw, packssdw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpackssdw, packssdw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(packuswb, packuswb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(packuswb, packuswb_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpackuswb, packuswb_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(packusdw, packusdw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpackusdw, packusdw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpcklbw, punpcklbw_mmx, Pq, Pq, Qd)
 DEF_GEN_INSN3_HELPER_EPP(punpcklbw, punpcklbw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpcklbw, punpcklbw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpcklwd, punpcklwd_mmx, Pq, Pq, Qd)
 DEF_GEN_INSN3_HELPER_EPP(punpcklwd, punpcklwd_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpcklwd, punpcklwd_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpckldq, punpckldq_mmx, Pq, Pq, Qd)
 DEF_GEN_INSN3_HELPER_EPP(punpckldq, punpckldq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpckldq, punpckldq_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpcklqdq, punpcklqdq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpcklqdq, punpcklqdq_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhbw, punpckhbw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhbw, punpckhbw_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpckhbw, punpckhbw_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhwd, punpckhwd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhwd, punpckhwd_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpckhwd, punpckhwd_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhdq, punpckhdq_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhdq, punpckhdq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpckhdq, punpckhdq_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhqdq, punpckhqdq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpckhqdq, punpckhqdq_xmm, Vdq, Hdq, Wdq)
 
 DEF_GEN_INSN3_HELPER_EPP(unpcklps, punpckldq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vunpcklps, punpckldq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vunpcklps, punpckldq_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(unpcklpd, punpcklqdq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vunpcklpd, punpcklqdq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vunpcklpd, punpcklqdq_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(unpckhps, punpckhdq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vunpckhps, punpckhdq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vunpckhps, punpckhdq_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(unpckhpd, punpckhqdq_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vunpckhpd, punpckhqdq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vunpckhpd, punpckhqdq_xmm, Vqq, Hqq, Wqq)
 
 DEF_GEN_INSN3_HELPER_EPP(pshufb, pshufb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pshufb, pshufb_xmm, Vdq, Vdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpshufb, pshufb_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_PPI(pshufw, pshufw_mmx, Pq, Qq, Ib)
 DEF_GEN_INSN3_HELPER_PPI(pshuflw, pshuflw_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_PPI(vpshuflw, pshuflw_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN3_HELPER_PPI(pshufhw, pshufhw_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_PPI(vpshufhw, pshufhw_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN3_HELPER_PPI(pshufd, pshufd_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_PPI(vpshufd, pshufd_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_PPI(shufps, shufps, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_PPI(vshufps, shufps, Vdq, Hdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_PPI(vshufps, shufps, Vqq, Hqq, Wqq, Ib)
 DEF_GEN_INSN4_HELPER_PPI(shufpd, shufpd, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_PPI(vshufpd, shufpd, Vdq, Hdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_PPI(vshufpd, shufpd, Vqq, Hqq, Wqq, Ib)
 
 DEF_GEN_INSN4_HELPER_EPPI(blendps, blendps_xmm, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vblendps, blendps_xmm, Vdq, Hdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vblendps, blendps_xmm, Vqq, Hqq, Wqq, Ib)
 DEF_GEN_INSN4_HELPER_EPPI(blendpd, blendpd_xmm, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vblendpd, blendpd_xmm, Vdq, Hdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vblendpd, blendpd_xmm, Vqq, Hqq, Wqq, Ib)
+
 DEF_GEN_INSN3_HELPER_EPP(blendvps, blendvps_xmm, Vdq, Vdq, Wdq)
+GEN_INSN4(vblendvps, Vdq, Hdq, Wdq, Ldq)
+{
+    gen_insn3(blendvps, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+}
+GEN_INSN4(vblendvps, Vqq, Hqq, Wqq, Lqq)
+{
+    gen_insn3(blendvps, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+}
+
 DEF_GEN_INSN3_HELPER_EPP(blendvpd, blendvpd_xmm, Vdq, Vdq, Wdq)
+GEN_INSN4(vblendvpd, Vdq, Hdq, Wdq, Ldq)
+{
+    gen_insn3(blendvpd, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+}
+GEN_INSN4(vblendvpd, Vqq, Hqq, Wqq, Lqq)
+{
+    gen_insn3(blendvpd, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+}
+
 DEF_GEN_INSN3_HELPER_EPP(pblendvb, pblendvb_xmm, Vdq, Vdq, Wdq)
+GEN_INSN4(vpblendvb, Vdq, Hdq, Wdq, Ldq)
+{
+    gen_insn3(pblendvb, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+}
+
 DEF_GEN_INSN4_HELPER_EPPI(pblendw, pblendw_xmm, Vdq, Vdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vpblendw, pblendw_xmm, Vdq, Hdq, Wdq, Ib)
 
 GEN_INSN4(insertps, Vdq, Vdq, Wd, Ib)
 {
@@ -6341,6 +7026,13 @@ GEN_INSN4(insertps, Vdq, Vdq, Wd, Ib)
     const size_t aofs = offsetof(ZMMReg, ZMM_L(0));
     gen_op_movl(s, arg1 + dofs, arg3 + aofs);
 }
+GEN_INSN4(vinsertps, Vdq, Hdq, Wd, Ib)
+{
+    if (arg1 != arg2) {
+        gen_insn2(vmovaps, Vdq, Wdq)(env, s, arg1, arg2);
+    }
+    gen_insn4(insertps, Vdq, Vdq, Wd, Ib)(env, s, arg1, arg1, arg3, arg4);
+}
 GEN_INSN4(pinsrb, Vdq, Vdq, RdMb, Ib)
 {
     assert(arg1 == arg2);
@@ -6348,6 +7040,13 @@ GEN_INSN4(pinsrb, Vdq, Vdq, RdMb, Ib)
     const size_t ofs = offsetof(ZMMReg, ZMM_B(arg4 & 15));
     tcg_gen_st8_i32(arg3, cpu_env, arg1 + ofs);
 }
+GEN_INSN4(vpinsrb, Vdq, Hdq, RdMb, Ib)
+{
+    if (arg1 != arg2) {
+        gen_insn2(vmovaps, Vdq, Wdq)(env, s, arg1, arg2);
+    }
+    gen_insn4(pinsrb, Vdq, Vdq, RdMb, Ib)(env, s, arg1, arg1, arg3, arg4);
+}
 GEN_INSN4(pinsrw, Pq, Pq, RdMw, Ib)
 {
     assert(arg1 == arg2);
@@ -6362,6 +7061,13 @@ GEN_INSN4(pinsrw, Vdq, Vdq, RdMw, Ib)
     const size_t ofs = offsetof(ZMMReg, ZMM_W(arg4 & 7));
     tcg_gen_st16_i32(arg3, cpu_env, arg1 + ofs);
 }
+GEN_INSN4(vpinsrw, Vdq, Hdq, RdMw, Ib)
+{
+    if (arg1 != arg2) {
+        gen_insn2(vmovaps, Vdq, Wdq)(env, s, arg1, arg2);
+    }
+    gen_insn4(pinsrw, Vdq, Vdq, RdMw, Ib)(env, s, arg1, arg1, arg3, arg4);
+}
 GEN_INSN4(pinsrd, Vdq, Vdq, Ed, Ib)
 {
     assert(arg1 == arg2);
@@ -6369,6 +7075,13 @@ GEN_INSN4(pinsrd, Vdq, Vdq, Ed, Ib)
     const size_t ofs = offsetof(ZMMReg, ZMM_L(arg4 & 3));
     tcg_gen_st_i32(arg3, cpu_env, arg1 + ofs);
 }
+GEN_INSN4(vpinsrd, Vdq, Hdq, Ed, Ib)
+{
+    if (arg1 != arg2) {
+        gen_insn2(vmovaps, Vdq, Wdq)(env, s, arg1, arg2);
+    }
+    gen_insn4(pinsrd, Vdq, Vdq, Ed, Ib)(env, s, arg1, arg1, arg3, arg4);
+}
 GEN_INSN4(pinsrq, Vdq, Vdq, Eq, Ib)
 {
     assert(arg1 == arg2);
@@ -6376,32 +7089,63 @@ GEN_INSN4(pinsrq, Vdq, Vdq, Eq, Ib)
     const size_t ofs = offsetof(ZMMReg, ZMM_Q(arg4 & 1));
     tcg_gen_st_i64(arg3, cpu_env, arg1 + ofs);
 }
+GEN_INSN4(vpinsrq, Vdq, Hdq, Eq, Ib)
+{
+    if (arg1 != arg2) {
+        gen_insn2(vmovaps, Vdq, Wdq)(env, s, arg1, arg2);
+    }
+    gen_insn4(pinsrq, Vdq, Vdq, Eq, Ib)(env, s, arg1, arg1, arg3, arg4);
+}
+GEN_INSN4(vinsertf128, Vqq, Hqq, Wdq, Ib)
+{
+    gen_insn2(movaps, Vdq, Wdq)(env, s, arg1, arg3);
+}
 
 GEN_INSN3(extractps, Ed, Vdq, Ib)
 {
     const size_t ofs = offsetof(ZMMReg, ZMM_L(arg3 & 3));
     tcg_gen_ld_i32(arg1, cpu_env, arg2 + ofs);
 }
+GEN_INSN3(vextractps, Ed, Vdq, Ib)
+{
+    gen_insn3(extractps, Ed, Vdq, Ib)(env, s, arg1, arg2, arg3);
+}
 GEN_INSN3(pextrb, RdMb, Vdq, Ib)
 {
     const size_t ofs = offsetof(ZMMReg, ZMM_B(arg3 & 15));
     tcg_gen_ld8u_i32(arg1, cpu_env, arg2 + ofs);
 }
+GEN_INSN3(vpextrb, RdMb, Vdq, Ib)
+{
+    gen_insn3(pextrb, RdMb, Vdq, Ib)(env, s, arg1, arg2, arg3);
+}
 GEN_INSN3(pextrw, RdMw, Vdq, Ib)
 {
     const size_t ofs = offsetof(ZMMReg, ZMM_W(arg3 & 7));
     tcg_gen_ld16u_i32(arg1, cpu_env, arg2 + ofs);
 }
+GEN_INSN3(vpextrw, RdMw, Vdq, Ib)
+{
+    gen_insn3(pextrw, RdMw, Vdq, Ib)(env, s, arg1, arg2, arg3);
+}
 GEN_INSN3(pextrd, Ed, Vdq, Ib)
 {
     const size_t ofs = offsetof(ZMMReg, ZMM_L(arg3 & 3));
     tcg_gen_ld_i32(arg1, cpu_env, arg2 + ofs);
 }
+GEN_INSN3(vpextrd, Ed, Vdq, Ib)
+{
+    gen_insn3(pextrd, Ed, Vdq, Ib)(env, s, arg1, arg2, arg3);
+}
 GEN_INSN3(pextrq, Eq, Vdq, Ib)
 {
     const size_t ofs = offsetof(ZMMReg, ZMM_Q(arg3 & 1));
     tcg_gen_ld_i64(arg1, cpu_env, arg2 + ofs);
 }
+GEN_INSN3(vpextrq, Eq, Vdq, Ib)
+{
+    gen_insn3(pextrq, Eq, Vdq, Ib)(env, s, arg1, arg2, arg3);
+}
 GEN_INSN3(pextrw, Gd, Nq, Ib)
 {
     const size_t ofs = offsetof(MMXReg, MMX_W(arg3 & 3));
@@ -6422,49 +7166,173 @@ GEN_INSN3(pextrw, Gq, Udq, Ib)
     const size_t ofs = offsetof(ZMMReg, ZMM_W(arg3 & 7));
     tcg_gen_ld16u_i64(arg1, cpu_env, arg2 + ofs);
 }
+GEN_INSN3(vpextrw, Gd, Udq, Ib)
+{
+    gen_insn3(pextrw, Gd, Udq, Ib)(env, s, arg1, arg2, arg3);
+}
+GEN_INSN3(vpextrw, Gq, Udq, Ib)
+{
+    gen_insn3(pextrw, Gq, Udq, Ib)(env, s, arg1, arg2, arg3);
+}
+GEN_INSN3(vextractf128, Wdq, Vqq, Ib)
+{
+    gen_insn2(movaps, Wdq, Vdq)(env, s, arg1, arg2);
+}
+
+GEN_INSN2(vbroadcastss, Vdq, Md)
+{
+    const TCGv_i32 r32 = tcg_temp_new_i32();
+    insnop_ldst(tcg_i32, Md)(env, s, 0, r32, arg2);
+
+    tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(0)));
+    tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(1)));
+    tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(2)));
+    tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(3)));
+
+    tcg_temp_free_i32(r32);
+}
+GEN_INSN2(vbroadcastss, Vqq, Md)
+{
+    gen_insn2(vbroadcastss, Vdq, Md)(env, s, arg1, arg2);
+}
+GEN_INSN2(vbroadcastsd, Vqq, Mq)
+{
+    const TCGv_i64 r64 = tcg_temp_new_i64();
+    insnop_ldst(tcg_i64, Mq)(env, s, 0, r64, arg2);
+
+    tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(0)));
+    tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(1)));
+
+    tcg_temp_free_i64(r64);
+}
+GEN_INSN2(vbroadcastf128, Vqq, Mdq)
+{
+    insnop_ldst(xmm, Mqq)(env, s, 0, arg1, arg2);
+}
+
+GEN_INSN4(vperm2f128, Vqq, Hqq, Wqq, Ib)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN3(vpermilps, Vdq, Hdq, Wdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpermilps, Vqq, Hqq, Wqq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpermilps, Vdq, Wdq, Ib)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpermilps, Vqq, Wqq, Ib)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN3(vpermilpd, Vdq, Hdq, Wdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpermilpd, Vqq, Hqq, Wqq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpermilpd, Vdq, Wdq, Ib)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpermilpd, Vqq, Wqq, Ib)
+{
+    /* XXX TODO implement this */
+}
 
 DEF_GEN_INSN2_HELPER_EPP(pmovsxbw, pmovsxbw_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vpmovsxbw, pmovsxbw_xmm, Vdq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(pmovsxbd, pmovsxbd_xmm, Vdq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(vpmovsxbd, pmovsxbd_xmm, Vdq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(pmovsxbq, pmovsxbq_xmm, Vdq, Ww)
+DEF_GEN_INSN2_HELPER_EPP(vpmovsxbq, pmovsxbq_xmm, Vdq, Ww)
 DEF_GEN_INSN2_HELPER_EPP(pmovsxwd, pmovsxwd_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vpmovsxwd, pmovsxwd_xmm, Vdq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(pmovsxwq, pmovsxwq_xmm, Vdq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(vpmovsxwq, pmovsxwq_xmm, Vdq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(pmovsxdq, pmovsxdq_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vpmovsxdq, pmovsxdq_xmm, Vdq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(pmovzxbw, pmovzxbw_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vpmovzxbw, pmovzxbw_xmm, Vdq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(pmovzxbd, pmovzxbd_xmm, Vdq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(vpmovzxbd, pmovzxbd_xmm, Vdq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(pmovzxbq, pmovzxbq_xmm, Vdq, Ww)
+DEF_GEN_INSN2_HELPER_EPP(vpmovzxbq, pmovzxbq_xmm, Vdq, Ww)
 DEF_GEN_INSN2_HELPER_EPP(pmovzxwd, pmovzxwd_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vpmovzxwd, pmovzxwd_xmm, Vdq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(pmovzxwq, pmovzxwq_xmm, Vdq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(vpmovzxwq, pmovzxwq_xmm, Vdq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(pmovzxdq, pmovzxdq_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vpmovzxdq, pmovzxdq_xmm, Vdq, Wq)
 
 DEF_GEN_INSN2_HELPER_EPP(cvtpi2ps, cvtpi2ps, Vdq, Qq)
 DEF_GEN_INSN2_HELPER_EPD(cvtsi2ss, cvtsi2ss, Vd, Ed)
 DEF_GEN_INSN2_HELPER_EPQ(cvtsi2ss, cvtsq2ss, Vd, Eq)
+DEF_GEN_INSN3_HELPER_EPD(vcvtsi2ss, cvtsi2ss, Vd, Hd, Ed)
+DEF_GEN_INSN3_HELPER_EPQ(vcvtsi2ss, cvtsq2ss, Vd, Hd, Eq)
 DEF_GEN_INSN2_HELPER_EPP(cvtpi2pd, cvtpi2pd, Vdq, Qq)
 DEF_GEN_INSN2_HELPER_EPD(cvtsi2sd, cvtsi2sd, Vq, Ed)
 DEF_GEN_INSN2_HELPER_EPQ(cvtsi2sd, cvtsq2sd, Vq, Eq)
+DEF_GEN_INSN3_HELPER_EPD(vcvtsi2sd, cvtsi2sd, Vq, Hq, Ed)
+DEF_GEN_INSN3_HELPER_EPQ(vcvtsi2sd, cvtsq2sd, Vq, Hq, Eq)
 DEF_GEN_INSN2_HELPER_EPP(cvtps2pi, cvtps2pi, Pq, Wq)
 DEF_GEN_INSN2_HELPER_DEP(cvtss2si, cvtss2si, Gd, Wd)
 DEF_GEN_INSN2_HELPER_QEP(cvtss2si, cvtss2sq, Gq, Wd)
+DEF_GEN_INSN2_HELPER_DEP(vcvtss2si, cvtss2si, Gd, Wd)
+DEF_GEN_INSN2_HELPER_QEP(vcvtss2si, cvtss2sq, Gq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(cvtpd2pi, cvtpd2pi, Pq, Wdq)
 DEF_GEN_INSN2_HELPER_DEP(cvtsd2si, cvtsd2si, Gd, Wq)
 DEF_GEN_INSN2_HELPER_QEP(cvtsd2si, cvtsd2sq, Gq, Wq)
+DEF_GEN_INSN2_HELPER_DEP(vcvtsd2si, cvtsd2si, Gd, Wq)
+DEF_GEN_INSN2_HELPER_QEP(vcvtsd2si, cvtsd2sq, Gq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(cvttps2pi, cvttps2pi, Pq, Wq)
 DEF_GEN_INSN2_HELPER_DEP(cvttss2si, cvttss2si, Gd, Wd)
 DEF_GEN_INSN2_HELPER_QEP(cvttss2si, cvttss2sq, Gq, Wd)
+DEF_GEN_INSN2_HELPER_DEP(vcvttss2si, cvttss2si, Gd, Wd)
+DEF_GEN_INSN2_HELPER_QEP(vcvttss2si, cvttss2sq, Gq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(cvttpd2pi, cvttpd2pi, Pq, Wdq)
 DEF_GEN_INSN2_HELPER_DEP(cvttsd2si, cvttsd2si, Gd, Wq)
 DEF_GEN_INSN2_HELPER_QEP(cvttsd2si, cvttsd2sq, Gq, Wq)
+DEF_GEN_INSN2_HELPER_DEP(vcvttsd2si, cvttsd2si, Gd, Wq)
+DEF_GEN_INSN2_HELPER_QEP(vcvttsd2si, cvttsd2sq, Gq, Wq)
 
 DEF_GEN_INSN2_HELPER_EPP(cvtpd2dq, cvtpd2dq, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vcvtpd2dq, cvtpd2dq, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vcvtpd2dq, cvtpd2dq, Vdq, Wqq)
 DEF_GEN_INSN2_HELPER_EPP(cvttpd2dq, cvttpd2dq, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vcvttpd2dq, cvttpd2dq, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vcvttpd2dq, cvttpd2dq, Vdq, Wqq)
 DEF_GEN_INSN2_HELPER_EPP(cvtdq2pd, cvtdq2pd, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vcvtdq2pd, cvtdq2pd, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vcvtdq2pd, cvtdq2pd, Vqq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(cvtps2pd, cvtps2pd, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vcvtps2pd, cvtps2pd, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vcvtps2pd, cvtps2pd, Vqq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(cvtpd2ps, cvtpd2ps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vcvtpd2ps, cvtpd2ps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vcvtpd2ps, cvtpd2ps, Vdq, Wqq)
 DEF_GEN_INSN2_HELPER_EPP(cvtss2sd, cvtss2sd, Vq, Wd)
+DEF_GEN_INSN3_HELPER_EPP(vcvtss2sd, cvtss2sd, Vq, Hq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(cvtsd2ss, cvtsd2ss, Vd, Wq)
+DEF_GEN_INSN3_HELPER_EPP(vcvtsd2ss, cvtsd2ss, Vd, Hd, Wq)
 DEF_GEN_INSN2_HELPER_EPP(cvtdq2ps, cvtdq2ps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vcvtdq2ps, cvtdq2ps, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vcvtdq2ps, cvtdq2ps, Vqq, Wqq)
 DEF_GEN_INSN2_HELPER_EPP(cvtps2dq, cvtps2dq, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vcvtps2dq, cvtps2dq, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vcvtps2dq, cvtps2dq, Vqq, Wqq)
 DEF_GEN_INSN2_HELPER_EPP(cvttps2dq, cvttps2dq, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vcvttps2dq, cvttps2dq, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vcvttps2dq, cvttps2dq, Vqq, Wqq)
 
 GEN_INSN2(maskmovq, Pq, Nq)
 {
@@ -6498,15 +7366,70 @@ GEN_INSN2(maskmovdqu, Vdq, Udq)
     tcg_temp_free_ptr(arg1_ptr);
     tcg_temp_free_ptr(arg2_ptr);
 }
+GEN_INSN2(vmaskmovdqu, Vdq, Udq)
+{
+    gen_insn2(maskmovdqu, Vdq, Udq)(env, s, arg1, arg2);
+}
+
+GEN_INSN3(vmaskmovps, Vdq, Hdq, Mdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vmaskmovps, Mdq, Hdq, Vdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vmaskmovps, Vqq, Hqq, Mqq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vmaskmovps, Mqq, Hqq, Vqq)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN3(vmaskmovpd, Vdq, Hdq, Mdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vmaskmovpd, Mdq, Hdq, Vdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vmaskmovpd, Vqq, Hqq, Mqq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vmaskmovpd, Mqq, Hqq, Vqq)
+{
+    /* XXX TODO implement this */
+}
 
 GEN_INSN2(movntps, Mdq, Vdq)
 {
     insnop_ldst(xmm, Mdq)(env, s, 1, arg2, arg1);
 }
+GEN_INSN2(vmovntps, Mdq, Vdq)
+{
+    gen_insn2(movntps, Mdq, Vdq)(env, s, arg1, arg2);
+}
+GEN_INSN2(vmovntps, Mqq, Vqq)
+{
+    gen_insn2(vmovntps, Mdq, Vdq)(env, s, arg1, arg2);
+}
+
 GEN_INSN2(movntpd, Mdq, Vdq)
 {
     insnop_ldst(xmm, Mdq)(env, s, 1, arg2, arg1);
 }
+GEN_INSN2(vmovntpd, Mdq, Vdq)
+{
+    gen_insn2(movntpd, Mdq, Vdq)(env, s, arg1, arg2);
+}
+GEN_INSN2(vmovntpd, Mqq, Vqq)
+{
+    gen_insn2(vmovntpd, Mdq, Vdq)(env, s, arg1, arg2);
+}
 
 GEN_INSN2(movnti, Md, Gd)
 {
@@ -6525,10 +7448,23 @@ GEN_INSN2(movntdq, Mdq, Vdq)
 {
     insnop_ldst(xmm, Mdq)(env, s, 1, arg2, arg1);
 }
+GEN_INSN2(vmovntdq, Mdq, Vdq)
+{
+    gen_insn2(movntdq, Mdq, Vdq)(env, s, arg1, arg2);
+}
+GEN_INSN2(vmovntdq, Mqq, Vqq)
+{
+    gen_insn2(vmovntdq, Mdq, Vdq)(env, s, arg1, arg2);
+}
+
 GEN_INSN2(movntdqa, Vdq, Mdq)
 {
     insnop_ldst(xmm, Mdq)(env, s, 0, arg1, arg2);
 }
+GEN_INSN2(vmovntdqa, Vdq, Mdq)
+{
+    gen_insn2(movntdqa, Vdq, Mdq)(env, s, arg1, arg2);
+}
 
 GEN_INSN0(pause)
 {
@@ -6538,6 +7474,16 @@ GEN_INSN0(pause)
 
 DEF_GEN_INSN0_HELPER(emms, emms)
 
+GEN_INSN0(vzeroupper)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN0(vzeroall)
+{
+    /* XXX TODO implement this */
+}
+
 GEN_INSN2(sfence_clflush, modrm_mod, modrm)
 {
     if (arg1 == 3) {
@@ -6584,6 +7530,10 @@ GEN_INSN1(ldmxcsr, Md)
         gen_helper_ldmxcsr(cpu_env, s->tmp2_i32);
     }
 }
+GEN_INSN1(vldmxcsr, Md)
+{
+    gen_insn1(ldmxcsr, Md)(env, s, arg1);
+}
 GEN_INSN1(stmxcsr, Md)
 {
     if ((s->flags & HF_EM_MASK) || !(s->flags & HF_OSFXSR_MASK)) {
@@ -6596,6 +7546,10 @@ GEN_INSN1(stmxcsr, Md)
         tcg_gen_qemu_st_i32(s->tmp2_i32, arg1, s->mem_index, MO_LEUL);
     }
 }
+GEN_INSN1(vstmxcsr, Md)
+{
+    gen_insn1(stmxcsr, Md)(env, s, arg1);
+}
 
 GEN_INSN1(prefetcht0, Mb)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 61/75] target/i386: introduce AVX vector instructions to sse-opcode.inc.h
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (58 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 60/75] target/i386: introduce AVX code generators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 62/75] target/i386: introduce AVX2 translators Jan Bobek
                   ` (13 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add all the AVX vector instruction entries to sse-opcode.inc.h.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/sse-opcode.inc.h | 779 +++++++++++++++++++++++++++++++++++
 1 file changed, 779 insertions(+)

diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
index 1359508424..c3c0ec4f89 100644
--- a/target/i386/sse-opcode.inc.h
+++ b/target/i386/sse-opcode.inc.h
@@ -469,198 +469,767 @@
  * -----------------------
  * 66 0F 3A 44 /r ib               PCLMULQDQ xmm1, xmm2/m128, imm8
  * VEX.128.66.0F3A.WIG 44 /r ib    VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8
+ *
+ * AVX Instructions
+ * -----------------
+ * VEX.128.66.0F.W0 6E /r          VMOVD xmm1,r32/m32
+ * VEX.128.66.0F.W0 7E /r          VMOVD r32/m32,xmm1
+ * VEX.128.66.0F.W1 6E /r          VMOVQ xmm1,r64/m64
+ * VEX.128.66.0F.W1 7E /r          VMOVQ r64/m64,xmm1
+ * VEX.128.F3.0F.WIG 7E /r         VMOVQ xmm1, xmm2/m64
+ * VEX.128.66.0F.WIG D6 /r         VMOVQ xmm1/m64, xmm2
+ * VEX.128.0F.WIG 28 /r            VMOVAPS xmm1, xmm2/m128
+ * VEX.128.0F.WIG 29 /r            VMOVAPS xmm2/m128, xmm1
+ * VEX.256.0F.WIG 28 /r            VMOVAPS ymm1, ymm2/m256
+ * VEX.256.0F.WIG 29 /r            VMOVAPS ymm2/m256, ymm1
+ * VEX.128.66.0F.WIG 28 /r         VMOVAPD xmm1, xmm2/m128
+ * VEX.128.66.0F.WIG 29 /r         VMOVAPD xmm2/m128, xmm1
+ * VEX.256.66.0F.WIG 28 /r         VMOVAPD ymm1, ymm2/m256
+ * VEX.256.66.0F.WIG 29 /r         VMOVAPD ymm2/m256, ymm1
+ * VEX.128.66.0F.WIG 6F /r         VMOVDQA xmm1, xmm2/m128
+ * VEX.128.66.0F.WIG 7F /r         VMOVDQA xmm2/m128, xmm1
+ * VEX.256.66.0F.WIG 6F /r         VMOVDQA ymm1, ymm2/m256
+ * VEX.256.66.0F.WIG 7F /r         VMOVDQA ymm2/m256, ymm1
+ * VEX.128.0F.WIG 10 /r            VMOVUPS xmm1, xmm2/m128
+ * VEX.128.0F.WIG 11 /r            VMOVUPS xmm2/m128, xmm1
+ * VEX.256.0F.WIG 10 /r            VMOVUPS ymm1, ymm2/m256
+ * VEX.256.0F.WIG 11 /r            VMOVUPS ymm2/m256, ymm1
+ * VEX.128.66.0F.WIG 10 /r         VMOVUPD xmm1, xmm2/m128
+ * VEX.128.66.0F.WIG 11 /r         VMOVUPD xmm2/m128, xmm1
+ * VEX.256.66.0F.WIG 10 /r         VMOVUPD ymm1, ymm2/m256
+ * VEX.256.66.0F.WIG 11 /r         VMOVUPD ymm2/m256, ymm1
+ * VEX.128.F3.0F.WIG 6F /r         VMOVDQU xmm1,xmm2/m128
+ * VEX.128.F3.0F.WIG 7F /r         VMOVDQU xmm2/m128,xmm1
+ * VEX.256.F3.0F.WIG 6F /r         VMOVDQU ymm1,ymm2/m256
+ * VEX.256.F3.0F.WIG 7F /r         VMOVDQU ymm2/m256,ymm1
+ * VEX.LIG.F3.0F.WIG 10 /r         VMOVSS xmm1, xmm2, xmm3
+ * VEX.LIG.F3.0F.WIG 10 /r         VMOVSS xmm1, m32
+ * VEX.LIG.F3.0F.WIG 11 /r         VMOVSS xmm1, xmm2, xmm3
+ * VEX.LIG.F3.0F.WIG 11 /r         VMOVSS m32, xmm1
+ * VEX.LIG.F2.0F.WIG 10 /r         VMOVSD xmm1, xmm2, xmm3
+ * VEX.LIG.F2.0F.WIG 10 /r         VMOVSD xmm1, m64
+ * VEX.LIG.F2.0F.WIG 11 /r         VMOVSD xmm1, xmm2, xmm3
+ * VEX.LIG.F2.0F.WIG 11 /r         VMOVSD m64, xmm1
+ * VEX.128.0F.WIG 12 /r            VMOVHLPS xmm1, xmm2, xmm3
+ * VEX.128.0F.WIG 12 /r            VMOVLPS xmm2, xmm1, m64
+ * VEX.128.0F.WIG 13 /r            VMOVLPS m64, xmm1
+ * VEX.128.66.0F.WIG 12 /r         VMOVLPD xmm2,xmm1,m64
+ * VEX.128.66.0F.WIG 13 /r         VMOVLPD m64,xmm1
+ * VEX.128.0F.WIG 16 /r            VMOVLHPS xmm1, xmm2, xmm3
+ * VEX.128.0F.WIG 16 /r            VMOVHPS xmm2, xmm1, m64
+ * VEX.128.0F.WIG 17 /r            VMOVHPS m64, xmm1
+ * VEX.128.66.0F.WIG 16 /r         VMOVHPD xmm2, xmm1, m64
+ * VEX.128.66.0F.WIG 17 /r         VMOVHPD m64, xmm1
+ * VEX.128.66.0F.W0 D7 /r          VPMOVMSKB r32, xmm1
+ * VEX.128.66.0F.W1 D7 /r          VPMOVMSKB r64, xmm1
+ * VEX.128.0F.W0 50 /r             VMOVMSKPS r32, xmm2
+ * VEX.128.0F.W1 50 /r             VMOVMSKPS r64, xmm2
+ * VEX.256.0F.W0 50 /r             VMOVMSKPS r32, ymm2
+ * VEX.256.0F.W1 50 /r             VMOVMSKPS r64, ymm2
+ * VEX.128.66.0F.W0 50 /r          VMOVMSKPD r32, xmm2
+ * VEX.128.66.0F.W1 50 /r          VMOVMSKPD r64, xmm2
+ * VEX.256.66.0F.W0 50 /r          VMOVMSKPD r32, ymm2
+ * VEX.256.66.0F.W1 50 /r          VMOVMSKPD r64, ymm2
+ * VEX.128.F2.0F.WIG F0 /r         VLDDQU xmm1, m128
+ * VEX.256.F2.0F.WIG F0 /r         VLDDQU ymm1, m256
+ * VEX.128.F3.0F.WIG 16 /r         VMOVSHDUP xmm1, xmm2/m128
+ * VEX.256.F3.0F.WIG 16 /r         VMOVSHDUP ymm1, ymm2/m256
+ * VEX.128.F3.0F.WIG 12 /r         VMOVSLDUP xmm1, xmm2/m128
+ * VEX.256.F3.0F.WIG 12 /r         VMOVSLDUP ymm1, ymm2/m256
+ * VEX.128.F2.0F.WIG 12 /r         VMOVDDUP xmm1, xmm2/m64
+ * VEX.256.F2.0F.WIG 12 /r         VMOVDDUP ymm1, ymm2/m256
+ * VEX.128.66.0F.WIG FC /r         VPADDB xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG FD /r         VPADDW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG FE /r         VPADDD xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG D4 /r         VPADDQ xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG EC /r         VPADDSB xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG ED /r         VPADDSW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG DC /r         VPADDUSB xmm1,xmm2,xmm3/m128
+ * VEX.128.66.0F.WIG DD /r         VPADDUSW xmm1,xmm2,xmm3/m128
+ * VEX.128.0F.WIG 58 /r            VADDPS xmm1,xmm2, xmm3/m128
+ * VEX.256.0F.WIG 58 /r            VADDPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG 58 /r         VADDPD xmm1,xmm2, xmm3/m128
+ * VEX.256.66.0F.WIG 58 /r         VADDPD ymm1, ymm2, ymm3/m256
+ * VEX.LIG.F3.0F.WIG 58 /r         VADDSS xmm1,xmm2, xmm3/m32
+ * VEX.LIG.F2.0F.WIG 58 /r         VADDSD xmm1, xmm2, xmm3/m64
+ * VEX.128.66.0F38.WIG 01 /r       VPHADDW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 02 /r       VPHADDD xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 03 /r       VPHADDSW xmm1, xmm2, xmm3/m128
+ * VEX.128.F2.0F.WIG 7C /r         VHADDPS xmm1, xmm2, xmm3/m128
+ * VEX.256.F2.0F.WIG 7C /r         VHADDPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG 7C /r         VHADDPD xmm1,xmm2, xmm3/m128
+ * VEX.256.66.0F.WIG 7C /r         VHADDPD ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG F8 /r         VPSUBB xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG F9 /r         VPSUBW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG FA /r         VPSUBD xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG FB /r         VPSUBQ xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG E8 /r         VPSUBSB xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG E9 /r         VPSUBSW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG D8 /r         VPSUBUSB xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG D9 /r         VPSUBUSW xmm1, xmm2, xmm3/m128
+ * VEX.128.0F.WIG 5C /r            VSUBPS xmm1,xmm2, xmm3/m128
+ * VEX.256.0F.WIG 5C /r            VSUBPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG 5C /r         VSUBPD xmm1,xmm2, xmm3/m128
+ * VEX.256.66.0F.WIG 5C /r         VSUBPD ymm1, ymm2, ymm3/m256
+ * VEX.LIG.F3.0F.WIG 5C /r         VSUBSS xmm1,xmm2, xmm3/m32
+ * VEX.LIG.F2.0F.WIG 5C /r         VSUBSD xmm1,xmm2, xmm3/m64
+ * VEX.128.66.0F38.WIG 05 /r       VPHSUBW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 06 /r       VPHSUBD xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 07 /r       VPHSUBSW xmm1, xmm2, xmm3/m128
+ * VEX.128.F2.0F.WIG 7D /r         VHSUBPS xmm1, xmm2, xmm3/m128
+ * VEX.256.F2.0F.WIG 7D /r         VHSUBPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG 7D /r         VHSUBPD xmm1,xmm2, xmm3/m128
+ * VEX.256.66.0F.WIG 7D /r         VHSUBPD ymm1, ymm2, ymm3/m256
+ * VEX.128.F2.0F.WIG D0 /r         VADDSUBPS xmm1, xmm2, xmm3/m128
+ * VEX.256.F2.0F.WIG D0 /r         VADDSUBPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG D0 /r         VADDSUBPD xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F.WIG D0 /r         VADDSUBPD ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG D5 /r         VPMULLW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 40 /r       VPMULLD xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG E5 /r         VPMULHW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG E4 /r         VPMULHUW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 28 /r       VPMULDQ xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG F4 /r         VPMULUDQ xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 0B /r       VPMULHRSW xmm1, xmm2, xmm3/m128
+ * VEX.128.0F.WIG 59 /r            VMULPS xmm1,xmm2, xmm3/m128
+ * VEX.256.0F.WIG 59 /r            VMULPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG 59 /r         VMULPD xmm1,xmm2, xmm3/m128
+ * VEX.256.66.0F.WIG 59 /r         VMULPD ymm1, ymm2, ymm3/m256
+ * VEX.LIG.F3.0F.WIG 59 /r         VMULSS xmm1,xmm2, xmm3/m32
+ * VEX.LIG.F2.0F.WIG 59 /r         VMULSD xmm1,xmm2, xmm3/m64
+ * VEX.128.66.0F.WIG F5 /r         VPMADDWD xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 04 /r       VPMADDUBSW xmm1, xmm2, xmm3/m128
+ * VEX.128.0F.WIG 5E /r            VDIVPS xmm1, xmm2, xmm3/m128
+ * VEX.256.0F.WIG 5E /r            VDIVPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG 5E /r         VDIVPD xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F.WIG 5E /r         VDIVPD ymm1, ymm2, ymm3/m256
+ * VEX.LIG.F3.0F.WIG 5E /r         VDIVSS xmm1, xmm2, xmm3/m32
+ * VEX.LIG.F2.0F.WIG 5E /r         VDIVSD xmm1, xmm2, xmm3/m64
+ * VEX.128.0F.WIG 53 /r            VRCPPS xmm1, xmm2/m128
+ * VEX.256.0F.WIG 53 /r            VRCPPS ymm1, ymm2/m256
+ * VEX.LIG.F3.0F.WIG 53 /r         VRCPSS xmm1, xmm2, xmm3/m32
+ * VEX.128.0F.WIG 51 /r            VSQRTPS xmm1, xmm2/m128
+ * VEX.256.0F.WIG 51 /r            VSQRTPS ymm1, ymm2/m256
+ * VEX.128.66.0F.WIG 51 /r         VSQRTPD xmm1, xmm2/m128
+ * VEX.256.66.0F.WIG 51 /r         VSQRTPD ymm1, ymm2/m256
+ * VEX.LIG.F3.0F.WIG 51 /r         VSQRTSS xmm1, xmm2, xmm3/m32
+ * VEX.LIG.F2.0F.WIG 51 /r         VSQRTSD xmm1,xmm2, xmm3/m64
+ * VEX.128.0F.WIG 52 /r            VRSQRTPS xmm1, xmm2/m128
+ * VEX.256.0F.WIG 52 /r            VRSQRTPS ymm1, ymm2/m256
+ * VEX.LIG.F3.0F.WIG 52 /r         VRSQRTSS xmm1, xmm2, xmm3/m32
+ * VEX.128.66.0F DA /r             VPMINUB xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38 3A /r           VPMINUW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 3B /r       VPMINUD xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38 38 /r           VPMINSB xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F EA /r             VPMINSW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 39 /r       VPMINSD xmm1, xmm2, xmm3/m128
+ * VEX.128.0F.WIG 5D /r            VMINPS xmm1, xmm2, xmm3/m128
+ * VEX.256.0F.WIG 5D /r            VMINPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG 5D /r         VMINPD xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F.WIG 5D /r         VMINPD ymm1, ymm2, ymm3/m256
+ * VEX.LIG.F3.0F.WIG 5D /r         VMINSS xmm1,xmm2, xmm3/m32
+ * VEX.LIG.F2.0F.WIG 5D /r         VMINSD xmm1, xmm2, xmm3/m64
+ * VEX.128.66.0F38.WIG 41 /r       VPHMINPOSUW xmm1, xmm2/m128
+ * VEX.128.66.0F DE /r             VPMAXUB xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38 3E /r           VPMAXUW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 3F /r       VPMAXUD xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 3C /r       VPMAXSB xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG EE /r         VPMAXSW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 3D /r       VPMAXSD xmm1, xmm2, xmm3/m128
+ * VEX.128.0F.WIG 5F /r            VMAXPS xmm1, xmm2, xmm3/m128
+ * VEX.256.0F.WIG 5F /r            VMAXPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG 5F /r         VMAXPD xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F.WIG 5F /r         VMAXPD ymm1, ymm2, ymm3/m256
+ * VEX.LIG.F3.0F.WIG 5F /r         VMAXSS xmm1, xmm2, xmm3/m32
+ * VEX.LIG.F2.0F.WIG 5F /r         VMAXSD xmm1, xmm2, xmm3/m64
+ * VEX.128.66.0F.WIG E0 /r         VPAVGB xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG E3 /r         VPAVGW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG F6 /r         VPSADBW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F3A.WIG 42 /r ib    VMPSADBW xmm1, xmm2, xmm3/m128, imm8
+ * VEX.128.66.0F38.WIG 1C /r       VPABSB xmm1, xmm2/m128
+ * VEX.128.66.0F38.WIG 1D /r       VPABSW xmm1, xmm2/m128
+ * VEX.128.66.0F38.WIG 1E /r       VPABSD xmm1, xmm2/m128
+ * VEX.128.66.0F38.WIG 08 /r       VPSIGNB xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 09 /r       VPSIGNW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38.WIG 0A /r       VPSIGND xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F3A.WIG 40 /r ib    VDPPS xmm1,xmm2, xmm3/m128, imm8
+ * VEX.256.66.0F3A.WIG 40 /r ib    VDPPS ymm1, ymm2, ymm3/m256, imm8
+ * VEX.128.66.0F3A.WIG 41 /r ib    VDPPD xmm1,xmm2, xmm3/m128, imm8
+ * VEX.128.66.0F3A.WIG 08 /r ib    VROUNDPS xmm1, xmm2/m128, imm8
+ * VEX.256.66.0F3A.WIG 08 /r ib    VROUNDPS ymm1, ymm2/m256, imm8
+ * VEX.128.66.0F3A.WIG 09 /r ib    VROUNDPD xmm1, xmm2/m128, imm8
+ * VEX.256.66.0F3A.WIG 09 /r ib    VROUNDPD ymm1, ymm2/m256, imm8
+ * VEX.LIG.66.0F3A.WIG 0A /r ib    VROUNDSS xmm1, xmm2, xmm3/m32, imm8
+ * VEX.LIG.66.0F3A.WIG 0B /r ib    VROUNDSD xmm1, xmm2, xmm3/m64, imm8
+ * VEX.128.66.0F.WIG 74 /r         VPCMPEQB xmm1,xmm2,xmm3/m128
+ * VEX.128.66.0F.WIG 75 /r         VPCMPEQW xmm1,xmm2,xmm3/m128
+ * VEX.128.66.0F.WIG 76 /r         VPCMPEQD xmm1,xmm2,xmm3/m128
+ * VEX.128.66.0F38.WIG 29 /r       VPCMPEQQ xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG 64 /r         VPCMPGTB xmm1,xmm2,xmm3/m128
+ * VEX.128.66.0F.WIG 65 /r         VPCMPGTW xmm1,xmm2,xmm3/m128
+ * VEX.128.66.0F.WIG 66 /r         VPCMPGTD xmm1,xmm2,xmm3/m128
+ * VEX.128.66.0F38.WIG 37 /r       VPCMPGTQ xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F3A 60 /r ib        VPCMPESTRM xmm1, xmm2/m128, imm8
+ * VEX.128.66.0F3A 61 /r ib        VPCMPESTRI xmm1, xmm2/m128, imm8
+ * VEX.128.66.0F3A.WIG 62 /r ib    VPCMPISTRM xmm1, xmm2/m128, imm8
+ * VEX.128.66.0F3A.WIG 63 /r ib    VPCMPISTRI xmm1, xmm2/m128, imm8
+ * VEX.128.66.0F38.WIG 17 /r       VPTEST xmm1, xmm2/m128
+ * VEX.256.66.0F38.WIG 17 /r       VPTEST ymm1, ymm2/m256
+ * VEX.128.66.0F38.W0 0E /r        VTESTPS xmm1, xmm2/m128
+ * VEX.256.66.0F38.W0 0E /r        VTESTPS ymm1, ymm2/m256
+ * VEX.128.66.0F38.W0 0F /r        VTESTPD xmm1, xmm2/m128
+ * VEX.256.66.0F38.W0 0F /r        VTESTPD ymm1, ymm2/m256
+ * VEX.128.0F.WIG C2 /r ib         VCMPPS xmm1, xmm2, xmm3/m128, imm8
+ * VEX.256.0F.WIG C2 /r ib         VCMPPS ymm1, ymm2, ymm3/m256, imm8
+ * VEX.128.66.0F.WIG C2 /r ib      VCMPPD xmm1, xmm2, xmm3/m128, imm8
+ * VEX.256.66.0F.WIG C2 /r ib      VCMPPD ymm1, ymm2, ymm3/m256, imm8
+ * VEX.LIG.F3.0F.WIG C2 /r ib      VCMPSS xmm1, xmm2, xmm3/m32, imm8
+ * VEX.LIG.F2.0F.WIG C2 /r ib      VCMPSD xmm1, xmm2, xmm3/m64, imm8
+ * VEX.LIG.0F.WIG 2E /r            VUCOMISS xmm1, xmm2/m32
+ * VEX.LIG.66.0F.WIG 2E /r         VUCOMISD xmm1, xmm2/m64
+ * VEX.LIG.0F.WIG 2F /r            VCOMISS xmm1, xmm2/m32
+ * VEX.LIG.66.0F.WIG 2F /r         VCOMISD xmm1, xmm2/m64
+ * VEX.128.66.0F.WIG DB /r         VPAND xmm1, xmm2, xmm3/m128
+ * VEX.128.0F 54 /r                VANDPS xmm1,xmm2, xmm3/m128
+ * VEX.256.0F 54 /r                VANDPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F 54 /r             VANDPD xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F 54 /r             VANDPD ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG DF /r         VPANDN xmm1, xmm2, xmm3/m128
+ * VEX.128.0F 55 /r                VANDNPS xmm1, xmm2, xmm3/m128
+ * VEX.256.0F 55 /r                VANDNPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F 55 /r             VANDNPD xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F 55 /r             VANDNPD ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG EB /r         VPOR xmm1, xmm2, xmm3/m128
+ * VEX.128.0F 56 /r                VORPS xmm1,xmm2, xmm3/m128
+ * VEX.256.0F 56 /r                VORPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F 56 /r             VORPD xmm1,xmm2, xmm3/m128
+ * VEX.256.66.0F 56 /r             VORPD ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG EF /r         VPXOR xmm1, xmm2, xmm3/m128
+ * VEX.128.0F.WIG 57 /r            VXORPS xmm1,xmm2, xmm3/m128
+ * VEX.256.0F.WIG 57 /r            VXORPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG 57 /r         VXORPD xmm1,xmm2, xmm3/m128
+ * VEX.256.66.0F.WIG 57 /r         VXORPD ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG F1 /r         VPSLLW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG F2 /r         VPSLLD xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG F3 /r         VPSLLQ xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG D1 /r         VPSRLW xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG D2 /r         VPSRLD xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG D3 /r         VPSRLQ xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG E1 /r         VPSRAW xmm1,xmm2,xmm3/m128
+ * VEX.128.66.0F.WIG E2 /r         VPSRAD xmm1,xmm2,xmm3/m128
+ * VEX.128.66.0F3A.WIG 0F /r ib    VPALIGNR xmm1, xmm2, xmm3/m128, imm8
+ * VEX.128.66.0F.WIG 63 /r         VPACKSSWB xmm1,xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG 6B /r         VPACKSSDW xmm1,xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG 67 /r         VPACKUSWB xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F38 2B /r           VPACKUSDW xmm1,xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG 68 /r         VPUNPCKHBW xmm1,xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG 69 /r         VPUNPCKHWD xmm1,xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG 6A /r         VPUNPCKHDQ xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG 6D /r         VPUNPCKHQDQ xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG 60 /r         VPUNPCKLBW xmm1,xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG 61 /r         VPUNPCKLWD xmm1,xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG 62 /r         VPUNPCKLDQ xmm1, xmm2, xmm3/m128
+ * VEX.128.66.0F.WIG 6C /r         VPUNPCKLQDQ xmm1, xmm2, xmm3/m128
+ * VEX.128.0F.WIG 14 /r            VUNPCKLPS xmm1,xmm2, xmm3/m128
+ * VEX.256.0F.WIG 14 /r            VUNPCKLPS ymm1,ymm2,ymm3/m256
+ * VEX.128.66.0F.WIG 14 /r         VUNPCKLPD xmm1,xmm2, xmm3/m128
+ * VEX.256.66.0F.WIG 14 /r         VUNPCKLPD ymm1,ymm2, ymm3/m256
+ * VEX.128.0F.WIG 15 /r            VUNPCKHPS xmm1, xmm2, xmm3/m128
+ * VEX.256.0F.WIG 15 /r            VUNPCKHPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F.WIG 15 /r         VUNPCKHPD xmm1,xmm2, xmm3/m128
+ * VEX.256.66.0F.WIG 15 /r         VUNPCKHPD ymm1,ymm2, ymm3/m256
+ * VEX.128.66.0F38.WIG 00 /r       VPSHUFB xmm1, xmm2, xmm3/m128
+ * VEX.128.F2.0F.WIG 70 /r ib      VPSHUFLW xmm1, xmm2/m128, imm8
+ * VEX.128.F3.0F.WIG 70 /r ib      VPSHUFHW xmm1, xmm2/m128, imm8
+ * VEX.128.66.0F.WIG 70 /r ib      VPSHUFD xmm1, xmm2/m128, imm8
+ * VEX.128.0F.WIG C6 /r ib         VSHUFPS xmm1, xmm2, xmm3/m128, imm8
+ * VEX.256.0F.WIG C6 /r ib         VSHUFPS ymm1, ymm2, ymm3/m256, imm8
+ * VEX.128.66.0F.WIG C6 /r ib      VSHUFPD xmm1, xmm2, xmm3/m128, imm8
+ * VEX.256.66.0F.WIG C6 /r ib      VSHUFPD ymm1, ymm2, ymm3/m256, imm8
+ * VEX.128.66.0F3A.WIG 0C /r ib    VBLENDPS xmm1, xmm2, xmm3/m128, imm8
+ * VEX.256.66.0F3A.WIG 0C /r ib    VBLENDPS ymm1, ymm2, ymm3/m256, imm8
+ * VEX.128.66.0F3A.WIG 0D /r ib    VBLENDPD xmm1, xmm2, xmm3/m128, imm8
+ * VEX.256.66.0F3A.WIG 0D /r ib    VBLENDPD ymm1, ymm2, ymm3/m256, imm8
+ * VEX.128.66.0F3A.W0 4A /r /is4   VBLENDVPS xmm1, xmm2, xmm3/m128, xmm4
+ * VEX.256.66.0F3A.W0 4A /r /is4   VBLENDVPS ymm1, ymm2, ymm3/m256, ymm4
+ * VEX.128.66.0F3A.W0 4B /r /is4   VBLENDVPD xmm1, xmm2, xmm3/m128, xmm4
+ * VEX.256.66.0F3A.W0 4B /r /is4   VBLENDVPD ymm1, ymm2, ymm3/m256, ymm4
+ * VEX.128.66.0F3A.W0 4C /r /is4   VPBLENDVB xmm1, xmm2, xmm3/m128, xmm4
+ * VEX.128.66.0F3A.WIG 0E /r ib    VPBLENDW xmm1, xmm2, xmm3/m128, imm8
+ * VEX.128.66.0F3A.WIG 21 /r ib    VINSERTPS xmm1, xmm2, xmm3/m32, imm8
+ * VEX.128.66.0F3A.W0 20 /r ib     VPINSRB xmm1,xmm2,r32/m8,imm8
+ * VEX.128.66.0F.W0 C4 /r ib       VPINSRW xmm1, xmm2, r32/m16, imm8
+ * VEX.128.66.0F3A.W0 22 /r ib     VPINSRD xmm1,xmm2,r/m32,imm8
+ * VEX.128.66.0F3A.W1 22 /r ib     VPINSRQ xmm1,xmm2,r/m64,imm8
+ * VEX.256.66.0F3A.W0 18 /r ib     VINSERTF128 ymm1, ymm2, xmm3/m128, imm8
+ * VEX.128.66.0F3A.WIG 17 /r ib    VEXTRACTPS reg/m32, xmm1, imm8
+ * VEX.128.66.0F3A.W0 14 /r ib     VPEXTRB r32/m8,xmm2,imm8
+ * VEX.128.66.0F3A.W0 15 /r ib     VPEXTRW r32/m16, xmm2, imm8
+ * VEX.128.66.0F3A.W0 16 /r ib     VPEXTRD r/m32,xmm2,imm8
+ * VEX.128.66.0F3A.W1 16 /r ib     VPEXTRQ r64/m64,xmm2,imm8
+ * VEX.128.66.0F.W0 C5 /r ib       VPEXTRW r32, xmm1, imm8
+ * VEX.128.66.0F.W1 C5 /r ib       VPEXTRW r64, xmm1, imm8
+ * VEX.256.66.0F3A.W0 19 /r ib     VEXTRACTF128 xmm1/m128, ymm2, imm8
+ * VEX.128.66.0F38.W0 18 /r        VBROADCASTSS xmm1, m32
+ * VEX.256.66.0F38.W0 18 /r        VBROADCASTSS ymm1, m32
+ * VEX.256.66.0F38.W0 19 /r        VBROADCASTSD ymm1, m64
+ * VEX.256.66.0F38.W0 1A /r        VBROADCASTF128 ymm1, m128
+ * VEX.256.66.0F3A.W0 06 /r ib     VPERM2F128 ymm1, ymm2, ymm3/m256, imm8
+ * VEX.128.66.0F38.W0 0C /r        VPERMILPS xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F38.W0 0C /r        VPERMILPS ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F3A.W0 04 /r ib     VPERMILPS xmm1, xmm2/m128, imm8
+ * VEX.256.66.0F3A.W0 04 /r ib     VPERMILPS ymm1, ymm2/m256, imm8
+ * VEX.128.66.0F38.W0 0D /r        VPERMILPD xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F38.W0 0D /r        VPERMILPD ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F3A.W0 05 /r ib     VPERMILPD xmm1, xmm2/m128, imm8
+ * VEX.256.66.0F3A.W0 05 /r ib     VPERMILPD ymm1, ymm2/m256, imm8
+ * VEX.128.66.0F38.WIG 20 /r       VPMOVSXBW xmm1, xmm2/m64
+ * VEX.128.66.0F38.WIG 21 /r       VPMOVSXBD xmm1, xmm2/m32
+ * VEX.128.66.0F38.WIG 22 /r       VPMOVSXBQ xmm1, xmm2/m16
+ * VEX.128.66.0F38.WIG 23 /r       VPMOVSXWD xmm1, xmm2/m64
+ * VEX.128.66.0F38.WIG 24 /r       VPMOVSXWQ xmm1, xmm2/m32
+ * VEX.128.66.0F38.WIG 25 /r       VPMOVSXDQ xmm1, xmm2/m64
+ * VEX.128.66.0F38.WIG 30 /r       VPMOVZXBW xmm1, xmm2/m64
+ * VEX.128.66.0F38.WIG 31 /r       VPMOVZXBD xmm1, xmm2/m32
+ * VEX.128.66.0F38.WIG 32 /r       VPMOVZXBQ xmm1, xmm2/m16
+ * VEX.128.66.0F38.WIG 33 /r       VPMOVZXWD xmm1, xmm2/m64
+ * VEX.128.66.0F38.WIG 34 /r       VPMOVZXWQ xmm1, xmm2/m32
+ * VEX.128.66.0F38.WIG 35 /r       VPMOVZXDQ xmm1, xmm2/m64
+ * VEX.LIG.F3.0F.W0 2A /r          VCVTSI2SS xmm1,xmm2,r/m32
+ * VEX.LIG.F3.0F.W1 2A /r          VCVTSI2SS xmm1,xmm2,r/m64
+ * VEX.LIG.F2.0F.W0 2A /r          VCVTSI2SD xmm1,xmm2,r/m32
+ * VEX.LIG.F2.0F.W1 2A /r          VCVTSI2SD xmm1,xmm2,r/m64
+ * VEX.LIG.F3.0F.W0 2D /r          VCVTSS2SI r32,xmm1/m32
+ * VEX.LIG.F3.0F.W1 2D /r          VCVTSS2SI r64,xmm1/m32
+ * VEX.LIG.F2.0F.W0 2D /r          VCVTSD2SI r32,xmm1/m64
+ * VEX.LIG.F2.0F.W1 2D /r          VCVTSD2SI r64,xmm1/m64
+ * VEX.LIG.F3.0F.W0 2C /r          VCVTTSS2SI r32,xmm1/m32
+ * VEX.LIG.F3.0F.W1 2C /r          VCVTTSS2SI r64,xmm1/m32
+ * VEX.LIG.F2.0F.W0 2C /r          VCVTTSD2SI r32,xmm1/m64
+ * VEX.LIG.F2.0F.W1 2C /r          VCVTTSD2SI r64,xmm1/m64
+ * VEX.128.F2.0F.WIG E6 /r         VCVTPD2DQ xmm1, xmm2/m128
+ * VEX.256.F2.0F.WIG E6 /r         VCVTPD2DQ xmm1, ymm2/m256
+ * VEX.128.66.0F.WIG E6 /r         VCVTTPD2DQ xmm1, xmm2/m128
+ * VEX.256.66.0F.WIG E6 /r         VCVTTPD2DQ xmm1, ymm2/m256
+ * VEX.128.F3.0F.WIG E6 /r         VCVTDQ2PD xmm1, xmm2/m64
+ * VEX.256.F3.0F.WIG E6 /r         VCVTDQ2PD ymm1, xmm2/m128
+ * VEX.128.0F.WIG 5A /r            VCVTPS2PD xmm1, xmm2/m64
+ * VEX.256.0F.WIG 5A /r            VCVTPS2PD ymm1, xmm2/m128
+ * VEX.128.66.0F.WIG 5A /r         VCVTPD2PS xmm1, xmm2/m128
+ * VEX.256.66.0F.WIG 5A /r         VCVTPD2PS xmm1, ymm2/m256
+ * VEX.LIG.F3.0F.WIG 5A /r         VCVTSS2SD xmm1, xmm2, xmm3/m32
+ * VEX.LIG.F2.0F.WIG 5A /r         VCVTSD2SS xmm1,xmm2, xmm3/m64
+ * VEX.128.0F.WIG 5B /r            VCVTDQ2PS xmm1, xmm2/m128
+ * VEX.256.0F.WIG 5B /r            VCVTDQ2PS ymm1, ymm2/m256
+ * VEX.128.66.0F.WIG 5B /r         VCVTPS2DQ xmm1, xmm2/m128
+ * VEX.256.66.0F.WIG 5B /r         VCVTPS2DQ ymm1, ymm2/m256
+ * VEX.128.F3.0F.WIG 5B /r         VCVTTPS2DQ xmm1, xmm2/m128
+ * VEX.256.F3.0F.WIG 5B /r         VCVTTPS2DQ ymm1, ymm2/m256
+ * VEX.128.66.0F.WIG F7 /r         VMASKMOVDQU xmm1, xmm2
+ * VEX.128.66.0F38.W0 2C /r        VMASKMOVPS xmm1, xmm2, m128
+ * VEX.128.66.0F38.W0 2E /r        VMASKMOVPS m128, xmm1, xmm2
+ * VEX.256.66.0F38.W0 2C /r        VMASKMOVPS ymm1, ymm2, m256
+ * VEX.256.66.0F38.W0 2E /r        VMASKMOVPS m256, ymm1, ymm2
+ * VEX.128.66.0F38.W0 2D /r        VMASKMOVPD xmm1, xmm2, m128
+ * VEX.128.66.0F38.W0 2F /r        VMASKMOVPD m128, xmm1, xmm2
+ * VEX.256.66.0F38.W0 2D /r        VMASKMOVPD ymm1, ymm2, m256
+ * VEX.256.66.0F38.W0 2F /r        VMASKMOVPD m256, ymm1, ymm2
+ * VEX.128.0F.WIG 2B /r            VMOVNTPS m128, xmm1
+ * VEX.256.0F.WIG 2B /r            VMOVNTPS m256, ymm1
+ * VEX.128.66.0F.WIG 2B /r         VMOVNTPD m128, xmm1
+ * VEX.256.66.0F.WIG 2B /r         VMOVNTPD m256, ymm1
+ * VEX.128.66.0F.WIG E7 /r         VMOVNTDQ m128, xmm1
+ * VEX.256.66.0F.WIG E7 /r         VMOVNTDQ m256, ymm1
+ * VEX.128.66.0F38.WIG 2A /r       VMOVNTDQA xmm1, m128
+ * VEX.128.0F.WIG 77               VZEROUPPER
+ * VEX.256.0F.WIG 77               VZEROALL
+ * VEX.128.66.0F.WIG 71 /6 ib      VPSLLW xmm1, xmm2, imm8
+ * VEX.128.66.0F.WIG 71 /2 ib      VPSRLW xmm1, xmm2, imm8
+ * VEX.128.66.0F.WIG 71 /4 ib      VPSRAW xmm1,xmm2,imm8
+ * VEX.128.66.0F.WIG 72 /6 ib      VPSLLD xmm1, xmm2, imm8
+ * VEX.128.66.0F.WIG 72 /2 ib      VPSRLD xmm1, xmm2, imm8
+ * VEX.128.66.0F.WIG 72 /4 ib      VPSRAD xmm1,xmm2,imm8
+ * VEX.128.66.0F.WIG 73 /6 ib      VPSLLQ xmm1, xmm2, imm8
+ * VEX.128.66.0F.WIG 73 /7 ib      VPSLLDQ xmm1, xmm2, imm8
+ * VEX.128.66.0F.WIG 73 /2 ib      VPSRLQ xmm1, xmm2, imm8
+ * VEX.128.66.0F.WIG 73 /3 ib      VPSRLDQ xmm1, xmm2, imm8
+ * VEX.LZ.0F.WIG AE /2             VLDMXCSR m32
+ * VEX.LZ.0F.WIG AE /3             VSTMXCSR m32
  */
 
 OPCODE(movd, LEG(NP, 0F, 0, 0x6e), MMX, WR, Pq, Ed)
 OPCODE(movd, LEG(NP, 0F, 0, 0x7e), MMX, WR, Ed, Pq)
 OPCODE(movd, LEG(66, 0F, 0, 0x6e), SSE2, WR, Vdq, Ed)
 OPCODE(movd, LEG(66, 0F, 0, 0x7e), SSE2, WR, Ed, Vdq)
+OPCODE(vmovd, VEX(128, 66, 0F, 0, 0x6e), AVX, WR, Vdq, Ed)
+OPCODE(vmovd, VEX(128, 66, 0F, 0, 0x7e), AVX, WR, Ed, Vdq)
 OPCODE(movq, LEG(NP, 0F, 1, 0x6e), MMX, WR, Pq, Eq)
 OPCODE(movq, LEG(NP, 0F, 1, 0x7e), MMX, WR, Eq, Pq)
 OPCODE(movq, LEG(66, 0F, 1, 0x6e), SSE2, WR, Vdq, Eq)
 OPCODE(movq, LEG(66, 0F, 1, 0x7e), SSE2, WR, Eq, Vdq)
+OPCODE(vmovq, VEX(128, 66, 0F, 1, 0x6e), AVX, WR, Vdq, Eq)
+OPCODE(vmovq, VEX(128, 66, 0F, 1, 0x7e), AVX, WR, Eq, Vdq)
 OPCODE(movq, LEG(NP, 0F, 0, 0x6f), MMX, WR, Pq, Qq)
 OPCODE(movq, LEG(NP, 0F, 0, 0x7f), MMX, WR, Qq, Pq)
 OPCODE(movq, LEG(F3, 0F, 0, 0x7e), SSE2, WR, Vdq, Wq)
 OPCODE(movq, LEG(66, 0F, 0, 0xd6), SSE2, WR, UdqMq, Vq)
+OPCODE(vmovq, VEX(128, F3, 0F, IG, 0x7e), AVX, WR, Vdq, Wq)
+OPCODE(vmovq, VEX(128, 66, 0F, IG, 0xd6), AVX, WR, UdqMq, Vq)
 OPCODE(movaps, LEG(NP, 0F, 0, 0x28), SSE, WR, Vdq, Wdq)
 OPCODE(movaps, LEG(NP, 0F, 0, 0x29), SSE, WR, Wdq, Vdq)
+OPCODE(vmovaps, VEX(128, NP, 0F, IG, 0x28), AVX, WR, Vdq, Wdq)
+OPCODE(vmovaps, VEX(128, NP, 0F, IG, 0x29), AVX, WR, Wdq, Vdq)
+OPCODE(vmovaps, VEX(256, NP, 0F, IG, 0x28), AVX, WR, Vqq, Wqq)
+OPCODE(vmovaps, VEX(256, NP, 0F, IG, 0x29), AVX, WR, Wqq, Vqq)
 OPCODE(movapd, LEG(66, 0F, 0, 0x28), SSE2, WR, Vdq, Wdq)
 OPCODE(movapd, LEG(66, 0F, 0, 0x29), SSE2, WR, Wdq, Vdq)
+OPCODE(vmovapd, VEX(128, 66, 0F, IG, 0x28), AVX, WR, Vdq, Wdq)
+OPCODE(vmovapd, VEX(128, 66, 0F, IG, 0x29), AVX, WR, Wdq, Vdq)
+OPCODE(vmovapd, VEX(256, 66, 0F, IG, 0x28), AVX, WR, Vqq, Wqq)
+OPCODE(vmovapd, VEX(256, 66, 0F, IG, 0x29), AVX, WR, Wqq, Vqq)
 OPCODE(movdqa, LEG(66, 0F, 0, 0x6f), SSE2, WR, Vdq, Wdq)
 OPCODE(movdqa, LEG(66, 0F, 0, 0x7f), SSE2, WR, Wdq, Vdq)
+OPCODE(vmovdqa, VEX(128, 66, 0F, IG, 0x6f), AVX, WR, Vdq, Wdq)
+OPCODE(vmovdqa, VEX(128, 66, 0F, IG, 0x7f), AVX, WR, Wdq, Vdq)
+OPCODE(vmovdqa, VEX(256, 66, 0F, IG, 0x6f), AVX, WR, Vqq, Wqq)
+OPCODE(vmovdqa, VEX(256, 66, 0F, IG, 0x7f), AVX, WR, Wqq, Vqq)
 OPCODE(movups, LEG(NP, 0F, 0, 0x10), SSE, WR, Vdq, Wdq)
 OPCODE(movups, LEG(NP, 0F, 0, 0x11), SSE, WR, Wdq, Vdq)
+OPCODE(vmovups, VEX(128, NP, 0F, IG, 0x10), AVX, WR, Vdq, Wdq)
+OPCODE(vmovups, VEX(128, NP, 0F, IG, 0x11), AVX, WR, Wdq, Vdq)
+OPCODE(vmovups, VEX(256, NP, 0F, IG, 0x10), AVX, WR, Vqq, Wqq)
+OPCODE(vmovups, VEX(256, NP, 0F, IG, 0x11), AVX, WR, Wqq, Vqq)
 OPCODE(movupd, LEG(66, 0F, 0, 0x10), SSE2, WR, Vdq, Wdq)
 OPCODE(movupd, LEG(66, 0F, 0, 0x11), SSE2, WR, Wdq, Vdq)
+OPCODE(vmovupd, VEX(128, 66, 0F, IG, 0x10), AVX, WR, Vdq, Wdq)
+OPCODE(vmovupd, VEX(128, 66, 0F, IG, 0x11), AVX, WR, Wdq, Vdq)
+OPCODE(vmovupd, VEX(256, 66, 0F, IG, 0x10), AVX, WR, Vqq, Wqq)
+OPCODE(vmovupd, VEX(256, 66, 0F, IG, 0x11), AVX, WR, Wqq, Vqq)
 OPCODE(movdqu, LEG(F3, 0F, 0, 0x6f), SSE2, WR, Vdq, Wdq)
 OPCODE(movdqu, LEG(F3, 0F, 0, 0x7f), SSE2, WR, Wdq, Vdq)
+OPCODE(vmovdqu, VEX(128, F3, 0F, IG, 0x6f), AVX, WR, Vdq, Wdq)
+OPCODE(vmovdqu, VEX(128, F3, 0F, IG, 0x7f), AVX, WR, Wdq, Vdq)
+OPCODE(vmovdqu, VEX(256, F3, 0F, IG, 0x6f), AVX, WR, Vqq, Wqq)
+OPCODE(vmovdqu, VEX(256, F3, 0F, IG, 0x7f), AVX, WR, Wqq, Vqq)
 OPCODE(movss, LEG(F3, 0F, 0, 0x10), SSE, WRRR, Vdq, Vdq, Wd, modrm_mod)
 OPCODE(movss, LEG(F3, 0F, 0, 0x11), SSE, WR, Wd, Vd)
+OPCODE(vmovss, VEX(IG, F3, 0F, IG, 0x10), AVX, WRRRR, Vdq, Hdq, Wd, modrm_mod, vex_v)
+OPCODE(vmovss, VEX(IG, F3, 0F, IG, 0x11), AVX, WRRRR, Wdq, Hdq, Vd, modrm_mod, vex_v)
 OPCODE(movsd, LEG(F2, 0F, 0, 0x10), SSE2, WRRR, Vdq, Vdq, Wq, modrm_mod)
 OPCODE(movsd, LEG(F2, 0F, 0, 0x11), SSE2, WR, Wq, Vq)
+OPCODE(vmovsd, VEX(IG, F2, 0F, IG, 0x10), AVX, WRRRR, Vdq, Hdq, Wq, modrm_mod, vex_v)
+OPCODE(vmovsd, VEX(IG, F2, 0F, IG, 0x11), AVX, WRRRR, Wdq, Hdq, Vq, modrm_mod, vex_v)
 OPCODE(movq2dq, LEG(F3, 0F, 0, 0xd6), SSE2, WR, Vdq, Nq)
 OPCODE(movdq2q, LEG(F2, 0F, 0, 0xd6), SSE2, WR, Pq, Uq)
 OPCODE(movhlps, LEG(NP, 0F, 0, 0x12), SSE, WRR, Vdq, Vdq, UdqMhq)
+OPCODE(vmovhlps, VEX(128, NP, 0F, IG, 0x12), AVX, WRR, Vdq, Hdq, UdqMhq)
 OPCODE(movlps, LEG(NP, 0F, 0, 0x13), SSE, WR, Mq, Vq)
+OPCODE(vmovlps, VEX(128, NP, 0F, IG, 0x13), AVX, WR, Mq, Vq)
 OPCODE(movlpd, LEG(66, 0F, 0, 0x12), SSE2, WRR, Vdq, Vdq, Mq)
 OPCODE(movlpd, LEG(66, 0F, 0, 0x13), SSE2, WR, Mq, Vq)
+OPCODE(vmovlpd, VEX(128, 66, 0F, IG, 0x12), AVX, WRR, Vdq, Hdq, Mq)
+OPCODE(vmovlpd, VEX(128, 66, 0F, IG, 0x13), AVX, WR, Mq, Vq)
 OPCODE(movlhps, LEG(NP, 0F, 0, 0x16), SSE, WRR, Vdq, Vq, Wq)
+OPCODE(vmovlhps, VEX(128, NP, 0F, IG, 0x16), AVX, WRR, Vdq, Hq, Wq)
 OPCODE(movhps, LEG(NP, 0F, 0, 0x17), SSE, WR, Mq, Vdq)
+OPCODE(vmovhps, VEX(128, NP, 0F, IG, 0x17), AVX, WR, Mq, Vdq)
 OPCODE(movhpd, LEG(66, 0F, 0, 0x16), SSE2, WRR, Vdq, Vq, Mq)
 OPCODE(movhpd, LEG(66, 0F, 0, 0x17), SSE2, WR, Mq, Vdq)
+OPCODE(vmovhpd, VEX(128, 66, 0F, IG, 0x16), AVX, WRR, Vdq, Hq, Mq)
+OPCODE(vmovhpd, VEX(128, 66, 0F, IG, 0x17), AVX, WR, Mq, Vdq)
 OPCODE(pmovmskb, LEG(NP, 0F, 0, 0xd7), SSE, WR, Gd, Nq)
 OPCODE(pmovmskb, LEG(NP, 0F, 1, 0xd7), SSE, WR, Gq, Nq)
 OPCODE(pmovmskb, LEG(66, 0F, 0, 0xd7), SSE2, WR, Gd, Udq)
 OPCODE(pmovmskb, LEG(66, 0F, 1, 0xd7), SSE2, WR, Gq, Udq)
+OPCODE(vpmovmskb, VEX(128, 66, 0F, 0, 0xd7), AVX, WR, Gd, Udq)
+OPCODE(vpmovmskb, VEX(128, 66, 0F, 1, 0xd7), AVX, WR, Gq, Udq)
 OPCODE(movmskps, LEG(NP, 0F, 0, 0x50), SSE, WR, Gd, Udq)
 OPCODE(movmskps, LEG(NP, 0F, 1, 0x50), SSE, WR, Gq, Udq)
+OPCODE(vmovmskps, VEX(128, NP, 0F, 0, 0x50), AVX, WR, Gd, Udq)
+OPCODE(vmovmskps, VEX(128, NP, 0F, 1, 0x50), AVX, WR, Gq, Udq)
+OPCODE(vmovmskps, VEX(256, NP, 0F, 0, 0x50), AVX, WR, Gd, Uqq)
+OPCODE(vmovmskps, VEX(256, NP, 0F, 1, 0x50), AVX, WR, Gq, Uqq)
 OPCODE(movmskpd, LEG(66, 0F, 0, 0x50), SSE2, WR, Gd, Udq)
 OPCODE(movmskpd, LEG(66, 0F, 1, 0x50), SSE2, WR, Gq, Udq)
+OPCODE(vmovmskpd, VEX(128, 66, 0F, 0, 0x50), AVX, WR, Gd, Udq)
+OPCODE(vmovmskpd, VEX(128, 66, 0F, 1, 0x50), AVX, WR, Gq, Udq)
+OPCODE(vmovmskpd, VEX(256, 66, 0F, 0, 0x50), AVX, WR, Gd, Uqq)
+OPCODE(vmovmskpd, VEX(256, 66, 0F, 1, 0x50), AVX, WR, Gq, Uqq)
 OPCODE(lddqu, LEG(F2, 0F, 0, 0xf0), SSE3, WR, Vdq, Mdq)
+OPCODE(vlddqu, VEX(128, F2, 0F, IG, 0xf0), AVX, WR, Vdq, Mdq)
+OPCODE(vlddqu, VEX(256, F2, 0F, IG, 0xf0), AVX, WR, Vqq, Mqq)
 OPCODE(movshdup, LEG(F3, 0F, 0, 0x16), SSE3, WR, Vdq, Wdq)
+OPCODE(vmovshdup, VEX(128, F3, 0F, IG, 0x16), AVX, WR, Vdq, Wdq)
+OPCODE(vmovshdup, VEX(256, F3, 0F, IG, 0x16), AVX, WR, Vqq, Wqq)
 OPCODE(movsldup, LEG(F3, 0F, 0, 0x12), SSE3, WR, Vdq, Wdq)
+OPCODE(vmovsldup, VEX(128, F3, 0F, IG, 0x12), AVX, WR, Vdq, Wdq)
+OPCODE(vmovsldup, VEX(256, F3, 0F, IG, 0x12), AVX, WR, Vqq, Wqq)
 OPCODE(movddup, LEG(F2, 0F, 0, 0x12), SSE3, WR, Vdq, Wq)
+OPCODE(vmovddup, VEX(128, F2, 0F, IG, 0x12), AVX, WR, Vdq, Wq)
+OPCODE(vmovddup, VEX(256, F2, 0F, IG, 0x12), AVX, WR, Vqq, Wqq)
 OPCODE(paddb, LEG(NP, 0F, 0, 0xfc), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddb, LEG(66, 0F, 0, 0xfc), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpaddb, VEX(128, 66, 0F, IG, 0xfc), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(paddw, LEG(NP, 0F, 0, 0xfd), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddw, LEG(66, 0F, 0, 0xfd), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpaddw, VEX(128, 66, 0F, IG, 0xfd), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(paddd, LEG(NP, 0F, 0, 0xfe), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddd, LEG(66, 0F, 0, 0xfe), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpaddd, VEX(128, 66, 0F, IG, 0xfe), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(paddq, LEG(NP, 0F, 0, 0xd4), SSE2, WRR, Pq, Pq, Qq)
 OPCODE(paddq, LEG(66, 0F, 0, 0xd4), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpaddq, VEX(128, 66, 0F, IG, 0xd4), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(paddsb, LEG(NP, 0F, 0, 0xec), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddsb, LEG(66, 0F, 0, 0xec), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpaddsb, VEX(128, 66, 0F, IG, 0xec), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(paddsw, LEG(NP, 0F, 0, 0xed), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddsw, LEG(66, 0F, 0, 0xed), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpaddsw, VEX(128, 66, 0F, IG, 0xed), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(paddusb, LEG(NP, 0F, 0, 0xdc), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddusb, LEG(66, 0F, 0, 0xdc), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpaddusb, VEX(128, 66, 0F, IG, 0xdc), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(paddusw, LEG(NP, 0F, 0, 0xdd), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddusw, LEG(66, 0F, 0, 0xdd), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpaddusw, VEX(128, 66, 0F, IG, 0xdd), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(addps, LEG(NP, 0F, 0, 0x58), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(vaddps, VEX(128, NP, 0F, IG, 0x58), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vaddps, VEX(256, NP, 0F, IG, 0x58), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(addpd, LEG(66, 0F, 0, 0x58), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vaddpd, VEX(128, 66, 0F, IG, 0x58), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vaddpd, VEX(256, 66, 0F, IG, 0x58), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(addss, LEG(F3, 0F, 0, 0x58), SSE, WRR, Vd, Vd, Wd)
+OPCODE(vaddss, VEX(IG, F3, 0F, IG, 0x58), AVX, WRR, Vd, Hd, Wd)
 OPCODE(addsd, LEG(F2, 0F, 0, 0x58), SSE2, WRR, Vq, Vq, Wq)
+OPCODE(vaddsd, VEX(IG, F2, 0F, IG, 0x58), AVX, WRR, Vq, Hq, Wq)
 OPCODE(phaddw, LEG(NP, 0F38, 0, 0x01), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(phaddw, LEG(66, 0F38, 0, 0x01), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vphaddw, VEX(128, 66, 0F38, IG, 0x01), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(phaddd, LEG(NP, 0F38, 0, 0x02), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(phaddd, LEG(66, 0F38, 0, 0x02), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vphaddd, VEX(128, 66, 0F38, IG, 0x02), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(phaddsw, LEG(NP, 0F38, 0, 0x03), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(phaddsw, LEG(66, 0F38, 0, 0x03), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vphaddsw, VEX(128, 66, 0F38, IG, 0x03), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(haddps, LEG(F2, 0F, 0, 0x7c), SSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vhaddps, VEX(128, F2, 0F, IG, 0x7c), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vhaddps, VEX(256, F2, 0F, IG, 0x7c), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(haddpd, LEG(66, 0F, 0, 0x7c), SSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vhaddpd, VEX(128, 66, 0F, IG, 0x7c), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vhaddpd, VEX(256, 66, 0F, IG, 0x7c), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(psubb, LEG(NP, 0F, 0, 0xf8), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubb, LEG(66, 0F, 0, 0xf8), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsubb, VEX(128, 66, 0F, IG, 0xf8), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psubw, LEG(NP, 0F, 0, 0xf9), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubw, LEG(66, 0F, 0, 0xf9), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsubw, VEX(128, 66, 0F, IG, 0xf9), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psubd, LEG(NP, 0F, 0, 0xfa), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubd, LEG(66, 0F, 0, 0xfa), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsubd, VEX(128, 66, 0F, IG, 0xfa), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psubq, LEG(NP, 0F, 0, 0xfb), SSE2, WRR, Pq, Pq, Qq)
 OPCODE(psubq, LEG(66, 0F, 0, 0xfb), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsubq, VEX(128, 66, 0F, IG, 0xfb), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psubsb, LEG(NP, 0F, 0, 0xe8), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubsb, LEG(66, 0F, 0, 0xe8), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsubsb, VEX(128, 66, 0F, IG, 0xe8), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psubsw, LEG(NP, 0F, 0, 0xe9), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubsw, LEG(66, 0F, 0, 0xe9), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsubsw, VEX(128, 66, 0F, IG, 0xe9), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psubusb, LEG(NP, 0F, 0, 0xd8), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubusb, LEG(66, 0F, 0, 0xd8), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsubusb, VEX(128, 66, 0F, IG, 0xd8), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psubusw, LEG(NP, 0F, 0, 0xd9), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubusw, LEG(66, 0F, 0, 0xd9), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsubusw, VEX(128, 66, 0F, IG, 0xd9), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(subps, LEG(NP, 0F, 0, 0x5c), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(vsubps, VEX(128, NP, 0F, IG, 0x5c), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vsubps, VEX(256, NP, 0F, IG, 0x5c), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(subpd, LEG(66, 0F, 0, 0x5c), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vsubpd, VEX(128, 66, 0F, IG, 0x5c), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vsubpd, VEX(256, 66, 0F, IG, 0x5c), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(subss, LEG(F3, 0F, 0, 0x5c), SSE, WRR, Vd, Vd, Wd)
+OPCODE(vsubss, VEX(IG, F3, 0F, IG, 0x5c), AVX, WRR, Vd, Hd, Wd)
 OPCODE(subsd, LEG(F2, 0F, 0, 0x5c), SSE2, WRR, Vq, Vq, Wq)
+OPCODE(vsubsd, VEX(IG, F2, 0F, IG, 0x5c), AVX, WRR, Vq, Hq, Wq)
 OPCODE(phsubw, LEG(NP, 0F38, 0, 0x05), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(phsubw, LEG(66, 0F38, 0, 0x05), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vphsubw, VEX(128, 66, 0F38, IG, 0x05), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(phsubd, LEG(NP, 0F38, 0, 0x06), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(phsubd, LEG(66, 0F38, 0, 0x06), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vphsubd, VEX(128, 66, 0F38, IG, 0x06), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(phsubsw, LEG(NP, 0F38, 0, 0x07), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(phsubsw, LEG(66, 0F38, 0, 0x07), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vphsubsw, VEX(128, 66, 0F38, IG, 0x07), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(hsubps, LEG(F2, 0F, 0, 0x7d), SSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vhsubps, VEX(128, F2, 0F, IG, 0x7d), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vhsubps, VEX(256, F2, 0F, IG, 0x7d), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(hsubpd, LEG(66, 0F, 0, 0x7d), SSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vhsubpd, VEX(128, 66, 0F, IG, 0x7d), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vhsubpd, VEX(256, 66, 0F, IG, 0x7d), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(addsubps, LEG(F2, 0F, 0, 0xd0), SSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vaddsubps, VEX(128, F2, 0F, IG, 0xd0), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vaddsubps, VEX(256, F2, 0F, IG, 0xd0), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(addsubpd, LEG(66, 0F, 0, 0xd0), SSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vaddsubpd, VEX(128, 66, 0F, IG, 0xd0), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vaddsubpd, VEX(256, 66, 0F, IG, 0xd0), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmullw, LEG(NP, 0F, 0, 0xd5), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pmullw, LEG(66, 0F, 0, 0xd5), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmullw, VEX(128, 66, 0F, IG, 0xd5), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pmulld, LEG(66, 0F38, 0, 0x40), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmulld, VEX(128, 66, 0F38, IG, 0x40), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pmulhw, LEG(NP, 0F, 0, 0xe5), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pmulhw, LEG(66, 0F, 0, 0xe5), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmulhw, VEX(128, 66, 0F, IG, 0xe5), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pmulhuw, LEG(NP, 0F, 0, 0xe4), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pmulhuw, LEG(66, 0F, 0, 0xe4), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmulhuw, VEX(128, 66, 0F, IG, 0xe4), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pmuldq, LEG(66, 0F38, 0, 0x28), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmuldq, VEX(128, 66, 0F38, IG, 0x28), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pmuludq, LEG(NP, 0F, 0, 0xf4), SSE2, WRR, Pq, Pq, Qq)
 OPCODE(pmuludq, LEG(66, 0F, 0, 0xf4), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmuludq, VEX(128, 66, 0F, IG, 0xf4), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pmulhrsw, LEG(NP, 0F38, 0, 0x0b), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(pmulhrsw, LEG(66, 0F38, 0, 0x0b), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmulhrsw, VEX(128, 66, 0F38, IG, 0x0b), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(mulps, LEG(NP, 0F, 0, 0x59), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(vmulps, VEX(128, NP, 0F, IG, 0x59), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vmulps, VEX(256, NP, 0F, IG, 0x59), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(mulpd, LEG(66, 0F, 0, 0x59), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vmulpd, VEX(128, 66, 0F, IG, 0x59), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vmulpd, VEX(256, 66, 0F, IG, 0x59), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(mulss, LEG(F3, 0F, 0, 0x59), SSE, WRR, Vd, Vd, Wd)
+OPCODE(vmulss, VEX(IG, F3, 0F, IG, 0x59), AVX, WRR, Vd, Hd, Wd)
 OPCODE(mulsd, LEG(F2, 0F, 0, 0x59), SSE2, WRR, Vq, Vq, Wq)
+OPCODE(vmulsd, VEX(IG, F2, 0F, IG, 0x59), AVX, WRR, Vq, Hq, Wq)
 OPCODE(pmaddwd, LEG(NP, 0F, 0, 0xf5), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pmaddwd, LEG(66, 0F, 0, 0xf5), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmaddwd, VEX(128, 66, 0F, IG, 0xf5), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pmaddubsw, LEG(NP, 0F38, 0, 0x04), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(pmaddubsw, LEG(66, 0F38, 0, 0x04), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmaddubsw, VEX(128, 66, 0F38, IG, 0x04), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(divps, LEG(NP, 0F, 0, 0x5e), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(vdivps, VEX(128, NP, 0F, IG, 0x5e), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vdivps, VEX(256, NP, 0F, IG, 0x5e), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(divpd, LEG(66, 0F, 0, 0x5e), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vdivpd, VEX(128, 66, 0F, IG, 0x5e), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vdivpd, VEX(256, 66, 0F, IG, 0x5e), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(divss, LEG(F3, 0F, 0, 0x5e), SSE, WRR, Vd, Vd, Wd)
+OPCODE(vdivss, VEX(IG, F3, 0F, IG, 0x5e), AVX, WRR, Vd, Hd, Wd)
 OPCODE(divsd, LEG(F2, 0F, 0, 0x5e), SSE2, WRR, Vq, Vq, Wq)
+OPCODE(vdivsd, VEX(IG, F2, 0F, IG, 0x5e), AVX, WRR, Vq, Hq, Wq)
 OPCODE(rcpps, LEG(NP, 0F, 0, 0x53), SSE, WR, Vdq, Wdq)
+OPCODE(vrcpps, VEX(128, NP, 0F, IG, 0x53), AVX, WR, Vdq, Wdq)
+OPCODE(vrcpps, VEX(256, NP, 0F, IG, 0x53), AVX, WR, Vqq, Wqq)
 OPCODE(rcpss, LEG(F3, 0F, 0, 0x53), SSE, WR, Vd, Wd)
+OPCODE(vrcpss, VEX(IG, F3, 0F, IG, 0x53), AVX, WRR, Vd, Hd, Wd)
 OPCODE(sqrtps, LEG(NP, 0F, 0, 0x51), SSE, WR, Vdq, Wdq)
+OPCODE(vsqrtps, VEX(128, NP, 0F, IG, 0x51), AVX, WR, Vdq, Wdq)
+OPCODE(vsqrtps, VEX(256, NP, 0F, IG, 0x51), AVX, WR, Vqq, Wqq)
 OPCODE(sqrtpd, LEG(66, 0F, 0, 0x51), SSE2, WR, Vdq, Wdq)
+OPCODE(vsqrtpd, VEX(128, 66, 0F, IG, 0x51), AVX, WR, Vdq, Wdq)
+OPCODE(vsqrtpd, VEX(256, 66, 0F, IG, 0x51), AVX, WR, Vqq, Wqq)
 OPCODE(sqrtss, LEG(F3, 0F, 0, 0x51), SSE, WR, Vd, Wd)
+OPCODE(vsqrtss, VEX(IG, F3, 0F, IG, 0x51), AVX, WRR, Vd, Hd, Wd)
 OPCODE(sqrtsd, LEG(F2, 0F, 0, 0x51), SSE2, WR, Vq, Wq)
+OPCODE(vsqrtsd, VEX(IG, F2, 0F, IG, 0x51), AVX, WRR, Vq, Hq, Wq)
 OPCODE(rsqrtps, LEG(NP, 0F, 0, 0x52), SSE, WR, Vdq, Wdq)
+OPCODE(vrsqrtps, VEX(128, NP, 0F, IG, 0x52), AVX, WR, Vdq, Wdq)
+OPCODE(vrsqrtps, VEX(256, NP, 0F, IG, 0x52), AVX, WR, Vqq, Wqq)
 OPCODE(rsqrtss, LEG(F3, 0F, 0, 0x52), SSE, WR, Vd, Wd)
+OPCODE(vrsqrtss, VEX(IG, F3, 0F, IG, 0x52), AVX, WRR, Vd, Hd, Wd)
 OPCODE(pminub, LEG(NP, 0F, 0, 0xda), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pminub, LEG(66, 0F, 0, 0xda), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpminub, VEX(128, 66, 0F, IG, 0xda), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pminuw, LEG(66, 0F38, 0, 0x3a), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpminuw, VEX(128, 66, 0F38, IG, 0x3a), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pminud, LEG(66, 0F38, 0, 0x3b), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpminud, VEX(128, 66, 0F38, IG, 0x3b), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pminsb, LEG(66, 0F38, 0, 0x38), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpminsb, VEX(128, 66, 0F38, IG, 0x38), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pminsw, LEG(NP, 0F, 0, 0xea), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pminsw, LEG(66, 0F, 0, 0xea), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpminsw, VEX(128, 66, 0F, IG, 0xea), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pminsd, LEG(66, 0F38, 0, 0x39), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpminsd, VEX(128, 66, 0F38, IG, 0x39), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(minps, LEG(NP, 0F, 0, 0x5d), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(vminps, VEX(128, NP, 0F, IG, 0x5d), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vminps, VEX(256, NP, 0F, IG, 0x5d), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(minpd, LEG(66, 0F, 0, 0x5d), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vminpd, VEX(128, 66, 0F, IG, 0x5d), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vminpd, VEX(256, 66, 0F, IG, 0x5d), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(minss, LEG(F3, 0F, 0, 0x5d), SSE, WRR, Vd, Vd, Wd)
+OPCODE(vminss, VEX(IG, F3, 0F, IG, 0x5d), AVX, WRR, Vd, Hd, Wd)
 OPCODE(minsd, LEG(F2, 0F, 0, 0x5d), SSE2, WRR, Vq, Vq, Wq)
+OPCODE(vminsd, VEX(IG, F2, 0F, IG, 0x5d), AVX, WRR, Vq, Hq, Wq)
 OPCODE(phminposuw, LEG(66, 0F38, 0, 0x41), SSE4_1, WR, Vdq, Wdq)
+OPCODE(vphminposuw, VEX(128, 66, 0F38, IG, 0x41), AVX, WR, Vdq, Wdq)
 OPCODE(pmaxub, LEG(NP, 0F, 0, 0xde), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pmaxub, LEG(66, 0F, 0, 0xde), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmaxub, VEX(128, 66, 0F, IG, 0xde), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pmaxuw, LEG(66, 0F38, 0, 0x3e), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmaxuw, VEX(128, 66, 0F38, IG, 0x3e), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pmaxud, LEG(66, 0F38, 0, 0x3f), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmaxud, VEX(128, 66, 0F38, IG, 0x3f), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pmaxsb, LEG(66, 0F38, 0, 0x3c), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmaxsb, VEX(128, 66, 0F38, IG, 0x3c), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pmaxsw, LEG(NP, 0F, 0, 0xee), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pmaxsw, LEG(66, 0F, 0, 0xee), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmaxsw, VEX(128, 66, 0F, IG, 0xee), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pmaxsd, LEG(66, 0F38, 0, 0x3d), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpmaxsd, VEX(128, 66, 0F38, IG, 0x3d), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(maxps, LEG(NP, 0F, 0, 0x5f), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(vmaxps, VEX(128, NP, 0F, IG, 0x5f), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vmaxps, VEX(256, NP, 0F, IG, 0x5f), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(maxpd, LEG(66, 0F, 0, 0x5f), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vmaxpd, VEX(128, 66, 0F, IG, 0x5f), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vmaxpd, VEX(256, 66, 0F, IG, 0x5f), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(maxss, LEG(F3, 0F, 0, 0x5f), SSE, WRR, Vd, Vd, Wd)
+OPCODE(vmaxss, VEX(IG, F3, 0F, IG, 0x5f), AVX, WRR, Vd, Hd, Wd)
 OPCODE(maxsd, LEG(F2, 0F, 0, 0x5f), SSE2, WRR, Vq, Vq, Wq)
+OPCODE(vmaxsd, VEX(IG, F2, 0F, IG, 0x5f), AVX, WRR, Vq, Hq, Wq)
 OPCODE(pavgb, LEG(NP, 0F, 0, 0xe0), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pavgb, LEG(66, 0F, 0, 0xe0), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpavgb, VEX(128, 66, 0F, IG, 0xe0), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pavgw, LEG(NP, 0F, 0, 0xe3), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pavgw, LEG(66, 0F, 0, 0xe3), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpavgw, VEX(128, 66, 0F, IG, 0xe3), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psadbw, LEG(NP, 0F, 0, 0xf6), SSE, WRR, Pq, Pq, Qq)
 OPCODE(psadbw, LEG(66, 0F, 0, 0xf6), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsadbw, VEX(128, 66, 0F, IG, 0xf6), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(mpsadbw, LEG(66, 0F3A, 0, 0x42), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(vmpsadbw, VEX(128, 66, 0F3A, IG, 0x42), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
 OPCODE(pabsb, LEG(NP, 0F38, 0, 0x1c), SSSE3, WR, Pq, Qq)
 OPCODE(pabsb, LEG(66, 0F38, 0, 0x1c), SSSE3, WR, Vdq, Wdq)
+OPCODE(vpabsb, VEX(128, 66, 0F38, IG, 0x1c), AVX, WR, Vdq, Wdq)
 OPCODE(pabsw, LEG(NP, 0F38, 0, 0x1d), SSSE3, WR, Pq, Qq)
 OPCODE(pabsw, LEG(66, 0F38, 0, 0x1d), SSSE3, WR, Vdq, Wdq)
+OPCODE(vpabsw, VEX(128, 66, 0F38, IG, 0x1d), AVX, WR, Vdq, Wdq)
 OPCODE(pabsd, LEG(NP, 0F38, 0, 0x1e), SSSE3, WR, Pq, Qq)
 OPCODE(pabsd, LEG(66, 0F38, 0, 0x1e), SSSE3, WR, Vdq, Wdq)
+OPCODE(vpabsd, VEX(128, 66, 0F38, IG, 0x1e), AVX, WR, Vdq, Wdq)
 OPCODE(psignb, LEG(NP, 0F38, 0, 0x08), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(psignb, LEG(66, 0F38, 0, 0x08), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsignb, VEX(128, 66, 0F38, IG, 0x08), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psignw, LEG(NP, 0F38, 0, 0x09), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(psignw, LEG(66, 0F38, 0, 0x09), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsignw, VEX(128, 66, 0F38, IG, 0x09), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psignd, LEG(NP, 0F38, 0, 0x0a), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(psignd, LEG(66, 0F38, 0, 0x0a), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsignd, VEX(128, 66, 0F38, IG, 0x0a), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(dpps, LEG(66, 0F3A, 0, 0x40), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(vdpps, VEX(128, 66, 0F3A, IG, 0x40), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
+OPCODE(vdpps, VEX(256, 66, 0F3A, IG, 0x40), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
 OPCODE(dppd, LEG(66, 0F3A, 0, 0x41), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(vdppd, VEX(128, 66, 0F3A, IG, 0x41), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
 OPCODE(roundps, LEG(66, 0F3A, 0, 0x08), SSE4_1, WRR, Vdq, Wdq, Ib)
+OPCODE(vroundps, VEX(128, 66, 0F3A, IG, 0x08), AVX, WRR, Vdq, Wdq, Ib)
+OPCODE(vroundps, VEX(256, 66, 0F3A, IG, 0x08), AVX, WRR, Vqq, Wqq, Ib)
 OPCODE(roundpd, LEG(66, 0F3A, 0, 0x09), SSE4_1, WRR, Vdq, Wdq, Ib)
+OPCODE(vroundpd, VEX(128, 66, 0F3A, IG, 0x09), AVX, WRR, Vdq, Wdq, Ib)
+OPCODE(vroundpd, VEX(256, 66, 0F3A, IG, 0x09), AVX, WRR, Vqq, Wqq, Ib)
 OPCODE(roundss, LEG(66, 0F3A, 0, 0x0a), SSE4_1, WRR, Vd, Wd, Ib)
+OPCODE(vroundss, VEX(IG, 66, 0F3A, IG, 0x0a), AVX, WRRR, Vd, Hd, Wd, Ib)
 OPCODE(roundsd, LEG(66, 0F3A, 0, 0x0b), SSE4_1, WRR, Vq, Wq, Ib)
+OPCODE(vroundsd, VEX(IG, 66, 0F3A, IG, 0x0b), AVX, WRRR, Vq, Hq, Wq, Ib)
 OPCODE(aesdec, LEG(66, 0F38, 0, 0xde), AES, WRR, Vdq, Vdq, Wdq)
 OPCODE(vaesdec, VEX(128, 66, 0F38, IG, 0xde), AES_AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(aesdeclast, LEG(66, 0F38, 0, 0xdf), AES, WRR, Vdq, Vdq, Wdq)
@@ -677,170 +1246,352 @@ OPCODE(pclmulqdq, LEG(66, 0F3A, 0, 0x44), PCLMULQDQ, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(vpclmulqdq, VEX(128, 66, 0F3A, IG, 0x44), PCLMULQDQ_AVX, WRRR, Vdq, Hdq, Wdq, Ib)
 OPCODE(pcmpeqb, LEG(NP, 0F, 0, 0x74), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqb, LEG(66, 0F, 0, 0x74), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpcmpeqb, VEX(128, 66, 0F, IG, 0x74), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pcmpeqw, LEG(NP, 0F, 0, 0x75), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqw, LEG(66, 0F, 0, 0x75), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpcmpeqw, VEX(128, 66, 0F, IG, 0x75), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pcmpeqd, LEG(NP, 0F, 0, 0x76), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqd, LEG(66, 0F, 0, 0x76), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpcmpeqd, VEX(128, 66, 0F, IG, 0x76), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pcmpeqq, LEG(66, 0F38, 0, 0x29), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpcmpeqq, VEX(128, 66, 0F38, IG, 0x29), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pcmpgtb, LEG(NP, 0F, 0, 0x64), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtb, LEG(66, 0F, 0, 0x64), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpcmpgtb, VEX(128, 66, 0F, IG, 0x64), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pcmpgtw, LEG(NP, 0F, 0, 0x65), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtw, LEG(66, 0F, 0, 0x65), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpcmpgtw, VEX(128, 66, 0F, IG, 0x65), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pcmpgtd, LEG(NP, 0F, 0, 0x66), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtd, LEG(66, 0F, 0, 0x66), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpcmpgtd, VEX(128, 66, 0F, IG, 0x66), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pcmpgtq, LEG(66, 0F38, 0, 0x37), SSE4_2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpcmpgtq, VEX(128, 66, 0F38, IG, 0x37), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pcmpestrm, LEG(66, 0F3A, 0, 0x60), SSE4_2, RRR, Vdq, Wdq, Ib)
+OPCODE(vpcmpestrm, VEX(128, 66, 0F3A, IG, 0x60), AVX, RRR, Vdq, Wdq, Ib)
 OPCODE(pcmpestri, LEG(66, 0F3A, 0, 0x61), SSE4_2, RRR, Vdq, Wdq, Ib)
+OPCODE(vpcmpestri, VEX(128, 66, 0F3A, IG, 0x61), AVX, RRR, Vdq, Wdq, Ib)
 OPCODE(pcmpistrm, LEG(66, 0F3A, 0, 0x62), SSE4_2, RRR, Vdq, Wdq, Ib)
+OPCODE(vpcmpistrm, VEX(128, 66, 0F3A, IG, 0x62), AVX, RRR, Vdq, Wdq, Ib)
 OPCODE(pcmpistri, LEG(66, 0F3A, 0, 0x63), SSE4_2, RRR, Vdq, Wdq, Ib)
+OPCODE(vpcmpistri, VEX(128, 66, 0F3A, IG, 0x63), AVX, RRR, Vdq, Wdq, Ib)
 OPCODE(ptest, LEG(66, 0F38, 0, 0x17), SSE4_1, RR, Vdq, Wdq)
+OPCODE(vptest, VEX(128, 66, 0F38, IG, 0x17), AVX, RR, Vdq, Wdq)
+OPCODE(vptest, VEX(256, 66, 0F38, IG, 0x17), AVX, RR, Vqq, Wqq)
+OPCODE(vtestps, VEX(128, 66, 0F38, 0, 0x0e), AVX, RR, Vdq, Wdq)
+OPCODE(vtestps, VEX(256, 66, 0F38, 0, 0x0e), AVX, RR, Vqq, Wqq)
+OPCODE(vtestpd, VEX(128, 66, 0F38, 0, 0x0f), AVX, RR, Vdq, Wdq)
+OPCODE(vtestpd, VEX(256, 66, 0F38, 0, 0x0f), AVX, RR, Vqq, Wqq)
 OPCODE(cmpps, LEG(NP, 0F, 0, 0xc2), SSE, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(vcmpps, VEX(128, NP, 0F, IG, 0xc2), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
+OPCODE(vcmpps, VEX(256, NP, 0F, IG, 0xc2), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
 OPCODE(cmppd, LEG(66, 0F, 0, 0xc2), SSE2, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(vcmppd, VEX(128, 66, 0F, IG, 0xc2), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
+OPCODE(vcmppd, VEX(256, 66, 0F, IG, 0xc2), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
 OPCODE(cmpss, LEG(F3, 0F, 0, 0xc2), SSE, WRRR, Vd, Vd, Wd, Ib)
+OPCODE(vcmpss, VEX(IG, F3, 0F, IG, 0xc2), AVX, WRRR, Vd, Hd, Wd, Ib)
 OPCODE(cmpsd, LEG(F2, 0F, 0, 0xc2), SSE2, WRRR, Vq, Vq, Wq, Ib)
+OPCODE(vcmpsd, VEX(IG, F2, 0F, IG, 0xc2), AVX, WRRR, Vq, Hq, Wq, Ib)
 OPCODE(ucomiss, LEG(NP, 0F, 0, 0x2e), SSE, RR, Vd, Wd)
+OPCODE(vucomiss, VEX(IG, NP, 0F, IG, 0x2e), AVX, RR, Vd, Wd)
 OPCODE(ucomisd, LEG(66, 0F, 0, 0x2e), SSE2, RR, Vq, Wq)
+OPCODE(vucomisd, VEX(IG, 66, 0F, IG, 0x2e), AVX, RR, Vq, Wq)
 OPCODE(comiss, LEG(NP, 0F, 0, 0x2f), SSE, RR, Vd, Wd)
+OPCODE(vcomiss, VEX(IG, NP, 0F, IG, 0x2f), AVX, RR, Vd, Wd)
 OPCODE(comisd, LEG(66, 0F, 0, 0x2f), SSE2, RR, Vq, Wq)
+OPCODE(vcomisd, VEX(IG, 66, 0F, IG, 0x2f), AVX, RR, Vq, Wq)
 OPCODE(pand, LEG(NP, 0F, 0, 0xdb), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pand, LEG(66, 0F, 0, 0xdb), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpand, VEX(128, 66, 0F, IG, 0xdb), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(andps, LEG(NP, 0F, 0, 0x54), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(vandps, VEX(128, NP, 0F, IG, 0x54), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vandps, VEX(256, NP, 0F, IG, 0x54), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(andpd, LEG(66, 0F, 0, 0x54), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vandpd, VEX(128, 66, 0F, IG, 0x54), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vandpd, VEX(256, 66, 0F, IG, 0x54), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(pandn, LEG(NP, 0F, 0, 0xdf), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pandn, LEG(66, 0F, 0, 0xdf), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpandn, VEX(128, 66, 0F, IG, 0xdf), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(andnps, LEG(NP, 0F, 0, 0x55), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(vandnps, VEX(128, NP, 0F, IG, 0x55), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vandnps, VEX(256, NP, 0F, IG, 0x55), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(andnpd, LEG(66, 0F, 0, 0x55), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vandnpd, VEX(128, 66, 0F, IG, 0x55), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vandnpd, VEX(256, 66, 0F, IG, 0x55), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(por, LEG(NP, 0F, 0, 0xeb), MMX, WRR, Pq, Pq, Qq)
 OPCODE(por, LEG(66, 0F, 0, 0xeb), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpor, VEX(128, 66, 0F, IG, 0xeb), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(orps, LEG(NP, 0F, 0, 0x56), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(vorps, VEX(128, NP, 0F, IG, 0x56), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vorps, VEX(256, NP, 0F, IG, 0x56), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(orpd, LEG(66, 0F, 0, 0x56), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vorpd, VEX(128, 66, 0F, IG, 0x56), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vorpd, VEX(256, 66, 0F, IG, 0x56), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(pxor, LEG(NP, 0F, 0, 0xef), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pxor, LEG(66, 0F, 0, 0xef), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpxor, VEX(128, 66, 0F, IG, 0xef), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(xorps, LEG(NP, 0F, 0, 0x57), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(vxorps, VEX(128, NP, 0F, IG, 0x57), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vxorps, VEX(256, NP, 0F, IG, 0x57), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(xorpd, LEG(66, 0F, 0, 0x57), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vxorpd, VEX(128, 66, 0F, IG, 0x57), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vxorpd, VEX(256, 66, 0F, IG, 0x57), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(psllw, LEG(NP, 0F, 0, 0xf1), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psllw, LEG(66, 0F, 0, 0xf1), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsllw, VEX(128, 66, 0F, IG, 0xf1), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pslld, LEG(NP, 0F, 0, 0xf2), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pslld, LEG(66, 0F, 0, 0xf2), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpslld, VEX(128, 66, 0F, IG, 0xf2), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psllq, LEG(NP, 0F, 0, 0xf3), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psllq, LEG(66, 0F, 0, 0xf3), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsllq, VEX(128, 66, 0F, IG, 0xf3), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psrlw, LEG(NP, 0F, 0, 0xd1), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psrlw, LEG(66, 0F, 0, 0xd1), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsrlw, VEX(128, 66, 0F, IG, 0xd1), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psrld, LEG(NP, 0F, 0, 0xd2), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psrld, LEG(66, 0F, 0, 0xd2), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsrld, VEX(128, 66, 0F, IG, 0xd2), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psrlq, LEG(NP, 0F, 0, 0xd3), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psrlq, LEG(66, 0F, 0, 0xd3), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsrlq, VEX(128, 66, 0F, IG, 0xd3), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psraw, LEG(NP, 0F, 0, 0xe1), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psraw, LEG(66, 0F, 0, 0xe1), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsraw, VEX(128, 66, 0F, IG, 0xe1), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(psrad, LEG(NP, 0F, 0, 0xe2), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psrad, LEG(66, 0F, 0, 0xe2), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpsrad, VEX(128, 66, 0F, IG, 0xe2), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(palignr, LEG(NP, 0F3A, 0, 0x0f), SSSE3, WRRR, Pq, Pq, Qq, Ib)
 OPCODE(palignr, LEG(66, 0F3A, 0, 0x0f), SSSE3, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(vpalignr, VEX(128, 66, 0F3A, IG, 0x0f), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
 OPCODE(packsswb, LEG(NP, 0F, 0, 0x63), MMX, WRR, Pq, Pq, Qq)
 OPCODE(packsswb, LEG(66, 0F, 0, 0x63), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpacksswb, VEX(128, 66, 0F, IG, 0x63), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(packssdw, LEG(NP, 0F, 0, 0x6b), MMX, WRR, Pq, Pq, Qq)
 OPCODE(packssdw, LEG(66, 0F, 0, 0x6b), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpackssdw, VEX(128, 66, 0F, IG, 0x6b), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(packuswb, LEG(NP, 0F, 0, 0x67), MMX, WRR, Pq, Pq, Qq)
 OPCODE(packuswb, LEG(66, 0F, 0, 0x67), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpackuswb, VEX(128, 66, 0F, IG, 0x67), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(packusdw, LEG(66, 0F38, 0, 0x2b), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpackusdw, VEX(128, 66, 0F38, IG, 0x2b), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(punpckhbw, LEG(NP, 0F, 0, 0x68), MMX, WRR, Pq, Pq, Qq)
 OPCODE(punpckhbw, LEG(66, 0F, 0, 0x68), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpunpckhbw, VEX(128, 66, 0F, IG, 0x68), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(punpckhwd, LEG(NP, 0F, 0, 0x69), MMX, WRR, Pq, Pq, Qq)
 OPCODE(punpckhwd, LEG(66, 0F, 0, 0x69), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpunpckhwd, VEX(128, 66, 0F, IG, 0x69), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(punpckhdq, LEG(NP, 0F, 0, 0x6a), MMX, WRR, Pq, Pq, Qq)
 OPCODE(punpckhdq, LEG(66, 0F, 0, 0x6a), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpunpckhdq, VEX(128, 66, 0F, IG, 0x6a), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(punpckhqdq, LEG(66, 0F, 0, 0x6d), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpunpckhqdq, VEX(128, 66, 0F, IG, 0x6d), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(punpcklbw, LEG(NP, 0F, 0, 0x60), MMX, WRR, Pq, Pq, Qd)
 OPCODE(punpcklbw, LEG(66, 0F, 0, 0x60), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpunpcklbw, VEX(128, 66, 0F, IG, 0x60), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(punpcklwd, LEG(NP, 0F, 0, 0x61), MMX, WRR, Pq, Pq, Qd)
 OPCODE(punpcklwd, LEG(66, 0F, 0, 0x61), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpunpcklwd, VEX(128, 66, 0F, IG, 0x61), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(punpckldq, LEG(NP, 0F, 0, 0x62), MMX, WRR, Pq, Pq, Qd)
 OPCODE(punpckldq, LEG(66, 0F, 0, 0x62), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpunpckldq, VEX(128, 66, 0F, IG, 0x62), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(punpcklqdq, LEG(66, 0F, 0, 0x6c), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpunpcklqdq, VEX(128, 66, 0F, IG, 0x6c), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(unpcklps, LEG(NP, 0F, 0, 0x14), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(vunpcklps, VEX(128, NP, 0F, IG, 0x14), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vunpcklps, VEX(256, NP, 0F, IG, 0x14), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(unpcklpd, LEG(66, 0F, 0, 0x14), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vunpcklpd, VEX(128, 66, 0F, IG, 0x14), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vunpcklpd, VEX(256, 66, 0F, IG, 0x14), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(unpckhps, LEG(NP, 0F, 0, 0x15), SSE, WRR, Vdq, Vdq, Wdq)
+OPCODE(vunpckhps, VEX(128, NP, 0F, IG, 0x15), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vunpckhps, VEX(256, NP, 0F, IG, 0x15), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(unpckhpd, LEG(66, 0F, 0, 0x15), SSE2, WRR, Vdq, Vdq, Wdq)
+OPCODE(vunpckhpd, VEX(128, 66, 0F, IG, 0x15), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vunpckhpd, VEX(256, 66, 0F, IG, 0x15), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(pshufb, LEG(NP, 0F38, 0, 0x00), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(pshufb, LEG(66, 0F38, 0, 0x00), SSSE3, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpshufb, VEX(128, 66, 0F38, IG, 0x00), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(pshufw, LEG(NP, 0F, 0, 0x70), SSE, WRR, Pq, Qq, Ib)
 OPCODE(pshuflw, LEG(F2, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
+OPCODE(vpshuflw, VEX(128, F2, 0F, IG, 0x70), AVX, WRR, Vdq, Wdq, Ib)
 OPCODE(pshufhw, LEG(F3, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
+OPCODE(vpshufhw, VEX(128, F3, 0F, IG, 0x70), AVX, WRR, Vdq, Wdq, Ib)
 OPCODE(pshufd, LEG(66, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
+OPCODE(vpshufd, VEX(128, 66, 0F, IG, 0x70), AVX, WRR, Vdq, Wdq, Ib)
 OPCODE(shufps, LEG(NP, 0F, 0, 0xc6), SSE, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(vshufps, VEX(128, NP, 0F, IG, 0xc6), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
+OPCODE(vshufps, VEX(256, NP, 0F, IG, 0xc6), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
 OPCODE(shufpd, LEG(66, 0F, 0, 0xc6), SSE2, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(vshufpd, VEX(128, 66, 0F, IG, 0xc6), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
+OPCODE(vshufpd, VEX(256, 66, 0F, IG, 0xc6), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
 OPCODE(blendps, LEG(66, 0F3A, 0, 0x0c), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(vblendps, VEX(128, 66, 0F3A, IG, 0x0c), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
+OPCODE(vblendps, VEX(256, 66, 0F3A, IG, 0x0c), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
 OPCODE(blendpd, LEG(66, 0F3A, 0, 0x0d), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(vblendpd, VEX(128, 66, 0F3A, IG, 0x0d), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
+OPCODE(vblendpd, VEX(256, 66, 0F3A, IG, 0x0d), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
 OPCODE(blendvps, LEG(66, 0F38, 0, 0x14), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vblendvps, VEX(128, 66, 0F3A, 0, 0x4a), AVX, WRRR, Vdq, Hdq, Wdq, Ldq)
+OPCODE(vblendvps, VEX(256, 66, 0F3A, 0, 0x4a), AVX, WRRR, Vqq, Hqq, Wqq, Lqq)
 OPCODE(blendvpd, LEG(66, 0F38, 0, 0x15), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vblendvpd, VEX(128, 66, 0F3A, 0, 0x4b), AVX, WRRR, Vdq, Hdq, Wdq, Ldq)
+OPCODE(vblendvpd, VEX(256, 66, 0F3A, 0, 0x4b), AVX, WRRR, Vqq, Hqq, Wqq, Lqq)
 OPCODE(pblendvb, LEG(66, 0F38, 0, 0x10), SSE4_1, WRR, Vdq, Vdq, Wdq)
+OPCODE(vpblendvb, VEX(128, 66, 0F3A, 0, 0x4c), AVX, WRRR, Vdq, Hdq, Wdq, Ldq)
 OPCODE(pblendw, LEG(66, 0F3A, 0, 0x0e), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
+OPCODE(vpblendw, VEX(128, 66, 0F3A, IG, 0x0e), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
 OPCODE(insertps, LEG(66, 0F3A, 0, 0x21), SSE4_1, WRRR, Vdq, Vdq, Wd, Ib)
+OPCODE(vinsertps, VEX(128, 66, 0F3A, IG, 0x21), AVX, WRRR, Vdq, Hdq, Wd, Ib)
 OPCODE(pinsrb, LEG(66, 0F3A, 0, 0x20), SSE4_1, WRRR, Vdq, Vdq, RdMb, Ib)
+OPCODE(vpinsrb, VEX(128, 66, 0F3A, 0, 0x20), AVX, WRRR, Vdq, Hdq, RdMb, Ib)
 OPCODE(pinsrw, LEG(NP, 0F, 0, 0xc4), SSE, WRRR, Pq, Pq, RdMw, Ib)
 OPCODE(pinsrw, LEG(66, 0F, 0, 0xc4), SSE2, WRRR, Vdq, Vdq, RdMw, Ib)
+OPCODE(vpinsrw, VEX(128, 66, 0F, 0, 0xc4), AVX, WRRR, Vdq, Hdq, RdMw, Ib)
 OPCODE(pinsrd, LEG(66, 0F3A, 0, 0x22), SSE4_1, WRRR, Vdq, Vdq, Ed, Ib)
+OPCODE(vpinsrd, VEX(128, 66, 0F3A, 0, 0x22), AVX, WRRR, Vdq, Hdq, Ed, Ib)
 OPCODE(pinsrq, LEG(66, 0F3A, 1, 0x22), SSE4_1, WRRR, Vdq, Vdq, Eq, Ib)
+OPCODE(vpinsrq, VEX(128, 66, 0F3A, 1, 0x22), AVX, WRRR, Vdq, Hdq, Eq, Ib)
+OPCODE(vinsertf128, VEX(256, 66, 0F3A, 0, 0x18), AVX, WRRR, Vqq, Hqq, Wdq, Ib)
 OPCODE(extractps, LEG(66, 0F3A, 0, 0x17), SSE4_1, WRR, Ed, Vdq, Ib)
+OPCODE(vextractps, VEX(128, 66, 0F3A, IG, 0x17), AVX, WRR, Ed, Vdq, Ib)
 OPCODE(pextrb, LEG(66, 0F3A, 0, 0x14), SSE4_1, WRR, RdMb, Vdq, Ib)
+OPCODE(vpextrb, VEX(128, 66, 0F3A, 0, 0x14), AVX, WRR, RdMb, Vdq, Ib)
 OPCODE(pextrw, LEG(66, 0F3A, 0, 0x15), SSE4_1, WRR, RdMw, Vdq, Ib)
+OPCODE(vpextrw, VEX(128, 66, 0F3A, 0, 0x15), AVX, WRR, RdMw, Vdq, Ib)
 OPCODE(pextrd, LEG(66, 0F3A, 0, 0x16), SSE4_1, WRR, Ed, Vdq, Ib)
+OPCODE(vpextrd, VEX(128, 66, 0F3A, 0, 0x16), AVX, WRR, Ed, Vdq, Ib)
 OPCODE(pextrq, LEG(66, 0F3A, 1, 0x16), SSE4_1, WRR, Eq, Vdq, Ib)
+OPCODE(vpextrq, VEX(128, 66, 0F3A, 1, 0x16), AVX, WRR, Eq, Vdq, Ib)
 OPCODE(pextrw, LEG(NP, 0F, 0, 0xc5), SSE, WRR, Gd, Nq, Ib)
 OPCODE(pextrw, LEG(NP, 0F, 1, 0xc5), SSE, WRR, Gq, Nq, Ib)
 OPCODE(pextrw, LEG(66, 0F, 0, 0xc5), SSE2, WRR, Gd, Udq, Ib)
 OPCODE(pextrw, LEG(66, 0F, 1, 0xc5), SSE2, WRR, Gq, Udq, Ib)
+OPCODE(vpextrw, VEX(128, 66, 0F, 0, 0xc5), AVX, WRR, Gd, Udq, Ib)
+OPCODE(vpextrw, VEX(128, 66, 0F, 1, 0xc5), AVX, WRR, Gq, Udq, Ib)
+OPCODE(vextractf128, VEX(256, 66, 0F3A, 0, 0x19), AVX, WRR, Wdq, Vqq, Ib)
+OPCODE(vbroadcastss, VEX(128, 66, 0F38, 0, 0x18), AVX, WR, Vdq, Md)
+OPCODE(vbroadcastss, VEX(256, 66, 0F38, 0, 0x18), AVX, WR, Vqq, Md)
+OPCODE(vbroadcastsd, VEX(256, 66, 0F38, 0, 0x19), AVX, WR, Vqq, Mq)
+OPCODE(vbroadcastf128, VEX(256, 66, 0F38, 0, 0x1a), AVX, WR, Vqq, Mdq)
+OPCODE(vperm2f128, VEX(256, 66, 0F3A, 0, 0x06), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
+OPCODE(vpermilps, VEX(128, 66, 0F38, 0, 0x0c), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpermilps, VEX(256, 66, 0F38, 0, 0x0c), AVX, WRR, Vqq, Hqq, Wqq)
+OPCODE(vpermilps, VEX(128, 66, 0F3A, 0, 0x04), AVX, WRR, Vdq, Wdq, Ib)
+OPCODE(vpermilps, VEX(256, 66, 0F3A, 0, 0x04), AVX, WRR, Vqq, Wqq, Ib)
+OPCODE(vpermilpd, VEX(128, 66, 0F38, 0, 0x0d), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpermilpd, VEX(256, 66, 0F38, 0, 0x0d), AVX, WRR, Vqq, Hqq, Wqq)
+OPCODE(vpermilpd, VEX(128, 66, 0F3A, 0, 0x05), AVX, WRR, Vdq, Wdq, Ib)
+OPCODE(vpermilpd, VEX(256, 66, 0F3A, 0, 0x05), AVX, WRR, Vqq, Wqq, Ib)
 OPCODE(pmovsxbw, LEG(66, 0F38, 0, 0x20), SSE4_1, WR, Vdq, Wq)
+OPCODE(vpmovsxbw, VEX(128, 66, 0F38, IG, 0x20), AVX, WR, Vdq, Wq)
 OPCODE(pmovsxbd, LEG(66, 0F38, 0, 0x21), SSE4_1, WR, Vdq, Wd)
+OPCODE(vpmovsxbd, VEX(128, 66, 0F38, IG, 0x21), AVX, WR, Vdq, Wd)
 OPCODE(pmovsxbq, LEG(66, 0F38, 0, 0x22), SSE4_1, WR, Vdq, Ww)
+OPCODE(vpmovsxbq, VEX(128, 66, 0F38, IG, 0x22), AVX, WR, Vdq, Ww)
 OPCODE(pmovsxwd, LEG(66, 0F38, 0, 0x23), SSE4_1, WR, Vdq, Wq)
+OPCODE(vpmovsxwd, VEX(128, 66, 0F38, IG, 0x23), AVX, WR, Vdq, Wq)
 OPCODE(pmovsxwq, LEG(66, 0F38, 0, 0x24), SSE4_1, WR, Vdq, Wd)
+OPCODE(vpmovsxwq, VEX(128, 66, 0F38, IG, 0x24), AVX, WR, Vdq, Wd)
 OPCODE(pmovsxdq, LEG(66, 0F38, 0, 0x25), SSE4_1, WR, Vdq, Wq)
+OPCODE(vpmovsxdq, VEX(128, 66, 0F38, IG, 0x25), AVX, WR, Vdq, Wq)
 OPCODE(pmovzxbw, LEG(66, 0F38, 0, 0x30), SSE4_1, WR, Vdq, Wq)
+OPCODE(vpmovzxbw, VEX(128, 66, 0F38, IG, 0x30), AVX, WR, Vdq, Wq)
 OPCODE(pmovzxbd, LEG(66, 0F38, 0, 0x31), SSE4_1, WR, Vdq, Wd)
+OPCODE(vpmovzxbd, VEX(128, 66, 0F38, IG, 0x31), AVX, WR, Vdq, Wd)
 OPCODE(pmovzxbq, LEG(66, 0F38, 0, 0x32), SSE4_1, WR, Vdq, Ww)
+OPCODE(vpmovzxbq, VEX(128, 66, 0F38, IG, 0x32), AVX, WR, Vdq, Ww)
 OPCODE(pmovzxwd, LEG(66, 0F38, 0, 0x33), SSE4_1, WR, Vdq, Wq)
+OPCODE(vpmovzxwd, VEX(128, 66, 0F38, IG, 0x33), AVX, WR, Vdq, Wq)
 OPCODE(pmovzxwq, LEG(66, 0F38, 0, 0x34), SSE4_1, WR, Vdq, Wd)
+OPCODE(vpmovzxwq, VEX(128, 66, 0F38, IG, 0x34), AVX, WR, Vdq, Wd)
 OPCODE(pmovzxdq, LEG(66, 0F38, 0, 0x35), SSE4_1, WR, Vdq, Wq)
+OPCODE(vpmovzxdq, VEX(128, 66, 0F38, IG, 0x35), AVX, WR, Vdq, Wq)
 OPCODE(cvtpi2ps, LEG(NP, 0F, 0, 0x2a), SSE, WR, Vdq, Qq)
 OPCODE(cvtsi2ss, LEG(F3, 0F, 0, 0x2a), SSE, WR, Vd, Ed)
 OPCODE(cvtsi2ss, LEG(F3, 0F, 1, 0x2a), SSE, WR, Vd, Eq)
+OPCODE(vcvtsi2ss, VEX(IG, F3, 0F, 0, 0x2a), AVX, WRR, Vd, Hd, Ed)
+OPCODE(vcvtsi2ss, VEX(IG, F3, 0F, 1, 0x2a), AVX, WRR, Vd, Hd, Eq)
 OPCODE(cvtpi2pd, LEG(66, 0F, 0, 0x2a), SSE2, WR, Vdq, Qq)
 OPCODE(cvtsi2sd, LEG(F2, 0F, 0, 0x2a), SSE2, WR, Vq, Ed)
 OPCODE(cvtsi2sd, LEG(F2, 0F, 1, 0x2a), SSE2, WR, Vq, Eq)
+OPCODE(vcvtsi2sd, VEX(IG, F2, 0F, 0, 0x2a), AVX, WRR, Vq, Hq, Ed)
+OPCODE(vcvtsi2sd, VEX(IG, F2, 0F, 1, 0x2a), AVX, WRR, Vq, Hq, Eq)
 OPCODE(cvtps2pi, LEG(NP, 0F, 0, 0x2d), SSE, WR, Pq, Wq)
 OPCODE(cvtss2si, LEG(F3, 0F, 0, 0x2d), SSE, WR, Gd, Wd)
 OPCODE(cvtss2si, LEG(F3, 0F, 1, 0x2d), SSE, WR, Gq, Wd)
+OPCODE(vcvtss2si, VEX(IG, F3, 0F, 0, 0x2d), AVX, WR, Gd, Wd)
+OPCODE(vcvtss2si, VEX(IG, F3, 0F, 1, 0x2d), AVX, WR, Gq, Wd)
 OPCODE(cvtpd2pi, LEG(66, 0F, 0, 0x2d), SSE2, WR, Pq, Wdq)
 OPCODE(cvtsd2si, LEG(F2, 0F, 0, 0x2d), SSE2, WR, Gd, Wq)
 OPCODE(cvtsd2si, LEG(F2, 0F, 1, 0x2d), SSE2, WR, Gq, Wq)
+OPCODE(vcvtsd2si, VEX(IG, F2, 0F, 0, 0x2d), AVX, WR, Gd, Wq)
+OPCODE(vcvtsd2si, VEX(IG, F2, 0F, 1, 0x2d), AVX, WR, Gq, Wq)
 OPCODE(cvttps2pi, LEG(NP, 0F, 0, 0x2c), SSE, WR, Pq, Wq)
 OPCODE(cvttss2si, LEG(F3, 0F, 0, 0x2c), SSE, WR, Gd, Wd)
 OPCODE(cvttss2si, LEG(F3, 0F, 1, 0x2c), SSE, WR, Gq, Wd)
+OPCODE(vcvttss2si, VEX(IG, F3, 0F, 0, 0x2c), AVX, WR, Gd, Wd)
+OPCODE(vcvttss2si, VEX(IG, F3, 0F, 1, 0x2c), AVX, WR, Gq, Wd)
 OPCODE(cvttpd2pi, LEG(66, 0F, 0, 0x2c), SSE2, WR, Pq, Wdq)
 OPCODE(cvttsd2si, LEG(F2, 0F, 0, 0x2c), SSE2, WR, Gd, Wq)
 OPCODE(cvttsd2si, LEG(F2, 0F, 1, 0x2c), SSE2, WR, Gq, Wq)
+OPCODE(vcvttsd2si, VEX(IG, F2, 0F, 0, 0x2c), AVX, WR, Gd, Wq)
+OPCODE(vcvttsd2si, VEX(IG, F2, 0F, 1, 0x2c), AVX, WR, Gq, Wq)
 OPCODE(cvtpd2dq, LEG(F2, 0F, 0, 0xe6), SSE2, WR, Vdq, Wdq)
+OPCODE(vcvtpd2dq, VEX(128, F2, 0F, IG, 0xe6), AVX, WR, Vdq, Wdq)
+OPCODE(vcvtpd2dq, VEX(256, F2, 0F, IG, 0xe6), AVX, WR, Vdq, Wqq)
 OPCODE(cvttpd2dq, LEG(66, 0F, 0, 0xe6), SSE2, WR, Vdq, Wdq)
+OPCODE(vcvttpd2dq, VEX(128, 66, 0F, IG, 0xe6), AVX, WR, Vdq, Wdq)
+OPCODE(vcvttpd2dq, VEX(256, 66, 0F, IG, 0xe6), AVX, WR, Vdq, Wqq)
 OPCODE(cvtdq2pd, LEG(F3, 0F, 0, 0xe6), SSE2, WR, Vdq, Wq)
+OPCODE(vcvtdq2pd, VEX(128, F3, 0F, IG, 0xe6), AVX, WR, Vdq, Wq)
+OPCODE(vcvtdq2pd, VEX(256, F3, 0F, IG, 0xe6), AVX, WR, Vqq, Wdq)
 OPCODE(cvtps2pd, LEG(NP, 0F, 0, 0x5a), SSE2, WR, Vdq, Wq)
+OPCODE(vcvtps2pd, VEX(128, NP, 0F, IG, 0x5a), AVX, WR, Vdq, Wq)
+OPCODE(vcvtps2pd, VEX(256, NP, 0F, IG, 0x5a), AVX, WR, Vqq, Wdq)
 OPCODE(cvtpd2ps, LEG(66, 0F, 0, 0x5a), SSE2, WR, Vdq, Wdq)
+OPCODE(vcvtpd2ps, VEX(128, 66, 0F, IG, 0x5a), AVX, WR, Vdq, Wdq)
+OPCODE(vcvtpd2ps, VEX(256, 66, 0F, IG, 0x5a), AVX, WR, Vdq, Wqq)
 OPCODE(cvtss2sd, LEG(F3, 0F, 0, 0x5a), SSE2, WR, Vq, Wd)
+OPCODE(vcvtss2sd, VEX(IG, F3, 0F, IG, 0x5a), AVX, WRR, Vq, Hq, Wd)
 OPCODE(cvtsd2ss, LEG(F2, 0F, 0, 0x5a), SSE2, WR, Vd, Wq)
+OPCODE(vcvtsd2ss, VEX(IG, F2, 0F, IG, 0x5a), AVX, WRR, Vd, Hd, Wq)
 OPCODE(cvtdq2ps, LEG(NP, 0F, 0, 0x5b), SSE2, WR, Vdq, Wdq)
+OPCODE(vcvtdq2ps, VEX(128, NP, 0F, IG, 0x5b), AVX, WR, Vdq, Wdq)
+OPCODE(vcvtdq2ps, VEX(256, NP, 0F, IG, 0x5b), AVX, WR, Vqq, Wqq)
 OPCODE(cvtps2dq, LEG(66, 0F, 0, 0x5b), SSE2, WR, Vdq, Wdq)
+OPCODE(vcvtps2dq, VEX(128, 66, 0F, IG, 0x5b), AVX, WR, Vdq, Wdq)
+OPCODE(vcvtps2dq, VEX(256, 66, 0F, IG, 0x5b), AVX, WR, Vqq, Wqq)
 OPCODE(cvttps2dq, LEG(F3, 0F, 0, 0x5b), SSE2, WR, Vdq, Wdq)
+OPCODE(vcvttps2dq, VEX(128, F3, 0F, IG, 0x5b), AVX, WR, Vdq, Wdq)
+OPCODE(vcvttps2dq, VEX(256, F3, 0F, IG, 0x5b), AVX, WR, Vqq, Wqq)
 OPCODE(maskmovq, LEG(NP, 0F, 0, 0xf7), SSE, RR, Pq, Nq)
 OPCODE(maskmovdqu, LEG(66, 0F, 0, 0xf7), SSE2, RR, Vdq, Udq)
+OPCODE(vmaskmovdqu, VEX(128, 66, 0F, IG, 0xf7), AVX, RR, Vdq, Udq)
+OPCODE(vmaskmovps, VEX(128, 66, 0F38, 0, 0x2c), AVX, WRR, Vdq, Hdq, Mdq)
+OPCODE(vmaskmovps, VEX(128, 66, 0F38, 0, 0x2e), AVX, WRR, Mdq, Hdq, Vdq)
+OPCODE(vmaskmovps, VEX(256, 66, 0F38, 0, 0x2c), AVX, WRR, Vqq, Hqq, Mqq)
+OPCODE(vmaskmovps, VEX(256, 66, 0F38, 0, 0x2e), AVX, WRR, Mqq, Hqq, Vqq)
+OPCODE(vmaskmovpd, VEX(128, 66, 0F38, 0, 0x2d), AVX, WRR, Vdq, Hdq, Mdq)
+OPCODE(vmaskmovpd, VEX(128, 66, 0F38, 0, 0x2f), AVX, WRR, Mdq, Hdq, Vdq)
+OPCODE(vmaskmovpd, VEX(256, 66, 0F38, 0, 0x2d), AVX, WRR, Vqq, Hqq, Mqq)
+OPCODE(vmaskmovpd, VEX(256, 66, 0F38, 0, 0x2f), AVX, WRR, Mqq, Hqq, Vqq)
 OPCODE(movntps, LEG(NP, 0F, 0, 0x2b), SSE, WR, Mdq, Vdq)
+OPCODE(vmovntps, VEX(128, NP, 0F, IG, 0x2b), AVX, WR, Mdq, Vdq)
+OPCODE(vmovntps, VEX(256, NP, 0F, IG, 0x2b), AVX, WR, Mqq, Vqq)
 OPCODE(movntpd, LEG(66, 0F, 0, 0x2b), SSE2, WR, Mdq, Vdq)
+OPCODE(vmovntpd, VEX(128, 66, 0F, IG, 0x2b), AVX, WR, Mdq, Vdq)
+OPCODE(vmovntpd, VEX(256, 66, 0F, IG, 0x2b), AVX, WR, Mqq, Vqq)
 OPCODE(movnti, LEG(NP, 0F, 0, 0xc3), SSE2, WR, Md, Gd)
 OPCODE(movnti, LEG(NP, 0F, 1, 0xc3), SSE2, WR, Mq, Gq)
 OPCODE(movntq, LEG(NP, 0F, 0, 0xe7), SSE, WR, Mq, Pq)
 OPCODE(movntdq, LEG(66, 0F, 0, 0xe7), SSE2, WR, Mdq, Vdq)
+OPCODE(vmovntdq, VEX(128, 66, 0F, IG, 0xe7), AVX, WR, Mdq, Vdq)
+OPCODE(vmovntdq, VEX(256, 66, 0F, IG, 0xe7), AVX, WR, Mqq, Vqq)
 OPCODE(movntdqa, LEG(66, 0F38, 0, 0x2a), SSE4_1, WR, Vdq, Mdq)
+OPCODE(vmovntdqa, VEX(128, 66, 0F38, IG, 0x2a), AVX, WR, Vdq, Mdq)
 OPCODE(pause, LEG(F3, NA, 0, 0x90), SSE2, )
 OPCODE(emms, LEG(NP, 0F, 0, 0x77), MMX, )
+OPCODE(vzeroupper, VEX(128, NP, 0F, IG, 0x77), AVX, )
+OPCODE(vzeroall, VEX(256, NP, 0F, IG, 0x77), AVX, )
 
 OPCODE_GRP(grp12_LEG_66, LEG(66, 0F, 0, 0x71))
 OPCODE_GRP_BEGIN(grp12_LEG_66)
@@ -856,6 +1607,13 @@ OPCODE_GRP_BEGIN(grp12_LEG_NP)
     OPCODE_GRPMEMB(grp12_LEG_NP, psraw, 4, MMX, WRR, Nq, Nq, Ib)
 OPCODE_GRP_END(grp12_LEG_NP)
 
+OPCODE_GRP(grp12_VEX_128_66, VEX(128, 66, 0F, IG, 0x71))
+OPCODE_GRP_BEGIN(grp12_VEX_128_66)
+    OPCODE_GRPMEMB(grp12_VEX_128_66, vpsllw, 6, AVX, WRR, Hdq, Udq, Ib)
+    OPCODE_GRPMEMB(grp12_VEX_128_66, vpsrlw, 2, AVX, WRR, Hdq, Udq, Ib)
+    OPCODE_GRPMEMB(grp12_VEX_128_66, vpsraw, 4, AVX, WRR, Hdq, Udq, Ib)
+OPCODE_GRP_END(grp12_VEX_128_66)
+
 OPCODE_GRP(grp13_LEG_66, LEG(66, 0F, 0, 0x72))
 OPCODE_GRP_BEGIN(grp13_LEG_66)
     OPCODE_GRPMEMB(grp13_LEG_66, pslld, 6, SSE2, WRR, Udq, Udq, Ib)
@@ -870,6 +1628,13 @@ OPCODE_GRP_BEGIN(grp13_LEG_NP)
     OPCODE_GRPMEMB(grp13_LEG_NP, psrad, 4, MMX, WRR, Nq, Nq, Ib)
 OPCODE_GRP_END(grp13_LEG_NP)
 
+OPCODE_GRP(grp13_VEX_128_66, VEX(128, 66, 0F, IG, 0x72))
+OPCODE_GRP_BEGIN(grp13_VEX_128_66)
+    OPCODE_GRPMEMB(grp13_VEX_128_66, vpslld, 6, AVX, WRR, Hdq, Udq, Ib)
+    OPCODE_GRPMEMB(grp13_VEX_128_66, vpsrld, 2, AVX, WRR, Hdq, Udq, Ib)
+    OPCODE_GRPMEMB(grp13_VEX_128_66, vpsrad, 4, AVX, WRR, Hdq, Udq, Ib)
+OPCODE_GRP_END(grp13_VEX_128_66)
+
 OPCODE_GRP(grp14_LEG_66, LEG(66, 0F, 0, 0x73))
 OPCODE_GRP_BEGIN(grp14_LEG_66)
     OPCODE_GRPMEMB(grp14_LEG_66, psllq, 6, SSE2, WRR, Udq, Udq, Ib)
@@ -884,6 +1649,14 @@ OPCODE_GRP_BEGIN(grp14_LEG_NP)
     OPCODE_GRPMEMB(grp14_LEG_NP, psrlq, 2, MMX, WRR, Nq, Nq, Ib)
 OPCODE_GRP_END(grp14_LEG_NP)
 
+OPCODE_GRP(grp14_VEX_128_66, VEX(128, 66, 0F, IG, 0x73))
+OPCODE_GRP_BEGIN(grp14_VEX_128_66)
+    OPCODE_GRPMEMB(grp14_VEX_128_66, vpsllq, 6, AVX, WRR, Hdq, Udq, Ib)
+    OPCODE_GRPMEMB(grp14_VEX_128_66, vpslldq, 7, AVX, WRR, Hdq, Udq, Ib)
+    OPCODE_GRPMEMB(grp14_VEX_128_66, vpsrlq, 2, AVX, WRR, Hdq, Udq, Ib)
+    OPCODE_GRPMEMB(grp14_VEX_128_66, vpsrldq, 3, AVX, WRR, Hdq, Udq, Ib)
+OPCODE_GRP_END(grp14_VEX_128_66)
+
 OPCODE_GRP(grp15_LEG_NP, LEG(NP, 0F, 0, 0xae))
 OPCODE_GRP_BEGIN(grp15_LEG_NP)
     OPCODE_GRPMEMB(grp15_LEG_NP, sfence_clflush, 7, SSE, RR, modrm_mod, modrm)
@@ -893,6 +1666,12 @@ OPCODE_GRP_BEGIN(grp15_LEG_NP)
     OPCODE_GRPMEMB(grp15_LEG_NP, stmxcsr, 3, SSE, W, Md)
 OPCODE_GRP_END(grp15_LEG_NP)
 
+OPCODE_GRP(grp15_VEX_Z_NP, VEX(Z, NP, 0F, IG, 0xae))
+OPCODE_GRP_BEGIN(grp15_VEX_Z_NP)
+    OPCODE_GRPMEMB(grp15_VEX_Z_NP, vldmxcsr, 2, AVX, R, Md)
+    OPCODE_GRPMEMB(grp15_VEX_Z_NP, vstmxcsr, 3, AVX, W, Md)
+OPCODE_GRP_END(grp15_VEX_Z_NP)
+
 OPCODE_GRP(grp16_LEG_NP, LEG(NP, 0F, 0, 0x18))
 OPCODE_GRP_BEGIN(grp16_LEG_NP)
     OPCODE_GRPMEMB(grp16_LEG_NP, prefetcht0, 1, SSE, R, Mb)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 62/75] target/i386: introduce AVX2 translators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (59 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 61/75] target/i386: introduce AVX vector instructions to sse-opcode.inc.h Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 63/75] target/i386: introduce AVX2 code generators Jan Bobek
                   ` (12 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Use the translator macros to define translators required by AVX2
instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 50eab9181c..3f4bb40932 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -7692,11 +7692,11 @@ DEF_TRANSLATE_INSN2(Vd, Wd)
 DEF_TRANSLATE_INSN2(Vd, Wq)
 DEF_TRANSLATE_INSN2(Vdq, Ed)
 DEF_TRANSLATE_INSN2(Vdq, Eq)
-DEF_TRANSLATE_INSN2(Vdq, Md)
 DEF_TRANSLATE_INSN2(Vdq, Mdq)
 DEF_TRANSLATE_INSN2(Vdq, Nq)
 DEF_TRANSLATE_INSN2(Vdq, Qq)
 DEF_TRANSLATE_INSN2(Vdq, Udq)
+DEF_TRANSLATE_INSN2(Vdq, Wb)
 DEF_TRANSLATE_INSN2(Vdq, Wd)
 DEF_TRANSLATE_INSN2(Vdq, Wdq)
 DEF_TRANSLATE_INSN2(Vdq, Wq)
@@ -7706,12 +7706,14 @@ DEF_TRANSLATE_INSN2(Vq, Ed)
 DEF_TRANSLATE_INSN2(Vq, Eq)
 DEF_TRANSLATE_INSN2(Vq, Wd)
 DEF_TRANSLATE_INSN2(Vq, Wq)
-DEF_TRANSLATE_INSN2(Vqq, Md)
 DEF_TRANSLATE_INSN2(Vqq, Mdq)
-DEF_TRANSLATE_INSN2(Vqq, Mq)
 DEF_TRANSLATE_INSN2(Vqq, Mqq)
+DEF_TRANSLATE_INSN2(Vqq, Wb)
+DEF_TRANSLATE_INSN2(Vqq, Wd)
 DEF_TRANSLATE_INSN2(Vqq, Wdq)
+DEF_TRANSLATE_INSN2(Vqq, Wq)
 DEF_TRANSLATE_INSN2(Vqq, Wqq)
+DEF_TRANSLATE_INSN2(Vqq, Ww)
 DEF_TRANSLATE_INSN2(Wd, Vd)
 DEF_TRANSLATE_INSN2(Wdq, Vdq)
 DEF_TRANSLATE_INSN2(Wq, Vq)
@@ -7763,6 +7765,7 @@ DEF_TRANSLATE_INSN3(Gd, Udq, Ib)
 DEF_TRANSLATE_INSN3(Gq, Nq, Ib)
 DEF_TRANSLATE_INSN3(Gq, Udq, Ib)
 DEF_TRANSLATE_INSN3(Hdq, Udq, Ib)
+DEF_TRANSLATE_INSN3(Hqq, Uqq, Ib)
 DEF_TRANSLATE_INSN3(Mdq, Hdq, Vdq)
 DEF_TRANSLATE_INSN3(Mqq, Hqq, Vqq)
 DEF_TRANSLATE_INSN3(Nq, Nq, Ib)
@@ -7789,6 +7792,7 @@ DEF_TRANSLATE_INSN3(Vdq, Vdq, UdqMhq)
 DEF_TRANSLATE_INSN3(Vdq, Vdq, Wdq)
 DEF_TRANSLATE_INSN3(Vdq, Vq, Mq)
 DEF_TRANSLATE_INSN3(Vdq, Vq, Wq)
+DEF_TRANSLATE_INSN3(Vdq, Wd, modrm_mod)
 DEF_TRANSLATE_INSN3(Vdq, Wdq, Ib)
 DEF_TRANSLATE_INSN3(Vq, Hq, Ed)
 DEF_TRANSLATE_INSN3(Vq, Hq, Eq)
@@ -7797,7 +7801,10 @@ DEF_TRANSLATE_INSN3(Vq, Hq, Wq)
 DEF_TRANSLATE_INSN3(Vq, Vq, Wq)
 DEF_TRANSLATE_INSN3(Vq, Wq, Ib)
 DEF_TRANSLATE_INSN3(Vqq, Hqq, Mqq)
+DEF_TRANSLATE_INSN3(Vqq, Hqq, Wdq)
 DEF_TRANSLATE_INSN3(Vqq, Hqq, Wqq)
+DEF_TRANSLATE_INSN3(Vqq, Wd, modrm_mod)
+DEF_TRANSLATE_INSN3(Vqq, Wq, modrm_mod)
 DEF_TRANSLATE_INSN3(Vqq, Wqq, Ib)
 DEF_TRANSLATE_INSN3(Wdq, Vqq, Ib)
 
@@ -7921,8 +7928,14 @@ DEF_TRANSLATE_INSN4(Vqq, Hqq, Wqq, Lqq)
         }                                                               \
     }
 
+DEF_TRANSLATE_INSN5(Vdq, Hdq, Vdq, MDdq, Hdq)
+DEF_TRANSLATE_INSN5(Vdq, Hdq, Vdq, MQdq, Hdq)
+DEF_TRANSLATE_INSN5(Vdq, Hdq, Vdq, MQqq, Hdq)
 DEF_TRANSLATE_INSN5(Vdq, Hdq, Wd, modrm_mod, vex_v)
 DEF_TRANSLATE_INSN5(Vdq, Hdq, Wq, modrm_mod, vex_v)
+DEF_TRANSLATE_INSN5(Vqq, Hqq, Vqq, MDdq, Hqq)
+DEF_TRANSLATE_INSN5(Vqq, Hqq, Vqq, MDqq, Hqq)
+DEF_TRANSLATE_INSN5(Vqq, Hqq, Vqq, MQqq, Hqq)
 DEF_TRANSLATE_INSN5(Wdq, Hdq, Vd, modrm_mod, vex_v)
 DEF_TRANSLATE_INSN5(Wdq, Hdq, Vq, modrm_mod, vex_v)
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 63/75] target/i386: introduce AVX2 code generators
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (60 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 62/75] target/i386: introduce AVX2 translators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 64/75] target/i386: introduce AVX2 vector instructions to sse-opcode.inc.h Jan Bobek
                   ` (11 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Introduce code generators required by AVX2 instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/translate.c | 407 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 395 insertions(+), 12 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 3f4bb40932..3149989d68 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4946,6 +4946,11 @@ DEF_INSNOP_ALIAS(Mhq, M)
 DEF_INSNOP_ALIAS(Mdq, M)
 DEF_INSNOP_ALIAS(Mqq, M)
 
+DEF_INSNOP_ALIAS(MDdq, M)
+DEF_INSNOP_ALIAS(MDqq, M)
+DEF_INSNOP_ALIAS(MQdq, M)
+DEF_INSNOP_ALIAS(MQqq, M)
+
 /*
  * 32-bit general register operands
  */
@@ -5907,6 +5912,14 @@ GEN_INSN2(vpmovmskb, Gq, Udq)
     tcg_gen_extu_i32_i64(arg1, arg1_r32);
     tcg_temp_free_i32(arg1_r32);
 }
+DEF_GEN_INSN2_HELPER_DEP(vpmovmskb, pmovmskb_xmm, Gd, Uqq)
+GEN_INSN2(vpmovmskb, Gq, Uqq)
+{
+    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
+    gen_insn2(vpmovmskb, Gd, Uqq)(env, s, arg1_r32, arg2);
+    tcg_gen_extu_i32_i64(arg1, arg1_r32);
+    tcg_temp_free_i32(arg1_r32);
+}
 
 DEF_GEN_INSN2_HELPER_DEP(movmskps, movmskps, Gd, Udq)
 GEN_INSN2(movmskps, Gq, Udq)
@@ -6049,27 +6062,35 @@ GEN_INSN2(vmovddup, Vqq, Wqq)
 DEF_GEN_INSN3_GVEC(paddb, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddb, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(vpaddb, Vdq, Hdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpaddb, Vqq, Hqq, Wqq, add, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddw, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddw, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(vpaddw, Vdq, Hdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpaddw, Vqq, Hqq, Wqq, add, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddd, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(paddd, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(vpaddd, Vdq, Hdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(vpaddd, Vqq, Hqq, Wqq, add, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(paddq, Pq, Pq, Qq, add, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(paddq, Vdq, Vdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vpaddq, Vdq, Hdq, Wdq, add, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vpaddq, Vqq, Hqq, Wqq, add, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(paddsb, Pq, Pq, Qq, ssadd, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddsb, Vdq, Vdq, Wdq, ssadd, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(vpaddsb, Vdq, Hdq, Wdq, ssadd, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpaddsb, Vqq, Hqq, Wqq, ssadd, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddsw, Pq, Pq, Qq, ssadd, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddsw, Vdq, Vdq, Wdq, ssadd, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(vpaddsw, Vdq, Hdq, Wdq, ssadd, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpaddsw, Vqq, Hqq, Wqq, ssadd, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddusb, Pq, Pq, Qq, usadd, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddusb, Vdq, Vdq, Wdq, usadd, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(vpaddusb, Vdq, Hdq, Wdq, usadd, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpaddusb, Vqq, Hqq, Wqq, usadd, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(paddusw, Pq, Pq, Qq, usadd, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(paddusw, Vdq, Vdq, Wdq, usadd, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(vpaddusw, Vdq, Hdq, Wdq, usadd, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpaddusw, Vqq, Hqq, Wqq, usadd, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_HELPER_EPP(addps, addps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vaddps, addps, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vaddps, addps, Vqq, Hqq, Wqq)
@@ -6083,12 +6104,15 @@ DEF_GEN_INSN3_HELPER_EPP(vaddsd, addsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(phaddw, phaddw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(phaddw, phaddw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vphaddw, phaddw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vphaddw, phaddw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(phaddd, phaddd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(phaddd, phaddd_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vphaddd, phaddd_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vphaddd, phaddd_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(phaddsw, phaddsw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(phaddsw, phaddsw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vphaddsw, phaddsw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vphaddsw, phaddsw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(haddps, haddps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vhaddps, haddps, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vhaddps, haddps, Vqq, Hqq, Wqq)
@@ -6099,27 +6123,35 @@ DEF_GEN_INSN3_HELPER_EPP(vhaddpd, haddpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_GVEC(psubb, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubb, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(vpsubb, Vdq, Hdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpsubb, Vqq, Hqq, Wqq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubw, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubw, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(vpsubw, Vdq, Hdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpsubw, Vqq, Hqq, Wqq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubd, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(psubd, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(vpsubd, Vdq, Hdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(vpsubd, Vqq, Hqq, Wqq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(psubq, Pq, Pq, Qq, sub, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(psubq, Vdq, Vdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vpsubq, Vdq, Hdq, Wdq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vpsubq, Vqq, Hqq, Wqq, sub, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(psubsb, Pq, Pq, Qq, sssub, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubsb, Vdq, Vdq, Wdq, sssub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(vpsubsb, Vdq, Hdq, Wdq, sssub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpsubsb, Vqq, Hqq, Wqq, sssub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubsw, Pq, Pq, Qq, sssub, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubsw, Vdq, Vdq, Wdq, sssub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(vpsubsw, Vdq, Hdq, Wdq, sssub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpsubsw, Vqq, Hqq, Wqq, sssub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubusb, Pq, Pq, Qq, ussub, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubusb, Vdq, Vdq, Wdq, ussub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(vpsubusb, Vdq, Hdq, Wdq, ussub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpsubusb, Vqq, Hqq, Wqq, ussub, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(psubusw, Pq, Pq, Qq, ussub, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(psubusw, Vdq, Vdq, Wdq, ussub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(vpsubusw, Vdq, Hdq, Wdq, ussub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpsubusw, Vqq, Hqq, Wqq, ussub, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_HELPER_EPP(subps, subps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vsubps, subps, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vsubps, subps, Vqq, Hqq, Wqq)
@@ -6133,12 +6165,15 @@ DEF_GEN_INSN3_HELPER_EPP(vsubsd, subsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(phsubw, phsubw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(phsubw, phsubw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vphsubw, phsubw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vphsubw, phsubw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(phsubd, phsubd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(phsubd, phsubd_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vphsubd, phsubd_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vphsubd, phsubd_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(phsubsw, phsubsw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(phsubsw, phsubsw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vphsubsw, phsubsw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vphsubsw, phsubsw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(hsubps, hsubps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vhsubps, hsubps, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vhsubps, hsubps, Vqq, Hqq, Wqq)
@@ -6156,22 +6191,29 @@ DEF_GEN_INSN3_HELPER_EPP(vaddsubpd, addsubpd, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmullw, pmullw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmullw, pmullw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(pmulld, pmulld_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmulld, pmulld_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmulld, pmulld_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhw, pmulhw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhw, pmulhw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmulhw, pmulhw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmulhw, pmulhw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmulhuw, pmulhuw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmulhuw, pmulhuw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(pmuldq, pmuldq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmuldq, pmuldq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmuldq, pmuldq_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(pmuludq, pmuludq_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmuludq, pmuludq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmuludq, pmuludq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmuludq, pmuludq_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhrsw, pmulhrsw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhrsw, pmulhrsw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmulhrsw, pmulhrsw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmulhrsw, pmulhrsw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(mulps, mulps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vmulps, mulps, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vmulps, mulps, Vqq, Hqq, Wqq)
@@ -6185,9 +6227,11 @@ DEF_GEN_INSN3_HELPER_EPP(vmulsd, mulsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(pmaddwd, pmaddwd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmaddwd, pmaddwd_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmaddwd, pmaddwd_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmaddwd, pmaddwd_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(pmaddubsw, pmaddubsw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmaddubsw, pmaddubsw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmaddubsw, pmaddubsw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpmaddubsw, pmaddubsw_xmm, Vqq, Hqq, Wqq)
 
 DEF_GEN_INSN3_HELPER_EPP(divps, divps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vdivps, divps, Vdq, Hdq, Wdq)
@@ -6224,17 +6268,23 @@ DEF_GEN_INSN3_HELPER_EPP(vrsqrtss, rsqrtss, Vd, Hd, Wd)
 DEF_GEN_INSN3_GVEC(pminub, Pq, Pq, Qq, umin, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pminub, Vdq, Vdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(vpminub, Vdq, Hdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpminub, Vqq, Hqq, Wqq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pminuw, Vdq, Vdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(vpminuw, Vdq, Hdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpminuw, Vqq, Hqq, Wqq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pminud, Vdq, Vdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(vpminud, Vdq, Hdq, Wdq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(vpminud, Vqq, Hqq, Wqq, umin, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(pminsb, Vdq, Vdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(vpminsb, Vdq, Hdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpminsb, Vqq, Hqq, Wqq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pminsw, Pq, Pq, Qq, smin, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pminsw, Vdq, Vdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(vpminsw, Vdq, Hdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpminsw, Vqq, Hqq, Wqq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pminsd, Vdq, Vdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(vpminsd, Vdq, Hdq, Wdq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(vpminsd, Vqq, Hqq, Wqq, smin, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_HELPER_EPP(minps, minps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vminps, minps, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vminps, minps, Vqq, Hqq, Wqq)
@@ -6250,17 +6300,23 @@ DEF_GEN_INSN2_HELPER_EPP(vphminposuw, phminposuw_xmm, Vdq, Wdq)
 DEF_GEN_INSN3_GVEC(pmaxub, Pq, Pq, Qq, umax, MM_OPRSZ, MM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pmaxub, Vdq, Vdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(vpmaxub, Vdq, Hdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpmaxub, Vqq, Hqq, Wqq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pmaxuw, Vdq, Vdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(vpmaxuw, Vdq, Hdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpmaxuw, Vqq, Hqq, Wqq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pmaxud, Vdq, Vdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(vpmaxud, Vdq, Hdq, Wdq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(vpmaxud, Vqq, Hqq, Wqq, umax, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(pmaxsb, Vdq, Vdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(vpmaxsb, Vdq, Hdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
+DEF_GEN_INSN3_GVEC(vpmaxsb, Vqq, Hqq, Wqq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_8)
 DEF_GEN_INSN3_GVEC(pmaxsw, Pq, Pq, Qq, smax, MM_OPRSZ, MM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pmaxsw, Vdq, Vdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(vpmaxsw, Vdq, Hdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
+DEF_GEN_INSN3_GVEC(vpmaxsw, Vqq, Hqq, Wqq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_16)
 DEF_GEN_INSN3_GVEC(pmaxsd, Vdq, Vdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_GVEC(vpmaxsd, Vdq, Hdq, Wdq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_32)
+DEF_GEN_INSN3_GVEC(vpmaxsd, Vqq, Hqq, Wqq, smax, XMM_OPRSZ, XMM_MAXSZ, MO_32)
 DEF_GEN_INSN3_HELPER_EPP(maxps, maxps, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vmaxps, maxps, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vmaxps, maxps, Vqq, Hqq, Wqq)
@@ -6274,32 +6330,42 @@ DEF_GEN_INSN3_HELPER_EPP(vmaxsd, maxsd, Vq, Hq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(pavgb, pavgb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pavgb, pavgb_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpavgb, pavgb_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpavgb, pavgb_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpavgw, pavgw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpavgw, pavgw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsadbw, psadbw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsadbw, psadbw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN4_HELPER_EPPI(mpsadbw, mpsadbw_xmm, Vdq, Vdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_EPPI(vmpsadbw, mpsadbw_xmm, Vdq, Hdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vmpsadbw, mpsadbw_xmm, Vqq, Hqq, Wqq, Ib)
 DEF_GEN_INSN2_HELPER_EPP(pabsb, pabsb_mmx, Pq, Qq)
 DEF_GEN_INSN2_HELPER_EPP(pabsb, pabsb_xmm, Vdq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(vpabsb, pabsb_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vpabsb, pabsb_xmm, Vqq, Wqq)
 DEF_GEN_INSN2_HELPER_EPP(pabsw, pabsw_mmx, Pq, Qq)
 DEF_GEN_INSN2_HELPER_EPP(pabsw, pabsw_xmm, Vdq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(vpabsw, pabsw_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vpabsw, pabsw_xmm, Vqq, Wqq)
 DEF_GEN_INSN2_HELPER_EPP(pabsd, pabsd_mmx, Pq, Qq)
 DEF_GEN_INSN2_HELPER_EPP(pabsd, pabsd_xmm, Vdq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(vpabsd, pabsd_xmm, Vdq, Wdq)
+DEF_GEN_INSN2_HELPER_EPP(vpabsd, pabsd_xmm, Vqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(psignb, psignb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psignb, psignb_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsignb, psignb_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsignb, psignb_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(psignw, psignw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psignw, psignw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsignw, psignw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsignw, psignw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(psignd, psignd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psignd, psignd_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsignd, psignd_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsignd, psignd_xmm, Vqq, Hqq, Wqq)
 
 DEF_GEN_INSN4_HELPER_EPPI(dpps, dpps_xmm, Vdq, Vdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_EPPI(vdpps, dpps_xmm, Vdq, Hdq, Wdq, Ib)
@@ -6335,25 +6401,33 @@ DEF_GEN_INSN4_HELPER_EPPI(vpclmulqdq, pclmulqdq_xmm, Vdq, Hdq, Wdq, Ib)
 DEF_GEN_INSN3_GVEC(pcmpeqb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqb, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(vpcmpeqb, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(vpcmpeqb, Vqq, Hqq, Wqq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqw, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(vpcmpeqw, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(vpcmpeqw, Vqq, Hqq, Wqq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqd, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(vpcmpeqd, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(vpcmpeqd, Vqq, Hqq, Wqq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpeqq, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_64, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(vpcmpeqq, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_64, TCG_COND_EQ)
+DEF_GEN_INSN3_GVEC(vpcmpeqq, Vqq, Hqq, Wqq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_64, TCG_COND_EQ)
 DEF_GEN_INSN3_GVEC(pcmpgtb, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_8, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtb, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(vpcmpgtb, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(vpcmpgtb, Vqq, Hqq, Wqq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_8, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtw, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_16, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtw, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(vpcmpgtw, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(vpcmpgtw, Vqq, Hqq, Wqq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_16, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtd, Pq, Pq, Qq, cmp, MM_OPRSZ, MM_MAXSZ, MO_32, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtd, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(vpcmpgtd, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(vpcmpgtd, Vqq, Hqq, Wqq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_32, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(pcmpgtq, Vdq, Vdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_64, TCG_COND_GT)
 DEF_GEN_INSN3_GVEC(vpcmpgtq, Vdq, Hdq, Wdq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_64, TCG_COND_GT)
+DEF_GEN_INSN3_GVEC(vpcmpgtq, Vqq, Hqq, Wqq, cmp, XMM_OPRSZ, XMM_MAXSZ, MO_64, TCG_COND_GT)
 
 DEF_GEN_INSN3_HELPER_EPPI(pcmpestrm, pcmpestrm_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN3_HELPER_EPPI(vpcmpestrm, pcmpestrm_xmm, Vdq, Wdq, Ib)
@@ -6785,6 +6859,7 @@ DEF_GEN_INSN2_HELPER_EPP(vucomisd, ucomisd, Vq, Wq)
 DEF_GEN_INSN3_GVEC(pand, Pq, Pq, Qq, and, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pand, Vdq, Vdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vpand, Vdq, Hdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vpand, Vqq, Hqq, Wqq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(andps, Vdq, Vdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vandps, Vdq, Hdq, Wdq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vandps, Vqq, Hqq, Wqq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
@@ -6794,6 +6869,7 @@ DEF_GEN_INSN3_GVEC(vandpd, Vqq, Hqq, Wqq, and, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pandn, Pq, Pq, Qq, andn, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pandn, Vdq, Vdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vpandn, Vdq, Hdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vpandn, Vqq, Hqq, Wqq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(andnps, Vdq, Vdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vandnps, Vdq, Hdq, Wdq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vandnps, Vqq, Hqq, Wqq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
@@ -6803,6 +6879,7 @@ DEF_GEN_INSN3_GVEC(vandnpd, Vqq, Hqq, Wqq, andn, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(por, Pq, Pq, Qq, or, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(por, Vdq, Vdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vpor, Vdq, Hdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vpor, Vqq, Hqq, Wqq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(orps, Vdq, Vdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vorps, Vdq, Hdq, Wdq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vorps, Vqq, Hqq, Wqq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
@@ -6812,6 +6889,7 @@ DEF_GEN_INSN3_GVEC(vorpd, Vqq, Hqq, Wqq, or, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pxor, Pq, Pq, Qq, xor, MM_OPRSZ, MM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(pxor, Vdq, Vdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vpxor, Vdq, Hdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
+DEF_GEN_INSN3_GVEC(vpxor, Vqq, Hqq, Wqq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(xorps, Vdq, Vdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vxorps, Vdq, Hdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vxorps, Vqq, Hqq, Wqq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
@@ -6822,31 +6900,88 @@ DEF_GEN_INSN3_GVEC(vxorpd, Vqq, Hqq, Wqq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_HELPER_EPP(psllw, psllw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psllw, psllw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsllw, psllw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsllw, psllw_xmm, Vqq, Hqq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(pslld, pslld_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pslld, pslld_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpslld, pslld_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpslld, pslld_xmm, Vqq, Hqq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psllq, psllq_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psllq, psllq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsllq, psllq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsllq, psllq_xmm, Vqq, Hqq, Wdq)
+
+GEN_INSN3(vpsllvd, Vdq, Hdq, Wdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpsllvd, Vqq, Hqq, Wqq)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN3(vpsllvq, Vdq, Hdq, Wdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpsllvq, Vqq, Hqq, Wqq)
+{
+    /* XXX TODO implement this */
+}
+
 DEF_GEN_INSN3_HELPER_EPP(pslldq, pslldq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpslldq, pslldq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpslldq, pslldq_xmm, Vqq, Hqq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrlw, psrlw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psrlw, psrlw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsrlw, psrlw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsrlw, psrlw_xmm, Vqq, Hqq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrld, psrld_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psrld, psrld_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsrld, psrld_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsrld, psrld_xmm, Vqq, Hqq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrlq, psrlq_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psrlq, psrlq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsrlq, psrlq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsrlq, psrlq_xmm, Vqq, Hqq, Wdq)
+
+GEN_INSN3(vpsrlvd, Vdq, Hdq, Wdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpsrlvd, Vqq, Hqq, Wqq)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN3(vpsrlvq, Vdq, Hdq, Wdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpsrlvq, Vqq, Hqq, Wqq)
+{
+    /* XXX TODO implement this */
+}
+
 DEF_GEN_INSN3_HELPER_EPP(psrldq, psrldq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsrldq, psrldq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsrldq, psrldq_xmm, Vqq, Hqq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psraw, psraw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psraw, psraw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsraw, psraw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsraw, psraw_xmm, Vqq, Hqq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(psrad, psrad_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psrad, psrad_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsrad, psrad_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpsrad, psrad_xmm, Vqq, Hqq, Wdq)
+
+GEN_INSN3(vpsravd, Vdq, Hdq, Wdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpsravd, Vqq, Hqq, Wqq)
+{
+    /* XXX TODO implement this */
+}
 
 #define DEF_GEN_PSHIFT_IMM_MM(mnem, opT1, opT2)                         \
     GEN_INSN3(mnem, opT1, opT2, Ib)                                     \
@@ -6884,73 +7019,108 @@ DEF_GEN_INSN3_HELPER_EPP(vpsrad, psrad_xmm, Vdq, Hdq, Wdq)
         gen_insn2(movq, Vdq, Eq)(env, s, arg3_xmm, arg3_r64);           \
         gen_insn3(mnem, Vdq, Hdq, Wdq)(env, s, arg2, arg2, arg3_xmm);   \
     }
+#define DEF_GEN_VPSHIFT_IMM_YMM(mnem, opT1, opT2)                       \
+    GEN_INSN3(mnem, opT1, opT2, Ib)                                     \
+    {                                                                   \
+        const uint64_t arg3_ui64 = (uint8_t)arg3;                       \
+        const insnop_arg_t(Eq) arg3_r64 = s->tmp1_i64;                  \
+        const insnop_arg_t(Wdq) arg3_xmm =                              \
+            offsetof(CPUX86State, xmm_t0.ZMM_Q(0));                     \
+                                                                        \
+        tcg_gen_movi_i64(arg3_r64, arg3_ui64);                          \
+        gen_insn2(movq, Vdq, Eq)(env, s, arg3_xmm, arg3_r64);           \
+        gen_insn3(mnem, Vqq, Hqq, Wdq)(env, s, arg2, arg2, arg3_xmm);   \
+    }
 
 DEF_GEN_PSHIFT_IMM_MM(psllw, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psllw, Udq, Udq)
 DEF_GEN_VPSHIFT_IMM_XMM(vpsllw, Hdq, Udq)
+DEF_GEN_VPSHIFT_IMM_YMM(vpsllw, Hqq, Uqq)
 DEF_GEN_PSHIFT_IMM_MM(pslld, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(pslld, Udq, Udq)
 DEF_GEN_VPSHIFT_IMM_XMM(vpslld, Hdq, Udq)
+DEF_GEN_VPSHIFT_IMM_YMM(vpslld, Hqq, Uqq)
 DEF_GEN_PSHIFT_IMM_MM(psllq, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psllq, Udq, Udq)
 DEF_GEN_VPSHIFT_IMM_XMM(vpsllq, Hdq, Udq)
+DEF_GEN_VPSHIFT_IMM_YMM(vpsllq, Hqq, Uqq)
 DEF_GEN_PSHIFT_IMM_XMM(pslldq, Udq, Udq)
 DEF_GEN_VPSHIFT_IMM_XMM(vpslldq, Hdq, Udq)
+DEF_GEN_VPSHIFT_IMM_YMM(vpslldq, Hqq, Uqq)
 DEF_GEN_PSHIFT_IMM_MM(psrlw, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psrlw, Udq, Udq)
 DEF_GEN_VPSHIFT_IMM_XMM(vpsrlw, Hdq, Udq)
+DEF_GEN_VPSHIFT_IMM_YMM(vpsrlw, Hqq, Uqq)
 DEF_GEN_PSHIFT_IMM_MM(psrld, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psrld, Udq, Udq)
 DEF_GEN_VPSHIFT_IMM_XMM(vpsrld, Hdq, Udq)
+DEF_GEN_VPSHIFT_IMM_YMM(vpsrld, Hqq, Uqq)
 DEF_GEN_PSHIFT_IMM_MM(psrlq, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psrlq, Udq, Udq)
 DEF_GEN_VPSHIFT_IMM_XMM(vpsrlq, Hdq, Udq)
+DEF_GEN_VPSHIFT_IMM_YMM(vpsrlq, Hqq, Uqq)
 DEF_GEN_PSHIFT_IMM_XMM(psrldq, Udq, Udq)
 DEF_GEN_VPSHIFT_IMM_XMM(vpsrldq, Hdq, Udq)
+DEF_GEN_VPSHIFT_IMM_YMM(vpsrldq, Hqq, Uqq)
 DEF_GEN_PSHIFT_IMM_MM(psraw, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psraw, Udq, Udq)
 DEF_GEN_VPSHIFT_IMM_XMM(vpsraw, Hdq, Udq)
+DEF_GEN_VPSHIFT_IMM_XMM(vpsraw, Hqq, Uqq)
 DEF_GEN_PSHIFT_IMM_MM(psrad, Nq, Nq)
 DEF_GEN_PSHIFT_IMM_XMM(psrad, Udq, Udq)
 DEF_GEN_VPSHIFT_IMM_XMM(vpsrad, Hdq, Udq)
+DEF_GEN_VPSHIFT_IMM_XMM(vpsrad, Hqq, Uqq)
 
 DEF_GEN_INSN4_HELPER_EPPI(palignr, palignr_mmx, Pq, Pq, Qq, Ib)
 DEF_GEN_INSN4_HELPER_EPPI(palignr, palignr_xmm, Vdq, Vdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_EPPI(vpalignr, palignr_xmm, Vdq, Hdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vpalignr, palignr_xmm, Vqq, Hqq, Wqq, Ib)
 
 DEF_GEN_INSN3_HELPER_EPP(packsswb, packsswb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(packsswb, packsswb_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpacksswb, packsswb_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpacksswb, packsswb_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(packssdw, packssdw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(packssdw, packssdw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpackssdw, packssdw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpackssdw, packssdw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(packuswb, packuswb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(packuswb, packuswb_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpackuswb, packuswb_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpackuswb, packuswb_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(packusdw, packusdw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpackusdw, packusdw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpackusdw, packusdw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(punpcklbw, punpcklbw_mmx, Pq, Pq, Qd)
 DEF_GEN_INSN3_HELPER_EPP(punpcklbw, punpcklbw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpunpcklbw, punpcklbw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpcklbw, punpcklbw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(punpcklwd, punpcklwd_mmx, Pq, Pq, Qd)
 DEF_GEN_INSN3_HELPER_EPP(punpcklwd, punpcklwd_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpunpcklwd, punpcklwd_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpcklwd, punpcklwd_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(punpckldq, punpckldq_mmx, Pq, Pq, Qd)
 DEF_GEN_INSN3_HELPER_EPP(punpckldq, punpckldq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpunpckldq, punpckldq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpckldq, punpckldq_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(punpcklqdq, punpcklqdq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpunpcklqdq, punpcklqdq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpcklqdq, punpcklqdq_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhbw, punpckhbw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhbw, punpckhbw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpunpckhbw, punpckhbw_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpckhbw, punpckhbw_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhwd, punpckhwd_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhwd, punpckhwd_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpunpckhwd, punpckhwd_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpckhwd, punpckhwd_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhdq, punpckhdq_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhdq, punpckhdq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpunpckhdq, punpckhdq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpckhdq, punpckhdq_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(punpckhqdq, punpckhqdq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpunpckhqdq, punpckhqdq_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpunpckhqdq, punpckhqdq_xmm, Vqq, Hqq, Wqq)
 
 DEF_GEN_INSN3_HELPER_EPP(unpcklps, punpckldq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vunpcklps, punpckldq_xmm, Vdq, Hdq, Wdq)
@@ -6968,13 +7138,17 @@ DEF_GEN_INSN3_HELPER_EPP(vunpckhpd, punpckhqdq_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_EPP(pshufb, pshufb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pshufb, pshufb_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpshufb, pshufb_xmm, Vdq, Hdq, Wdq)
+DEF_GEN_INSN3_HELPER_EPP(vpshufb, pshufb_xmm, Vqq, Hqq, Wqq)
 DEF_GEN_INSN3_HELPER_PPI(pshufw, pshufw_mmx, Pq, Qq, Ib)
 DEF_GEN_INSN3_HELPER_PPI(pshuflw, pshuflw_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN3_HELPER_PPI(vpshuflw, pshuflw_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_PPI(vpshuflw, pshuflw_xmm, Vqq, Wqq, Ib)
 DEF_GEN_INSN3_HELPER_PPI(pshufhw, pshufhw_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN3_HELPER_PPI(vpshufhw, pshufhw_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_PPI(vpshufhw, pshufhw_xmm, Vqq, Wqq, Ib)
 DEF_GEN_INSN3_HELPER_PPI(pshufd, pshufd_xmm, Vdq, Wdq, Ib)
 DEF_GEN_INSN3_HELPER_PPI(vpshufd, pshufd_xmm, Vdq, Wdq, Ib)
+DEF_GEN_INSN3_HELPER_PPI(vpshufd, pshufd_xmm, Vqq, Wqq, Ib)
 DEF_GEN_INSN4_HELPER_PPI(shufps, shufps, Vdq, Vdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_PPI(vshufps, shufps, Vdq, Hdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_PPI(vshufps, shufps, Vqq, Hqq, Wqq, Ib)
@@ -7014,9 +7188,23 @@ GEN_INSN4(vpblendvb, Vdq, Hdq, Wdq, Ldq)
 {
     gen_insn3(pblendvb, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
 }
+GEN_INSN4(vpblendvb, Vqq, Hqq, Wqq, Lqq)
+{
+    gen_insn3(pblendvb, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3);
+}
 
 DEF_GEN_INSN4_HELPER_EPPI(pblendw, pblendw_xmm, Vdq, Vdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_EPPI(vpblendw, pblendw_xmm, Vdq, Hdq, Wdq, Ib)
+DEF_GEN_INSN4_HELPER_EPPI(vpblendw, pblendw_xmm, Vqq, Hqq, Wqq, Ib)
+
+GEN_INSN4(vpblendd, Vdq, Hdq, Wdq, Ib)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN4(vpblendd, Vqq, Hqq, Wqq, Ib)
+{
+    /* XXX TODO implement this */
+}
 
 GEN_INSN4(insertps, Vdq, Vdq, Wd, Ib)
 {
@@ -7100,6 +7288,10 @@ GEN_INSN4(vinsertf128, Vqq, Hqq, Wdq, Ib)
 {
     gen_insn2(movaps, Vdq, Wdq)(env, s, arg1, arg3);
 }
+GEN_INSN4(vinserti128, Vqq, Hqq, Wdq, Ib)
+{
+    gen_insn2(movaps, Vdq, Wdq)(env, s, arg1, arg3);
+}
 
 GEN_INSN3(extractps, Ed, Vdq, Ib)
 {
@@ -7178,42 +7370,106 @@ GEN_INSN3(vextractf128, Wdq, Vqq, Ib)
 {
     gen_insn2(movaps, Wdq, Vdq)(env, s, arg1, arg2);
 }
+GEN_INSN3(vextracti128, Wdq, Vqq, Ib)
+{
+    gen_insn2(movaps, Wdq, Vdq)(env, s, arg1, arg2);
+}
 
-GEN_INSN2(vbroadcastss, Vdq, Md)
+GEN_INSN2(vpbroadcastb, Vdq, Wb)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN2(vpbroadcastb, Vqq, Wb)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN2(vpbroadcastw, Vdq, Ww)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN2(vpbroadcastw, Vqq, Ww)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN2(vpbroadcastd, Vdq, Wd)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN2(vpbroadcastd, Vqq, Wd)
 {
+    /* XXX TODO implement this */
+}
+GEN_INSN2(vpbroadcastq, Vdq, Wq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN2(vpbroadcastq, Vqq, Wq)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN3(vbroadcastss, Vdq, Wd, modrm_mod)
+{
+    if (arg3 != 3 && !check_cpuid(env, s, CHECK_CPUID_AVX2)) {
+        gen_unknown_opcode(env, s);
+        return;
+    }
+
     const TCGv_i32 r32 = tcg_temp_new_i32();
-    insnop_ldst(tcg_i32, Md)(env, s, 0, r32, arg2);
-
-    tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(0)));
+    tcg_gen_ld_i32(r32, cpu_env, arg2 + offsetof(ZMMReg, ZMM_L(0)));
+    if (arg1 != arg2) {
+        tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(0)));
+    }
     tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(1)));
     tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(2)));
     tcg_gen_st_i32(r32, cpu_env, arg1 + offsetof(ZMMReg, ZMM_L(3)));
-
     tcg_temp_free_i32(r32);
 }
-GEN_INSN2(vbroadcastss, Vqq, Md)
+GEN_INSN3(vbroadcastss, Vqq, Wd, modrm_mod)
 {
-    gen_insn2(vbroadcastss, Vdq, Md)(env, s, arg1, arg2);
+    gen_insn3(vbroadcastss, Vdq, Wd, modrm_mod)(env, s, arg1, arg2, arg3);
 }
-GEN_INSN2(vbroadcastsd, Vqq, Mq)
+GEN_INSN3(vbroadcastsd, Vqq, Wq, modrm_mod)
 {
+    if (arg3 != 3 && !check_cpuid(env, s, CHECK_CPUID_AVX2)) {
+        gen_unknown_opcode(env, s);
+        return;
+    }
+
     const TCGv_i64 r64 = tcg_temp_new_i64();
-    insnop_ldst(tcg_i64, Mq)(env, s, 0, r64, arg2);
-
-    tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(0)));
+    tcg_gen_ld_i64(r64, cpu_env, arg2 + offsetof(ZMMReg, ZMM_Q(0)));
+    if (arg1 != arg2) {
+        tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(0)));
+    }
     tcg_gen_st_i64(r64, cpu_env, arg1 + offsetof(ZMMReg, ZMM_Q(1)));
-
     tcg_temp_free_i64(r64);
 }
 GEN_INSN2(vbroadcastf128, Vqq, Mdq)
 {
     insnop_ldst(xmm, Mqq)(env, s, 0, arg1, arg2);
 }
+GEN_INSN2(vbroadcasti128, Vqq, Mdq)
+{
+    insnop_ldst(xmm, Mqq)(env, s, 0, arg1, arg2);
+}
 
 GEN_INSN4(vperm2f128, Vqq, Hqq, Wqq, Ib)
 {
     /* XXX TODO implement this */
 }
+GEN_INSN4(vperm2i128, Vqq, Hqq, Wqq, Ib)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN3(vpermd, Vqq, Hqq, Wqq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpermps, Vqq, Hqq, Wqq)
+{
+    /* XXX TODO implement this */
+}
 
 GEN_INSN3(vpermilps, Vdq, Hdq, Wdq)
 {
@@ -7249,30 +7505,119 @@ GEN_INSN3(vpermilpd, Vqq, Wqq, Ib)
     /* XXX TODO implement this */
 }
 
+GEN_INSN3(vpermq, Vqq, Wqq, Ib)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpermpd, Vqq, Wqq, Ib)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN5(vgatherdps, Vdq, Hdq, Vdq, MDdq, Hdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN5(vgatherdps, Vqq, Hqq, Vqq, MDqq, Hqq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN5(vgatherdpd, Vdq, Hdq, Vdq, MDdq, Hdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN5(vgatherdpd, Vqq, Hqq, Vqq, MDdq, Hqq)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN5(vgatherqps, Vdq, Hdq, Vdq, MQdq, Hdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN5(vgatherqps, Vdq, Hdq, Vdq, MQqq, Hdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN5(vgatherqpd, Vdq, Hdq, Vdq, MQdq, Hdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN5(vgatherqpd, Vqq, Hqq, Vqq, MQqq, Hqq)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN5(vpgatherdd, Vdq, Hdq, Vdq, MDdq, Hdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN5(vpgatherdd, Vqq, Hqq, Vqq, MDqq, Hqq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN5(vpgatherdq, Vdq, Hdq, Vdq, MDdq, Hdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN5(vpgatherdq, Vqq, Hqq, Vqq, MDdq, Hqq)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN5(vpgatherqd, Vdq, Hdq, Vdq, MQdq, Hdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN5(vpgatherqd, Vdq, Hdq, Vdq, MQqq, Hdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN5(vpgatherqq, Vdq, Hdq, Vdq, MQdq, Hdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN5(vpgatherqq, Vqq, Hqq, Vqq, MQqq, Hqq)
+{
+    /* XXX TODO implement this */
+}
+
 DEF_GEN_INSN2_HELPER_EPP(pmovsxbw, pmovsxbw_xmm, Vdq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(vpmovsxbw, pmovsxbw_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vpmovsxbw, pmovsxbw_xmm, Vqq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(pmovsxbd, pmovsxbd_xmm, Vdq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(vpmovsxbd, pmovsxbd_xmm, Vdq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(vpmovsxbd, pmovsxbd_xmm, Vqq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(pmovsxbq, pmovsxbq_xmm, Vdq, Ww)
 DEF_GEN_INSN2_HELPER_EPP(vpmovsxbq, pmovsxbq_xmm, Vdq, Ww)
+DEF_GEN_INSN2_HELPER_EPP(vpmovsxbq, pmovsxbq_xmm, Vqq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(pmovsxwd, pmovsxwd_xmm, Vdq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(vpmovsxwd, pmovsxwd_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vpmovsxwd, pmovsxwd_xmm, Vqq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(pmovsxwq, pmovsxwq_xmm, Vdq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(vpmovsxwq, pmovsxwq_xmm, Vdq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(vpmovsxwq, pmovsxwq_xmm, Vqq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(pmovsxdq, pmovsxdq_xmm, Vdq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(vpmovsxdq, pmovsxdq_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vpmovsxdq, pmovsxdq_xmm, Vqq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(pmovzxbw, pmovzxbw_xmm, Vdq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(vpmovzxbw, pmovzxbw_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vpmovzxbw, pmovzxbw_xmm, Vqq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(pmovzxbd, pmovzxbd_xmm, Vdq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(vpmovzxbd, pmovzxbd_xmm, Vdq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(vpmovzxbd, pmovzxbd_xmm, Vqq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(pmovzxbq, pmovzxbq_xmm, Vdq, Ww)
 DEF_GEN_INSN2_HELPER_EPP(vpmovzxbq, pmovzxbq_xmm, Vdq, Ww)
+DEF_GEN_INSN2_HELPER_EPP(vpmovzxbq, pmovzxbq_xmm, Vqq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(pmovzxwd, pmovzxwd_xmm, Vdq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(vpmovzxwd, pmovzxwd_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vpmovzxwd, pmovzxwd_xmm, Vqq, Wdq)
 DEF_GEN_INSN2_HELPER_EPP(pmovzxwq, pmovzxwq_xmm, Vdq, Wd)
 DEF_GEN_INSN2_HELPER_EPP(vpmovzxwq, pmovzxwq_xmm, Vdq, Wd)
+DEF_GEN_INSN2_HELPER_EPP(vpmovzxwq, pmovzxwq_xmm, Vqq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(pmovzxdq, pmovzxdq_xmm, Vdq, Wq)
 DEF_GEN_INSN2_HELPER_EPP(vpmovzxdq, pmovzxdq_xmm, Vdq, Wq)
+DEF_GEN_INSN2_HELPER_EPP(vpmovzxdq, pmovzxdq_xmm, Vqq, Wdq)
 
 DEF_GEN_INSN2_HELPER_EPP(cvtpi2ps, cvtpi2ps, Vdq, Qq)
 DEF_GEN_INSN2_HELPER_EPD(cvtsi2ss, cvtsi2ss, Vd, Ed)
@@ -7405,6 +7750,40 @@ GEN_INSN3(vmaskmovpd, Mqq, Hqq, Vqq)
     /* XXX TODO implement this */
 }
 
+GEN_INSN3(vpmaskmovd, Vdq, Hdq, Mdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpmaskmovd, Mdq, Hdq, Vdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpmaskmovd, Vqq, Hqq, Mqq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpmaskmovd, Mqq, Hqq, Vqq)
+{
+    /* XXX TODO implement this */
+}
+
+GEN_INSN3(vpmaskmovq, Vdq, Hdq, Mdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpmaskmovq, Mdq, Hdq, Vdq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpmaskmovq, Vqq, Hqq, Mqq)
+{
+    /* XXX TODO implement this */
+}
+GEN_INSN3(vpmaskmovq, Mqq, Hqq, Vqq)
+{
+    /* XXX TODO implement this */
+}
+
 GEN_INSN2(movntps, Mdq, Vdq)
 {
     insnop_ldst(xmm, Mdq)(env, s, 1, arg2, arg1);
@@ -7465,6 +7844,10 @@ GEN_INSN2(vmovntdqa, Vdq, Mdq)
 {
     gen_insn2(movntdqa, Vdq, Mdq)(env, s, arg1, arg2);
 }
+GEN_INSN2(vmovntdqa, Vqq, Mqq)
+{
+    gen_insn2(movntdqa, Vdq, Mdq)(env, s, arg1, arg2);
+}
 
 GEN_INSN0(pause)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 64/75] target/i386: introduce AVX2 vector instructions to sse-opcode.inc.h
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (61 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 63/75] target/i386: introduce AVX2 code generators Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-22  3:54   ` Aleksandar Markovic
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 65/75] target/i386: remove obsoleted helpers Jan Bobek
                   ` (10 subsequent siblings)
  73 siblings, 1 reply; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add all the AVX2 vector instruction entries to sse-opcode.inc.h.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/sse-opcode.inc.h | 362 ++++++++++++++++++++++++++++++++++-
 1 file changed, 359 insertions(+), 3 deletions(-)

diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
index c3c0ec4f89..abbb0a15d7 100644
--- a/target/i386/sse-opcode.inc.h
+++ b/target/i386/sse-opcode.inc.h
@@ -855,6 +855,181 @@
  * VEX.128.66.0F.WIG 73 /3 ib      VPSRLDQ xmm1, xmm2, imm8
  * VEX.LZ.0F.WIG AE /2             VLDMXCSR m32
  * VEX.LZ.0F.WIG AE /3             VSTMXCSR m32
+ *
+ * AVX2 Instructions
+ * ------------------
+ * VEX.256.66.0F.W0 D7 /r          VPMOVMSKB r32, ymm1
+ * VEX.256.66.0F.W1 D7 /r          VPMOVMSKB r64, ymm1
+ * VEX.256.66.0F.WIG FC /r         VPADDB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG FD /r         VPADDW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG FE /r         VPADDD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG D4 /r         VPADDQ ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG EC /r         VPADDSB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG ED /r         VPADDSW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG DC /r         VPADDUSB ymm1,ymm2,ymm3/m256
+ * VEX.256.66.0F.WIG DD /r         VPADDUSW ymm1,ymm2,ymm3/m256
+ * VEX.256.66.0F38.WIG 01 /r       VPHADDW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 02 /r       VPHADDD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 03 /r       VPHADDSW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG F8 /r         VPSUBB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG F9 /r         VPSUBW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG FA /r         VPSUBD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG FB /r         VPSUBQ ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG E8 /r         VPSUBSB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG E9 /r         VPSUBSW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG D8 /r         VPSUBUSB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG D9 /r         VPSUBUSW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 05 /r       VPHSUBW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 06 /r       VPHSUBD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 07 /r       VPHSUBSW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG D5 /r         VPMULLW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 40 /r       VPMULLD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG E5 /r         VPMULHW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG E4 /r         VPMULHUW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 28 /r       VPMULDQ ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG F4 /r         VPMULUDQ ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 0B /r       VPMULHRSW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG F5 /r         VPMADDWD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 04 /r       VPMADDUBSW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F DA /r             VPMINUB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38 3A /r           VPMINUW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 3B /r       VPMINUD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38 38 /r           VPMINSB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F EA /r             VPMINSW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 39 /r       VPMINSD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F DE /r             VPMAXUB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38 3E /r           VPMAXUW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 3F /r       VPMAXUD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 3C /r       VPMAXSB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG EE /r         VPMAXSW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 3D /r       VPMAXSD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG E0 /r         VPAVGB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG E3 /r         VPAVGW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG F6 /r         VPSADBW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F3A.WIG 42 /r ib    VMPSADBW ymm1, ymm2, ymm3/m256, imm8
+ * VEX.256.66.0F38.WIG 1C /r       VPABSB ymm1, ymm2/m256
+ * VEX.256.66.0F38.WIG 1D /r       VPABSW ymm1, ymm2/m256
+ * VEX.256.66.0F38.WIG 1E /r       VPABSD ymm1, ymm2/m256
+ * VEX.256.66.0F38.WIG 08 /r       VPSIGNB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 09 /r       VPSIGNW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 0A /r       VPSIGND ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG 74 /r         VPCMPEQB ymm1,ymm2,ymm3/m256
+ * VEX.256.66.0F.WIG 75 /r         VPCMPEQW ymm1,ymm2,ymm3/m256
+ * VEX.256.66.0F.WIG 76 /r         VPCMPEQD ymm1,ymm2,ymm3/m256
+ * VEX.256.66.0F38.WIG 29 /r       VPCMPEQQ ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG 64 /r         VPCMPGTB ymm1,ymm2,ymm3/m256
+ * VEX.256.66.0F.WIG 65 /r         VPCMPGTW ymm1,ymm2,ymm3/m256
+ * VEX.256.66.0F.WIG 66 /r         VPCMPGTD ymm1,ymm2,ymm3/m256
+ * VEX.256.66.0F38.WIG 37 /r       VPCMPGTQ ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG DB /r         VPAND ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG DF /r         VPANDN ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG EB /r         VPOR ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG EF /r         VPXOR ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG F1 /r         VPSLLW ymm1, ymm2, xmm3/m128
+ * VEX.256.66.0F.WIG F2 /r         VPSLLD ymm1, ymm2, xmm3/m128
+ * VEX.256.66.0F.WIG F3 /r         VPSLLQ ymm1, ymm2, xmm3/m128
+ * VEX.128.66.0F38.W0 47 /r        VPSLLVD xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F38.W0 47 /r        VPSLLVD ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F38.W1 47 /r        VPSLLVQ xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F38.W1 47 /r        VPSLLVQ ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG D1 /r         VPSRLW ymm1, ymm2, xmm3/m128
+ * VEX.256.66.0F.WIG D2 /r         VPSRLD ymm1, ymm2, xmm3/m128
+ * VEX.256.66.0F.WIG D3 /r         VPSRLQ ymm1, ymm2, xmm3/m128
+ * VEX.128.66.0F38.W0 45 /r        VPSRLVD xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F38.W0 45 /r        VPSRLVD ymm1, ymm2, ymm3/m256
+ * VEX.128.66.0F38.W1 45 /r        VPSRLVQ xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F38.W1 45 /r        VPSRLVQ ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG E1 /r         VPSRAW ymm1,ymm2,xmm3/m128
+ * VEX.256.66.0F.WIG E2 /r         VPSRAD ymm1,ymm2,xmm3/m128
+ * VEX.128.66.0F38.W0 46 /r        VPSRAVD xmm1, xmm2, xmm3/m128
+ * VEX.256.66.0F38.W0 46 /r        VPSRAVD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F3A.WIG 0F /r ib    VPALIGNR ymm1, ymm2, ymm3/m256, imm8
+ * VEX.256.66.0F.WIG 63 /r         VPACKSSWB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG 6B /r         VPACKSSDW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG 67 /r         VPACKUSWB ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38 2B /r           VPACKUSDW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG 68 /r         VPUNPCKHBW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG 69 /r         VPUNPCKHWD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG 6A /r         VPUNPCKHDQ ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG 6D /r         VPUNPCKHQDQ ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG 60 /r         VPUNPCKLBW ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG 61 /r         VPUNPCKLWD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG 62 /r         VPUNPCKLDQ ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F.WIG 6C /r         VPUNPCKLQDQ ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.WIG 00 /r       VPSHUFB ymm1, ymm2, ymm3/m256
+ * VEX.256.F2.0F.WIG 70 /r ib      VPSHUFLW ymm1, ymm2/m256, imm8
+ * VEX.256.F3.0F.WIG 70 /r ib      VPSHUFHW ymm1, ymm2/m256, imm8
+ * VEX.256.66.0F.WIG 70 /r ib      VPSHUFD ymm1, ymm2/m256, imm8
+ * VEX.256.66.0F3A.W0 4C /r /is4   VPBLENDVB ymm1, ymm2, ymm3/m256, ymm4
+ * VEX.256.66.0F3A.WIG 0E /r ib    VPBLENDW ymm1, ymm2, ymm3/m256, imm8
+ * VEX.128.66.0F3A.W0 02 /r ib     VPBLENDD xmm1, xmm2, xmm3/m128, imm8
+ * VEX.256.66.0F3A.W0 02 /r ib     VPBLENDD ymm1, ymm2, ymm3/m256, imm8
+ * VEX.256.66.0F3A.W0 38 /r ib     VINSERTI128 ymm1, ymm2, xmm3/m128, imm8
+ * VEX.256.66.0F3A.W0 39 /r ib     VEXTRACTI128 xmm1/m128, ymm2, imm8
+ * VEX.128.66.0F38.W0 78 /r        VPBROADCASTB xmm1,xmm2/m8
+ * VEX.256.66.0F38.W0 78 /r        VPBROADCASTB ymm1,xmm2/m8
+ * VEX.128.66.0F38.W0 79 /r        VPBROADCASTW xmm1,xmm2/m16
+ * VEX.256.66.0F38.W0 79 /r        VPBROADCASTW ymm1,xmm2/m16
+ * VEX.128.66.0F38.W0 58 /r        VPBROADCASTD xmm1,xmm2/m32
+ * VEX.256.66.0F38.W0 58 /r        VPBROADCASTD ymm1,xmm2/m32
+ * VEX.128.66.0F38.W0 59 /r        VPBROADCASTQ xmm1,xmm2/m64
+ * VEX.256.66.0F38.W0 59 /r        VPBROADCASTQ ymm1,xmm2/m64
+ * VEX.128.66.0F38.W0 18 /r        VBROADCASTSS xmm1, xmm2
+ * VEX.256.66.0F38.W0 18 /r        VBROADCASTSS ymm1, xmm2
+ * VEX.256.66.0F38.W0 19 /r        VBROADCASTSD ymm1, xmm2
+ * VEX.256.66.0F38.W0 5A /r        VBROADCASTI128 ymm1,m128
+ * VEX.256.66.0F3A.W0 46 /r ib     VPERM2I128 ymm1, ymm2, ymm3/m256, imm8
+ * VEX.256.66.0F38.W0 36 /r        VPERMD ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F38.W0 16 /r        VPERMPS ymm1, ymm2, ymm3/m256
+ * VEX.256.66.0F3A.W1 00 /r ib     VPERMQ ymm1, ymm2/m256, imm8
+ * VEX.256.66.0F3A.W1 01 /r ib     VPERMPD ymm1, ymm2/m256, imm8
+ * VEX.128.66.0F38.W0 92 /r        VGATHERDPS xmm1, vm32x, xmm2
+ * VEX.256.66.0F38.W0 92 /r        VGATHERDPS ymm1, vm32y, ymm2
+ * VEX.128.66.0F38.W1 92 /r        VGATHERDPD xmm1, vm32x, xmm2
+ * VEX.256.66.0F38.W1 92 /r        VGATHERDPD ymm1, vm32x, ymm2
+ * VEX.128.66.0F38.W0 93 /r        VGATHERQPS xmm1, vm64x, xmm2
+ * VEX.256.66.0F38.W0 93 /r        VGATHERQPS xmm1, vm64y, xmm2
+ * VEX.128.66.0F38.W1 93 /r        VGATHERQPD xmm1, vm64x, xmm2
+ * VEX.256.66.0F38.W1 93 /r        VGATHERQPD ymm1, vm64y, ymm2
+ * VEX.128.66.0F38.W0 90 /r        VPGATHERDD xmm1, vm32x, xmm2
+ * VEX.256.66.0F38.W0 90 /r        VPGATHERDD ymm1, vm32y, ymm2
+ * VEX.128.66.0F38.W1 90 /r        VPGATHERDQ xmm1, vm32x, xmm2
+ * VEX.256.66.0F38.W1 90 /r        VPGATHERDQ ymm1, vm32x, ymm2
+ * VEX.128.66.0F38.W0 91 /r        VPGATHERQD xmm1, vm64x, xmm2
+ * VEX.256.66.0F38.W0 91 /r        VPGATHERQD xmm1, vm64y, xmm2
+ * VEX.128.66.0F38.W1 91 /r        VPGATHERQQ xmm1, vm64x, xmm2
+ * VEX.256.66.0F38.W1 91 /r        VPGATHERQQ ymm1, vm64y, ymm2
+ * VEX.256.66.0F38.WIG 20 /r       VPMOVSXBW ymm1, xmm2/m128
+ * VEX.256.66.0F38.WIG 21 /r       VPMOVSXBD ymm1, xmm2/m64
+ * VEX.256.66.0F38.WIG 22 /r       VPMOVSXBQ ymm1, xmm2/m32
+ * VEX.256.66.0F38.WIG 23 /r       VPMOVSXWD ymm1, xmm2/m128
+ * VEX.256.66.0F38.WIG 24 /r       VPMOVSXWQ ymm1, xmm2/m64
+ * VEX.256.66.0F38.WIG 25 /r       VPMOVSXDQ ymm1, xmm2/m128
+ * VEX.256.66.0F38.WIG 30 /r       VPMOVZXBW ymm1, xmm2/m128
+ * VEX.256.66.0F38.WIG 31 /r       VPMOVZXBD ymm1, xmm2/m64
+ * VEX.256.66.0F38.WIG 32 /r       VPMOVZXBQ ymm1, xmm2/m32
+ * VEX.256.66.0F38.WIG 33 /r       VPMOVZXWD ymm1, xmm2/m128
+ * VEX.256.66.0F38.WIG 34 /r       VPMOVZXWQ ymm1, xmm2/m64
+ * VEX.256.66.0F38.WIG 35 /r       VPMOVZXDQ ymm1, xmm2/m128
+ * VEX.128.66.0F38.W0 8C /r        VPMASKMOVD xmm1, xmm2, m128
+ * VEX.128.66.0F38.W0 8E /r        VPMASKMOVD m128, xmm1, xmm2
+ * VEX.256.66.0F38.W0 8C /r        VPMASKMOVD ymm1, ymm2, m256
+ * VEX.256.66.0F38.W0 8E /r        VPMASKMOVD m256, ymm1, ymm2
+ * VEX.128.66.0F38.W1 8C /r        VPMASKMOVQ xmm1, xmm2, m128
+ * VEX.128.66.0F38.W1 8E /r        VPMASKMOVQ m128, xmm1, xmm2
+ * VEX.256.66.0F38.W1 8C /r        VPMASKMOVQ ymm1, ymm2, m256
+ * VEX.256.66.0F38.W1 8E /r        VPMASKMOVQ m256, ymm1, ymm2
+ * VEX.256.66.0F38.WIG 2A /r       VMOVNTDQA ymm1, m256
+ * VEX.256.66.0F.WIG 71 /6 ib      VPSLLW ymm1, ymm2, imm8
+ * VEX.256.66.0F.WIG 71 /2 ib      VPSRLW ymm1, ymm2, imm8
+ * VEX.256.66.0F.WIG 71 /4 ib      VPSRAW ymm1,ymm2,imm8
+ * VEX.256.66.0F.WIG 72 /6 ib      VPSLLD ymm1, ymm2, imm8
+ * VEX.256.66.0F.WIG 72 /2 ib      VPSRLD ymm1, ymm2, imm8
+ * VEX.256.66.0F.WIG 72 /4 ib      VPSRAD ymm1,ymm2,imm8
+ * VEX.256.66.0F.WIG 73 /6 ib      VPSLLQ ymm1, ymm2, imm8
+ * VEX.256.66.0F.WIG 73 /7 ib      VPSLLDQ ymm1, ymm2, imm8
+ * VEX.256.66.0F.WIG 73 /2 ib      VPSRLQ ymm1, ymm2, imm8
+ * VEX.256.66.0F.WIG 73 /3 ib      VPSRLDQ ymm1, ymm2, imm8
  */
 
 OPCODE(movd, LEG(NP, 0F, 0, 0x6e), MMX, WR, Pq, Ed)
@@ -943,6 +1118,8 @@ OPCODE(pmovmskb, LEG(66, 0F, 0, 0xd7), SSE2, WR, Gd, Udq)
 OPCODE(pmovmskb, LEG(66, 0F, 1, 0xd7), SSE2, WR, Gq, Udq)
 OPCODE(vpmovmskb, VEX(128, 66, 0F, 0, 0xd7), AVX, WR, Gd, Udq)
 OPCODE(vpmovmskb, VEX(128, 66, 0F, 1, 0xd7), AVX, WR, Gq, Udq)
+OPCODE(vpmovmskb, VEX(256, 66, 0F, 0, 0xd7), AVX2, WR, Gd, Uqq)
+OPCODE(vpmovmskb, VEX(256, 66, 0F, 1, 0xd7), AVX2, WR, Gq, Uqq)
 OPCODE(movmskps, LEG(NP, 0F, 0, 0x50), SSE, WR, Gd, Udq)
 OPCODE(movmskps, LEG(NP, 0F, 1, 0x50), SSE, WR, Gq, Udq)
 OPCODE(vmovmskps, VEX(128, NP, 0F, 0, 0x50), AVX, WR, Gd, Udq)
@@ -970,27 +1147,35 @@ OPCODE(vmovddup, VEX(256, F2, 0F, IG, 0x12), AVX, WR, Vqq, Wqq)
 OPCODE(paddb, LEG(NP, 0F, 0, 0xfc), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddb, LEG(66, 0F, 0, 0xfc), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpaddb, VEX(128, 66, 0F, IG, 0xfc), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpaddb, VEX(256, 66, 0F, IG, 0xfc), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(paddw, LEG(NP, 0F, 0, 0xfd), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddw, LEG(66, 0F, 0, 0xfd), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpaddw, VEX(128, 66, 0F, IG, 0xfd), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpaddw, VEX(256, 66, 0F, IG, 0xfd), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(paddd, LEG(NP, 0F, 0, 0xfe), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddd, LEG(66, 0F, 0, 0xfe), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpaddd, VEX(128, 66, 0F, IG, 0xfe), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpaddd, VEX(256, 66, 0F, IG, 0xfe), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(paddq, LEG(NP, 0F, 0, 0xd4), SSE2, WRR, Pq, Pq, Qq)
 OPCODE(paddq, LEG(66, 0F, 0, 0xd4), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpaddq, VEX(128, 66, 0F, IG, 0xd4), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpaddq, VEX(256, 66, 0F, IG, 0xd4), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(paddsb, LEG(NP, 0F, 0, 0xec), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddsb, LEG(66, 0F, 0, 0xec), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpaddsb, VEX(128, 66, 0F, IG, 0xec), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpaddsb, VEX(256, 66, 0F, IG, 0xec), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(paddsw, LEG(NP, 0F, 0, 0xed), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddsw, LEG(66, 0F, 0, 0xed), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpaddsw, VEX(128, 66, 0F, IG, 0xed), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpaddsw, VEX(256, 66, 0F, IG, 0xed), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(paddusb, LEG(NP, 0F, 0, 0xdc), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddusb, LEG(66, 0F, 0, 0xdc), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpaddusb, VEX(128, 66, 0F, IG, 0xdc), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpaddusb, VEX(256, 66, 0F, IG, 0xdc), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(paddusw, LEG(NP, 0F, 0, 0xdd), MMX, WRR, Pq, Pq, Qq)
 OPCODE(paddusw, LEG(66, 0F, 0, 0xdd), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpaddusw, VEX(128, 66, 0F, IG, 0xdd), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpaddusw, VEX(256, 66, 0F, IG, 0xdd), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(addps, LEG(NP, 0F, 0, 0x58), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(vaddps, VEX(128, NP, 0F, IG, 0x58), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vaddps, VEX(256, NP, 0F, IG, 0x58), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1004,12 +1189,15 @@ OPCODE(vaddsd, VEX(IG, F2, 0F, IG, 0x58), AVX, WRR, Vq, Hq, Wq)
 OPCODE(phaddw, LEG(NP, 0F38, 0, 0x01), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(phaddw, LEG(66, 0F38, 0, 0x01), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vphaddw, VEX(128, 66, 0F38, IG, 0x01), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vphaddw, VEX(256, 66, 0F38, IG, 0x01), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(phaddd, LEG(NP, 0F38, 0, 0x02), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(phaddd, LEG(66, 0F38, 0, 0x02), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vphaddd, VEX(128, 66, 0F38, IG, 0x02), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vphaddd, VEX(256, 66, 0F38, IG, 0x02), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(phaddsw, LEG(NP, 0F38, 0, 0x03), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(phaddsw, LEG(66, 0F38, 0, 0x03), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vphaddsw, VEX(128, 66, 0F38, IG, 0x03), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vphaddsw, VEX(256, 66, 0F38, IG, 0x03), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(haddps, LEG(F2, 0F, 0, 0x7c), SSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vhaddps, VEX(128, F2, 0F, IG, 0x7c), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vhaddps, VEX(256, F2, 0F, IG, 0x7c), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1019,27 +1207,35 @@ OPCODE(vhaddpd, VEX(256, 66, 0F, IG, 0x7c), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(psubb, LEG(NP, 0F, 0, 0xf8), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubb, LEG(66, 0F, 0, 0xf8), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsubb, VEX(128, 66, 0F, IG, 0xf8), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsubb, VEX(256, 66, 0F, IG, 0xf8), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(psubw, LEG(NP, 0F, 0, 0xf9), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubw, LEG(66, 0F, 0, 0xf9), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsubw, VEX(128, 66, 0F, IG, 0xf9), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsubw, VEX(256, 66, 0F, IG, 0xf9), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(psubd, LEG(NP, 0F, 0, 0xfa), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubd, LEG(66, 0F, 0, 0xfa), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsubd, VEX(128, 66, 0F, IG, 0xfa), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsubd, VEX(256, 66, 0F, IG, 0xfa), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(psubq, LEG(NP, 0F, 0, 0xfb), SSE2, WRR, Pq, Pq, Qq)
 OPCODE(psubq, LEG(66, 0F, 0, 0xfb), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsubq, VEX(128, 66, 0F, IG, 0xfb), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsubq, VEX(256, 66, 0F, IG, 0xfb), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(psubsb, LEG(NP, 0F, 0, 0xe8), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubsb, LEG(66, 0F, 0, 0xe8), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsubsb, VEX(128, 66, 0F, IG, 0xe8), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsubsb, VEX(256, 66, 0F, IG, 0xe8), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(psubsw, LEG(NP, 0F, 0, 0xe9), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubsw, LEG(66, 0F, 0, 0xe9), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsubsw, VEX(128, 66, 0F, IG, 0xe9), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsubsw, VEX(256, 66, 0F, IG, 0xe9), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(psubusb, LEG(NP, 0F, 0, 0xd8), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubusb, LEG(66, 0F, 0, 0xd8), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsubusb, VEX(128, 66, 0F, IG, 0xd8), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsubusb, VEX(256, 66, 0F, IG, 0xd8), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(psubusw, LEG(NP, 0F, 0, 0xd9), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psubusw, LEG(66, 0F, 0, 0xd9), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsubusw, VEX(128, 66, 0F, IG, 0xd9), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsubusw, VEX(256, 66, 0F, IG, 0xd9), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(subps, LEG(NP, 0F, 0, 0x5c), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(vsubps, VEX(128, NP, 0F, IG, 0x5c), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vsubps, VEX(256, NP, 0F, IG, 0x5c), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1053,12 +1249,15 @@ OPCODE(vsubsd, VEX(IG, F2, 0F, IG, 0x5c), AVX, WRR, Vq, Hq, Wq)
 OPCODE(phsubw, LEG(NP, 0F38, 0, 0x05), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(phsubw, LEG(66, 0F38, 0, 0x05), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vphsubw, VEX(128, 66, 0F38, IG, 0x05), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vphsubw, VEX(256, 66, 0F38, IG, 0x05), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(phsubd, LEG(NP, 0F38, 0, 0x06), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(phsubd, LEG(66, 0F38, 0, 0x06), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vphsubd, VEX(128, 66, 0F38, IG, 0x06), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vphsubd, VEX(256, 66, 0F38, IG, 0x06), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(phsubsw, LEG(NP, 0F38, 0, 0x07), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(phsubsw, LEG(66, 0F38, 0, 0x07), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vphsubsw, VEX(128, 66, 0F38, IG, 0x07), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vphsubsw, VEX(256, 66, 0F38, IG, 0x07), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(hsubps, LEG(F2, 0F, 0, 0x7d), SSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vhsubps, VEX(128, F2, 0F, IG, 0x7d), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vhsubps, VEX(256, F2, 0F, IG, 0x7d), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1074,22 +1273,29 @@ OPCODE(vaddsubpd, VEX(256, 66, 0F, IG, 0xd0), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmullw, LEG(NP, 0F, 0, 0xd5), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pmullw, LEG(66, 0F, 0, 0xd5), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmullw, VEX(128, 66, 0F, IG, 0xd5), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmullw, VEX(256, 66, 0F, IG, 0xd5), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmulld, LEG(66, 0F38, 0, 0x40), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmulld, VEX(128, 66, 0F38, IG, 0x40), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmulld, VEX(256, 66, 0F38, IG, 0x40), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmulhw, LEG(NP, 0F, 0, 0xe5), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pmulhw, LEG(66, 0F, 0, 0xe5), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmulhw, VEX(128, 66, 0F, IG, 0xe5), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmulhw, VEX(256, 66, 0F, IG, 0xe5), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmulhuw, LEG(NP, 0F, 0, 0xe4), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pmulhuw, LEG(66, 0F, 0, 0xe4), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmulhuw, VEX(128, 66, 0F, IG, 0xe4), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmulhuw, VEX(256, 66, 0F, IG, 0xe4), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmuldq, LEG(66, 0F38, 0, 0x28), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmuldq, VEX(128, 66, 0F38, IG, 0x28), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmuldq, VEX(256, 66, 0F38, IG, 0x28), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmuludq, LEG(NP, 0F, 0, 0xf4), SSE2, WRR, Pq, Pq, Qq)
 OPCODE(pmuludq, LEG(66, 0F, 0, 0xf4), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmuludq, VEX(128, 66, 0F, IG, 0xf4), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmuludq, VEX(256, 66, 0F, IG, 0xf4), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmulhrsw, LEG(NP, 0F38, 0, 0x0b), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(pmulhrsw, LEG(66, 0F38, 0, 0x0b), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmulhrsw, VEX(128, 66, 0F38, IG, 0x0b), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmulhrsw, VEX(256, 66, 0F38, IG, 0x0b), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(mulps, LEG(NP, 0F, 0, 0x59), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(vmulps, VEX(128, NP, 0F, IG, 0x59), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vmulps, VEX(256, NP, 0F, IG, 0x59), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1103,9 +1309,11 @@ OPCODE(vmulsd, VEX(IG, F2, 0F, IG, 0x59), AVX, WRR, Vq, Hq, Wq)
 OPCODE(pmaddwd, LEG(NP, 0F, 0, 0xf5), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pmaddwd, LEG(66, 0F, 0, 0xf5), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmaddwd, VEX(128, 66, 0F, IG, 0xf5), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmaddwd, VEX(256, 66, 0F, IG, 0xf5), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmaddubsw, LEG(NP, 0F38, 0, 0x04), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(pmaddubsw, LEG(66, 0F38, 0, 0x04), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmaddubsw, VEX(128, 66, 0F38, IG, 0x04), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmaddubsw, VEX(256, 66, 0F38, IG, 0x04), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(divps, LEG(NP, 0F, 0, 0x5e), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(vdivps, VEX(128, NP, 0F, IG, 0x5e), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vdivps, VEX(256, NP, 0F, IG, 0x5e), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1139,17 +1347,23 @@ OPCODE(vrsqrtss, VEX(IG, F3, 0F, IG, 0x52), AVX, WRR, Vd, Hd, Wd)
 OPCODE(pminub, LEG(NP, 0F, 0, 0xda), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pminub, LEG(66, 0F, 0, 0xda), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpminub, VEX(128, 66, 0F, IG, 0xda), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpminub, VEX(256, 66, 0F, IG, 0xda), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pminuw, LEG(66, 0F38, 0, 0x3a), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpminuw, VEX(128, 66, 0F38, IG, 0x3a), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpminuw, VEX(256, 66, 0F38, IG, 0x3a), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pminud, LEG(66, 0F38, 0, 0x3b), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpminud, VEX(128, 66, 0F38, IG, 0x3b), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpminud, VEX(256, 66, 0F38, IG, 0x3b), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pminsb, LEG(66, 0F38, 0, 0x38), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpminsb, VEX(128, 66, 0F38, IG, 0x38), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpminsb, VEX(256, 66, 0F38, IG, 0x38), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pminsw, LEG(NP, 0F, 0, 0xea), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pminsw, LEG(66, 0F, 0, 0xea), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpminsw, VEX(128, 66, 0F, IG, 0xea), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpminsw, VEX(256, 66, 0F, IG, 0xea), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pminsd, LEG(66, 0F38, 0, 0x39), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpminsd, VEX(128, 66, 0F38, IG, 0x39), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpminsd, VEX(256, 66, 0F38, IG, 0x39), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(minps, LEG(NP, 0F, 0, 0x5d), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(vminps, VEX(128, NP, 0F, IG, 0x5d), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vminps, VEX(256, NP, 0F, IG, 0x5d), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1165,17 +1379,23 @@ OPCODE(vphminposuw, VEX(128, 66, 0F38, IG, 0x41), AVX, WR, Vdq, Wdq)
 OPCODE(pmaxub, LEG(NP, 0F, 0, 0xde), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pmaxub, LEG(66, 0F, 0, 0xde), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmaxub, VEX(128, 66, 0F, IG, 0xde), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmaxub, VEX(256, 66, 0F, IG, 0xde), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmaxuw, LEG(66, 0F38, 0, 0x3e), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmaxuw, VEX(128, 66, 0F38, IG, 0x3e), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmaxuw, VEX(256, 66, 0F38, IG, 0x3e), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmaxud, LEG(66, 0F38, 0, 0x3f), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmaxud, VEX(128, 66, 0F38, IG, 0x3f), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmaxud, VEX(256, 66, 0F38, IG, 0x3f), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmaxsb, LEG(66, 0F38, 0, 0x3c), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmaxsb, VEX(128, 66, 0F38, IG, 0x3c), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmaxsb, VEX(256, 66, 0F38, IG, 0x3c), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmaxsw, LEG(NP, 0F, 0, 0xee), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pmaxsw, LEG(66, 0F, 0, 0xee), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmaxsw, VEX(128, 66, 0F, IG, 0xee), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmaxsw, VEX(256, 66, 0F, IG, 0xee), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pmaxsd, LEG(66, 0F38, 0, 0x3d), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpmaxsd, VEX(128, 66, 0F38, IG, 0x3d), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpmaxsd, VEX(256, 66, 0F38, IG, 0x3d), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(maxps, LEG(NP, 0F, 0, 0x5f), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(vmaxps, VEX(128, NP, 0F, IG, 0x5f), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vmaxps, VEX(256, NP, 0F, IG, 0x5f), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1189,32 +1409,42 @@ OPCODE(vmaxsd, VEX(IG, F2, 0F, IG, 0x5f), AVX, WRR, Vq, Hq, Wq)
 OPCODE(pavgb, LEG(NP, 0F, 0, 0xe0), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pavgb, LEG(66, 0F, 0, 0xe0), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpavgb, VEX(128, 66, 0F, IG, 0xe0), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpavgb, VEX(256, 66, 0F, IG, 0xe0), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pavgw, LEG(NP, 0F, 0, 0xe3), SSE, WRR, Pq, Pq, Qq)
 OPCODE(pavgw, LEG(66, 0F, 0, 0xe3), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpavgw, VEX(128, 66, 0F, IG, 0xe3), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpavgw, VEX(256, 66, 0F, IG, 0xe3), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(psadbw, LEG(NP, 0F, 0, 0xf6), SSE, WRR, Pq, Pq, Qq)
 OPCODE(psadbw, LEG(66, 0F, 0, 0xf6), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsadbw, VEX(128, 66, 0F, IG, 0xf6), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsadbw, VEX(256, 66, 0F, IG, 0xf6), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(mpsadbw, LEG(66, 0F3A, 0, 0x42), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(vmpsadbw, VEX(128, 66, 0F3A, IG, 0x42), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
+OPCODE(vmpsadbw, VEX(256, 66, 0F3A, IG, 0x42), AVX2, WRRR, Vqq, Hqq, Wqq, Ib)
 OPCODE(pabsb, LEG(NP, 0F38, 0, 0x1c), SSSE3, WR, Pq, Qq)
 OPCODE(pabsb, LEG(66, 0F38, 0, 0x1c), SSSE3, WR, Vdq, Wdq)
 OPCODE(vpabsb, VEX(128, 66, 0F38, IG, 0x1c), AVX, WR, Vdq, Wdq)
+OPCODE(vpabsb, VEX(256, 66, 0F38, IG, 0x1c), AVX2, WR, Vqq, Wqq)
 OPCODE(pabsw, LEG(NP, 0F38, 0, 0x1d), SSSE3, WR, Pq, Qq)
 OPCODE(pabsw, LEG(66, 0F38, 0, 0x1d), SSSE3, WR, Vdq, Wdq)
 OPCODE(vpabsw, VEX(128, 66, 0F38, IG, 0x1d), AVX, WR, Vdq, Wdq)
+OPCODE(vpabsw, VEX(256, 66, 0F38, IG, 0x1d), AVX2, WR, Vqq, Wqq)
 OPCODE(pabsd, LEG(NP, 0F38, 0, 0x1e), SSSE3, WR, Pq, Qq)
 OPCODE(pabsd, LEG(66, 0F38, 0, 0x1e), SSSE3, WR, Vdq, Wdq)
 OPCODE(vpabsd, VEX(128, 66, 0F38, IG, 0x1e), AVX, WR, Vdq, Wdq)
+OPCODE(vpabsd, VEX(256, 66, 0F38, IG, 0x1e), AVX2, WR, Vqq, Wqq)
 OPCODE(psignb, LEG(NP, 0F38, 0, 0x08), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(psignb, LEG(66, 0F38, 0, 0x08), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsignb, VEX(128, 66, 0F38, IG, 0x08), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsignb, VEX(256, 66, 0F38, IG, 0x08), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(psignw, LEG(NP, 0F38, 0, 0x09), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(psignw, LEG(66, 0F38, 0, 0x09), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsignw, VEX(128, 66, 0F38, IG, 0x09), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsignw, VEX(256, 66, 0F38, IG, 0x09), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(psignd, LEG(NP, 0F38, 0, 0x0a), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(psignd, LEG(66, 0F38, 0, 0x0a), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsignd, VEX(128, 66, 0F38, IG, 0x0a), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsignd, VEX(256, 66, 0F38, IG, 0x0a), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(dpps, LEG(66, 0F3A, 0, 0x40), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(vdpps, VEX(128, 66, 0F3A, IG, 0x40), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
 OPCODE(vdpps, VEX(256, 66, 0F3A, IG, 0x40), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
@@ -1247,25 +1477,33 @@ OPCODE(vpclmulqdq, VEX(128, 66, 0F3A, IG, 0x44), PCLMULQDQ_AVX, WRRR, Vdq, Hdq,
 OPCODE(pcmpeqb, LEG(NP, 0F, 0, 0x74), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqb, LEG(66, 0F, 0, 0x74), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpcmpeqb, VEX(128, 66, 0F, IG, 0x74), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpcmpeqb, VEX(256, 66, 0F, IG, 0x74), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pcmpeqw, LEG(NP, 0F, 0, 0x75), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqw, LEG(66, 0F, 0, 0x75), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpcmpeqw, VEX(128, 66, 0F, IG, 0x75), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpcmpeqw, VEX(256, 66, 0F, IG, 0x75), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pcmpeqd, LEG(NP, 0F, 0, 0x76), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpeqd, LEG(66, 0F, 0, 0x76), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpcmpeqd, VEX(128, 66, 0F, IG, 0x76), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpcmpeqd, VEX(256, 66, 0F, IG, 0x76), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pcmpeqq, LEG(66, 0F38, 0, 0x29), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpcmpeqq, VEX(128, 66, 0F38, IG, 0x29), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpcmpeqq, VEX(256, 66, 0F38, IG, 0x29), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pcmpgtb, LEG(NP, 0F, 0, 0x64), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtb, LEG(66, 0F, 0, 0x64), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpcmpgtb, VEX(128, 66, 0F, IG, 0x64), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpcmpgtb, VEX(256, 66, 0F, IG, 0x64), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pcmpgtw, LEG(NP, 0F, 0, 0x65), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtw, LEG(66, 0F, 0, 0x65), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpcmpgtw, VEX(128, 66, 0F, IG, 0x65), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpcmpgtw, VEX(256, 66, 0F, IG, 0x65), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pcmpgtd, LEG(NP, 0F, 0, 0x66), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pcmpgtd, LEG(66, 0F, 0, 0x66), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpcmpgtd, VEX(128, 66, 0F, IG, 0x66), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpcmpgtd, VEX(256, 66, 0F, IG, 0x66), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pcmpgtq, LEG(66, 0F38, 0, 0x37), SSE4_2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpcmpgtq, VEX(128, 66, 0F38, IG, 0x37), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpcmpgtq, VEX(256, 66, 0F38, IG, 0x37), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pcmpestrm, LEG(66, 0F3A, 0, 0x60), SSE4_2, RRR, Vdq, Wdq, Ib)
 OPCODE(vpcmpestrm, VEX(128, 66, 0F3A, IG, 0x60), AVX, RRR, Vdq, Wdq, Ib)
 OPCODE(pcmpestri, LEG(66, 0F3A, 0, 0x61), SSE4_2, RRR, Vdq, Wdq, Ib)
@@ -1302,6 +1540,7 @@ OPCODE(vcomisd, VEX(IG, 66, 0F, IG, 0x2f), AVX, RR, Vq, Wq)
 OPCODE(pand, LEG(NP, 0F, 0, 0xdb), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pand, LEG(66, 0F, 0, 0xdb), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpand, VEX(128, 66, 0F, IG, 0xdb), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpand, VEX(256, 66, 0F, IG, 0xdb), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(andps, LEG(NP, 0F, 0, 0x54), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(vandps, VEX(128, NP, 0F, IG, 0x54), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vandps, VEX(256, NP, 0F, IG, 0x54), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1311,6 +1550,7 @@ OPCODE(vandpd, VEX(256, 66, 0F, IG, 0x54), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(pandn, LEG(NP, 0F, 0, 0xdf), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pandn, LEG(66, 0F, 0, 0xdf), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpandn, VEX(128, 66, 0F, IG, 0xdf), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpandn, VEX(256, 66, 0F, IG, 0xdf), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(andnps, LEG(NP, 0F, 0, 0x55), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(vandnps, VEX(128, NP, 0F, IG, 0x55), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vandnps, VEX(256, NP, 0F, IG, 0x55), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1320,6 +1560,7 @@ OPCODE(vandnpd, VEX(256, 66, 0F, IG, 0x55), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(por, LEG(NP, 0F, 0, 0xeb), MMX, WRR, Pq, Pq, Qq)
 OPCODE(por, LEG(66, 0F, 0, 0xeb), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpor, VEX(128, 66, 0F, IG, 0xeb), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpor, VEX(256, 66, 0F, IG, 0xeb), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(orps, LEG(NP, 0F, 0, 0x56), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(vorps, VEX(128, NP, 0F, IG, 0x56), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vorps, VEX(256, NP, 0F, IG, 0x56), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1329,6 +1570,7 @@ OPCODE(vorpd, VEX(256, 66, 0F, IG, 0x56), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(pxor, LEG(NP, 0F, 0, 0xef), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pxor, LEG(66, 0F, 0, 0xef), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpxor, VEX(128, 66, 0F, IG, 0xef), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpxor, VEX(256, 66, 0F, IG, 0xef), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(xorps, LEG(NP, 0F, 0, 0x57), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(vxorps, VEX(128, NP, 0F, IG, 0x57), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vxorps, VEX(256, NP, 0F, IG, 0x57), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1338,63 +1580,94 @@ OPCODE(vxorpd, VEX(256, 66, 0F, IG, 0x57), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(psllw, LEG(NP, 0F, 0, 0xf1), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psllw, LEG(66, 0F, 0, 0xf1), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsllw, VEX(128, 66, 0F, IG, 0xf1), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsllw, VEX(256, 66, 0F, IG, 0xf1), AVX2, WRR, Vqq, Hqq, Wdq)
 OPCODE(pslld, LEG(NP, 0F, 0, 0xf2), MMX, WRR, Pq, Pq, Qq)
 OPCODE(pslld, LEG(66, 0F, 0, 0xf2), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpslld, VEX(128, 66, 0F, IG, 0xf2), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpslld, VEX(256, 66, 0F, IG, 0xf2), AVX2, WRR, Vqq, Hqq, Wdq)
 OPCODE(psllq, LEG(NP, 0F, 0, 0xf3), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psllq, LEG(66, 0F, 0, 0xf3), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsllq, VEX(128, 66, 0F, IG, 0xf3), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsllq, VEX(256, 66, 0F, IG, 0xf3), AVX2, WRR, Vqq, Hqq, Wdq)
+OPCODE(vpsllvd, VEX(128, 66, 0F38, 0, 0x47), AVX2, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsllvd, VEX(256, 66, 0F38, 0, 0x47), AVX2, WRR, Vqq, Hqq, Wqq)
+OPCODE(vpsllvq, VEX(128, 66, 0F38, 1, 0x47), AVX2, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsllvq, VEX(256, 66, 0F38, 1, 0x47), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(psrlw, LEG(NP, 0F, 0, 0xd1), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psrlw, LEG(66, 0F, 0, 0xd1), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsrlw, VEX(128, 66, 0F, IG, 0xd1), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsrlw, VEX(256, 66, 0F, IG, 0xd1), AVX2, WRR, Vqq, Hqq, Wdq)
 OPCODE(psrld, LEG(NP, 0F, 0, 0xd2), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psrld, LEG(66, 0F, 0, 0xd2), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsrld, VEX(128, 66, 0F, IG, 0xd2), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsrld, VEX(256, 66, 0F, IG, 0xd2), AVX2, WRR, Vqq, Hqq, Wdq)
 OPCODE(psrlq, LEG(NP, 0F, 0, 0xd3), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psrlq, LEG(66, 0F, 0, 0xd3), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsrlq, VEX(128, 66, 0F, IG, 0xd3), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsrlq, VEX(256, 66, 0F, IG, 0xd3), AVX2, WRR, Vqq, Hqq, Wdq)
+OPCODE(vpsrlvd, VEX(128, 66, 0F38, 0, 0x45), AVX2, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsrlvd, VEX(256, 66, 0F38, 0, 0x45), AVX2, WRR, Vqq, Hqq, Wqq)
+OPCODE(vpsrlvq, VEX(128, 66, 0F38, 1, 0x45), AVX2, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsrlvq, VEX(256, 66, 0F38, 1, 0x45), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(psraw, LEG(NP, 0F, 0, 0xe1), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psraw, LEG(66, 0F, 0, 0xe1), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsraw, VEX(128, 66, 0F, IG, 0xe1), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsraw, VEX(256, 66, 0F, IG, 0xe1), AVX2, WRR, Vqq, Hqq, Wdq)
 OPCODE(psrad, LEG(NP, 0F, 0, 0xe2), MMX, WRR, Pq, Pq, Qq)
 OPCODE(psrad, LEG(66, 0F, 0, 0xe2), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpsrad, VEX(128, 66, 0F, IG, 0xe2), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsrad, VEX(256, 66, 0F, IG, 0xe2), AVX2, WRR, Vqq, Hqq, Wdq)
+OPCODE(vpsravd, VEX(128, 66, 0F38, 0, 0x46), AVX2, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpsravd, VEX(256, 66, 0F38, 0, 0x46), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(palignr, LEG(NP, 0F3A, 0, 0x0f), SSSE3, WRRR, Pq, Pq, Qq, Ib)
 OPCODE(palignr, LEG(66, 0F3A, 0, 0x0f), SSSE3, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(vpalignr, VEX(128, 66, 0F3A, IG, 0x0f), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
+OPCODE(vpalignr, VEX(256, 66, 0F3A, IG, 0x0f), AVX2, WRRR, Vqq, Hqq, Wqq, Ib)
 OPCODE(packsswb, LEG(NP, 0F, 0, 0x63), MMX, WRR, Pq, Pq, Qq)
 OPCODE(packsswb, LEG(66, 0F, 0, 0x63), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpacksswb, VEX(128, 66, 0F, IG, 0x63), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpacksswb, VEX(256, 66, 0F, IG, 0x63), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(packssdw, LEG(NP, 0F, 0, 0x6b), MMX, WRR, Pq, Pq, Qq)
 OPCODE(packssdw, LEG(66, 0F, 0, 0x6b), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpackssdw, VEX(128, 66, 0F, IG, 0x6b), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpackssdw, VEX(256, 66, 0F, IG, 0x6b), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(packuswb, LEG(NP, 0F, 0, 0x67), MMX, WRR, Pq, Pq, Qq)
 OPCODE(packuswb, LEG(66, 0F, 0, 0x67), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpackuswb, VEX(128, 66, 0F, IG, 0x67), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpackuswb, VEX(256, 66, 0F, IG, 0x67), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(packusdw, LEG(66, 0F38, 0, 0x2b), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpackusdw, VEX(128, 66, 0F38, IG, 0x2b), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpackusdw, VEX(256, 66, 0F38, IG, 0x2b), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(punpckhbw, LEG(NP, 0F, 0, 0x68), MMX, WRR, Pq, Pq, Qq)
 OPCODE(punpckhbw, LEG(66, 0F, 0, 0x68), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpunpckhbw, VEX(128, 66, 0F, IG, 0x68), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpunpckhbw, VEX(256, 66, 0F, IG, 0x68), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(punpckhwd, LEG(NP, 0F, 0, 0x69), MMX, WRR, Pq, Pq, Qq)
 OPCODE(punpckhwd, LEG(66, 0F, 0, 0x69), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpunpckhwd, VEX(128, 66, 0F, IG, 0x69), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpunpckhwd, VEX(256, 66, 0F, IG, 0x69), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(punpckhdq, LEG(NP, 0F, 0, 0x6a), MMX, WRR, Pq, Pq, Qq)
 OPCODE(punpckhdq, LEG(66, 0F, 0, 0x6a), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpunpckhdq, VEX(128, 66, 0F, IG, 0x6a), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpunpckhdq, VEX(256, 66, 0F, IG, 0x6a), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(punpckhqdq, LEG(66, 0F, 0, 0x6d), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpunpckhqdq, VEX(128, 66, 0F, IG, 0x6d), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpunpckhqdq, VEX(256, 66, 0F, IG, 0x6d), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(punpcklbw, LEG(NP, 0F, 0, 0x60), MMX, WRR, Pq, Pq, Qd)
 OPCODE(punpcklbw, LEG(66, 0F, 0, 0x60), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpunpcklbw, VEX(128, 66, 0F, IG, 0x60), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpunpcklbw, VEX(256, 66, 0F, IG, 0x60), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(punpcklwd, LEG(NP, 0F, 0, 0x61), MMX, WRR, Pq, Pq, Qd)
 OPCODE(punpcklwd, LEG(66, 0F, 0, 0x61), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpunpcklwd, VEX(128, 66, 0F, IG, 0x61), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpunpcklwd, VEX(256, 66, 0F, IG, 0x61), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(punpckldq, LEG(NP, 0F, 0, 0x62), MMX, WRR, Pq, Pq, Qd)
 OPCODE(punpckldq, LEG(66, 0F, 0, 0x62), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpunpckldq, VEX(128, 66, 0F, IG, 0x62), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpunpckldq, VEX(256, 66, 0F, IG, 0x62), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(punpcklqdq, LEG(66, 0F, 0, 0x6c), SSE2, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpunpcklqdq, VEX(128, 66, 0F, IG, 0x6c), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpunpcklqdq, VEX(256, 66, 0F, IG, 0x6c), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(unpcklps, LEG(NP, 0F, 0, 0x14), SSE, WRR, Vdq, Vdq, Wdq)
 OPCODE(vunpcklps, VEX(128, NP, 0F, IG, 0x14), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vunpcklps, VEX(256, NP, 0F, IG, 0x14), AVX, WRR, Vqq, Hqq, Wqq)
@@ -1410,13 +1683,17 @@ OPCODE(vunpckhpd, VEX(256, 66, 0F, IG, 0x15), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(pshufb, LEG(NP, 0F38, 0, 0x00), SSSE3, WRR, Pq, Pq, Qq)
 OPCODE(pshufb, LEG(66, 0F38, 0, 0x00), SSSE3, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpshufb, VEX(128, 66, 0F38, IG, 0x00), AVX, WRR, Vdq, Hdq, Wdq)
+OPCODE(vpshufb, VEX(256, 66, 0F38, IG, 0x00), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(pshufw, LEG(NP, 0F, 0, 0x70), SSE, WRR, Pq, Qq, Ib)
 OPCODE(pshuflw, LEG(F2, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
 OPCODE(vpshuflw, VEX(128, F2, 0F, IG, 0x70), AVX, WRR, Vdq, Wdq, Ib)
+OPCODE(vpshuflw, VEX(256, F2, 0F, IG, 0x70), AVX2, WRR, Vqq, Wqq, Ib)
 OPCODE(pshufhw, LEG(F3, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
 OPCODE(vpshufhw, VEX(128, F3, 0F, IG, 0x70), AVX, WRR, Vdq, Wdq, Ib)
+OPCODE(vpshufhw, VEX(256, F3, 0F, IG, 0x70), AVX2, WRR, Vqq, Wqq, Ib)
 OPCODE(pshufd, LEG(66, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
 OPCODE(vpshufd, VEX(128, 66, 0F, IG, 0x70), AVX, WRR, Vdq, Wdq, Ib)
+OPCODE(vpshufd, VEX(256, 66, 0F, IG, 0x70), AVX2, WRR, Vqq, Wqq, Ib)
 OPCODE(shufps, LEG(NP, 0F, 0, 0xc6), SSE, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(vshufps, VEX(128, NP, 0F, IG, 0xc6), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
 OPCODE(vshufps, VEX(256, NP, 0F, IG, 0xc6), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
@@ -1437,8 +1714,12 @@ OPCODE(vblendvpd, VEX(128, 66, 0F3A, 0, 0x4b), AVX, WRRR, Vdq, Hdq, Wdq, Ldq)
 OPCODE(vblendvpd, VEX(256, 66, 0F3A, 0, 0x4b), AVX, WRRR, Vqq, Hqq, Wqq, Lqq)
 OPCODE(pblendvb, LEG(66, 0F38, 0, 0x10), SSE4_1, WRR, Vdq, Vdq, Wdq)
 OPCODE(vpblendvb, VEX(128, 66, 0F3A, 0, 0x4c), AVX, WRRR, Vdq, Hdq, Wdq, Ldq)
+OPCODE(vpblendvb, VEX(256, 66, 0F3A, 0, 0x4c), AVX2, WRRR, Vqq, Hqq, Wqq, Lqq)
 OPCODE(pblendw, LEG(66, 0F3A, 0, 0x0e), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
 OPCODE(vpblendw, VEX(128, 66, 0F3A, IG, 0x0e), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
+OPCODE(vpblendw, VEX(256, 66, 0F3A, IG, 0x0e), AVX2, WRRR, Vqq, Hqq, Wqq, Ib)
+OPCODE(vpblendd, VEX(128, 66, 0F3A, 0, 0x02), AVX2, WRRR, Vdq, Hdq, Wdq, Ib)
+OPCODE(vpblendd, VEX(256, 66, 0F3A, 0, 0x02), AVX2, WRRR, Vqq, Hqq, Wqq, Ib)
 OPCODE(insertps, LEG(66, 0F3A, 0, 0x21), SSE4_1, WRRR, Vdq, Vdq, Wd, Ib)
 OPCODE(vinsertps, VEX(128, 66, 0F3A, IG, 0x21), AVX, WRRR, Vdq, Hdq, Wd, Ib)
 OPCODE(pinsrb, LEG(66, 0F3A, 0, 0x20), SSE4_1, WRRR, Vdq, Vdq, RdMb, Ib)
@@ -1451,6 +1732,7 @@ OPCODE(vpinsrd, VEX(128, 66, 0F3A, 0, 0x22), AVX, WRRR, Vdq, Hdq, Ed, Ib)
 OPCODE(pinsrq, LEG(66, 0F3A, 1, 0x22), SSE4_1, WRRR, Vdq, Vdq, Eq, Ib)
 OPCODE(vpinsrq, VEX(128, 66, 0F3A, 1, 0x22), AVX, WRRR, Vdq, Hdq, Eq, Ib)
 OPCODE(vinsertf128, VEX(256, 66, 0F3A, 0, 0x18), AVX, WRRR, Vqq, Hqq, Wdq, Ib)
+OPCODE(vinserti128, VEX(256, 66, 0F3A, 0, 0x38), AVX2, WRRR, Vqq, Hqq, Wdq, Ib)
 OPCODE(extractps, LEG(66, 0F3A, 0, 0x17), SSE4_1, WRR, Ed, Vdq, Ib)
 OPCODE(vextractps, VEX(128, 66, 0F3A, IG, 0x17), AVX, WRR, Ed, Vdq, Ib)
 OPCODE(pextrb, LEG(66, 0F3A, 0, 0x14), SSE4_1, WRR, RdMb, Vdq, Ib)
@@ -1468,11 +1750,24 @@ OPCODE(pextrw, LEG(66, 0F, 1, 0xc5), SSE2, WRR, Gq, Udq, Ib)
 OPCODE(vpextrw, VEX(128, 66, 0F, 0, 0xc5), AVX, WRR, Gd, Udq, Ib)
 OPCODE(vpextrw, VEX(128, 66, 0F, 1, 0xc5), AVX, WRR, Gq, Udq, Ib)
 OPCODE(vextractf128, VEX(256, 66, 0F3A, 0, 0x19), AVX, WRR, Wdq, Vqq, Ib)
-OPCODE(vbroadcastss, VEX(128, 66, 0F38, 0, 0x18), AVX, WR, Vdq, Md)
-OPCODE(vbroadcastss, VEX(256, 66, 0F38, 0, 0x18), AVX, WR, Vqq, Md)
-OPCODE(vbroadcastsd, VEX(256, 66, 0F38, 0, 0x19), AVX, WR, Vqq, Mq)
+OPCODE(vextracti128, VEX(256, 66, 0F3A, 0, 0x39), AVX2, WRR, Wdq, Vqq, Ib)
+OPCODE(vpbroadcastb, VEX(128, 66, 0F38, 0, 0x78), AVX2, WR, Vdq, Wb)
+OPCODE(vpbroadcastb, VEX(256, 66, 0F38, 0, 0x78), AVX2, WR, Vqq, Wb)
+OPCODE(vpbroadcastw, VEX(128, 66, 0F38, 0, 0x79), AVX2, WR, Vdq, Ww)
+OPCODE(vpbroadcastw, VEX(256, 66, 0F38, 0, 0x79), AVX2, WR, Vqq, Ww)
+OPCODE(vpbroadcastd, VEX(128, 66, 0F38, 0, 0x58), AVX2, WR, Vdq, Wd)
+OPCODE(vpbroadcastd, VEX(256, 66, 0F38, 0, 0x58), AVX2, WR, Vqq, Wd)
+OPCODE(vpbroadcastq, VEX(128, 66, 0F38, 0, 0x59), AVX2, WR, Vdq, Wq)
+OPCODE(vpbroadcastq, VEX(256, 66, 0F38, 0, 0x59), AVX2, WR, Vqq, Wq)
+OPCODE(vbroadcastss, VEX(128, 66, 0F38, 0, 0x18), AVX, WRR, Vdq, Wd, modrm_mod)
+OPCODE(vbroadcastss, VEX(256, 66, 0F38, 0, 0x18), AVX, WRR, Vqq, Wd, modrm_mod)
+OPCODE(vbroadcastsd, VEX(256, 66, 0F38, 0, 0x19), AVX, WRR, Vqq, Wq, modrm_mod)
 OPCODE(vbroadcastf128, VEX(256, 66, 0F38, 0, 0x1a), AVX, WR, Vqq, Mdq)
+OPCODE(vbroadcasti128, VEX(256, 66, 0F38, 0, 0x5a), AVX2, WR, Vqq, Mdq)
 OPCODE(vperm2f128, VEX(256, 66, 0F3A, 0, 0x06), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
+OPCODE(vperm2i128, VEX(256, 66, 0F3A, 0, 0x46), AVX2, WRRR, Vqq, Hqq, Wqq, Ib)
+OPCODE(vpermd, VEX(256, 66, 0F38, 0, 0x36), AVX2, WRR, Vqq, Hqq, Wqq)
+OPCODE(vpermps, VEX(256, 66, 0F38, 0, 0x16), AVX2, WRR, Vqq, Hqq, Wqq)
 OPCODE(vpermilps, VEX(128, 66, 0F38, 0, 0x0c), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vpermilps, VEX(256, 66, 0F38, 0, 0x0c), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(vpermilps, VEX(128, 66, 0F3A, 0, 0x04), AVX, WRR, Vdq, Wdq, Ib)
@@ -1481,30 +1776,60 @@ OPCODE(vpermilpd, VEX(128, 66, 0F38, 0, 0x0d), AVX, WRR, Vdq, Hdq, Wdq)
 OPCODE(vpermilpd, VEX(256, 66, 0F38, 0, 0x0d), AVX, WRR, Vqq, Hqq, Wqq)
 OPCODE(vpermilpd, VEX(128, 66, 0F3A, 0, 0x05), AVX, WRR, Vdq, Wdq, Ib)
 OPCODE(vpermilpd, VEX(256, 66, 0F3A, 0, 0x05), AVX, WRR, Vqq, Wqq, Ib)
+OPCODE(vpermq, VEX(256, 66, 0F3A, 1, 0x00), AVX2, WRR, Vqq, Wqq, Ib)
+OPCODE(vpermpd, VEX(256, 66, 0F3A, 1, 0x01), AVX2, WRR, Vqq, Wqq, Ib)
+OPCODE(vgatherdps, VEX(128, 66, 0F38, 0, 0x92), AVX2, WWRRR, Vdq, Hdq, Vdq, MDdq, Hdq)
+OPCODE(vgatherdps, VEX(256, 66, 0F38, 0, 0x92), AVX2, WWRRR, Vqq, Hqq, Vqq, MDqq, Hqq)
+OPCODE(vgatherdpd, VEX(128, 66, 0F38, 1, 0x92), AVX2, WWRRR, Vdq, Hdq, Vdq, MDdq, Hdq)
+OPCODE(vgatherdpd, VEX(256, 66, 0F38, 1, 0x92), AVX2, WWRRR, Vqq, Hqq, Vqq, MDdq, Hqq)
+OPCODE(vgatherqps, VEX(128, 66, 0F38, 0, 0x93), AVX2, WWRRR, Vdq, Hdq, Vdq, MQdq, Hdq)
+OPCODE(vgatherqps, VEX(256, 66, 0F38, 0, 0x93), AVX2, WWRRR, Vdq, Hdq, Vdq, MQqq, Hdq)
+OPCODE(vgatherqpd, VEX(128, 66, 0F38, 1, 0x93), AVX2, WWRRR, Vdq, Hdq, Vdq, MQdq, Hdq)
+OPCODE(vgatherqpd, VEX(256, 66, 0F38, 1, 0x93), AVX2, WWRRR, Vqq, Hqq, Vqq, MQqq, Hqq)
+OPCODE(vpgatherdd, VEX(128, 66, 0F38, 0, 0x90), AVX2, WWRRR, Vdq, Hdq, Vdq, MDdq, Hdq)
+OPCODE(vpgatherdd, VEX(256, 66, 0F38, 0, 0x90), AVX2, WWRRR, Vqq, Hqq, Vqq, MDqq, Hqq)
+OPCODE(vpgatherdq, VEX(128, 66, 0F38, 1, 0x90), AVX2, WWRRR, Vdq, Hdq, Vdq, MDdq, Hdq)
+OPCODE(vpgatherdq, VEX(256, 66, 0F38, 1, 0x90), AVX2, WWRRR, Vqq, Hqq, Vqq, MDdq, Hqq)
+OPCODE(vpgatherqd, VEX(128, 66, 0F38, 0, 0x91), AVX2, WWRRR, Vdq, Hdq, Vdq, MQdq, Hdq)
+OPCODE(vpgatherqd, VEX(256, 66, 0F38, 0, 0x91), AVX2, WWRRR, Vdq, Hdq, Vdq, MQqq, Hdq)
+OPCODE(vpgatherqq, VEX(128, 66, 0F38, 1, 0x91), AVX2, WWRRR, Vdq, Hdq, Vdq, MQdq, Hdq)
+OPCODE(vpgatherqq, VEX(256, 66, 0F38, 1, 0x91), AVX2, WWRRR, Vqq, Hqq, Vqq, MQqq, Hqq)
 OPCODE(pmovsxbw, LEG(66, 0F38, 0, 0x20), SSE4_1, WR, Vdq, Wq)
 OPCODE(vpmovsxbw, VEX(128, 66, 0F38, IG, 0x20), AVX, WR, Vdq, Wq)
+OPCODE(vpmovsxbw, VEX(256, 66, 0F38, IG, 0x20), AVX2, WR, Vqq, Wdq)
 OPCODE(pmovsxbd, LEG(66, 0F38, 0, 0x21), SSE4_1, WR, Vdq, Wd)
 OPCODE(vpmovsxbd, VEX(128, 66, 0F38, IG, 0x21), AVX, WR, Vdq, Wd)
+OPCODE(vpmovsxbd, VEX(256, 66, 0F38, IG, 0x21), AVX2, WR, Vqq, Wq)
 OPCODE(pmovsxbq, LEG(66, 0F38, 0, 0x22), SSE4_1, WR, Vdq, Ww)
 OPCODE(vpmovsxbq, VEX(128, 66, 0F38, IG, 0x22), AVX, WR, Vdq, Ww)
+OPCODE(vpmovsxbq, VEX(256, 66, 0F38, IG, 0x22), AVX2, WR, Vqq, Wd)
 OPCODE(pmovsxwd, LEG(66, 0F38, 0, 0x23), SSE4_1, WR, Vdq, Wq)
 OPCODE(vpmovsxwd, VEX(128, 66, 0F38, IG, 0x23), AVX, WR, Vdq, Wq)
+OPCODE(vpmovsxwd, VEX(256, 66, 0F38, IG, 0x23), AVX2, WR, Vqq, Wdq)
 OPCODE(pmovsxwq, LEG(66, 0F38, 0, 0x24), SSE4_1, WR, Vdq, Wd)
 OPCODE(vpmovsxwq, VEX(128, 66, 0F38, IG, 0x24), AVX, WR, Vdq, Wd)
+OPCODE(vpmovsxwq, VEX(256, 66, 0F38, IG, 0x24), AVX2, WR, Vqq, Wq)
 OPCODE(pmovsxdq, LEG(66, 0F38, 0, 0x25), SSE4_1, WR, Vdq, Wq)
 OPCODE(vpmovsxdq, VEX(128, 66, 0F38, IG, 0x25), AVX, WR, Vdq, Wq)
+OPCODE(vpmovsxdq, VEX(256, 66, 0F38, IG, 0x25), AVX2, WR, Vqq, Wdq)
 OPCODE(pmovzxbw, LEG(66, 0F38, 0, 0x30), SSE4_1, WR, Vdq, Wq)
 OPCODE(vpmovzxbw, VEX(128, 66, 0F38, IG, 0x30), AVX, WR, Vdq, Wq)
+OPCODE(vpmovzxbw, VEX(256, 66, 0F38, IG, 0x30), AVX2, WR, Vqq, Wdq)
 OPCODE(pmovzxbd, LEG(66, 0F38, 0, 0x31), SSE4_1, WR, Vdq, Wd)
 OPCODE(vpmovzxbd, VEX(128, 66, 0F38, IG, 0x31), AVX, WR, Vdq, Wd)
+OPCODE(vpmovzxbd, VEX(256, 66, 0F38, IG, 0x31), AVX2, WR, Vqq, Wq)
 OPCODE(pmovzxbq, LEG(66, 0F38, 0, 0x32), SSE4_1, WR, Vdq, Ww)
 OPCODE(vpmovzxbq, VEX(128, 66, 0F38, IG, 0x32), AVX, WR, Vdq, Ww)
+OPCODE(vpmovzxbq, VEX(256, 66, 0F38, IG, 0x32), AVX2, WR, Vqq, Wd)
 OPCODE(pmovzxwd, LEG(66, 0F38, 0, 0x33), SSE4_1, WR, Vdq, Wq)
 OPCODE(vpmovzxwd, VEX(128, 66, 0F38, IG, 0x33), AVX, WR, Vdq, Wq)
+OPCODE(vpmovzxwd, VEX(256, 66, 0F38, IG, 0x33), AVX2, WR, Vqq, Wdq)
 OPCODE(pmovzxwq, LEG(66, 0F38, 0, 0x34), SSE4_1, WR, Vdq, Wd)
 OPCODE(vpmovzxwq, VEX(128, 66, 0F38, IG, 0x34), AVX, WR, Vdq, Wd)
+OPCODE(vpmovzxwq, VEX(256, 66, 0F38, IG, 0x34), AVX2, WR, Vqq, Wq)
 OPCODE(pmovzxdq, LEG(66, 0F38, 0, 0x35), SSE4_1, WR, Vdq, Wq)
 OPCODE(vpmovzxdq, VEX(128, 66, 0F38, IG, 0x35), AVX, WR, Vdq, Wq)
+OPCODE(vpmovzxdq, VEX(256, 66, 0F38, IG, 0x35), AVX2, WR, Vqq, Wdq)
 OPCODE(cvtpi2ps, LEG(NP, 0F, 0, 0x2a), SSE, WR, Vdq, Qq)
 OPCODE(cvtsi2ss, LEG(F3, 0F, 0, 0x2a), SSE, WR, Vd, Ed)
 OPCODE(cvtsi2ss, LEG(F3, 0F, 1, 0x2a), SSE, WR, Vd, Eq)
@@ -1574,6 +1899,14 @@ OPCODE(vmaskmovpd, VEX(128, 66, 0F38, 0, 0x2d), AVX, WRR, Vdq, Hdq, Mdq)
 OPCODE(vmaskmovpd, VEX(128, 66, 0F38, 0, 0x2f), AVX, WRR, Mdq, Hdq, Vdq)
 OPCODE(vmaskmovpd, VEX(256, 66, 0F38, 0, 0x2d), AVX, WRR, Vqq, Hqq, Mqq)
 OPCODE(vmaskmovpd, VEX(256, 66, 0F38, 0, 0x2f), AVX, WRR, Mqq, Hqq, Vqq)
+OPCODE(vpmaskmovd, VEX(128, 66, 0F38, 0, 0x8c), AVX2, WRR, Vdq, Hdq, Mdq)
+OPCODE(vpmaskmovd, VEX(128, 66, 0F38, 0, 0x8e), AVX2, WRR, Mdq, Hdq, Vdq)
+OPCODE(vpmaskmovd, VEX(256, 66, 0F38, 0, 0x8c), AVX2, WRR, Vqq, Hqq, Mqq)
+OPCODE(vpmaskmovd, VEX(256, 66, 0F38, 0, 0x8e), AVX2, WRR, Mqq, Hqq, Vqq)
+OPCODE(vpmaskmovq, VEX(128, 66, 0F38, 1, 0x8c), AVX2, WRR, Vdq, Hdq, Mdq)
+OPCODE(vpmaskmovq, VEX(128, 66, 0F38, 1, 0x8e), AVX2, WRR, Mdq, Hdq, Vdq)
+OPCODE(vpmaskmovq, VEX(256, 66, 0F38, 1, 0x8c), AVX2, WRR, Vqq, Hqq, Mqq)
+OPCODE(vpmaskmovq, VEX(256, 66, 0F38, 1, 0x8e), AVX2, WRR, Mqq, Hqq, Vqq)
 OPCODE(movntps, LEG(NP, 0F, 0, 0x2b), SSE, WR, Mdq, Vdq)
 OPCODE(vmovntps, VEX(128, NP, 0F, IG, 0x2b), AVX, WR, Mdq, Vdq)
 OPCODE(vmovntps, VEX(256, NP, 0F, IG, 0x2b), AVX, WR, Mqq, Vqq)
@@ -1588,6 +1921,7 @@ OPCODE(vmovntdq, VEX(128, 66, 0F, IG, 0xe7), AVX, WR, Mdq, Vdq)
 OPCODE(vmovntdq, VEX(256, 66, 0F, IG, 0xe7), AVX, WR, Mqq, Vqq)
 OPCODE(movntdqa, LEG(66, 0F38, 0, 0x2a), SSE4_1, WR, Vdq, Mdq)
 OPCODE(vmovntdqa, VEX(128, 66, 0F38, IG, 0x2a), AVX, WR, Vdq, Mdq)
+OPCODE(vmovntdqa, VEX(256, 66, 0F38, IG, 0x2a), AVX2, WR, Vqq, Mqq)
 OPCODE(pause, LEG(F3, NA, 0, 0x90), SSE2, )
 OPCODE(emms, LEG(NP, 0F, 0, 0x77), MMX, )
 OPCODE(vzeroupper, VEX(128, NP, 0F, IG, 0x77), AVX, )
@@ -1614,6 +1948,13 @@ OPCODE_GRP_BEGIN(grp12_VEX_128_66)
     OPCODE_GRPMEMB(grp12_VEX_128_66, vpsraw, 4, AVX, WRR, Hdq, Udq, Ib)
 OPCODE_GRP_END(grp12_VEX_128_66)
 
+OPCODE_GRP(grp12_VEX_256_66, VEX(256, 66, 0F, IG, 0x71))
+OPCODE_GRP_BEGIN(grp12_VEX_256_66)
+    OPCODE_GRPMEMB(grp12_VEX_256_66, vpsllw, 6, AVX2, WRR, Hqq, Uqq, Ib)
+    OPCODE_GRPMEMB(grp12_VEX_256_66, vpsrlw, 2, AVX2, WRR, Hqq, Uqq, Ib)
+    OPCODE_GRPMEMB(grp12_VEX_256_66, vpsraw, 4, AVX2, WRR, Hqq, Uqq, Ib)
+OPCODE_GRP_END(grp12_VEX_256_66)
+
 OPCODE_GRP(grp13_LEG_66, LEG(66, 0F, 0, 0x72))
 OPCODE_GRP_BEGIN(grp13_LEG_66)
     OPCODE_GRPMEMB(grp13_LEG_66, pslld, 6, SSE2, WRR, Udq, Udq, Ib)
@@ -1635,6 +1976,13 @@ OPCODE_GRP_BEGIN(grp13_VEX_128_66)
     OPCODE_GRPMEMB(grp13_VEX_128_66, vpsrad, 4, AVX, WRR, Hdq, Udq, Ib)
 OPCODE_GRP_END(grp13_VEX_128_66)
 
+OPCODE_GRP(grp13_VEX_256_66, VEX(256, 66, 0F, IG, 0x72))
+OPCODE_GRP_BEGIN(grp13_VEX_256_66)
+    OPCODE_GRPMEMB(grp13_VEX_256_66, vpslld, 6, AVX2, WRR, Hqq, Uqq, Ib)
+    OPCODE_GRPMEMB(grp13_VEX_256_66, vpsrld, 2, AVX2, WRR, Hqq, Uqq, Ib)
+    OPCODE_GRPMEMB(grp13_VEX_256_66, vpsrad, 4, AVX2, WRR, Hqq, Uqq, Ib)
+OPCODE_GRP_END(grp13_VEX_256_66)
+
 OPCODE_GRP(grp14_LEG_66, LEG(66, 0F, 0, 0x73))
 OPCODE_GRP_BEGIN(grp14_LEG_66)
     OPCODE_GRPMEMB(grp14_LEG_66, psllq, 6, SSE2, WRR, Udq, Udq, Ib)
@@ -1657,6 +2005,14 @@ OPCODE_GRP_BEGIN(grp14_VEX_128_66)
     OPCODE_GRPMEMB(grp14_VEX_128_66, vpsrldq, 3, AVX, WRR, Hdq, Udq, Ib)
 OPCODE_GRP_END(grp14_VEX_128_66)
 
+OPCODE_GRP(grp14_VEX_256_66, VEX(256, 66, 0F, IG, 0x73))
+OPCODE_GRP_BEGIN(grp14_VEX_256_66)
+    OPCODE_GRPMEMB(grp14_VEX_256_66, vpsllq, 6, AVX2, WRR, Hqq, Uqq, Ib)
+    OPCODE_GRPMEMB(grp14_VEX_256_66, vpslldq, 7, AVX2, WRR, Hqq, Uqq, Ib)
+    OPCODE_GRPMEMB(grp14_VEX_256_66, vpsrlq, 2, AVX2, WRR, Hqq, Uqq, Ib)
+    OPCODE_GRPMEMB(grp14_VEX_256_66, vpsrldq, 3, AVX2, WRR, Hqq, Uqq, Ib)
+OPCODE_GRP_END(grp14_VEX_256_66)
+
 OPCODE_GRP(grp15_LEG_NP, LEG(NP, 0F, 0, 0xae))
 OPCODE_GRP_BEGIN(grp15_LEG_NP)
     OPCODE_GRPMEMB(grp15_LEG_NP, sfence_clflush, 7, SSE, RR, modrm_mod, modrm)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 65/75] target/i386: remove obsoleted helpers
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (62 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 64/75] target/i386: introduce AVX2 vector instructions to sse-opcode.inc.h Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 66/75] target/i386: cleanup leftovers in ops_sse_header.h Jan Bobek
                   ` (9 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

A number of helpers have been obsoleted by the use of tcg_gen_gvec_*
functions; remove all of them.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/ops_sse.h        | 65 ------------------------------------
 target/i386/ops_sse_header.h | 39 ----------------------
 target/i386/translate.c      | 38 ---------------------
 3 files changed, 142 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index ec1ec745d0..aca6b50f23 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -337,32 +337,6 @@ static inline int satsw(int x)
     }
 }
 
-#define FADD(a, b) ((a) + (b))
-#define FADDUB(a, b) satub((a) + (b))
-#define FADDUW(a, b) satuw((a) + (b))
-#define FADDSB(a, b) satsb((int8_t)(a) + (int8_t)(b))
-#define FADDSW(a, b) satsw((int16_t)(a) + (int16_t)(b))
-
-#define FSUB(a, b) ((a) - (b))
-#define FSUBUB(a, b) satub((a) - (b))
-#define FSUBUW(a, b) satuw((a) - (b))
-#define FSUBSB(a, b) satsb((int8_t)(a) - (int8_t)(b))
-#define FSUBSW(a, b) satsw((int16_t)(a) - (int16_t)(b))
-#define FMINUB(a, b) ((a) < (b)) ? (a) : (b)
-#define FMINSW(a, b) ((int16_t)(a) < (int16_t)(b)) ? (a) : (b)
-#define FMAXUB(a, b) ((a) > (b)) ? (a) : (b)
-#define FMAXSW(a, b) ((int16_t)(a) > (int16_t)(b)) ? (a) : (b)
-
-#define FAND(a, b) ((a) & (b))
-#define FANDN(a, b) ((~(a)) & (b))
-#define FOR(a, b) ((a) | (b))
-#define FXOR(a, b) ((a) ^ (b))
-
-#define FCMPGTB(a, b) ((int8_t)(a) > (int8_t)(b) ? -1 : 0)
-#define FCMPGTW(a, b) ((int16_t)(a) > (int16_t)(b) ? -1 : 0)
-#define FCMPGTL(a, b) ((int32_t)(a) > (int32_t)(b) ? -1 : 0)
-#define FCMPEQ(a, b) ((a) == (b) ? -1 : 0)
-
 #define FMULLW(a, b) ((a) * (b))
 #define FMULHRW(a, b) (((int16_t)(a) * (int16_t)(b) + 0x8000) >> 16)
 #define FMULHUW(a, b) ((a) * (b) >> 16)
@@ -371,45 +345,6 @@ static inline int satsw(int x)
 #define FAVG(a, b) (((a) + (b) + 1) >> 1)
 #endif
 
-SSE_HELPER_B(helper_paddb, FADD)
-SSE_HELPER_W(helper_paddw, FADD)
-SSE_HELPER_L(helper_paddl, FADD)
-SSE_HELPER_Q(helper_paddq, FADD)
-
-SSE_HELPER_B(helper_psubb, FSUB)
-SSE_HELPER_W(helper_psubw, FSUB)
-SSE_HELPER_L(helper_psubl, FSUB)
-SSE_HELPER_Q(helper_psubq, FSUB)
-
-SSE_HELPER_B(helper_paddusb, FADDUB)
-SSE_HELPER_B(helper_paddsb, FADDSB)
-SSE_HELPER_B(helper_psubusb, FSUBUB)
-SSE_HELPER_B(helper_psubsb, FSUBSB)
-
-SSE_HELPER_W(helper_paddusw, FADDUW)
-SSE_HELPER_W(helper_paddsw, FADDSW)
-SSE_HELPER_W(helper_psubusw, FSUBUW)
-SSE_HELPER_W(helper_psubsw, FSUBSW)
-
-SSE_HELPER_B(helper_pminub, FMINUB)
-SSE_HELPER_B(helper_pmaxub, FMAXUB)
-
-SSE_HELPER_W(helper_pminsw, FMINSW)
-SSE_HELPER_W(helper_pmaxsw, FMAXSW)
-
-SSE_HELPER_Q(helper_pand, FAND)
-SSE_HELPER_Q(helper_pandn, FANDN)
-SSE_HELPER_Q(helper_por, FOR)
-SSE_HELPER_Q(helper_pxor, FXOR)
-
-SSE_HELPER_B(helper_pcmpgtb, FCMPGTB)
-SSE_HELPER_W(helper_pcmpgtw, FCMPGTW)
-SSE_HELPER_L(helper_pcmpgtl, FCMPGTL)
-
-SSE_HELPER_B(helper_pcmpeqb, FCMPEQ)
-SSE_HELPER_W(helper_pcmpeqw, FCMPEQ)
-SSE_HELPER_L(helper_pcmpeql, FCMPEQ)
-
 SSE_HELPER_W(helper_pmullw, FMULLW)
 #if SHIFT == 0
 SSE_HELPER_W(helper_pmulhrw, FMULHRW)
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index 094aafc573..d8e33dff6b 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -60,45 +60,6 @@ DEF_HELPER_3(glue(pslldq, SUFFIX), void, env, Reg, Reg)
 #define SSE_HELPER_Q(name, F)\
     DEF_HELPER_3(glue(name, SUFFIX), void, env, Reg, Reg)
 
-SSE_HELPER_B(paddb, FADD)
-SSE_HELPER_W(paddw, FADD)
-SSE_HELPER_L(paddl, FADD)
-SSE_HELPER_Q(paddq, FADD)
-
-SSE_HELPER_B(psubb, FSUB)
-SSE_HELPER_W(psubw, FSUB)
-SSE_HELPER_L(psubl, FSUB)
-SSE_HELPER_Q(psubq, FSUB)
-
-SSE_HELPER_B(paddusb, FADDUB)
-SSE_HELPER_B(paddsb, FADDSB)
-SSE_HELPER_B(psubusb, FSUBUB)
-SSE_HELPER_B(psubsb, FSUBSB)
-
-SSE_HELPER_W(paddusw, FADDUW)
-SSE_HELPER_W(paddsw, FADDSW)
-SSE_HELPER_W(psubusw, FSUBUW)
-SSE_HELPER_W(psubsw, FSUBSW)
-
-SSE_HELPER_B(pminub, FMINUB)
-SSE_HELPER_B(pmaxub, FMAXUB)
-
-SSE_HELPER_W(pminsw, FMINSW)
-SSE_HELPER_W(pmaxsw, FMAXSW)
-
-SSE_HELPER_Q(pand, FAND)
-SSE_HELPER_Q(pandn, FANDN)
-SSE_HELPER_Q(por, FOR)
-SSE_HELPER_Q(pxor, FXOR)
-
-SSE_HELPER_B(pcmpgtb, FCMPGTB)
-SSE_HELPER_W(pcmpgtw, FCMPGTW)
-SSE_HELPER_L(pcmpgtl, FCMPGTL)
-
-SSE_HELPER_B(pcmpeqb, FCMPEQ)
-SSE_HELPER_W(pcmpeqw, FCMPEQ)
-SSE_HELPER_L(pcmpeql, FCMPEQ)
-
 SSE_HELPER_W(pmullw, FMULLW)
 #if SHIFT == 0
 SSE_HELPER_W(pmulhrw, FMULHRW)
diff --git a/target/i386/translate.c b/target/i386/translate.c
index 3149989d68..78c91a85c9 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -2756,19 +2756,11 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
     [0x51] = SSE_FOP(sqrt),
     [0x52] = { gen_helper_rsqrtps, NULL, gen_helper_rsqrtss, NULL },
     [0x53] = { gen_helper_rcpps, NULL, gen_helper_rcpss, NULL },
-    [0x54] = { gen_helper_pand_xmm, gen_helper_pand_xmm }, /* andps, andpd */
-    [0x55] = { gen_helper_pandn_xmm, gen_helper_pandn_xmm }, /* andnps, andnpd */
-    [0x56] = { gen_helper_por_xmm, gen_helper_por_xmm }, /* orps, orpd */
-    [0x57] = { gen_helper_pxor_xmm, gen_helper_pxor_xmm }, /* xorps, xorpd */
     [0x58] = SSE_FOP(add),
     [0x59] = SSE_FOP(mul),
     [0x5a] = { gen_helper_cvtps2pd, gen_helper_cvtpd2ps,
                gen_helper_cvtss2sd, gen_helper_cvtsd2ss },
     [0x5b] = { gen_helper_cvtdq2ps, gen_helper_cvtps2dq, gen_helper_cvttps2dq },
-    [0x5c] = SSE_FOP(sub),
-    [0x5d] = SSE_FOP(min),
-    [0x5e] = SSE_FOP(div),
-    [0x5f] = SSE_FOP(max),
 
     [0xc2] = SSE_FOP(cmpeq),
     [0xc6] = { (SSEFunc_0_epp)gen_helper_shufps,
@@ -2783,9 +2775,6 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
     [0x61] = MMX_OP2(punpcklwd),
     [0x62] = MMX_OP2(punpckldq),
     [0x63] = MMX_OP2(packsswb),
-    [0x64] = MMX_OP2(pcmpgtb),
-    [0x65] = MMX_OP2(pcmpgtw),
-    [0x66] = MMX_OP2(pcmpgtl),
     [0x67] = MMX_OP2(packuswb),
     [0x68] = MMX_OP2(punpckhbw),
     [0x69] = MMX_OP2(punpckhwd),
@@ -2802,9 +2791,6 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
     [0x71] = { SSE_SPECIAL, SSE_SPECIAL }, /* shiftw */
     [0x72] = { SSE_SPECIAL, SSE_SPECIAL }, /* shiftd */
     [0x73] = { SSE_SPECIAL, SSE_SPECIAL }, /* shiftq */
-    [0x74] = MMX_OP2(pcmpeqb),
-    [0x75] = MMX_OP2(pcmpeqw),
-    [0x76] = MMX_OP2(pcmpeql),
     [0x77] = { SSE_DUMMY }, /* emms */
     [0x78] = { NULL, SSE_SPECIAL, NULL, SSE_SPECIAL }, /* extrq_i, insertq_i */
     [0x79] = { NULL, gen_helper_extrq_r, NULL, gen_helper_insertq_r },
@@ -2818,18 +2804,9 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
     [0xd1] = MMX_OP2(psrlw),
     [0xd2] = MMX_OP2(psrld),
     [0xd3] = MMX_OP2(psrlq),
-    [0xd4] = MMX_OP2(paddq),
     [0xd5] = MMX_OP2(pmullw),
     [0xd6] = { NULL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL },
     [0xd7] = { SSE_SPECIAL, SSE_SPECIAL }, /* pmovmskb */
-    [0xd8] = MMX_OP2(psubusb),
-    [0xd9] = MMX_OP2(psubusw),
-    [0xda] = MMX_OP2(pminub),
-    [0xdb] = MMX_OP2(pand),
-    [0xdc] = MMX_OP2(paddusb),
-    [0xdd] = MMX_OP2(paddusw),
-    [0xde] = MMX_OP2(pmaxub),
-    [0xdf] = MMX_OP2(pandn),
     [0xe0] = MMX_OP2(pavgb),
     [0xe1] = MMX_OP2(psraw),
     [0xe2] = MMX_OP2(psrad),
@@ -2838,14 +2815,6 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
     [0xe5] = MMX_OP2(pmulhw),
     [0xe6] = { NULL, gen_helper_cvttpd2dq, gen_helper_cvtdq2pd, gen_helper_cvtpd2dq },
     [0xe7] = { SSE_SPECIAL , SSE_SPECIAL },  /* movntq, movntq */
-    [0xe8] = MMX_OP2(psubsb),
-    [0xe9] = MMX_OP2(psubsw),
-    [0xea] = MMX_OP2(pminsw),
-    [0xeb] = MMX_OP2(por),
-    [0xec] = MMX_OP2(paddsb),
-    [0xed] = MMX_OP2(paddsw),
-    [0xee] = MMX_OP2(pmaxsw),
-    [0xef] = MMX_OP2(pxor),
     [0xf0] = { NULL, NULL, NULL, SSE_SPECIAL }, /* lddqu */
     [0xf1] = MMX_OP2(psllw),
     [0xf2] = MMX_OP2(pslld),
@@ -2855,13 +2824,6 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
     [0xf6] = MMX_OP2(psadbw),
     [0xf7] = { (SSEFunc_0_epp)gen_helper_maskmov_mmx,
                (SSEFunc_0_epp)gen_helper_maskmov_xmm }, /* XXX: casts */
-    [0xf8] = MMX_OP2(psubb),
-    [0xf9] = MMX_OP2(psubw),
-    [0xfa] = MMX_OP2(psubl),
-    [0xfb] = MMX_OP2(psubq),
-    [0xfc] = MMX_OP2(paddb),
-    [0xfd] = MMX_OP2(paddw),
-    [0xfe] = MMX_OP2(paddl),
 };
 
 static const SSEFunc_0_epp sse_op_table2[3 * 8][2] = {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 66/75] target/i386: cleanup leftovers in ops_sse_header.h
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (63 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 65/75] target/i386: remove obsoleted helpers Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 67/75] target/i386: introduce aliases for helper-based tcg_gen_gvec_* functions Jan Bobek
                   ` (8 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Get rid of unused macro definitions that have been left over after
removal of obsoleted helpers.
---
 target/i386/ops_sse_header.h | 28 ++++++----------------------
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index d8e33dff6b..afa0ad0938 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -48,27 +48,15 @@ DEF_HELPER_3(glue(psrldq, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_3(glue(pslldq, SUFFIX), void, env, Reg, Reg)
 #endif
 
-#define SSE_HELPER_B(name, F)\
-    DEF_HELPER_3(glue(name, SUFFIX), void, env, Reg, Reg)
-
-#define SSE_HELPER_W(name, F)\
-    DEF_HELPER_3(glue(name, SUFFIX), void, env, Reg, Reg)
-
-#define SSE_HELPER_L(name, F)\
-    DEF_HELPER_3(glue(name, SUFFIX), void, env, Reg, Reg)
-
-#define SSE_HELPER_Q(name, F)\
-    DEF_HELPER_3(glue(name, SUFFIX), void, env, Reg, Reg)
-
-SSE_HELPER_W(pmullw, FMULLW)
+DEF_HELPER_3(glue(pmullw, SUFFIX), void, env, Reg, Reg)
 #if SHIFT == 0
-SSE_HELPER_W(pmulhrw, FMULHRW)
+DEF_HELPER_3(glue(pmulhrw, SUFFIX), void, env, Reg, Reg)
 #endif
-SSE_HELPER_W(pmulhuw, FMULHUW)
-SSE_HELPER_W(pmulhw, FMULHW)
+DEF_HELPER_3(glue(pmulhuw, SUFFIX), void, env, Reg, Reg)
+DEF_HELPER_3(glue(pmulhw, SUFFIX), void, env, Reg, Reg)
 
-SSE_HELPER_B(pavgb, FAVG)
-SSE_HELPER_W(pavgw, FAVG)
+DEF_HELPER_3(glue(pavgb, SUFFIX), void, env, Reg, Reg)
+DEF_HELPER_3(glue(pavgw, SUFFIX), void, env, Reg, Reg)
 
 DEF_HELPER_3(glue(pmuludq, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_3(glue(pmaddwd, SUFFIX), void, env, Reg, Reg)
@@ -311,10 +299,6 @@ DEF_HELPER_4(glue(pclmulqdq, SUFFIX), void, env, Reg, Reg, i32)
 #undef Reg
 #undef SUFFIX
 
-#undef SSE_HELPER_B
-#undef SSE_HELPER_W
-#undef SSE_HELPER_L
-#undef SSE_HELPER_Q
 #undef SSE_HELPER_S
 #undef SSE_HELPER_CMP
 #undef UNPCK_OP
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 67/75] target/i386: introduce aliases for helper-based tcg_gen_gvec_* functions
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (64 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 66/75] target/i386: cleanup leftovers in ops_sse_header.h Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 68/75] target/i386: convert ps((l, r)l(w, d, q), ra(w, d)) to helpers to gvec style Jan Bobek
                   ` (7 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Expand our aliases for tcg_gen_gvec_* functions to also include those
that generate calls to out-of-line helpers. This allows us use them
via the DEF_GEN_INSN*_GVEC macros.
---
 target/i386/translate.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 78c91a85c9..c7e664e798 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4441,6 +4441,36 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b)
     }
 }
 
+#define gen_gvec_s1_ool(ret, aofs, oprsz, maxsz, data, helper)  \
+    do {                                                        \
+        const TCGv_ptr a0 = tcg_temp_new_ptr();                 \
+        const TCGv_i32 desc =                                   \
+              tcg_const_i32(simd_desc(oprsz, maxsz, 0));        \
+                                                                \
+        tcg_gen_addi_ptr(a0, cpu_env, aofs);                    \
+        gen_helper_ ## helper(ret, a0, desc);                   \
+                                                                \
+        tcg_temp_free_ptr(a0);                                  \
+        tcg_temp_free_i32(desc);                                \
+    } while (0)
+
+/*
+ * We pass the immediate value via simd_data. The width is limited
+ * to SIMD_DATA_BITS, but we only use up to 8-bit immediates.
+ */
+#define gen_gvec_sd1_ool(ret, aofs, oprsz, maxsz, helper)       \
+    gen_gvec_s1_ool(ret, aofs, oprsz, maxsz, 0, helper)
+#define gen_gvec_sq1_ool(ret, aofs, oprsz, maxsz, helper)       \
+    gen_gvec_s1_ool(ret, aofs, oprsz, maxsz, 0, helper)
+#define gen_gvec_2_ool(aofs, bofs, oprsz, maxsz, helper)                \
+    tcg_gen_gvec_2_ool(aofs, bofs, oprsz, maxsz, 0, gen_helper_ ## helper)
+#define gen_gvec_2i_ool(aofs, bofs, c, oprsz, maxsz, helper)            \
+    tcg_gen_gvec_2_ool(aofs, bofs, oprsz, maxsz, c, gen_helper_ ## helper)
+#define gen_gvec_3_ool(aofs, bofs, cofs, oprsz, maxsz, helper)          \
+    tcg_gen_gvec_3_ool(aofs, bofs, cofs, oprsz, maxsz, 0, gen_helper_ ## helper)
+#define gen_gvec_3i_ool(aofs, bofs, cofs, c, oprsz, maxsz, helper)      \
+    tcg_gen_gvec_3_ool(aofs, bofs, cofs, oprsz, maxsz, c, gen_helper_ ## helper)
+
 #define gen_gvec_mov(dofs, aofs, oprsz, maxsz, vece)    \
     tcg_gen_gvec_mov(vece, dofs, aofs, oprsz, maxsz)
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 68/75] target/i386: convert ps((l, r)l(w, d, q), ra(w, d)) to helpers to gvec style
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (65 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 67/75] target/i386: introduce aliases for helper-based tcg_gen_gvec_* functions Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 69/75] target/i386: convert pmullw/pmulhw/pmulhuw " Jan Bobek
                   ` (6 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Make these helpers suitable for use with tcg_gen_gvec_* functions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/ops_sse.h        | 357 +++++++++++++++++++++--------------
 target/i386/ops_sse_header.h |  30 ++-
 target/i386/translate.c      | 259 +++++++------------------
 3 files changed, 306 insertions(+), 340 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index aca6b50f23..168e581c0c 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -19,6 +19,7 @@
  */
 
 #include "crypto/aes.h"
+#include "tcg-gvec-desc.h"
 
 #if SHIFT == 0
 #define Reg MMXReg
@@ -38,199 +39,273 @@
 #define SUFFIX _xmm
 #endif
 
-void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+static inline void glue(clear_high, SUFFIX)(Reg *d, intptr_t oprsz,
+                                            intptr_t maxsz)
 {
-    int shift;
+    intptr_t i;
 
-    if (s->Q(0) > 15) {
-        d->Q(0) = 0;
-#if SHIFT == 1
-        d->Q(1) = 0;
-#endif
-    } else {
-        shift = s->B(0);
-        d->W(0) >>= shift;
-        d->W(1) >>= shift;
-        d->W(2) >>= shift;
-        d->W(3) >>= shift;
-#if SHIFT == 1
-        d->W(4) >>= shift;
-        d->W(5) >>= shift;
-        d->W(6) >>= shift;
-        d->W(7) >>= shift;
-#endif
+    assert(oprsz % sizeof(uint64_t) == 0);
+    assert(maxsz % sizeof(uint64_t) == 0);
+
+    if (oprsz < maxsz) {
+        i = oprsz / sizeof(uint64_t);
+        for (; i * sizeof(uint64_t) < maxsz; ++i) {
+            d->Q(i) = 0;
+        }
     }
 }
 
-void glue(helper_psraw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+void glue(helper_psllw, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
 {
-    int shift;
+    const uint64_t count = b->Q(0);
+    const intptr_t oprsz = count > 15 ? 0 : simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
 
-    if (s->Q(0) > 15) {
-        shift = 15;
-    } else {
-        shift = s->B(0);
+    for (intptr_t i = 0; i * sizeof(uint16_t) < oprsz; ++i) {
+        d->W(i) = a->W(i) << count;
     }
-    d->W(0) = (int16_t)d->W(0) >> shift;
-    d->W(1) = (int16_t)d->W(1) >> shift;
-    d->W(2) = (int16_t)d->W(2) >> shift;
-    d->W(3) = (int16_t)d->W(3) >> shift;
-#if SHIFT == 1
-    d->W(4) = (int16_t)d->W(4) >> shift;
-    d->W(5) = (int16_t)d->W(5) >> shift;
-    d->W(6) = (int16_t)d->W(6) >> shift;
-    d->W(7) = (int16_t)d->W(7) >> shift;
-#endif
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void glue(helper_psllw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+void glue(helper_pslld, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
 {
-    int shift;
+    const uint64_t count = b->Q(0);
+    const intptr_t oprsz = count > 31 ? 0 : simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
 
-    if (s->Q(0) > 15) {
-        d->Q(0) = 0;
-#if SHIFT == 1
-        d->Q(1) = 0;
-#endif
-    } else {
-        shift = s->B(0);
-        d->W(0) <<= shift;
-        d->W(1) <<= shift;
-        d->W(2) <<= shift;
-        d->W(3) <<= shift;
-#if SHIFT == 1
-        d->W(4) <<= shift;
-        d->W(5) <<= shift;
-        d->W(6) <<= shift;
-        d->W(7) <<= shift;
-#endif
+    for (intptr_t i = 0; i * sizeof(uint32_t) < oprsz; ++i) {
+        d->L(i) = a->L(i) << count;
     }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void glue(helper_psrld, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+void glue(helper_psllq, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
 {
-    int shift;
+    const uint64_t count = b->Q(0);
+    const intptr_t oprsz = count > 63 ? 0 : simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
 
-    if (s->Q(0) > 31) {
-        d->Q(0) = 0;
-#if SHIFT == 1
-        d->Q(1) = 0;
-#endif
-    } else {
-        shift = s->B(0);
-        d->L(0) >>= shift;
-        d->L(1) >>= shift;
-#if SHIFT == 1
-        d->L(2) >>= shift;
-        d->L(3) >>= shift;
-#endif
+    for (intptr_t i = 0; i * sizeof(uint64_t) < oprsz; ++i) {
+        d->Q(i) = a->Q(i) << count;
     }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void glue(helper_psrad, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+void glue(helper_psllwi, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
 {
-    int shift;
+    const uint64_t count = simd_data(desc);
+    const intptr_t oprsz = count > 15 ? 0 : simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
 
-    if (s->Q(0) > 31) {
-        shift = 31;
-    } else {
-        shift = s->B(0);
+    for (intptr_t i = 0; i * sizeof(uint16_t) < oprsz; ++i) {
+        d->W(i) = a->W(i) << count;
     }
-    d->L(0) = (int32_t)d->L(0) >> shift;
-    d->L(1) = (int32_t)d->L(1) >> shift;
-#if SHIFT == 1
-    d->L(2) = (int32_t)d->L(2) >> shift;
-    d->L(3) = (int32_t)d->L(3) >> shift;
-#endif
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void glue(helper_pslld, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+void glue(helper_pslldi, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
 {
-    int shift;
+    const uint64_t count = simd_data(desc);
+    const intptr_t oprsz = count > 31 ? 0 : simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
 
-    if (s->Q(0) > 31) {
-        d->Q(0) = 0;
-#if SHIFT == 1
-        d->Q(1) = 0;
-#endif
-    } else {
-        shift = s->B(0);
-        d->L(0) <<= shift;
-        d->L(1) <<= shift;
-#if SHIFT == 1
-        d->L(2) <<= shift;
-        d->L(3) <<= shift;
-#endif
+    for (intptr_t i = 0; i * sizeof(uint32_t) < oprsz; ++i) {
+        d->L(i) = a->L(i) << count;
     }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void glue(helper_psrlq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+void glue(helper_psllqi, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
 {
-    int shift;
+    const uint64_t count = simd_data(desc);
+    const intptr_t oprsz = count > 63 ? 0 : simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
 
-    if (s->Q(0) > 63) {
-        d->Q(0) = 0;
-#if SHIFT == 1
-        d->Q(1) = 0;
-#endif
-    } else {
-        shift = s->B(0);
-        d->Q(0) >>= shift;
-#if SHIFT == 1
-        d->Q(1) >>= shift;
-#endif
+    for (intptr_t i = 0; i * sizeof(uint64_t) < oprsz; ++i) {
+        d->Q(i) = a->Q(i) << count;
     }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void glue(helper_psllq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+void glue(helper_psrlw, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
 {
-    int shift;
+    const uint64_t count = b->Q(0);
+    const intptr_t oprsz = count > 15 ? 0 : simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
 
-    if (s->Q(0) > 63) {
-        d->Q(0) = 0;
-#if SHIFT == 1
-        d->Q(1) = 0;
-#endif
-    } else {
-        shift = s->B(0);
-        d->Q(0) <<= shift;
-#if SHIFT == 1
-        d->Q(1) <<= shift;
-#endif
+    for (intptr_t i = 0; i * sizeof(uint16_t) < oprsz; ++i) {
+        d->W(i) = a->W(i) >> count;
     }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-#if SHIFT == 1
-void glue(helper_psrldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+void glue(helper_psrld, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
+{
+    const uint64_t count = b->Q(0);
+    const intptr_t oprsz = count > 31 ? 0 : simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    for (intptr_t i = 0; i * sizeof(uint32_t) < oprsz; ++i) {
+        d->L(i) = a->L(i) >> count;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
+
+void glue(helper_psrlq, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
+{
+    const uint64_t count = b->Q(0);
+    const intptr_t oprsz = count > 63 ? 0 : simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    for (intptr_t i = 0; i * sizeof(uint64_t) < oprsz; ++i) {
+        d->Q(i) = a->Q(i) >> count;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
+
+void glue(helper_psrlwi, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
+{
+    const uint64_t count = simd_data(desc);
+    const intptr_t oprsz = count > 15 ? 0 : simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    for (intptr_t i = 0; i * sizeof(uint16_t) < oprsz; ++i) {
+        d->W(i) = a->W(i) >> count;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
+
+void glue(helper_psrldi, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
+{
+    const uint64_t count = simd_data(desc);
+    const intptr_t oprsz = count > 31 ? 0 : simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    for (intptr_t i = 0; i * sizeof(uint32_t) < oprsz; ++i) {
+        d->L(i) = a->L(i) >> count;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
+
+void glue(helper_psrlqi, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
 {
-    int shift, i;
+    const uint64_t count = simd_data(desc);
+    const intptr_t oprsz = count > 63 ? 0 : simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
 
-    shift = s->L(0);
-    if (shift > 16) {
-        shift = 16;
+    for (intptr_t i = 0; i * sizeof(uint64_t) < oprsz; ++i) {
+        d->Q(i) = a->Q(i) >> count;
     }
-    for (i = 0; i < 16 - shift; i++) {
-        d->B(i) = d->B(i + shift);
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
+
+void glue(helper_psraw, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
+{
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    uint64_t count = b->Q(0);
+    if (count > 15) {
+        count = 15;
+    }
+
+    for (intptr_t i = 0; i * sizeof(uint16_t) < oprsz; ++i) {
+        d->W(i) = (int16_t)a->W(i) >> count;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
+
+void glue(helper_psrad, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
+{
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    uint64_t count = b->Q(0);
+    if (count > 31) {
+        count = 31;
     }
-    for (i = 16 - shift; i < 16; i++) {
-        d->B(i) = 0;
+
+    for (intptr_t i = 0; i * sizeof(uint32_t) < oprsz; ++i) {
+        d->L(i) = (int32_t)a->L(i) >> count;
     }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void glue(helper_pslldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+void glue(helper_psrawi, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
 {
-    int shift, i;
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
 
-    shift = s->L(0);
-    if (shift > 16) {
-        shift = 16;
+    uint64_t count = simd_data(desc);
+    if (count > 15) {
+        count = 15;
     }
-    for (i = 15; i >= shift; i--) {
-        d->B(i) = d->B(i - shift);
+
+    for (intptr_t i = 0; i * sizeof(uint16_t) < oprsz; ++i) {
+        d->W(i) = (int16_t)a->W(i) >> count;
     }
-    for (i = 0; i < shift; i++) {
-        d->B(i) = 0;
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
+
+void glue(helper_psradi, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
+{
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    uint64_t count = simd_data(desc);
+    if (count > 31) {
+        count = 31;
+    }
+
+    for (intptr_t i = 0; i * sizeof(uint32_t) < oprsz; ++i) {
+        d->L(i) = (int32_t)a->L(i) >> count;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
+
+#if SHIFT == 1
+void glue(helper_pslldqi, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
+{
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    unsigned int count = simd_data(desc);
+    if (count > 16) {
+        count = 16;
+    }
+
+    for (intptr_t i = 0; i < oprsz; i += 16) {
+        intptr_t j = 15;
+        for (; count <= j; --j) {
+            d->B(i + j) = a->B(i + j - count);
+        }
+        for (; 0 <= j; --j) {
+            d->B(i + j) = 0;
+        }
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
+
+void glue(helper_psrldqi, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
+{
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    unsigned int count = simd_data(desc);
+    if (count > 16) {
+        count = 16;
+    }
+
+    for (intptr_t i = 0; i < oprsz; i += 16) {
+        intptr_t j = 0;
+        for (; j + count < 16; ++j) {
+            d->B(i + j) = a->B(i + j + count);
+        }
+        for (; j < 16; ++j) {
+            d->B(i + j) = 0;
+        }
     }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 #endif
 
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index afa0ad0938..724692a689 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -34,18 +34,28 @@
 #define dh_is_signed_ZMMReg dh_is_signed_ptr
 #define dh_is_signed_MMXReg dh_is_signed_ptr
 
-DEF_HELPER_3(glue(psrlw, SUFFIX), void, env, Reg, Reg)
-DEF_HELPER_3(glue(psraw, SUFFIX), void, env, Reg, Reg)
-DEF_HELPER_3(glue(psllw, SUFFIX), void, env, Reg, Reg)
-DEF_HELPER_3(glue(psrld, SUFFIX), void, env, Reg, Reg)
-DEF_HELPER_3(glue(psrad, SUFFIX), void, env, Reg, Reg)
-DEF_HELPER_3(glue(pslld, SUFFIX), void, env, Reg, Reg)
-DEF_HELPER_3(glue(psrlq, SUFFIX), void, env, Reg, Reg)
-DEF_HELPER_3(glue(psllq, SUFFIX), void, env, Reg, Reg)
+DEF_HELPER_4(glue(psllw, SUFFIX), void, Reg, Reg, Reg, i32)
+DEF_HELPER_4(glue(pslld, SUFFIX), void, Reg, Reg, Reg, i32)
+DEF_HELPER_4(glue(psllq, SUFFIX), void, Reg, Reg, Reg, i32)
+DEF_HELPER_3(glue(psllwi, SUFFIX), void, Reg, Reg, i32)
+DEF_HELPER_3(glue(pslldi, SUFFIX), void, Reg, Reg, i32)
+DEF_HELPER_3(glue(psllqi, SUFFIX), void, Reg, Reg, i32)
+
+DEF_HELPER_4(glue(psrlw, SUFFIX), void, Reg, Reg, Reg, i32)
+DEF_HELPER_4(glue(psrld, SUFFIX), void, Reg, Reg, Reg, i32)
+DEF_HELPER_4(glue(psrlq, SUFFIX), void, Reg, Reg, Reg, i32)
+DEF_HELPER_3(glue(psrlwi, SUFFIX), void, Reg, Reg, i32)
+DEF_HELPER_3(glue(psrldi, SUFFIX), void, Reg, Reg, i32)
+DEF_HELPER_3(glue(psrlqi, SUFFIX), void, Reg, Reg, i32)
+
+DEF_HELPER_4(glue(psraw, SUFFIX), void, Reg, Reg, Reg, i32)
+DEF_HELPER_4(glue(psrad, SUFFIX), void, Reg, Reg, Reg, i32)
+DEF_HELPER_3(glue(psrawi, SUFFIX), void, Reg, Reg, i32)
+DEF_HELPER_3(glue(psradi, SUFFIX), void, Reg, Reg, i32)
 
 #if SHIFT == 1
-DEF_HELPER_3(glue(psrldq, SUFFIX), void, env, Reg, Reg)
-DEF_HELPER_3(glue(pslldq, SUFFIX), void, env, Reg, Reg)
+DEF_HELPER_3(glue(pslldqi, SUFFIX), void, Reg, Reg, i32)
+DEF_HELPER_3(glue(psrldqi, SUFFIX), void, Reg, Reg, i32)
 #endif
 
 DEF_HELPER_3(glue(pmullw, SUFFIX), void, env, Reg, Reg)
diff --git a/target/i386/translate.c b/target/i386/translate.c
index c7e664e798..03f7c6e450 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -2801,24 +2801,16 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
     [0xc4] = { SSE_SPECIAL, SSE_SPECIAL }, /* pinsrw */
     [0xc5] = { SSE_SPECIAL, SSE_SPECIAL }, /* pextrw */
     [0xd0] = { NULL, gen_helper_addsubpd, NULL, gen_helper_addsubps },
-    [0xd1] = MMX_OP2(psrlw),
-    [0xd2] = MMX_OP2(psrld),
-    [0xd3] = MMX_OP2(psrlq),
     [0xd5] = MMX_OP2(pmullw),
     [0xd6] = { NULL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL },
     [0xd7] = { SSE_SPECIAL, SSE_SPECIAL }, /* pmovmskb */
     [0xe0] = MMX_OP2(pavgb),
-    [0xe1] = MMX_OP2(psraw),
-    [0xe2] = MMX_OP2(psrad),
     [0xe3] = MMX_OP2(pavgw),
     [0xe4] = MMX_OP2(pmulhuw),
     [0xe5] = MMX_OP2(pmulhw),
     [0xe6] = { NULL, gen_helper_cvttpd2dq, gen_helper_cvtdq2pd, gen_helper_cvtpd2dq },
     [0xe7] = { SSE_SPECIAL , SSE_SPECIAL },  /* movntq, movntq */
     [0xf0] = { NULL, NULL, NULL, SSE_SPECIAL }, /* lddqu */
-    [0xf1] = MMX_OP2(psllw),
-    [0xf2] = MMX_OP2(pslld),
-    [0xf3] = MMX_OP2(psllq),
     [0xf4] = MMX_OP2(pmuludq),
     [0xf5] = MMX_OP2(pmaddwd),
     [0xf6] = MMX_OP2(psadbw),
@@ -2826,19 +2818,6 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
                (SSEFunc_0_epp)gen_helper_maskmov_xmm }, /* XXX: casts */
 };
 
-static const SSEFunc_0_epp sse_op_table2[3 * 8][2] = {
-    [0 + 2] = MMX_OP2(psrlw),
-    [0 + 4] = MMX_OP2(psraw),
-    [0 + 6] = MMX_OP2(psllw),
-    [8 + 2] = MMX_OP2(psrld),
-    [8 + 4] = MMX_OP2(psrad),
-    [8 + 6] = MMX_OP2(pslld),
-    [16 + 2] = MMX_OP2(psrlq),
-    [16 + 3] = { NULL, gen_helper_psrldq_xmm },
-    [16 + 6] = MMX_OP2(psllq),
-    [16 + 7] = { NULL, gen_helper_pslldq_xmm },
-};
-
 static const SSEFunc_0_epi sse_op_table3ai[] = {
     gen_helper_cvtsi2ss,
     gen_helper_cvtsi2sd
@@ -3403,49 +3382,6 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b)
                 goto illegal_op;
             }
             break;
-        case 0x71: /* shift mm, im */
-        case 0x72:
-        case 0x73:
-        case 0x171: /* shift xmm, im */
-        case 0x172:
-        case 0x173:
-            if (b1 >= 2) {
-                goto unknown_op;
-            }
-            val = x86_ldub_code(env, s);
-            if (is_xmm) {
-                tcg_gen_movi_tl(s->T0, val);
-                tcg_gen_st32_tl(s->T0, cpu_env,
-                                offsetof(CPUX86State, xmm_t0.ZMM_L(0)));
-                tcg_gen_movi_tl(s->T0, 0);
-                tcg_gen_st32_tl(s->T0, cpu_env,
-                                offsetof(CPUX86State, xmm_t0.ZMM_L(1)));
-                op1_offset = offsetof(CPUX86State,xmm_t0);
-            } else {
-                tcg_gen_movi_tl(s->T0, val);
-                tcg_gen_st32_tl(s->T0, cpu_env,
-                                offsetof(CPUX86State, mmx_t0.MMX_L(0)));
-                tcg_gen_movi_tl(s->T0, 0);
-                tcg_gen_st32_tl(s->T0, cpu_env,
-                                offsetof(CPUX86State, mmx_t0.MMX_L(1)));
-                op1_offset = offsetof(CPUX86State,mmx_t0);
-            }
-            sse_fn_epp = sse_op_table2[((b - 1) & 3) * 8 +
-                                       (((modrm >> 3)) & 7)][b1];
-            if (!sse_fn_epp) {
-                goto unknown_op;
-            }
-            if (is_xmm) {
-                rm = (modrm & 7) | REX_B(s);
-                op2_offset = offsetof(CPUX86State,xmm_regs[rm]);
-            } else {
-                rm = (modrm & 7);
-                op2_offset = offsetof(CPUX86State,fpregs[rm].mmx);
-            }
-            tcg_gen_addi_ptr(s->ptr0, cpu_env, op2_offset);
-            tcg_gen_addi_ptr(s->ptr1, cpu_env, op1_offset);
-            sse_fn_epp(cpu_env, s->ptr0, s->ptr1);
-            break;
         case 0x050: /* movmskps */
             rm = (modrm & 7) | REX_B(s);
             tcg_gen_addi_ptr(s->ptr0, cpu_env,
@@ -6889,18 +6825,18 @@ DEF_GEN_INSN3_GVEC(xorpd, Vdq, Vdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vxorpd, Vdq, Hdq, Wdq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 DEF_GEN_INSN3_GVEC(vxorpd, Vqq, Hqq, Wqq, xor, XMM_OPRSZ, XMM_MAXSZ, MO_64)
 
-DEF_GEN_INSN3_HELPER_EPP(psllw, psllw_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(psllw, psllw_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsllw, psllw_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsllw, psllw_xmm, Vqq, Hqq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(pslld, pslld_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(pslld, pslld_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpslld, pslld_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpslld, pslld_xmm, Vqq, Hqq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(psllq, psllq_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(psllq, psllq_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsllq, psllq_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsllq, psllq_xmm, Vqq, Hqq, Wdq)
+DEF_GEN_INSN3_GVEC(psllw, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, psllw_mmx)
+DEF_GEN_INSN3_GVEC(psllw, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psllw_xmm)
+DEF_GEN_INSN3_GVEC(vpsllw, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psllw_xmm)
+DEF_GEN_INSN3_GVEC(vpsllw, Vqq, Hqq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psllw_xmm)
+DEF_GEN_INSN3_GVEC(pslld, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, pslld_mmx)
+DEF_GEN_INSN3_GVEC(pslld, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pslld_xmm)
+DEF_GEN_INSN3_GVEC(vpslld, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pslld_xmm)
+DEF_GEN_INSN3_GVEC(vpslld, Vqq, Hqq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pslld_xmm)
+DEF_GEN_INSN3_GVEC(psllq, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, psllq_mmx)
+DEF_GEN_INSN3_GVEC(psllq, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psllq_xmm)
+DEF_GEN_INSN3_GVEC(vpsllq, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psllq_xmm)
+DEF_GEN_INSN3_GVEC(vpsllq, Vqq, Hqq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psllq_xmm)
 
 GEN_INSN3(vpsllvd, Vdq, Hdq, Wdq)
 {
@@ -6920,21 +6856,18 @@ GEN_INSN3(vpsllvq, Vqq, Hqq, Wqq)
     /* XXX TODO implement this */
 }
 
-DEF_GEN_INSN3_HELPER_EPP(pslldq, pslldq_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpslldq, pslldq_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpslldq, pslldq_xmm, Vqq, Hqq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(psrlw, psrlw_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(psrlw, psrlw_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsrlw, psrlw_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsrlw, psrlw_xmm, Vqq, Hqq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(psrld, psrld_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(psrld, psrld_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsrld, psrld_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsrld, psrld_xmm, Vqq, Hqq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(psrlq, psrlq_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(psrlq, psrlq_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsrlq, psrlq_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsrlq, psrlq_xmm, Vqq, Hqq, Wdq)
+DEF_GEN_INSN3_GVEC(psrlw, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, psrlw_mmx)
+DEF_GEN_INSN3_GVEC(psrlw, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psrlw_xmm)
+DEF_GEN_INSN3_GVEC(vpsrlw, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psrlw_xmm)
+DEF_GEN_INSN3_GVEC(vpsrlw, Vqq, Hqq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psrlw_xmm)
+DEF_GEN_INSN3_GVEC(psrld, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, psrld_mmx)
+DEF_GEN_INSN3_GVEC(psrld, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psrld_xmm)
+DEF_GEN_INSN3_GVEC(vpsrld, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psrld_xmm)
+DEF_GEN_INSN3_GVEC(vpsrld, Vqq, Hqq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psrld_xmm)
+DEF_GEN_INSN3_GVEC(psrlq, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, psrlq_mmx)
+DEF_GEN_INSN3_GVEC(psrlq, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psrlq_xmm)
+DEF_GEN_INSN3_GVEC(vpsrlq, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psrlq_xmm)
+DEF_GEN_INSN3_GVEC(vpsrlq, Vqq, Hqq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psrlq_xmm)
 
 GEN_INSN3(vpsrlvd, Vdq, Hdq, Wdq)
 {
@@ -6954,17 +6887,14 @@ GEN_INSN3(vpsrlvq, Vqq, Hqq, Wqq)
     /* XXX TODO implement this */
 }
 
-DEF_GEN_INSN3_HELPER_EPP(psrldq, psrldq_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsrldq, psrldq_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsrldq, psrldq_xmm, Vqq, Hqq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(psraw, psraw_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(psraw, psraw_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsraw, psraw_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsraw, psraw_xmm, Vqq, Hqq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(psrad, psrad_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(psrad, psrad_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsrad, psrad_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsrad, psrad_xmm, Vqq, Hqq, Wdq)
+DEF_GEN_INSN3_GVEC(psraw, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, psraw_mmx)
+DEF_GEN_INSN3_GVEC(psraw, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psraw_xmm)
+DEF_GEN_INSN3_GVEC(vpsraw, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psraw_xmm)
+DEF_GEN_INSN3_GVEC(vpsraw, Vqq, Hqq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psraw_xmm)
+DEF_GEN_INSN3_GVEC(psrad, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, psrad_mmx)
+DEF_GEN_INSN3_GVEC(psrad, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psrad_xmm)
+DEF_GEN_INSN3_GVEC(vpsrad, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psrad_xmm)
+DEF_GEN_INSN3_GVEC(vpsrad, Vqq, Hqq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psrad_xmm)
 
 GEN_INSN3(vpsravd, Vdq, Hdq, Wdq)
 {
@@ -6975,93 +6905,44 @@ GEN_INSN3(vpsravd, Vqq, Hqq, Wqq)
     /* XXX TODO implement this */
 }
 
-#define DEF_GEN_PSHIFT_IMM_MM(mnem, opT1, opT2)                         \
-    GEN_INSN3(mnem, opT1, opT2, Ib)                                     \
-    {                                                                   \
-        const uint64_t arg3_ui64 = (uint8_t)arg3;                       \
-        const insnop_arg_t(Eq) arg3_r64 = s->tmp1_i64;                  \
-        const insnop_arg_t(Qq) arg3_mm =                                \
-            offsetof(CPUX86State, mmx_t0.MMX_Q(0));                     \
-                                                                        \
-        tcg_gen_movi_i64(arg3_r64, arg3_ui64);                          \
-        gen_insn2(movq, Pq, Eq)(env, s, arg3_mm, arg3_r64);             \
-        gen_insn3(mnem, Pq, Pq, Qq)(env, s, arg1, arg2, arg3_mm);       \
-    }
-#define DEF_GEN_PSHIFT_IMM_XMM(mnem, opT1, opT2)                        \
-    GEN_INSN3(mnem, opT1, opT2, Ib)                                     \
-    {                                                                   \
-        const uint64_t arg3_ui64 = (uint8_t)arg3;                       \
-        const insnop_arg_t(Eq) arg3_r64 = s->tmp1_i64;                  \
-        const insnop_arg_t(Wdq) arg3_xmm =                              \
-            offsetof(CPUX86State, xmm_t0.ZMM_Q(0));                     \
-                                                                        \
-        tcg_gen_movi_i64(arg3_r64, arg3_ui64);                          \
-        gen_insn2(movq, Vdq, Eq)(env, s, arg3_xmm, arg3_r64);           \
-        gen_insn3(mnem, Vdq, Vdq, Wdq)(env, s, arg1, arg2, arg3_xmm);   \
-    }
-#define DEF_GEN_VPSHIFT_IMM_XMM(mnem, opT1, opT2)                       \
-    GEN_INSN3(mnem, opT1, opT2, Ib)                                     \
-    {                                                                   \
-        const uint64_t arg3_ui64 = (uint8_t)arg3;                       \
-        const insnop_arg_t(Eq) arg3_r64 = s->tmp1_i64;                  \
-        const insnop_arg_t(Wdq) arg3_xmm =                              \
-            offsetof(CPUX86State, xmm_t0.ZMM_Q(0));                     \
-                                                                        \
-        tcg_gen_movi_i64(arg3_r64, arg3_ui64);                          \
-        gen_insn2(movq, Vdq, Eq)(env, s, arg3_xmm, arg3_r64);           \
-        gen_insn3(mnem, Vdq, Hdq, Wdq)(env, s, arg2, arg2, arg3_xmm);   \
-    }
-#define DEF_GEN_VPSHIFT_IMM_YMM(mnem, opT1, opT2)                       \
-    GEN_INSN3(mnem, opT1, opT2, Ib)                                     \
-    {                                                                   \
-        const uint64_t arg3_ui64 = (uint8_t)arg3;                       \
-        const insnop_arg_t(Eq) arg3_r64 = s->tmp1_i64;                  \
-        const insnop_arg_t(Wdq) arg3_xmm =                              \
-            offsetof(CPUX86State, xmm_t0.ZMM_Q(0));                     \
-                                                                        \
-        tcg_gen_movi_i64(arg3_r64, arg3_ui64);                          \
-        gen_insn2(movq, Vdq, Eq)(env, s, arg3_xmm, arg3_r64);           \
-        gen_insn3(mnem, Vqq, Hqq, Wdq)(env, s, arg2, arg2, arg3_xmm);   \
-    }
-
-DEF_GEN_PSHIFT_IMM_MM(psllw, Nq, Nq)
-DEF_GEN_PSHIFT_IMM_XMM(psllw, Udq, Udq)
-DEF_GEN_VPSHIFT_IMM_XMM(vpsllw, Hdq, Udq)
-DEF_GEN_VPSHIFT_IMM_YMM(vpsllw, Hqq, Uqq)
-DEF_GEN_PSHIFT_IMM_MM(pslld, Nq, Nq)
-DEF_GEN_PSHIFT_IMM_XMM(pslld, Udq, Udq)
-DEF_GEN_VPSHIFT_IMM_XMM(vpslld, Hdq, Udq)
-DEF_GEN_VPSHIFT_IMM_YMM(vpslld, Hqq, Uqq)
-DEF_GEN_PSHIFT_IMM_MM(psllq, Nq, Nq)
-DEF_GEN_PSHIFT_IMM_XMM(psllq, Udq, Udq)
-DEF_GEN_VPSHIFT_IMM_XMM(vpsllq, Hdq, Udq)
-DEF_GEN_VPSHIFT_IMM_YMM(vpsllq, Hqq, Uqq)
-DEF_GEN_PSHIFT_IMM_XMM(pslldq, Udq, Udq)
-DEF_GEN_VPSHIFT_IMM_XMM(vpslldq, Hdq, Udq)
-DEF_GEN_VPSHIFT_IMM_YMM(vpslldq, Hqq, Uqq)
-DEF_GEN_PSHIFT_IMM_MM(psrlw, Nq, Nq)
-DEF_GEN_PSHIFT_IMM_XMM(psrlw, Udq, Udq)
-DEF_GEN_VPSHIFT_IMM_XMM(vpsrlw, Hdq, Udq)
-DEF_GEN_VPSHIFT_IMM_YMM(vpsrlw, Hqq, Uqq)
-DEF_GEN_PSHIFT_IMM_MM(psrld, Nq, Nq)
-DEF_GEN_PSHIFT_IMM_XMM(psrld, Udq, Udq)
-DEF_GEN_VPSHIFT_IMM_XMM(vpsrld, Hdq, Udq)
-DEF_GEN_VPSHIFT_IMM_YMM(vpsrld, Hqq, Uqq)
-DEF_GEN_PSHIFT_IMM_MM(psrlq, Nq, Nq)
-DEF_GEN_PSHIFT_IMM_XMM(psrlq, Udq, Udq)
-DEF_GEN_VPSHIFT_IMM_XMM(vpsrlq, Hdq, Udq)
-DEF_GEN_VPSHIFT_IMM_YMM(vpsrlq, Hqq, Uqq)
-DEF_GEN_PSHIFT_IMM_XMM(psrldq, Udq, Udq)
-DEF_GEN_VPSHIFT_IMM_XMM(vpsrldq, Hdq, Udq)
-DEF_GEN_VPSHIFT_IMM_YMM(vpsrldq, Hqq, Uqq)
-DEF_GEN_PSHIFT_IMM_MM(psraw, Nq, Nq)
-DEF_GEN_PSHIFT_IMM_XMM(psraw, Udq, Udq)
-DEF_GEN_VPSHIFT_IMM_XMM(vpsraw, Hdq, Udq)
-DEF_GEN_VPSHIFT_IMM_XMM(vpsraw, Hqq, Uqq)
-DEF_GEN_PSHIFT_IMM_MM(psrad, Nq, Nq)
-DEF_GEN_PSHIFT_IMM_XMM(psrad, Udq, Udq)
-DEF_GEN_VPSHIFT_IMM_XMM(vpsrad, Hdq, Udq)
-DEF_GEN_VPSHIFT_IMM_XMM(vpsrad, Hqq, Uqq)
+DEF_GEN_INSN3_GVEC(psllw, Nq, Nq, Ib, 2i_ool, MM_OPRSZ, MM_MAXSZ, psllwi_xmm)
+DEF_GEN_INSN3_GVEC(psllw, Udq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psllwi_xmm)
+DEF_GEN_INSN3_GVEC(vpsllw, Hdq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psllwi_xmm)
+DEF_GEN_INSN3_GVEC(vpsllw, Hqq, Uqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psllwi_xmm)
+DEF_GEN_INSN3_GVEC(pslld, Nq, Nq, Ib, 2i_ool, MM_OPRSZ, MM_MAXSZ, pslldi_xmm)
+DEF_GEN_INSN3_GVEC(pslld, Udq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pslldi_xmm)
+DEF_GEN_INSN3_GVEC(vpslld, Hdq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pslldi_xmm)
+DEF_GEN_INSN3_GVEC(vpslld, Hqq, Uqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pslldi_xmm)
+DEF_GEN_INSN3_GVEC(psllq, Nq, Nq, Ib, 2i_ool, MM_OPRSZ, MM_MAXSZ, psllqi_xmm)
+DEF_GEN_INSN3_GVEC(psllq, Udq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psllqi_xmm)
+DEF_GEN_INSN3_GVEC(vpsllq, Hdq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psllqi_xmm)
+DEF_GEN_INSN3_GVEC(vpsllq, Hqq, Uqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psllqi_xmm)
+DEF_GEN_INSN3_GVEC(pslldq, Udq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pslldqi_xmm)
+DEF_GEN_INSN3_GVEC(vpslldq, Hdq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pslldqi_xmm)
+DEF_GEN_INSN3_GVEC(vpslldq, Hqq, Uqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pslldqi_xmm)
+DEF_GEN_INSN3_GVEC(psrlw, Nq, Nq, Ib, 2i_ool, MM_OPRSZ, MM_MAXSZ, psrlwi_xmm)
+DEF_GEN_INSN3_GVEC(psrlw, Udq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrlwi_xmm)
+DEF_GEN_INSN3_GVEC(vpsrlw, Hdq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrlwi_xmm)
+DEF_GEN_INSN3_GVEC(vpsrlw, Hqq, Uqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrlwi_xmm)
+DEF_GEN_INSN3_GVEC(psrld, Nq, Nq, Ib, 2i_ool, MM_OPRSZ, MM_MAXSZ, psrldi_xmm)
+DEF_GEN_INSN3_GVEC(psrld, Udq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrldi_xmm)
+DEF_GEN_INSN3_GVEC(vpsrld, Hdq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrldi_xmm)
+DEF_GEN_INSN3_GVEC(vpsrld, Hqq, Uqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrldi_xmm)
+DEF_GEN_INSN3_GVEC(psrlq, Nq, Nq, Ib, 2i_ool, MM_OPRSZ, MM_MAXSZ, psrlqi_xmm)
+DEF_GEN_INSN3_GVEC(psrlq, Udq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrlqi_xmm)
+DEF_GEN_INSN3_GVEC(vpsrlq, Hdq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrlqi_xmm)
+DEF_GEN_INSN3_GVEC(vpsrlq, Hqq, Uqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrlqi_xmm)
+DEF_GEN_INSN3_GVEC(psrldq, Udq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrldqi_xmm)
+DEF_GEN_INSN3_GVEC(vpsrldq, Hdq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrldqi_xmm)
+DEF_GEN_INSN3_GVEC(vpsrldq, Hqq, Uqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrldqi_xmm)
+DEF_GEN_INSN3_GVEC(psraw, Nq, Nq, Ib, 2i_ool, MM_OPRSZ, MM_MAXSZ, psrawi_xmm)
+DEF_GEN_INSN3_GVEC(psraw, Udq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrawi_xmm)
+DEF_GEN_INSN3_GVEC(vpsraw, Hdq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrawi_xmm)
+DEF_GEN_INSN3_GVEC(vpsraw, Hqq, Uqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psrawi_xmm)
+DEF_GEN_INSN3_GVEC(psrad, Nq, Nq, Ib, 2i_ool, MM_OPRSZ, MM_MAXSZ, psradi_xmm)
+DEF_GEN_INSN3_GVEC(psrad, Udq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psradi_xmm)
+DEF_GEN_INSN3_GVEC(vpsrad, Hdq, Udq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psradi_xmm)
+DEF_GEN_INSN3_GVEC(vpsrad, Hqq, Uqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, psradi_xmm)
 
 DEF_GEN_INSN4_HELPER_EPPI(palignr, palignr_mmx, Pq, Pq, Qq, Ib)
 DEF_GEN_INSN4_HELPER_EPPI(palignr, palignr_xmm, Vdq, Vdq, Wdq, Ib)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 69/75] target/i386: convert pmullw/pmulhw/pmulhuw helpers to gvec style
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (66 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 68/75] target/i386: convert ps((l, r)l(w, d, q), ra(w, d)) to helpers to gvec style Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 70/75] target/i386: convert pavgb/pavgw " Jan Bobek
                   ` (5 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Make these helpers suitable for use with tcg_gen_gvec_* functions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/ops_sse.h        | 42 ++++++++++++++++++++++++++++++------
 target/i386/ops_sse_header.h |  6 +++---
 target/i386/translate.c      | 27 +++++++++++------------
 3 files changed, 51 insertions(+), 24 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 168e581c0c..6ec116573b 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -412,20 +412,50 @@ static inline int satsw(int x)
     }
 }
 
-#define FMULLW(a, b) ((a) * (b))
 #define FMULHRW(a, b) (((int16_t)(a) * (int16_t)(b) + 0x8000) >> 16)
-#define FMULHUW(a, b) ((a) * (b) >> 16)
-#define FMULHW(a, b) ((int16_t)(a) * (int16_t)(b) >> 16)
 
 #define FAVG(a, b) (((a) + (b) + 1) >> 1)
 #endif
 
-SSE_HELPER_W(helper_pmullw, FMULLW)
+void glue(helper_pmullw, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
+{
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    for (intptr_t i = 0; i * sizeof(uint16_t) < oprsz; ++i) {
+        const uint32_t t = (uint32_t)a->W(i) * (uint32_t)b->W(i);
+        d->W(i) = t;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
+
 #if SHIFT == 0
 SSE_HELPER_W(helper_pmulhrw, FMULHRW)
 #endif
-SSE_HELPER_W(helper_pmulhuw, FMULHUW)
-SSE_HELPER_W(helper_pmulhw, FMULHW)
+
+void glue(helper_pmulhuw, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
+{
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    for (intptr_t i = 0; i * sizeof(uint16_t) < oprsz; ++i) {
+        const uint32_t t = (uint32_t)a->W(i) * (uint32_t)b->W(i);
+        d->W(i) = t >> 16;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
+
+void glue(helper_pmulhw, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
+{
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    for (intptr_t i = 0; i * sizeof(uint16_t) < oprsz; ++i) {
+        const int32_t t = (int32_t)a->W(i) * (int32_t)b->W(i);
+        d->W(i) = t >> 16;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
 
 SSE_HELPER_B(helper_pavgb, FAVG)
 SSE_HELPER_W(helper_pavgw, FAVG)
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index 724692a689..7e6411fc82 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -58,12 +58,12 @@ DEF_HELPER_3(glue(pslldqi, SUFFIX), void, Reg, Reg, i32)
 DEF_HELPER_3(glue(psrldqi, SUFFIX), void, Reg, Reg, i32)
 #endif
 
-DEF_HELPER_3(glue(pmullw, SUFFIX), void, env, Reg, Reg)
+DEF_HELPER_4(glue(pmullw, SUFFIX), void, Reg, Reg, Reg, i32)
 #if SHIFT == 0
 DEF_HELPER_3(glue(pmulhrw, SUFFIX), void, env, Reg, Reg)
 #endif
-DEF_HELPER_3(glue(pmulhuw, SUFFIX), void, env, Reg, Reg)
-DEF_HELPER_3(glue(pmulhw, SUFFIX), void, env, Reg, Reg)
+DEF_HELPER_4(glue(pmulhuw, SUFFIX), void, Reg, Reg, Reg, i32)
+DEF_HELPER_4(glue(pmulhw, SUFFIX), void, Reg, Reg, Reg, i32)
 
 DEF_HELPER_3(glue(pavgb, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_3(glue(pavgw, SUFFIX), void, env, Reg, Reg)
diff --git a/target/i386/translate.c b/target/i386/translate.c
index 03f7c6e450..79f8c1ddac 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -2801,13 +2801,10 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
     [0xc4] = { SSE_SPECIAL, SSE_SPECIAL }, /* pinsrw */
     [0xc5] = { SSE_SPECIAL, SSE_SPECIAL }, /* pextrw */
     [0xd0] = { NULL, gen_helper_addsubpd, NULL, gen_helper_addsubps },
-    [0xd5] = MMX_OP2(pmullw),
     [0xd6] = { NULL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL },
     [0xd7] = { SSE_SPECIAL, SSE_SPECIAL }, /* pmovmskb */
     [0xe0] = MMX_OP2(pavgb),
     [0xe3] = MMX_OP2(pavgw),
-    [0xe4] = MMX_OP2(pmulhuw),
-    [0xe5] = MMX_OP2(pmulhw),
     [0xe6] = { NULL, gen_helper_cvttpd2dq, gen_helper_cvtdq2pd, gen_helper_cvtpd2dq },
     [0xe7] = { SSE_SPECIAL , SSE_SPECIAL },  /* movntq, movntq */
     [0xf0] = { NULL, NULL, NULL, SSE_SPECIAL }, /* lddqu */
@@ -6116,21 +6113,21 @@ DEF_GEN_INSN3_HELPER_EPP(addsubpd, addsubpd, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vaddsubpd, addsubpd, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vaddsubpd, addsubpd, Vqq, Hqq, Wqq)
 
-DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(pmullw, pmullw_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpmullw, pmullw_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpmullw, pmullw_xmm, Vqq, Hqq, Wqq)
+DEF_GEN_INSN3_GVEC(pmullw, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, pmullw_mmx)
+DEF_GEN_INSN3_GVEC(pmullw, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmullw_xmm)
+DEF_GEN_INSN3_GVEC(vpmullw, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmullw_xmm)
+DEF_GEN_INSN3_GVEC(vpmullw, Vqq, Hqq, Wqq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmullw_xmm)
 DEF_GEN_INSN3_HELPER_EPP(pmulld, pmulld_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmulld, pmulld_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmulld, pmulld_xmm, Vqq, Hqq, Wqq)
-DEF_GEN_INSN3_HELPER_EPP(pmulhw, pmulhw_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(pmulhw, pmulhw_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpmulhw, pmulhw_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpmulhw, pmulhw_xmm, Vqq, Hqq, Wqq)
-DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(pmulhuw, pmulhuw_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpmulhuw, pmulhuw_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpmulhuw, pmulhuw_xmm, Vqq, Hqq, Wqq)
+DEF_GEN_INSN3_GVEC(pmulhw, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, pmulhw_mmx)
+DEF_GEN_INSN3_GVEC(pmulhw, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmulhw_xmm)
+DEF_GEN_INSN3_GVEC(vpmulhw, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmulhw_xmm)
+DEF_GEN_INSN3_GVEC(vpmulhw, Vqq, Hqq, Wqq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmulhw_xmm)
+DEF_GEN_INSN3_GVEC(pmulhuw, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, pmulhuw_mmx)
+DEF_GEN_INSN3_GVEC(pmulhuw, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmulhuw_xmm)
+DEF_GEN_INSN3_GVEC(vpmulhuw, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmulhuw_xmm)
+DEF_GEN_INSN3_GVEC(vpmulhuw, Vqq, Hqq, Wqq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmulhuw_xmm)
 DEF_GEN_INSN3_HELPER_EPP(pmuldq, pmuldq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmuldq, pmuldq_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmuldq, pmuldq_xmm, Vqq, Hqq, Wqq)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 70/75] target/i386: convert pavgb/pavgw helpers to gvec style
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (67 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 69/75] target/i386: convert pmullw/pmulhw/pmulhuw " Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 71/75] target/i386: convert pmuludq/pmaddwd " Jan Bobek
                   ` (4 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Make these helpers suitable for use with tcg_gen_gvec_* functions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/ops_sse.h        | 33 +++++++++++++++++++++++++++++----
 target/i386/ops_sse_header.h |  7 +++++--
 target/i386/translate.c      | 20 +++++++++-----------
 3 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 6ec116573b..1661bd7c64 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -413,8 +413,6 @@ static inline int satsw(int x)
 }
 
 #define FMULHRW(a, b) (((int16_t)(a) * (int16_t)(b) + 0x8000) >> 16)
-
-#define FAVG(a, b) (((a) + (b) + 1) >> 1)
 #endif
 
 void glue(helper_pmullw, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
@@ -457,8 +455,35 @@ void glue(helper_pmulhw, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
     glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-SSE_HELPER_B(helper_pavgb, FAVG)
-SSE_HELPER_W(helper_pavgw, FAVG)
+void glue(helper_pavgb, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
+{
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    for (intptr_t i = 0; i * sizeof(uint8_t) < oprsz; ++i) {
+        d->B(i) = (a->B(i) + b->B(i) + 1) >> 1;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
+
+#if SHIFT == 0
+void glue(helper_pavgusb, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+{
+    const uint32_t desc = simd_desc(sizeof(Reg), sizeof(Reg), 0);
+    glue(helper_pavgb, SUFFIX)(d, s, s, desc);
+}
+#endif
+
+void glue(helper_pavgw, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
+{
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    for (intptr_t i = 0; i * sizeof(uint16_t) < oprsz; ++i) {
+        d->W(i) = (a->W(i) + b->W(i) + 1) >> 1;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
+}
 
 void glue(helper_pmuludq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index 7e6411fc82..b5e8aae897 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -65,8 +65,11 @@ DEF_HELPER_3(glue(pmulhrw, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_4(glue(pmulhuw, SUFFIX), void, Reg, Reg, Reg, i32)
 DEF_HELPER_4(glue(pmulhw, SUFFIX), void, Reg, Reg, Reg, i32)
 
-DEF_HELPER_3(glue(pavgb, SUFFIX), void, env, Reg, Reg)
-DEF_HELPER_3(glue(pavgw, SUFFIX), void, env, Reg, Reg)
+DEF_HELPER_4(glue(pavgb, SUFFIX), void, Reg, Reg, Reg, i32)
+#if SHIFT == 0
+DEF_HELPER_3(glue(pavgusb, SUFFIX), void, env, Reg, Reg)
+#endif
+DEF_HELPER_4(glue(pavgw, SUFFIX), void, Reg, Reg, Reg, i32)
 
 DEF_HELPER_3(glue(pmuludq, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_3(glue(pmaddwd, SUFFIX), void, env, Reg, Reg)
diff --git a/target/i386/translate.c b/target/i386/translate.c
index 79f8c1ddac..77b2e18f34 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -2803,8 +2803,6 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
     [0xd0] = { NULL, gen_helper_addsubpd, NULL, gen_helper_addsubps },
     [0xd6] = { NULL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL },
     [0xd7] = { SSE_SPECIAL, SSE_SPECIAL }, /* pmovmskb */
-    [0xe0] = MMX_OP2(pavgb),
-    [0xe3] = MMX_OP2(pavgw),
     [0xe6] = { NULL, gen_helper_cvttpd2dq, gen_helper_cvtdq2pd, gen_helper_cvtpd2dq },
     [0xe7] = { SSE_SPECIAL , SSE_SPECIAL },  /* movntq, movntq */
     [0xf0] = { NULL, NULL, NULL, SSE_SPECIAL }, /* lddqu */
@@ -2878,7 +2876,7 @@ static const SSEFunc_0_epp sse_op_table5[256] = {
     [0xb6] = gen_helper_movq, /* pfrcpit2 */
     [0xb7] = gen_helper_pmulhrw_mmx,
     [0xbb] = gen_helper_pswapd,
-    [0xbf] = gen_helper_pavgb_mmx /* pavgusb */
+    [0xbf] = gen_helper_pavgusb_mmx
 };
 
 struct SSEOpHelper_epp {
@@ -6252,14 +6250,14 @@ DEF_GEN_INSN3_HELPER_EPP(maxss, maxss, Vd, Vd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(vmaxss, maxss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(maxsd, maxsd, Vq, Vq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(vmaxsd, maxsd, Vq, Hq, Wq)
-DEF_GEN_INSN3_HELPER_EPP(pavgb, pavgb_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(pavgb, pavgb_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpavgb, pavgb_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpavgb, pavgb_xmm, Vqq, Hqq, Wqq)
-DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(pavgw, pavgw_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpavgw, pavgw_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpavgw, pavgw_xmm, Vqq, Hqq, Wqq)
+DEF_GEN_INSN3_GVEC(pavgb, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, pavgb_mmx)
+DEF_GEN_INSN3_GVEC(pavgb, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pavgb_xmm)
+DEF_GEN_INSN3_GVEC(vpavgb, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pavgb_xmm)
+DEF_GEN_INSN3_GVEC(vpavgb, Vqq, Hqq, Wqq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pavgb_xmm)
+DEF_GEN_INSN3_GVEC(pavgw, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, pavgw_mmx)
+DEF_GEN_INSN3_GVEC(pavgw, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pavgw_xmm)
+DEF_GEN_INSN3_GVEC(vpavgw, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pavgw_xmm)
+DEF_GEN_INSN3_GVEC(vpavgw, Vqq, Hqq, Wqq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pavgw_xmm)
 DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpsadbw, psadbw_xmm, Vdq, Hdq, Wdq)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 71/75] target/i386: convert pmuludq/pmaddwd helpers to gvec style
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (68 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 70/75] target/i386: convert pavgb/pavgw " Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 72/75] target/i386: convert psadbw helper " Jan Bobek
                   ` (3 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Make these helpers suitable for use with tcg_gen_gvec_* functions.
---
 target/i386/ops_sse.h        | 27 +++++++++++++++++----------
 target/i386/ops_sse_header.h |  4 ++--
 target/i386/translate.c      | 18 ++++++++----------
 3 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 1661bd7c64..384a835662 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -485,22 +485,29 @@ void glue(helper_pavgw, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
     glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void glue(helper_pmuludq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+void glue(helper_pmuludq, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
 {
-    d->Q(0) = (uint64_t)s->L(0) * (uint64_t)d->L(0);
-#if SHIFT == 1
-    d->Q(1) = (uint64_t)s->L(2) * (uint64_t)d->L(2);
-#endif
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    for (intptr_t i = 0; i * sizeof(uint64_t) < oprsz; ++i) {
+        const uint64_t t = (uint64_t)a->L(2 * i) * (uint64_t)b->L(2 * i);
+        d->Q(i) = t;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void glue(helper_pmaddwd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
+void glue(helper_pmaddwd, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
 {
-    int i;
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
 
-    for (i = 0; i < (2 << SHIFT); i++) {
-        d->L(i) = (int16_t)s->W(2 * i) * (int16_t)d->W(2 * i) +
-            (int16_t)s->W(2 * i + 1) * (int16_t)d->W(2 * i + 1);
+    for (intptr_t i = 0; i * sizeof(uint32_t) < oprsz; ++i) {
+        const int32_t t0 = (int32_t)a->W(2 * i + 0) * (int32_t)b->W(2 * i + 0);
+        const int32_t t1 = (int32_t)a->W(2 * i + 1) * (int32_t)b->W(2 * i + 1);
+        d->L(i) = t0 + t1;
     }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
 #if SHIFT == 0
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index b5e8aae897..18d39ca649 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -71,8 +71,8 @@ DEF_HELPER_3(glue(pavgusb, SUFFIX), void, env, Reg, Reg)
 #endif
 DEF_HELPER_4(glue(pavgw, SUFFIX), void, Reg, Reg, Reg, i32)
 
-DEF_HELPER_3(glue(pmuludq, SUFFIX), void, env, Reg, Reg)
-DEF_HELPER_3(glue(pmaddwd, SUFFIX), void, env, Reg, Reg)
+DEF_HELPER_4(glue(pmuludq, SUFFIX), void, Reg, Reg, Reg, i32)
+DEF_HELPER_4(glue(pmaddwd, SUFFIX), void, Reg, Reg, Reg, i32)
 
 DEF_HELPER_3(glue(psadbw, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_4(glue(maskmov, SUFFIX), void, env, Reg, Reg, tl)
diff --git a/target/i386/translate.c b/target/i386/translate.c
index 77b2e18f34..55607db09c 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -2806,8 +2806,6 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
     [0xe6] = { NULL, gen_helper_cvttpd2dq, gen_helper_cvtdq2pd, gen_helper_cvtpd2dq },
     [0xe7] = { SSE_SPECIAL , SSE_SPECIAL },  /* movntq, movntq */
     [0xf0] = { NULL, NULL, NULL, SSE_SPECIAL }, /* lddqu */
-    [0xf4] = MMX_OP2(pmuludq),
-    [0xf5] = MMX_OP2(pmaddwd),
     [0xf6] = MMX_OP2(psadbw),
     [0xf7] = { (SSEFunc_0_epp)gen_helper_maskmov_mmx,
                (SSEFunc_0_epp)gen_helper_maskmov_xmm }, /* XXX: casts */
@@ -6129,10 +6127,10 @@ DEF_GEN_INSN3_GVEC(vpmulhuw, Vqq, Hqq, Wqq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmulhuw
 DEF_GEN_INSN3_HELPER_EPP(pmuldq, pmuldq_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmuldq, pmuldq_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmuldq, pmuldq_xmm, Vqq, Hqq, Wqq)
-DEF_GEN_INSN3_HELPER_EPP(pmuludq, pmuludq_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(pmuludq, pmuludq_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpmuludq, pmuludq_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpmuludq, pmuludq_xmm, Vqq, Hqq, Wqq)
+DEF_GEN_INSN3_GVEC(pmuludq, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, pmuludq_mmx)
+DEF_GEN_INSN3_GVEC(pmuludq, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmuludq_xmm)
+DEF_GEN_INSN3_GVEC(vpmuludq, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmuludq_xmm)
+DEF_GEN_INSN3_GVEC(vpmuludq, Vqq, Hqq, Wqq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmuludq_xmm)
 DEF_GEN_INSN3_HELPER_EPP(pmulhrsw, pmulhrsw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmulhrsw, pmulhrsw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmulhrsw, pmulhrsw_xmm, Vdq, Hdq, Wdq)
@@ -6147,10 +6145,10 @@ DEF_GEN_INSN3_HELPER_EPP(mulss, mulss, Vd, Vd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(vmulss, mulss, Vd, Hd, Wd)
 DEF_GEN_INSN3_HELPER_EPP(mulsd, mulsd, Vq, Vq, Wq)
 DEF_GEN_INSN3_HELPER_EPP(vmulsd, mulsd, Vq, Hq, Wq)
-DEF_GEN_INSN3_HELPER_EPP(pmaddwd, pmaddwd_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(pmaddwd, pmaddwd_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpmaddwd, pmaddwd_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpmaddwd, pmaddwd_xmm, Vqq, Hqq, Wqq)
+DEF_GEN_INSN3_GVEC(pmaddwd, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, pmaddwd_mmx)
+DEF_GEN_INSN3_GVEC(pmaddwd, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmaddwd_xmm)
+DEF_GEN_INSN3_GVEC(vpmaddwd, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmaddwd_xmm)
+DEF_GEN_INSN3_GVEC(vpmaddwd, Vqq, Hqq, Wqq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pmaddwd_xmm)
 DEF_GEN_INSN3_HELPER_EPP(pmaddubsw, pmaddubsw_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pmaddubsw, pmaddubsw_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpmaddubsw, pmaddubsw_xmm, Vdq, Hdq, Wdq)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 72/75] target/i386: convert psadbw helper to gvec style
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (69 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 71/75] target/i386: convert pmuludq/pmaddwd " Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 73/75] target/i386: remove obsoleted helper_mov(l, q)_mm_T0 Jan Bobek
                   ` (2 subsequent siblings)
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Make these helpers suitable for use with tcg_gen_gvec_* functions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/ops_sse.h        | 64 +++++++++++++++---------------------
 target/i386/ops_sse_header.h |  2 +-
 target/i386/translate.c      |  9 +++--
 3 files changed, 32 insertions(+), 43 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 384a835662..b866ead1c8 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -412,6 +412,15 @@ static inline int satsw(int x)
     }
 }
 
+static inline int abs1(int x)
+{
+    if (x < 0) {
+        return -x;
+    } else {
+        return x;
+    }
+}
+
 #define FMULHRW(a, b) (((int16_t)(a) * (int16_t)(b) + 0x8000) >> 16)
 #endif
 
@@ -510,52 +519,33 @@ void glue(helper_pmaddwd, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
     glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-#if SHIFT == 0
-static inline int abs1(int a)
+void glue(helper_psadbw, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
 {
-    if (a < 0) {
-        return -a;
-    } else {
-        return a;
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+
+    for (intptr_t i = 0; i * sizeof(uint64_t) < oprsz; ++i) {
+        const uint64_t t0 = abs1(a->B(8 * i + 0) - b->B(8 * i + 0));
+        const uint64_t t1 = abs1(a->B(8 * i + 1) - b->B(8 * i + 1));
+        const uint64_t t2 = abs1(a->B(8 * i + 2) - b->B(8 * i + 2));
+        const uint64_t t3 = abs1(a->B(8 * i + 3) - b->B(8 * i + 3));
+        const uint64_t t4 = abs1(a->B(8 * i + 4) - b->B(8 * i + 4));
+        const uint64_t t5 = abs1(a->B(8 * i + 5) - b->B(8 * i + 5));
+        const uint64_t t6 = abs1(a->B(8 * i + 6) - b->B(8 * i + 6));
+        const uint64_t t7 = abs1(a->B(8 * i + 7) - b->B(8 * i + 7));
+        d->Q(i) = t0 + t1 + t2 + t3 + t4 + t5 + t6 + t7;
     }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
-#endif
-void glue(helper_psadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
-{
-    unsigned int val;
 
-    val = 0;
-    val += abs1(d->B(0) - s->B(0));
-    val += abs1(d->B(1) - s->B(1));
-    val += abs1(d->B(2) - s->B(2));
-    val += abs1(d->B(3) - s->B(3));
-    val += abs1(d->B(4) - s->B(4));
-    val += abs1(d->B(5) - s->B(5));
-    val += abs1(d->B(6) - s->B(6));
-    val += abs1(d->B(7) - s->B(7));
-    d->Q(0) = val;
-#if SHIFT == 1
-    val = 0;
-    val += abs1(d->B(8) - s->B(8));
-    val += abs1(d->B(9) - s->B(9));
-    val += abs1(d->B(10) - s->B(10));
-    val += abs1(d->B(11) - s->B(11));
-    val += abs1(d->B(12) - s->B(12));
-    val += abs1(d->B(13) - s->B(13));
-    val += abs1(d->B(14) - s->B(14));
-    val += abs1(d->B(15) - s->B(15));
-    d->Q(1) = val;
-#endif
-}
-
-void glue(helper_maskmov, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
+void glue(helper_maskmov, SUFFIX)(CPUX86State *env, Reg *a, Reg *b,
                                   target_ulong a0)
 {
     int i;
 
     for (i = 0; i < (8 << SHIFT); i++) {
-        if (s->B(i) & 0x80) {
-            cpu_stb_data_ra(env, a0 + i, d->B(i), GETPC());
+        if (b->B(i) & 0x80) {
+            cpu_stb_data_ra(env, a0 + i, a->B(i), GETPC());
         }
     }
 }
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index 18d39ca649..ec7d1fc686 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -74,7 +74,7 @@ DEF_HELPER_4(glue(pavgw, SUFFIX), void, Reg, Reg, Reg, i32)
 DEF_HELPER_4(glue(pmuludq, SUFFIX), void, Reg, Reg, Reg, i32)
 DEF_HELPER_4(glue(pmaddwd, SUFFIX), void, Reg, Reg, Reg, i32)
 
-DEF_HELPER_3(glue(psadbw, SUFFIX), void, env, Reg, Reg)
+DEF_HELPER_4(glue(psadbw, SUFFIX), void, Reg, Reg, Reg, i32)
 DEF_HELPER_4(glue(maskmov, SUFFIX), void, env, Reg, Reg, tl)
 DEF_HELPER_2(glue(movl_mm_T0, SUFFIX), void, Reg, i32)
 #ifdef TARGET_X86_64
diff --git a/target/i386/translate.c b/target/i386/translate.c
index 55607db09c..6bffbaee4c 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -2806,7 +2806,6 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
     [0xe6] = { NULL, gen_helper_cvttpd2dq, gen_helper_cvtdq2pd, gen_helper_cvtpd2dq },
     [0xe7] = { SSE_SPECIAL , SSE_SPECIAL },  /* movntq, movntq */
     [0xf0] = { NULL, NULL, NULL, SSE_SPECIAL }, /* lddqu */
-    [0xf6] = MMX_OP2(psadbw),
     [0xf7] = { (SSEFunc_0_epp)gen_helper_maskmov_mmx,
                (SSEFunc_0_epp)gen_helper_maskmov_xmm }, /* XXX: casts */
 };
@@ -6256,10 +6255,10 @@ DEF_GEN_INSN3_GVEC(pavgw, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, pavgw_mmx)
 DEF_GEN_INSN3_GVEC(pavgw, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pavgw_xmm)
 DEF_GEN_INSN3_GVEC(vpavgw, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pavgw_xmm)
 DEF_GEN_INSN3_GVEC(vpavgw, Vqq, Hqq, Wqq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, pavgw_xmm)
-DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_mmx, Pq, Pq, Qq)
-DEF_GEN_INSN3_HELPER_EPP(psadbw, psadbw_xmm, Vdq, Vdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsadbw, psadbw_xmm, Vdq, Hdq, Wdq)
-DEF_GEN_INSN3_HELPER_EPP(vpsadbw, psadbw_xmm, Vqq, Hqq, Wqq)
+DEF_GEN_INSN3_GVEC(psadbw, Pq, Pq, Qq, 3_ool, MM_OPRSZ, MM_MAXSZ, psadbw_mmx)
+DEF_GEN_INSN3_GVEC(psadbw, Vdq, Vdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psadbw_xmm)
+DEF_GEN_INSN3_GVEC(vpsadbw, Vdq, Hdq, Wdq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psadbw_xmm)
+DEF_GEN_INSN3_GVEC(vpsadbw, Vqq, Hqq, Wqq, 3_ool, XMM_OPRSZ, XMM_MAXSZ, psadbw_xmm)
 DEF_GEN_INSN4_HELPER_EPPI(mpsadbw, mpsadbw_xmm, Vdq, Vdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_EPPI(vmpsadbw, mpsadbw_xmm, Vdq, Hdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_EPPI(vmpsadbw, mpsadbw_xmm, Vqq, Hqq, Wqq, Ib)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 73/75] target/i386: remove obsoleted helper_mov(l, q)_mm_T0
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (70 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 72/75] target/i386: convert psadbw helper " Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 74/75] target/i386: convert pshuf(w, lw, hw, d), shuf(pd, ps) helpers to gvec style Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 75/75] target/i386: convert pmovmskb/movmskps/movmskpd " Jan Bobek
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

This helper has been obsoleted by the new code.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/ops_sse.h        | 19 -------------------
 target/i386/ops_sse_header.h |  4 ----
 target/i386/translate.c      | 33 ---------------------------------
 3 files changed, 56 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index b866ead1c8..8172324e34 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -550,25 +550,6 @@ void glue(helper_maskmov, SUFFIX)(CPUX86State *env, Reg *a, Reg *b,
     }
 }
 
-void glue(helper_movl_mm_T0, SUFFIX)(Reg *d, uint32_t val)
-{
-    d->L(0) = val;
-    d->L(1) = 0;
-#if SHIFT == 1
-    d->Q(1) = 0;
-#endif
-}
-
-#ifdef TARGET_X86_64
-void glue(helper_movq_mm_T0, SUFFIX)(Reg *d, uint64_t val)
-{
-    d->Q(0) = val;
-#if SHIFT == 1
-    d->Q(1) = 0;
-#endif
-}
-#endif
-
 #if SHIFT == 0
 void glue(helper_pshufw, SUFFIX)(Reg *d, Reg *s, int order)
 {
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index ec7d1fc686..ee8bd4c1af 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -76,10 +76,6 @@ DEF_HELPER_4(glue(pmaddwd, SUFFIX), void, Reg, Reg, Reg, i32)
 
 DEF_HELPER_4(glue(psadbw, SUFFIX), void, Reg, Reg, Reg, i32)
 DEF_HELPER_4(glue(maskmov, SUFFIX), void, env, Reg, Reg, tl)
-DEF_HELPER_2(glue(movl_mm_T0, SUFFIX), void, Reg, i32)
-#ifdef TARGET_X86_64
-DEF_HELPER_2(glue(movq_mm_T0, SUFFIX), void, Reg, i64)
-#endif
 
 #if SHIFT == 0
 DEF_HELPER_3(glue(pshufw, SUFFIX), void, Reg, Reg, int)
diff --git a/target/i386/translate.c b/target/i386/translate.c
index 6bffbaee4c..3554086336 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -3079,39 +3079,6 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b)
                 gen_op_st_v(s, MO_32, s->T0, s->A0);
             }
             break;
-        case 0x6e: /* movd mm, ea */
-#ifdef TARGET_X86_64
-            if (s->dflag == MO_64) {
-                gen_ldst_modrm(env, s, modrm, MO_64, OR_TMP0, 0);
-                tcg_gen_st_tl(s->T0, cpu_env,
-                              offsetof(CPUX86State, fpregs[reg].mmx));
-            } else
-#endif
-            {
-                gen_ldst_modrm(env, s, modrm, MO_32, OR_TMP0, 0);
-                tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                                 offsetof(CPUX86State,fpregs[reg].mmx));
-                tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
-                gen_helper_movl_mm_T0_mmx(s->ptr0, s->tmp2_i32);
-            }
-            break;
-        case 0x16e: /* movd xmm, ea */
-#ifdef TARGET_X86_64
-            if (s->dflag == MO_64) {
-                gen_ldst_modrm(env, s, modrm, MO_64, OR_TMP0, 0);
-                tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                                 offsetof(CPUX86State,xmm_regs[reg]));
-                gen_helper_movq_mm_T0_xmm(s->ptr0, s->T0);
-            } else
-#endif
-            {
-                gen_ldst_modrm(env, s, modrm, MO_32, OR_TMP0, 0);
-                tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                                 offsetof(CPUX86State,xmm_regs[reg]));
-                tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
-                gen_helper_movl_mm_T0_xmm(s->ptr0, s->tmp2_i32);
-            }
-            break;
         case 0x6f: /* movq mm, ea */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 74/75] target/i386: convert pshuf(w, lw, hw, d), shuf(pd, ps) helpers to gvec style
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (71 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 73/75] target/i386: remove obsoleted helper_mov(l, q)_mm_T0 Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 75/75] target/i386: convert pmovmskb/movmskps/movmskpd " Jan Bobek
  73 siblings, 0 replies; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Make these helpers suitable for use with tcg_gen_gvec_* functions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/ops_sse.h        | 141 ++++++++++++++++++++++++-----------
 target/i386/ops_sse_header.h |  12 +--
 target/i386/translate.c      |  34 ++++-----
 3 files changed, 119 insertions(+), 68 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 8172324e34..2e50d91a25 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -551,70 +551,123 @@ void glue(helper_maskmov, SUFFIX)(CPUX86State *env, Reg *a, Reg *b,
 }
 
 #if SHIFT == 0
-void glue(helper_pshufw, SUFFIX)(Reg *d, Reg *s, int order)
+void glue(helper_pshufw, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
 {
-    Reg r;
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+    const uint8_t ctrl = simd_data(desc);
 
-    r.W(0) = s->W(order & 3);
-    r.W(1) = s->W((order >> 2) & 3);
-    r.W(2) = s->W((order >> 4) & 3);
-    r.W(3) = s->W((order >> 6) & 3);
-    *d = r;
+    for (intptr_t i = 0; 4 * i * sizeof(uint16_t) < oprsz; ++i) {
+        const uint16_t t0 = a->W(4 * i + ((ctrl >> 0) & 3));
+        const uint16_t t1 = a->W(4 * i + ((ctrl >> 2) & 3));
+        const uint16_t t2 = a->W(4 * i + ((ctrl >> 4) & 3));
+        const uint16_t t3 = a->W(4 * i + ((ctrl >> 6) & 3));
+
+        d->W(4 * i + 0) = t0;
+        d->W(4 * i + 1) = t1;
+        d->W(4 * i + 2) = t2;
+        d->W(4 * i + 3) = t3;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 #else
-void helper_shufps(Reg *d, Reg *s, int order)
+void glue(helper_pshuflw, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
 {
-    Reg r;
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+    const uint8_t ctrl = simd_data(desc);
 
-    r.L(0) = d->L(order & 3);
-    r.L(1) = d->L((order >> 2) & 3);
-    r.L(2) = s->L((order >> 4) & 3);
-    r.L(3) = s->L((order >> 6) & 3);
-    *d = r;
+    for (intptr_t i = 0; 8 * i * sizeof(uint16_t) < oprsz; ++i) {
+        const uint16_t t0 = a->W(8 * i + ((ctrl >> 0) & 3));
+        const uint16_t t1 = a->W(8 * i + ((ctrl >> 2) & 3));
+        const uint16_t t2 = a->W(8 * i + ((ctrl >> 4) & 3));
+        const uint16_t t3 = a->W(8 * i + ((ctrl >> 6) & 3));
+
+        d->W(8 * i + 0) = t0;
+        d->W(8 * i + 1) = t1;
+        d->W(8 * i + 2) = t2;
+        d->W(8 * i + 3) = t3;
+        d->Q(2 * i + 1) = a->Q(2 * i + 1);
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void helper_shufpd(Reg *d, Reg *s, int order)
+void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
 {
-    Reg r;
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+    const uint8_t ctrl = simd_data(desc);
+
+    for (intptr_t i = 0; 8 * i * sizeof(uint16_t) < oprsz; ++i) {
+        const uint16_t t0 = a->W(8 * i + 4 + ((ctrl >> 0) & 3));
+        const uint16_t t1 = a->W(8 * i + 4 + ((ctrl >> 2) & 3));
+        const uint16_t t2 = a->W(8 * i + 4 + ((ctrl >> 4) & 3));
+        const uint16_t t3 = a->W(8 * i + 4 + ((ctrl >> 6) & 3));
 
-    r.Q(0) = d->Q(order & 1);
-    r.Q(1) = s->Q((order >> 1) & 1);
-    *d = r;
+        d->Q(2 * i + 0) = a->Q(2 * i + 0);
+        d->W(8 * i + 4) = t0;
+        d->W(8 * i + 5) = t1;
+        d->W(8 * i + 6) = t2;
+        d->W(8 * i + 7) = t3;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void glue(helper_pshufd, SUFFIX)(Reg *d, Reg *s, int order)
+void glue(helper_pshufd, SUFFIX)(Reg *d, Reg *a, uint32_t desc)
 {
-    Reg r;
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+    const uint8_t ctrl = simd_data(desc);
+
+    for (intptr_t i = 0; 4 * i * sizeof(uint32_t) < oprsz; ++i) {
+        const uint32_t t0 = a->L(4 * i + ((ctrl >> 0) & 3));
+        const uint32_t t1 = a->L(4 * i + ((ctrl >> 2) & 3));
+        const uint32_t t2 = a->L(4 * i + ((ctrl >> 4) & 3));
+        const uint32_t t3 = a->L(4 * i + ((ctrl >> 6) & 3));
+
+        d->L(4 * i + 0) = t0;
+        d->L(4 * i + 1) = t1;
+        d->L(4 * i + 2) = t2;
+        d->L(4 * i + 3) = t3;
 
-    r.L(0) = s->L(order & 3);
-    r.L(1) = s->L((order >> 2) & 3);
-    r.L(2) = s->L((order >> 4) & 3);
-    r.L(3) = s->L((order >> 6) & 3);
-    *d = r;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void glue(helper_pshuflw, SUFFIX)(Reg *d, Reg *s, int order)
+void glue(helper_shufps, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
 {
-    Reg r;
-
-    r.W(0) = s->W(order & 3);
-    r.W(1) = s->W((order >> 2) & 3);
-    r.W(2) = s->W((order >> 4) & 3);
-    r.W(3) = s->W((order >> 6) & 3);
-    r.Q(1) = s->Q(1);
-    *d = r;
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+    const uint8_t ctrl = simd_data(desc);
+
+    for (intptr_t i = 0; 4 * i * sizeof(uint32_t) < oprsz; ++i) {
+        const uint32_t t0 = a->L(4 * i + ((ctrl >> 0) & 3));
+        const uint32_t t1 = a->L(4 * i + ((ctrl >> 2) & 3));
+        const uint32_t t2 = b->L(4 * i + ((ctrl >> 4) & 3));
+        const uint32_t t3 = b->L(4 * i + ((ctrl >> 6) & 3));
+
+        d->W(4 * i + 0) = t0;
+        d->W(4 * i + 1) = t1;
+        d->W(4 * i + 2) = t2;
+        d->W(4 * i + 3) = t3;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 
-void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order)
+void glue(helper_shufpd, SUFFIX)(Reg *d, Reg *a, Reg *b, uint32_t desc)
 {
-    Reg r;
-
-    r.Q(0) = s->Q(0);
-    r.W(4) = s->W(4 + (order & 3));
-    r.W(5) = s->W(4 + ((order >> 2) & 3));
-    r.W(6) = s->W(4 + ((order >> 4) & 3));
-    r.W(7) = s->W(4 + ((order >> 6) & 3));
-    *d = r;
+    const intptr_t oprsz = simd_oprsz(desc);
+    const intptr_t maxsz = simd_maxsz(desc);
+    const uint8_t ctrl = simd_data(desc);
+
+    for (intptr_t i = 0; 2 * i * sizeof(uint64_t) < oprsz; ++i) {
+        const uint64_t t0 = a->Q(2 * i + ((ctrl >> 0) & 1));
+        const uint64_t t1 = b->Q(2 * i + ((ctrl >> 1) & 1));
+
+        d->Q(2 * i + 0) = t0;
+        d->Q(2 * i + 1) = t1;
+    }
+    glue(clear_high, SUFFIX)(d, oprsz, maxsz);
 }
 #endif
 
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index ee8bd4c1af..207d41e248 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -78,13 +78,13 @@ DEF_HELPER_4(glue(psadbw, SUFFIX), void, Reg, Reg, Reg, i32)
 DEF_HELPER_4(glue(maskmov, SUFFIX), void, env, Reg, Reg, tl)
 
 #if SHIFT == 0
-DEF_HELPER_3(glue(pshufw, SUFFIX), void, Reg, Reg, int)
+DEF_HELPER_3(glue(pshufw, SUFFIX), void, Reg, Reg, i32)
 #else
-DEF_HELPER_3(shufps, void, Reg, Reg, int)
-DEF_HELPER_3(shufpd, void, Reg, Reg, int)
-DEF_HELPER_3(glue(pshufd, SUFFIX), void, Reg, Reg, int)
-DEF_HELPER_3(glue(pshuflw, SUFFIX), void, Reg, Reg, int)
-DEF_HELPER_3(glue(pshufhw, SUFFIX), void, Reg, Reg, int)
+DEF_HELPER_3(glue(pshuflw, SUFFIX), void, Reg, Reg, i32)
+DEF_HELPER_3(glue(pshufhw, SUFFIX), void, Reg, Reg, i32)
+DEF_HELPER_3(glue(pshufd, SUFFIX), void, Reg, Reg, i32)
+DEF_HELPER_4(glue(shufps, SUFFIX), void, Reg, Reg, Reg, i32)
+DEF_HELPER_4(glue(shufpd, SUFFIX), void, Reg, Reg, Reg, i32)
 #endif
 
 #if SHIFT == 1
diff --git a/target/i386/translate.c b/target/i386/translate.c
index 3554086336..bb4120a848 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -2763,8 +2763,6 @@ static const SSEFunc_0_epp sse_op_table1[256][4] = {
     [0x5b] = { gen_helper_cvtdq2ps, gen_helper_cvtps2dq, gen_helper_cvttps2dq },
 
     [0xc2] = SSE_FOP(cmpeq),
-    [0xc6] = { (SSEFunc_0_epp)gen_helper_shufps,
-               (SSEFunc_0_epp)gen_helper_shufpd }, /* XXX: casts */
 
     /* SSSE3, SSE4, MOVBE, CRC32, BMI1, BMI2, ADX.  */
     [0x38] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL },
@@ -6971,22 +6969,22 @@ DEF_GEN_INSN3_HELPER_EPP(pshufb, pshufb_mmx, Pq, Pq, Qq)
 DEF_GEN_INSN3_HELPER_EPP(pshufb, pshufb_xmm, Vdq, Vdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpshufb, pshufb_xmm, Vdq, Hdq, Wdq)
 DEF_GEN_INSN3_HELPER_EPP(vpshufb, pshufb_xmm, Vqq, Hqq, Wqq)
-DEF_GEN_INSN3_HELPER_PPI(pshufw, pshufw_mmx, Pq, Qq, Ib)
-DEF_GEN_INSN3_HELPER_PPI(pshuflw, pshuflw_xmm, Vdq, Wdq, Ib)
-DEF_GEN_INSN3_HELPER_PPI(vpshuflw, pshuflw_xmm, Vdq, Wdq, Ib)
-DEF_GEN_INSN3_HELPER_PPI(vpshuflw, pshuflw_xmm, Vqq, Wqq, Ib)
-DEF_GEN_INSN3_HELPER_PPI(pshufhw, pshufhw_xmm, Vdq, Wdq, Ib)
-DEF_GEN_INSN3_HELPER_PPI(vpshufhw, pshufhw_xmm, Vdq, Wdq, Ib)
-DEF_GEN_INSN3_HELPER_PPI(vpshufhw, pshufhw_xmm, Vqq, Wqq, Ib)
-DEF_GEN_INSN3_HELPER_PPI(pshufd, pshufd_xmm, Vdq, Wdq, Ib)
-DEF_GEN_INSN3_HELPER_PPI(vpshufd, pshufd_xmm, Vdq, Wdq, Ib)
-DEF_GEN_INSN3_HELPER_PPI(vpshufd, pshufd_xmm, Vqq, Wqq, Ib)
-DEF_GEN_INSN4_HELPER_PPI(shufps, shufps, Vdq, Vdq, Wdq, Ib)
-DEF_GEN_INSN4_HELPER_PPI(vshufps, shufps, Vdq, Hdq, Wdq, Ib)
-DEF_GEN_INSN4_HELPER_PPI(vshufps, shufps, Vqq, Hqq, Wqq, Ib)
-DEF_GEN_INSN4_HELPER_PPI(shufpd, shufpd, Vdq, Vdq, Wdq, Ib)
-DEF_GEN_INSN4_HELPER_PPI(vshufpd, shufpd, Vdq, Hdq, Wdq, Ib)
-DEF_GEN_INSN4_HELPER_PPI(vshufpd, shufpd, Vqq, Hqq, Wqq, Ib)
+DEF_GEN_INSN3_GVEC(pshufw, Pq, Qq, Ib, 2i_ool, MM_OPRSZ, MM_MAXSZ, pshufw_mmx)
+DEF_GEN_INSN3_GVEC(pshuflw, Vdq, Wdq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pshuflw_xmm)
+DEF_GEN_INSN3_GVEC(vpshuflw, Vdq, Wdq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pshuflw_xmm)
+DEF_GEN_INSN3_GVEC(vpshuflw, Vqq, Wqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pshuflw_xmm)
+DEF_GEN_INSN3_GVEC(pshufhw, Vdq, Wdq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pshufhw_xmm)
+DEF_GEN_INSN3_GVEC(vpshufhw, Vdq, Wdq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pshufhw_xmm)
+DEF_GEN_INSN3_GVEC(vpshufhw, Vqq, Wqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pshufhw_xmm)
+DEF_GEN_INSN3_GVEC(pshufd, Vdq, Wdq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pshufd_xmm)
+DEF_GEN_INSN3_GVEC(vpshufd, Vdq, Wdq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pshufd_xmm)
+DEF_GEN_INSN3_GVEC(vpshufd, Vqq, Wqq, Ib, 2i_ool, XMM_OPRSZ, XMM_MAXSZ, pshufd_xmm)
+DEF_GEN_INSN4_GVEC(shufps, Vdq, Vdq, Wdq, Ib, 3i_ool, XMM_OPRSZ, XMM_MAXSZ, shufps_xmm)
+DEF_GEN_INSN4_GVEC(vshufps, Vdq, Hdq, Wdq, Ib, 3i_ool, XMM_OPRSZ, XMM_MAXSZ, shufps_xmm)
+DEF_GEN_INSN4_GVEC(vshufps, Vqq, Hqq, Wqq, Ib, 3i_ool, XMM_OPRSZ, XMM_MAXSZ, shufps_xmm)
+DEF_GEN_INSN4_GVEC(shufpd, Vdq, Vdq, Wdq, Ib, 3i_ool, XMM_OPRSZ, XMM_MAXSZ, shufpd_xmm)
+DEF_GEN_INSN4_GVEC(vshufpd, Vdq, Hdq, Wdq, Ib, 3i_ool, XMM_OPRSZ, XMM_MAXSZ, shufpd_xmm)
+DEF_GEN_INSN4_GVEC(vshufpd, Vqq, Hqq, Wqq, Ib, 3i_ool, XMM_OPRSZ, XMM_MAXSZ, shufpd_xmm)
 
 DEF_GEN_INSN4_HELPER_EPPI(blendps, blendps_xmm, Vdq, Vdq, Wdq, Ib)
 DEF_GEN_INSN4_HELPER_EPPI(vblendps, blendps_xmm, Vdq, Hdq, Wdq, Ib)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [Qemu-devel] [RFC PATCH v4 75/75] target/i386: convert pmovmskb/movmskps/movmskpd helpers to gvec style
  2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
                   ` (72 preceding siblings ...)
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 74/75] target/i386: convert pshuf(w, lw, hw, d), shuf(pd, ps) helpers to gvec style Jan Bobek
@ 2019-08-21 17:29 ` Jan Bobek
  2019-08-21 23:53   ` Richard Henderson
  73 siblings, 1 reply; 80+ messages in thread
From: Jan Bobek @ 2019-08-21 17:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Make these helpers suitable for use with tcg_gen_gvec_* functions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 target/i386/ops_sse.h        |  74 ++++++++++----------
 target/i386/ops_sse_header.h |   9 ++-
 target/i386/translate.c      | 132 ++++++-----------------------------
 3 files changed, 65 insertions(+), 150 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 2e50d91a25..82562c9473 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -1169,52 +1169,56 @@ void helper_comisd(CPUX86State *env, Reg *d, Reg *s)
     CC_SRC = comis_eflags[ret + 1];
 }
 
-uint32_t helper_movmskps(CPUX86State *env, Reg *s)
+uint32_t helper_movmskpsd(Reg *a, uint32_t desc)
 {
-    int b0, b1, b2, b3;
+    const intptr_t oprsz = simd_oprsz(desc);
 
-    b0 = s->ZMM_L(0) >> 31;
-    b1 = s->ZMM_L(1) >> 31;
-    b2 = s->ZMM_L(2) >> 31;
-    b3 = s->ZMM_L(3) >> 31;
-    return b0 | (b1 << 1) | (b2 << 2) | (b3 << 3);
+    uint32_t ret = 0;
+    for (intptr_t i = 0; i * sizeof(uint32_t) < oprsz; ++i) {
+        const uint32_t t = a->ZMM_L(i) & (1UL << 31);
+        ret |= t >> (31 - i);
+    }
+    return ret;
 }
 
-uint32_t helper_movmskpd(CPUX86State *env, Reg *s)
+uint64_t helper_movmskpsq(Reg *a, uint32_t desc)
 {
-    int b0, b1;
+    return helper_movmskpsd(a, desc);
+}
+
+uint32_t helper_movmskpdd(Reg *a, uint32_t desc)
+{
+    const intptr_t oprsz = simd_oprsz(desc);
 
-    b0 = s->ZMM_L(1) >> 31;
-    b1 = s->ZMM_L(3) >> 31;
-    return b0 | (b1 << 1);
+    uint32_t ret = 0;
+    for (intptr_t i = 0; i * sizeof(uint64_t) < oprsz; ++i) {
+        const uint64_t t = a->ZMM_Q(i) & (1ULL << 63);
+        ret |= t >> (63 - i);
+    }
+    return ret;
 }
 
+uint64_t helper_movmskpdq(Reg *a, uint32_t desc)
+{
+    return helper_movmskpdd(a, desc);
+}
 #endif
 
-uint32_t glue(helper_pmovmskb, SUFFIX)(CPUX86State *env, Reg *s)
+uint32_t glue(helper_pmovmskbd, SUFFIX)(Reg *a, uint32_t desc)
 {
-    uint32_t val;
-
-    val = 0;
-    val |= (s->B(0) >> 7);
-    val |= (s->B(1) >> 6) & 0x02;
-    val |= (s->B(2) >> 5) & 0x04;
-    val |= (s->B(3) >> 4) & 0x08;
-    val |= (s->B(4) >> 3) & 0x10;
-    val |= (s->B(5) >> 2) & 0x20;
-    val |= (s->B(6) >> 1) & 0x40;
-    val |= (s->B(7)) & 0x80;
-#if SHIFT == 1
-    val |= (s->B(8) << 1) & 0x0100;
-    val |= (s->B(9) << 2) & 0x0200;
-    val |= (s->B(10) << 3) & 0x0400;
-    val |= (s->B(11) << 4) & 0x0800;
-    val |= (s->B(12) << 5) & 0x1000;
-    val |= (s->B(13) << 6) & 0x2000;
-    val |= (s->B(14) << 7) & 0x4000;
-    val |= (s->B(15) << 8) & 0x8000;
-#endif
-    return val;
+    const intptr_t oprsz = simd_oprsz(desc);
+
+    uint32_t ret = 0;
+    for (intptr_t i = 0; i * sizeof(uint8_t) < oprsz; ++i) {
+        const uint8_t t = a->B(i) & (1 << 7);
+        ret |= i < 8 ? t >> (7 - i) : t << (i - 7);
+    }
+    return ret;
+}
+
+uint64_t glue(helper_pmovmskbq, SUFFIX)(Reg *a, uint32_t desc)
+{
+    return glue(helper_pmovmskbd, SUFFIX)(a, desc);
 }
 
 void glue(helper_packsswb, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index 207d41e248..59ac1f28e3 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -178,11 +178,14 @@ DEF_HELPER_3(ucomiss, void, env, Reg, Reg)
 DEF_HELPER_3(comiss, void, env, Reg, Reg)
 DEF_HELPER_3(ucomisd, void, env, Reg, Reg)
 DEF_HELPER_3(comisd, void, env, Reg, Reg)
-DEF_HELPER_2(movmskps, i32, env, Reg)
-DEF_HELPER_2(movmskpd, i32, env, Reg)
+DEF_HELPER_2(movmskpsd, i32, Reg, i32)
+DEF_HELPER_2(movmskpsq, i64, Reg, i32)
+DEF_HELPER_2(movmskpdd, i32, Reg, i32)
+DEF_HELPER_2(movmskpdq, i64, Reg, i32)
 #endif
 
-DEF_HELPER_2(glue(pmovmskb, SUFFIX), i32, env, Reg)
+DEF_HELPER_2(glue(pmovmskbd, SUFFIX), i32, Reg, i32)
+DEF_HELPER_2(glue(pmovmskbq, SUFFIX), i64, Reg, i32)
 DEF_HELPER_3(glue(packsswb, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_3(glue(packuswb, SUFFIX), void, env, Reg, Reg)
 DEF_HELPER_3(glue(packssdw, SUFFIX), void, env, Reg, Reg)
diff --git a/target/i386/translate.c b/target/i386/translate.c
index bb4120a848..8f891b6e47 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -3339,20 +3339,6 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b)
                 goto illegal_op;
             }
             break;
-        case 0x050: /* movmskps */
-            rm = (modrm & 7) | REX_B(s);
-            tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                             offsetof(CPUX86State,xmm_regs[rm]));
-            gen_helper_movmskps(s->tmp2_i32, cpu_env, s->ptr0);
-            tcg_gen_extu_i32_tl(cpu_regs[reg], s->tmp2_i32);
-            break;
-        case 0x150: /* movmskpd */
-            rm = (modrm & 7) | REX_B(s);
-            tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                             offsetof(CPUX86State,xmm_regs[rm]));
-            gen_helper_movmskpd(s->tmp2_i32, cpu_env, s->ptr0);
-            tcg_gen_extu_i32_tl(cpu_regs[reg], s->tmp2_i32);
-            break;
         case 0x02a: /* cvtpi2ps */
         case 0x12a: /* cvtpi2pd */
             gen_helper_enter_mmx(cpu_env);
@@ -3524,24 +3510,6 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b)
             gen_op_movq(s, offsetof(CPUX86State, fpregs[reg & 7].mmx),
                         offsetof(CPUX86State,xmm_regs[rm].ZMM_Q(0)));
             break;
-        case 0xd7: /* pmovmskb */
-        case 0x1d7:
-            if (mod != 3)
-                goto illegal_op;
-            if (b1) {
-                rm = (modrm & 7) | REX_B(s);
-                tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                                 offsetof(CPUX86State, xmm_regs[rm]));
-                gen_helper_pmovmskb_xmm(s->tmp2_i32, cpu_env, s->ptr0);
-            } else {
-                rm = (modrm & 7);
-                tcg_gen_addi_ptr(s->ptr0, cpu_env,
-                                 offsetof(CPUX86State, fpregs[rm].mmx));
-                gen_helper_pmovmskb_mmx(s->tmp2_i32, cpu_env, s->ptr0);
-            }
-            reg = ((modrm >> 3) & 7) | REX_R(s);
-            tcg_gen_extu_i32_tl(cpu_regs[reg], s->tmp2_i32);
-            break;
 
         case 0x138:
         case 0x038:
@@ -5773,88 +5741,28 @@ GEN_INSN2(vmovhpd, Mq, Vdq)
     gen_insn2(movhpd, Mq, Vdq)(env, s, arg1, arg2);
 }
 
-DEF_GEN_INSN2_HELPER_DEP(pmovmskb, pmovmskb_mmx, Gd, Nq)
-GEN_INSN2(pmovmskb, Gq, Nq)
-{
-    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
-    gen_insn2(pmovmskb, Gd, Nq)(env, s, arg1_r32, arg2);
-    tcg_gen_extu_i32_i64(arg1, arg1_r32);
-    tcg_temp_free_i32(arg1_r32);
-}
-DEF_GEN_INSN2_HELPER_DEP(pmovmskb, pmovmskb_xmm, Gd, Udq)
-GEN_INSN2(pmovmskb, Gq, Udq)
-{
-    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
-    gen_insn2(pmovmskb, Gd, Udq)(env, s, arg1_r32, arg2);
-    tcg_gen_extu_i32_i64(arg1, arg1_r32);
-    tcg_temp_free_i32(arg1_r32);
-}
-DEF_GEN_INSN2_HELPER_DEP(vpmovmskb, pmovmskb_xmm, Gd, Udq)
-GEN_INSN2(vpmovmskb, Gq, Udq)
-{
-    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
-    gen_insn2(vpmovmskb, Gd, Udq)(env, s, arg1_r32, arg2);
-    tcg_gen_extu_i32_i64(arg1, arg1_r32);
-    tcg_temp_free_i32(arg1_r32);
-}
-DEF_GEN_INSN2_HELPER_DEP(vpmovmskb, pmovmskb_xmm, Gd, Uqq)
-GEN_INSN2(vpmovmskb, Gq, Uqq)
-{
-    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
-    gen_insn2(vpmovmskb, Gd, Uqq)(env, s, arg1_r32, arg2);
-    tcg_gen_extu_i32_i64(arg1, arg1_r32);
-    tcg_temp_free_i32(arg1_r32);
-}
+DEF_GEN_INSN2_GVEC(pmovmskb, Gd, Nq, sd1_ool, MM_OPRSZ, MM_MAXSZ, pmovmskbd_mmx)
+DEF_GEN_INSN2_GVEC(pmovmskb, Gq, Nq, sq1_ool, MM_OPRSZ, MM_MAXSZ, pmovmskbq_mmx)
+DEF_GEN_INSN2_GVEC(pmovmskb, Gd, Udq, sd1_ool, XMM_OPRSZ, XMM_MAXSZ, pmovmskbd_xmm)
+DEF_GEN_INSN2_GVEC(pmovmskb, Gq, Udq, sq1_ool, XMM_OPRSZ, XMM_MAXSZ, pmovmskbq_xmm)
+DEF_GEN_INSN2_GVEC(vpmovmskb, Gd, Udq, sd1_ool, XMM_OPRSZ, XMM_MAXSZ, pmovmskbd_xmm)
+DEF_GEN_INSN2_GVEC(vpmovmskb, Gq, Udq, sq1_ool, XMM_OPRSZ, XMM_MAXSZ, pmovmskbq_xmm)
+DEF_GEN_INSN2_GVEC(vpmovmskb, Gd, Uqq, sd1_ool, XMM_OPRSZ, XMM_MAXSZ, pmovmskbd_xmm)
+DEF_GEN_INSN2_GVEC(vpmovmskb, Gq, Uqq, sq1_ool, XMM_OPRSZ, XMM_MAXSZ, pmovmskbq_xmm)
 
-DEF_GEN_INSN2_HELPER_DEP(movmskps, movmskps, Gd, Udq)
-GEN_INSN2(movmskps, Gq, Udq)
-{
-    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
-    gen_insn2(movmskps, Gd, Udq)(env, s, arg1_r32, arg2);
-    tcg_gen_extu_i32_i64(arg1, arg1_r32);
-    tcg_temp_free_i32(arg1_r32);
-}
-DEF_GEN_INSN2_HELPER_DEP(vmovmskps, movmskps, Gd, Udq)
-GEN_INSN2(vmovmskps, Gq, Udq)
-{
-    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
-    gen_insn2(vmovmskps, Gd, Udq)(env, s, arg1_r32, arg2);
-    tcg_gen_extu_i32_i64(arg1, arg1_r32);
-    tcg_temp_free_i32(arg1_r32);
-}
-DEF_GEN_INSN2_HELPER_DEP(vmovmskps, movmskps, Gd, Uqq)
-GEN_INSN2(vmovmskps, Gq, Uqq)
-{
-    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
-    gen_insn2(vmovmskps, Gd, Uqq)(env, s, arg1_r32, arg2);
-    tcg_gen_extu_i32_i64(arg1, arg1_r32);
-    tcg_temp_free_i32(arg1_r32);
-}
+DEF_GEN_INSN2_GVEC(movmskps, Gd, Udq, sd1_ool, XMM_OPRSZ, XMM_MAXSZ, movmskpsd)
+DEF_GEN_INSN2_GVEC(movmskps, Gq, Udq, sq1_ool, XMM_OPRSZ, XMM_MAXSZ, movmskpsq)
+DEF_GEN_INSN2_GVEC(vmovmskps, Gd, Udq, sd1_ool, XMM_OPRSZ, XMM_MAXSZ, movmskpsd)
+DEF_GEN_INSN2_GVEC(vmovmskps, Gq, Udq, sq1_ool, XMM_OPRSZ, XMM_MAXSZ, movmskpsq)
+DEF_GEN_INSN2_GVEC(vmovmskps, Gd, Uqq, sd1_ool, XMM_OPRSZ, XMM_MAXSZ, movmskpsd)
+DEF_GEN_INSN2_GVEC(vmovmskps, Gq, Uqq, sq1_ool, XMM_OPRSZ, XMM_MAXSZ, movmskpsq)
 
-DEF_GEN_INSN2_HELPER_DEP(movmskpd, movmskpd, Gd, Udq)
-GEN_INSN2(movmskpd, Gq, Udq)
-{
-    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
-    gen_insn2(movmskpd, Gd, Udq)(env, s, arg1_r32, arg2);
-    tcg_gen_extu_i32_i64(arg1, arg1_r32);
-    tcg_temp_free_i32(arg1_r32);
-}
-DEF_GEN_INSN2_HELPER_DEP(vmovmskpd, movmskpd, Gd, Udq)
-GEN_INSN2(vmovmskpd, Gq, Udq)
-{
-    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
-    gen_insn2(vmovmskpd, Gd, Udq)(env, s, arg1_r32, arg2);
-    tcg_gen_extu_i32_i64(arg1, arg1_r32);
-    tcg_temp_free_i32(arg1_r32);
-}
-DEF_GEN_INSN2_HELPER_DEP(vmovmskpd, movmskpd, Gd, Uqq)
-GEN_INSN2(vmovmskpd, Gq, Uqq)
-{
-    const TCGv_i32 arg1_r32 = tcg_temp_new_i32();
-    gen_insn2(vmovmskpd, Gd, Uqq)(env, s, arg1_r32, arg2);
-    tcg_gen_extu_i32_i64(arg1, arg1_r32);
-    tcg_temp_free_i32(arg1_r32);
-}
+DEF_GEN_INSN2_GVEC(movmskpd, Gd, Udq, sd1_ool, XMM_OPRSZ, XMM_MAXSZ, movmskpdd)
+DEF_GEN_INSN2_GVEC(movmskpd, Gq, Udq, sq1_ool, XMM_OPRSZ, XMM_MAXSZ, movmskpdq)
+DEF_GEN_INSN2_GVEC(vmovmskpd, Gd, Udq, sd1_ool, XMM_OPRSZ, XMM_MAXSZ, movmskpdd)
+DEF_GEN_INSN2_GVEC(vmovmskpd, Gq, Udq, sq1_ool, XMM_OPRSZ, XMM_MAXSZ, movmskpdq)
+DEF_GEN_INSN2_GVEC(vmovmskpd, Gd, Uqq, sd1_ool, XMM_OPRSZ, XMM_MAXSZ, movmskpdd)
+DEF_GEN_INSN2_GVEC(vmovmskpd, Gq, Uqq, sq1_ool, XMM_OPRSZ, XMM_MAXSZ, movmskpdq)
 
 GEN_INSN2(lddqu, Vdq, Mdq)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 75/75] target/i386: convert pmovmskb/movmskps/movmskpd helpers to gvec style
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 75/75] target/i386: convert pmovmskb/movmskps/movmskpd " Jan Bobek
@ 2019-08-21 23:53   ` Richard Henderson
  0 siblings, 0 replies; 80+ messages in thread
From: Richard Henderson @ 2019-08-21 23:53 UTC (permalink / raw)
  To: Jan Bobek, qemu-devel; +Cc: Alex Bennée

On 8/21/19 10:29 AM, Jan Bobek wrote:
> +    for (intptr_t i = 0; i * sizeof(uint8_t) < oprsz; ++i) {
> +        const uint8_t t = a->B(i) & (1 << 7);
> +        ret |= i < 8 ? t >> (7 - i) : t << (i - 7);

You can avoid this variable shift by doing

  uint32_t t = a->B(i) >> 7;
  ret |= t << i;

> +uint64_t glue(helper_pmovmskbq, SUFFIX)(Reg *a, uint32_t desc)
> +{
> +    return glue(helper_pmovmskbd, SUFFIX)(a, desc);
>  }
...
> +DEF_GEN_INSN2_GVEC(vpmovmskb, Gd, Uqq, sd1_ool, XMM_OPRSZ, XMM_MAXSZ, pmovmskbd_xmm)
> +DEF_GEN_INSN2_GVEC(vpmovmskb, Gq, Uqq, sq1_ool, XMM_OPRSZ, XMM_MAXSZ, pmovmskbq_xmm)

What is the difference between these two?

Given that we aren't attempting avx512, uint32_t is sufficient for all of the
bytes of a YMM register.

I have a feeling that some of this should simply use target_ulong, so that a
direct assignment to the general register can be done without extra extensions
within the generated code.


r~


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 64/75] target/i386: introduce AVX2 vector instructions to sse-opcode.inc.h
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 64/75] target/i386: introduce AVX2 vector instructions to sse-opcode.inc.h Jan Bobek
@ 2019-08-22  3:54   ` Aleksandar Markovic
  0 siblings, 0 replies; 80+ messages in thread
From: Aleksandar Markovic @ 2019-08-22  3:54 UTC (permalink / raw)
  To: Jan Bobek; +Cc: Alex Bennée, Richard Henderson, qemu-devel

21.08.2019. 20.49, "Jan Bobek" <jan.bobek@gmail.com> је написао/ла:
>
> Add all the AVX2 vector instruction entries to sse-opcode.inc.h.
>

Why is AVX-related code inserted in a file whose name says SSE? Perhaps the
file should be named vector-opcode.inc.h?

Also, some vector extensions contain non-vector instructions. Even if you
don't intend to implement emulation of such instructions, you should
mention them in places like this file, since this file mainly deals with
decoding, which is known to you. Without it, you leave the file unfinished
without a good reason.

Thanks,
Aleksandar

> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
> ---
>  target/i386/sse-opcode.inc.h | 362 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 359 insertions(+), 3 deletions(-)
>
> diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
> index c3c0ec4f89..abbb0a15d7 100644
> --- a/target/i386/sse-opcode.inc.h
> +++ b/target/i386/sse-opcode.inc.h
> @@ -855,6 +855,181 @@
>   * VEX.128.66.0F.WIG 73 /3 ib      VPSRLDQ xmm1, xmm2, imm8
>   * VEX.LZ.0F.WIG AE /2             VLDMXCSR m32
>   * VEX.LZ.0F.WIG AE /3             VSTMXCSR m32
> + *
> + * AVX2 Instructions
> + * ------------------
> + * VEX.256.66.0F.W0 D7 /r          VPMOVMSKB r32, ymm1
> + * VEX.256.66.0F.W1 D7 /r          VPMOVMSKB r64, ymm1
> + * VEX.256.66.0F.WIG FC /r         VPADDB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG FD /r         VPADDW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG FE /r         VPADDD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG D4 /r         VPADDQ ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG EC /r         VPADDSB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG ED /r         VPADDSW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG DC /r         VPADDUSB ymm1,ymm2,ymm3/m256
> + * VEX.256.66.0F.WIG DD /r         VPADDUSW ymm1,ymm2,ymm3/m256
> + * VEX.256.66.0F38.WIG 01 /r       VPHADDW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 02 /r       VPHADDD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 03 /r       VPHADDSW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG F8 /r         VPSUBB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG F9 /r         VPSUBW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG FA /r         VPSUBD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG FB /r         VPSUBQ ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG E8 /r         VPSUBSB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG E9 /r         VPSUBSW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG D8 /r         VPSUBUSB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG D9 /r         VPSUBUSW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 05 /r       VPHSUBW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 06 /r       VPHSUBD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 07 /r       VPHSUBSW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG D5 /r         VPMULLW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 40 /r       VPMULLD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG E5 /r         VPMULHW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG E4 /r         VPMULHUW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 28 /r       VPMULDQ ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG F4 /r         VPMULUDQ ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 0B /r       VPMULHRSW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG F5 /r         VPMADDWD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 04 /r       VPMADDUBSW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F DA /r             VPMINUB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38 3A /r           VPMINUW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 3B /r       VPMINUD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38 38 /r           VPMINSB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F EA /r             VPMINSW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 39 /r       VPMINSD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F DE /r             VPMAXUB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38 3E /r           VPMAXUW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 3F /r       VPMAXUD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 3C /r       VPMAXSB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG EE /r         VPMAXSW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 3D /r       VPMAXSD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG E0 /r         VPAVGB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG E3 /r         VPAVGW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG F6 /r         VPSADBW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F3A.WIG 42 /r ib    VMPSADBW ymm1, ymm2, ymm3/m256, imm8
> + * VEX.256.66.0F38.WIG 1C /r       VPABSB ymm1, ymm2/m256
> + * VEX.256.66.0F38.WIG 1D /r       VPABSW ymm1, ymm2/m256
> + * VEX.256.66.0F38.WIG 1E /r       VPABSD ymm1, ymm2/m256
> + * VEX.256.66.0F38.WIG 08 /r       VPSIGNB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 09 /r       VPSIGNW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 0A /r       VPSIGND ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG 74 /r         VPCMPEQB ymm1,ymm2,ymm3/m256
> + * VEX.256.66.0F.WIG 75 /r         VPCMPEQW ymm1,ymm2,ymm3/m256
> + * VEX.256.66.0F.WIG 76 /r         VPCMPEQD ymm1,ymm2,ymm3/m256
> + * VEX.256.66.0F38.WIG 29 /r       VPCMPEQQ ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG 64 /r         VPCMPGTB ymm1,ymm2,ymm3/m256
> + * VEX.256.66.0F.WIG 65 /r         VPCMPGTW ymm1,ymm2,ymm3/m256
> + * VEX.256.66.0F.WIG 66 /r         VPCMPGTD ymm1,ymm2,ymm3/m256
> + * VEX.256.66.0F38.WIG 37 /r       VPCMPGTQ ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG DB /r         VPAND ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG DF /r         VPANDN ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG EB /r         VPOR ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG EF /r         VPXOR ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG F1 /r         VPSLLW ymm1, ymm2, xmm3/m128
> + * VEX.256.66.0F.WIG F2 /r         VPSLLD ymm1, ymm2, xmm3/m128
> + * VEX.256.66.0F.WIG F3 /r         VPSLLQ ymm1, ymm2, xmm3/m128
> + * VEX.128.66.0F38.W0 47 /r        VPSLLVD xmm1, xmm2, xmm3/m128
> + * VEX.256.66.0F38.W0 47 /r        VPSLLVD ymm1, ymm2, ymm3/m256
> + * VEX.128.66.0F38.W1 47 /r        VPSLLVQ xmm1, xmm2, xmm3/m128
> + * VEX.256.66.0F38.W1 47 /r        VPSLLVQ ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG D1 /r         VPSRLW ymm1, ymm2, xmm3/m128
> + * VEX.256.66.0F.WIG D2 /r         VPSRLD ymm1, ymm2, xmm3/m128
> + * VEX.256.66.0F.WIG D3 /r         VPSRLQ ymm1, ymm2, xmm3/m128
> + * VEX.128.66.0F38.W0 45 /r        VPSRLVD xmm1, xmm2, xmm3/m128
> + * VEX.256.66.0F38.W0 45 /r        VPSRLVD ymm1, ymm2, ymm3/m256
> + * VEX.128.66.0F38.W1 45 /r        VPSRLVQ xmm1, xmm2, xmm3/m128
> + * VEX.256.66.0F38.W1 45 /r        VPSRLVQ ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG E1 /r         VPSRAW ymm1,ymm2,xmm3/m128
> + * VEX.256.66.0F.WIG E2 /r         VPSRAD ymm1,ymm2,xmm3/m128
> + * VEX.128.66.0F38.W0 46 /r        VPSRAVD xmm1, xmm2, xmm3/m128
> + * VEX.256.66.0F38.W0 46 /r        VPSRAVD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F3A.WIG 0F /r ib    VPALIGNR ymm1, ymm2, ymm3/m256, imm8
> + * VEX.256.66.0F.WIG 63 /r         VPACKSSWB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG 6B /r         VPACKSSDW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG 67 /r         VPACKUSWB ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38 2B /r           VPACKUSDW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG 68 /r         VPUNPCKHBW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG 69 /r         VPUNPCKHWD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG 6A /r         VPUNPCKHDQ ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG 6D /r         VPUNPCKHQDQ ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG 60 /r         VPUNPCKLBW ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG 61 /r         VPUNPCKLWD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG 62 /r         VPUNPCKLDQ ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F.WIG 6C /r         VPUNPCKLQDQ ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.WIG 00 /r       VPSHUFB ymm1, ymm2, ymm3/m256
> + * VEX.256.F2.0F.WIG 70 /r ib      VPSHUFLW ymm1, ymm2/m256, imm8
> + * VEX.256.F3.0F.WIG 70 /r ib      VPSHUFHW ymm1, ymm2/m256, imm8
> + * VEX.256.66.0F.WIG 70 /r ib      VPSHUFD ymm1, ymm2/m256, imm8
> + * VEX.256.66.0F3A.W0 4C /r /is4   VPBLENDVB ymm1, ymm2, ymm3/m256, ymm4
> + * VEX.256.66.0F3A.WIG 0E /r ib    VPBLENDW ymm1, ymm2, ymm3/m256, imm8
> + * VEX.128.66.0F3A.W0 02 /r ib     VPBLENDD xmm1, xmm2, xmm3/m128, imm8
> + * VEX.256.66.0F3A.W0 02 /r ib     VPBLENDD ymm1, ymm2, ymm3/m256, imm8
> + * VEX.256.66.0F3A.W0 38 /r ib     VINSERTI128 ymm1, ymm2, xmm3/m128,
imm8
> + * VEX.256.66.0F3A.W0 39 /r ib     VEXTRACTI128 xmm1/m128, ymm2, imm8
> + * VEX.128.66.0F38.W0 78 /r        VPBROADCASTB xmm1,xmm2/m8
> + * VEX.256.66.0F38.W0 78 /r        VPBROADCASTB ymm1,xmm2/m8
> + * VEX.128.66.0F38.W0 79 /r        VPBROADCASTW xmm1,xmm2/m16
> + * VEX.256.66.0F38.W0 79 /r        VPBROADCASTW ymm1,xmm2/m16
> + * VEX.128.66.0F38.W0 58 /r        VPBROADCASTD xmm1,xmm2/m32
> + * VEX.256.66.0F38.W0 58 /r        VPBROADCASTD ymm1,xmm2/m32
> + * VEX.128.66.0F38.W0 59 /r        VPBROADCASTQ xmm1,xmm2/m64
> + * VEX.256.66.0F38.W0 59 /r        VPBROADCASTQ ymm1,xmm2/m64
> + * VEX.128.66.0F38.W0 18 /r        VBROADCASTSS xmm1, xmm2
> + * VEX.256.66.0F38.W0 18 /r        VBROADCASTSS ymm1, xmm2
> + * VEX.256.66.0F38.W0 19 /r        VBROADCASTSD ymm1, xmm2
> + * VEX.256.66.0F38.W0 5A /r        VBROADCASTI128 ymm1,m128
> + * VEX.256.66.0F3A.W0 46 /r ib     VPERM2I128 ymm1, ymm2, ymm3/m256, imm8
> + * VEX.256.66.0F38.W0 36 /r        VPERMD ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F38.W0 16 /r        VPERMPS ymm1, ymm2, ymm3/m256
> + * VEX.256.66.0F3A.W1 00 /r ib     VPERMQ ymm1, ymm2/m256, imm8
> + * VEX.256.66.0F3A.W1 01 /r ib     VPERMPD ymm1, ymm2/m256, imm8
> + * VEX.128.66.0F38.W0 92 /r        VGATHERDPS xmm1, vm32x, xmm2
> + * VEX.256.66.0F38.W0 92 /r        VGATHERDPS ymm1, vm32y, ymm2
> + * VEX.128.66.0F38.W1 92 /r        VGATHERDPD xmm1, vm32x, xmm2
> + * VEX.256.66.0F38.W1 92 /r        VGATHERDPD ymm1, vm32x, ymm2
> + * VEX.128.66.0F38.W0 93 /r        VGATHERQPS xmm1, vm64x, xmm2
> + * VEX.256.66.0F38.W0 93 /r        VGATHERQPS xmm1, vm64y, xmm2
> + * VEX.128.66.0F38.W1 93 /r        VGATHERQPD xmm1, vm64x, xmm2
> + * VEX.256.66.0F38.W1 93 /r        VGATHERQPD ymm1, vm64y, ymm2
> + * VEX.128.66.0F38.W0 90 /r        VPGATHERDD xmm1, vm32x, xmm2
> + * VEX.256.66.0F38.W0 90 /r        VPGATHERDD ymm1, vm32y, ymm2
> + * VEX.128.66.0F38.W1 90 /r        VPGATHERDQ xmm1, vm32x, xmm2
> + * VEX.256.66.0F38.W1 90 /r        VPGATHERDQ ymm1, vm32x, ymm2
> + * VEX.128.66.0F38.W0 91 /r        VPGATHERQD xmm1, vm64x, xmm2
> + * VEX.256.66.0F38.W0 91 /r        VPGATHERQD xmm1, vm64y, xmm2
> + * VEX.128.66.0F38.W1 91 /r        VPGATHERQQ xmm1, vm64x, xmm2
> + * VEX.256.66.0F38.W1 91 /r        VPGATHERQQ ymm1, vm64y, ymm2
> + * VEX.256.66.0F38.WIG 20 /r       VPMOVSXBW ymm1, xmm2/m128
> + * VEX.256.66.0F38.WIG 21 /r       VPMOVSXBD ymm1, xmm2/m64
> + * VEX.256.66.0F38.WIG 22 /r       VPMOVSXBQ ymm1, xmm2/m32
> + * VEX.256.66.0F38.WIG 23 /r       VPMOVSXWD ymm1, xmm2/m128
> + * VEX.256.66.0F38.WIG 24 /r       VPMOVSXWQ ymm1, xmm2/m64
> + * VEX.256.66.0F38.WIG 25 /r       VPMOVSXDQ ymm1, xmm2/m128
> + * VEX.256.66.0F38.WIG 30 /r       VPMOVZXBW ymm1, xmm2/m128
> + * VEX.256.66.0F38.WIG 31 /r       VPMOVZXBD ymm1, xmm2/m64
> + * VEX.256.66.0F38.WIG 32 /r       VPMOVZXBQ ymm1, xmm2/m32
> + * VEX.256.66.0F38.WIG 33 /r       VPMOVZXWD ymm1, xmm2/m128
> + * VEX.256.66.0F38.WIG 34 /r       VPMOVZXWQ ymm1, xmm2/m64
> + * VEX.256.66.0F38.WIG 35 /r       VPMOVZXDQ ymm1, xmm2/m128
> + * VEX.128.66.0F38.W0 8C /r        VPMASKMOVD xmm1, xmm2, m128
> + * VEX.128.66.0F38.W0 8E /r        VPMASKMOVD m128, xmm1, xmm2
> + * VEX.256.66.0F38.W0 8C /r        VPMASKMOVD ymm1, ymm2, m256
> + * VEX.256.66.0F38.W0 8E /r        VPMASKMOVD m256, ymm1, ymm2
> + * VEX.128.66.0F38.W1 8C /r        VPMASKMOVQ xmm1, xmm2, m128
> + * VEX.128.66.0F38.W1 8E /r        VPMASKMOVQ m128, xmm1, xmm2
> + * VEX.256.66.0F38.W1 8C /r        VPMASKMOVQ ymm1, ymm2, m256
> + * VEX.256.66.0F38.W1 8E /r        VPMASKMOVQ m256, ymm1, ymm2
> + * VEX.256.66.0F38.WIG 2A /r       VMOVNTDQA ymm1, m256
> + * VEX.256.66.0F.WIG 71 /6 ib      VPSLLW ymm1, ymm2, imm8
> + * VEX.256.66.0F.WIG 71 /2 ib      VPSRLW ymm1, ymm2, imm8
> + * VEX.256.66.0F.WIG 71 /4 ib      VPSRAW ymm1,ymm2,imm8
> + * VEX.256.66.0F.WIG 72 /6 ib      VPSLLD ymm1, ymm2, imm8
> + * VEX.256.66.0F.WIG 72 /2 ib      VPSRLD ymm1, ymm2, imm8
> + * VEX.256.66.0F.WIG 72 /4 ib      VPSRAD ymm1,ymm2,imm8
> + * VEX.256.66.0F.WIG 73 /6 ib      VPSLLQ ymm1, ymm2, imm8
> + * VEX.256.66.0F.WIG 73 /7 ib      VPSLLDQ ymm1, ymm2, imm8
> + * VEX.256.66.0F.WIG 73 /2 ib      VPSRLQ ymm1, ymm2, imm8
> + * VEX.256.66.0F.WIG 73 /3 ib      VPSRLDQ ymm1, ymm2, imm8
>   */
>
>  OPCODE(movd, LEG(NP, 0F, 0, 0x6e), MMX, WR, Pq, Ed)
> @@ -943,6 +1118,8 @@ OPCODE(pmovmskb, LEG(66, 0F, 0, 0xd7), SSE2, WR, Gd,
Udq)
>  OPCODE(pmovmskb, LEG(66, 0F, 1, 0xd7), SSE2, WR, Gq, Udq)
>  OPCODE(vpmovmskb, VEX(128, 66, 0F, 0, 0xd7), AVX, WR, Gd, Udq)
>  OPCODE(vpmovmskb, VEX(128, 66, 0F, 1, 0xd7), AVX, WR, Gq, Udq)
> +OPCODE(vpmovmskb, VEX(256, 66, 0F, 0, 0xd7), AVX2, WR, Gd, Uqq)
> +OPCODE(vpmovmskb, VEX(256, 66, 0F, 1, 0xd7), AVX2, WR, Gq, Uqq)
>  OPCODE(movmskps, LEG(NP, 0F, 0, 0x50), SSE, WR, Gd, Udq)
>  OPCODE(movmskps, LEG(NP, 0F, 1, 0x50), SSE, WR, Gq, Udq)
>  OPCODE(vmovmskps, VEX(128, NP, 0F, 0, 0x50), AVX, WR, Gd, Udq)
> @@ -970,27 +1147,35 @@ OPCODE(vmovddup, VEX(256, F2, 0F, IG, 0x12), AVX,
WR, Vqq, Wqq)
>  OPCODE(paddb, LEG(NP, 0F, 0, 0xfc), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(paddb, LEG(66, 0F, 0, 0xfc), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpaddb, VEX(128, 66, 0F, IG, 0xfc), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpaddb, VEX(256, 66, 0F, IG, 0xfc), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(paddw, LEG(NP, 0F, 0, 0xfd), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(paddw, LEG(66, 0F, 0, 0xfd), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpaddw, VEX(128, 66, 0F, IG, 0xfd), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpaddw, VEX(256, 66, 0F, IG, 0xfd), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(paddd, LEG(NP, 0F, 0, 0xfe), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(paddd, LEG(66, 0F, 0, 0xfe), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpaddd, VEX(128, 66, 0F, IG, 0xfe), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpaddd, VEX(256, 66, 0F, IG, 0xfe), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(paddq, LEG(NP, 0F, 0, 0xd4), SSE2, WRR, Pq, Pq, Qq)
>  OPCODE(paddq, LEG(66, 0F, 0, 0xd4), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpaddq, VEX(128, 66, 0F, IG, 0xd4), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpaddq, VEX(256, 66, 0F, IG, 0xd4), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(paddsb, LEG(NP, 0F, 0, 0xec), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(paddsb, LEG(66, 0F, 0, 0xec), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpaddsb, VEX(128, 66, 0F, IG, 0xec), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpaddsb, VEX(256, 66, 0F, IG, 0xec), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(paddsw, LEG(NP, 0F, 0, 0xed), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(paddsw, LEG(66, 0F, 0, 0xed), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpaddsw, VEX(128, 66, 0F, IG, 0xed), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpaddsw, VEX(256, 66, 0F, IG, 0xed), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(paddusb, LEG(NP, 0F, 0, 0xdc), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(paddusb, LEG(66, 0F, 0, 0xdc), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpaddusb, VEX(128, 66, 0F, IG, 0xdc), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpaddusb, VEX(256, 66, 0F, IG, 0xdc), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(paddusw, LEG(NP, 0F, 0, 0xdd), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(paddusw, LEG(66, 0F, 0, 0xdd), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpaddusw, VEX(128, 66, 0F, IG, 0xdd), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpaddusw, VEX(256, 66, 0F, IG, 0xdd), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(addps, LEG(NP, 0F, 0, 0x58), SSE, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vaddps, VEX(128, NP, 0F, IG, 0x58), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vaddps, VEX(256, NP, 0F, IG, 0x58), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1004,12 +1189,15 @@ OPCODE(vaddsd, VEX(IG, F2, 0F, IG, 0x58), AVX,
WRR, Vq, Hq, Wq)
>  OPCODE(phaddw, LEG(NP, 0F38, 0, 0x01), SSSE3, WRR, Pq, Pq, Qq)
>  OPCODE(phaddw, LEG(66, 0F38, 0, 0x01), SSSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vphaddw, VEX(128, 66, 0F38, IG, 0x01), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vphaddw, VEX(256, 66, 0F38, IG, 0x01), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(phaddd, LEG(NP, 0F38, 0, 0x02), SSSE3, WRR, Pq, Pq, Qq)
>  OPCODE(phaddd, LEG(66, 0F38, 0, 0x02), SSSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vphaddd, VEX(128, 66, 0F38, IG, 0x02), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vphaddd, VEX(256, 66, 0F38, IG, 0x02), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(phaddsw, LEG(NP, 0F38, 0, 0x03), SSSE3, WRR, Pq, Pq, Qq)
>  OPCODE(phaddsw, LEG(66, 0F38, 0, 0x03), SSSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vphaddsw, VEX(128, 66, 0F38, IG, 0x03), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vphaddsw, VEX(256, 66, 0F38, IG, 0x03), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(haddps, LEG(F2, 0F, 0, 0x7c), SSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vhaddps, VEX(128, F2, 0F, IG, 0x7c), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vhaddps, VEX(256, F2, 0F, IG, 0x7c), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1019,27 +1207,35 @@ OPCODE(vhaddpd, VEX(256, 66, 0F, IG, 0x7c), AVX,
WRR, Vqq, Hqq, Wqq)
>  OPCODE(psubb, LEG(NP, 0F, 0, 0xf8), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psubb, LEG(66, 0F, 0, 0xf8), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsubb, VEX(128, 66, 0F, IG, 0xf8), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsubb, VEX(256, 66, 0F, IG, 0xf8), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(psubw, LEG(NP, 0F, 0, 0xf9), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psubw, LEG(66, 0F, 0, 0xf9), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsubw, VEX(128, 66, 0F, IG, 0xf9), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsubw, VEX(256, 66, 0F, IG, 0xf9), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(psubd, LEG(NP, 0F, 0, 0xfa), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psubd, LEG(66, 0F, 0, 0xfa), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsubd, VEX(128, 66, 0F, IG, 0xfa), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsubd, VEX(256, 66, 0F, IG, 0xfa), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(psubq, LEG(NP, 0F, 0, 0xfb), SSE2, WRR, Pq, Pq, Qq)
>  OPCODE(psubq, LEG(66, 0F, 0, 0xfb), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsubq, VEX(128, 66, 0F, IG, 0xfb), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsubq, VEX(256, 66, 0F, IG, 0xfb), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(psubsb, LEG(NP, 0F, 0, 0xe8), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psubsb, LEG(66, 0F, 0, 0xe8), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsubsb, VEX(128, 66, 0F, IG, 0xe8), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsubsb, VEX(256, 66, 0F, IG, 0xe8), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(psubsw, LEG(NP, 0F, 0, 0xe9), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psubsw, LEG(66, 0F, 0, 0xe9), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsubsw, VEX(128, 66, 0F, IG, 0xe9), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsubsw, VEX(256, 66, 0F, IG, 0xe9), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(psubusb, LEG(NP, 0F, 0, 0xd8), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psubusb, LEG(66, 0F, 0, 0xd8), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsubusb, VEX(128, 66, 0F, IG, 0xd8), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsubusb, VEX(256, 66, 0F, IG, 0xd8), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(psubusw, LEG(NP, 0F, 0, 0xd9), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psubusw, LEG(66, 0F, 0, 0xd9), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsubusw, VEX(128, 66, 0F, IG, 0xd9), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsubusw, VEX(256, 66, 0F, IG, 0xd9), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(subps, LEG(NP, 0F, 0, 0x5c), SSE, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vsubps, VEX(128, NP, 0F, IG, 0x5c), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vsubps, VEX(256, NP, 0F, IG, 0x5c), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1053,12 +1249,15 @@ OPCODE(vsubsd, VEX(IG, F2, 0F, IG, 0x5c), AVX,
WRR, Vq, Hq, Wq)
>  OPCODE(phsubw, LEG(NP, 0F38, 0, 0x05), SSSE3, WRR, Pq, Pq, Qq)
>  OPCODE(phsubw, LEG(66, 0F38, 0, 0x05), SSSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vphsubw, VEX(128, 66, 0F38, IG, 0x05), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vphsubw, VEX(256, 66, 0F38, IG, 0x05), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(phsubd, LEG(NP, 0F38, 0, 0x06), SSSE3, WRR, Pq, Pq, Qq)
>  OPCODE(phsubd, LEG(66, 0F38, 0, 0x06), SSSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vphsubd, VEX(128, 66, 0F38, IG, 0x06), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vphsubd, VEX(256, 66, 0F38, IG, 0x06), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(phsubsw, LEG(NP, 0F38, 0, 0x07), SSSE3, WRR, Pq, Pq, Qq)
>  OPCODE(phsubsw, LEG(66, 0F38, 0, 0x07), SSSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vphsubsw, VEX(128, 66, 0F38, IG, 0x07), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vphsubsw, VEX(256, 66, 0F38, IG, 0x07), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(hsubps, LEG(F2, 0F, 0, 0x7d), SSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vhsubps, VEX(128, F2, 0F, IG, 0x7d), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vhsubps, VEX(256, F2, 0F, IG, 0x7d), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1074,22 +1273,29 @@ OPCODE(vaddsubpd, VEX(256, 66, 0F, IG, 0xd0),
AVX, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmullw, LEG(NP, 0F, 0, 0xd5), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pmullw, LEG(66, 0F, 0, 0xd5), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmullw, VEX(128, 66, 0F, IG, 0xd5), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmullw, VEX(256, 66, 0F, IG, 0xd5), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmulld, LEG(66, 0F38, 0, 0x40), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmulld, VEX(128, 66, 0F38, IG, 0x40), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmulld, VEX(256, 66, 0F38, IG, 0x40), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmulhw, LEG(NP, 0F, 0, 0xe5), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pmulhw, LEG(66, 0F, 0, 0xe5), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmulhw, VEX(128, 66, 0F, IG, 0xe5), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmulhw, VEX(256, 66, 0F, IG, 0xe5), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmulhuw, LEG(NP, 0F, 0, 0xe4), SSE, WRR, Pq, Pq, Qq)
>  OPCODE(pmulhuw, LEG(66, 0F, 0, 0xe4), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmulhuw, VEX(128, 66, 0F, IG, 0xe4), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmulhuw, VEX(256, 66, 0F, IG, 0xe4), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmuldq, LEG(66, 0F38, 0, 0x28), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmuldq, VEX(128, 66, 0F38, IG, 0x28), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmuldq, VEX(256, 66, 0F38, IG, 0x28), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmuludq, LEG(NP, 0F, 0, 0xf4), SSE2, WRR, Pq, Pq, Qq)
>  OPCODE(pmuludq, LEG(66, 0F, 0, 0xf4), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmuludq, VEX(128, 66, 0F, IG, 0xf4), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmuludq, VEX(256, 66, 0F, IG, 0xf4), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmulhrsw, LEG(NP, 0F38, 0, 0x0b), SSSE3, WRR, Pq, Pq, Qq)
>  OPCODE(pmulhrsw, LEG(66, 0F38, 0, 0x0b), SSSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmulhrsw, VEX(128, 66, 0F38, IG, 0x0b), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmulhrsw, VEX(256, 66, 0F38, IG, 0x0b), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(mulps, LEG(NP, 0F, 0, 0x59), SSE, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vmulps, VEX(128, NP, 0F, IG, 0x59), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vmulps, VEX(256, NP, 0F, IG, 0x59), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1103,9 +1309,11 @@ OPCODE(vmulsd, VEX(IG, F2, 0F, IG, 0x59), AVX,
WRR, Vq, Hq, Wq)
>  OPCODE(pmaddwd, LEG(NP, 0F, 0, 0xf5), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pmaddwd, LEG(66, 0F, 0, 0xf5), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmaddwd, VEX(128, 66, 0F, IG, 0xf5), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmaddwd, VEX(256, 66, 0F, IG, 0xf5), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmaddubsw, LEG(NP, 0F38, 0, 0x04), SSSE3, WRR, Pq, Pq, Qq)
>  OPCODE(pmaddubsw, LEG(66, 0F38, 0, 0x04), SSSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmaddubsw, VEX(128, 66, 0F38, IG, 0x04), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmaddubsw, VEX(256, 66, 0F38, IG, 0x04), AVX2, WRR, Vqq, Hqq,
Wqq)
>  OPCODE(divps, LEG(NP, 0F, 0, 0x5e), SSE, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vdivps, VEX(128, NP, 0F, IG, 0x5e), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vdivps, VEX(256, NP, 0F, IG, 0x5e), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1139,17 +1347,23 @@ OPCODE(vrsqrtss, VEX(IG, F3, 0F, IG, 0x52), AVX,
WRR, Vd, Hd, Wd)
>  OPCODE(pminub, LEG(NP, 0F, 0, 0xda), SSE, WRR, Pq, Pq, Qq)
>  OPCODE(pminub, LEG(66, 0F, 0, 0xda), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpminub, VEX(128, 66, 0F, IG, 0xda), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpminub, VEX(256, 66, 0F, IG, 0xda), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pminuw, LEG(66, 0F38, 0, 0x3a), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpminuw, VEX(128, 66, 0F38, IG, 0x3a), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpminuw, VEX(256, 66, 0F38, IG, 0x3a), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pminud, LEG(66, 0F38, 0, 0x3b), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpminud, VEX(128, 66, 0F38, IG, 0x3b), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpminud, VEX(256, 66, 0F38, IG, 0x3b), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pminsb, LEG(66, 0F38, 0, 0x38), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpminsb, VEX(128, 66, 0F38, IG, 0x38), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpminsb, VEX(256, 66, 0F38, IG, 0x38), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pminsw, LEG(NP, 0F, 0, 0xea), SSE, WRR, Pq, Pq, Qq)
>  OPCODE(pminsw, LEG(66, 0F, 0, 0xea), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpminsw, VEX(128, 66, 0F, IG, 0xea), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpminsw, VEX(256, 66, 0F, IG, 0xea), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pminsd, LEG(66, 0F38, 0, 0x39), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpminsd, VEX(128, 66, 0F38, IG, 0x39), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpminsd, VEX(256, 66, 0F38, IG, 0x39), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(minps, LEG(NP, 0F, 0, 0x5d), SSE, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vminps, VEX(128, NP, 0F, IG, 0x5d), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vminps, VEX(256, NP, 0F, IG, 0x5d), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1165,17 +1379,23 @@ OPCODE(vphminposuw, VEX(128, 66, 0F38, IG, 0x41),
AVX, WR, Vdq, Wdq)
>  OPCODE(pmaxub, LEG(NP, 0F, 0, 0xde), SSE, WRR, Pq, Pq, Qq)
>  OPCODE(pmaxub, LEG(66, 0F, 0, 0xde), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmaxub, VEX(128, 66, 0F, IG, 0xde), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmaxub, VEX(256, 66, 0F, IG, 0xde), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmaxuw, LEG(66, 0F38, 0, 0x3e), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmaxuw, VEX(128, 66, 0F38, IG, 0x3e), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmaxuw, VEX(256, 66, 0F38, IG, 0x3e), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmaxud, LEG(66, 0F38, 0, 0x3f), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmaxud, VEX(128, 66, 0F38, IG, 0x3f), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmaxud, VEX(256, 66, 0F38, IG, 0x3f), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmaxsb, LEG(66, 0F38, 0, 0x3c), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmaxsb, VEX(128, 66, 0F38, IG, 0x3c), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmaxsb, VEX(256, 66, 0F38, IG, 0x3c), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmaxsw, LEG(NP, 0F, 0, 0xee), SSE, WRR, Pq, Pq, Qq)
>  OPCODE(pmaxsw, LEG(66, 0F, 0, 0xee), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmaxsw, VEX(128, 66, 0F, IG, 0xee), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmaxsw, VEX(256, 66, 0F, IG, 0xee), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pmaxsd, LEG(66, 0F38, 0, 0x3d), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpmaxsd, VEX(128, 66, 0F38, IG, 0x3d), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpmaxsd, VEX(256, 66, 0F38, IG, 0x3d), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(maxps, LEG(NP, 0F, 0, 0x5f), SSE, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vmaxps, VEX(128, NP, 0F, IG, 0x5f), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vmaxps, VEX(256, NP, 0F, IG, 0x5f), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1189,32 +1409,42 @@ OPCODE(vmaxsd, VEX(IG, F2, 0F, IG, 0x5f), AVX,
WRR, Vq, Hq, Wq)
>  OPCODE(pavgb, LEG(NP, 0F, 0, 0xe0), SSE, WRR, Pq, Pq, Qq)
>  OPCODE(pavgb, LEG(66, 0F, 0, 0xe0), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpavgb, VEX(128, 66, 0F, IG, 0xe0), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpavgb, VEX(256, 66, 0F, IG, 0xe0), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pavgw, LEG(NP, 0F, 0, 0xe3), SSE, WRR, Pq, Pq, Qq)
>  OPCODE(pavgw, LEG(66, 0F, 0, 0xe3), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpavgw, VEX(128, 66, 0F, IG, 0xe3), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpavgw, VEX(256, 66, 0F, IG, 0xe3), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(psadbw, LEG(NP, 0F, 0, 0xf6), SSE, WRR, Pq, Pq, Qq)
>  OPCODE(psadbw, LEG(66, 0F, 0, 0xf6), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsadbw, VEX(128, 66, 0F, IG, 0xf6), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsadbw, VEX(256, 66, 0F, IG, 0xf6), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(mpsadbw, LEG(66, 0F3A, 0, 0x42), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
>  OPCODE(vmpsadbw, VEX(128, 66, 0F3A, IG, 0x42), AVX, WRRR, Vdq, Hdq, Wdq,
Ib)
> +OPCODE(vmpsadbw, VEX(256, 66, 0F3A, IG, 0x42), AVX2, WRRR, Vqq, Hqq,
Wqq, Ib)
>  OPCODE(pabsb, LEG(NP, 0F38, 0, 0x1c), SSSE3, WR, Pq, Qq)
>  OPCODE(pabsb, LEG(66, 0F38, 0, 0x1c), SSSE3, WR, Vdq, Wdq)
>  OPCODE(vpabsb, VEX(128, 66, 0F38, IG, 0x1c), AVX, WR, Vdq, Wdq)
> +OPCODE(vpabsb, VEX(256, 66, 0F38, IG, 0x1c), AVX2, WR, Vqq, Wqq)
>  OPCODE(pabsw, LEG(NP, 0F38, 0, 0x1d), SSSE3, WR, Pq, Qq)
>  OPCODE(pabsw, LEG(66, 0F38, 0, 0x1d), SSSE3, WR, Vdq, Wdq)
>  OPCODE(vpabsw, VEX(128, 66, 0F38, IG, 0x1d), AVX, WR, Vdq, Wdq)
> +OPCODE(vpabsw, VEX(256, 66, 0F38, IG, 0x1d), AVX2, WR, Vqq, Wqq)
>  OPCODE(pabsd, LEG(NP, 0F38, 0, 0x1e), SSSE3, WR, Pq, Qq)
>  OPCODE(pabsd, LEG(66, 0F38, 0, 0x1e), SSSE3, WR, Vdq, Wdq)
>  OPCODE(vpabsd, VEX(128, 66, 0F38, IG, 0x1e), AVX, WR, Vdq, Wdq)
> +OPCODE(vpabsd, VEX(256, 66, 0F38, IG, 0x1e), AVX2, WR, Vqq, Wqq)
>  OPCODE(psignb, LEG(NP, 0F38, 0, 0x08), SSSE3, WRR, Pq, Pq, Qq)
>  OPCODE(psignb, LEG(66, 0F38, 0, 0x08), SSSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsignb, VEX(128, 66, 0F38, IG, 0x08), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsignb, VEX(256, 66, 0F38, IG, 0x08), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(psignw, LEG(NP, 0F38, 0, 0x09), SSSE3, WRR, Pq, Pq, Qq)
>  OPCODE(psignw, LEG(66, 0F38, 0, 0x09), SSSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsignw, VEX(128, 66, 0F38, IG, 0x09), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsignw, VEX(256, 66, 0F38, IG, 0x09), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(psignd, LEG(NP, 0F38, 0, 0x0a), SSSE3, WRR, Pq, Pq, Qq)
>  OPCODE(psignd, LEG(66, 0F38, 0, 0x0a), SSSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsignd, VEX(128, 66, 0F38, IG, 0x0a), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsignd, VEX(256, 66, 0F38, IG, 0x0a), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(dpps, LEG(66, 0F3A, 0, 0x40), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
>  OPCODE(vdpps, VEX(128, 66, 0F3A, IG, 0x40), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
>  OPCODE(vdpps, VEX(256, 66, 0F3A, IG, 0x40), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
> @@ -1247,25 +1477,33 @@ OPCODE(vpclmulqdq, VEX(128, 66, 0F3A, IG, 0x44),
PCLMULQDQ_AVX, WRRR, Vdq, Hdq,
>  OPCODE(pcmpeqb, LEG(NP, 0F, 0, 0x74), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pcmpeqb, LEG(66, 0F, 0, 0x74), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpcmpeqb, VEX(128, 66, 0F, IG, 0x74), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpcmpeqb, VEX(256, 66, 0F, IG, 0x74), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pcmpeqw, LEG(NP, 0F, 0, 0x75), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pcmpeqw, LEG(66, 0F, 0, 0x75), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpcmpeqw, VEX(128, 66, 0F, IG, 0x75), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpcmpeqw, VEX(256, 66, 0F, IG, 0x75), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pcmpeqd, LEG(NP, 0F, 0, 0x76), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pcmpeqd, LEG(66, 0F, 0, 0x76), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpcmpeqd, VEX(128, 66, 0F, IG, 0x76), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpcmpeqd, VEX(256, 66, 0F, IG, 0x76), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pcmpeqq, LEG(66, 0F38, 0, 0x29), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpcmpeqq, VEX(128, 66, 0F38, IG, 0x29), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpcmpeqq, VEX(256, 66, 0F38, IG, 0x29), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pcmpgtb, LEG(NP, 0F, 0, 0x64), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pcmpgtb, LEG(66, 0F, 0, 0x64), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpcmpgtb, VEX(128, 66, 0F, IG, 0x64), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpcmpgtb, VEX(256, 66, 0F, IG, 0x64), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pcmpgtw, LEG(NP, 0F, 0, 0x65), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pcmpgtw, LEG(66, 0F, 0, 0x65), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpcmpgtw, VEX(128, 66, 0F, IG, 0x65), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpcmpgtw, VEX(256, 66, 0F, IG, 0x65), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pcmpgtd, LEG(NP, 0F, 0, 0x66), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pcmpgtd, LEG(66, 0F, 0, 0x66), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpcmpgtd, VEX(128, 66, 0F, IG, 0x66), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpcmpgtd, VEX(256, 66, 0F, IG, 0x66), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pcmpgtq, LEG(66, 0F38, 0, 0x37), SSE4_2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpcmpgtq, VEX(128, 66, 0F38, IG, 0x37), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpcmpgtq, VEX(256, 66, 0F38, IG, 0x37), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pcmpestrm, LEG(66, 0F3A, 0, 0x60), SSE4_2, RRR, Vdq, Wdq, Ib)
>  OPCODE(vpcmpestrm, VEX(128, 66, 0F3A, IG, 0x60), AVX, RRR, Vdq, Wdq, Ib)
>  OPCODE(pcmpestri, LEG(66, 0F3A, 0, 0x61), SSE4_2, RRR, Vdq, Wdq, Ib)
> @@ -1302,6 +1540,7 @@ OPCODE(vcomisd, VEX(IG, 66, 0F, IG, 0x2f), AVX, RR,
Vq, Wq)
>  OPCODE(pand, LEG(NP, 0F, 0, 0xdb), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pand, LEG(66, 0F, 0, 0xdb), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpand, VEX(128, 66, 0F, IG, 0xdb), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpand, VEX(256, 66, 0F, IG, 0xdb), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(andps, LEG(NP, 0F, 0, 0x54), SSE, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vandps, VEX(128, NP, 0F, IG, 0x54), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vandps, VEX(256, NP, 0F, IG, 0x54), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1311,6 +1550,7 @@ OPCODE(vandpd, VEX(256, 66, 0F, IG, 0x54), AVX,
WRR, Vqq, Hqq, Wqq)
>  OPCODE(pandn, LEG(NP, 0F, 0, 0xdf), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pandn, LEG(66, 0F, 0, 0xdf), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpandn, VEX(128, 66, 0F, IG, 0xdf), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpandn, VEX(256, 66, 0F, IG, 0xdf), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(andnps, LEG(NP, 0F, 0, 0x55), SSE, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vandnps, VEX(128, NP, 0F, IG, 0x55), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vandnps, VEX(256, NP, 0F, IG, 0x55), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1320,6 +1560,7 @@ OPCODE(vandnpd, VEX(256, 66, 0F, IG, 0x55), AVX,
WRR, Vqq, Hqq, Wqq)
>  OPCODE(por, LEG(NP, 0F, 0, 0xeb), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(por, LEG(66, 0F, 0, 0xeb), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpor, VEX(128, 66, 0F, IG, 0xeb), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpor, VEX(256, 66, 0F, IG, 0xeb), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(orps, LEG(NP, 0F, 0, 0x56), SSE, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vorps, VEX(128, NP, 0F, IG, 0x56), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vorps, VEX(256, NP, 0F, IG, 0x56), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1329,6 +1570,7 @@ OPCODE(vorpd, VEX(256, 66, 0F, IG, 0x56), AVX, WRR,
Vqq, Hqq, Wqq)
>  OPCODE(pxor, LEG(NP, 0F, 0, 0xef), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pxor, LEG(66, 0F, 0, 0xef), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpxor, VEX(128, 66, 0F, IG, 0xef), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpxor, VEX(256, 66, 0F, IG, 0xef), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(xorps, LEG(NP, 0F, 0, 0x57), SSE, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vxorps, VEX(128, NP, 0F, IG, 0x57), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vxorps, VEX(256, NP, 0F, IG, 0x57), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1338,63 +1580,94 @@ OPCODE(vxorpd, VEX(256, 66, 0F, IG, 0x57), AVX,
WRR, Vqq, Hqq, Wqq)
>  OPCODE(psllw, LEG(NP, 0F, 0, 0xf1), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psllw, LEG(66, 0F, 0, 0xf1), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsllw, VEX(128, 66, 0F, IG, 0xf1), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsllw, VEX(256, 66, 0F, IG, 0xf1), AVX2, WRR, Vqq, Hqq, Wdq)
>  OPCODE(pslld, LEG(NP, 0F, 0, 0xf2), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pslld, LEG(66, 0F, 0, 0xf2), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpslld, VEX(128, 66, 0F, IG, 0xf2), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpslld, VEX(256, 66, 0F, IG, 0xf2), AVX2, WRR, Vqq, Hqq, Wdq)
>  OPCODE(psllq, LEG(NP, 0F, 0, 0xf3), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psllq, LEG(66, 0F, 0, 0xf3), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsllq, VEX(128, 66, 0F, IG, 0xf3), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsllq, VEX(256, 66, 0F, IG, 0xf3), AVX2, WRR, Vqq, Hqq, Wdq)
> +OPCODE(vpsllvd, VEX(128, 66, 0F38, 0, 0x47), AVX2, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsllvd, VEX(256, 66, 0F38, 0, 0x47), AVX2, WRR, Vqq, Hqq, Wqq)
> +OPCODE(vpsllvq, VEX(128, 66, 0F38, 1, 0x47), AVX2, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsllvq, VEX(256, 66, 0F38, 1, 0x47), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(psrlw, LEG(NP, 0F, 0, 0xd1), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psrlw, LEG(66, 0F, 0, 0xd1), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsrlw, VEX(128, 66, 0F, IG, 0xd1), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsrlw, VEX(256, 66, 0F, IG, 0xd1), AVX2, WRR, Vqq, Hqq, Wdq)
>  OPCODE(psrld, LEG(NP, 0F, 0, 0xd2), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psrld, LEG(66, 0F, 0, 0xd2), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsrld, VEX(128, 66, 0F, IG, 0xd2), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsrld, VEX(256, 66, 0F, IG, 0xd2), AVX2, WRR, Vqq, Hqq, Wdq)
>  OPCODE(psrlq, LEG(NP, 0F, 0, 0xd3), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psrlq, LEG(66, 0F, 0, 0xd3), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsrlq, VEX(128, 66, 0F, IG, 0xd3), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsrlq, VEX(256, 66, 0F, IG, 0xd3), AVX2, WRR, Vqq, Hqq, Wdq)
> +OPCODE(vpsrlvd, VEX(128, 66, 0F38, 0, 0x45), AVX2, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsrlvd, VEX(256, 66, 0F38, 0, 0x45), AVX2, WRR, Vqq, Hqq, Wqq)
> +OPCODE(vpsrlvq, VEX(128, 66, 0F38, 1, 0x45), AVX2, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsrlvq, VEX(256, 66, 0F38, 1, 0x45), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(psraw, LEG(NP, 0F, 0, 0xe1), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psraw, LEG(66, 0F, 0, 0xe1), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsraw, VEX(128, 66, 0F, IG, 0xe1), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsraw, VEX(256, 66, 0F, IG, 0xe1), AVX2, WRR, Vqq, Hqq, Wdq)
>  OPCODE(psrad, LEG(NP, 0F, 0, 0xe2), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(psrad, LEG(66, 0F, 0, 0xe2), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpsrad, VEX(128, 66, 0F, IG, 0xe2), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsrad, VEX(256, 66, 0F, IG, 0xe2), AVX2, WRR, Vqq, Hqq, Wdq)
> +OPCODE(vpsravd, VEX(128, 66, 0F38, 0, 0x46), AVX2, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpsravd, VEX(256, 66, 0F38, 0, 0x46), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(palignr, LEG(NP, 0F3A, 0, 0x0f), SSSE3, WRRR, Pq, Pq, Qq, Ib)
>  OPCODE(palignr, LEG(66, 0F3A, 0, 0x0f), SSSE3, WRRR, Vdq, Vdq, Wdq, Ib)
>  OPCODE(vpalignr, VEX(128, 66, 0F3A, IG, 0x0f), AVX, WRRR, Vdq, Hdq, Wdq,
Ib)
> +OPCODE(vpalignr, VEX(256, 66, 0F3A, IG, 0x0f), AVX2, WRRR, Vqq, Hqq,
Wqq, Ib)
>  OPCODE(packsswb, LEG(NP, 0F, 0, 0x63), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(packsswb, LEG(66, 0F, 0, 0x63), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpacksswb, VEX(128, 66, 0F, IG, 0x63), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpacksswb, VEX(256, 66, 0F, IG, 0x63), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(packssdw, LEG(NP, 0F, 0, 0x6b), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(packssdw, LEG(66, 0F, 0, 0x6b), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpackssdw, VEX(128, 66, 0F, IG, 0x6b), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpackssdw, VEX(256, 66, 0F, IG, 0x6b), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(packuswb, LEG(NP, 0F, 0, 0x67), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(packuswb, LEG(66, 0F, 0, 0x67), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpackuswb, VEX(128, 66, 0F, IG, 0x67), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpackuswb, VEX(256, 66, 0F, IG, 0x67), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(packusdw, LEG(66, 0F38, 0, 0x2b), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpackusdw, VEX(128, 66, 0F38, IG, 0x2b), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpackusdw, VEX(256, 66, 0F38, IG, 0x2b), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(punpckhbw, LEG(NP, 0F, 0, 0x68), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(punpckhbw, LEG(66, 0F, 0, 0x68), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpunpckhbw, VEX(128, 66, 0F, IG, 0x68), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpunpckhbw, VEX(256, 66, 0F, IG, 0x68), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(punpckhwd, LEG(NP, 0F, 0, 0x69), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(punpckhwd, LEG(66, 0F, 0, 0x69), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpunpckhwd, VEX(128, 66, 0F, IG, 0x69), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpunpckhwd, VEX(256, 66, 0F, IG, 0x69), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(punpckhdq, LEG(NP, 0F, 0, 0x6a), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(punpckhdq, LEG(66, 0F, 0, 0x6a), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpunpckhdq, VEX(128, 66, 0F, IG, 0x6a), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpunpckhdq, VEX(256, 66, 0F, IG, 0x6a), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(punpckhqdq, LEG(66, 0F, 0, 0x6d), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpunpckhqdq, VEX(128, 66, 0F, IG, 0x6d), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpunpckhqdq, VEX(256, 66, 0F, IG, 0x6d), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(punpcklbw, LEG(NP, 0F, 0, 0x60), MMX, WRR, Pq, Pq, Qd)
>  OPCODE(punpcklbw, LEG(66, 0F, 0, 0x60), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpunpcklbw, VEX(128, 66, 0F, IG, 0x60), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpunpcklbw, VEX(256, 66, 0F, IG, 0x60), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(punpcklwd, LEG(NP, 0F, 0, 0x61), MMX, WRR, Pq, Pq, Qd)
>  OPCODE(punpcklwd, LEG(66, 0F, 0, 0x61), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpunpcklwd, VEX(128, 66, 0F, IG, 0x61), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpunpcklwd, VEX(256, 66, 0F, IG, 0x61), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(punpckldq, LEG(NP, 0F, 0, 0x62), MMX, WRR, Pq, Pq, Qd)
>  OPCODE(punpckldq, LEG(66, 0F, 0, 0x62), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpunpckldq, VEX(128, 66, 0F, IG, 0x62), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpunpckldq, VEX(256, 66, 0F, IG, 0x62), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(punpcklqdq, LEG(66, 0F, 0, 0x6c), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpunpcklqdq, VEX(128, 66, 0F, IG, 0x6c), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpunpcklqdq, VEX(256, 66, 0F, IG, 0x6c), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(unpcklps, LEG(NP, 0F, 0, 0x14), SSE, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vunpcklps, VEX(128, NP, 0F, IG, 0x14), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vunpcklps, VEX(256, NP, 0F, IG, 0x14), AVX, WRR, Vqq, Hqq, Wqq)
> @@ -1410,13 +1683,17 @@ OPCODE(vunpckhpd, VEX(256, 66, 0F, IG, 0x15),
AVX, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pshufb, LEG(NP, 0F38, 0, 0x00), SSSE3, WRR, Pq, Pq, Qq)
>  OPCODE(pshufb, LEG(66, 0F38, 0, 0x00), SSSE3, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpshufb, VEX(128, 66, 0F38, IG, 0x00), AVX, WRR, Vdq, Hdq, Wdq)
> +OPCODE(vpshufb, VEX(256, 66, 0F38, IG, 0x00), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(pshufw, LEG(NP, 0F, 0, 0x70), SSE, WRR, Pq, Qq, Ib)
>  OPCODE(pshuflw, LEG(F2, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
>  OPCODE(vpshuflw, VEX(128, F2, 0F, IG, 0x70), AVX, WRR, Vdq, Wdq, Ib)
> +OPCODE(vpshuflw, VEX(256, F2, 0F, IG, 0x70), AVX2, WRR, Vqq, Wqq, Ib)
>  OPCODE(pshufhw, LEG(F3, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
>  OPCODE(vpshufhw, VEX(128, F3, 0F, IG, 0x70), AVX, WRR, Vdq, Wdq, Ib)
> +OPCODE(vpshufhw, VEX(256, F3, 0F, IG, 0x70), AVX2, WRR, Vqq, Wqq, Ib)
>  OPCODE(pshufd, LEG(66, 0F, 0, 0x70), SSE2, WRR, Vdq, Wdq, Ib)
>  OPCODE(vpshufd, VEX(128, 66, 0F, IG, 0x70), AVX, WRR, Vdq, Wdq, Ib)
> +OPCODE(vpshufd, VEX(256, 66, 0F, IG, 0x70), AVX2, WRR, Vqq, Wqq, Ib)
>  OPCODE(shufps, LEG(NP, 0F, 0, 0xc6), SSE, WRRR, Vdq, Vdq, Wdq, Ib)
>  OPCODE(vshufps, VEX(128, NP, 0F, IG, 0xc6), AVX, WRRR, Vdq, Hdq, Wdq, Ib)
>  OPCODE(vshufps, VEX(256, NP, 0F, IG, 0xc6), AVX, WRRR, Vqq, Hqq, Wqq, Ib)
> @@ -1437,8 +1714,12 @@ OPCODE(vblendvpd, VEX(128, 66, 0F3A, 0, 0x4b),
AVX, WRRR, Vdq, Hdq, Wdq, Ldq)
>  OPCODE(vblendvpd, VEX(256, 66, 0F3A, 0, 0x4b), AVX, WRRR, Vqq, Hqq, Wqq,
Lqq)
>  OPCODE(pblendvb, LEG(66, 0F38, 0, 0x10), SSE4_1, WRR, Vdq, Vdq, Wdq)
>  OPCODE(vpblendvb, VEX(128, 66, 0F3A, 0, 0x4c), AVX, WRRR, Vdq, Hdq, Wdq,
Ldq)
> +OPCODE(vpblendvb, VEX(256, 66, 0F3A, 0, 0x4c), AVX2, WRRR, Vqq, Hqq,
Wqq, Lqq)
>  OPCODE(pblendw, LEG(66, 0F3A, 0, 0x0e), SSE4_1, WRRR, Vdq, Vdq, Wdq, Ib)
>  OPCODE(vpblendw, VEX(128, 66, 0F3A, IG, 0x0e), AVX, WRRR, Vdq, Hdq, Wdq,
Ib)
> +OPCODE(vpblendw, VEX(256, 66, 0F3A, IG, 0x0e), AVX2, WRRR, Vqq, Hqq,
Wqq, Ib)
> +OPCODE(vpblendd, VEX(128, 66, 0F3A, 0, 0x02), AVX2, WRRR, Vdq, Hdq, Wdq,
Ib)
> +OPCODE(vpblendd, VEX(256, 66, 0F3A, 0, 0x02), AVX2, WRRR, Vqq, Hqq, Wqq,
Ib)
>  OPCODE(insertps, LEG(66, 0F3A, 0, 0x21), SSE4_1, WRRR, Vdq, Vdq, Wd, Ib)
>  OPCODE(vinsertps, VEX(128, 66, 0F3A, IG, 0x21), AVX, WRRR, Vdq, Hdq, Wd,
Ib)
>  OPCODE(pinsrb, LEG(66, 0F3A, 0, 0x20), SSE4_1, WRRR, Vdq, Vdq, RdMb, Ib)
> @@ -1451,6 +1732,7 @@ OPCODE(vpinsrd, VEX(128, 66, 0F3A, 0, 0x22), AVX,
WRRR, Vdq, Hdq, Ed, Ib)
>  OPCODE(pinsrq, LEG(66, 0F3A, 1, 0x22), SSE4_1, WRRR, Vdq, Vdq, Eq, Ib)
>  OPCODE(vpinsrq, VEX(128, 66, 0F3A, 1, 0x22), AVX, WRRR, Vdq, Hdq, Eq, Ib)
>  OPCODE(vinsertf128, VEX(256, 66, 0F3A, 0, 0x18), AVX, WRRR, Vqq, Hqq,
Wdq, Ib)
> +OPCODE(vinserti128, VEX(256, 66, 0F3A, 0, 0x38), AVX2, WRRR, Vqq, Hqq,
Wdq, Ib)
>  OPCODE(extractps, LEG(66, 0F3A, 0, 0x17), SSE4_1, WRR, Ed, Vdq, Ib)
>  OPCODE(vextractps, VEX(128, 66, 0F3A, IG, 0x17), AVX, WRR, Ed, Vdq, Ib)
>  OPCODE(pextrb, LEG(66, 0F3A, 0, 0x14), SSE4_1, WRR, RdMb, Vdq, Ib)
> @@ -1468,11 +1750,24 @@ OPCODE(pextrw, LEG(66, 0F, 1, 0xc5), SSE2, WRR,
Gq, Udq, Ib)
>  OPCODE(vpextrw, VEX(128, 66, 0F, 0, 0xc5), AVX, WRR, Gd, Udq, Ib)
>  OPCODE(vpextrw, VEX(128, 66, 0F, 1, 0xc5), AVX, WRR, Gq, Udq, Ib)
>  OPCODE(vextractf128, VEX(256, 66, 0F3A, 0, 0x19), AVX, WRR, Wdq, Vqq, Ib)
> -OPCODE(vbroadcastss, VEX(128, 66, 0F38, 0, 0x18), AVX, WR, Vdq, Md)
> -OPCODE(vbroadcastss, VEX(256, 66, 0F38, 0, 0x18), AVX, WR, Vqq, Md)
> -OPCODE(vbroadcastsd, VEX(256, 66, 0F38, 0, 0x19), AVX, WR, Vqq, Mq)
> +OPCODE(vextracti128, VEX(256, 66, 0F3A, 0, 0x39), AVX2, WRR, Wdq, Vqq,
Ib)
> +OPCODE(vpbroadcastb, VEX(128, 66, 0F38, 0, 0x78), AVX2, WR, Vdq, Wb)
> +OPCODE(vpbroadcastb, VEX(256, 66, 0F38, 0, 0x78), AVX2, WR, Vqq, Wb)
> +OPCODE(vpbroadcastw, VEX(128, 66, 0F38, 0, 0x79), AVX2, WR, Vdq, Ww)
> +OPCODE(vpbroadcastw, VEX(256, 66, 0F38, 0, 0x79), AVX2, WR, Vqq, Ww)
> +OPCODE(vpbroadcastd, VEX(128, 66, 0F38, 0, 0x58), AVX2, WR, Vdq, Wd)
> +OPCODE(vpbroadcastd, VEX(256, 66, 0F38, 0, 0x58), AVX2, WR, Vqq, Wd)
> +OPCODE(vpbroadcastq, VEX(128, 66, 0F38, 0, 0x59), AVX2, WR, Vdq, Wq)
> +OPCODE(vpbroadcastq, VEX(256, 66, 0F38, 0, 0x59), AVX2, WR, Vqq, Wq)
> +OPCODE(vbroadcastss, VEX(128, 66, 0F38, 0, 0x18), AVX, WRR, Vdq, Wd,
modrm_mod)
> +OPCODE(vbroadcastss, VEX(256, 66, 0F38, 0, 0x18), AVX, WRR, Vqq, Wd,
modrm_mod)
> +OPCODE(vbroadcastsd, VEX(256, 66, 0F38, 0, 0x19), AVX, WRR, Vqq, Wq,
modrm_mod)
>  OPCODE(vbroadcastf128, VEX(256, 66, 0F38, 0, 0x1a), AVX, WR, Vqq, Mdq)
> +OPCODE(vbroadcasti128, VEX(256, 66, 0F38, 0, 0x5a), AVX2, WR, Vqq, Mdq)
>  OPCODE(vperm2f128, VEX(256, 66, 0F3A, 0, 0x06), AVX, WRRR, Vqq, Hqq,
Wqq, Ib)
> +OPCODE(vperm2i128, VEX(256, 66, 0F3A, 0, 0x46), AVX2, WRRR, Vqq, Hqq,
Wqq, Ib)
> +OPCODE(vpermd, VEX(256, 66, 0F38, 0, 0x36), AVX2, WRR, Vqq, Hqq, Wqq)
> +OPCODE(vpermps, VEX(256, 66, 0F38, 0, 0x16), AVX2, WRR, Vqq, Hqq, Wqq)
>  OPCODE(vpermilps, VEX(128, 66, 0F38, 0, 0x0c), AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vpermilps, VEX(256, 66, 0F38, 0, 0x0c), AVX, WRR, Vqq, Hqq, Wqq)
>  OPCODE(vpermilps, VEX(128, 66, 0F3A, 0, 0x04), AVX, WRR, Vdq, Wdq, Ib)
> @@ -1481,30 +1776,60 @@ OPCODE(vpermilpd, VEX(128, 66, 0F38, 0, 0x0d),
AVX, WRR, Vdq, Hdq, Wdq)
>  OPCODE(vpermilpd, VEX(256, 66, 0F38, 0, 0x0d), AVX, WRR, Vqq, Hqq, Wqq)
>  OPCODE(vpermilpd, VEX(128, 66, 0F3A, 0, 0x05), AVX, WRR, Vdq, Wdq, Ib)
>  OPCODE(vpermilpd, VEX(256, 66, 0F3A, 0, 0x05), AVX, WRR, Vqq, Wqq, Ib)
> +OPCODE(vpermq, VEX(256, 66, 0F3A, 1, 0x00), AVX2, WRR, Vqq, Wqq, Ib)
> +OPCODE(vpermpd, VEX(256, 66, 0F3A, 1, 0x01), AVX2, WRR, Vqq, Wqq, Ib)
> +OPCODE(vgatherdps, VEX(128, 66, 0F38, 0, 0x92), AVX2, WWRRR, Vdq, Hdq,
Vdq, MDdq, Hdq)
> +OPCODE(vgatherdps, VEX(256, 66, 0F38, 0, 0x92), AVX2, WWRRR, Vqq, Hqq,
Vqq, MDqq, Hqq)
> +OPCODE(vgatherdpd, VEX(128, 66, 0F38, 1, 0x92), AVX2, WWRRR, Vdq, Hdq,
Vdq, MDdq, Hdq)
> +OPCODE(vgatherdpd, VEX(256, 66, 0F38, 1, 0x92), AVX2, WWRRR, Vqq, Hqq,
Vqq, MDdq, Hqq)
> +OPCODE(vgatherqps, VEX(128, 66, 0F38, 0, 0x93), AVX2, WWRRR, Vdq, Hdq,
Vdq, MQdq, Hdq)
> +OPCODE(vgatherqps, VEX(256, 66, 0F38, 0, 0x93), AVX2, WWRRR, Vdq, Hdq,
Vdq, MQqq, Hdq)
> +OPCODE(vgatherqpd, VEX(128, 66, 0F38, 1, 0x93), AVX2, WWRRR, Vdq, Hdq,
Vdq, MQdq, Hdq)
> +OPCODE(vgatherqpd, VEX(256, 66, 0F38, 1, 0x93), AVX2, WWRRR, Vqq, Hqq,
Vqq, MQqq, Hqq)
> +OPCODE(vpgatherdd, VEX(128, 66, 0F38, 0, 0x90), AVX2, WWRRR, Vdq, Hdq,
Vdq, MDdq, Hdq)
> +OPCODE(vpgatherdd, VEX(256, 66, 0F38, 0, 0x90), AVX2, WWRRR, Vqq, Hqq,
Vqq, MDqq, Hqq)
> +OPCODE(vpgatherdq, VEX(128, 66, 0F38, 1, 0x90), AVX2, WWRRR, Vdq, Hdq,
Vdq, MDdq, Hdq)
> +OPCODE(vpgatherdq, VEX(256, 66, 0F38, 1, 0x90), AVX2, WWRRR, Vqq, Hqq,
Vqq, MDdq, Hqq)
> +OPCODE(vpgatherqd, VEX(128, 66, 0F38, 0, 0x91), AVX2, WWRRR, Vdq, Hdq,
Vdq, MQdq, Hdq)
> +OPCODE(vpgatherqd, VEX(256, 66, 0F38, 0, 0x91), AVX2, WWRRR, Vdq, Hdq,
Vdq, MQqq, Hdq)
> +OPCODE(vpgatherqq, VEX(128, 66, 0F38, 1, 0x91), AVX2, WWRRR, Vdq, Hdq,
Vdq, MQdq, Hdq)
> +OPCODE(vpgatherqq, VEX(256, 66, 0F38, 1, 0x91), AVX2, WWRRR, Vqq, Hqq,
Vqq, MQqq, Hqq)
>  OPCODE(pmovsxbw, LEG(66, 0F38, 0, 0x20), SSE4_1, WR, Vdq, Wq)
>  OPCODE(vpmovsxbw, VEX(128, 66, 0F38, IG, 0x20), AVX, WR, Vdq, Wq)
> +OPCODE(vpmovsxbw, VEX(256, 66, 0F38, IG, 0x20), AVX2, WR, Vqq, Wdq)
>  OPCODE(pmovsxbd, LEG(66, 0F38, 0, 0x21), SSE4_1, WR, Vdq, Wd)
>  OPCODE(vpmovsxbd, VEX(128, 66, 0F38, IG, 0x21), AVX, WR, Vdq, Wd)
> +OPCODE(vpmovsxbd, VEX(256, 66, 0F38, IG, 0x21), AVX2, WR, Vqq, Wq)
>  OPCODE(pmovsxbq, LEG(66, 0F38, 0, 0x22), SSE4_1, WR, Vdq, Ww)
>  OPCODE(vpmovsxbq, VEX(128, 66, 0F38, IG, 0x22), AVX, WR, Vdq, Ww)
> +OPCODE(vpmovsxbq, VEX(256, 66, 0F38, IG, 0x22), AVX2, WR, Vqq, Wd)
>  OPCODE(pmovsxwd, LEG(66, 0F38, 0, 0x23), SSE4_1, WR, Vdq, Wq)
>  OPCODE(vpmovsxwd, VEX(128, 66, 0F38, IG, 0x23), AVX, WR, Vdq, Wq)
> +OPCODE(vpmovsxwd, VEX(256, 66, 0F38, IG, 0x23), AVX2, WR, Vqq, Wdq)
>  OPCODE(pmovsxwq, LEG(66, 0F38, 0, 0x24), SSE4_1, WR, Vdq, Wd)
>  OPCODE(vpmovsxwq, VEX(128, 66, 0F38, IG, 0x24), AVX, WR, Vdq, Wd)
> +OPCODE(vpmovsxwq, VEX(256, 66, 0F38, IG, 0x24), AVX2, WR, Vqq, Wq)
>  OPCODE(pmovsxdq, LEG(66, 0F38, 0, 0x25), SSE4_1, WR, Vdq, Wq)
>  OPCODE(vpmovsxdq, VEX(128, 66, 0F38, IG, 0x25), AVX, WR, Vdq, Wq)
> +OPCODE(vpmovsxdq, VEX(256, 66, 0F38, IG, 0x25), AVX2, WR, Vqq, Wdq)
>  OPCODE(pmovzxbw, LEG(66, 0F38, 0, 0x30), SSE4_1, WR, Vdq, Wq)
>  OPCODE(vpmovzxbw, VEX(128, 66, 0F38, IG, 0x30), AVX, WR, Vdq, Wq)
> +OPCODE(vpmovzxbw, VEX(256, 66, 0F38, IG, 0x30), AVX2, WR, Vqq, Wdq)
>  OPCODE(pmovzxbd, LEG(66, 0F38, 0, 0x31), SSE4_1, WR, Vdq, Wd)
>  OPCODE(vpmovzxbd, VEX(128, 66, 0F38, IG, 0x31), AVX, WR, Vdq, Wd)
> +OPCODE(vpmovzxbd, VEX(256, 66, 0F38, IG, 0x31), AVX2, WR, Vqq, Wq)
>  OPCODE(pmovzxbq, LEG(66, 0F38, 0, 0x32), SSE4_1, WR, Vdq, Ww)
>  OPCODE(vpmovzxbq, VEX(128, 66, 0F38, IG, 0x32), AVX, WR, Vdq, Ww)
> +OPCODE(vpmovzxbq, VEX(256, 66, 0F38, IG, 0x32), AVX2, WR, Vqq, Wd)
>  OPCODE(pmovzxwd, LEG(66, 0F38, 0, 0x33), SSE4_1, WR, Vdq, Wq)
>  OPCODE(vpmovzxwd, VEX(128, 66, 0F38, IG, 0x33), AVX, WR, Vdq, Wq)
> +OPCODE(vpmovzxwd, VEX(256, 66, 0F38, IG, 0x33), AVX2, WR, Vqq, Wdq)
>  OPCODE(pmovzxwq, LEG(66, 0F38, 0, 0x34), SSE4_1, WR, Vdq, Wd)
>  OPCODE(vpmovzxwq, VEX(128, 66, 0F38, IG, 0x34), AVX, WR, Vdq, Wd)
> +OPCODE(vpmovzxwq, VEX(256, 66, 0F38, IG, 0x34), AVX2, WR, Vqq, Wq)
>  OPCODE(pmovzxdq, LEG(66, 0F38, 0, 0x35), SSE4_1, WR, Vdq, Wq)
>  OPCODE(vpmovzxdq, VEX(128, 66, 0F38, IG, 0x35), AVX, WR, Vdq, Wq)
> +OPCODE(vpmovzxdq, VEX(256, 66, 0F38, IG, 0x35), AVX2, WR, Vqq, Wdq)
>  OPCODE(cvtpi2ps, LEG(NP, 0F, 0, 0x2a), SSE, WR, Vdq, Qq)
>  OPCODE(cvtsi2ss, LEG(F3, 0F, 0, 0x2a), SSE, WR, Vd, Ed)
>  OPCODE(cvtsi2ss, LEG(F3, 0F, 1, 0x2a), SSE, WR, Vd, Eq)
> @@ -1574,6 +1899,14 @@ OPCODE(vmaskmovpd, VEX(128, 66, 0F38, 0, 0x2d),
AVX, WRR, Vdq, Hdq, Mdq)
>  OPCODE(vmaskmovpd, VEX(128, 66, 0F38, 0, 0x2f), AVX, WRR, Mdq, Hdq, Vdq)
>  OPCODE(vmaskmovpd, VEX(256, 66, 0F38, 0, 0x2d), AVX, WRR, Vqq, Hqq, Mqq)
>  OPCODE(vmaskmovpd, VEX(256, 66, 0F38, 0, 0x2f), AVX, WRR, Mqq, Hqq, Vqq)
> +OPCODE(vpmaskmovd, VEX(128, 66, 0F38, 0, 0x8c), AVX2, WRR, Vdq, Hdq, Mdq)
> +OPCODE(vpmaskmovd, VEX(128, 66, 0F38, 0, 0x8e), AVX2, WRR, Mdq, Hdq, Vdq)
> +OPCODE(vpmaskmovd, VEX(256, 66, 0F38, 0, 0x8c), AVX2, WRR, Vqq, Hqq, Mqq)
> +OPCODE(vpmaskmovd, VEX(256, 66, 0F38, 0, 0x8e), AVX2, WRR, Mqq, Hqq, Vqq)
> +OPCODE(vpmaskmovq, VEX(128, 66, 0F38, 1, 0x8c), AVX2, WRR, Vdq, Hdq, Mdq)
> +OPCODE(vpmaskmovq, VEX(128, 66, 0F38, 1, 0x8e), AVX2, WRR, Mdq, Hdq, Vdq)
> +OPCODE(vpmaskmovq, VEX(256, 66, 0F38, 1, 0x8c), AVX2, WRR, Vqq, Hqq, Mqq)
> +OPCODE(vpmaskmovq, VEX(256, 66, 0F38, 1, 0x8e), AVX2, WRR, Mqq, Hqq, Vqq)
>  OPCODE(movntps, LEG(NP, 0F, 0, 0x2b), SSE, WR, Mdq, Vdq)
>  OPCODE(vmovntps, VEX(128, NP, 0F, IG, 0x2b), AVX, WR, Mdq, Vdq)
>  OPCODE(vmovntps, VEX(256, NP, 0F, IG, 0x2b), AVX, WR, Mqq, Vqq)
> @@ -1588,6 +1921,7 @@ OPCODE(vmovntdq, VEX(128, 66, 0F, IG, 0xe7), AVX,
WR, Mdq, Vdq)
>  OPCODE(vmovntdq, VEX(256, 66, 0F, IG, 0xe7), AVX, WR, Mqq, Vqq)
>  OPCODE(movntdqa, LEG(66, 0F38, 0, 0x2a), SSE4_1, WR, Vdq, Mdq)
>  OPCODE(vmovntdqa, VEX(128, 66, 0F38, IG, 0x2a), AVX, WR, Vdq, Mdq)
> +OPCODE(vmovntdqa, VEX(256, 66, 0F38, IG, 0x2a), AVX2, WR, Vqq, Mqq)
>  OPCODE(pause, LEG(F3, NA, 0, 0x90), SSE2, )
>  OPCODE(emms, LEG(NP, 0F, 0, 0x77), MMX, )
>  OPCODE(vzeroupper, VEX(128, NP, 0F, IG, 0x77), AVX, )
> @@ -1614,6 +1948,13 @@ OPCODE_GRP_BEGIN(grp12_VEX_128_66)
>      OPCODE_GRPMEMB(grp12_VEX_128_66, vpsraw, 4, AVX, WRR, Hdq, Udq, Ib)
>  OPCODE_GRP_END(grp12_VEX_128_66)
>
> +OPCODE_GRP(grp12_VEX_256_66, VEX(256, 66, 0F, IG, 0x71))
> +OPCODE_GRP_BEGIN(grp12_VEX_256_66)
> +    OPCODE_GRPMEMB(grp12_VEX_256_66, vpsllw, 6, AVX2, WRR, Hqq, Uqq, Ib)
> +    OPCODE_GRPMEMB(grp12_VEX_256_66, vpsrlw, 2, AVX2, WRR, Hqq, Uqq, Ib)
> +    OPCODE_GRPMEMB(grp12_VEX_256_66, vpsraw, 4, AVX2, WRR, Hqq, Uqq, Ib)
> +OPCODE_GRP_END(grp12_VEX_256_66)
> +
>  OPCODE_GRP(grp13_LEG_66, LEG(66, 0F, 0, 0x72))
>  OPCODE_GRP_BEGIN(grp13_LEG_66)
>      OPCODE_GRPMEMB(grp13_LEG_66, pslld, 6, SSE2, WRR, Udq, Udq, Ib)
> @@ -1635,6 +1976,13 @@ OPCODE_GRP_BEGIN(grp13_VEX_128_66)
>      OPCODE_GRPMEMB(grp13_VEX_128_66, vpsrad, 4, AVX, WRR, Hdq, Udq, Ib)
>  OPCODE_GRP_END(grp13_VEX_128_66)
>
> +OPCODE_GRP(grp13_VEX_256_66, VEX(256, 66, 0F, IG, 0x72))
> +OPCODE_GRP_BEGIN(grp13_VEX_256_66)
> +    OPCODE_GRPMEMB(grp13_VEX_256_66, vpslld, 6, AVX2, WRR, Hqq, Uqq, Ib)
> +    OPCODE_GRPMEMB(grp13_VEX_256_66, vpsrld, 2, AVX2, WRR, Hqq, Uqq, Ib)
> +    OPCODE_GRPMEMB(grp13_VEX_256_66, vpsrad, 4, AVX2, WRR, Hqq, Uqq, Ib)
> +OPCODE_GRP_END(grp13_VEX_256_66)
> +
>  OPCODE_GRP(grp14_LEG_66, LEG(66, 0F, 0, 0x73))
>  OPCODE_GRP_BEGIN(grp14_LEG_66)
>      OPCODE_GRPMEMB(grp14_LEG_66, psllq, 6, SSE2, WRR, Udq, Udq, Ib)
> @@ -1657,6 +2005,14 @@ OPCODE_GRP_BEGIN(grp14_VEX_128_66)
>      OPCODE_GRPMEMB(grp14_VEX_128_66, vpsrldq, 3, AVX, WRR, Hdq, Udq, Ib)
>  OPCODE_GRP_END(grp14_VEX_128_66)
>
> +OPCODE_GRP(grp14_VEX_256_66, VEX(256, 66, 0F, IG, 0x73))
> +OPCODE_GRP_BEGIN(grp14_VEX_256_66)
> +    OPCODE_GRPMEMB(grp14_VEX_256_66, vpsllq, 6, AVX2, WRR, Hqq, Uqq, Ib)
> +    OPCODE_GRPMEMB(grp14_VEX_256_66, vpslldq, 7, AVX2, WRR, Hqq, Uqq, Ib)
> +    OPCODE_GRPMEMB(grp14_VEX_256_66, vpsrlq, 2, AVX2, WRR, Hqq, Uqq, Ib)
> +    OPCODE_GRPMEMB(grp14_VEX_256_66, vpsrldq, 3, AVX2, WRR, Hqq, Uqq, Ib)
> +OPCODE_GRP_END(grp14_VEX_256_66)
> +
>  OPCODE_GRP(grp15_LEG_NP, LEG(NP, 0F, 0, 0xae))
>  OPCODE_GRP_BEGIN(grp15_LEG_NP)
>      OPCODE_GRPMEMB(grp15_LEG_NP, sfence_clflush, 7, SSE, RR, modrm_mod,
modrm)
> --
> 2.20.1
>
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 58/75] target/i386: introduce AES and PCLMULQDQ vector instructions to sse-opcode.inc.h
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 58/75] target/i386: introduce AES and PCLMULQDQ vector instructions to sse-opcode.inc.h Jan Bobek
@ 2019-08-22  4:02   ` Aleksandar Markovic
  0 siblings, 0 replies; 80+ messages in thread
From: Aleksandar Markovic @ 2019-08-22  4:02 UTC (permalink / raw)
  To: Jan Bobek; +Cc: Alex Bennée, Richard Henderson, qemu-devel

21.08.2019. 20.37, "Jan Bobek" <jan.bobek@gmail.com> је написао/ла:
>
> Add all the AES and PCLMULQDQ vector instruction entries to
sse-opcode.inc.h.
>

Why only pclmulqdq, and not entire CLMUL instruction set?

> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
> ---
>  target/i386/sse-opcode.inc.h | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/target/i386/sse-opcode.inc.h b/target/i386/sse-opcode.inc.h
> index f43436213e..1359508424 100644
> --- a/target/i386/sse-opcode.inc.h
> +++ b/target/i386/sse-opcode.inc.h
> @@ -449,6 +449,26 @@
>   * 66 0F 3A 61 /r imm8     PCMPESTRI xmm1, xmm2/m128, imm8
>   * 66 0F 3A 62 /r imm8     PCMPISTRM xmm1, xmm2/m128, imm8
>   * 66 0F 3A 63 /r imm8     PCMPISTRI xmm1, xmm2/m128, imm8
> + *
> + * AES Instructions
> + * -----------------
> + * 66 0F 38 DE /r                  AESDEC xmm1, xmm2/m128
> + * VEX.128.66.0F38.WIG DE /r       VAESDEC xmm1, xmm2, xmm3/m128
> + * 66 0F 38 DF /r                  AESDECLAST xmm1, xmm2/m128
> + * VEX.128.66.0F38.WIG DF /r       VAESDECLAST xmm1, xmm2, xmm3/m128
> + * 66 0F 38 DC /r                  AESENC xmm1, xmm2/m128
> + * VEX.128.66.0F38.WIG DC /r       VAESENC xmm1, xmm2, xmm3/m128
> + * 66 0F 38 DD /r                  AESENCLAST xmm1, xmm2/m128
> + * VEX.128.66.0F38.WIG DD /r       VAESENCLAST xmm1, xmm2, xmm3/m128
> + * 66 0F 38 DB /r                  AESIMC xmm1, xmm2/m128
> + * VEX.128.66.0F38.WIG DB /r       VAESIMC xmm1, xmm2/m128
> + * 66 0F 3A DF /r ib               AESKEYGENASSIST xmm1, xmm2/m128, imm8
> + * VEX.128.66.0F3A.WIG DF /r ib    VAESKEYGENASSIST xmm1, xmm2/m128, imm8
> + *
> + * PCLMULQDQ Instructions
> + * -----------------------
> + * 66 0F 3A 44 /r ib               PCLMULQDQ xmm1, xmm2/m128, imm8
> + * VEX.128.66.0F3A.WIG 44 /r ib    VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8
>   */
>
>  OPCODE(movd, LEG(NP, 0F, 0, 0x6e), MMX, WR, Pq, Ed)
> @@ -641,6 +661,20 @@ OPCODE(roundps, LEG(66, 0F3A, 0, 0x08), SSE4_1, WRR,
Vdq, Wdq, Ib)
>  OPCODE(roundpd, LEG(66, 0F3A, 0, 0x09), SSE4_1, WRR, Vdq, Wdq, Ib)
>  OPCODE(roundss, LEG(66, 0F3A, 0, 0x0a), SSE4_1, WRR, Vd, Wd, Ib)
>  OPCODE(roundsd, LEG(66, 0F3A, 0, 0x0b), SSE4_1, WRR, Vq, Wq, Ib)
> +OPCODE(aesdec, LEG(66, 0F38, 0, 0xde), AES, WRR, Vdq, Vdq, Wdq)
> +OPCODE(vaesdec, VEX(128, 66, 0F38, IG, 0xde), AES_AVX, WRR, Vdq, Hdq,
Wdq)
> +OPCODE(aesdeclast, LEG(66, 0F38, 0, 0xdf), AES, WRR, Vdq, Vdq, Wdq)
> +OPCODE(vaesdeclast, VEX(128, 66, 0F38, IG, 0xdf), AES_AVX, WRR, Vdq,
Hdq, Wdq)
> +OPCODE(aesenc, LEG(66, 0F38, 0, 0xdc), AES, WRR, Vdq, Vdq, Wdq)
> +OPCODE(vaesenc, VEX(128, 66, 0F38, IG, 0xdc), AES_AVX, WRR, Vdq, Hdq,
Wdq)
> +OPCODE(aesenclast, LEG(66, 0F38, 0, 0xdd), AES, WRR, Vdq, Vdq, Wdq)
> +OPCODE(vaesenclast, VEX(128, 66, 0F38, IG, 0xdd), AES_AVX, WRR, Vdq,
Hdq, Wdq)
> +OPCODE(aesimc, LEG(66, 0F38, 0, 0xdb), AES, WR, Vdq, Wdq)
> +OPCODE(vaesimc, VEX(128, 66, 0F38, IG, 0xdb), AES_AVX, WR, Vdq, Wdq)
> +OPCODE(aeskeygenassist, LEG(66, 0F3A, 0, 0xdf), AES, WRR, Vdq, Wdq, Ib)
> +OPCODE(vaeskeygenassist, VEX(128, 66, 0F3A, IG, 0xdf), AES_AVX, WRR,
Vdq, Wdq, Ib)
> +OPCODE(pclmulqdq, LEG(66, 0F3A, 0, 0x44), PCLMULQDQ, WRRR, Vdq, Vdq,
Wdq, Ib)
> +OPCODE(vpclmulqdq, VEX(128, 66, 0F3A, IG, 0x44), PCLMULQDQ_AVX, WRRR,
Vdq, Hdq, Wdq, Ib)
>  OPCODE(pcmpeqb, LEG(NP, 0F, 0, 0x74), MMX, WRR, Pq, Pq, Qq)
>  OPCODE(pcmpeqb, LEG(66, 0F, 0, 0x74), SSE2, WRR, Vdq, Vdq, Wdq)
>  OPCODE(pcmpeqw, LEG(NP, 0F, 0, 0x75), MMX, WRR, Pq, Pq, Qq)
> --
> 2.20.1
>
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 02/75] target/i386: Push rex_w into DisasContext
  2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 02/75] target/i386: Push rex_w " Jan Bobek
@ 2019-08-22  4:07   ` Aleksandar Markovic
  0 siblings, 0 replies; 80+ messages in thread
From: Aleksandar Markovic @ 2019-08-22  4:07 UTC (permalink / raw)
  To: Jan Bobek
  Cc: Alex Bennée, Richard Henderson, qemu-devel, Richard Henderson

21.08.2019. 19.41, "Jan Bobek" <jan.bobek@gmail.com> је написао/ла:
>
> From: Richard Henderson <rth@twiddle.net>
>
> Treat this the same as we already do for other rex bits.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---

I keep my previous opinion that this is an example of a low-quality commit
message that needlessly introduces unclarity.

Thanks,
Aleksandar

>  target/i386/translate.c | 19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index 3aac84e5b0..9ec1c79371 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -44,11 +44,13 @@
>  #define REX_X(s) ((s)->rex_x)
>  #define REX_B(s) ((s)->rex_b)
>  #define REX_R(s) ((s)->rex_r)
> +#define REX_W(s) ((s)->rex_w)
>  #else
>  #define CODE64(s) 0
>  #define REX_X(s) 0
>  #define REX_B(s) 0
>  #define REX_R(s) 0
> +#define REX_W(s) -1
>  #endif
>
>  #ifdef TARGET_X86_64
> @@ -100,7 +102,7 @@ typedef struct DisasContext {
>  #ifdef TARGET_X86_64
>      int lma;    /* long mode active */
>      int code64; /* 64 bit code segment */
> -    int rex_x, rex_b, rex_r;
> +    int rex_x, rex_b, rex_r, rex_w;
>  #endif
>      int vex_l;  /* vex vector length */
>      int vex_v;  /* vex vvvv register, without 1's complement.  */
> @@ -4495,7 +4497,6 @@ static target_ulong disas_insn(DisasContext *s,
CPUState *cpu)
>      int modrm, reg, rm, mod, op, opreg, val;
>      target_ulong next_eip, tval;
>      target_ulong pc_start = s->base.pc_next;
> -    int rex_w;
>
>      s->pc_start = s->pc = pc_start;
>      s->override = -1;
> @@ -4503,6 +4504,7 @@ static target_ulong disas_insn(DisasContext *s,
CPUState *cpu)
>      s->rex_x = 0;
>      s->rex_b = 0;
>      s->rex_r = 0;
> +    s->rex_w = -1;
>      s->x86_64_hregs = false;
>  #endif
>      s->rip_offset = 0; /* for relative ip address */
> @@ -4514,7 +4516,6 @@ static target_ulong disas_insn(DisasContext *s,
CPUState *cpu)
>      }
>
>      prefixes = 0;
> -    rex_w = -1;
>
>   next_byte:
>      b = x86_ldub_code(env, s);
> @@ -4557,7 +4558,7 @@ static target_ulong disas_insn(DisasContext *s,
CPUState *cpu)
>      case 0x40 ... 0x4f:
>          if (CODE64(s)) {
>              /* REX prefix */
> -            rex_w = (b >> 3) & 1;
> +            s->rex_w = (b >> 3) & 1;
>              s->rex_r = (b & 0x4) << 1;
>              s->rex_x = (b & 0x2) << 2;
>              s->rex_b = (b & 0x1) << 3;
> @@ -4606,7 +4607,9 @@ static target_ulong disas_insn(DisasContext *s,
CPUState *cpu)
>                  s->rex_b = (~vex2 >> 2) & 8;
>  #endif
>                  vex3 = x86_ldub_code(env, s);
> -                rex_w = (vex3 >> 7) & 1;
> +#ifdef TARGET_X86_64
> +                s->rex_w = (vex3 >> 7) & 1;
> +#endif
>                  switch (vex2 & 0x1f) {
>                  case 0x01: /* Implied 0f leading opcode bytes.  */
>                      b = x86_ldub_code(env, s) | 0x100;
> @@ -4631,9 +4634,9 @@ static target_ulong disas_insn(DisasContext *s,
CPUState *cpu)
>      /* Post-process prefixes.  */
>      if (CODE64(s)) {
>          /* In 64-bit mode, the default data size is 32-bit.  Select
64-bit
> -           data with rex_w, and 16-bit data with 0x66; rex_w takes
precedence
> +           data with REX_W, and 16-bit data with 0x66; REX_W takes
precedence
>             over 0x66 if both are present.  */
> -        dflag = (rex_w > 0 ? MO_64 : prefixes & PREFIX_DATA ? MO_16 :
MO_32);
> +        dflag = (REX_W(s) > 0 ? MO_64 : prefixes & PREFIX_DATA ? MO_16 :
MO_32);
>          /* In 64-bit mode, 0x67 selects 32-bit addressing.  */
>          aflag = (prefixes & PREFIX_ADR ? MO_32 : MO_64);
>      } else {
> @@ -5029,7 +5032,7 @@ static target_ulong disas_insn(DisasContext *s,
CPUState *cpu)
>                  /* operand size for jumps is 64 bit */
>                  ot = MO_64;
>              } else if (op == 3 || op == 5) {
> -                ot = dflag != MO_16 ? MO_32 + (rex_w == 1) : MO_16;
> +                ot = dflag != MO_16 ? MO_32 + (REX_W(s) == 1) : MO_16;
>              } else if (op == 6) {
>                  /* default push size is 64 bit */
>                  ot = mo_pushpop(s, dflag);
> --
> 2.20.1
>
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v4 31/75] target/i386: introduce code generators
  2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 31/75] target/i386: introduce code generators Jan Bobek
@ 2019-08-22  4:33   ` Aleksandar Markovic
  0 siblings, 0 replies; 80+ messages in thread
From: Aleksandar Markovic @ 2019-08-22  4:33 UTC (permalink / raw)
  To: Jan Bobek; +Cc: Alex Bennée, Richard Henderson, qemu-devel

21.08.2019. 20.12, "Jan Bobek" <jan.bobek@gmail.com> је написао/ла:
>
> In this context, "code generators" are functions that receive decoded
> instruction operands and emit TCG ops implementing the correct
> instruction functionality. Introduce the naming macros first, actual
> generator macros will be added later.
>
> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
> ---

I advice some caution here. Before adopting the coding approach that relies
heavily on preprocessor, you should seriously evaluate
not-always-so-obvious aspects of debugability and readibility of the end
result. In other words, you should provide a clear and objective answer to
this: What is gained and what is lost by using macros?

Thanks,
Aleksandar

>  target/i386/translate.c | 46 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
>
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index 2e78bed78f..603a5b80a1 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -5331,6 +5331,52 @@ INSNOP_LDST(xmm, Mhq)
>      tcg_temp_free_i64(r64);
>  }
>
> +/*
> + * Code generators
> + */
> +#define gen_insn(mnem, argc, ...)               \
> +    glue(gen_insn, argc)(mnem, ## __VA_ARGS__)
> +#define gen_insn0(mnem)                         \
> +    gen_ ## mnem ## _0
> +#define gen_insn1(mnem, opT1)                   \
> +    gen_ ## mnem ## _1 ## opT1
> +#define gen_insn2(mnem, opT1, opT2)             \
> +    gen_ ## mnem ## _2 ## opT1 ## opT2
> +#define gen_insn3(mnem, opT1, opT2, opT3)       \
> +    gen_ ## mnem ## _3 ## opT1 ## opT2 ## opT3
> +#define gen_insn4(mnem, opT1, opT2, opT3, opT4)         \
> +    gen_ ## mnem ## _4 ## opT1 ## opT2 ## opT3 ## opT4
> +#define gen_insn5(mnem, opT1, opT2, opT3, opT4, opT5)           \
> +    gen_ ## mnem ## _5 ## opT1 ## opT2 ## opT3 ## opT4 ## opT5
> +
> +#define GEN_INSN0(mnem)                         \
> +    static void gen_insn0(mnem)(                \
> +        CPUX86State *env, DisasContext *s)
> +#define GEN_INSN1(mnem, opT1)                   \
> +    static void gen_insn1(mnem, opT1)(          \
> +        CPUX86State *env, DisasContext *s,      \
> +        insnop_arg_t(opT1) arg1)
> +#define GEN_INSN2(mnem, opT1, opT2)                             \
> +    static void gen_insn2(mnem, opT1, opT2)(                    \
> +        CPUX86State *env, DisasContext *s,                      \
> +        insnop_arg_t(opT1) arg1, insnop_arg_t(opT2) arg2)
> +#define GEN_INSN3(mnem, opT1, opT2, opT3)                       \
> +    static void gen_insn3(mnem, opT1, opT2, opT3)(              \
> +        CPUX86State *env, DisasContext *s,                      \
> +        insnop_arg_t(opT1) arg1, insnop_arg_t(opT2) arg2,       \
> +        insnop_arg_t(opT3) arg3)
> +#define GEN_INSN4(mnem, opT1, opT2, opT3, opT4)                 \
> +    static void gen_insn4(mnem, opT1, opT2, opT3, opT4)(        \
> +        CPUX86State *env, DisasContext *s,                      \
> +        insnop_arg_t(opT1) arg1, insnop_arg_t(opT2) arg2,       \
> +        insnop_arg_t(opT3) arg3, insnop_arg_t(opT4) arg4)
> +#define GEN_INSN5(mnem, opT1, opT2, opT3, opT4, opT5)           \
> +    static void gen_insn5(mnem, opT1, opT2, opT3, opT4, opT5)(  \
> +        CPUX86State *env, DisasContext *s,                      \
> +        insnop_arg_t(opT1) arg1, insnop_arg_t(opT2) arg2,       \
> +        insnop_arg_t(opT3) arg3, insnop_arg_t(opT4) arg4,       \
> +        insnop_arg_t(opT5) arg5)
> +
>  static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
>  {
>      enum {
> --
> 2.20.1
>
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2019-08-22  4:34 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-21 17:28 [Qemu-devel] [RFC PATCH v4 00/75] rewrite MMX/SSE*/AVX/AVX2 vector instruction translation Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 01/75] target/i386: Push rex_r into DisasContext Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 02/75] target/i386: Push rex_w " Jan Bobek
2019-08-22  4:07   ` Aleksandar Markovic
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 03/75] target/i386: use dflag from DisasContext Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 04/75] target/i386: use prefix " Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 05/75] target/i386: introduce disas_insn_prefix Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 06/75] target/i386: Simplify gen_exception arguments Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 07/75] target/i386: use pc_start from DisasContext Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 08/75] target/i386: make variable b1 const Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 09/75] target/i386: make variable is_xmm const Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 10/75] target/i386: add vector register file alignment constraints Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 11/75] target/i386: introduce gen_sse_ng Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 12/75] target/i386: introduce CASES_* macros in gen_sse_ng Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 13/75] target/i386: decode the 0F38/0F3A prefix " Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 14/75] target/i386: introduce aliases for some tcg_gvec operations Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 15/75] target/i386: introduce function check_cpuid Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 16/75] target/i386: disable AVX/AVX2 cpuid bitchecks Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 17/75] target/i386: introduce instruction operand infrastructure Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 18/75] target/i386: introduce generic operand alias Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 19/75] target/i386: introduce generic either-or operand Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 20/75] target/i386: introduce generic load-store operand Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 21/75] target/i386: introduce tcg register operands Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 22/75] target/i386: introduce modrm operand Jan Bobek
2019-08-21 17:28 ` [Qemu-devel] [RFC PATCH v4 23/75] target/i386: introduce operands for decoding modrm fields Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 24/75] target/i386: introduce operand for direct-only r/m field Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 25/75] target/i386: introduce Ib (immediate) operand Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 26/75] target/i386: introduce M* (memptr) operands Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 27/75] target/i386: introduce G*, R*, E* (general register) operands Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 28/75] target/i386: introduce P*, N*, Q* (MMX) operands Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 29/75] target/i386: introduce H*, L*, V*, U*, W* (SSE/AVX) operands Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 30/75] target/i386: alias H* operands with the V* operands Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 31/75] target/i386: introduce code generators Jan Bobek
2019-08-22  4:33   ` Aleksandar Markovic
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 32/75] target/i386: introduce helper-based code generator macros Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 33/75] target/i386: introduce gvec-based " Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 34/75] target/i386: introduce sse-opcode.inc.h Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 35/75] target/i386: introduce instruction translator macros Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 36/75] target/i386: introduce MMX translators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 37/75] target/i386: introduce MMX code generators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 38/75] target/i386: introduce MMX vector instructions to sse-opcode.inc.h Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 39/75] target/i386: introduce SSE translators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 40/75] target/i386: introduce SSE code generators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 41/75] target/i386: introduce SSE vector instructions to sse-opcode.inc.h Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 42/75] target/i386: introduce SSE2 translators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 43/75] target/i386: introduce SSE2 code generators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 44/75] target/i386: introduce SSE2 vector instructions to sse-opcode.inc.h Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 45/75] target/i386: introduce SSE3 translators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 46/75] target/i386: introduce SSE3 code generators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 47/75] target/i386: introduce SSE3 vector instructions to sse-opcode.inc.h Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 48/75] target/i386: introduce SSSE3 translators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 49/75] target/i386: introduce SSSE3 code generators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 50/75] target/i386: introduce SSSE3 vector instructions to sse-opcode.inc.h Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 51/75] target/i386: introduce SSE4.1 translators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 52/75] target/i386: introduce SSE4.1 code generators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 53/75] target/i386: introduce SSE4.1 vector instructions to sse-opcode.inc.h Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 54/75] target/i386: introduce SSE4.2 code generators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 55/75] target/i386: introduce SSE4.2 vector instructions to sse-opcode.inc.h Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 57/75] target/i386: introduce AES and PCLMULQDQ code generators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 58/75] target/i386: introduce AES and PCLMULQDQ vector instructions to sse-opcode.inc.h Jan Bobek
2019-08-22  4:02   ` Aleksandar Markovic
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 59/75] target/i386: introduce AVX translators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 60/75] target/i386: introduce AVX code generators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 61/75] target/i386: introduce AVX vector instructions to sse-opcode.inc.h Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 62/75] target/i386: introduce AVX2 translators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 63/75] target/i386: introduce AVX2 code generators Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 64/75] target/i386: introduce AVX2 vector instructions to sse-opcode.inc.h Jan Bobek
2019-08-22  3:54   ` Aleksandar Markovic
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 65/75] target/i386: remove obsoleted helpers Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 66/75] target/i386: cleanup leftovers in ops_sse_header.h Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 67/75] target/i386: introduce aliases for helper-based tcg_gen_gvec_* functions Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 68/75] target/i386: convert ps((l, r)l(w, d, q), ra(w, d)) to helpers to gvec style Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 69/75] target/i386: convert pmullw/pmulhw/pmulhuw " Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 70/75] target/i386: convert pavgb/pavgw " Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 71/75] target/i386: convert pmuludq/pmaddwd " Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 72/75] target/i386: convert psadbw helper " Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 73/75] target/i386: remove obsoleted helper_mov(l, q)_mm_T0 Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 74/75] target/i386: convert pshuf(w, lw, hw, d), shuf(pd, ps) helpers to gvec style Jan Bobek
2019-08-21 17:29 ` [Qemu-devel] [RFC PATCH v4 75/75] target/i386: convert pmovmskb/movmskps/movmskpd " Jan Bobek
2019-08-21 23:53   ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).