* [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images
@ 2019-07-01  4:35 Jan Bobek
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

This is v2 of the patch series first posted in [1]. This version also
implements the VEX prefix, so all SIMD extensions up to AVX2 are
supported. The notable exceptions are LDMXCSR (memory contents cannot
be constrained yet) and all forms of VGATHER (VSIB is not implemented).

Note that this is still not the final version; I am planning to
implement randomization of VSIB to test VGATHER, and to improve the
way registers are randomized (as discussed in e.g. [2]).

Changes since v1:
  - risugen_common: rewrote insnv to make it clearer, added a comment
    to randint_constr;
  - risugen_x86_asm: fixed a typo in rex_encode;
  - risugen_x86: use more than one opcode in write_mov_reg_imm to
    optimize space usage;
  - x86.risu: added all SIMD extensions up to AVX2.

References:
  1. https://lists.nongnu.org/archive/html/qemu-devel/2019-06/msg04123.html
  2. https://lists.nongnu.org/archive/html/qemu-devel/2019-06/msg06489.html

Jan Bobek (14):
  risugen_common: add insnv, randint_constr, rand_fill
  risugen_x86_asm: add module
  risugen_x86_emit: add module
  risugen_x86: add module
  risugen: allow all byte-aligned instructions
  x86.risu: add MMX instructions
  x86.risu: add SSE instructions
  x86.risu: add SSE2 instructions
  x86.risu: add SSE3 instructions
  x86.risu: add SSSE3 instructions
  x86.risu: add SSE4.1 and SSE4.2 instructions
  x86.risu: add AES and PCLMULQDQ instructions
  x86.risu: add AVX instructions
  x86.risu: add AVX2 instructions

 risugen             |   15 +-
 risugen_common.pm   |  107 ++++-
 risugen_x86.pm      |  498 +++++++++++++++++++++
 risugen_x86_asm.pm  |  252 +++++++++++
 risugen_x86_emit.pm |   91 ++++
 x86.risu            | 1026 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1977 insertions(+), 12 deletions(-)
 create mode 100644 risugen_x86.pm
 create mode 100644 risugen_x86_asm.pm
 create mode 100644 risugen_x86_emit.pm
 create mode 100644 x86.risu

-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill
  2019-07-01  4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
@ 2019-07-01  4:35 ` Jan Bobek
  2019-07-03 15:22   ` Richard Henderson
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 02/14] risugen_x86_asm: add module Jan Bobek
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add three common utility functions:

- insnv allows emitting variable-length instructions in little-endian
  or big-endian byte order; it subsumes the functionality of insn16()
  and insn32(), which now call it.

- randint_constr allows generating random integers according to
  several constraints passed as arguments.

- rand_fill uses randint_constr to fill a given hash with
  (optionally constrained) random values.
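
The byte-order choice that insnv makes can be sketched in isolation.
The helper below is hypothetical and not part of the patch: it returns
the bytes as a string instead of writing them to the BIN filehandle,
and it works one byte at a time rather than in the 64/32/16/8-bit
chunks that insnv uses. It is only meant to show how the same value
comes out under the two orders.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not the patch's
# insnv): return the $len bytes of $value in the requested byte order.
sub pack_bytes {
    my ($value, $len, $bigendian) = @_;

    # Extract bytes starting from the least significant one.
    my @bytes = map { ($value >> (8 * $_)) & 0xFF } 0 .. $len - 1;

    # Big-endian order puts the most significant byte first.
    @bytes = reverse @bytes if $bigendian;

    return pack("C*", @bytes);
}

# Two-byte opcode 0x0FB9 (the UD1 encoding used later in the series).
print unpack("H*", pack_bytes(0x0FB9, 2, 1)), "\n";   # prints 0fb9
print unpack("H*", pack_bytes(0x0FB9, 2, 0)), "\n";   # prints b90f
```

Big-endian output keeps opcode bytes in the order they are written in
the configuration file, which is presumably why insnv defaults to it,
while displacements and immediates are emitted little-endian.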

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 risugen_common.pm | 107 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 101 insertions(+), 6 deletions(-)

diff --git a/risugen_common.pm b/risugen_common.pm
index 71ee996..c5d861e 100644
--- a/risugen_common.pm
+++ b/risugen_common.pm
@@ -23,7 +23,8 @@ BEGIN {
     require Exporter;
 
     our @ISA = qw(Exporter);
-    our @EXPORT = qw(open_bin close_bin set_endian insn32 insn16 $bytecount
+    our @EXPORT = qw(open_bin close_bin set_endian insn32 insn16
+                   $bytecount insnv randint_constr rand_fill
                    progress_start progress_update progress_end
                    eval_with_fields is_pow_of_2 sextract ctz
                    dump_insn_details);
@@ -37,7 +38,7 @@ my $bigendian = 0;
 # (default is little endian, 0).
 sub set_endian
 {
-    $bigendian = @_;
+    ($bigendian) = @_;
 }
 
 sub open_bin
@@ -52,18 +53,112 @@ sub close_bin
     close(BIN) or die "can't close output file: $!";
 }
 
+sub insnv(%)
+{
+    my (%args) = @_;
+
+    # Default to big-endian order, so that the instruction bytes are
+    # emitted in the same order as they are written in the
+    # configuration file.
+    $args{bigendian} = 1 unless defined $args{bigendian};
+
+    my $bitcur = 0;
+    my $bitend = 8 * $args{len};
+    while ($bitcur < $bitend) {
+        my $format;
+        my $bitlen;
+
+        if ($bitcur + 64 <= $bitend) {
+            $format = "Q";
+            $bitlen = 64;
+        } elsif ($bitcur + 32 <= $bitend) {
+            $format = "L";
+            $bitlen = 32;
+        } elsif ($bitcur + 16 <= $bitend) {
+            $format = "S";
+            $bitlen = 16;
+        } else {
+            $format = "C";
+            $bitlen = 8;
+        }
+
+        $format .= ($args{bigendian} ? ">" : "<") if $bitlen > 8;
+
+        my $bitmask = (1 << $bitlen) - 1;
+        my $value = $args{value} >> ($args{bigendian}
+                                     ? $bitend - $bitcur - $bitlen
+                                     : $bitcur);
+
+        print BIN pack($format, $value & $bitmask);
+        $bytecount += $bitlen / 8;
+
+        $bitcur += $bitlen;
+    }
+}
+
 sub insn32($)
 {
     my ($insn) = @_;
-    print BIN pack($bigendian ? "N" : "V", $insn);
-    $bytecount += 4;
+    insnv(value => $insn, len => 4, bigendian => $bigendian);
 }
 
 sub insn16($)
 {
     my ($insn) = @_;
-    print BIN pack($bigendian ? "n" : "v", $insn);
-    $bytecount += 2;
+    insnv(value => $insn, len => 2, bigendian => $bigendian);
+}
+
+sub randint_constr(%)
+{
+    my (%args) = @_;
+    my $bitlen = $args{bitlen};
+    my $halfrange = 1 << ($bitlen - 1);
+
+    while (1) {
+        my $value = int(rand(2 * $halfrange));
+        $value -= $halfrange if defined $args{signed} && $args{signed};
+        $value &= ~$args{fixedbitmask} if defined $args{fixedbitmask};
+        $value |= $args{fixedbits} if defined $args{fixedbits};
+
+        if (defined $args{constraint}) {
+            # The idea is: if the most significant bit of
+            # $args{constraint} is zero, $args{constraint} is the
+            # value we want to return; if the most significant bit is
+            # one, ~$args{constraint} (its bit inversion) is the value
+            # we want to *avoid*, so we try again.
+
+            if (!($args{constraint} >> 63)) {
+                $value = $args{constraint};
+            } elsif ($value == ~$args{constraint}) {
+                next;
+            }
+        }
+
+        return $value;
+    }
+}
+
+sub rand_fill($$)
+{
+    my ($target, $constraints) = @_;
+
+    for (keys %{$target}) {
+        my %args = (bitlen => $target->{$_}{bitlen});
+
+        $args{fixedbits} = $target->{$_}{fixedbits}
+            if defined $target->{$_}{fixedbits};
+        $args{fixedbitmask} = $target->{$_}{fixedbitmask}
+            if defined $target->{$_}{fixedbitmask};
+        $args{signed} = $target->{$_}{signed}
+            if defined $target->{$_}{signed};
+
+        $args{constraint} = $constraints->{$_}
+            if defined $constraints->{$_};
+
+        $target->{$_} = randint_constr(%args);
+    }
+
+    return $target;
 }
 
 # Progress bar implementation
-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 02/14] risugen_x86_asm: add module
  2019-07-01  4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill Jan Bobek
@ 2019-07-01  4:35 ` Jan Bobek
  2019-07-03 15:37   ` Richard Henderson
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 03/14] risugen_x86_emit: " Jan Bobek
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

The module risugen_x86_asm.pm exports several constants and the
function write_insn, which work in tandem to allow emission of x86
instructions in a clearer and more structured manner.
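
As a rough illustration of the field packing involved, the sketch
below builds a 2-byte VEX prefix and a ModRM byte by hand. The
function names modrm_byte and vex2_bytes are made up for this example
and do not exist in the module; the bit layouts follow the x86
encoding rules, and the field values mirror the constants defined in
the patch (VEX_L_256, VEX_P_DATA16, MOD_INDIRECT_DISP32, REG_EAX).

```perl
use strict;
use warnings;

# Illustrative only: ModRM holds mod in bits 7..6, reg in bits 5..3
# and rm in bits 2..0.
sub modrm_byte {
    my (%f) = @_;
    return ($f{mod} << 6) | ($f{reg} << 3) | $f{rm};
}

# Illustrative only: the 2-byte VEX prefix is 0xC5 followed by one
# byte holding R, vvvv, L and pp. R and vvvv are stored inverted, so
# "no extension" is r => 1 and "unused" is v => 0b1111.
sub vex2_bytes {
    my (%f) = @_;
    my $payload = ($f{r} << 7) | ($f{v} << 3) | ($f{l} << 2) | $f{p};
    return (0xC5, $payload);
}

my @insn = (
    vex2_bytes(r => 1, v => 0b1111, l => 1, p => 0b01),
    0x28,                                      # opcode byte
    modrm_byte(mod => 0b10, reg => 0, rm => 0) # [eax + disp32], reg 0
);

printf "%02x %02x %02x %02x\n", @insn;         # prints c5 fd 28 80
```

A 4-byte little-endian displacement would follow these bytes in a
complete instruction.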

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 risugen_x86_asm.pm | 252 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 252 insertions(+)
 create mode 100644 risugen_x86_asm.pm

diff --git a/risugen_x86_asm.pm b/risugen_x86_asm.pm
new file mode 100644
index 0000000..5640531
--- /dev/null
+++ b/risugen_x86_asm.pm
@@ -0,0 +1,252 @@
+#!/usr/bin/perl -w
+###############################################################################
+# Copyright (c) 2019 Linaro Limited
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+#     Jan Bobek - initial implementation
+###############################################################################
+
+# risugen_x86_asm -- risugen_x86's helper module for x86 assembly
+package risugen_x86_asm;
+
+use strict;
+use warnings;
+
+use risugen_common;
+
+our @ISA    = qw(Exporter);
+our @EXPORT = qw(
+    write_insn
+    VEX_L_128 VEX_L_256
+    VEX_P_NONE VEX_P_DATA16 VEX_P_REP VEX_P_REPNE
+    VEX_M_0F VEX_M_0F38 VEX_M_0F3A
+    VEX_V_UNUSED
+    REG_EAX REG_ECX REG_EDX REG_EBX REG_ESP REG_EBP REG_ESI REG_EDI
+    MOD_INDIRECT MOD_INDIRECT_DISP8 MOD_INDIRECT_DISP32 MOD_DIRECT
+    X86PFX_DATA16 X86PFX_REPNE X86PFX_REP
+    X86OP_LEA X86OP_XOR X86OP_ALU_imm8 X86OP_MOV X86OP_SAHF X86OP_CALL
+    X86OP_JMP X86OP_UD1 X86OP_VMOVAPS X86OP_MOVAPS
+    );
+
+use constant {
+    VEX_L_128 => 0,
+    VEX_L_256 => 1,
+
+    VEX_P_NONE   => 0b00,
+    VEX_P_DATA16 => 0b01,
+    VEX_P_REP    => 0b10,
+    VEX_P_REPNE  => 0b11,
+
+    VEX_M_0F   => 0b00001,
+    VEX_M_0F38 => 0b00010,
+    VEX_M_0F3A => 0b00011,
+
+    VEX_V_UNUSED => 0b1111,
+
+    REG_EAX => 0,
+    REG_ECX => 1,
+    REG_EDX => 2,
+    REG_EBX => 3,
+    REG_ESP => 4,
+    REG_EBP => 5,
+    REG_ESI => 6,
+    REG_EDI => 7,
+
+    MOD_INDIRECT        => 0b00,
+    MOD_INDIRECT_DISP8  => 0b01,
+    MOD_INDIRECT_DISP32 => 0b10,
+    MOD_DIRECT          => 0b11,
+
+    X86PFX_DATA16 => {value => 0x66, len => 1},
+    X86PFX_REPNE  => {value => 0xF2, len => 1},
+    X86PFX_REP    => {value => 0xF3, len => 1},
+
+    X86OP_LEA      => {value => 0x8D, len => 1},
+    X86OP_XOR      => {value => 0x33, len => 1},
+    X86OP_ALU_imm8 => {value => 0x83, len => 1},
+    X86OP_MOV      => {value => 0x8B, len => 1},
+    X86OP_SAHF     => {value => 0x9E, len => 1},
+    X86OP_CALL     => {value => 0xE8, len => 1},
+    X86OP_JMP      => {value => 0xE9, len => 1},
+
+    X86OP_UD1      => {value => 0x0FB9, len => 2},
+    X86OP_VMOVAPS  => {value => 0x28, len => 1},
+    X86OP_MOVAPS   => {value => 0x0F28, len => 2},
+};
+
+sub rex_encode(%)
+{
+    my (%args) = @_;
+
+    $args{w} = 0 unless defined $args{w};
+    $args{r} = 0 unless defined $args{r};
+    $args{x} = 0 unless defined $args{x};
+    $args{b} = 0 unless defined $args{b};
+
+    return (value => 0x40
+            | (($args{w} ? 1 : 0) << 3)
+            | (($args{r} ? 1 : 0) << 2)
+            | (($args{x} ? 1 : 0) << 1)
+            | ($args{b} ? 1 : 0),
+            len => 1);
+}
+
+sub vex_encode(%)
+{
+    my (%args) = @_;
+
+    $args{r} = 1 unless defined $args{r};
+    $args{x} = 1 unless defined $args{x};
+    $args{b} = 1 unless defined $args{b};
+    $args{v} = VEX_V_UNUSED unless defined $args{v};
+    $args{p} = VEX_P_NONE unless defined $args{p};
+
+    die "l field undefined"
+        unless defined $args{l};
+    die "v field out-of-range: $args{v}"
+        unless 0b0000 <= $args{v} && $args{v} <= 0b1111;
+    die "p field out-of-range: $args{p}"
+        unless 0b00 <= $args{p} && $args{p} <= 0b11;
+
+    if ($args{x} && $args{b} && !defined $args{m} && !defined $args{w}) {
+        # We can use the 2-byte VEX prefix
+        return (value => (0xC5 << 8)
+                | (($args{r} ? 1 : 0) << 7)
+                | ($args{v} << 3)
+                | (($args{l} ? 1 : 0) << 2)
+                | $args{p},
+                len => 2);
+    } else {
+        # We have to use the 3-byte VEX prefix
+        die "m field undefined"
+            unless defined $args{m};
+        die "m field out-of-range: $args{m}"
+            unless 0b00000 <= $args{m} && $args{m} <= 0b11111;
+        die "w field undefined"
+            unless defined $args{w};
+
+        return (value => (0xC4 << 16)
+                | (($args{r} ? 1 : 0) << 15)
+                | (($args{x} ? 1 : 0) << 14)
+                | (($args{b} ? 1 : 0) << 13)
+                | ($args{m} << 8)
+                | (($args{w} ? 1 : 0) << 7)
+                | ($args{v} << 3)
+                | (($args{l} ? 1 : 0) << 2)
+                | $args{p},
+                len => 3);
+    }
+}
+
+sub modrm_encode(%)
+{
+    my (%args) = @_;
+
+    die "MOD field out-of-range: $args{mod}"
+        unless 0 <= $args{mod} && $args{mod} <= 3;
+    die "REG field out-of-range: $args{reg}"
+        unless 0 <= $args{reg} && $args{reg} <= 7;
+    die "RM field out-of-range: $args{rm}"
+        unless 0 <= $args{rm} && $args{rm} <= 7;
+
+    return (value =>
+            ($args{mod} << 6)
+            | ($args{reg} << 3)
+            | $args{rm},
+            len => 1);
+}
+
+sub sib_encode(%)
+{
+    my (%args) = @_;
+
+    die "SS field out-of-range: $args{ss}"
+        unless 0 <= $args{ss} && $args{ss} <= 3;
+    die "INDEX field out-of-range: $args{index}"
+        unless 0 <= $args{index} && $args{index} <= 7;
+    die "BASE field out-of-range: $args{base}"
+        unless 0 <= $args{base} && $args{base} <= 7;
+
+    return (value =>
+            ($args{ss} << 6)
+            | ($args{index} << 3)
+            | $args{base},
+            len => 1);
+}
+
+sub write_insn(%)
+{
+    my (%insn) = @_;
+
+    my @tokens;
+    push @tokens, "EVEX"   if defined $insn{evex};
+    push @tokens, "VEX"    if defined $insn{vex};
+    push @tokens, "REP"    if defined $insn{rep};
+    push @tokens, "REPNE"  if defined $insn{repne};
+    push @tokens, "DATA16" if defined $insn{data16};
+    push @tokens, "REX"    if defined $insn{rex};
+    push @tokens, "OP"     if defined $insn{opcode};
+    push @tokens, "MODRM"  if defined $insn{modrm};
+    push @tokens, "SIB"    if defined $insn{sib};
+    push @tokens, "DISP"   if defined $insn{disp};
+    push @tokens, "IMM"    if defined $insn{imm};
+    push @tokens, "END";
+
+    # (EVEX | VEX | ((REP | REPNE)? DATA16? REX?)) OP (MODRM SIB? DISP?)? IMM? END
+
+    my $token = shift @tokens;
+    if ($token eq "EVEX") {
+        insnv(evex_encode(%{$insn{evex}}));
+        $token = shift @tokens;
+    } elsif ($token eq "VEX") {
+        insnv(vex_encode(%{$insn{vex}}));
+        $token = shift @tokens;
+    } else {
+        if ($token eq "REP") {
+            insnv(%{&X86PFX_REP});
+            $token = shift @tokens;
+        } elsif ($token eq "REPNE") {
+            insnv(%{&X86PFX_REPNE});
+            $token = shift @tokens;
+        }
+        if ($token eq "DATA16") {
+            insnv(%{&X86PFX_DATA16});
+            $token = shift @tokens;
+        }
+        if ($token eq "REX") {
+            insnv(rex_encode(%{$insn{rex}}));
+            $token = shift @tokens;
+        }
+    }
+
+    die "Unexpected instruction tokens where OP expected: $token @tokens\n"
+        unless $token eq "OP";
+
+    insnv(%{$insn{opcode}});
+    $token = shift @tokens;
+
+    if ($token eq "MODRM") {
+        insnv(modrm_encode(%{$insn{modrm}}));
+        $token = shift @tokens;
+
+        if ($token eq "SIB") {
+            insnv(sib_encode(%{$insn{sib}}));
+            $token = shift @tokens;
+        }
+        if ($token eq "DISP") {
+            insnv(%{$insn{disp}}, bigendian => 0);
+            $token = shift @tokens;
+        }
+    }
+    if ($token eq "IMM") {
+        insnv(%{$insn{imm}}, bigendian => 0);
+        $token = shift @tokens;
+    }
+
+    die "Unexpected junk tokens at the end of instruction: $token @tokens\n"
+        unless $token eq "END";
+}
-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 03/14] risugen_x86_emit: add module
  2019-07-01  4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill Jan Bobek
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 02/14] risugen_x86_asm: add module Jan Bobek
@ 2019-07-01  4:35 ` Jan Bobek
  2019-07-03 15:47   ` Richard Henderson
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: " Jan Bobek
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

The helper module risugen_x86_emit.pm exports a single function
"parse_emitblock", which serves to capture and return instruction
constraints described by "emit" blocks in an x86 configuration file.
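
The capture mechanism can be sketched as follows. This is a
simplified stand-in under stated assumptions: the real module
delegates evaluation to eval_with_fields from risugen_common, whereas
the sketch uses a plain string eval, and capture_block is a
hypothetical name. The idea shown is only that each directive called
inside the block records its named arguments in a shared hash.

```perl
use strict;
use warnings;

my $captured;    # filled in while a block is being evaluated

# Directive functions in the style of the module's rep(), vex(),
# modrm() and so on; each one just stores its named arguments.
sub vex   { my (%opts) = @_; $captured->{vex}   = \%opts; }
sub modrm { my (%opts) = @_; $captured->{modrm} = \%opts; }

# Hypothetical helper: evaluate the text of a block and return
# whatever the directives recorded.
sub capture_block {
    my ($text) = @_;
    $captured = {};
    eval $text;
    die "block failed: $@" if $@;
    return $captured;
}

my $opts = capture_block('vex(l => 0, p => 1); modrm();');

print join(" ", sort keys %$opts), "\n";   # prints modrm vex
print "vex.p = $opts->{vex}{p}\n";         # prints vex.p = 1
```

A directive called without arguments, such as modrm() here, still
leaves an (empty) entry behind, so a caller can tell that the
corresponding instruction part was requested.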

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 risugen             |  2 +-
 risugen_x86_emit.pm | 91 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 risugen_x86_emit.pm

diff --git a/risugen b/risugen
index e690b18..fe3d00e 100755
--- a/risugen
+++ b/risugen
@@ -43,7 +43,7 @@ my @pattern_re = ();            # include pattern
 my @not_pattern_re = ();        # exclude pattern
 
 # Valid block names (keys in blocks hash)
-my %valid_blockname = ( constraints => 1, memory => 1 );
+my %valid_blockname = ( constraints => 1, memory => 1, emit => 1 );
 
 sub parse_risu_directive($$@)
 {
diff --git a/risugen_x86_emit.pm b/risugen_x86_emit.pm
new file mode 100644
index 0000000..127a524
--- /dev/null
+++ b/risugen_x86_emit.pm
@@ -0,0 +1,91 @@
+#!/usr/bin/perl -w
+###############################################################################
+# Copyright (c) 2019 Linaro Limited
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+#     Jan Bobek - initial implementation
+###############################################################################
+
+# risugen_x86_emit -- risugen_x86's helper module for emit blocks
+package risugen_x86_emit;
+
+use strict;
+use warnings;
+
+use risugen_common;
+use risugen_x86_asm;
+
+our @ISA    = qw(Exporter);
+our @EXPORT = qw(parse_emitblock);
+
+my $emit_opts;
+
+sub rep(%)
+{
+    my (%opts) = @_;
+    $emit_opts->{rep} = \%opts;
+}
+
+sub repne(%)
+{
+    my (%opts) = @_;
+    $emit_opts->{repne} = \%opts;
+}
+
+sub data16(%)
+{
+    my (%opts) = @_;
+    $emit_opts->{data16} = \%opts;
+}
+
+sub rex(%)
+{
+    my (%opts) = @_;
+    $emit_opts->{rex} = \%opts;
+}
+
+sub vex(%)
+{
+    my (%opts) = @_;
+    $emit_opts->{vex} = \%opts;
+}
+
+sub modrm(%)
+{
+    my (%opts) = @_;
+    $emit_opts->{modrm} = \%opts;
+}
+
+sub mem(%)
+{
+    my (%opts) = @_;
+    $emit_opts->{mem} = \%opts;
+}
+
+sub imm(%)
+{
+    my (%opts) = @_;
+    $emit_opts->{imm} = \%opts;
+}
+
+sub parse_emitblock($$)
+{
+    my ($rec, $insn) = @_;
+    my $insnname = $rec->{name};
+    my $opcode = $insn->{opcode}{value};
+
+    $emit_opts = {};
+
+    my $emitblock = $rec->{blocks}{"emit"};
+    if (defined $emitblock) {
+        eval_with_fields($insnname, $opcode, $rec, "emit", $emitblock);
+    }
+
+    return $emit_opts;
+}
+
+1;
-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: add module
  2019-07-01  4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
                   ` (2 preceding siblings ...)
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 03/14] risugen_x86_emit: " Jan Bobek
@ 2019-07-01  4:35 ` Jan Bobek
  2019-07-03 16:11   ` Richard Henderson
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 05/14] risugen: allow all byte-aligned instructions Jan Bobek
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

The risugen_x86.pm module contains most of the code specific to the
Intel i386 and x86_64 architectures. This commit also adds the --x86_64
option, which enables emission of 64-bit (rather than 32-bit) assembly.
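
A minimal sketch of how such a flag might feed into the generator's
parameters is shown below. The function mode_params is hypothetical;
the register count and bit width it derives are the values that
write_random_regdata in this patch selects from $is_x86_64, and
GetOptionsFromArray is used only so that the example does not depend
on the real command line.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse the flag and derive the mode-dependent
# values used when randomizing general-purpose registers.
sub mode_params {
    my (@argv) = @_;
    my $is_x86_64 = 0;

    GetOptionsFromArray(\@argv, "x86_64" => sub { $is_x86_64 = 1; })
        or die "bad options\n";

    return (x86_64  => $is_x86_64,
            reg_cnt => $is_x86_64 ? 16 : 8,
            bitlen  => $is_x86_64 ? 64 : 32);
}

my %p = mode_params("--x86_64");
print "$p{reg_cnt} registers, $p{bitlen}-bit\n";   # prints 16 registers, 64-bit
```

Without the flag the same call yields 8 registers and 32-bit values,
which matches the default of $is_x86_64 = 0 in the patch.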

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 risugen        |   6 +-
 risugen_x86.pm | 498 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 503 insertions(+), 1 deletion(-)
 create mode 100644 risugen_x86.pm

diff --git a/risugen b/risugen
index fe3d00e..09a702a 100755
--- a/risugen
+++ b/risugen
@@ -310,6 +310,7 @@ Valid options:
                    Useful to test before support for FP is available.
     --sve        : enable sve floating point
     --be         : generate instructions in Big-Endian byte order (ppc64 only).
+    --x86_64     : generate 64-bit (rather than 32-bit) x86 code.
     --help       : print this message
 EOT
 }
@@ -322,6 +323,7 @@ sub main()
     my $fp_enabled = 1;
     my $sve_enabled = 0;
     my $big_endian = 0;
+    my $is_x86_64 = 0;
     my ($infile, $outfile);
 
     GetOptions( "help" => sub { usage(); exit(0); },
@@ -338,6 +340,7 @@ sub main()
                 },
                 "be" => sub { $big_endian = 1; },
                 "no-fp" => sub { $fp_enabled = 0; },
+                "x86_64" => sub { $is_x86_64 = 1; },
                 "sve" => sub { $sve_enabled = 1; },
         ) or return 1;
     # allow "--pattern re,re" and "--pattern re --pattern re"
@@ -372,7 +375,8 @@ sub main()
         'keys' => \@insn_keys,
         'arch' => $full_arch[0],
         'subarch' => $full_arch[1] || '',
-        'bigendian' => $big_endian
+        'bigendian' => $big_endian,
+        'x86_64' => $is_x86_64
     );
 
     write_test_code(\%params);
diff --git a/risugen_x86.pm b/risugen_x86.pm
new file mode 100644
index 0000000..fd16c45
--- /dev/null
+++ b/risugen_x86.pm
@@ -0,0 +1,498 @@
+#!/usr/bin/perl -w
+###############################################################################
+# Copyright (c) 2019 Linaro Limited
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+#     Jan Bobek - initial implementation
+###############################################################################
+
+# risugen_x86 -- risugen module for Intel i386/x86_64 architectures
+package risugen_x86;
+
+use strict;
+use warnings;
+
+use risugen_common;
+use risugen_x86_asm;
+use risugen_x86_emit;
+
+require Exporter;
+
+our @ISA    = qw(Exporter);
+our @EXPORT = qw(write_test_code);
+
+use constant {
+    RISUOP_COMPARE     => 0,        # compare registers
+    RISUOP_TESTEND     => 1,        # end of test, stop
+    RISUOP_SETMEMBLOCK => 2,        # eax is address of memory block (8192 bytes)
+    RISUOP_GETMEMBLOCK => 3,        # add the address of memory block to eax
+    RISUOP_COMPAREMEM  => 4,        # compare memory block
+
+    # Maximum alignment restriction permitted for a memory op.
+    MAXALIGN => 64,
+    MEMBLOCK_LEN => 8192,
+};
+
+my $periodic_reg_random = 1;
+my $is_x86_64 = 0;
+
+sub write_risuop($)
+{
+    my ($op) = @_;
+
+    write_insn(opcode => X86OP_UD1,
+               modrm => {mod => MOD_DIRECT,
+                         reg => REG_EAX,
+                         rm => $op});
+}
+
+sub write_mov_rr($$)
+{
+    my ($r1, $r2) = @_;
+
+    my %insn = (opcode => X86OP_MOV,
+                modrm => {mod => MOD_DIRECT,
+                          reg => ($r1 & 0x7),
+                          rm => ($r2 & 0x7)});
+
+    $insn{rex}{w} = 1 if $is_x86_64;
+    $insn{rex}{r} = 1 if $r1 >= 8;
+    $insn{rex}{b} = 1 if $r2 >= 8;
+
+    write_insn(%insn);
+}
+
+sub write_mov_reg_imm($$)
+{
+    my ($reg, $imm) = @_;
+    my %insn;
+
+    if (0 <= $imm && $imm <= 0xffffffff) {
+        %insn = (opcode => {value => 0xB8 | ($reg & 0x7), len => 1},
+                 imm => {value => $imm, len => 4});
+    } elsif (-0x80000000 <= $imm && $imm <= 0x7fffffff) {
+        %insn = (opcode => {value => 0xC7, len => 1},
+                 modrm => {mod => MOD_DIRECT,
+                           reg => 0, rm => ($reg & 0x7)},
+                 imm => {value => $imm, len => 4});
+
+        $insn{rex}{w} = 1 if $is_x86_64;
+    } else {
+        %insn = (rex => {w => 1},
+                 opcode => {value => 0xB8 | ($reg & 0x7), len => 1},
+                 imm => {value => $imm, len => 8});
+    }
+
+    $insn{rex}{b} = 1 if $reg >= 8;
+    write_insn(%insn);
+}
+
+sub write_random_regdata()
+{
+    my $reg_cnt = $is_x86_64 ? 16 : 8;
+    my $bitlen = $is_x86_64 ? 64 : 32;
+
+    # initialize flags register
+    write_insn(opcode => X86OP_XOR,
+               modrm => {mod => MOD_DIRECT,
+                         reg => REG_EAX,
+                         rm => REG_EAX});
+    write_insn(opcode => X86OP_SAHF);
+
+    # general purpose registers
+    for (my $reg = 0; $reg < $reg_cnt; $reg++) {
+        if ($reg != REG_ESP) {
+            my $imm = randint_constr(bitlen => $bitlen, signed => 1);
+            write_mov_reg_imm($reg, $imm);
+        }
+    }
+}
+
+sub write_random_datablock($)
+{
+    my ($datalen) = @_;
+
+    # Write a block of random data, $datalen bytes long, aligned
+    # according to MAXALIGN, and load its address into EAX/RAX.
+
+    $datalen += MAXALIGN - 1;
+
+    # First, load current EIP/RIP into EAX/RAX. Easy to do on x86_64
+    # thanks to RIP-relative addressing, but on i386 we need to play
+    # some well-known tricks with CALL instruction.
+    if ($is_x86_64) {
+        # 4-byte AND + 5-byte JMP
+        my $disp32 = 4 + 5 + (MAXALIGN - 1);
+        my $reg = REG_EAX;
+
+        write_insn(rex => {w => 1},
+                   opcode => X86OP_LEA,
+                   modrm => {mod => MOD_INDIRECT,
+                             reg => $reg, rm => REG_EBP},
+                   disp => {value => $disp32, len => 4});
+
+        write_insn(rex => {w => 1},
+                   opcode => X86OP_ALU_imm8,
+                   modrm => {mod => MOD_DIRECT,
+                             reg => 4, rm => $reg},
+                   imm => {value => ~(MAXALIGN - 1),
+                           len => 1});
+
+    } else {
+        # 1-byte POP + 3-byte ADD + 3-byte AND + 5-byte JMP
+        my $imm8 = 1 + 3 + 3 + 5 + (MAXALIGN - 1);
+        my $reg = REG_EAX;
+
+        # displacement = next instruction
+        write_insn(opcode => X86OP_CALL,
+                   imm => {value => 0x00000000, len => 4});
+
+        write_insn(opcode => {value => 0x58 | ($reg & 0x7),
+                              len => 1});
+
+        write_insn(opcode => X86OP_ALU_imm8,
+                   modrm => {mod => MOD_DIRECT,
+                             reg => 0, rm => $reg},
+                   imm => {value => $imm8, len => 1});
+
+        write_insn(opcode => X86OP_ALU_imm8,
+                   modrm => {mod => MOD_DIRECT,
+                             reg => 4, rm => $reg},
+                   imm => {value => ~(MAXALIGN - 1),
+                           len => 1});
+    }
+
+    # JMP over the data blob.
+    write_insn(opcode => X86OP_JMP,
+               imm => {value => $datalen, len => 4});
+
+    # Generate the random data
+    for (my $w = 8; 1 <= $w; $w /= 2) {
+        for (; $w <= $datalen; $datalen -= $w) {
+            insnv(%{rand_insn_imm(size => $w)});
+        }
+    }
+}
+
+sub write_random_ymmdata()
+{
+    my $ymm_cnt = $is_x86_64 ? 16 : 8;
+    my $ymm_len = 32;
+    my $datalen = $ymm_cnt * $ymm_len;
+
+    # Generate random data blob
+    write_random_datablock($datalen);
+
+    # Load the random data into YMM regs.
+    for (my $ymm_reg = 0; $ymm_reg < $ymm_cnt; $ymm_reg++) {
+        write_insn(vex => {l => VEX_L_256, p => VEX_P_DATA16,
+                           r => !($ymm_reg >= 8)},
+                   opcode => X86OP_VMOVAPS,
+                   modrm => {mod => MOD_INDIRECT_DISP32,
+                             reg => ($ymm_reg & 0x7),
+                             rm => REG_EAX},
+                   disp => {value => $ymm_reg * $ymm_len,
+                            len => 4});
+    }
+}
+
+sub write_memblock_setup()
+{
+    # Generate random data blob
+    write_random_datablock(MEMBLOCK_LEN);
+    # Pointer is in EAX/RAX; set the memblock
+    write_risuop(RISUOP_SETMEMBLOCK);
+}
+
+sub write_random_register_data()
+{
+    write_random_ymmdata();
+    write_random_regdata();
+    write_risuop(RISUOP_COMPARE);
+}
+
+sub rand_insn_imm(%)
+{
+    my (%args) = @_;
+
+    return {
+        value => randint_constr(bitlen => ($args{size} * 8), signed => 1),
+        len => $args{size}
+    };
+}
+
+sub rand_insn_opcode($)
+{
+    # Given an instruction-details array, generate an instruction
+    my ($rec) = @_;
+    my $insnname = $rec->{name};
+    my $insnwidth = $rec->{width};
+
+    my $constraintfailures = 0;
+
+    INSN: while(1) {
+        my $opcode = randint_constr(bitlen => 32,
+                                    fixedbits => $rec->{fixedbits},
+                                    fixedbitmask => $rec->{fixedbitmask});
+
+        my $constraint = $rec->{blocks}{"constraints"};
+        if (defined $constraint) {
+            # user-specified constraint: evaluate in an environment
+            # with variables set corresponding to the variable fields.
+            my $v = eval_with_fields($insnname, $opcode, $rec, "constraints", $constraint);
+            if (!$v) {
+                $constraintfailures++;
+                if ($constraintfailures > 10000) {
+                    print "10000 consecutive constraint failures for $insnname constraints string:\n$constraint\n";
+                    exit (1);
+                }
+                next INSN;
+            }
+        }
+
+        # OK, we got a good one
+        $constraintfailures = 0;
+
+        return {
+            value => $opcode >> (32 - $insnwidth),
+            len => $insnwidth / 8
+        };
+    }
+}
+
+sub rand_insn_modrm($$)
+{
+    my ($opts, $insn) = @_;
+    my $modrm;
+
+    while (1) {
+        $modrm = rand_fill({mod => {bitlen => 2},
+                            reg => {bitlen => 3},
+                            rm => {bitlen => 3}},
+                           $opts);
+
+        if ($modrm->{mod} != MOD_DIRECT) {
+            # Displacement only; we cannot use this since we
+            # don't know the absolute address of the memblock.
+            next if $modrm->{mod} == MOD_INDIRECT && $modrm->{rm} == REG_EBP;
+
+            if ($modrm->{rm} == REG_ESP) {
+                # SIB byte present
+                my $sib = rand_fill({ss => {bitlen => 2},
+                                     index => {bitlen => 3},
+                                     base => {bitlen => 3}}, {});
+
+                # We cannot modify ESP/RSP during the tests
+                next if $sib->{base} == REG_ESP;
+
+                # When base and index register are the same,
+                # computing the correct memblock addresses and
+                # offsets gets way too complicated...
+                next if $sib->{base} == $sib->{index};
+
+                # No base register
+                next if $modrm->{mod} == MOD_INDIRECT && $sib->{base} == REG_EBP;
+
+                $insn->{sib} = $sib;
+            }
+
+            $insn->{disp} = rand_insn_imm(size => 1)
+                if $modrm->{mod} == MOD_INDIRECT_DISP8;
+
+            $insn->{disp} = rand_insn_imm(size => 4)
+                if $modrm->{mod} == MOD_INDIRECT_DISP32;
+        }
+
+        $insn->{modrm} = $modrm;
+        last;
+    }
+}
+
+sub rand_insn_rex($$)
+{
+    my ($opts, $insn) = @_;
+
+    $opts->{w} = 0 unless defined $opts->{w};
+    $opts->{x} = 0 unless defined $opts->{x} || defined $insn->{sib};
+
+    my $rex = rand_fill({w => {bitlen => 1},
+                         r => {bitlen => 1},
+                         b => {bitlen => 1},
+                         x => {bitlen => 1}},
+                        $opts);
+
+    $insn->{rex} = $rex
+        if $rex->{w} || $rex->{r} || $rex->{b} || $rex->{x};
+}
+
+sub rand_insn_vex($$)
+{
+    my ($opts, $insn) = @_;
+    my $vex;
+
+    $opts->{r} = 1 unless $is_x86_64;
+    $opts->{x} = 1 unless $is_x86_64 && (defined $opts->{x} || defined $insn->{sib});
+    $opts->{b} = 1 unless $is_x86_64;
+    $opts->{p} = 0 unless defined $opts->{p};
+
+    $vex->{r} = {bitlen => 1};
+    $vex->{v} = {bitlen => 4};
+    $vex->{l} = {bitlen => 1};
+    $vex->{p} = {bitlen => 2};
+
+    # Note that VEX.X, VEX.B, VEX.M and VEX.W are only present in the
+    # 3-byte VEX prefix. Since VEX.M is an extension of opcode, it
+    # makes no sense to randomize it; therefore, we can only include
+    # VEX.X, VEX.B and VEX.W if we are given a meaningful value for
+    # VEX.M.
+    if (defined $opts->{m}) {
+        $vex->{x} = {bitlen => 1};
+        $vex->{b} = {bitlen => 1};
+        $vex->{m} = {bitlen => 5};
+        $vex->{w} = {bitlen => 1};
+    }
+
+    $insn->{vex} = rand_fill($vex, $opts);
+}
+
+sub write_mem_getoffset($$)
+{
+    my ($opts, $insn) = @_;
+    my ($offset, $index);
+
+    $opts->{size}  = 0 unless defined $opts->{size};
+    $opts->{align} = 1 unless defined $opts->{align};
+
+    if (!defined $opts->{base}
+        && defined $insn->{modrm}
+        && $insn->{modrm}{mod} != MOD_DIRECT) {
+
+        $opts->{base} = (defined $insn->{sib}
+                         ? $insn->{sib}{base}
+                         : $insn->{modrm}{rm});
+
+        if ($insn->{modrm}{mod} == MOD_INDIRECT && $opts->{base} == REG_EBP) {
+            delete $opts->{base}; # No base register
+        } else {
+            $opts->{base} |= $insn->{rex}{b} << 3 if defined $insn->{rex};
+            $opts->{base} |= (!$insn->{vex}{b}) << 3 if defined $insn->{vex};
+        }
+    }
+
+    if (!defined $opts->{index} && defined $insn->{sib}) {
+        $opts->{index} = $insn->{sib}{index};
+        $opts->{index} |= $insn->{rex}{x} << 3 if defined $insn->{rex};
+        $opts->{index} |= (!$insn->{vex}{x}) << 3 if defined $insn->{vex};
+        delete $opts->{index} if $opts->{index} == REG_ESP; # ESP means "none"
+    }
+
+    $opts->{ss} = $insn->{sib}{ss} if !defined $opts->{ss} && defined $insn->{sib};
+    $opts->{disp} = $insn->{disp} if !defined $opts->{disp} && defined $insn->{disp};
+
+    $offset = int(rand(MEMBLOCK_LEN - $opts->{size}));
+    $offset &= ~($opts->{align} - 1);
+
+    $offset -= $opts->{disp}{value} if defined $opts->{disp};
+
+    if (defined $opts->{index}) {
+        $index = randint_constr(bitlen => 32, signed => 1);
+        $offset -= $index * (1 << $opts->{ss});
+    }
+
+    if (defined $opts->{base} && defined $offset) {
+        write_mov_reg_imm(REG_EAX, $offset);
+        write_risuop(RISUOP_GETMEMBLOCK);
+        write_mov_rr($opts->{base}, REG_EAX);
+    }
+    if (defined $opts->{index} && defined $index) {
+        write_mov_reg_imm($opts->{index}, $index);
+    }
+}
+
+sub gen_one_insn($)
+{
+    my ($rec) = @_;
+    my $insn;
+
+    $insn->{opcode} = rand_insn_opcode($rec);
+    my $opts = parse_emitblock($rec, $insn);
+
+    # Operation with a ModR/M byte can potentially use a memory
+    # operand
+    $opts->{mem} = {}
+        unless (defined $opts->{mem}
+                || !defined $opts->{modrm});
+
+    # If none of REX/VEX/EVEX are specified, default to REX
+    $opts->{rex} = {}
+        unless (defined $opts->{rex}
+                || defined $opts->{vex}
+                || defined $opts->{evex}
+                || !defined $opts->{modrm});
+
+    # REX requires x86_64
+    delete $opts->{rex}
+        unless $is_x86_64;
+
+    $insn->{rep}    = $opts->{rep}    if defined $opts->{rep};
+    $insn->{repne}  = $opts->{repne}  if defined $opts->{repne};
+    $insn->{data16} = $opts->{data16} if defined $opts->{data16};
+
+    rand_insn_modrm($opts->{modrm}, $insn) if defined $opts->{modrm};
+
+    rand_insn_vex($opts->{vex}, $insn) if defined $opts->{vex};
+    # TODO rand_insn_evex($opts->{evex}, $insn) if defined $opts->{evex};
+    rand_insn_rex($opts->{rex}, $insn) if defined $opts->{rex};
+
+    $insn->{imm} = rand_insn_imm(%{$opts->{imm}}) if defined $opts->{imm};
+
+    write_mem_getoffset($opts->{mem}, $insn);
+    write_insn(%{$insn});
+}
+
+sub write_test_code($)
+{
+    my ($params) = @_;
+
+    my $numinsns = $params->{ 'numinsns' };
+    my $outfile = $params->{ 'outfile' };
+
+    my %insn_details = %{ $params->{ 'details' } };
+    my @keys = @{ $params->{ 'keys' } };
+
+    $is_x86_64 = $params->{ 'x86_64' };
+
+    open_bin($outfile);
+
+    # TODO better random number generator?
+    srand(0);
+
+    print "Generating code using patterns: @keys...\n";
+    progress_start(78, $numinsns);
+
+    write_memblock_setup();
+
+    # memblock setup doesn't clean its registers, so this must come afterwards.
+    write_random_register_data();
+
+    for my $i (1..$numinsns) {
+        my $insn_enc = $keys[int rand (@keys)];
+        gen_one_insn($insn_details{$insn_enc});
+        write_risuop(RISUOP_COMPARE);
+        # Rewrite the registers periodically. This avoids the tendency
+        # for the floating-point registers to decay to NaNs and zeroes.
+        if ($periodic_reg_random && ($i % 100) == 0) {
+            write_random_register_data();
+        }
+        progress_update($i);
+    }
+    write_risuop(RISUOP_TESTEND);
+    progress_end();
+    close_bin();
+}
+
+1;
-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 05/14] risugen: allow all byte-aligned instructions
  2019-07-01  4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
                   ` (3 preceding siblings ...)
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: " Jan Bobek
@ 2019-07-01  4:35 ` Jan Bobek
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions Jan Bobek
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Accept all instructions whose bit length is divisible by 8. Note that
the maximum instruction length (as specified in the config file) is 32
bits, hence this change permits instructions which are 8 bits or 24
bits long (16-bit instructions have already been considered valid).

Note that while valid x86 instructions may be up to 15 bytes long, the
length constraint described above only applies to the main opcode
field, which is usually only 1 or 2 bytes long. Therefore, the primary
purpose of this change is to allow 1-byte x86 opcodes.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 risugen | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
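
The width rule described in the commit message can be sketched as follows. This is an illustration only, written in Python rather than the Perl of risugen; it assumes that the pattern width starts at 32 bits and that the parser's bit position counts the bits still unspecified. The helper names insn_width and extract_opcode are hypothetical and do not appear in risugen.

```python
MAX_INSN_BITS = 32  # maximum pattern width, as stated in the commit message


def insn_width(bits_left):
    """Width of a pattern that leaves `bits_left` of the 32 bits unspecified.

    Mirrors `$insnwidth -= $bitpos`, assuming $insnwidth starts at 32.
    """
    if bits_left % 8 != 0:
        raise ValueError("not enough bits specified")
    return MAX_INSN_BITS - bits_left


def extract_opcode(raw, width):
    """Shift a pattern stored in the high bits of a 32-bit word down to its
    natural position, returning (value, length in bytes)."""
    return raw >> (MAX_INSN_BITS - width), width // 8


# A 2-byte opcode such as 0F 6E sits in the high halfword of the raw word.
value, length = extract_opcode(0x0F6E0000, insn_width(16))
print(hex(value), length)
```

With this rule a pattern that specifies 8 or 24 bits is accepted as well, which is what makes 1-byte opcodes expressible.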

diff --git a/risugen b/risugen
index 09a702a..17bf98f 100755
--- a/risugen
+++ b/risugen
@@ -229,12 +229,11 @@ sub parse_config_file($)
                 push @fields, [ $var, $bitpos, $bitmask ];
             }
         }
-        if ($bitpos == 16) {
-            # assume this is a half-width thumb instruction
+        if ($bitpos % 8 == 0) {
             # Note that we don't fiddle with the bitmasks or positions,
             # which means the generated insn will be in the high halfword!
-            $insnwidth = 16;
-        } elsif ($bitpos != 0) {
+            $insnwidth -= $bitpos;
+        } else {
             print STDERR "$file:$.: ($insn $enc) not enough bits specified\n";
             exit(1);
         }
-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-01  4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
                   ` (4 preceding siblings ...)
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 05/14] risugen: allow all byte-aligned instructions Jan Bobek
@ 2019-07-01  4:35 ` Jan Bobek
  2019-07-03 21:35   ` Richard Henderson
                     ` (2 more replies)
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 07/14] x86.risu: add SSE instructions Jan Bobek
                   ` (6 subsequent siblings)
  12 siblings, 3 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add an x86 configuration file with all MMX instructions.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 x86.risu | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)
 create mode 100644 x86.risu
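
For readers new to the pattern syntax, here is a rough Python sketch of how a bit pattern such as "00001111 011 d 1110" could be turned into a fixed-bits value and mask, and how a random opcode consistent with it could be drawn. It is a simplification under stated assumptions: only literal bits and one-letter, one-bit fields are handled, the value is kept right-aligned (risugen keeps it in the high bits of a 32-bit word), and parse_pattern and random_opcode are hypothetical names, not risugen functions.

```python
import random


def parse_pattern(pattern):
    """Return (fixedbits, fixedbitmask, width) for a pattern string."""
    fixedbits = fixedbitmask = width = 0
    for token in pattern.split():
        if set(token) <= {"0", "1"}:
            # Literal bits: recorded in both the value and the mask.
            for ch in token:
                fixedbits = (fixedbits << 1) | int(ch)
                fixedbitmask = (fixedbitmask << 1) | 1
                width += 1
        elif len(token) == 1 and token.isalpha():
            # One-bit variable field: left free for randomization.
            fixedbits <<= 1
            fixedbitmask <<= 1
            width += 1
        else:
            raise ValueError("unsupported token: " + token)
    return fixedbits, fixedbitmask, width


def random_opcode(fixedbits, fixedbitmask, width, rng):
    """Randomize every bit that the mask leaves free."""
    return (rng.getrandbits(width) & ~fixedbitmask) | fixedbits


bits, mask, width = parse_pattern("00001111 011 d 1110")
print(hex(bits), hex(mask), width)
print(hex(random_opcode(bits, mask, width, random.Random(0))))
```

For the MOVD pattern the only free bit is d, so the generated opcode is either 0F 6E or 0F 7E, that is, the two directions of the move.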

diff --git a/x86.risu b/x86.risu
new file mode 100644
index 0000000..f2dd9b0
--- /dev/null
+++ b/x86.risu
@@ -0,0 +1,96 @@
+###############################################################################
+# Copyright (c) 2019 Linaro Limited
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+#     Jan Bobek - initial implementation
+###############################################################################
+
+# Input file for risugen defining x86 instructions
+.mode x86
+
+# Data Transfer Instructions
+MOVD            MMX     00001111 011 d 1110 !emit { modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
+MOVD_mem        MMX     00001111 011 d 1110 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 4); }
+MOVQ            MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
+MOVQ_mem        MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVQ_mm         MMX     00001111 011 d 1111 !emit { modrm(); mem(size => 8); }
+
+# Arithmetic Instructions
+PADDB           MMX     00001111 11111100 !emit { modrm(); mem(size => 8); }
+PADDW           MMX     00001111 11111101 !emit { modrm(); mem(size => 8); }
+PADDD           MMX     00001111 11111110 !emit { modrm(); mem(size => 8); }
+PADDQ           MMX     00001111 11010100 !emit { modrm(); mem(size => 8); }
+PADDSB          MMX     00001111 11101100 !emit { modrm(); mem(size => 8); }
+PADDSW          MMX     00001111 11101101 !emit { modrm(); mem(size => 8); }
+PADDUSB         MMX     00001111 11011100 !emit { modrm(); mem(size => 8); }
+PADDUSW         MMX     00001111 11011101 !emit { modrm(); mem(size => 8); }
+
+PSUBB           MMX     00001111 11111000 !emit { modrm(); mem(size => 8); }
+PSUBW           MMX     00001111 11111001 !emit { modrm(); mem(size => 8); }
+PSUBD           MMX     00001111 11111010 !emit { modrm(); mem(size => 8); }
+PSUBSB          MMX     00001111 11101000 !emit { modrm(); mem(size => 8); }
+PSUBSW          MMX     00001111 11101001 !emit { modrm(); mem(size => 8); }
+PSUBUSB         MMX     00001111 11011000 !emit { modrm(); mem(size => 8); }
+PSUBUSW         MMX     00001111 11011001 !emit { modrm(); mem(size => 8); }
+
+PMULLW          MMX     00001111 11010101 !emit { modrm(); mem(size => 8); }
+PMULHW          MMX     00001111 11100101 !emit { modrm(); mem(size => 8); }
+
+PMADDWD         MMX     00001111 11110101 !emit { modrm(); mem(size => 8); }
+
+# Comparison Instructions
+PCMPEQB         MMX     00001111 01110100 !emit { modrm(); mem(size => 8); }
+PCMPEQW         MMX     00001111 01110101 !emit { modrm(); mem(size => 8); }
+PCMPEQD         MMX     00001111 01110110 !emit { modrm(); mem(size => 8); }
+PCMPGTB         MMX     00001111 01100100 !emit { modrm(); mem(size => 8); }
+PCMPGTW         MMX     00001111 01100101 !emit { modrm(); mem(size => 8); }
+PCMPGTD         MMX     00001111 01100110 !emit { modrm(); mem(size => 8); }
+
+# Logical Instructions
+PAND            MMX     00001111 11011011 !emit { modrm(); mem(size => 8); }
+PANDN           MMX     00001111 11011111 !emit { modrm(); mem(size => 8); }
+POR             MMX     00001111 11101011 !emit { modrm(); mem(size => 8); }
+PXOR            MMX     00001111 11101111 !emit { modrm(); mem(size => 8); }
+
+# Shift and Rotate Instructions
+PSLLW           MMX     00001111 11110001 !emit { modrm(); mem(size => 8); }
+PSLLD           MMX     00001111 11110010 !emit { modrm(); mem(size => 8); }
+PSLLQ           MMX     00001111 11110011 !emit { modrm(); mem(size => 8); }
+
+PSLLW_imm       MMX     00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+PSLLD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+PSLLQ_imm       MMX     00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+
+PSRLW           MMX     00001111 11010001 !emit { modrm(); mem(size => 8); }
+PSRLD           MMX     00001111 11010010 !emit { modrm(); mem(size => 8); }
+PSRLQ           MMX     00001111 11010011 !emit { modrm(); mem(size => 8); }
+
+PSRLW_imm       MMX     00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+PSRLD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+PSRLQ_imm       MMX     00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+
+PSRAW           MMX     00001111 11100001 !emit { modrm(); mem(size => 8); }
+PSRAD           MMX     00001111 11100010 !emit { modrm(); mem(size => 8); }
+
+PSRAW_imm       MMX     00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+PSRAD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+
+# Shuffle, Unpack, Blend, Insert, Extract, Broadcast, Permute, Scatter Instructions
+PACKSSWB        MMX     00001111 01100011 !emit { modrm(); mem(size => 8); }
+PACKSSDW        MMX     00001111 01101011 !emit { modrm(); mem(size => 8); }
+PACKUSWB        MMX     00001111 01100111 !emit { modrm(); mem(size => 8); }
+
+PUNPCKHBW       MMX     00001111 01101000 !emit { modrm(); mem(size => 8); }
+PUNPCKHWD       MMX     00001111 01101001 !emit { modrm(); mem(size => 8); }
+PUNPCKHDQ       MMX     00001111 01101010 !emit { modrm(); mem(size => 8); }
+
+PUNPCKLBW       MMX     00001111 01100000 !emit { modrm(); mem(size => 4); }
+PUNPCKLWD       MMX     00001111 01100001 !emit { modrm(); mem(size => 4); }
+PUNPCKLDQ       MMX     00001111 01100010 !emit { modrm(); mem(size => 4); }
+
+# State Management Instructions
+EMMS            MMX     00001111 01110111 !emit { }
-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 07/14] x86.risu: add SSE instructions
  2019-07-01  4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
                   ` (5 preceding siblings ...)
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions Jan Bobek
@ 2019-07-01  4:35 ` Jan Bobek
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 08/14] x86.risu: add SSE2 instructions Jan Bobek
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add SSE instructions to the x86 configuration file.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 x86.risu | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)
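
Several of the entries below use mem(size => 16, align => 16). The following Python sketch illustrates how write_mem_getoffset in risugen_x86 picks a target offset that honours these options and then compensates for displacement and scaled index, so that the effective address still lands on the target. The value of MEMBLOCK_LEN here is a placeholder, pick_offsets is a hypothetical name, and the sketch assumes the memblock itself is suitably aligned.

```python
import random

MEMBLOCK_LEN = 8192  # placeholder; the real constant is defined in risugen


def pick_offsets(size, align, disp=0, index=0, ss=0, rng=random):
    """Return (target, base_offset).

    target is the offset within the memblock that the instruction will
    access; base_offset is what must be added to the memblock address to
    form the base register, so that
        base + index * 2**ss + disp == memblock + target.
    """
    target = int(rng.random() * (MEMBLOCK_LEN - size))
    target &= ~(align - 1)  # round down to the requested alignment
    base_offset = target - disp - index * (1 << ss)
    return target, base_offset


target, base = pick_offsets(16, 16, disp=-8, index=5, ss=3,
                            rng=random.Random(0))
print(target, base)
```

Rounding down keeps the access inside the memblock, since the unaligned offset was already chosen to leave room for size bytes.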

diff --git a/x86.risu b/x86.risu
index f2dd9b0..c29b210 100644
--- a/x86.risu
+++ b/x86.risu
@@ -19,6 +19,18 @@ MOVQ            MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MO
 MOVQ_mem        MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVQ_mm         MMX     00001111 011 d 1111 !emit { modrm(); mem(size => 8); }
 
+MOVAPS          SSE     00001111 0010100 d !emit { modrm(); mem(size => 16, align => 16); }
+MOVUPS          SSE     00001111 0001000 d !emit { modrm(); mem(size => 16); }
+MOVSS           SSE     00001111 0001000 d !emit { rep(); modrm(); mem(size => 4); }
+
+MOVLPS          SSE     00001111 0001001 d !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVHPS          SSE     00001111 0001011 d !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVLHPS         SSE     00001111 00010110  !emit { modrm(mod => MOD_DIRECT); }
+MOVHLPS         SSE     00001111 00010010  !emit { modrm(mod => MOD_DIRECT); }
+
+PMOVMSKB        SSE     00001111 11010111 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+MOVMSKPS        SSE     00001111 01010000 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+
 # Arithmetic Instructions
 PADDB           MMX     00001111 11111100 !emit { modrm(); mem(size => 8); }
 PADDW           MMX     00001111 11111101 !emit { modrm(); mem(size => 8); }
@@ -29,6 +41,9 @@ PADDSW          MMX     00001111 11101101 !emit { modrm(); mem(size => 8); }
 PADDUSB         MMX     00001111 11011100 !emit { modrm(); mem(size => 8); }
 PADDUSW         MMX     00001111 11011101 !emit { modrm(); mem(size => 8); }
 
+ADDPS           SSE     00001111 01011000 !emit { modrm(); mem(size => 16, align => 16); }
+ADDSS           SSE     00001111 01011000 !emit { rep(); modrm(); mem(size => 4); }
+
 PSUBB           MMX     00001111 11111000 !emit { modrm(); mem(size => 8); }
 PSUBW           MMX     00001111 11111001 !emit { modrm(); mem(size => 8); }
 PSUBD           MMX     00001111 11111010 !emit { modrm(); mem(size => 8); }
@@ -37,11 +52,47 @@ PSUBSW          MMX     00001111 11101001 !emit { modrm(); mem(size => 8); }
 PSUBUSB         MMX     00001111 11011000 !emit { modrm(); mem(size => 8); }
 PSUBUSW         MMX     00001111 11011001 !emit { modrm(); mem(size => 8); }
 
+SUBPS           SSE     00001111 01011100 !emit { modrm(); mem(size => 16, align => 16); }
+SUBSS           SSE     00001111 01011100 !emit { rep(); modrm(); mem(size => 4); }
+
 PMULLW          MMX     00001111 11010101 !emit { modrm(); mem(size => 8); }
 PMULHW          MMX     00001111 11100101 !emit { modrm(); mem(size => 8); }
+PMULHUW         SSE     00001111 11100100 !emit { modrm(); mem(size => 8); }
+
+MULPS           SSE     00001111 01011001 !emit { modrm(); mem(size => 16, align => 16); }
+MULSS           SSE     00001111 01011001 !emit { rep(); modrm(); mem(size => 4); }
 
 PMADDWD         MMX     00001111 11110101 !emit { modrm(); mem(size => 8); }
 
+DIVPS           SSE     00001111 01011110 !emit { modrm(); mem(size => 16, align => 16); }
+DIVSS           SSE     00001111 01011110 !emit { rep(); modrm(); mem(size => 4); }
+
+RCPPS           SSE     00001111 01010011 !emit { modrm(); mem(size => 16, align => 16); }
+RCPSS           SSE     00001111 01010011 !emit { rep(); modrm(); mem(size => 4); }
+
+SQRTPS          SSE     00001111 01010001 !emit { modrm(); mem(size => 16, align => 16); }
+SQRTSS          SSE     00001111 01010001 !emit { rep(); modrm(); mem(size => 4); }
+
+RSQRTPS         SSE     00001111 01010010 !emit { modrm(); mem(size => 16, align => 16); }
+RSQRTSS         SSE     00001111 01010010 !emit { rep(); modrm(); mem(size => 4); }
+
+PMINUB          SSE     00001111 11011010 !emit { modrm(); mem(size => 8); }
+PMINSW          SSE     00001111 11101010 !emit { modrm(); mem(size => 8); }
+
+MINPS           SSE     00001111 01011101 !emit { modrm(); mem(size => 16, align => 16); }
+MINSS           SSE     00001111 01011101 !emit { rep(); modrm(); mem(size => 4); }
+
+PMAXUB          SSE     00001111 11011110 !emit { modrm(); mem(size => 8); }
+PMAXSW          SSE     00001111 11101110 !emit { modrm(); mem(size => 8); }
+
+MAXPS           SSE     00001111 01011111 !emit { modrm(); mem(size => 16, align => 16); }
+MAXSS           SSE     00001111 01011111 !emit { rep(); modrm(); mem(size => 4); }
+
+PAVGB           SSE     00001111 11100000 !emit { modrm(); mem(size => 8); }
+PAVGW           SSE     00001111 11100011 !emit { modrm(); mem(size => 8); }
+
+PSADBW          SSE     00001111 11110110 !emit { modrm(); mem(size => 8); }
+
 # Comparison Instructions
 PCMPEQB         MMX     00001111 01110100 !emit { modrm(); mem(size => 8); }
 PCMPEQW         MMX     00001111 01110101 !emit { modrm(); mem(size => 8); }
@@ -50,11 +101,24 @@ PCMPGTB         MMX     00001111 01100100 !emit { modrm(); mem(size => 8); }
 PCMPGTW         MMX     00001111 01100101 !emit { modrm(); mem(size => 8); }
 PCMPGTD         MMX     00001111 01100110 !emit { modrm(); mem(size => 8); }
 
+CMPPS           SSE     00001111 11000010 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
+CMPSS           SSE     00001111 11000010 !emit { rep(); modrm(); mem(size => 4); imm(size => 1); }
+
+UCOMISS         SSE     00001111 00101110 !emit { modrm(); mem(size => 4); }
+COMISS          SSE     00001111 00101111 !emit { modrm(); mem(size => 4); }
+
 # Logical Instructions
 PAND            MMX     00001111 11011011 !emit { modrm(); mem(size => 8); }
+ANDPS           SSE     00001111 01010100 !emit { modrm(); mem(size => 16, align => 16); }
+
 PANDN           MMX     00001111 11011111 !emit { modrm(); mem(size => 8); }
+ANDNPS          SSE     00001111 01010101 !emit { modrm(); mem(size => 16, align => 16); }
+
 POR             MMX     00001111 11101011 !emit { modrm(); mem(size => 8); }
+ORPS            SSE     00001111 01010110 !emit { modrm(); mem(size => 16, align => 16); }
+
 PXOR            MMX     00001111 11101111 !emit { modrm(); mem(size => 8); }
+XORPS           SSE     00001111 01010111 !emit { modrm(); mem(size => 16, align => 16); }
 
 # Shift and Rotate Instructions
 PSLLW           MMX     00001111 11110001 !emit { modrm(); mem(size => 8); }
@@ -92,5 +156,41 @@ PUNPCKLBW       MMX     00001111 01100000 !emit { modrm(); mem(size => 4); }
 PUNPCKLWD       MMX     00001111 01100001 !emit { modrm(); mem(size => 4); }
 PUNPCKLDQ       MMX     00001111 01100010 !emit { modrm(); mem(size => 4); }
 
+UNPCKLPS        SSE     00001111 00010100 !emit { modrm(); mem(size => 16, align => 16); }
+UNPCKHPS        SSE     00001111 00010101 !emit { modrm(); mem(size => 16, align => 16); }
+
+PSHUFW          SSE     00001111 01110000 !emit { modrm(); mem(size => 8); imm(size => 1); }
+SHUFPS          SSE     00001111 11000110 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
+
+PINSRW          SSE     00001111 11000100 !emit { modrm(); mem(size => 2); imm(size => 1); }
+PEXTRW_reg      SSE     00001111 11000101 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
+
+# Conversion Instructions
+CVTPI2PS        SSE     00001111 00101010 !emit { modrm(); mem(size => 8); }
+CVTSI2SS        SSE     00001111 00101010 !emit { rep(); modrm(); mem(size => 4); }
+CVTSI2SS_64     SSE     00001111 00101010 !emit { rep(); rex(w => 1); modrm(); mem(size => 8); }
+
+CVTPS2PI        SSE     00001111 00101101 !emit { modrm(); mem(size => 8); }
+CVTSS2SI        SSE     00001111 00101101 !emit { rep(); modrm(reg => ~REG_ESP); mem(size => 4); }
+CVTSS2SI_64     SSE     00001111 00101101 !emit { rep(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 4); }
+
+CVTTPS2PI       SSE     00001111 00101100 !emit { modrm(); mem(size => 8); }
+CVTTSS2SI       SSE     00001111 00101100 !emit { rep(); modrm(reg => ~REG_ESP); mem(size => 4); }
+CVTTSS2SI_64    SSE     00001111 00101100 !emit { rep(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 4); }
+
+# Cacheability Control, Prefetch, and Instruction Ordering Instructions
+MASKMOVQ        SSE     00001111 11110111 !emit { modrm(mod => MOD_DIRECT); mem(size => 8, base => REG_EDI); }
+MOVNTPS         SSE     00001111 00101011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+MOVNTQ          SSE     00001111 11100111 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+
+PREFETCHT0      SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 1); mem(size => 1); }
+PREFETCHT1      SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 1); }
+PREFETCHT2      SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 3); mem(size => 1); }
+PREFETCHNTA     SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 0); mem(size => 1); }
+SFENCE          SSE     00001111 10101110 !emit { modrm(mod => MOD_DIRECT, reg => 7); }
+
 # State Management Instructions
 EMMS            MMX     00001111 01110111 !emit { }
+
+# LDMXCSR         SSE     00001111 10101110 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 4); }
+STMXCSR         SSE     00001111 10101110 !emit { modrm(mod => ~MOD_DIRECT, reg => 3); mem(size => 4); }
-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 08/14] x86.risu: add SSE2 instructions
  2019-07-01  4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
                   ` (6 preceding siblings ...)
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 07/14] x86.risu: add SSE instructions Jan Bobek
@ 2019-07-01  4:35 ` Jan Bobek
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 09/14] x86.risu: add SSE3 instructions Jan Bobek
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add SSE2 instructions to the x86 configuration file.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 x86.risu | 153 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 153 insertions(+)
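
Many SSE2 entries share their opcode bits with an MMX or SSE entry and differ only in a mandatory prefix: data16() is 0x66, rep() is 0xF3 and repne() is 0xF2. The Python sketch below shows the resulting byte sequences for the four ADD variants with register operands xmm0, xmm1. It is an illustration only; encode and modrm_byte are hypothetical helpers, REX/VEX, SIB, displacement and immediate bytes are omitted, and MOD_DIRECT is assumed to be 3 (the architectural encoding for register-direct operands).

```python
PREFIX = {"data16": 0x66, "repne": 0xF2, "rep": 0xF3}
MOD_DIRECT = 0b11  # assumed value of the constant used in x86.risu


def modrm_byte(mod, reg, rm):
    """Pack the three ModR/M fields (2 + 3 + 3 bits) into one byte."""
    return (mod << 6) | (reg << 3) | rm


def encode(opcode, mod, reg, rm, prefix=None):
    """Emit [legacy prefix] opcode ModR/M."""
    out = bytearray()
    if prefix is not None:
        out.append(PREFIX[prefix])
    out += opcode
    out.append(modrm_byte(mod, reg, rm))
    return bytes(out)


ADD = bytes([0x0F, 0x58])  # pattern 00001111 01011000
for name, prefix in [("ADDPS", None), ("ADDPD", "data16"),
                     ("ADDSS", "rep"), ("ADDSD", "repne")]:
    print(name, encode(ADD, MOD_DIRECT, 0, 1, prefix).hex())
```

This is why the same bit pattern can legitimately appear on several lines of x86.risu, as long as the !emit blocks select different prefixes.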

diff --git a/x86.risu b/x86.risu
index c29b210..9b63d6b 100644
--- a/x86.risu
+++ b/x86.risu
@@ -15,179 +15,332 @@
 # Data Transfer Instructions
 MOVD            MMX     00001111 011 d 1110 !emit { modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
 MOVD_mem        MMX     00001111 011 d 1110 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 4); }
+MOVD            SSE2    00001111 011 d 1110 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
+MOVD_mem        SSE2    00001111 011 d 1110 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); }
 MOVQ            MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
 MOVQ_mem        MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVQ            SSE2    00001111 011 d 1110 !emit { data16(); rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
+MOVQ_mem        SSE2    00001111 011 d 1110 !emit { data16(); rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVQ_mm         MMX     00001111 011 d 1111 !emit { modrm(); mem(size => 8); }
+MOVQ_xmm1       SSE2    00001111 01111110 !emit { rep(); modrm(); mem(size => 8); }
+MOVQ_xmm2       SSE2    00001111 11010110 !emit { data16(); modrm(); mem(size => 8); }
 
 MOVAPS          SSE     00001111 0010100 d !emit { modrm(); mem(size => 16, align => 16); }
+MOVAPD          SSE2    00001111 0010100 d !emit { data16(); modrm(); mem(size => 16, align => 16); }
+MOVDQA          SSE2    00001111 011 d 1111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 MOVUPS          SSE     00001111 0001000 d !emit { modrm(); mem(size => 16); }
+MOVUPD          SSE2    00001111 0001000 d !emit { data16(); modrm(); mem(size => 16); }
+MOVDQU          SSE2    00001111 011 d 1111 !emit { rep(); modrm(); mem(size => 16); }
 MOVSS           SSE     00001111 0001000 d !emit { rep(); modrm(); mem(size => 4); }
+MOVSD           SSE2    00001111 0001000 d !emit { repne(); modrm(); mem(size => 8); }
+
+MOVQ2DQ         SSE2    00001111 11010110 !emit { rep(); modrm(mod => MOD_DIRECT); }
+MOVDQ2Q         SSE2    00001111 11010110 !emit { repne(); modrm(mod => MOD_DIRECT); }
 
 MOVLPS          SSE     00001111 0001001 d !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVLPD          SSE2    00001111 0001001 d !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVHPS          SSE     00001111 0001011 d !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVHPD          SSE2    00001111 0001011 d !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVLHPS         SSE     00001111 00010110  !emit { modrm(mod => MOD_DIRECT); }
 MOVHLPS         SSE     00001111 00010010  !emit { modrm(mod => MOD_DIRECT); }
 
 PMOVMSKB        SSE     00001111 11010111 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+PMOVMSKB        SSE2    00001111 11010111 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 MOVMSKPS        SSE     00001111 01010000 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+MOVMSKPD        SSE2    00001111 01010000 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 
 # Arithmetic Instructions
 PADDB           MMX     00001111 11111100 !emit { modrm(); mem(size => 8); }
+PADDB           SSE2    00001111 11111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PADDW           MMX     00001111 11111101 !emit { modrm(); mem(size => 8); }
+PADDW           SSE2    00001111 11111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PADDD           MMX     00001111 11111110 !emit { modrm(); mem(size => 8); }
+PADDD           SSE2    00001111 11111110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PADDQ           MMX     00001111 11010100 !emit { modrm(); mem(size => 8); }
+PADDQ           SSE2    00001111 11010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PADDSB          MMX     00001111 11101100 !emit { modrm(); mem(size => 8); }
+PADDSB          SSE2    00001111 11101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PADDSW          MMX     00001111 11101101 !emit { modrm(); mem(size => 8); }
+PADDSW          SSE2    00001111 11101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PADDUSB         MMX     00001111 11011100 !emit { modrm(); mem(size => 8); }
+PADDUSB         SSE2    00001111 11011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PADDUSW         MMX     00001111 11011101 !emit { modrm(); mem(size => 8); }
+PADDUSW         SSE2    00001111 11011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 ADDPS           SSE     00001111 01011000 !emit { modrm(); mem(size => 16, align => 16); }
+ADDPD           SSE2    00001111 01011000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 ADDSS           SSE     00001111 01011000 !emit { rep(); modrm(); mem(size => 4); }
+ADDSD           SSE2    00001111 01011000 !emit { repne(); modrm(); mem(size => 8); }
 
 PSUBB           MMX     00001111 11111000 !emit { modrm(); mem(size => 8); }
+PSUBB           SSE2    00001111 11111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSUBW           MMX     00001111 11111001 !emit { modrm(); mem(size => 8); }
+PSUBW           SSE2    00001111 11111001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSUBD           MMX     00001111 11111010 !emit { modrm(); mem(size => 8); }
+PSUBD           SSE2    00001111 11111010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PSUBQ_64        SSE2    00001111 11111011 !emit { modrm(); mem(size => 8); }
+PSUBQ           SSE2    00001111 11111011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSUBSB          MMX     00001111 11101000 !emit { modrm(); mem(size => 8); }
+PSUBSB          SSE2    00001111 11101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSUBSW          MMX     00001111 11101001 !emit { modrm(); mem(size => 8); }
+PSUBSW          SSE2    00001111 11101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSUBUSB         MMX     00001111 11011000 !emit { modrm(); mem(size => 8); }
+PSUBUSB         SSE2    00001111 11011000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSUBUSW         MMX     00001111 11011001 !emit { modrm(); mem(size => 8); }
+PSUBUSW         SSE2    00001111 11011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 SUBPS           SSE     00001111 01011100 !emit { modrm(); mem(size => 16, align => 16); }
+SUBPD           SSE2    00001111 01011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 SUBSS           SSE     00001111 01011100 !emit { rep(); modrm(); mem(size => 4); }
+SUBSD           SSE2    00001111 01011100 !emit { repne(); modrm(); mem(size => 8); }
 
 PMULLW          MMX     00001111 11010101 !emit { modrm(); mem(size => 8); }
+PMULLW          SSE2    00001111 11010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PMULHW          MMX     00001111 11100101 !emit { modrm(); mem(size => 8); }
+PMULHW          SSE2    00001111 11100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PMULHUW         SSE     00001111 11100100 !emit { modrm(); mem(size => 8); }
+PMULHUW         SSE2    00001111 11100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMULUDQ_64      SSE2    00001111 11110100 !emit { modrm(); mem(size => 8); }
+PMULUDQ         SSE2    00001111 11110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 MULPS           SSE     00001111 01011001 !emit { modrm(); mem(size => 16, align => 16); }
+MULPD           SSE2    00001111 01011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 MULSS           SSE     00001111 01011001 !emit { rep(); modrm(); mem(size => 4); }
+MULSD           SSE2    00001111 01011001 !emit { repne(); modrm(); mem(size => 8); }
 
 PMADDWD         MMX     00001111 11110101 !emit { modrm(); mem(size => 8); }
+PMADDWD         SSE2    00001111 11110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 DIVPS           SSE     00001111 01011110 !emit { modrm(); mem(size => 16, align => 16); }
+DIVPD           SSE2    00001111 01011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 DIVSS           SSE     00001111 01011110 !emit { rep(); modrm(); mem(size => 4); }
+DIVSD           SSE2    00001111 01011110 !emit { repne(); modrm(); mem(size => 8); }
 
 RCPPS           SSE     00001111 01010011 !emit { modrm(); mem(size => 16, align => 16); }
 RCPSS           SSE     00001111 01010011 !emit { rep(); modrm(); mem(size => 4); }
 
 SQRTPS          SSE     00001111 01010001 !emit { modrm(); mem(size => 16, align => 16); }
+SQRTPD          SSE2    00001111 01010001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 SQRTSS          SSE     00001111 01010001 !emit { rep(); modrm(); mem(size => 4); }
+SQRTSD          SSE2    00001111 01010001 !emit { repne(); modrm(); mem(size => 8); }
 
 RSQRTPS         SSE     00001111 01010010 !emit { modrm(); mem(size => 16, align => 16); }
 RSQRTSS         SSE     00001111 01010010 !emit { rep(); modrm(); mem(size => 4); }
 
 PMINUB          SSE     00001111 11011010 !emit { modrm(); mem(size => 8); }
+PMINUB          SSE2    00001111 11011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PMINSW          SSE     00001111 11101010 !emit { modrm(); mem(size => 8); }
+PMINSW          SSE2    00001111 11101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 MINPS           SSE     00001111 01011101 !emit { modrm(); mem(size => 16, align => 16); }
+MINPD           SSE2    00001111 01011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 MINSS           SSE     00001111 01011101 !emit { rep(); modrm(); mem(size => 4); }
+MINSD           SSE2    00001111 01011101 !emit { repne(); modrm(); mem(size => 8); }
 
 PMAXUB          SSE     00001111 11011110 !emit { modrm(); mem(size => 8); }
+PMAXUB          SSE2    00001111 11011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PMAXSW          SSE     00001111 11101110 !emit { modrm(); mem(size => 8); }
+PMAXSW          SSE2    00001111 11101110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 MAXPS           SSE     00001111 01011111 !emit { modrm(); mem(size => 16, align => 16); }
+MAXPD           SSE2    00001111 01011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 MAXSS           SSE     00001111 01011111 !emit { rep(); modrm(); mem(size => 4); }
+MAXSD           SSE2    00001111 01011111 !emit { repne(); modrm(); mem(size => 8); }
 
 PAVGB           SSE     00001111 11100000 !emit { modrm(); mem(size => 8); }
+PAVGB           SSE2    00001111 11100000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PAVGW           SSE     00001111 11100011 !emit { modrm(); mem(size => 8); }
+PAVGW           SSE2    00001111 11100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 PSADBW          SSE     00001111 11110110 !emit { modrm(); mem(size => 8); }
+PSADBW          SSE2    00001111 11110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 # Comparison Instructions
 PCMPEQB         MMX     00001111 01110100 !emit { modrm(); mem(size => 8); }
+PCMPEQB         SSE2    00001111 01110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PCMPEQW         MMX     00001111 01110101 !emit { modrm(); mem(size => 8); }
+PCMPEQW         SSE2    00001111 01110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PCMPEQD         MMX     00001111 01110110 !emit { modrm(); mem(size => 8); }
+PCMPEQD         SSE2    00001111 01110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PCMPGTB         MMX     00001111 01100100 !emit { modrm(); mem(size => 8); }
+PCMPGTB         SSE2    00001111 01100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PCMPGTW         MMX     00001111 01100101 !emit { modrm(); mem(size => 8); }
+PCMPGTW         SSE2    00001111 01100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PCMPGTD         MMX     00001111 01100110 !emit { modrm(); mem(size => 8); }
+PCMPGTD         SSE2    00001111 01100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 CMPPS           SSE     00001111 11000010 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
+CMPPD           SSE2    00001111 11000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 CMPSS           SSE     00001111 11000010 !emit { rep(); modrm(); mem(size => 4); imm(size => 1); }
+CMPSD           SSE2    00001111 11000010 !emit { repne(); modrm(); mem(size => 8); imm(size => 1); }
 
 UCOMISS         SSE     00001111 00101110 !emit { modrm(); mem(size => 4); }
+UCOMISD         SSE2    00001111 00101110 !emit { data16(); modrm(); mem(size => 8); }
+
 COMISS          SSE     00001111 00101111 !emit { modrm(); mem(size => 4); }
+COMISD          SSE2    00001111 00101111 !emit { data16(); modrm(); mem(size => 8); }
 
 # Logical Instructions
 PAND            MMX     00001111 11011011 !emit { modrm(); mem(size => 8); }
+PAND            SSE2    00001111 11011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 ANDPS           SSE     00001111 01010100 !emit { modrm(); mem(size => 16, align => 16); }
+ANDPD           SSE2    00001111 01010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 PANDN           MMX     00001111 11011111 !emit { modrm(); mem(size => 8); }
+PANDN           SSE2    00001111 11011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 ANDNPS          SSE     00001111 01010101 !emit { modrm(); mem(size => 16, align => 16); }
+ANDNPD          SSE2    00001111 01010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 POR             MMX     00001111 11101011 !emit { modrm(); mem(size => 8); }
+POR             SSE2    00001111 11101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 ORPS            SSE     00001111 01010110 !emit { modrm(); mem(size => 16, align => 16); }
+ORPD            SSE2    00001111 01010110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 PXOR            MMX     00001111 11101111 !emit { modrm(); mem(size => 8); }
+PXOR            SSE2    00001111 11101111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 XORPS           SSE     00001111 01010111 !emit { modrm(); mem(size => 16, align => 16); }
+XORPD           SSE2    00001111 01010111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 # Shift and Rotate Instructions
 PSLLW           MMX     00001111 11110001 !emit { modrm(); mem(size => 8); }
+PSLLW           SSE2    00001111 11110001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSLLD           MMX     00001111 11110010 !emit { modrm(); mem(size => 8); }
+PSLLD           SSE2    00001111 11110010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSLLQ           MMX     00001111 11110011 !emit { modrm(); mem(size => 8); }
+PSLLQ           SSE2    00001111 11110011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PSLLDQ          SSE2    00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 7); imm(size => 1); }
 
 PSLLW_imm       MMX     00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+PSLLW_imm       SSE2    00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 PSLLD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+PSLLD_imm       SSE2    00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 PSLLQ_imm       MMX     00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+PSLLQ_imm       SSE2    00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 
 PSRLW           MMX     00001111 11010001 !emit { modrm(); mem(size => 8); }
+PSRLW           SSE2    00001111 11010001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSRLD           MMX     00001111 11010010 !emit { modrm(); mem(size => 8); }
+PSRLD           SSE2    00001111 11010010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSRLQ           MMX     00001111 11010011 !emit { modrm(); mem(size => 8); }
+PSRLQ           SSE2    00001111 11010011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PSRLDQ          SSE2    00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 3); imm(size => 1); }
 
 PSRLW_imm       MMX     00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+PSRLW_imm       SSE2    00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 PSRLD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+PSRLD_imm       SSE2    00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 PSRLQ_imm       MMX     00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+PSRLQ_imm       SSE2    00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 
 PSRAW           MMX     00001111 11100001 !emit { modrm(); mem(size => 8); }
+PSRAW           SSE2    00001111 11100001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSRAD           MMX     00001111 11100010 !emit { modrm(); mem(size => 8); }
+PSRAD           SSE2    00001111 11100010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 PSRAW_imm       MMX     00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+PSRAW_imm       SSE2    00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 PSRAD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+PSRAD_imm       SSE2    00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 
 # Shuffle, Unpack, Blend, Insert, Extract, Broadcast, Permute, Scatter Instructions
 PACKSSWB        MMX     00001111 01100011 !emit { modrm(); mem(size => 8); }
+PACKSSWB        SSE2    00001111 01100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PACKSSDW        MMX     00001111 01101011 !emit { modrm(); mem(size => 8); }
+PACKSSDW        SSE2    00001111 01101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PACKUSWB        MMX     00001111 01100111 !emit { modrm(); mem(size => 8); }
+PACKUSWB        SSE2    00001111 01100111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 PUNPCKHBW       MMX     00001111 01101000 !emit { modrm(); mem(size => 8); }
+PUNPCKHBW       SSE2    00001111 01101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PUNPCKHWD       MMX     00001111 01101001 !emit { modrm(); mem(size => 8); }
+PUNPCKHWD       SSE2    00001111 01101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PUNPCKHDQ       MMX     00001111 01101010 !emit { modrm(); mem(size => 8); }
+PUNPCKHDQ       SSE2    00001111 01101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PUNPCKHQDQ      SSE2    00001111 01101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 PUNPCKLBW       MMX     00001111 01100000 !emit { modrm(); mem(size => 4); }
+PUNPCKLBW       SSE2    00001111 01100000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PUNPCKLWD       MMX     00001111 01100001 !emit { modrm(); mem(size => 4); }
+PUNPCKLWD       SSE2    00001111 01100001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PUNPCKLDQ       MMX     00001111 01100010 !emit { modrm(); mem(size => 4); }
+PUNPCKLDQ       SSE2    00001111 01100010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PUNPCKLQDQ      SSE2    00001111 01101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 UNPCKLPS        SSE     00001111 00010100 !emit { modrm(); mem(size => 16, align => 16); }
+UNPCKLPD        SSE2    00001111 00010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 UNPCKHPS        SSE     00001111 00010101 !emit { modrm(); mem(size => 16, align => 16); }
+UNPCKHPD        SSE2    00001111 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 PSHUFW          SSE     00001111 01110000 !emit { modrm(); mem(size => 8); imm(size => 1); }
+PSHUFLW         SSE2    00001111 01110000 !emit { repne(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+PSHUFHW         SSE2    00001111 01110000 !emit { rep(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+PSHUFD          SSE2    00001111 01110000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+
 SHUFPS          SSE     00001111 11000110 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
+SHUFPD          SSE2    00001111 11000110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 
 PINSRW          SSE     00001111 11000100 !emit { modrm(); mem(size => 2); imm(size => 1); }
+PINSRW          SSE2    00001111 11000100 !emit { data16(); modrm(); mem(size => 2); imm(size => 1); }
+
 PEXTRW_reg      SSE     00001111 11000101 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
+PEXTRW_reg      SSE2    00001111 11000101 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
 
 # Conversion Instructions
 CVTPI2PS        SSE     00001111 00101010 !emit { modrm(); mem(size => 8); }
 CVTSI2SS        SSE     00001111 00101010 !emit { rep(); modrm(); mem(size => 4); }
 CVTSI2SS_64     SSE     00001111 00101010 !emit { rep(); rex(w => 1); modrm(); mem(size => 8); }
+CVTPI2PD        SSE2    00001111 00101010 !emit { data16(); modrm(); mem(size => 8); }
+CVTSI2SD        SSE2    00001111 00101010 !emit { repne(); modrm(); mem(size => 4); }
+CVTSI2SD_64     SSE2    00001111 00101010 !emit { repne(); rex(w => 1); modrm(); mem(size => 8); }
 
 CVTPS2PI        SSE     00001111 00101101 !emit { modrm(); mem(size => 8); }
 CVTSS2SI        SSE     00001111 00101101 !emit { rep(); modrm(reg => ~REG_ESP); mem(size => 4); }
 CVTSS2SI_64     SSE     00001111 00101101 !emit { rep(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 4); }
+CVTPD2PI        SSE2    00001111 00101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+CVTSD2SI        SSE2    00001111 00101101 !emit { repne(); modrm(reg => ~REG_ESP); mem(size => 8); }
+CVTSD2SI_64     SSE2    00001111 00101101 !emit { repne(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 8); }
 
 CVTTPS2PI       SSE     00001111 00101100 !emit { modrm(); mem(size => 8); }
 CVTTSS2SI       SSE     00001111 00101100 !emit { rep(); modrm(reg => ~REG_ESP); mem(size => 4); }
 CVTTSS2SI_64    SSE     00001111 00101100 !emit { rep(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 4); }
+CVTTPD2PI       SSE2    00001111 00101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+CVTTSD2SI       SSE2    00001111 00101100 !emit { repne(); modrm(reg => ~REG_ESP); mem(size => 8); }
+CVTTSD2SI_64    SSE2    00001111 00101100 !emit { repne(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 8); }
+
+CVTPD2DQ        SSE2    00001111 11100110 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+CVTTPD2DQ       SSE2    00001111 11100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+CVTDQ2PD        SSE2    00001111 11100110 !emit { rep(); modrm(); mem(size => 8); }
+
+CVTPS2PD        SSE2    00001111 01011010 !emit { modrm(); mem(size => 8); }
+CVTPD2PS        SSE2    00001111 01011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+CVTSS2SD        SSE2    00001111 01011010 !emit { rep(); modrm(); mem(size => 4); }
+CVTSD2SS        SSE2    00001111 01011010 !emit { repne(); modrm(); mem(size => 8); }
+
+CVTDQ2PS        SSE2    00001111 01011011 !emit { modrm(); mem(size => 16, align => 16); }
+CVTPS2DQ        SSE2    00001111 01011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+CVTTPS2DQ       SSE2    00001111 01011011 !emit { rep(); modrm(); mem(size => 16, align => 16); }
 
 # Cacheability Control, Prefetch, and Instruction Ordering Instructions
 MASKMOVQ        SSE     00001111 11110111 !emit { modrm(mod => MOD_DIRECT); mem(size => 8, base => REG_EDI); }
+MASKMOVDQU      SSE2    00001111 11110111 !emit { data16(); modrm(mod => MOD_DIRECT); mem(size => 16, base => REG_EDI); }
+
 MOVNTPS         SSE     00001111 00101011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+MOVNTPD         SSE2    00001111 00101011 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+
+MOVNTI          SSE2    00001111 11000011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 4); }
+MOVNTI_64       SSE2    00001111 11000011 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVNTQ          SSE     00001111 11100111 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVNTDQ         SSE2    00001111 11100111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
 
 PREFETCHT0      SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 1); mem(size => 1); }
 PREFETCHT1      SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 1); }
 PREFETCHT2      SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 3); mem(size => 1); }
 PREFETCHNTA     SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 0); mem(size => 1); }
+CLFLUSH         SSE2    00001111 10101110 !emit { modrm(mod => ~MOD_DIRECT, reg => 7); mem(size => 1); }
 SFENCE          SSE     00001111 10101110 !emit { modrm(mod => MOD_DIRECT, reg => 7); }
+LFENCE          SSE2    00001111 10101110 !emit { modrm(mod => 0b11, reg => 0b101); }
+MFENCE          SSE2    00001111 10101110 !emit { modrm(mod => 0b11, reg => 0b110); }
+PAUSE           SSE2    10010000          !emit { rep(); }
 
 # State Management Instructions
 EMMS            MMX     00001111 01110111 !emit { }
-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 09/14] x86.risu: add SSE3 instructions
  2019-07-01  4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
                   ` (7 preceding siblings ...)
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 08/14] x86.risu: add SSE2 instructions Jan Bobek
@ 2019-07-01  4:35 ` Jan Bobek
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 10/14] x86.risu: add SSSE3 instructions Jan Bobek
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add SSE3 instructions to the x86 configuration file.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
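Note: several of the new entries reuse an opcode that is already present
in the file and differ only in the mandatory prefix, e.g. MOVSLDUP
(F3 0F 12) and MOVDDUP (F2 0F 12). The Python sketch below is not part of
risugen. It assumes that data16(), rep() and repne() emit the 0x66, 0xF3
and 0xF2 prefix bytes respectively; the names PREFIX, modrm_direct and
encode are made up for illustration. It assembles the register-direct
form of a few entries so the bytes can be checked against a disassembler.

```python
# Illustrative only: PREFIX, modrm_direct and encode are hypothetical helpers,
# not risugen functions.

# Assumed mapping from the !emit prefix helpers to legacy prefix bytes.
PREFIX = {None: b"", "data16": b"\x66", "rep": b"\xf3", "repne": b"\xf2"}

MOD_DIRECT = 0b11  # ModRM.mod value selecting a register operand


def modrm_direct(reg: int, rm: int) -> int:
    """Build a ModRM byte for a register-register form (no REX, regs 0-7)."""
    if not (0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("register numbers must be in 0..7 without REX")
    return (MOD_DIRECT << 6) | (reg << 3) | rm


def encode(prefix, opcode_bits: str, reg: int, rm: int) -> bytes:
    """Encode prefix + opcode (written as in x86.risu) + ModRM."""
    opcode = bytes(int(field, 2) for field in opcode_bits.split())
    return PREFIX[prefix] + opcode + bytes([modrm_direct(reg, rm)])


# Same opcode 0F 12, different mandatory prefix.
movsldup = encode("rep", "00001111 00010010", reg=1, rm=2)    # movsldup xmm1, xmm2
movddup = encode("repne", "00001111 00010010", reg=1, rm=2)   # movddup  xmm1, xmm2
haddpd = encode("data16", "00001111 01111100", reg=0, rm=3)   # haddpd   xmm0, xmm3

print(movsldup.hex(), movddup.hex(), haddpd.hex())
```

This prints f30f12ca f20f12ca 660f7cc3. Memory forms would additionally
need SIB and displacement bytes, which the sketch leaves out.
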
 x86.risu | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/x86.risu b/x86.risu
index 9b63d6b..01181dd 100644
--- a/x86.risu
+++ b/x86.risu
@@ -49,6 +49,11 @@ PMOVMSKB        SSE2    00001111 11010111 !emit { data16(); modrm(mod => MOD_DIR
 MOVMSKPS        SSE     00001111 01010000 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 MOVMSKPD        SSE2    00001111 01010000 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 
+LDDQU           SSE3    00001111 11110000 !emit { repne(); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+MOVSHDUP        SSE3    00001111 00010110 !emit { rep(); modrm(); mem(size => 16, align => 16); }
+MOVSLDUP        SSE3    00001111 00010010 !emit { rep(); modrm(); mem(size => 16, align => 16); }
+MOVDDUP         SSE3    00001111 00010010 !emit { repne(); modrm(); mem(size => 8); }
+
 # Arithmetic Instructions
 PADDB           MMX     00001111 11111100 !emit { modrm(); mem(size => 8); }
 PADDB           SSE2    00001111 11111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -72,6 +77,9 @@ ADDPD           SSE2    00001111 01011000 !emit { data16(); modrm(); mem(size =>
 ADDSS           SSE     00001111 01011000 !emit { rep(); modrm(); mem(size => 4); }
 ADDSD           SSE2    00001111 01011000 !emit { repne(); modrm(); mem(size => 8); }
 
+HADDPS          SSE3    00001111 01111100 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+HADDPD          SSE3    00001111 01111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
 PSUBB           MMX     00001111 11111000 !emit { modrm(); mem(size => 8); }
 PSUBB           SSE2    00001111 11111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSUBW           MMX     00001111 11111001 !emit { modrm(); mem(size => 8); }
@@ -94,6 +102,12 @@ SUBPD           SSE2    00001111 01011100 !emit { data16(); modrm(); mem(size =>
 SUBSS           SSE     00001111 01011100 !emit { rep(); modrm(); mem(size => 4); }
 SUBSD           SSE2    00001111 01011100 !emit { repne(); modrm(); mem(size => 8); }
 
+HSUBPS          SSE3    00001111 01111101 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+HSUBPD          SSE3    00001111 01111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
+ADDSUBPS        SSE3    00001111 11010000 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+ADDSUBPD        SSE3    00001111 11010000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
 PMULLW          MMX     00001111 11010101 !emit { modrm(); mem(size => 8); }
 PMULLW          SSE2    00001111 11010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PMULHW          MMX     00001111 11100101 !emit { modrm(); mem(size => 8); }
-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 10/14] x86.risu: add SSSE3 instructions
  2019-07-01  4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
                   ` (8 preceding siblings ...)
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 09/14] x86.risu: add SSE3 instructions Jan Bobek
@ 2019-07-01  4:35 ` Jan Bobek
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 11/14] x86.risu: add SSE4.1 and SSE4.2 instructions Jan Bobek
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add SSSE3 instructions to the x86 configuration file.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
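Note: unlike the earlier entries, the SSSE3 instructions use three-byte
opcodes in the 0F 38 map (or 0F 3A for PALIGNR, which also takes an
immediate), so the opcode column has three binary fields. The Python
sketch below is not part of risugen; opcode_bytes and opcode_map are
hypothetical helpers that convert the opcode column to bytes and name the
opcode map, which may help when cross-checking entries against the
instruction set reference.

```python
# Illustrative only: opcode_bytes and opcode_map are hypothetical helpers.

def opcode_bytes(bits: str) -> bytes:
    """Convert an x86.risu opcode column such as '00001111 00111000' to bytes."""
    fields = bits.split()
    if any(len(f) != 8 or set(f) - {"0", "1"} for f in fields):
        raise ValueError(f"expected 8-bit binary fields, got {bits!r}")
    return bytes(int(f, 2) for f in fields)


def opcode_map(opcode: bytes) -> str:
    """Name the opcode map selected by the escape bytes."""
    if opcode[:2] == b"\x0f\x38":
        return "0F 38"
    if opcode[:2] == b"\x0f\x3a":
        return "0F 3A"
    if opcode[:1] == b"\x0f":
        return "0F"
    return "one-byte"


# Opcode columns copied from x86.risu.
entries = {
    "PSHUFB": "00001111 00111000 00000000",
    "PHADDW": "00001111 00111000 00000001",
    "PALIGNR": "00001111 00111010 00001111",
    "PADDB": "00001111 11111100",
}

for name, bits in entries.items():
    op = opcode_bytes(bits)
    print(f"{name:8} {op.hex(' ').upper():9} map {opcode_map(op)}")
```

The _64 (MMX register) and XMM variants of each SSSE3 entry share the
same three opcode bytes; only the data16() prefix tells them apart.
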
 x86.risu | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/x86.risu b/x86.risu
index 01181dd..35992d6 100644
--- a/x86.risu
+++ b/x86.risu
@@ -77,6 +77,13 @@ ADDPD           SSE2    00001111 01011000 !emit { data16(); modrm(); mem(size =>
 ADDSS           SSE     00001111 01011000 !emit { rep(); modrm(); mem(size => 4); }
 ADDSD           SSE2    00001111 01011000 !emit { repne(); modrm(); mem(size => 8); }
 
+PHADDW_64       SSSE3   00001111 00111000 00000001 !emit { modrm(); mem(size => 8); }
+PHADDW          SSSE3   00001111 00111000 00000001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PHADDD_64       SSSE3   00001111 00111000 00000010 !emit { modrm(); mem(size => 8); }
+PHADDD          SSSE3   00001111 00111000 00000010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PHADDSW_64      SSSE3   00001111 00111000 00000011 !emit { modrm(); mem(size => 8); }
+PHADDSW         SSSE3   00001111 00111000 00000011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
 HADDPS          SSE3    00001111 01111100 !emit { repne(); modrm(); mem(size => 16, align => 16); }
 HADDPD          SSE3    00001111 01111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
@@ -102,6 +109,13 @@ SUBPD           SSE2    00001111 01011100 !emit { data16(); modrm(); mem(size =>
 SUBSS           SSE     00001111 01011100 !emit { rep(); modrm(); mem(size => 4); }
 SUBSD           SSE2    00001111 01011100 !emit { repne(); modrm(); mem(size => 8); }
 
+PHSUBW_64       SSSE3   00001111 00111000 00000101 !emit { modrm(); mem(size => 8); }
+PHSUBW          SSSE3   00001111 00111000 00000101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PHSUBD_64       SSSE3   00001111 00111000 00000110 !emit { modrm(); mem(size => 8); }
+PHSUBD          SSSE3   00001111 00111000 00000110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PHSUBSW_64      SSSE3   00001111 00111000 00000111 !emit { modrm(); mem(size => 8); }
+PHSUBSW         SSSE3   00001111 00111000 00000111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
 HSUBPS          SSE3    00001111 01111101 !emit { repne(); modrm(); mem(size => 16, align => 16); }
 HSUBPD          SSE3    00001111 01111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
@@ -117,6 +131,9 @@ PMULHUW         SSE2    00001111 11100100 !emit { data16(); modrm(); mem(size =>
 PMULUDQ_64      SSE2    00001111 11110100 !emit { modrm(); mem(size => 8); }
 PMULUDQ         SSE2    00001111 11110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
+PMULHRSW_64     SSSE3   00001111 00111000 00001011 !emit { modrm(); mem(size => 8); }
+PMULHRSW        SSSE3   00001111 00111000 00001011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
 MULPS           SSE     00001111 01011001 !emit { modrm(); mem(size => 16, align => 16); }
 MULPD           SSE2    00001111 01011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 MULSS           SSE     00001111 01011001 !emit { rep(); modrm(); mem(size => 4); }
@@ -124,6 +141,8 @@ MULSD           SSE2    00001111 01011001 !emit { repne(); modrm(); mem(size =>
 
 PMADDWD         MMX     00001111 11110101 !emit { modrm(); mem(size => 8); }
 PMADDWD         SSE2    00001111 11110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMADDUBSW_64    SSSE3   00001111 00111000 00000100 !emit { modrm(); mem(size => 8); }
+PMADDUBSW       SSSE3   00001111 00111000 00000100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 DIVPS           SSE     00001111 01011110 !emit { modrm(); mem(size => 16, align => 16); }
 DIVPD           SSE2    00001111 01011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -169,6 +188,20 @@ PAVGW           SSE2    00001111 11100011 !emit { data16(); modrm(); mem(size =>
 PSADBW          SSE     00001111 11110110 !emit { modrm(); mem(size => 8); }
 PSADBW          SSE2    00001111 11110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
+PABSB_64        SSSE3   00001111 00111000 00011100 !emit { modrm(); mem(size => 8); }
+PABSB           SSSE3   00001111 00111000 00011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PABSW_64        SSSE3   00001111 00111000 00011101 !emit { modrm(); mem(size => 8); }
+PABSW           SSSE3   00001111 00111000 00011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PABSD_64        SSSE3   00001111 00111000 00011110 !emit { modrm(); mem(size => 8); }
+PABSD           SSSE3   00001111 00111000 00011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
+PSIGNB_64       SSSE3   00001111 00111000 00001000 !emit { modrm(); mem(size => 8); }
+PSIGNB          SSSE3   00001111 00111000 00001000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PSIGNW_64       SSSE3   00001111 00111000 00001001 !emit { modrm(); mem(size => 8); }
+PSIGNW          SSSE3   00001111 00111000 00001001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PSIGND_64       SSSE3   00001111 00111000 00001010 !emit { modrm(); mem(size => 8); }
+PSIGND          SSSE3   00001111 00111000 00001010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
 # Comparison Instructions
 PCMPEQB         MMX     00001111 01110100 !emit { modrm(); mem(size => 8); }
 PCMPEQB         SSE2    00001111 01110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -256,6 +289,9 @@ PSRAW_imm       SSE2    00001111 01110001 !emit { data16(); modrm(mod => MOD_DIR
 PSRAD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 PSRAD_imm       SSE2    00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 
+PALIGNR_64      SSSE3   00001111 00111010 00001111 !emit { modrm(); mem(size => 8); imm(size => 1); }
+PALIGNR         SSSE3   00001111 00111010 00001111 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+
 # Shuffle, Unpack, Blend, Insert, Extract, Broadcast, Permute, Scatter Instructions
 PACKSSWB        MMX     00001111 01100011 !emit { modrm(); mem(size => 8); }
 PACKSSWB        SSE2    00001111 01100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -285,6 +321,8 @@ UNPCKLPD        SSE2    00001111 00010100 !emit { data16(); modrm(); mem(size =>
 UNPCKHPS        SSE     00001111 00010101 !emit { modrm(); mem(size => 16, align => 16); }
 UNPCKHPD        SSE2    00001111 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
+PSHUFB_64       SSSE3   00001111 00111000 00000000 !emit { modrm(); mem(size => 8); }
+PSHUFB          SSSE3   00001111 00111000 00000000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PSHUFW          SSE     00001111 01110000 !emit { modrm(); mem(size => 8); imm(size => 1); }
 PSHUFLW         SSE2    00001111 01110000 !emit { repne(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 PSHUFHW         SSE2    00001111 01110000 !emit { rep(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 11/14] x86.risu: add SSE4.1 and SSE4.2 instructions
  2019-07-01  4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
                   ` (9 preceding siblings ...)
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 10/14] x86.risu: add SSSE3 instructions Jan Bobek
@ 2019-07-01  4:35 ` Jan Bobek
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 13/14] x86.risu: add AVX instructions Jan Bobek
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 14/14] x86.risu: add AVX2 instructions Jan Bobek
  12 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add SSE4.1 and SSE4.2 instructions to the x86 configuration file.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 x86.risu | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
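
For readers unfamiliar with the opcode column of x86.risu: each fixed
8-bit group is one opcode byte, so "00001111 00111000 01000000" (PMULLD)
is the byte sequence 0F 38 40. The snippet below is only an illustrative
sketch in Python, not part of risugen; the helper name opcode_bytes is
hypothetical, and it handles only fully fixed groups (no variable fields
such as "d").

```python
def opcode_bytes(bit_groups: str) -> bytes:
    """Convert space-separated fixed 8-bit groups to the opcode bytes they denote."""
    groups = bit_groups.split()
    for g in groups:
        # Reject variable fields (e.g. "011 d 1110") and malformed groups.
        if len(g) != 8 or set(g) - {"0", "1"}:
            raise ValueError(f"not a fixed 8-bit group: {g!r}")
    return bytes(int(g, 2) for g in groups)


# PMULLD as listed in this patch: 0F 38 40 (the 66 prefix comes from data16()).
pmulld = opcode_bytes("00001111 00111000 01000000")
print(pmulld.hex(" "))
```

The same helper can be used to cross-check any of the new entries
against the opcode tables in the Intel SDM.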

diff --git a/x86.risu b/x86.risu
index 35992d6..a73e209 100644
--- a/x86.risu
+++ b/x86.risu
@@ -124,10 +124,12 @@ ADDSUBPD        SSE3    00001111 11010000 !emit { data16(); modrm(); mem(size =>
 
 PMULLW          MMX     00001111 11010101 !emit { modrm(); mem(size => 8); }
 PMULLW          SSE2    00001111 11010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMULLD          SSE4_1  00001111 00111000 01000000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PMULHW          MMX     00001111 11100101 !emit { modrm(); mem(size => 8); }
 PMULHW          SSE2    00001111 11100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PMULHUW         SSE     00001111 11100100 !emit { modrm(); mem(size => 8); }
 PMULHUW         SSE2    00001111 11100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMULDQ          SSE4_1  00001111 00111000 00101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PMULUDQ_64      SSE2    00001111 11110100 !emit { modrm(); mem(size => 8); }
 PMULUDQ         SSE2    00001111 11110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
@@ -162,18 +164,28 @@ RSQRTSS         SSE     00001111 01010010 !emit { rep(); modrm(); mem(size => 4)
 
 PMINUB          SSE     00001111 11011010 !emit { modrm(); mem(size => 8); }
 PMINUB          SSE2    00001111 11011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMINUW          SSE4_1  00001111 00111000 00111010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMINUD          SSE4_1  00001111 00111000 00111011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMINSB          SSE4_1  00001111 00111000 00111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PMINSW          SSE     00001111 11101010 !emit { modrm(); mem(size => 8); }
 PMINSW          SSE2    00001111 11101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMINSD          SSE4_1  00001111 00111000 00111001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 MINPS           SSE     00001111 01011101 !emit { modrm(); mem(size => 16, align => 16); }
 MINPD           SSE2    00001111 01011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 MINSS           SSE     00001111 01011101 !emit { rep(); modrm(); mem(size => 4); }
 MINSD           SSE2    00001111 01011101 !emit { repne(); modrm(); mem(size => 8); }
 
+PHMINPOSUW      SSE4_1  00001111 00111000 01000001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
 PMAXUB          SSE     00001111 11011110 !emit { modrm(); mem(size => 8); }
 PMAXUB          SSE2    00001111 11011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMAXUW          SSE4_1  00001111 00111000 00111110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMAXUD          SSE4_1  00001111 00111000 00111111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMAXSB          SSE4_1  00001111 00111000 00111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PMAXSW          SSE     00001111 11101110 !emit { modrm(); mem(size => 8); }
 PMAXSW          SSE2    00001111 11101110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMAXSD          SSE4_1  00001111 00111000 00111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 MAXPS           SSE     00001111 01011111 !emit { modrm(); mem(size => 16, align => 16); }
 MAXPD           SSE2    00001111 01011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -187,6 +199,7 @@ PAVGW           SSE2    00001111 11100011 !emit { data16(); modrm(); mem(size =>
 
 PSADBW          SSE     00001111 11110110 !emit { modrm(); mem(size => 8); }
 PSADBW          SSE2    00001111 11110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+MPSADBW         SSE4_1  00001111 00111010 01000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 
 PABSB_64        SSSE3   00001111 00111000 00011100 !emit { modrm(); mem(size => 8); }
 PABSB           SSSE3   00001111 00111000 00011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -202,6 +215,14 @@ PSIGNW          SSSE3   00001111 00111000 00001001 !emit { data16(); modrm(); me
 PSIGND_64       SSSE3   00001111 00111000 00001010 !emit { modrm(); mem(size => 8); }
 PSIGND          SSSE3   00001111 00111000 00001010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
+DPPS            SSE4_1  00001111 00111010 01000000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+DPPD            SSE4_1  00001111 00111010 01000001 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+
+ROUNDPS         SSE4_1  00001111 00111010 00001000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+ROUNDPD         SSE4_1  00001111 00111010 00001001 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+ROUNDSS         SSE4_1  00001111 00111010 00001010 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
+ROUNDSD         SSE4_1  00001111 00111010 00001011 !emit { data16(); modrm(); mem(size => 8); imm(size => 1); }
+
 # Comparison Instructions
 PCMPEQB         MMX     00001111 01110100 !emit { modrm(); mem(size => 8); }
 PCMPEQB         SSE2    00001111 01110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -209,12 +230,21 @@ PCMPEQW         MMX     00001111 01110101 !emit { modrm(); mem(size => 8); }
 PCMPEQW         SSE2    00001111 01110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PCMPEQD         MMX     00001111 01110110 !emit { modrm(); mem(size => 8); }
 PCMPEQD         SSE2    00001111 01110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PCMPEQQ         SSE4_1  00001111 00111000 00101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PCMPGTB         MMX     00001111 01100100 !emit { modrm(); mem(size => 8); }
 PCMPGTB         SSE2    00001111 01100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PCMPGTW         MMX     00001111 01100101 !emit { modrm(); mem(size => 8); }
 PCMPGTW         SSE2    00001111 01100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PCMPGTD         MMX     00001111 01100110 !emit { modrm(); mem(size => 8); }
 PCMPGTD         SSE2    00001111 01100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PCMPGTQ         SSE4_2  00001111 00111000 00110111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
+PCMPESTRM       SSE4_2  00001111 00111010 01100000 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+PCMPESTRI       SSE4_2  00001111 00111010 01100001 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+PCMPISTRM       SSE4_2  00001111 00111010 01100010 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+PCMPISTRI       SSE4_2  00001111 00111010 01100011 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+
+PTEST           SSE4_1  00001111 00111000 00010111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 CMPPS           SSE     00001111 11000010 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
 CMPPD           SSE2    00001111 11000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
@@ -299,6 +329,7 @@ PACKSSDW        MMX     00001111 01101011 !emit { modrm(); mem(size => 8); }
 PACKSSDW        SSE2    00001111 01101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 PACKUSWB        MMX     00001111 01100111 !emit { modrm(); mem(size => 8); }
 PACKUSWB        SSE2    00001111 01100111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PACKUSDW        SSE4_1  00001111 00111000 00101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 
 PUNPCKHBW       MMX     00001111 01101000 !emit { modrm(); mem(size => 8); }
 PUNPCKHBW       SSE2    00001111 01101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -331,13 +362,50 @@ PSHUFD          SSE2    00001111 01110000 !emit { data16(); modrm(); mem(size =>
 SHUFPS          SSE     00001111 11000110 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
 SHUFPD          SSE2    00001111 11000110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 
+BLENDPS         SSE4_1  00001111 00111010 00001100 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+BLENDPD         SSE4_1  00001111 00111010 00001101 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+BLENDVPS        SSE4_1  00001111 00111000 00010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+BLENDVPD        SSE4_1  00001111 00111000 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PBLENDVB        SSE4_1  00001111 00111000 00010000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PBLENDW         SSE4_1  00001111 00111010 00001110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+
+INSERTPS        SSE4_1  00001111 00111010 00100001 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
+PINSRB          SSE4_1  00001111 00111010 00100000 !emit { data16(); modrm(); mem(size => 1); imm(size => 1); }
 PINSRW          SSE     00001111 11000100 !emit { modrm(); mem(size => 2); imm(size => 1); }
 PINSRW          SSE2    00001111 11000100 !emit { data16(); modrm(); mem(size => 2); imm(size => 1); }
+PINSRD          SSE4_1  00001111 00111010 00100010 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
+PINSRQ          SSE4_1  00001111 00111010 00100010 !emit { data16(); rex(w => 1); modrm(); mem(size => 8); imm(size => 1); }
+
+EXTRACTPS       SSE4_1  00001111 00111010 00010111 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+EXTRACTPS_mem   SSE4_1  00001111 00111010 00010111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
+
+PEXTRB          SSE4_1  00001111 00111010 00010100 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+PEXTRB_mem      SSE4_1  00001111 00111010 00010100 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 1); imm(size => 1); }
+PEXTRW          SSE4_1  00001111 00111010 00010101 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+PEXTRW_mem      SSE4_1  00001111 00111010 00010101 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 2); imm(size => 1); }
+PEXTRD          SSE4_1  00001111 00111010 00010110 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+PEXTRD_mem      SSE4_1  00001111 00111010 00010110 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
+PEXTRQ          SSE4_1  00001111 00111010 00010110 !emit { data16(); rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+PEXTRQ_mem      SSE4_1  00001111 00111010 00010110 !emit { data16(); rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); imm(size => 1); }
 
 PEXTRW_reg      SSE     00001111 11000101 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
 PEXTRW_reg      SSE2    00001111 11000101 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
 
 # Conversion Instructions
+PMOVSXBW        SSE4_1  00001111 00111000 00100000 !emit { data16(); modrm(); mem(size => 8); }
+PMOVSXBD        SSE4_1  00001111 00111000 00100001 !emit { data16(); modrm(); mem(size => 4); }
+PMOVSXBQ        SSE4_1  00001111 00111000 00100010 !emit { data16(); modrm(); mem(size => 2); }
+PMOVSXWD        SSE4_1  00001111 00111000 00100011 !emit { data16(); modrm(); mem(size => 8); }
+PMOVSXWQ        SSE4_1  00001111 00111000 00100100 !emit { data16(); modrm(); mem(size => 4); }
+PMOVSXDQ        SSE4_1  00001111 00111000 00100101 !emit { data16(); modrm(); mem(size => 8); }
+
+PMOVZXBW        SSE4_1  00001111 00111000 00110000 !emit { data16(); modrm(); mem(size => 8); }
+PMOVZXBD        SSE4_1  00001111 00111000 00110001 !emit { data16(); modrm(); mem(size => 4); }
+PMOVZXBQ        SSE4_1  00001111 00111000 00110010 !emit { data16(); modrm(); mem(size => 2); }
+PMOVZXWD        SSE4_1  00001111 00111000 00110011 !emit { data16(); modrm(); mem(size => 8); }
+PMOVZXWQ        SSE4_1  00001111 00111000 00110100 !emit { data16(); modrm(); mem(size => 4); }
+PMOVZXDQ        SSE4_1  00001111 00111000 00110101 !emit { data16(); modrm(); mem(size => 8); }
+
 CVTPI2PS        SSE     00001111 00101010 !emit { modrm(); mem(size => 8); }
 CVTSI2SS        SSE     00001111 00101010 !emit { rep(); modrm(); mem(size => 4); }
 CVTSI2SS_64     SSE     00001111 00101010 !emit { rep(); rex(w => 1); modrm(); mem(size => 8); }
@@ -383,6 +451,7 @@ MOVNTI          SSE2    00001111 11000011 !emit { modrm(mod => ~MOD_DIRECT); mem
 MOVNTI_64       SSE2    00001111 11000011 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVNTQ          SSE     00001111 11100111 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVNTDQ         SSE2    00001111 11100111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+MOVNTDQA        SSE4_1  00001111 00111000 00101010 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
 
 PREFETCHT0      SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 1); mem(size => 1); }
 PREFETCHT1      SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 1); }
-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 13/14] x86.risu: add AVX instructions
  2019-07-01  4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
                   ` (10 preceding siblings ...)
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 11/14] x86.risu: add SSE4.1 and SSE4.2 instructions Jan Bobek
@ 2019-07-01  4:35 ` Jan Bobek
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 14/14] x86.risu: add AVX2 instructions Jan Bobek
  12 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add AVX instructions to the x86 configuration file.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 x86.risu | 288 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 288 insertions(+)
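
As a reading aid for the vex(...) calls below, here is a rough sketch of
how the l/p/m/W/v fields map onto the architectural VEX prefix bytes. It
is written in Python purely for illustration and is not the risugen_x86_asm
implementation; the function name vex_prefix, the string keys, and the
r/x/b parameters are assumptions made for this example. The bit layout
itself follows the Intel SDM: the 2-byte form (C5) is usable only when
X = B = 0, W = 0 and the opcode map is 0F; otherwise the 3-byte form (C4)
is needed.

```python
# Architectural field encodings (Intel SDM); the names are chosen for this sketch.
VEX_P = {"NONE": 0, "DATA16": 1, "REP": 2, "REPNE": 3}   # pp: none, 66, F3, F2
VEX_M = {"0F": 1, "0F38": 2, "0F3A": 3}                  # mmmmm: opcode map


def vex_prefix(*, l=0, p="NONE", m="0F", w=0, vvvv=0, r=0, x=0, b=0) -> bytes:
    """Encode a VEX prefix; vvvv/r/x/b are given un-inverted, as register bits."""
    pp = VEX_P[p]
    mmmmm = VEX_M[m]
    inv_v = (~vvvv) & 0xF  # vvvv is stored inverted; 1111 means "unused"
    if x == 0 and b == 0 and w == 0 and mmmmm == 1:
        # 2-byte form: C5 | ~R ~vvvv L pp
        return bytes([0xC5, ((r ^ 1) << 7) | (inv_v << 3) | (l << 2) | pp])
    # 3-byte form: C4 | ~R ~X ~B mmmmm | W ~vvvv L pp
    byte1 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | mmmmm
    byte2 = (w << 7) | (inv_v << 3) | (l << 2) | pp
    return bytes([0xC4, byte1, byte2])


# VADDPS xmm0, xmm1, xmm2: map 0F, no SIMD prefix, vvvv = xmm1
print(vex_prefix(vvvv=1).hex(" "))
# VPHADDW xmm0, xmm1, xmm2: map 0F38 forces the 3-byte form
print(vex_prefix(p="DATA16", m="0F38", vvvv=1).hex(" "))
```

Note that with vvvv left at 0 the encoded field is 1111, which is the
value required when the instruction has no second source operand.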

diff --git a/x86.risu b/x86.risu
index 17a5082..d3115ac 100644
--- a/x86.risu
+++ b/x86.risu
@@ -17,452 +17,736 @@ MOVD            MMX     00001111 011 d 1110 !emit { modrm(mod => MOD_DIRECT, rm
 MOVD_mem        MMX     00001111 011 d 1110 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 4); }
 MOVD            SSE2    00001111 011 d 1110 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
 MOVD_mem        SSE2    00001111 011 d 1110 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); }
+VMOVD           AVX              011 d 1110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, W => 0, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
+VMOVD_mem       AVX              011 d 1110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, W => 0, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 4); }
 MOVQ            MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
 MOVQ_mem        MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVQ            SSE2    00001111 011 d 1110 !emit { data16(); rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
 MOVQ_mem        SSE2    00001111 011 d 1110 !emit { data16(); rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVQ           AVX              011 d 1110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, W => 1, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
+VMOVQ_mem       AVX              011 d 1110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, W => 1, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVQ_mm         MMX     00001111 011 d 1111 !emit { modrm(); mem(size => 8); }
 MOVQ_xmm1       SSE2    00001111 01111110 !emit { rep(); modrm(); mem(size => 8); }
+VMOVQ_xmm1      AVX              01111110 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 MOVQ_xmm2       SSE2    00001111 11010110 !emit { data16(); modrm(); mem(size => 8); }
+VMOVQ_xmm2      AVX              11010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 
 MOVAPS          SSE     00001111 0010100 d !emit { modrm(); mem(size => 16, align => 16); }
+VMOVAPS         AVX              0010100 d !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16, align => 16); }
 MOVAPD          SSE2    00001111 0010100 d !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VMOVAPD         AVX              0010100 d !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16, align => 16); }
 MOVDQA          SSE2    00001111 011 d 1111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VMOVDQA         AVX              011 d 1111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16, align => 16); }
 MOVUPS          SSE     00001111 0001000 d !emit { modrm(); mem(size => 16); }
+VMOVUPS         AVX              0001000 d !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 MOVUPD          SSE2    00001111 0001000 d !emit { data16(); modrm(); mem(size => 16); }
+VMOVUPD         AVX              0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 MOVDQU          SSE2    00001111 011 d 1111 !emit { rep(); modrm(); mem(size => 16); }
+VMOVDQU         AVX              011 d 1111 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 MOVSS           SSE     00001111 0001000 d !emit { rep(); modrm(); mem(size => 4); }
+VMOVSS          AVX              0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F); modrm(mod => MOD_DIRECT); }
+VMOVSS_mem      AVX              0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 4); }
 MOVSD           SSE2    00001111 0001000 d !emit { repne(); modrm(); mem(size => 8); }
+VMOVSD          AVX              0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(mod => MOD_DIRECT); }
+VMOVSD_mem      AVX              0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 
 MOVQ2DQ         SSE2    00001111 11010110 !emit { rep(); modrm(mod => MOD_DIRECT); }
 MOVDQ2Q         SSE2    00001111 11010110 !emit { repne(); modrm(mod => MOD_DIRECT); }
 
 MOVLPS          SSE     00001111 0001001 d !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVLPS_ld      AVX              00010010  !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVLPS_st      AVX              00010011  !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVLPD          SSE2    00001111 0001001 d !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVLPD_ld      AVX              00010010  !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVLPD_st      AVX              00010011  !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVHPS          SSE     00001111 0001011 d !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVHPS_ld      AVX              00010110  !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVHPS_st      AVX              00010111  !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVHPD          SSE2    00001111 0001011 d !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVHPD_ld      AVX              00010110  !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVHPD_st      AVX              00010111  !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVLHPS         SSE     00001111 00010110  !emit { modrm(mod => MOD_DIRECT); }
+VMOVLHPS        AVX              00010110  !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(mod => MOD_DIRECT); }
 MOVHLPS         SSE     00001111 00010010  !emit { modrm(mod => MOD_DIRECT); }
+VMOVHLPS        AVX              00010010  !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(mod => MOD_DIRECT); }
 
 PMOVMSKB        SSE     00001111 11010111 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 PMOVMSKB        SSE2    00001111 11010111 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+VPMOVMSKB       AVX              11010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 MOVMSKPS        SSE     00001111 01010000 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+VMOVMSKPS       AVX              01010000 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 MOVMKSPD        SSE2    00001111 01010000 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+VMOVMSKPD       AVX              01010000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 
 LDDQU           SSE3    00001111 11110000 !emit { repne(); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VLDDQU          AVX              11110000 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
 MOVSHDUP        SSE3    00001111 00010110 !emit { rep(); modrm(); mem(size => 16, align => 16); }
+VMOVSHDUP       AVX              00010110 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 MOVSLDUP        SSE3    00001111 00010010 !emit { rep(); modrm(); mem(size => 16, align => 16); }
+VMOVSLDUP       AVX              00010010 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 MOVDDUP         SSE3    00001111 00010010 !emit { repne(); modrm(); mem(size => 8); }
+VMOVDDUP        AVX              00010010 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 
 # Arithmetic Instructions
 PADDB           MMX     00001111 11111100 !emit { modrm(); mem(size => 8); }
 PADDB           SSE2    00001111 11111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDB          AVX              11111100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PADDW           MMX     00001111 11111101 !emit { modrm(); mem(size => 8); }
 PADDW           SSE2    00001111 11111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDW          AVX              11111101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PADDD           MMX     00001111 11111110 !emit { modrm(); mem(size => 8); }
 PADDD           SSE2    00001111 11111110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDD          AVX              11111110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PADDQ           MMX     00001111 11010100 !emit { modrm(); mem(size => 8); }
 PADDQ           SSE2    00001111 11010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDQ          AVX              11010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PADDSB          MMX     00001111 11101100 !emit { modrm(); mem(size => 8); }
 PADDSB          SSE2    00001111 11101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDSB         AVX              11101100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PADDSW          MMX     00001111 11101101 !emit { modrm(); mem(size => 8); }
 PADDSW          SSE2    00001111 11101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDSW         AVX              11101101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PADDUSB         MMX     00001111 11011100 !emit { modrm(); mem(size => 8); }
 PADDUSB         SSE2    00001111 11011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDUSB        AVX              11011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PADDUSW         MMX     00001111 11011101 !emit { modrm(); mem(size => 8); }
 PADDUSW         SSE2    00001111 11011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDUSW        AVX              11011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 ADDPS           SSE     00001111 01011000 !emit { modrm(); mem(size => 16, align => 16); }
+VADDPS          AVX              01011000 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
 ADDPD           SSE2    00001111 01011000 !emit { data16(); modrm(); mem(size => 16, align => 16) }
+VADDPD          AVX              01011000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 ADDSS           SSE     00001111 01011000 !emit { rep(); modrm(); mem(size => 4); }
+VADDSS          AVX              01011000 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 ADDSD           SSE2    00001111 01011000 !emit { repne(); modrm(); mem(size => 8); }
+VADDSD          AVX              01011000 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
 
 PHADDW_64       SSSE3   00001111 00111000 00000001 !emit { modrm(); mem(size => 8); }
 PHADDW          SSSE3   00001111 00111000 00000001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHADDW         AVX                       00000001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PHADDD_64       SSSE3   00001111 00111000 00000010 !emit { modrm(); mem(size => 8); }
 PHADDD          SSSE3   00001111 00111000 00000010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHADDD         AVX                       00000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PHADDSW_64      SSSE3   00001111 00111000 00000011 !emit { modrm(); mem(size => 8); }
 PHADDSW         SSSE3   00001111 00111000 00000011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHADDSW        AVX                       00000011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 
 HADDPS          SSE3    00001111 01111100 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+VHADDPS         AVX              01111100 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 16); }
 HADDPD          SSE3    00001111 01111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VHADDPD         AVX              01111100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 PSUBB           MMX     00001111 11111000 !emit { modrm(); mem(size => 8); }
 PSUBB           SSE2    00001111 11111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBB          AVX              11111000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSUBW           MMX     00001111 11111001 !emit { modrm(); mem(size => 8); }
 PSUBW           SSE2    00001111 11111001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBW          AVX              11111001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSUBD           MMX     00001111 11111010 !emit { modrm(); mem(size => 8); }
 PSUBD           SSE2    00001111 11111010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBD          AVX              11111010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSUBQ_64        SSE2    00001111 11111011 !emit { modrm(); mem(size => 8); }
 PSUBQ           SSE2    00001111 11111011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBQ          AVX              11111011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSUBSB          MMX     00001111 11101000 !emit { modrm(); mem(size => 8); }
 PSUBSB          SSE2    00001111 11101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBSB         AVX              11101000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSUBSW          MMX     00001111 11101001 !emit { modrm(); mem(size => 8); }
 PSUBSW          SSE2    00001111 11101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBSW         AVX              11101001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSUBUSB         MMX     00001111 11011000 !emit { modrm(); mem(size => 8); }
 PSUBUSB         SSE2    00001111 11011000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBUSB        AVX              11011000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSUBUSW         MMX     00001111 11011001 !emit { modrm(); mem(size => 8); }
 PSUBUSW         SSE2    00001111 11011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBUSW        AVX              11011001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 SUBPS           SSE     00001111 01011100 !emit { modrm(); mem(size => 16, align => 16); }
+VSUBPS          AVX              01011100 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
 SUBPD           SSE2    00001111 01011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VSUBPD          AVX              01011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 SUBSS           SSE     00001111 01011100 !emit { rep(); modrm(); mem(size => 4); }
+VSUBSS          AVX              01011100 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 SUBSD           SSE2    00001111 01011100 !emit { repne(); modrm(); mem(size => 8); }
+VSUBSD          AVX              01011100 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
 
 PHSUBW_64       SSSE3   00001111 00111000 00000101 !emit { modrm(); mem(size => 8); }
 PHSUBW          SSSE3   00001111 00111000 00000101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHSUBW         AVX                       00000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PHSUBD_64       SSSE3   00001111 00111000 00000110 !emit { modrm(); mem(size => 8); }
 PHSUBD          SSSE3   00001111 00111000 00000110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHSUBD         AVX                       00000110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PHSUBSW_64      SSSE3   00001111 00111000 00000111 !emit { modrm(); mem(size => 8); }
 PHSUBSW         SSSE3   00001111 00111000 00000111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHSUBSW        AVX                       00000111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 
 HSUBPS          SSE3    00001111 01111101 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+VHSUBPS         AVX              01111101 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 16); }
 HSUBPD          SSE3    00001111 01111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VHSUBPD         AVX              01111101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 ADDSUBPS        SSE3    00001111 11010000 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+VADDSUBPS       AVX              11010000 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 16); }
 ADDSUBPD        SSE3    00001111 11010000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VADDSUBPD       AVX              11010000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 PMULLW          MMX     00001111 11010101 !emit { modrm(); mem(size => 8); }
 PMULLW          SSE2    00001111 11010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULLW         AVX              11010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PMULLD          SSE4_1  00001111 00111000 01000000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULLD         AVX                       01000000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PMULHW          MMX     00001111 11100101 !emit { modrm(); mem(size => 8); }
 PMULHW          SSE2    00001111 11100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULHW         AVX              11100101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PMULHUW         SSE     00001111 11100100 !emit { modrm(); mem(size => 8); }
 PMULHUW         SSE2    00001111 11100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULHUW        AVX              11100100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PMULDQ          SSE4_1  00001111 00111000 00101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULDQ         AVX                       00101000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PMULUDQ_64      SSE2    00001111 11110100 !emit { modrm(); mem(size => 8); }
 PMULUDQ         SSE2    00001111 11110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULUDQ        AVX              11110100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 PMULHRSW_64     SSSE3   00001111 00111000 00001011 !emit { modrm(); mem(size => 8); }
 PMULHRSW        SSSE3   00001111 00111000 00001011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULHRSW       AVX                       00001011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 
 MULPS           SSE     00001111 01011001 !emit { modrm(); mem(size => 16, align => 16); }
+VMULPS          AVX              01011001 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
 MULPD           SSE2    00001111 01011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VMULPD          AVX              01011001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 MULSS           SSE     00001111 01011001 !emit { rep(); modrm(); mem(size => 4); }
+VMULSS          AVX              01011001 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 MULSD           SSE2    00001111 01011001 !emit { repne(); modrm(); mem(size => 8); }
+VMULSD          AVX              01011001 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
 
 PMADDWD         MMX     00001111 11110101 !emit { modrm(); mem(size => 8); }
 PMADDWD         SSE2    00001111 11110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMADDWD        AVX              11110101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PMADDUBSW_64    SSSE3   00001111 00111000 00000100 !emit { modrm(); mem(size => 8); }
 PMADDUBSW       SSSE3   00001111 00111000 00000100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMADDUBSW      AVX                       00000100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 
 DIVPS           SSE     00001111 01011110 !emit { modrm(); mem(size => 16, align => 16); }
+VDIVPS          AVX              01011110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
 DIVPD           SSE2    00001111 01011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VDIVPD          AVX              01011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 DIVSS           SSE     00001111 01011110 !emit { rep(); modrm(); mem(size => 4); }
+VDIVSS          AVX              01011110 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 DIVSD           SSE2    00001111 01011110 !emit { repne(); modrm(); mem(size => 8); }
+VDIVSD          AVX              01011110 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
 
 RCPPS           SSE     00001111 01010011 !emit { modrm(); mem(size => 16, align => 16); }
+VRCPPS          AVX              01010011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 RCPSS           SSE     00001111 01010011 !emit { rep(); modrm(); mem(size => 4); }
+VRCPSS          AVX              01010011 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 
 SQRTPS          SSE     00001111 01010001 !emit { modrm(); mem(size => 16, align => 16); }
+VSQRTPS         AVX              01010001 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 SQRTPD          SSE2    00001111 01010001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VSQRTPD         AVX              01010001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 SQRTSS          SSE     00001111 01010001 !emit { rep(); modrm(); mem(size => 4); }
+VSQRTSS         AVX              01010001 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 SQRTSD          SSE2    00001111 01010001 !emit { repne(); modrm(); mem(size => 8); }
+VSQRTSD         AVX              01010001 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
 
 RSQRTPS         SSE     00001111 01010010 !emit { modrm(); mem(size => 16, align => 16); }
+VRSQRTPS        AVX              01010010 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 RSQRTSS         SSE     00001111 01010010 !emit { rep(); modrm(); mem(size => 4); }
+VRSQRTSS        AVX              01010010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 
 PMINUB          SSE     00001111 11011010 !emit { modrm(); mem(size => 8); }
 PMINUB          SSE2    00001111 11011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMINUB         AVX              11011010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PMINUW          SSE4_1  00001111 00111000 00111010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMINUW         AVX                       00111010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PMINUD          SSE4_1  00001111 00111000 00111011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMINUD         AVX                       00111011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PMINSB          SSE4_1  00001111 00111000 00111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMINSB         AVX                       00111000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PMINSW          SSE     00001111 11101010 !emit { modrm(); mem(size => 8); }
 PMINSW          SSE2    00001111 11101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMINSW         AVX              11101010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PMINSD          SSE4_1  00001111 00111000 00111001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMINSD         AVX                       00111001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 
 MINPS           SSE     00001111 01011101 !emit { modrm(); mem(size => 16, align => 16); }
+VMINPS          AVX              01011101 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
 MINPD           SSE2    00001111 01011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VMINPD          AVX              01011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 MINSS           SSE     00001111 01011101 !emit { rep(); modrm(); mem(size => 4); }
+VMINSS          AVX              01011101 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 MINSD           SSE2    00001111 01011101 !emit { repne(); modrm(); mem(size => 8); }
+VMINSD          AVX              01011101 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
 
 PHMINPOSUW      SSE4_1  00001111 00111000 01000001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHMINPOSUW     AVX                       01000001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 
 PMAXUB          SSE     00001111 11011110 !emit { modrm(); mem(size => 8); }
 PMAXUB          SSE2    00001111 11011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMAXUB         AVX              11011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PMAXUW          SSE4_1  00001111 00111000 00111110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMAXUW         AVX                       00111110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PMAXUD          SSE4_1  00001111 00111000 00111111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMAXUD         AVX                       00111111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PMAXSB          SSE4_1  00001111 00111000 00111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMAXSB         AVX                       00111100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PMAXSW          SSE     00001111 11101110 !emit { modrm(); mem(size => 8); }
 PMAXSW          SSE2    00001111 11101110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMAXSW         AVX              11101110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PMAXSD          SSE4_1  00001111 00111000 00111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMAXSD         AVX                       00111101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 
 MAXPS           SSE     00001111 01011111 !emit { modrm(); mem(size => 16, align => 16); }
+VMAXPS          AVX              01011111 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
 MAXPD           SSE2    00001111 01011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VMAXPD          AVX              01011111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 MAXSS           SSE     00001111 01011111 !emit { rep(); modrm(); mem(size => 4); }
+VMAXSS          AVX              01011111 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 MAXSD           SSE2    00001111 01011111 !emit { repne(); modrm(); mem(size => 8); }
+VMAXSD          AVX              01011111 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
 
 PAVGB           SSE     00001111 11100000 !emit { modrm(); mem(size => 8); }
 PAVGB           SSE2    00001111 11100000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPAVGB          AVX              11100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PAVGW           SSE     00001111 11100011 !emit { modrm(); mem(size => 8); }
 PAVGW           SSE2    00001111 11100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPAVGW          AVX              11100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 PSADBW          SSE     00001111 11110110 !emit { modrm(); mem(size => 8); }
 PSADBW          SSE2    00001111 11110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSADBW         AVX              11110110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 MPSADBW         SSE4_1  00001111 00111010 01000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VMPSADBW        AVX                       01000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
 
 PABSB_64        SSSE3   00001111 00111000 00011100 !emit { modrm(); mem(size => 8); }
 PABSB           SSSE3   00001111 00111000 00011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPABSB          AVX                       00011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 PABSW_64        SSSE3   00001111 00111000 00011101 !emit { modrm(); mem(size => 8); }
 PABSW           SSSE3   00001111 00111000 00011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPABSW          AVX                       00011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 PABSD_64        SSSE3   00001111 00111000 00011110 !emit { modrm(); mem(size => 8); }
 PABSD           SSSE3   00001111 00111000 00011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPABSD          AVX                       00011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 
 PSIGNB_64       SSSE3   00001111 00111000 00001000 !emit { modrm(); mem(size => 8); }
 PSIGNB          SSSE3   00001111 00111000 00001000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSIGNB         AVX                       00001000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PSIGNW_64       SSSE3   00001111 00111000 00001001 !emit { modrm(); mem(size => 8); }
 PSIGNW          SSSE3   00001111 00111000 00001001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSIGNW         AVX                       00001001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PSIGND_64       SSSE3   00001111 00111000 00001010 !emit { modrm(); mem(size => 8); }
 PSIGND          SSSE3   00001111 00111000 00001010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSIGND         AVX                       00001010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 
 DPPS            SSE4_1  00001111 00111010 01000000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VDPPS           AVX                       01000000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
 DPPD            SSE4_1  00001111 00111010 01000001 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VDPPD           AVX                       01000001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
 
 ROUNDPS         SSE4_1  00001111 00111010 00001000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VROUNDPS        AVX                       00001000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
 ROUNDPD         SSE4_1  00001111 00111010 00001001 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VROUNDPD        AVX                       00001001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
 ROUNDSS         SSE4_1  00001111 00111010 00001010 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
+VROUNDSS        AVX                       00001010 !emit { vex(l => 0, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 4); imm(size => 1); }
 ROUNDSD         SSE4_1  00001111 00111010 00001011 !emit { data16(); modrm(); mem(size => 8); imm(size => 1); }
+VROUNDSD        AVX                       00001011 !emit { vex(l => 0, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 8); imm(size => 1); }
 
 # AES Instructions
 AESDEC          AES     00001111 00111000 11011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VAESDEC         AES_AVX                   11011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 AESDECLAST      AES     00001111 00111000 11011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VAESDECLAST     AES_AVX                   11011111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 AESENC          AES     00001111 00111000 11011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VAESENC         AES_AVX                   11011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 AESENCLAST      AES     00001111 00111000 11011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VAESENCLAST     AES_AVX                   11011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 AESIMC          AES     00001111 00111000 11011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VAESIMC         AES_AVX                   11011011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 AESKEYGENASSIST AES     00001111 00111010 11011111 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VAESKEYGENASSIST AES_AVX                  11011111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
 
 # PCLMULQDQ Instructions
 PCLMULQDQ       PCLMULQDQ      00001111 00111010 01000100 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VPCLMULQDQ      PCLMULQDQ_AVX                    01000100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
 
 # Comparison Instructions
 PCMPEQB         MMX     00001111 01110100 !emit { modrm(); mem(size => 8); }
 PCMPEQB         SSE2    00001111 01110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPEQB        AVX              01110100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PCMPEQW         MMX     00001111 01110101 !emit { modrm(); mem(size => 8); }
 PCMPEQW         SSE2    00001111 01110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPEQW        AVX              01110101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PCMPEQD         MMX     00001111 01110110 !emit { modrm(); mem(size => 8); }
 PCMPEQD         SSE2    00001111 01110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPEQD        AVX              01110110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PCMPEQQ         SSE4_1  00001111 00111000 00101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPEQQ        AVX                       00101001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PCMPGTB         MMX     00001111 01100100 !emit { modrm(); mem(size => 8); }
 PCMPGTB         SSE2    00001111 01100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPGTB        AVX              01100100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PCMPGTW         MMX     00001111 01100101 !emit { modrm(); mem(size => 8); }
 PCMPGTW         SSE2    00001111 01100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPGTW        AVX              01100101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PCMPGTD         MMX     00001111 01100110 !emit { modrm(); mem(size => 8); }
 PCMPGTD         SSE2    00001111 01100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPGTD        AVX              01100110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PCMPGTQ         SSE4_2  00001111 00111000 00110111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPGTQ        AVX                       00110111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 
 PCMPESTRM       SSE4_2  00001111 00111010 01100000 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+VPCMPESTRM      AVX                       01100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
 PCMPESTRI       SSE4_2  00001111 00111010 01100001 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+VPCMPESTRI      AVX                       01100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
 PCMPISTRM       SSE4_2  00001111 00111010 01100010 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+VPCMPISTRM      AVX                       01100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
 PCMPISTRI       SSE4_2  00001111 00111010 01100011 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+VPCMPISTRI      AVX                       01100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
 
 PTEST           SSE4_1  00001111 00111000 00010111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPTEST          AVX                       00010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+
+VTESTPS         AVX                       00001110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VTESTPD         AVX                       00001111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 
 CMPPS           SSE     00001111 11000010 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VCMPPS          AVX              11000010 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
 CMPPD           SSE2    00001111 11000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VCMPPD          AVX              11000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
 CMPSS           SSE     00001111 11000010 !emit { rep(); modrm(); mem(size => 4); imm(size => 1); }
+VCMPSS          AVX              11000010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); imm(size => 1); }
 CMPSD           SSE2    00001111 11000010 !emit { repne(); modrm(); mem(size => 8); imm(size => 1); }
+VCMPSD          AVX              11000010 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); imm(size => 1); }
 
 UCOMISS         SSE     00001111 00101110 !emit { modrm(); mem(size => 4); }
+VUCOMISS        AVX              00101110 !emit { vex(l => 0, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
 UCOMISD         SSE2    00001111 00101110 !emit { data16(); modrm(); mem(size => 8); }
+VUCOMISD        AVX              00101110 !emit { vex(l => 0, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 
 COMISS          SSE     00001111 00101111 !emit { modrm(); mem(size => 4); }
+VCOMISS         AVX              00101111 !emit { vex(l => 0, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
 COMISD          SSE2    00001111 00101111 !emit { data16(); modrm(); mem(size => 8); }
+VCOMISD         AVX              00101111 !emit { vex(l => 0, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 
 # Logical Instructions
 PAND            MMX     00001111 11011011 !emit { modrm(); mem(size => 8); }
 PAND            SSE2    00001111 11011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPAND           AVX              11011011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 ANDPS           SSE     00001111 01010100 !emit { modrm(); mem(size => 16, align => 16); }
+VANDPS          AVX              01010100 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
 ANDPD           SSE2    00001111 01010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VANDPD          AVX              01010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 PANDN           MMX     00001111 11011111 !emit { modrm(); mem(size => 8); }
 PANDN           SSE2    00001111 11011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPANDN          AVX              11011111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 ANDNPS          SSE     00001111 01010101 !emit { modrm(); mem(size => 16, align => 16); }
+VANDNPS         AVX              01010101 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
 ANDNPD          SSE2    00001111 01010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VANDNPD         AVX              01010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 POR             MMX     00001111 11101011 !emit { modrm(); mem(size => 8); }
 POR             SSE2    00001111 11101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPOR            AVX              11101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 ORPS            SSE     00001111 01010110 !emit { modrm(); mem(size => 16, align => 16); }
+VORPS           AVX              01010110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
 ORPD            SSE2    00001111 01010110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VORPD           AVX              01010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 PXOR            MMX     00001111 11101111 !emit { modrm(); mem(size => 8); }
 PXOR            SSE2    00001111 11101111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPXOR           AVX              11101111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 XORPS           SSE     00001111 01010111 !emit { modrm(); mem(size => 16, align => 16); }
+VXORPS          AVX              01010111 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
 XORPD           SSE2    00001111 01010111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VXORPD          AVX              01010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 # Shift and Rotate Instructions
 PSLLW           MMX     00001111 11110001 !emit { modrm(); mem(size => 8); }
 PSLLW           SSE2    00001111 11110001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSLLW          AVX              11110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSLLD           MMX     00001111 11110010 !emit { modrm(); mem(size => 8); }
 PSLLD           SSE2    00001111 11110010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSLLD          AVX              11110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSLLQ           MMX     00001111 11110011 !emit { modrm(); mem(size => 8); }
 PSLLQ           SSE2    00001111 11110011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSLLQ          AVX              11110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSLLDQ          SSE2    00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 7); imm(size => 1); }
+VPSLLDQ         AVX              01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 7); imm(size => 1); }
 
 PSLLW_imm       MMX     00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 PSLLW_imm       SSE2    00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+VPSLLW_imm      AVX              01110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 PSLLD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 PSLLD_imm       SSE2    00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+VPSLLD_imm      AVX              01110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 PSLLQ_imm       MMX     00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 PSLLQ_imm       SSE2    00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+VPSLLQ_imm      AVX              01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 
 PSRLW           MMX     00001111 11010001 !emit { modrm(); mem(size => 8); }
 PSRLW           SSE2    00001111 11010001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSRLW          AVX              11010001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSRLD           MMX     00001111 11010010 !emit { modrm(); mem(size => 8); }
 PSRLD           SSE2    00001111 11010010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSRLD          AVX              11010010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSRLQ           MMX     00001111 11010011 !emit { modrm(); mem(size => 8); }
 PSRLQ           SSE2    00001111 11010011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSRLQ          AVX              11010011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSRLDQ          SSE2    00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 3); imm(size => 1); }
+VPSRLDQ         AVX              01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 3); imm(size => 1); }
 
 PSRLW_imm       MMX     00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 PSRLW_imm       SSE2    00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+VPSRLW_imm      AVX              01110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 PSRLD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 PSRLD_imm       SSE2    00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+VPSRLD_imm      AVX              01110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 PSRLQ_imm       MMX     00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 PSRLQ_imm       SSE2    00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+VPSRLQ_imm      AVX              01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 
 PSRAW           MMX     00001111 11100001 !emit { modrm(); mem(size => 8); }
 PSRAW           SSE2    00001111 11100001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSRAW          AVX              11100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSRAD           MMX     00001111 11100010 !emit { modrm(); mem(size => 8); }
 PSRAD           SSE2    00001111 11100010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSRAD          AVX              11100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 PSRAW_imm       MMX     00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 PSRAW_imm       SSE2    00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+VPSRAW_imm      AVX              01110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 PSRAD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 PSRAD_imm       SSE2    00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+VPSRAD_imm      AVX              01110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 
 PALIGNR_64      SSSE3   00001111 00111010 00001111 !emit { modrm(); mem(size => 8); imm(size => 1); }
 PALIGNR         SSSE3   00001111 00111010 00001111 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VPALIGNR        AVX                       00001111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
 
 # Shuffle, Unpack, Blend, Insert, Extract, Broadcast, Permute, Scatter Instructions
 PACKSSWB        MMX     00001111 01100011 !emit { modrm(); mem(size => 8); }
 PACKSSWB        SSE2    00001111 01100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPACKSSWB       AVX              01100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PACKSSDW        MMX     00001111 01101011 !emit { modrm(); mem(size => 8); }
 PACKSSDW        SSE2    00001111 01101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPACKSSDW       AVX              01101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PACKUSWB        MMX     00001111 01100111 !emit { modrm(); mem(size => 8); }
 PACKUSWB        SSE2    00001111 01100111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPACKUSWB       AVX              01100111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PACKUSDW        SSE4_1  00001111 00111000 00101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPACKUSDW       AVX                       00101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 
 PUNPCKHBW       MMX     00001111 01101000 !emit { modrm(); mem(size => 8); }
 PUNPCKHBW       SSE2    00001111 01101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKHBW      AVX              01101000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PUNPCKHWD       MMX     00001111 01101001 !emit { modrm(); mem(size => 8); }
 PUNPCKHWD       SSE2    00001111 01101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKHWD      AVX              01101001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PUNPCKHDQ       MMX     00001111 01101010 !emit { modrm(); mem(size => 8); }
 PUNPCKHDQ       SSE2    00001111 01101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKHDQ      AVX              01101010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PUNPCKHQDQ      SSE2    00001111 01101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKHQDQ     AVX              01101101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 PUNPCKLBW       MMX     00001111 01100000 !emit { modrm(); mem(size => 4); }
 PUNPCKLBW       SSE2    00001111 01100000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKLBW      AVX              01100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PUNPCKLWD       MMX     00001111 01100001 !emit { modrm(); mem(size => 4); }
 PUNPCKLWD       SSE2    00001111 01100001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKLWD      AVX              01100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PUNPCKLDQ       MMX     00001111 01100010 !emit { modrm(); mem(size => 4); }
 PUNPCKLDQ       SSE2    00001111 01100010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKLDQ      AVX              01100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PUNPCKLQDQ      SSE2    00001111 01101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKLQDQ     AVX              01101100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 UNPCKLPS        SSE     00001111 00010100 !emit { modrm(); mem(size => 16, align => 16); }
+VUNPCKLPS       AVX              00010100 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
 UNPCKLPD        SSE2    00001111 00010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VUNPCKLPD       AVX              00010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 UNPCKHPS        SSE     00001111 00010101 !emit { modrm(); mem(size => 16, align => 16); }
+VUNPCKHPS       AVX              00010101 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
 UNPCKHPD        SSE2    00001111 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VUNPCKHPD       AVX              00010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 PSHUFB_64       SSSE3   00001111 00111000 00000000 !emit { modrm(); mem(size => 8); }
 PSHUFB          SSSE3   00001111 00111000 00000000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSHUFB         AVX                       00000000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
 PSHUFW          SSE     00001111 01110000 !emit { modrm(); mem(size => 8); imm(size => 1); }
 PSHUFLW         SSE2    00001111 01110000 !emit { repne(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VPSHUFLW        AVX              01110000 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
 PSHUFHW         SSE2    00001111 01110000 !emit { rep(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VPSHUFHW        AVX              01110000 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
 PSHUFD          SSE2    00001111 01110000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VPSHUFD         AVX              01110000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
 
 SHUFPS          SSE     00001111 11000110 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VSHUFPS         AVX              11000110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
 SHUFPD          SSE2    00001111 11000110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VSHUFPD         AVX              11000110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
 
 BLENDPS         SSE4_1  00001111 00111010 00001100 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VBLENDPS        AVX                       00001100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
 BLENDPD         SSE4_1  00001111 00111010 00001101 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VBLENDPD        AVX                       00001101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
 BLENDVPS        SSE4_1  00001111 00111000 00010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VBLENDVPS       AVX                       01001010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
 BLENDVPD        SSE4_1  00001111 00111000 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VBLENDVPD       AVX                       01001011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
 PBLENDVB        SSE4_1  00001111 00111000 00010000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPBLENDVB       AVX                       01001100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
 PBLENDW         SSE4_1  00001111 00111010 00001110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VPBLENDW        AVX                       00001110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
 
 INSERTPS        SSE4_1  00001111 00111010 00100001 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
+VINSERTPS       AVX                       00100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 4); imm(size => 1); }
 PINSRB          SSE4_1  00001111 00111010 00100000 !emit { data16(); modrm(); mem(size => 1); imm(size => 1); }
+VPINSRB         AVX                       00100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 1); imm(size => 1); }
 PINSRW          SSE     00001111 11000100 !emit { modrm(); mem(size => 2); imm(size => 1); }
 PINSRW          SSE2    00001111 11000100 !emit { data16(); modrm(); mem(size => 2); imm(size => 1); }
+VPINSRW         AVX              11000100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, w => 0); modrm(); mem(size => 2); imm(size => 1); }
 PINSRD          SSE4_1  00001111 00111010 00100010 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
+VPINSRD         AVX                       00100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 4); imm(size => 1); }
 PINSRQ          SSE4_1  00001111 00111010 00100010 !emit { data16(); rex(w => 1); modrm(); mem(size => 8); imm(size => 1); }
+VPINSRQ         AVX                       00100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 1); modrm(); mem(size => 8); imm(size => 1); }
 
 EXTRACTPS       SSE4_1  00001111 00111010 00010111 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
 EXTRACTPS_mem   SSE4_1  00001111 00111010 00010111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
+VEXTRACTPS      AVX                       00010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+VEXTRACTPS_mem  AVX                       00010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
 
 PEXTRB          SSE4_1  00001111 00111010 00010100 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
 PEXTRB_mem      SSE4_1  00001111 00111010 00010100 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 1); imm(size => 1); }
+VPEXTRB         AVX                       00010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+VPEXTRB_mem     AVX                       00010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 1); imm(size => 1); }
 PEXTRW          SSE4_1  00001111 00111010 00010101 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
 PEXTRW_mem      SSE4_1  00001111 00111010 00010101 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 2); imm(size => 1); }
+VPEXTRW         AVX                       00010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+VPEXTRW_mem     AVX                       00010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 2); imm(size => 1); }
 PEXTRD          SSE4_1  00001111 00111010 00010110 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
 PEXTRD_mem      SSE4_1  00001111 00111010 00010110 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
+VPEXTRD         AVX                       00010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+VPEXTRD_mem     AVX                       00010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
 PEXTRQ          SSE4_1  00001111 00111010 00010110 !emit { data16(); rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
 PEXTRQ_mem      SSE4_1  00001111 00111010 00010110 !emit { data16(); rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); imm(size => 1); }
+VPEXTRQ         AVX                       00010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 1, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+VPEXTRQ_mem     AVX                       00010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 1, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); imm(size => 1); }
 
 PEXTRW_reg      SSE     00001111 11000101 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
 PEXTRW_reg      SSE2    00001111 11000101 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
+VPEXTRW_reg     AVX              11000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, w => 0, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
+
+VPERMILPS       AVX                       00001100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPERMILPS_imm   AVX                       00000100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VPERMILPD       AVX                       00001101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPERMILPD_imm   AVX                       00000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
 
 # Conversion Instructions
 PMOVSXBW        SSE4_1  00001111 00111000 00100000 !emit { data16(); modrm(); mem(size => 8); }
+VPMOVSXBW       AVX                       00100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 PMOVSXBD        SSE4_1  00001111 00111000 00100001 !emit { data16(); modrm(); mem(size => 4); }
+VPMOVSXBD       AVX                       00100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
 PMOVSXBQ        SSE4_1  00001111 00111000 00100010 !emit { data16(); modrm(); mem(size => 2); }
+VPMOVSXBQ       AVX                       00100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 2); }
 PMOVSXWD        SSE4_1  00001111 00111000 00100011 !emit { data16(); modrm(); mem(size => 8); }
+VPMOVSXWD       AVX                       00100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 PMOVSXWQ        SSE4_1  00001111 00111000 00100100 !emit { data16(); modrm(); mem(size => 4); }
+VPMOVSXWQ       AVX                       00100100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
 PMOVSXDQ        SSE4_1  00001111 00111000 00100101 !emit { data16(); modrm(); mem(size => 8); }
+VPMOVSXDQ       AVX                       00100101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 
 PMOVZXBW        SSE4_1  00001111 00111000 00110000 !emit { data16(); modrm(); mem(size => 8); }
+VPMOVZXBW       AVX                       00110000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 PMOVZXBD        SSE4_1  00001111 00111000 00110001 !emit { data16(); modrm(); mem(size => 4); }
+VPMOVZXBD       AVX                       00110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
 PMOVZXBQ        SSE4_1  00001111 00111000 00110010 !emit { data16(); modrm(); mem(size => 2); }
+VPMOVZXBQ       AVX                       00110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 2); }
 PMOVZXWD        SSE4_1  00001111 00111000 00110011 !emit { data16(); modrm(); mem(size => 8); }
+VPMOVZXWD       AVX                       00110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 PMOVZXWQ        SSE4_1  00001111 00111000 00110100 !emit { data16(); modrm(); mem(size => 4); }
+VPMOVZXWQ       AVX                       00110100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
 PMOVZXDQ        SSE4_1  00001111 00111000 00110101 !emit { data16(); modrm(); mem(size => 8); }
+VPMOVZXDQ       AVX                       00110101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 
 CVTPI2PS        SSE     00001111 00101010 !emit { modrm(); mem(size => 8); }
 CVTSI2SS        SSE     00001111 00101010 !emit { rep(); modrm(); mem(size => 4); }
 CVTSI2SS_64     SSE     00001111 00101010 !emit { rep(); rex(w => 1); modrm(); mem(size => 8); }
+VCVTSI2SS       AVX              00101010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F, w => 0); modrm(); mem(size => 4); }
+VCVTSI2SS_64    AVX              00101010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F, w => 1); modrm(); mem(size => 8); }
 CVTPI2PD        SSE2    00001111 00101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 CVTSI2SD        SSE2    00001111 00101010 !emit { repne(); modrm(); mem(size => 4); }
 CVTSI2SD_64     SSE2    00001111 00101010 !emit { repne(); rex(w => 1); modrm(); mem(size => 8); }
+VCVTSI2SD       AVX              00101010 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F, w => 0); modrm(); mem(size => 4); }
+VCVTSI2SD_64    AVX              00101010 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F, w => 1); modrm(); mem(size => 8); }
 
 CVTPS2PI        SSE     00001111 00101101 !emit { modrm(); mem(size => 8); }
 CVTSS2SI        SSE     00001111 00101101 !emit { rep(); modrm(reg => ~REG_ESP); mem(size => 4); }
 CVTSS2SI_64     SSE     00001111 00101101 !emit { rep(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 4); }
+VCVTSS2SI       AVX              00101101 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F, w => 0, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 4); }
+VCVTSS2SI_64    AVX              00101101 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F, w => 1, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 4); }
 CVTPD2PI        SSE2    00001111 00101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 CVTSD2SI        SSE2    00001111 00101101 !emit { repne(); modrm(reg => ~REG_ESP); mem(size => 8); }
 CVTSD2SI_64     SSE2    00001111 00101101 !emit { repne(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 8); }
+VCVTSD2SI       AVX              00101101 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F, w => 0, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 8); }
+VCVTSD2SI_64    AVX              00101101 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F, w => 1, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 8); }
 
 CVTTPS2PI       SSE     00001111 00101100 !emit { modrm(); mem(size => 8); }
 CVTTSS2SI       SSE     00001111 00101100 !emit { rep(); modrm(reg => ~REG_ESP); mem(size => 4); }
 CVTTSS2SI_64    SSE     00001111 00101100 !emit { rep(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 4); }
+VCVTTSS2SI      AVX              00101100 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F, w => 0, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 4); }
+VCVTTSS2SI_64   AVX              00101100 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F, w => 1, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 4); }
 CVTTPD2PI       SSE2    00001111 00101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 CVTTSD2SI       SSE2    00001111 00101100 !emit { repne(); modrm(reg => ~REG_ESP); mem(size => 8); }
 CVTTSD2SI_64    SSE2    00001111 00101100 !emit { repne(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 8); }
+VCVTTSD2SI      AVX              00101100 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F, w => 0, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 8); }
+VCVTTSD2SI_64   AVX              00101100 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F, w => 1, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 8); }
 
 CVTPD2DQ        SSE2    00001111 11100110 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+VCVTPD2DQ       AVX              11100110 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 CVTTPD2DQ       SSE2    00001111 11100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VCVTTPD2DQ      AVX              11100110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 CVTDQ2PD        SSE2    00001111 11100110 !emit { rep(); modrm(); mem(size => 8); }
+VCVTDQ2PD       AVX              11100110 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 
 CVTPS2PD        SSE2    00001111 01011010 !emit { modrm(); mem(size => 8); }
+VCVTPS2PD       AVX              01011010 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 CVTPD2PS        SSE2    00001111 01011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VCVTPD2PS       AVX              01011010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 CVTSS2SD        SSE2    00001111 01011010 !emit { rep(); modrm(); mem(size => 4); }
+VCVTSS2SD       AVX              01011010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 CVTSD2SS        SSE2    00001111 01011010 !emit { repne(); modrm(); mem(size => 8); }
+VCVTSD2SS       AVX              01011010 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
 
 CVTDQ2PS        SSE2    00001111 01011011 !emit { modrm(); mem(size => 16, align => 16); }
+VCVTDQ2PS       AVX              01011011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 CVTPS2DQ        SSE2    00001111 01011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VCVTPS2DQ       AVX              01011011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 CVTTPS2DQ       SSE2    00001111 01011011 !emit { rep(); modrm(); mem(size => 16, align => 16); }
+VCVTTPS2DQ      AVX              01011011 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 
 # Cacheability Control, Prefetch, and Instruction Ordering Instructions
 MASKMOVQ        SSE     00001111 11110111 !emit { modrm(mod => MOD_DIRECT); mem(size => 8, base => REG_EDI); }
 MASKMOVDQU      SSE2    00001111 11110111 !emit { data16(); modrm(mod => MOD_DIRECT); mem(size => 16, base => REG_EDI); }
+VMASKMOVDQU     AVX              11110111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT); mem(size => 16, base => REG_EDI); }
+
+VMASKMOVPS      AVX                       001011 d 0 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VMASKMOVPD      AVX                       001011 d 1 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
 
 MOVNTPS         SSE     00001111 00101011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTPS        AVX              00101011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
 MOVNTPD         SSE2    00001111 00101011 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTPD        AVX              00101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
 
 MOVNTI          SSE2    00001111 11000011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 4); }
 MOVNTI_64       SSE2    00001111 11000011 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVNTQ          SSE     00001111 11100111 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVNTDQ         SSE2    00001111 11100111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTDQ        AVX              11100111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
 MOVNTDQA        SSE4_1  00001111 00111000 00101010 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTDQA       AVX                       00101010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
 
 PREFETCHT0      SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 1); mem(size => 1); }
 PREFETCHT1      SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 1); }
@@ -476,6 +760,10 @@ PAUSE           SSE2    10010000          !emit { rep(); }
 
 # State Management Instructions
 EMMS            MMX     00001111 01110111 !emit { }
+VZEROUPPER      AVX              01110111 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); }
+VZEROALL        AVX              01110111 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); }
 
 # LDMXCSR         SSE     00001111 10101110 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 4); }
+# VLDMXCSR        AVX              10101110 !emit { vex(l => 0, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 4); }
 STMXCSR         SSE     00001111 10101110 !emit { modrm(mod => ~MOD_DIRECT, reg => 3); mem(size => 4); }
+VSTMXCSR        AVX              10101110 !emit { vex(l => 0, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT, reg => 3); mem(size => 4); }
-- 
2.20.1




* [Qemu-devel] [RISU RFC PATCH v2 14/14] x86.risu: add AVX2 instructions
From: Jan Bobek @ 2019-07-01  4:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson

Add AVX2 instructions to the configuration file.

Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
 x86.risu | 257 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 257 insertions(+)
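
Editorial note (not part of the original patch): the added lines mostly
switch vex(l => VEX_L_128, ...) to vex(l => VEX_L_256, ...) and double the
memory operand size. The sketch below shows how those fields map onto VEX
prefix bytes according to the Intel SDM layout. The helper vex_prefix and
the numeric constant values are assumptions made for illustration; they
are not taken from risugen_x86_asm and may differ from it.

```python
# Hypothetical helper, not risugen code: builds a VEX prefix from the
# fields used in x86.risu (l, p, m, v). Constant values follow the Intel
# SDM field encodings and are assumed, not copied from risugen_x86_asm.
VEX_L_128, VEX_L_256 = 0, 1
VEX_P_NONE, VEX_P_DATA16, VEX_P_REP, VEX_P_REPNE = 0, 1, 2, 3
VEX_M_0F, VEX_M_0F38, VEX_M_0F3A = 1, 2, 3


def vex_prefix(l, p=VEX_P_NONE, m=VEX_M_0F, v=0, r=0, x=0, b=0, w=0):
    """Return the VEX prefix bytes.

    v, r, x and b are given as plain register bits; the prefix stores
    them inverted. v=0 therefore yields vvvv=1111, the "unused" value.
    """
    def inv(bit):
        return (~bit) & 1

    vvvv = (~v) & 0xF
    low = (vvvv << 3) | (l << 2) | p
    if m == VEX_M_0F and not (x or b or w):
        # Two-byte form: C5 [R vvvv L pp]
        return bytes([0xC5, (inv(r) << 7) | low])
    # Three-byte form: C4 [R X B mmmmm] [W vvvv L pp]
    return bytes([0xC4,
                  (inv(r) << 7) | (inv(x) << 6) | (inv(b) << 5) | m,
                  (w << 7) | low])


# VPADDB ymm0, ymm1, ymm2: prefix, opcode 0xFC, ModRM 0xC2
encoding = vex_prefix(VEX_L_256, p=VEX_P_DATA16, v=1) + bytes([0xFC, 0xC2])
print(encoding.hex())
```

With l => VEX_L_128 instead, only the L bit of the last prefix byte
changes, which is why each AVX2 line differs from its AVX neighbour only
in the vex() length argument and the mem() size.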

diff --git a/x86.risu b/x86.risu
index d3115ac..74c4ce8 100644
--- a/x86.risu
+++ b/x86.risu
@@ -33,16 +33,22 @@ VMOVQ_xmm2      AVX              11010110 !emit { vex(l => VEX_L_128, p => VEX_P
 
 MOVAPS          SSE     00001111 0010100 d !emit { modrm(); mem(size => 16, align => 16); }
 VMOVAPS         AVX              0010100 d !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16, align => 16); }
+VMOVAPS         AVX2             0010100 d !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32, align => 32); }
 MOVAPD          SSE2    00001111 0010100 d !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VMOVAPD         AVX              0010100 d !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16, align => 16); }
+VMOVAPD         AVX2             0010100 d !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32, align => 32); }
 MOVDQA          SSE2    00001111 011 d 1111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VMOVDQA         AVX              011 d 1111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16, align => 16); }
+VMOVDQA         AVX2             011 d 1111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32, align => 32); }
 MOVUPS          SSE     00001111 0001000 d !emit { modrm(); mem(size => 16); }
 VMOVUPS         AVX              0001000 d !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VMOVUPS         AVX2             0001000 d !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 MOVUPD          SSE2    00001111 0001000 d !emit { data16(); modrm(); mem(size => 16); }
 VMOVUPD         AVX              0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VMOVUPD         AVX2             0001000 d !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 MOVDQU          SSE2    00001111 011 d 1111 !emit { rep(); modrm(); mem(size => 16); }
 VMOVDQU         AVX              011 d 1111 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VMOVDQU         AVX2             011 d 1111 !emit { vex(l => VEX_L_256, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 MOVSS           SSE     00001111 0001000 d !emit { rep(); modrm(); mem(size => 4); }
 VMOVSS          AVX              0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F); modrm(mod => MOD_DIRECT); }
 VMOVSS_mem      AVX              0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 4); }
@@ -73,50 +79,67 @@ VMOVHLPS        AVX              00010010  !emit { vex(l => VEX_L_128, m => VEX_
 PMOVMSKB        SSE     00001111 11010111 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 PMOVMSKB        SSE2    00001111 11010111 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 VPMOVMSKB       AVX              11010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+VPMOVMSKB       AVX2             11010111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 MOVMSKPS        SSE     00001111 01010000 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 VMOVMSKPS       AVX              01010000 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+VMOVMSKPS       AVX2             01010000 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 MOVMSKPD        SSE2    00001111 01010000 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 VMOVMSKPD       AVX              01010000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+VMOVMSKPD       AVX2             01010000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
 
 LDDQU           SSE3    00001111 11110000 !emit { repne(); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
 VLDDQU          AVX              11110000 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VLDDQU          AVX2             11110000 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 32); }
 MOVSHDUP        SSE3    00001111 00010110 !emit { rep(); modrm(); mem(size => 16, align => 16); }
 VMOVSHDUP       AVX              00010110 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VMOVSHDUP       AVX2             00010110 !emit { vex(l => VEX_L_256, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 MOVSLDUP        SSE3    00001111 00010010 !emit { rep(); modrm(); mem(size => 16, align => 16); }
 VMOVSLDUP       AVX              00010010 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VMOVSLDUP       AVX2             00010010 !emit { vex(l => VEX_L_256, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 MOVDDUP         SSE3    00001111 00010010 !emit { repne(); modrm(); mem(size => 8); }
 VMOVDDUP        AVX              00010010 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VMOVDDUP        AVX2             00010010 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 
 # Arithmetic Instructions
 PADDB           MMX     00001111 11111100 !emit { modrm(); mem(size => 8); }
 PADDB           SSE2    00001111 11111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPADDB          AVX              11111100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDB          AVX2             11111100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PADDW           MMX     00001111 11111101 !emit { modrm(); mem(size => 8); }
 PADDW           SSE2    00001111 11111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPADDW          AVX              11111101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDW          AVX2             11111101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PADDD           MMX     00001111 11111110 !emit { modrm(); mem(size => 8); }
 PADDD           SSE2    00001111 11111110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPADDD          AVX              11111110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDD          AVX2             11111110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PADDQ           MMX     00001111 11010100 !emit { modrm(); mem(size => 8); }
 PADDQ           SSE2    00001111 11010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPADDQ          AVX              11010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDQ          AVX2             11010100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PADDSB          MMX     00001111 11101100 !emit { modrm(); mem(size => 8); }
 PADDSB          SSE2    00001111 11101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPADDSB         AVX              11101100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDSB         AVX2             11101100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PADDSW          MMX     00001111 11101101 !emit { modrm(); mem(size => 8); }
 PADDSW          SSE2    00001111 11101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPADDSW         AVX              11101101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDSW         AVX2             11101101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PADDUSB         MMX     00001111 11011100 !emit { modrm(); mem(size => 8); }
 PADDUSB         SSE2    00001111 11011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPADDUSB        AVX              11011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDUSB        AVX2             11011100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PADDUSW         MMX     00001111 11011101 !emit { modrm(); mem(size => 8); }
 PADDUSW         SSE2    00001111 11011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPADDUSW        AVX              11011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDUSW        AVX2             11011101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 ADDPS           SSE     00001111 01011000 !emit { modrm(); mem(size => 16, align => 16); }
 VADDPS          AVX              01011000 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VADDPS          AVX2             01011000 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
 ADDPD           SSE2    00001111 01011000 !emit { data16(); modrm(); mem(size => 16, align => 16) }
 VADDPD          AVX              01011000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VADDPD          AVX2             01011000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 ADDSS           SSE     00001111 01011000 !emit { rep(); modrm(); mem(size => 4); }
 VADDSS          AVX              01011000 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 ADDSD           SSE2    00001111 01011000 !emit { repne(); modrm(); mem(size => 8); }
@@ -125,47 +148,62 @@ VADDSD          AVX              01011000 !emit { vex(l => 0, p => VEX_P_REPNE,
 PHADDW_64       SSSE3   00001111 00111000 00000001 !emit { modrm(); mem(size => 8); }
 PHADDW          SSSE3   00001111 00111000 00000001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPHADDW         AVX                       00000001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPHADDW         AVX2                      00000001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PHADDD_64       SSSE3   00001111 00111000 00000010 !emit { modrm(); mem(size => 8); }
 PHADDD          SSSE3   00001111 00111000 00000010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPHADDD         AVX                       00000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPHADDD         AVX2                      00000010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PHADDSW_64      SSSE3   00001111 00111000 00000011 !emit { modrm(); mem(size => 8); }
 PHADDSW         SSSE3   00001111 00111000 00000011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPHADDSW        AVX                       00000011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPHADDSW        AVX2                      00000011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 
 HADDPS          SSE3    00001111 01111100 !emit { repne(); modrm(); mem(size => 16, align => 16); }
 VHADDPS         AVX              01111100 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 16); }
+VHADDPS         AVX2             01111100 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 32); }
 HADDPD          SSE3    00001111 01111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VHADDPD         AVX              01111100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VHADDPD         AVX2             01111100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 PSUBB           MMX     00001111 11111000 !emit { modrm(); mem(size => 8); }
 PSUBB           SSE2    00001111 11111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSUBB          AVX              11111000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBB          AVX2             11111000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PSUBW           MMX     00001111 11111001 !emit { modrm(); mem(size => 8); }
 PSUBW           SSE2    00001111 11111001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSUBW          AVX              11111001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBW          AVX2             11111001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PSUBD           MMX     00001111 11111010 !emit { modrm(); mem(size => 8); }
 PSUBD           SSE2    00001111 11111010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSUBD          AVX              11111010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBD          AVX2             11111010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PSUBQ_64        SSE2    00001111 11111011 !emit { modrm(); mem(size => 8); }
 PSUBQ           SSE2    00001111 11111011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSUBQ          AVX              11111011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBQ          AVX2             11111011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PSUBSB          MMX     00001111 11101000 !emit { modrm(); mem(size => 8); }
 PSUBSB          SSE2    00001111 11101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSUBSB         AVX              11101000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBSB         AVX2             11101000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PSUBSW          MMX     00001111 11101001 !emit { modrm(); mem(size => 8); }
 PSUBSW          SSE2    00001111 11101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSUBSW         AVX              11101001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBSW         AVX2             11101001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PSUBUSB         MMX     00001111 11011000 !emit { modrm(); mem(size => 8); }
 PSUBUSB         SSE2    00001111 11011000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSUBUSB        AVX              11011000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBUSB        AVX2             11011000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PSUBUSW         MMX     00001111 11011001 !emit { modrm(); mem(size => 8); }
 PSUBUSW         SSE2    00001111 11011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSUBUSW        AVX              11011001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBUSW        AVX2             11011001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 SUBPS           SSE     00001111 01011100 !emit { modrm(); mem(size => 16, align => 16); }
 VSUBPS          AVX              01011100 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VSUBPS          AVX2             01011100 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
 SUBPD           SSE2    00001111 01011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VSUBPD          AVX              01011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VSUBPD          AVX2             01011100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 SUBSS           SSE     00001111 01011100 !emit { rep(); modrm(); mem(size => 4); }
 VSUBSS          AVX              01011100 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 SUBSD           SSE2    00001111 01011100 !emit { repne(); modrm(); mem(size => 8); }
@@ -174,48 +212,64 @@ VSUBSD          AVX              01011100 !emit { vex(l => 0, p => VEX_P_REPNE,
 PHSUBW_64       SSSE3   00001111 00111000 00000101 !emit { modrm(); mem(size => 8); }
 PHSUBW          SSSE3   00001111 00111000 00000101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPHSUBW         AVX                       00000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPHSUBW         AVX2                      00000101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PHSUBD_64       SSSE3   00001111 00111000 00000110 !emit { modrm(); mem(size => 8); }
 PHSUBD          SSSE3   00001111 00111000 00000110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPHSUBD         AVX                       00000110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPHSUBD         AVX2                      00000110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PHSUBSW_64      SSSE3   00001111 00111000 00000111 !emit { modrm(); mem(size => 8); }
 PHSUBSW         SSSE3   00001111 00111000 00000111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPHSUBSW        AVX                       00000111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPHSUBSW        AVX2                      00000111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 
 HSUBPS          SSE3    00001111 01111101 !emit { repne(); modrm(); mem(size => 16, align => 16); }
 VHSUBPS         AVX              01111101 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 16); }
+VHSUBPS         AVX2             01111101 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 32); }
 HSUBPD          SSE3    00001111 01111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VHSUBPD         AVX              01111101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VHSUBPD         AVX2             01111101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 ADDSUBPS        SSE3    00001111 11010000 !emit { repne(); modrm(); mem(size => 16, align => 16); }
 VADDSUBPS       AVX              11010000 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 16); }
+VADDSUBPS       AVX2             11010000 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 32); }
 ADDSUBPD        SSE3    00001111 11010000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VADDSUBPD       AVX              11010000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VADDSUBPD       AVX2             11010000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 PMULLW          MMX     00001111 11010101 !emit { modrm(); mem(size => 8); }
 PMULLW          SSE2    00001111 11010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMULLW         AVX              11010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMULLW         AVX2             11010101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PMULLD          SSE4_1  00001111 00111000 01000000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMULLD         AVX                       01000000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMULLD         AVX2                      01000000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PMULHW          MMX     00001111 11100101 !emit { modrm(); mem(size => 8); }
 PMULHW          SSE2    00001111 11100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMULHW         AVX              11100101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMULHW         AVX2             11100101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PMULHUW         SSE     00001111 11100100 !emit { modrm(); mem(size => 8); }
 PMULHUW         SSE2    00001111 11100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMULHUW        AVX              11100100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMULHUW        AVX2             11100100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PMULDQ          SSE4_1  00001111 00111000 00101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMULDQ         AVX                       00101000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMULDQ         AVX2                      00101000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PMULUDQ_64      SSE2    00001111 11110100 !emit { modrm(); mem(size => 8); }
 PMULUDQ         SSE2    00001111 11110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMULUDQ        AVX              11110100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMULUDQ        AVX2             11110100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 PMULHRSW_64     SSSE3   00001111 00111000 00001011 !emit { modrm(); mem(size => 8); }
 PMULHRSW        SSSE3   00001111 00111000 00001011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMULHRSW       AVX                       00001011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMULHRSW       AVX2                      00001011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 
 MULPS           SSE     00001111 01011001 !emit { modrm(); mem(size => 16, align => 16); }
 VMULPS          AVX              01011001 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VMULPS          AVX2             01011001 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
 MULPD           SSE2    00001111 01011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VMULPD          AVX              01011001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VMULPD          AVX2             01011001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 MULSS           SSE     00001111 01011001 !emit { rep(); modrm(); mem(size => 4); }
 VMULSS          AVX              01011001 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 MULSD           SSE2    00001111 01011001 !emit { repne(); modrm(); mem(size => 8); }
@@ -224,14 +278,18 @@ VMULSD          AVX              01011001 !emit { vex(l => VEX_L_128, p => VEX_P
 PMADDWD         MMX     00001111 11110101 !emit { modrm(); mem(size => 8); }
 PMADDWD         SSE2    00001111 11110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMADDWD        AVX              11110101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMADDWD        AVX2             11110101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PMADDUBSW_64    SSSE3   00001111 00111000 00000100 !emit { modrm(); mem(size => 8); }
 PMADDUBSW       SSSE3   00001111 00111000 00000100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMADDUBSW      AVX                       00000100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMADDUBSW      AVX2                      00000100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 
 DIVPS           SSE     00001111 01011110 !emit { modrm(); mem(size => 16, align => 16); }
 VDIVPS          AVX              01011110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VDIVPS          AVX2             01011110 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
 DIVPD           SSE2    00001111 01011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VDIVPD          AVX              01011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VDIVPD          AVX2             01011110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 DIVSS           SSE     00001111 01011110 !emit { rep(); modrm(); mem(size => 4); }
 VDIVSS          AVX              01011110 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 DIVSD           SSE2    00001111 01011110 !emit { repne(); modrm(); mem(size => 8); }
@@ -239,13 +297,16 @@ VDIVSD          AVX              01011110 !emit { vex(l => 0, p => VEX_P_REPNE,
 
 RCPPS           SSE     00001111 01010011 !emit { modrm(); mem(size => 16, align => 16); }
 VRCPPS          AVX              01010011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VRCPPS          AVX2             01010011 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 RCPSS           SSE     00001111 01010011 !emit { rep(); modrm(); mem(size => 4); }
 VRCPSS          AVX              01010011 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 
 SQRTPS          SSE     00001111 01010001 !emit { modrm(); mem(size => 16, align => 16); }
 VSQRTPS         AVX              01010001 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VSQRTPS         AVX2             01010001 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 SQRTPD          SSE2    00001111 01010001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VSQRTPD         AVX              01010001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VSQRTPD         AVX2             01010001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 SQRTSS          SSE     00001111 01010001 !emit { rep(); modrm(); mem(size => 4); }
 VSQRTSS         AVX              01010001 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 SQRTSD          SSE2    00001111 01010001 !emit { repne(); modrm(); mem(size => 8); }
@@ -253,28 +314,37 @@ VSQRTSD         AVX              01010001 !emit { vex(l => 0, p => VEX_P_REPNE,
 
 RSQRTPS         SSE     00001111 01010010 !emit { modrm(); mem(size => 16, align => 16); }
 VRSQRTPS        AVX              01010010 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VRSQRTPS        AVX2             01010010 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 RSQRTSS         SSE     00001111 01010010 !emit { rep(); modrm(); mem(size => 4); }
 VRSQRTSS        AVX              01010010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 
 PMINUB          SSE     00001111 11011010 !emit { modrm(); mem(size => 8); }
 PMINUB          SSE2    00001111 11011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMINUB         AVX              11011010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMINUB         AVX2             11011010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PMINUW          SSE4_1  00001111 00111000 00111010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMINUW         AVX                       00111010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMINUW         AVX2                      00111010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PMINUD          SSE4_1  00001111 00111000 00111011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMINUD         AVX                       00111011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMINUD         AVX2                      00111011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PMINSB          SSE4_1  00001111 00111000 00111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMINSB         AVX                       00111000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMINSB         AVX2                      00111000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PMINSW          SSE     00001111 11101010 !emit { modrm(); mem(size => 8); }
 PMINSW          SSE2    00001111 11101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMINSW         AVX              11101010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMINSW         AVX2             11101010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PMINSD          SSE4_1  00001111 00111000 00111001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMINSD         AVX                       00111001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMINSD         AVX2                      00111001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 
 MINPS           SSE     00001111 01011101 !emit { modrm(); mem(size => 16, align => 16); }
 VMINPS          AVX              01011101 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VMINPS          AVX2             01011101 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
 MINPD           SSE2    00001111 01011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VMINPD          AVX              01011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VMINPD          AVX2             01011101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 MINSS           SSE     00001111 01011101 !emit { rep(); modrm(); mem(size => 4); }
 VMINSS          AVX              01011101 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 MINSD           SSE2    00001111 01011101 !emit { repne(); modrm(); mem(size => 8); }
@@ -286,22 +356,30 @@ VPHMINPOSUW     AVX                       01000001 !emit { vex(l => VEX_L_128, p
 PMAXUB          SSE     00001111 11011110 !emit { modrm(); mem(size => 8); }
 PMAXUB          SSE2    00001111 11011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMAXUB         AVX              11011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMAXUB         AVX2             11011110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PMAXUW          SSE4_1  00001111 00111000 00111110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMAXUW         AVX                       00111110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMAXUW         AVX2                      00111110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PMAXUD          SSE4_1  00001111 00111000 00111111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMAXUD         AVX                       00111111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMAXUD         AVX2                      00111111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PMAXSB          SSE4_1  00001111 00111000 00111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMAXSB         AVX                       00111100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMAXSB         AVX2                      00111100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PMAXSW          SSE     00001111 11101110 !emit { modrm(); mem(size => 8); }
 PMAXSW          SSE2    00001111 11101110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMAXSW         AVX              11101110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMAXSW         AVX2             11101110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PMAXSD          SSE4_1  00001111 00111000 00111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPMAXSD         AVX                       00111101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMAXSD         AVX2                      00111101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 
 MAXPS           SSE     00001111 01011111 !emit { modrm(); mem(size => 16, align => 16); }
 VMAXPS          AVX              01011111 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VMAXPS          AVX2             01011111 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
 MAXPD           SSE2    00001111 01011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VMAXPD          AVX              01011111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VMAXPD          AVX2             01011111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 MAXSS           SSE     00001111 01011111 !emit { rep(); modrm(); mem(size => 4); }
 VMAXSS          AVX              01011111 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 MAXSD           SSE2    00001111 01011111 !emit { repne(); modrm(); mem(size => 8); }
@@ -310,45 +388,58 @@ VMAXSD          AVX              01011111 !emit { vex(l => 0, p => VEX_P_REPNE,
 PAVGB           SSE     00001111 11100000 !emit { modrm(); mem(size => 8); }
 PAVGB           SSE2    00001111 11100000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPAVGB          AVX              11100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPAVGB          AVX2             11100000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PAVGW           SSE     00001111 11100011 !emit { modrm(); mem(size => 8); }
 PAVGW           SSE2    00001111 11100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPAVGW          AVX              11100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPAVGW          AVX2             11100011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 PSADBW          SSE     00001111 11110110 !emit { modrm(); mem(size => 8); }
 PSADBW          SSE2    00001111 11110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSADBW         AVX              11110110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSADBW         AVX2             11110110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 MPSADBW         SSE4_1  00001111 00111010 01000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VMPSADBW        AVX                       01000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
+VMPSADBW        AVX2                      01000010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 32); imm(size => 1); }
 
 PABSB_64        SSSE3   00001111 00111000 00011100 !emit { modrm(); mem(size => 8); }
 PABSB           SSSE3   00001111 00111000 00011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPABSB          AVX                       00011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VPABSB          AVX2                      00011100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 PABSW_64        SSSE3   00001111 00111000 00011101 !emit { modrm(); mem(size => 8); }
 PABSW           SSSE3   00001111 00111000 00011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPABSW          AVX                       00011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VPABSW          AVX2                      00011101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 PABSD_64        SSSE3   00001111 00111000 00011110 !emit { modrm(); mem(size => 8); }
 PABSD           SSSE3   00001111 00111000 00011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPABSD          AVX                       00011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VPABSD          AVX2                      00011110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 
 PSIGNB_64       SSSE3   00001111 00111000 00001000 !emit { modrm(); mem(size => 8); }
 PSIGNB          SSSE3   00001111 00111000 00001000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSIGNB         AVX                       00001000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPSIGNB         AVX2                      00001000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PSIGNW_64       SSSE3   00001111 00111000 00001001 !emit { modrm(); mem(size => 8); }
 PSIGNW          SSSE3   00001111 00111000 00001001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSIGNW         AVX                       00001001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPSIGNW         AVX2                      00001001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PSIGND_64       SSSE3   00001111 00111000 00001010 !emit { modrm(); mem(size => 8); }
 PSIGND          SSSE3   00001111 00111000 00001010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSIGND         AVX                       00001010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPSIGND         AVX2                      00001010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 
 DPPS            SSE4_1  00001111 00111010 01000000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VDPPS           AVX                       01000000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
+VDPPS           AVX2                      01000000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 32); imm(size => 1); }
 DPPD            SSE4_1  00001111 00111010 01000001 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VDPPD           AVX                       01000001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
 
 ROUNDPS         SSE4_1  00001111 00111010 00001000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VROUNDPS        AVX                       00001000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VROUNDPS        AVX2                      00001000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
 ROUNDPD         SSE4_1  00001111 00111010 00001001 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VROUNDPD        AVX                       00001001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VROUNDPD        AVX2                      00001001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
 ROUNDSS         SSE4_1  00001111 00111010 00001010 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
 VROUNDSS        AVX                       00001010 !emit { vex(l => 0, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 4); imm(size => 1); }
 ROUNDSD         SSE4_1  00001111 00111010 00001011 !emit { data16(); modrm(); mem(size => 8); imm(size => 1); }
@@ -376,25 +467,33 @@ VPCLMULQDQ      PCLMULQDQ_AVX                    01000100 !emit { vex(l => VEX_L
 PCMPEQB         MMX     00001111 01110100 !emit { modrm(); mem(size => 8); }
 PCMPEQB         SSE2    00001111 01110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPCMPEQB        AVX              01110100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPCMPEQB        AVX2             01110100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PCMPEQW         MMX     00001111 01110101 !emit { modrm(); mem(size => 8); }
 PCMPEQW         SSE2    00001111 01110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPCMPEQW        AVX              01110101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPCMPEQW        AVX2             01110101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PCMPEQD         MMX     00001111 01110110 !emit { modrm(); mem(size => 8); }
 PCMPEQD         SSE2    00001111 01110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPCMPEQD        AVX              01110110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPCMPEQD        AVX2             01110110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PCMPEQQ         SSE4_1  00001111 00111000 00101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPCMPEQQ        AVX                       00101001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPCMPEQQ        AVX2                      00101001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PCMPGTB         MMX     00001111 01100100 !emit { modrm(); mem(size => 8); }
 PCMPGTB         SSE2    00001111 01100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPCMPGTB        AVX              01100100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPCMPGTB        AVX2             01100100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PCMPGTW         MMX     00001111 01100101 !emit { modrm(); mem(size => 8); }
 PCMPGTW         SSE2    00001111 01100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPCMPGTW        AVX              01100101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPCMPGTW        AVX2             01100101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PCMPGTD         MMX     00001111 01100110 !emit { modrm(); mem(size => 8); }
 PCMPGTD         SSE2    00001111 01100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPCMPGTD        AVX              01100110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPCMPGTD        AVX2             01100110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PCMPGTQ         SSE4_2  00001111 00111000 00110111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPCMPGTQ        AVX                       00110111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPCMPGTQ        AVX2                      00110111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 
 PCMPESTRM       SSE4_2  00001111 00111010 01100000 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
 VPCMPESTRM      AVX                       01100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
@@ -407,14 +506,19 @@ VPCMPISTRI      AVX                       01100011 !emit { vex(l => VEX_L_128, p
 
 PTEST           SSE4_1  00001111 00111000 00010111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPTEST          AVX                       00010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VPTEST          AVX2                      00010111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 
 VTESTPS         AVX     00001110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VTESTPS         AVX2    00001110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 VTESTPD         AVX     00001111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VTESTPD         AVX2    00001111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 
 CMPPS           SSE     00001111 11000010 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VCMPPS          AVX              11000010 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
+VCMPPS          AVX2             11000010 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); imm(size => 1); }
 CMPPD           SSE2    00001111 11000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VCMPPD          AVX              11000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
+VCMPPD          AVX2             11000010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); imm(size => 1); }
 CMPSS           SSE     00001111 11000010 !emit { rep(); modrm(); mem(size => 4); imm(size => 1); }
 VCMPSS          AVX              11000010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); imm(size => 1); }
 CMPSD           SSE2    00001111 11000010 !emit { repne(); modrm(); mem(size => 8); imm(size => 1); }
@@ -434,172 +538,246 @@ VCOMISD         AVX              00101111 !emit { vex(l => 0, p => VEX_P_DATA16,
 PAND            MMX     00001111 11011011 !emit { modrm(); mem(size => 8); }
 PAND            SSE2    00001111 11011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPAND           AVX              11011011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPAND           AVX2             11011011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 ANDPS           SSE     00001111 01010100 !emit { modrm(); mem(size => 16, align => 16); }
 VANDPS          AVX              01010100 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VANDPS          AVX2             01010100 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
 ANDPD           SSE2    00001111 01010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VANDPD          AVX              01010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VANDPD          AVX2             01010100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 PANDN           MMX     00001111 11011111 !emit { modrm(); mem(size => 8); }
 PANDN           SSE2    00001111 11011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPANDN          AVX              11011111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPANDN          AVX2             11011111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 ANDNPS          SSE     00001111 01010101 !emit { modrm(); mem(size => 16, align => 16); }
 VANDNPS         AVX              01010101 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VANDNPS         AVX2             01010101 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
 ANDNPD          SSE2    00001111 01010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VANDNPD         AVX              01010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VANDNPD         AVX2             01010101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 POR             MMX     00001111 11101011 !emit { modrm(); mem(size => 8); }
 POR             SSE2    00001111 11101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPOR            AVX              11101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPOR            AVX2             11101011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 ORPS            SSE     00001111 01010110 !emit { modrm(); mem(size => 16, align => 16); }
 VORPS           AVX              01010110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VORPS           AVX2             01010110 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
 ORPD            SSE2    00001111 01010110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VORPD           AVX              01010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VORPD           AVX2             01010110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 PXOR            MMX     00001111 11101111 !emit { modrm(); mem(size => 8); }
 PXOR            SSE2    00001111 11101111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPXOR           AVX              11101111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPXOR           AVX2             11101111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 XORPS           SSE     00001111 01010111 !emit { modrm(); mem(size => 16, align => 16); }
 VXORPS          AVX              01010111 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VXORPS          AVX2             01010111 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
 XORPD           SSE2    00001111 01010111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VXORPD          AVX              01010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VXORPD          AVX2             01010111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 # Shift and Rotate Instructions
 PSLLW           MMX     00001111 11110001 !emit { modrm(); mem(size => 8); }
 PSLLW           SSE2    00001111 11110001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSLLW          AVX              11110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSLLW          AVX2             11110001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSLLD           MMX     00001111 11110010 !emit { modrm(); mem(size => 8); }
 PSLLD           SSE2    00001111 11110010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSLLD          AVX              11110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSLLD          AVX2             11110010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSLLQ           MMX     00001111 11110011 !emit { modrm(); mem(size => 8); }
 PSLLQ           SSE2    00001111 11110011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSLLQ          AVX              11110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSLLQ          AVX2             11110011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSLLDQ          SSE2    00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 7); imm(size => 1); }
 VPSLLDQ         AVX              01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 7); imm(size => 1); }
+VPSLLDQ         AVX2             01110011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 7); imm(size => 1); }
 
 PSLLW_imm       MMX     00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 PSLLW_imm       SSE2    00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 VPSLLW_imm      AVX              01110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+VPSLLW_imm      AVX2             01110001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 PSLLD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 PSLLD_imm       SSE2    00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 VPSLLD_imm      AVX              01110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+VPSLLD_imm      AVX2             01110010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 PSLLQ_imm       MMX     00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 PSLLQ_imm       SSE2    00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
 VPSLLQ_imm      AVX              01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+VPSLLQ_imm      AVX2             01110011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+
+VPSLLVD_xmm     AVX2    01000111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPSLLVD         AVX2    01000111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
+VPSLLVQ_xmm     AVX2    01000111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(); mem(size => 16); }
+VPSLLVQ         AVX2    01000111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(); mem(size => 32); }
 
 PSRLW           MMX     00001111 11010001 !emit { modrm(); mem(size => 8); }
 PSRLW           SSE2    00001111 11010001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSRLW          AVX              11010001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSRLW          AVX2             11010001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSRLD           MMX     00001111 11010010 !emit { modrm(); mem(size => 8); }
 PSRLD           SSE2    00001111 11010010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSRLD          AVX              11010010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSRLD          AVX2             11010010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSRLQ           MMX     00001111 11010011 !emit { modrm(); mem(size => 8); }
 PSRLQ           SSE2    00001111 11010011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSRLQ          AVX              11010011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSRLQ          AVX2             11010011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSRLDQ          SSE2    00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 3); imm(size => 1); }
 VPSRLDQ         AVX              01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 3); imm(size => 1); }
+VPSRLDQ         AVX2             01110011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 3); imm(size => 1); }
 
 PSRLW_imm       MMX     00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 PSRLW_imm       SSE2    00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 VPSRLW_imm      AVX              01110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+VPSRLW_imm      AVX2             01110001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 PSRLD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 PSRLD_imm       SSE2    00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 VPSRLD_imm      AVX              01110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+VPSRLD_imm      AVX2             01110010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 PSRLQ_imm       MMX     00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 PSRLQ_imm       SSE2    00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
 VPSRLQ_imm      AVX              01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+VPSRLQ_imm      AVX2             01110011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+
+VPSRLVD_xmm     AVX2    01000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPSRLVD         AVX2    01000101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
+VPSRLVQ_xmm     AVX2    01000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(); mem(size => 16); }
+VPSRLVQ         AVX2    01000101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(); mem(size => 32); }
 
 PSRAW           MMX     00001111 11100001 !emit { modrm(); mem(size => 8); }
 PSRAW           SSE2    00001111 11100001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSRAW          AVX              11100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSRAW          AVX2             11100001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 PSRAD           MMX     00001111 11100010 !emit { modrm(); mem(size => 8); }
 PSRAD           SSE2    00001111 11100010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSRAD          AVX              11100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSRAD          AVX2             11100010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
 
 PSRAW_imm       MMX     00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 PSRAW_imm       SSE2    00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 VPSRAW_imm      AVX              01110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+VPSRAW_imm      AVX2             01110001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 PSRAD_imm       MMX     00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 PSRAD_imm       SSE2    00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
 VPSRAD_imm      AVX              01110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+VPSRAD_imm      AVX2             01110010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+
+VPSRAVD_xmm     AVX2    01000110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPSRAVD         AVX2    01000110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
 
 PALIGNR_64      SSSE3   00001111 00111010 00001111 !emit { modrm(); mem(size => 8); imm(size => 1); }
 PALIGNR         SSSE3   00001111 00111010 00001111 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VPALIGNR        AVX                       00001111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
+VPALIGNR        AVX2                      00001111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 32); imm(size => 1); }
 
 # Shuffle, Unpack, Blend, Insert, Extract, Broadcast, Permute, Scatter Instructions
 PACKSSWB        MMX     00001111 01100011 !emit { modrm(); mem(size => 8); }
 PACKSSWB        SSE2    00001111 01100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPACKSSWB       AVX              01100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPACKSSWB       AVX2             01100011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PACKSSDW        MMX     00001111 01101011 !emit { modrm(); mem(size => 8); }
 PACKSSDW        SSE2    00001111 01101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPACKSSDW       AVX              01101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPACKSSDW       AVX2             01101011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PACKUSWB        MMX     00001111 01100111 !emit { modrm(); mem(size => 8); }
 PACKUSWB        SSE2    00001111 01100111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPACKUSWB       AVX              01100111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPACKUSWB       AVX2             01100111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PACKUSDW        SSE4_1  00001111 00111000 00101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPACKUSDW       AVX                       00101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPACKUSDW       AVX2                      00101011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 
 PUNPCKHBW       MMX     00001111 01101000 !emit { modrm(); mem(size => 8); }
 PUNPCKHBW       SSE2    00001111 01101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPUNPCKHBW      AVX              01101000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKHBW      AVX2             01101000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PUNPCKHWD       MMX     00001111 01101001 !emit { modrm(); mem(size => 8); }
 PUNPCKHWD       SSE2    00001111 01101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPUNPCKHWD      AVX              01101001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKHWD      AVX2             01101001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PUNPCKHDQ       MMX     00001111 01101010 !emit { modrm(); mem(size => 8); }
 PUNPCKHDQ       SSE2    00001111 01101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPUNPCKHDQ      AVX              01101010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKHDQ      AVX2             01101010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PUNPCKHQDQ      SSE2    00001111 01101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPUNPCKHQDQ     AVX              01101101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKHQDQ     AVX2             01101101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 PUNPCKLBW       MMX     00001111 01100000 !emit { modrm(); mem(size => 4); }
 PUNPCKLBW       SSE2    00001111 01100000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPUNPCKLBW      AVX              01100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKLBW      AVX2             01100000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PUNPCKLWD       MMX     00001111 01100001 !emit { modrm(); mem(size => 4); }
 PUNPCKLWD       SSE2    00001111 01100001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPUNPCKLWD      AVX              01100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKLWD      AVX2             01100001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PUNPCKLDQ       MMX     00001111 01100010 !emit { modrm(); mem(size => 4); }
 PUNPCKLDQ       SSE2    00001111 01100010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPUNPCKLDQ      AVX              01100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKLDQ      AVX2             01100010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 PUNPCKLQDQ      SSE2    00001111 01101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPUNPCKLQDQ     AVX              01101100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKLQDQ     AVX2             01101100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 UNPCKLPS        SSE     00001111 00010100 !emit { modrm(); mem(size => 16, align => 16); }
 VUNPCKLPS       AVX              00010100 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VUNPCKLPS       AVX2             00010100 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
 UNPCKLPD        SSE2    00001111 00010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VUNPCKLPD       AVX              00010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VUNPCKLPD       AVX2             00010100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 UNPCKHPS        SSE     00001111 00010101 !emit { modrm(); mem(size => 16, align => 16); }
 VUNPCKHPS       AVX              00010101 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VUNPCKHPS       AVX2             00010101 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
 UNPCKHPD        SSE2    00001111 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VUNPCKHPD       AVX              00010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VUNPCKHPD       AVX2             00010101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
 
 PSHUFB_64       SSSE3   00001111 00111000 00000000 !emit { modrm(); mem(size => 8); }
 PSHUFB          SSSE3   00001111 00111000 00000000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPSHUFB         AVX                       00000000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPSHUFB         AVX2                      00000000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
 PSHUFW          SSE     00001111 01110000 !emit { modrm(); mem(size => 8); imm(size => 1); }
 PSHUFLW         SSE2    00001111 01110000 !emit { repne(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VPSHUFLW        AVX              01110000 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VPSHUFLW        AVX2             01110000 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
 PSHUFHW         SSE2    00001111 01110000 !emit { rep(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VPSHUFHW        AVX              01110000 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VPSHUFHW        AVX2             01110000 !emit { vex(l => VEX_L_256, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
 PSHUFD          SSE2    00001111 01110000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VPSHUFD         AVX              01110000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VPSHUFD         AVX2             01110000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
 
 SHUFPS          SSE     00001111 11000110 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VSHUFPS         AVX              11000110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
+VSHUFPS         AVX2             11000110 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); imm(size => 1); }
 SHUFPD          SSE2    00001111 11000110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VSHUFPD         AVX              11000110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
+VSHUFPD         AVX2             11000110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); imm(size => 1); }
 
 BLENDPS         SSE4_1  00001111 00111010 00001100 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VBLENDPS        AVX                       00001100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
+VBLENDPS        AVX2                      00001100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 32); imm(size => 1); }
 BLENDPD         SSE4_1  00001111 00111010 00001101 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VBLENDPD        AVX                       00001101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
+VBLENDPD        AVX2                      00001101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 32); imm(size => 1); }
 BLENDVPS        SSE4_1  00001111 00111000 00010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VBLENDVPS       AVX                       01001010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
+VBLENDVPS       AVX2                      01001010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 32); imm(size => 1); }
 BLENDVPD        SSE4_1  00001111 00111000 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VBLENDVPD       AVX                       01001011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
+VBLENDVPD       AVX2                      01001011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 32); imm(size => 1); }
 PBLENDVB        SSE4_1  00001111 00111000 00010000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VPBLENDVB       AVX                       01001100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
+VPBLENDVB       AVX2                      01001100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 32); imm(size => 1); }
 PBLENDW         SSE4_1  00001111 00111010 00001110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
 VPBLENDW        AVX                       00001110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
+VPBLENDW        AVX2                      00001110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 32); imm(size => 1); }
+VPBLENDD_xmm    AVX2                      00000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
+VPBLENDD        AVX2                      00000010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 32); imm(size => 1); }
 
 INSERTPS        SSE4_1  00001111 00111010 00100001 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
 VINSERTPS       AVX                       00100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 4); imm(size => 1); }
@@ -613,6 +791,9 @@ VPINSRD         AVX                       00100010 !emit { vex(l => VEX_L_128, p
 PINSRQ          SSE4_1  00001111 00111010 00100010 !emit { data16(); rex(w => 1); modrm(); mem(size => 8); imm(size => 1); }
 VPINSRQ         AVX                       00100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 1); modrm(); mem(size => 8); imm(size => 1); }
 
+VINSERTF128     AVX2    00011000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
+VINSERTI128     AVX2    00111000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
+
 EXTRACTPS       SSE4_1  00001111 00111010 00010111 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
 EXTRACTPS_mem   SSE4_1  00001111 00111010 00010111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
 VEXTRACTPS      AVX                       00010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
@@ -639,37 +820,94 @@ PEXTRW_reg      SSE     00001111 11000101 !emit { modrm(mod => MOD_DIRECT, reg =
 PEXTRW_reg      SSE2    00001111 11000101 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
 VPEXTRW_reg     AVX              11000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, w => 0, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
 
+VEXTRACTF128    AVX2    00011001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VEXTRACTI128    AVX2    00111001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+
+VPBROADCASTB_xmm AVX2   01111000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 1); }
+VPBROADCASTB    AVX2    01111000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 1); }
+VPBROADCASTW_xmm AVX2   01111001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 2); }
+VPBROADCASTW    AVX2    01111001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 2); }
+VPBROADCASTD_xmm AVX2   01011000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VPBROADCASTD    AVX2    01011000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VPBROADCASTQ_xmm AVX2   01011001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPBROADCASTQ    AVX2    01011001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VBROADCASTSS_xmm AVX2   00011000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VBROADCASTSS    AVX2    00011000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VBROADCASTSD    AVX2    00011001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VBROADCASTF128  AVX2    00011010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VBROADCASTI128  AVX2    01011010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+
+VPERM2F128      AVX2    00000110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 32); imm(size => 1); }
+VPERM2I128      AVX2    01000110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 32); imm(size => 1); }
+VPERMD          AVX2    00110110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
+VPERMPS         AVX2    00010110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
 VPERMILPS       AVX     00001100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPERMILPS       AVX2    00001100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
 VPERMILPS_imm   AVX     00000100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VPERMILPS_imm   AVX2    00000100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
 VPERMILPD       AVX     00001101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPERMILPD       AVX2    00001101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
 VPERMILPD_imm   AVX     00000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VPERMILPD_imm   AVX2    00000101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
+VPERMQ          AVX2    00000000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 1, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
+VPERMPD         AVX2    00000001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 1, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
+
+# TODO: These instructions use the VSIB byte, which is not implemented yet.
+# VGATHERDPS      AVX2    10010010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VGATHERDPS      AVX2    10010010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VGATHERDPD      AVX2    10010010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VGATHERDPD      AVX2    10010010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VGATHERQPS      AVX2    10010011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VGATHERQPS      AVX2    10010011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VGATHERQPD      AVX2    10010011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VGATHERQPD      AVX2    10010011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERDD      AVX2    10010000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERDD      AVX2    10010000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERDQ      AVX2    10010000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERDQ      AVX2    10010000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERQD      AVX2    10010001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERQD      AVX2    10010001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERQQ      AVX2    10010001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERQQ      AVX2    10010001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
 
 # Conversion Instructions
 PMOVSXBW        SSE4_1  00001111 00111000 00100000 !emit { data16(); modrm(); mem(size => 8); }
 VPMOVSXBW       AVX                       00100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPMOVSXBW       AVX2                      00100000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 PMOVSXBD        SSE4_1  00001111 00111000 00100001 !emit { data16(); modrm(); mem(size => 4); }
 VPMOVSXBD       AVX                       00100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VPMOVSXBD       AVX2                      00100001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 PMOVSXBQ        SSE4_1  00001111 00111000 00100010 !emit { data16(); modrm(); mem(size => 2); }
 VPMOVSXBQ       AVX                       00100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 2); }
+VPMOVSXBQ       AVX2                      00100010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
 PMOVSXWD        SSE4_1  00001111 00111000 00100011 !emit { data16(); modrm(); mem(size => 8); }
 VPMOVSXWD       AVX                       00100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPMOVSXWD       AVX2                      00100011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 PMOVSXWQ        SSE4_1  00001111 00111000 00100100 !emit { data16(); modrm(); mem(size => 4); }
 VPMOVSXWQ       AVX                       00100100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VPMOVSXWQ       AVX2                      00100100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 PMOVSXDQ        SSE4_1  00001111 00111000 00100101 !emit { data16(); modrm(); mem(size => 8); }
 VPMOVSXDQ       AVX                       00100101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPMOVSXDQ       AVX2                      00100101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 
 PMOVZXBW        SSE4_1  00001111 00111000 00110000 !emit { data16(); modrm(); mem(size => 8); }
 VPMOVZXBW       AVX                       00110000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPMOVZXBW       AVX2                      00110000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 PMOVZXBD        SSE4_1  00001111 00111000 00110001 !emit { data16(); modrm(); mem(size => 4); }
 VPMOVZXBD       AVX                       00110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VPMOVZXBD       AVX2                      00110001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 PMOVZXBQ        SSE4_1  00001111 00111000 00110010 !emit { data16(); modrm(); mem(size => 2); }
 VPMOVZXBQ       AVX                       00110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 2); }
+VPMOVZXBQ       AVX2                      00110010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
 PMOVZXWD        SSE4_1  00001111 00111000 00110011 !emit { data16(); modrm(); mem(size => 8); }
 VPMOVZXWD       AVX                       00110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPMOVZXWD       AVX2                      00110011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 PMOVZXWQ        SSE4_1  00001111 00111000 00110100 !emit { data16(); modrm(); mem(size => 4); }
 VPMOVZXWQ       AVX                       00110100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VPMOVZXWQ       AVX2                      00110100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
 PMOVZXDQ        SSE4_1  00001111 00111000 00110101 !emit { data16(); modrm(); mem(size => 8); }
 VPMOVZXDQ       AVX                       00110101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPMOVZXDQ       AVX2                      00110101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 
 CVTPI2PS        SSE     00001111 00101010 !emit { modrm(); mem(size => 8); }
 CVTSI2SS        SSE     00001111 00101010 !emit { rep(); modrm(); mem(size => 4); }
@@ -706,15 +944,20 @@ VCVTTSD2SI_64   AVX              00101100 !emit { vex(l => 0, p => VEX_P_REPNE,
 
 CVTPD2DQ        SSE2    00001111 11100110 !emit { repne(); modrm(); mem(size => 16, align => 16); }
 VCVTPD2DQ       AVX              11100110 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VCVTPD2DQ       AVX2             11100110 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 CVTTPD2DQ       SSE2    00001111 11100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VCVTTPD2DQ      AVX              11100110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VCVTTPD2DQ      AVX2             11100110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 CVTDQ2PD        SSE2    00001111 11100110 !emit { rep(); modrm(); mem(size => 8); }
 VCVTDQ2PD       AVX              11100110 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VCVTDQ2PD       AVX2             11100110 !emit { vex(l => VEX_L_256, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 
 CVTPS2PD        SSE2    00001111 01011010 !emit { modrm(); mem(size => 8); }
 VCVTPS2PD       AVX              01011010 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VCVTPS2PD       AVX2             01011010 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
 CVTPD2PS        SSE2    00001111 01011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VCVTPD2PS       AVX              01011010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VCVTPD2PS       AVX2             01011010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 CVTSS2SD        SSE2    00001111 01011010 !emit { rep(); modrm(); mem(size => 4); }
 VCVTSS2SD       AVX              01011010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
 CVTSD2SS        SSE2    00001111 01011010 !emit { repne(); modrm(); mem(size => 8); }
@@ -722,10 +965,13 @@ VCVTSD2SS       AVX              01011010 !emit { vex(l => 0, p => VEX_P_REPNE,
 
 CVTDQ2PS        SSE2    00001111 01011011 !emit { modrm(); mem(size => 16, align => 16); }
 VCVTDQ2PS       AVX              01011011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VCVTDQ2PS       AVX2             01011011 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 CVTPS2DQ        SSE2    00001111 01011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
 VCVTPS2DQ       AVX              01011011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VCVTPS2DQ       AVX2             01011011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 CVTTPS2DQ       SSE2    00001111 01011011 !emit { rep(); modrm(); mem(size => 16, align => 16); }
 VCVTTPS2DQ      AVX              01011011 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VCVTTPS2DQ      AVX2             01011011 !emit { vex(l => VEX_L_256, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
 
 # Cacheability Control, Prefetch, and Instruction Ordering Instructions
 MASKMOVQ        SSE     00001111 11110111 !emit { modrm(mod => MOD_DIRECT); mem(size => 8, base => REG_EDI); }
@@ -733,20 +979,31 @@ MASKMOVDQU      SSE2    00001111 11110111 !emit { data16(); modrm(mod => MOD_DIR
 VMASKMOVDQU     AVX              11110111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT); mem(size => 16, base => REG_EDI); }
 
 VMASKMOVPS      AVX     001011 d 0 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VMASKMOVPS      AVX2    001011 d 0 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 32); }
 VMASKMOVPD      AVX     001011 d 1 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VMASKMOVPD      AVX2    001011 d 1 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 32); }
+
+VPMASKMOVD_xmm  AVX2    100011 d 0 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VPMASKMOVD      AVX2    100011 d 0 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 32); }
+VPMASKMOVQ_xmm  AVX2    100011 d 0 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VPMASKMOVQ      AVX2    100011 d 0 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 32); }
 
 MOVNTPS         SSE     00001111 00101011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
 VMOVNTPS        AVX              00101011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTPS        AVX2             00101011 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 32, align => 32); }
 MOVNTPD         SSE2    00001111 00101011 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
 VMOVNTPD        AVX              00101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTPD        AVX2             00101011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 32, align => 32); }
 
 MOVNTI          SSE2    00001111 11000011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 4); }
 MOVNTI_64       SSE2    00001111 11000011 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVNTQ          SSE     00001111 11100111 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
 MOVNTDQ         SSE2    00001111 11100111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
 VMOVNTDQ        AVX              11100111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTDQ        AVX2             11100111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 32, align => 32); }
 MOVNTDQA        SSE4_1  00001111 00111000 00101010 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
 VMOVNTDQA       AVX                       00101010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTDQA       AVX2                      00101010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 32, align => 32); }
 
 PREFETCHT0      SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 1); mem(size => 1); }
 PREFETCHT1      SSE     00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 1); }
-- 
2.20.1




* Re: [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill Jan Bobek
@ 2019-07-03 15:22   ` Richard Henderson
  2019-07-10 17:48     ` Jan Bobek
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-03 15:22 UTC (permalink / raw)
  To: Jan Bobek, qemu-devel; +Cc: Alex Bennée

On 7/1/19 6:35 AM, Jan Bobek wrote:
> +    while ($bitcur < $bitend) {
> +        my $format;
> +        my $bitlen;
> +
> +        if ($bitcur + 64 <= $bitend) {
> +            $format = "Q";
> +            $bitlen = 64;
> +        } elsif ($bitcur + 32 <= $bitend) {
> +            $format = "L";
> +            $bitlen = 32;
> +        } elsif ($bitcur + 16 <= $bitend) {
> +            $format = "S";
> +            $bitlen = 16;
> +        } else {
> +            $format = "C";
> +            $bitlen = 8;
> +        }
> +
> +        $format .= ($args{bigendian} ? ">" : "<") if $bitlen > 8;

It now occurs to me to wonder if it's worth simplifying this function to always
emit bytes, and thus take care of all of the endianness ourselves, since we're
doing it anyway for larger/odd-sized hunks.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~



* Re: [Qemu-devel] [RISU RFC PATCH v2 02/14] risugen_x86_asm: add module
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 02/14] risugen_x86_asm: add module Jan Bobek
@ 2019-07-03 15:37   ` Richard Henderson
  2019-07-10 18:02     ` Jan Bobek
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-03 15:37 UTC (permalink / raw)
  To: Jan Bobek, qemu-devel; +Cc: Alex Bennée

On 7/1/19 6:35 AM, Jan Bobek wrote:
> +    VEX_V_UNUSED => 0b1111,

I think perhaps this is a mistake.  Yes, that's what goes in the field, but
what goes in the field is ~(logical_value).

While for general RISU-ish operation, ~(random_number) is just as random as
+(random_number), the difference will matter if we ever want to use this
interface to explicitly emit a specific vex instruction which also requires
the v-register.

> +sub rex_encode(%)
> +{
> +    my (%args) = @_;
> +
> +    $args{w} = 0 unless defined $args{w};
> +    $args{r} = 0 unless defined $args{r};
> +    $args{x} = 0 unless defined $args{x};
> +    $args{b} = 0 unless defined $args{b};
> +
> +    return (value => 0x40
> +            | (($args{w} ? 1 : 0) << 3)
> +            | (($args{r} ? 1 : 0) << 2)
> +            | (($args{x} ? 1 : 0) << 1)
> +            | ($args{b} ? 1 : 0),
> +            len => 1);
> +}

Does

	(defined $args{w} && $args{w}) << 3

work?  That seems tidier to me than splitting these conditions.

> +        return (value => (0xC4 << 16)
> +                | (($args{r} ? 1 : 0) << 15)
> +                | (($args{x} ? 1 : 0) << 14)
> +                | (($args{b} ? 1 : 0) << 13)

Further down in vex_encode, and along the lines of VEX_V_UNUSED, this appears
to be actively wrong, since these bits are encoded as inverses.  What this
*really* means is that because of that, rex_encode and vex_encode will not
encode the same registers for a given instruction.  Which really does feel
bug-like, random inputs or no.
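
To make the inversion concrete, here is a rough, untested sketch of an
encoder that takes logical values and complements R/X/B/vvvv itself.
The name vex3_encode_logical and its argument defaults are made up for
illustration; only the bit layout is taken from the hunk quoted above.

```perl
use strict;
use warnings;

# Hypothetical sketch: build a 3-byte VEX prefix from *logical* values.
# R, X, B and vvvv are stored complemented in the prefix, so the
# inversion happens here rather than in every caller.
sub vex3_encode_logical {
    my (%args) = @_;

    my $rbit = $args{r} ? 1 : 0;
    my $xbit = $args{x} ? 1 : 0;
    my $bbit = $args{b} ? 1 : 0;
    my $wbit = $args{w} ? 1 : 0;
    my $lbit = $args{l} ? 1 : 0;
    my $map  = ($args{m} // 1) & 0x1f;  # opcode map; 1 selects the 0F map
    my $vreg = ($args{v} // 0) & 0xf;   # logical register number, 0 if unused
    my $pfx  = ($args{p} // 0) & 0x3;   # implied legacy prefix

    return (value => (0xC4 << 16)
            | (($rbit ^ 1) << 15)       # stored inverted
            | (($xbit ^ 1) << 14)       # stored inverted
            | (($bbit ^ 1) << 13)       # stored inverted
            | ($map << 8)
            | ($wbit << 7)
            | (($vreg ^ 0xf) << 3)      # stored inverted
            | ($lbit << 2)
            | $pfx,
            len => 3);
}
```

With this, an unused v-register is simply v => 0 (or omitted), which
comes out as 0b1111 in the field, and r => 1 means "register number
>= 8" on both the REX and the VEX path.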


r~



* Re: [Qemu-devel] [RISU RFC PATCH v2 03/14] risugen_x86_emit: add module
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 03/14] risugen_x86_emit: " Jan Bobek
@ 2019-07-03 15:47   ` Richard Henderson
  2019-07-10 18:08     ` Jan Bobek
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-03 15:47 UTC (permalink / raw)
  To: Jan Bobek, qemu-devel; +Cc: Alex Bennée

On 7/1/19 6:35 AM, Jan Bobek wrote:
> +sub parse_emitblock($$)
> +{
> +    my ($rec, $insn) = @_;
> +    my $insnname = $rec->{name};
> +    my $opcode = $insn->{opcode}{value};
> +
> +    $emit_opts = {};
> +
> +    my $emitblock = $rec->{blocks}{"emit"};
> +    if (defined $emitblock) {
> +        eval_with_fields($insnname, $opcode, $rec, "emit", $emitblock);
> +    }

And if !defined?  Silently discard?

Is this just weirdness higher in the risugen stack,
such that this might be called maybe_parse_emitblock?


r~



* Re: [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: add module
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: " Jan Bobek
@ 2019-07-03 16:11   ` Richard Henderson
  2019-07-10 18:21     ` Jan Bobek
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-03 16:11 UTC (permalink / raw)
  To: Jan Bobek, qemu-devel; +Cc: Alex Bennée

On 7/1/19 6:35 AM, Jan Bobek wrote:
> +sub write_mov_rr($$)
> +{
> +    my ($r1, $r2) = @_;
> +
> +    my %insn = (opcode => X86OP_MOV,
> +                modrm => {mod => MOD_DIRECT,
> +                          reg => ($r1 & 0x7),
> +                          rm => ($r2 & 0x7)});
> +
> +    $insn{rex}{w} = 1 if $is_x86_64;
> +    $insn{rex}{r} = 1 if $r1 >= 8;
> +    $insn{rex}{b} = 1 if $r2 >= 8;

This is where maybe it's better to leave rex.[rb] to risugen_x86_asm, and just
leave $modrm{reg} and $modrm{rm} as 4-bit quantities.
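
Roughly like the following untested sketch, where modrm_encode is a
name I've made up (not the one in the patch). It accepts 4-bit register
numbers and reports which REX bits they imply. It ignores the SIB case,
where rex.x and rex.b apply to the index and base instead.

```perl
use strict;
use warnings;

# Hypothetical sketch: callers pass reg/rm as 4-bit register numbers;
# the low three bits go into ModR/M and the fourth bit becomes REX.R
# or REX.B.
sub modrm_encode {
    my (%modrm) = @_;
    my %rex;

    $rex{r} = 1 if $modrm{reg} >= 8;
    $rex{b} = 1 if $modrm{rm} >= 8;

    my $value = (($modrm{mod} & 0x3) << 6)
              | (($modrm{reg} & 0x7) << 3)
              | ($modrm{rm} & 0x7);

    return (modrm => {value => $value, len => 1}, rex => \%rex);
}
```

write_mov_rr would then only have to add rex.w for the 64-bit case.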

> +sub write_mov_reg_imm($$)
> +{
> +    my ($reg, $imm) = @_;
> +    my %insn;
> +
> +    if (0 <= $imm && $imm <= 0xffffffff) {

Should include !$is_x86_64 here,

> +        %insn = (opcode => {value => 0xB8 | ($reg & 0x7), len => 1},
> +                 imm => {value => $imm, len => 4});
> +    } elsif (-0x80000000 <= $imm && $imm <= 0x7fffffff) {
> +        %insn = (opcode => {value => 0xC7, len => 1},
> +                 modrm => {mod => MOD_DIRECT,
> +                           reg => 0, rm => ($reg & 0x7)},
> +                 imm => {value => $imm, len => 4});
> +
> +        $insn{rex}{w} = 1 if $is_x86_64;

making this unconditional.

> +sub write_random_ymmdata()
> +{
> +    my $ymm_cnt = $is_x86_64 ? 16 : 8;
> +    my $ymm_len = 32;
> +    my $datalen = $ymm_cnt * $ymm_len;
> +
> +    # Generate random data blob
> +    write_random_datablock($datalen);
> +
> +    # Load the random data into YMM regs.
> +    for (my $ymm_reg = 0; $ymm_reg < $ymm_cnt; $ymm_reg++) {
> +        write_insn(vex => {l => VEX_L_256, p => VEX_P_DATA16,
> +                           r => !($ymm_reg >= 8)},

Again, vex.r should be handled in vex_encode.

> +                   opcode => X86OP_VMOVAPS,
> +                   modrm => {mod => MOD_INDIRECT_DISP32,
> +                             reg => ($ymm_reg & 0x7),
> +                             rm => REG_EAX},
> +                   disp => {value => $ymm_reg * $ymm_len,
> +                            len => 4});
> +    }

So... this now generates code that cannot run without AVX2.

Which is probably fine for testing right now, since we do
want to be able to notice effects of SSE/AVX insns on the
high bits of the registers.

But we'll probably need to have the same --xsave=foo
command-line option that we have for risu itself.

That would let you initialize only 16-bytes here, or
for avx512 initialize 64-bytes, plus the k-registers.


r~



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions Jan Bobek
@ 2019-07-03 21:35   ` Richard Henderson
  2019-07-10 18:29     ` Jan Bobek
  2019-07-03 21:49   ` Richard Henderson
  2019-07-03 22:01   ` Peter Maydell
  2 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-03 21:35 UTC (permalink / raw)
  To: Jan Bobek, qemu-devel; +Cc: Alex Bennée

On 7/1/19 6:35 AM, Jan Bobek wrote:
> Add an x86 configuration file with all MMX instructions.
> 
> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
> ---
>  x86.risu | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 96 insertions(+)
>  create mode 100644 x86.risu

Note that most of these MMX instructions affect the FPU, not the vector unit.
We would want to extend risu again to handle this.  You'd also need to seed the
FPU with random data.

I was thinking for a moment that this is really beyond what you've signed up
for, but on second thoughts it's not.  Decoding SSE is really tangled with
decoding MMX, via the 0x66 prefix, and you'll want to be able to verify that
you don't regress.

> +# State Management Instructions
> +EMMS            MMX     00001111 01110111 !emit { }

I'm not sure this is really testable, because of the state change.  But we'll
see what happens with the aforementioned dumping.

> +# Arithmetic Instructions
> +PADDB           MMX     00001111 11111100 !emit { modrm(); mem(size => 8); }
> +PADDW           MMX     00001111 11111101 !emit { modrm(); mem(size => 8); }
> +PADDD           MMX     00001111 11111110 !emit { modrm(); mem(size => 8); }
> +PADDQ           MMX     00001111 11010100 !emit { modrm(); mem(size => 8); }

PADDQ is sse2.


r~



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions Jan Bobek
  2019-07-03 21:35   ` Richard Henderson
@ 2019-07-03 21:49   ` Richard Henderson
  2019-07-10 18:32     ` Jan Bobek
  2019-07-03 22:01   ` Peter Maydell
  2 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-03 21:49 UTC (permalink / raw)
  To: Jan Bobek, qemu-devel; +Cc: Alex Bennée

On 7/1/19 6:35 AM, Jan Bobek wrote:
> +MOVQ            MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
> +MOVQ_mem        MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }

Oh, note that there are only 8 mmx registers, so the respective rex.{r,b} bit
can't be set.


r~



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-01  4:35 ` [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions Jan Bobek
  2019-07-03 21:35   ` Richard Henderson
  2019-07-03 21:49   ` Richard Henderson
@ 2019-07-03 22:01   ` Peter Maydell
  2019-07-10 18:35     ` Jan Bobek
  2 siblings, 1 reply; 38+ messages in thread
From: Peter Maydell @ 2019-07-03 22:01 UTC (permalink / raw)
  To: Jan Bobek; +Cc: Richard Henderson, Alex Bennée, QEMU Developers

On Mon, 1 Jul 2019 at 05:43, Jan Bobek <jan.bobek@gmail.com> wrote:
>
> Add an x86 configuration file with all MMX instructions.
>
> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>

> --- /dev/null
> +++ b/x86.risu
> @@ -0,0 +1,96 @@
> +###############################################################################
> +# Copyright (c) 2019 Linaro Limited

I'm guessing from your email address that this copyright line probably
isn't right :-)

thanks
-- PMM



* Re: [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill
  2019-07-03 15:22   ` Richard Henderson
@ 2019-07-10 17:48     ` Jan Bobek
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 17:48 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Alex Bennée



Hi Richard,

sorry for replying so late. I read your comments last week; as I
mentioned in our weekly update email, I ended up adding/removing quite
a lot since v2, so I wasn't 100% sure how much of it would remain
relevant.

Anyways,

On 7/3/19 11:22 AM, Richard Henderson wrote:
> On 7/1/19 6:35 AM, Jan Bobek wrote:
>> +    while ($bitcur < $bitend) {
>> +        my $format;
>> +        my $bitlen;
>> +
>> +        if ($bitcur + 64 <= $bitend) {
>> +            $format = "Q";
>> +            $bitlen = 64;
>> +        } elsif ($bitcur + 32 <= $bitend) {
>> +            $format = "L";
>> +            $bitlen = 32;
>> +        } elsif ($bitcur + 16 <= $bitend) {
>> +            $format = "S";
>> +            $bitlen = 16;
>> +        } else {
>> +            $format = "C";
>> +            $bitlen = 8;
>> +        }
>> +
>> +        $format .= ($args{bigendian} ? ">" : "<") if $bitlen > 8;
> 
> It now occurs to me to wonder if it's worth simplifying this function to always
> emit bytes, and thus take care of all of the endianness ourselves, since we're
> doing it anyway for larger/odd-sized hunks.

Good point. *facepalm*

I will include this change in v3.
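
Roughly along these lines, as an untested sketch. The name
value_to_bytes and its arguments are placeholders, not necessarily what
v3 will look like, and it assumes a 64-bit Perl and fields of at most
8 bytes.

```perl
use strict;
use warnings;

# Hypothetical sketch: always emit single bytes and handle the byte
# order ourselves instead of picking a pack format per chunk size.
sub value_to_bytes {
    my (%args) = @_;

    # Least significant byte first...
    my @bytes = map { ($args{value} >> (8 * $_)) & 0xff }
                    0 .. $args{len} - 1;
    # ...and flip the order for big-endian output.
    @bytes = reverse @bytes if $args{bigendian};

    return pack("C*", @bytes);
}
```

Odd sizes such as 3 or 5 bytes then need no special casing.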

-Jan



* Re: [Qemu-devel] [RISU RFC PATCH v2 02/14] risugen_x86_asm: add module
  2019-07-03 15:37   ` Richard Henderson
@ 2019-07-10 18:02     ` Jan Bobek
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 18:02 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Alex Bennée



On 7/3/19 11:37 AM, Richard Henderson wrote:
> On 7/1/19 6:35 AM, Jan Bobek wrote:
>> +    VEX_V_UNUSED => 0b1111,
> 
> I think perhaps this is a mistake.  Yes, that's what goes in the field, but
> what goes in the field is ~(logical_value).
> 
> While for general RISU-ish operation, ~(random_number) is just as random as
> +(random_number), the difference will matter if we ever want to use this
> interface to explicitly emit a specific vex instruction which also requires
> the v-register.

See below.

>> +sub rex_encode(%)
>> +{
>> +    my (%args) = @_;
>> +
>> +    $args{w} = 0 unless defined $args{w};
>> +    $args{r} = 0 unless defined $args{r};
>> +    $args{x} = 0 unless defined $args{x};
>> +    $args{b} = 0 unless defined $args{b};
>> +
>> +    return (value => 0x40
>> +            | (($args{w} ? 1 : 0) << 3)
>> +            | (($args{r} ? 1 : 0) << 2)
>> +            | (($args{x} ? 1 : 0) << 1)
>> +            | ($args{b} ? 1 : 0),
>> +            len => 1);
>> +}
> 
> Does
> 
> 	(defined $args{w} && $args{w}) << 3
> 
> work?  That seems tidier to me than splitting these conditions.

It does, I will change it. Thanks!
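
For reference, a third variant as an untested sketch (not necessarily
what ends up in v3): an undefined hash entry is already false in
boolean context, so the explicit defaults can go away entirely.

```perl
use strict;
use warnings;

# Sketch of a more compact rex_encode: a missing or false argument
# leaves its bit clear, any true value sets it.
sub rex_encode {
    my (%args) = @_;
    my %shift = (w => 3, r => 2, x => 1, b => 0);

    my $value = 0x40;
    for my $field (keys %shift) {
        $value |= 1 << $shift{$field} if $args{$field};
    }

    return (value => $value, len => 1);
}
```

Unlike the plain shift, this also normalizes true values other than 1,
so a stray w => 2 cannot leak into a neighbouring bit.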

>> +        return (value => (0xC4 << 16)
>> +                | (($args{r} ? 1 : 0) << 15)
>> +                | (($args{x} ? 1 : 0) << 14)
>> +                | (($args{b} ? 1 : 0) << 13)
> 
> Further down in vex_encode, and along the lines of VEX_V_UNUSED, this appears
> to be actively wrong, since these bits are encoded as inverses.  What this
> *really* means is that because of that, rex_encode and vex_encode will not
> encode the same registers for a given instruction.  Which really does feel
> bug-like, random inputs or no.

So, vex_encode, rex_encode and friends were meant to be really
low-level functions; they literally just encode the bits from what you
pass in, without any concern for what the fields even mean. In that
spirit, write_insn itself never did much error checking.

I have added quite a lot of code to risugen_x86_asm in v3; most
importantly, there are now asm_insn_* functions which are more
high-level, in that you pass in the logical values and they take care
of error checks and encoding. I also removed write_insn and all the
encoding-related symbolic constants from the public interface of the
module.

-Jan



* Re: [Qemu-devel] [RISU RFC PATCH v2 03/14] risugen_x86_emit: add module
  2019-07-03 15:47   ` Richard Henderson
@ 2019-07-10 18:08     ` Jan Bobek
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 18:08 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Alex Bennée



On 7/3/19 11:47 AM, Richard Henderson wrote:
> On 7/1/19 6:35 AM, Jan Bobek wrote:
>> +sub parse_emitblock($$)
>> +{
>> +    my ($rec, $insn) = @_;
>> +    my $insnname = $rec->{name};
>> +    my $opcode = $insn->{opcode}{value};
>> +
>> +    $emit_opts = {};
>> +
>> +    my $emitblock = $rec->{blocks}{"emit"};
>> +    if (defined $emitblock) {
>> +        eval_with_fields($insnname, $opcode, $rec, "emit", $emitblock);
>> +    }
> 
> And if !defined?  Silently discard?
> 
> Is this just weirdness higher in the risugen stack,
> such that this might be called maybe_parse_emitblock?

If !defined, there _is_ no emit block, and we treat that as an empty
block. The caller gets an empty hash, and it's up to them to decide
what that means. I could rename it, but the difference doesn't seem
that important to me...?

-Jan



* Re: [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: add module
  2019-07-03 16:11   ` Richard Henderson
@ 2019-07-10 18:21     ` Jan Bobek
  2019-07-11  9:26       ` Richard Henderson
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 18:21 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Alex Bennée



On 7/3/19 12:11 PM, Richard Henderson wrote:
> On 7/1/19 6:35 AM, Jan Bobek wrote:
>> +sub write_mov_rr($$)
>> +{
>> +    my ($r1, $r2) = @_;
>> +
>> +    my %insn = (opcode => X86OP_MOV,
>> +                modrm => {mod => MOD_DIRECT,
>> +                          reg => ($r1 & 0x7),
>> +                          rm => ($r2 & 0x7)});
>> +
>> +    $insn{rex}{w} = 1 if $is_x86_64;
>> +    $insn{rex}{r} = 1 if $r1 >= 8;
>> +    $insn{rex}{b} = 1 if $r2 >= 8;
> 
> This is where maybe it's better to leave rex.[rb] to risugen_x86_asm, and just
> leave $modrm{reg} and $modrm{rm} as 4-bit quantities.

That's what I have in v3, stay tuned!

>> +sub write_mov_reg_imm($$)
>> +{
>> +    my ($reg, $imm) = @_;
>> +    my %insn;
>> +
>> +    if (0 <= $imm && $imm <= 0xffffffff) {
> 
> Should include !$is_x86_64 here,
> 
>> +        %insn = (opcode => {value => 0xB8 | ($reg & 0x7), len => 1},
>> +                 imm => {value => $imm, len => 4});
>> +    } elsif (-0x80000000 <= $imm && $imm <= 0x7fffffff) {
>> +        %insn = (opcode => {value => 0xC7, len => 1},
>> +                 modrm => {mod => MOD_DIRECT,
>> +                           reg => 0, rm => ($reg & 0x7)},
>> +                 imm => {value => $imm, len => 4});
>> +
>> +        $insn{rex}{w} = 1 if $is_x86_64;
> 
> making this unconditional.

Doesn't B8 (without REX.W) work for x86_64, too? It zeroes the upper
part of the destination, so it's effectively zero-extending, and it's
one byte shorter than C7 (no ModR/M byte needed).

That being said, I moved most of this function to risugen_x86_asm and
included a bunch of comments regarding different cases, so it should
be easier to understand.

>> +sub write_random_ymmdata()
>> +{
>> +    my $ymm_cnt = $is_x86_64 ? 16 : 8;
>> +    my $ymm_len = 32;
>> +    my $datalen = $ymm_cnt * $ymm_len;
>> +
>> +    # Generate random data blob
>> +    write_random_datablock($datalen);
>> +
>> +    # Load the random data into YMM regs.
>> +    for (my $ymm_reg = 0; $ymm_reg < $ymm_cnt; $ymm_reg++) {
>> +        write_insn(vex => {l => VEX_L_256, p => VEX_P_DATA16,
>> +                           r => !($ymm_reg >= 8)},
> 
> Again, vex.r should be handled in vex_encode.

As I said, there will be more high-level instruction-assembling
functions exported by risugen_x86_asm in v3, which take care of this.

>> +                   opcode => X86OP_VMOVAPS,
>> +                   modrm => {mod => MOD_INDIRECT_DISP32,
>> +                             reg => ($ymm_reg & 0x7),
>> +                             rm => REG_EAX},
>> +                   disp => {value => $ymm_reg * $ymm_len,
>> +                            len => 4});
>> +    }
> 
> So... this now generates code that cannot run without AVX2.
> 
> Which is probably fine for testing right now, since we do
> want to be able to notice effects of SSE/AVX insns on the
> high bits of the registers.
> 
> But we'll probably need to have the same --xsave=foo
> command-line option that we have for risu itself.
> 
> That would let you initialize only 16-bytes here, or
> for avx512 initialize 64-bytes, plus the k-registers.

Ah yes, indeed.

-Jan



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-03 21:35   ` Richard Henderson
@ 2019-07-10 18:29     ` Jan Bobek
  2019-07-11  9:32       ` Richard Henderson
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 18:29 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Alex Bennée



On 7/3/19 5:35 PM, Richard Henderson wrote:
> On 7/1/19 6:35 AM, Jan Bobek wrote:
>> Add an x86 configuration file with all MMX instructions.
>>
>> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
>> ---
>>  x86.risu | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 96 insertions(+)
>>  create mode 100644 x86.risu
> 
> Note that most of these MMX instructions affect the FPU, not the vector unit.
> We would want to extend risu again to handle this.  You'd also need to seed the
> FPU with random data.
> 
> I was thinking for a moment that this is really beyond what you've signed up
> for, but on second thoughts it's not.  Decoding SSE is really tangled with
> decoding MMX, via the 0x66 prefix, and you'll want to be able to verify that
> you don't regress.

Honestly, I added MMX instructions just for completeness; I figured it can't
hurt, and you can always filter them out via command-line switches. You have
a point with the regression testing, though...

>> +# State Management Instructions
>> +EMMS            MMX     00001111 01110111 !emit { }
> 
> I'm not sure this is really testable, because of the state change.  But we'll
> see what happens with the aforementioned dumping.
> 
>> +# Arithmetic Instructions
>> +PADDB           MMX     00001111 11111100 !emit { modrm(); mem(size => 8); }
>> +PADDW           MMX     00001111 11111101 !emit { modrm(); mem(size => 8); }
>> +PADDD           MMX     00001111 11111110 !emit { modrm(); mem(size => 8); }
>> +PADDQ           MMX     00001111 11010100 !emit { modrm(); mem(size => 8); }
> 
> PADDQ is sse2.

Not this one, at least according to the Intel docs:

NP 0F D4 /r: PADDQ mm, mm/m64          (MMX)
66 0F D4 /r: PADDQ xmm1, xmm2/m128     (SSE2)

The SSE2 version is added in a later patch.

-Jan



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-03 21:49   ` Richard Henderson
@ 2019-07-10 18:32     ` Jan Bobek
  2019-07-11  9:34       ` Richard Henderson
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 18:32 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Alex Bennée

On 7/3/19 5:49 PM, Richard Henderson wrote:
> On 7/1/19 6:35 AM, Jan Bobek wrote:
>> +MOVQ            MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
>> +MOVQ_mem        MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
> 
> Oh, note that there are only 8 mmx registers, so the respective rex.{r,b} bit
> can't be set.

Actually, my CPU chewed it without choking even when the bits were
set, but it will be taken care of in v3.

-Jan



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-03 22:01   ` Peter Maydell
@ 2019-07-10 18:35     ` Jan Bobek
  2019-07-11  6:45       ` Alex Bennée
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 18:35 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Richard Henderson, Alex Bennée, QEMU Developers



On 7/3/19 6:01 PM, Peter Maydell wrote:
> On Mon, 1 Jul 2019 at 05:43, Jan Bobek <jan.bobek@gmail.com> wrote:
>>
>> Add an x86 configuration file with all MMX instructions.
>>
>> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
> 
>> --- /dev/null
>> +++ b/x86.risu
>> @@ -0,0 +1,96 @@
>> +###############################################################################
>> +# Copyright (c) 2019 Linaro Limited
> 
> I'm guessing from your email address that this copyright line probably
> isn't right :-)

Haha indeed, I just copy-pasted it from the other files; the same goes for
the rest of the source files.

Any suggestions on what it should be? I'm not currently employed by
anyone (as Google keeps reminding us).

-Jan



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-10 18:35     ` Jan Bobek
@ 2019-07-11  6:45       ` Alex Bennée
  2019-07-11 13:33         ` Jan Bobek
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Bennée @ 2019-07-11  6:45 UTC (permalink / raw)
  To: Jan Bobek; +Cc: Peter Maydell, Richard Henderson, QEMU Developers


Jan Bobek <jan.bobek@gmail.com> writes:

> On 7/3/19 6:01 PM, Peter Maydell wrote:
>> On Mon, 1 Jul 2019 at 05:43, Jan Bobek <jan.bobek@gmail.com> wrote:
>>>
>>> Add an x86 configuration file with all MMX instructions.
>>>
>>> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
>>
>>> --- /dev/null
>>> +++ b/x86.risu
>>> @@ -0,0 +1,96 @@
>>> +###############################################################################
>>> +# Copyright (c) 2019 Linaro Limited
>>
>> I'm guessing from your email address that this copyright line probably
>> isn't right :-)
>
> Haha indeed, I just copy-pasted it from the other files; the same goes for
> the rest of the source files.
>
> Any suggestions on what it should be? I'm not currently employed by
> anyone (as Google keeps reminding us).

It should be (c) 2019 Jan Bobek, as you wrote it. The license text should
be the same, assuming you are happy to license it, which I assume you
are given that you are contributing to RISU ;-)

>
> -Jan


--
Alex Bennée



* Re: [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: add module
  2019-07-10 18:21     ` Jan Bobek
@ 2019-07-11  9:26       ` Richard Henderson
  2019-07-11 13:10         ` Jan Bobek
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-11  9:26 UTC (permalink / raw)
  To: Jan Bobek, qemu-devel; +Cc: Alex Bennée

On 7/10/19 8:21 PM, Jan Bobek wrote:
> Doesn't B8 (without REX.W) work for x86_64, too? It zeroes the upper
> part of the destination, so it's effectively zero-extending, and it's
> one byte shorter than C7 (no ModR/M byte needed).

Sorry, I shouldn't have been quite so terse.  What I meant is

  if (!$is_x86_64 || (0 <= $imm && $imm <= 0xffffffff))

so that 32-bit always uses the 5-byte encoding instead of the 6-byte.
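
Spelled out as an untested sketch: mov_reg_imm_bytes and le_bytes are
hypothetical names, and this assumes a 64-bit Perl so that shifting a
negative immediate yields its two's-complement bytes.

```perl
use strict;
use warnings;

# Split $value into $len little-endian bytes.
sub le_bytes {
    my ($value, $len) = @_;
    return map { ($value >> (8 * $_)) & 0xff } 0 .. $len - 1;
}

# Hypothetical sketch: pick the shortest MOV reg, imm encoding.
sub mov_reg_imm_bytes {
    my ($is_x86_64, $reg, $imm) = @_;
    my $rex_b = ($is_x86_64 && $reg >= 8) ? 1 : 0;
    my @bytes;

    if (!$is_x86_64 || (0 <= $imm && $imm <= 0xffffffff)) {
        # B8+r imm32: 5 bytes; on x86_64 the upper half is zeroed.
        push @bytes, 0x41 if $rex_b;
        push @bytes, 0xB8 | ($reg & 0x7), le_bytes($imm, 4);
    } elsif (-0x80000000 <= $imm && $imm <= 0x7fffffff) {
        # REX.W C7 /0 imm32: the immediate is sign-extended to 64 bits.
        push @bytes, 0x48 | $rex_b, 0xC7, 0xC0 | ($reg & 0x7),
                     le_bytes($imm, 4);
    } else {
        # REX.W B8+r imm64: the full 10-byte form.
        push @bytes, 0x48 | $rex_b, 0xB8 | ($reg & 0x7),
                     le_bytes($imm, 8);
    }

    return @bytes;
}
```

The C7 form is then reached only on x86_64, which is why rex.w can be
set unconditionally in that branch.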


> That being said, I moved most of this function to risugen_x86_asm and
> included a bunch of comments regarding different cases, so it should
> be easier to understand.

Great.


r~



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-10 18:29     ` Jan Bobek
@ 2019-07-11  9:32       ` Richard Henderson
  2019-07-11 13:29         ` Jan Bobek
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-11  9:32 UTC (permalink / raw)
  To: Jan Bobek, qemu-devel; +Cc: Alex Bennée

On 7/10/19 8:29 PM, Jan Bobek wrote:
>>> +# Arithmetic Instructions
>>> +PADDB           MMX     00001111 11111100 !emit { modrm(); mem(size => 8); }
>>> +PADDW           MMX     00001111 11111101 !emit { modrm(); mem(size => 8); }
>>> +PADDD           MMX     00001111 11111110 !emit { modrm(); mem(size => 8); }
>>> +PADDQ           MMX     00001111 11010100 !emit { modrm(); mem(size => 8); }
> 
> Not this one, at least according to the Intel docs:
> 
> NP 0F D4 /r: PADDQ mm, mm/m64          (MMX)
> 66 0F D4 /r: PADDQ xmm1, xmm2/m128     (SSE2)
> 
> The SSE2 version is added in a later patch.

That's not how I read the Intel docs.

In the CPUID feature flag column of the MMX PADDQ, I see SSE2.  While the insn
affects the mmx registers, it was not added with the original MMX instruction set.


r~



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-10 18:32     ` Jan Bobek
@ 2019-07-11  9:34       ` Richard Henderson
  2019-07-11  9:44         ` Alex Bennée
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-11  9:34 UTC (permalink / raw)
  To: Jan Bobek, qemu-devel; +Cc: Alex Bennée

On 7/10/19 8:32 PM, Jan Bobek wrote:
> On 7/3/19 5:49 PM, Richard Henderson wrote:
>> On 7/1/19 6:35 AM, Jan Bobek wrote:
>>> +MOVQ            MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
>>> +MOVQ_mem        MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
>>
>> Oh, note that there are only 8 mmx registers, so the respective rex.{r,b} bit
>> can't be set.
> 
> Actually, my CPU chewed it without choking even when the bits were
> set, but it will be taken care of in v3.

That's interesting data.

I wonder if it's worth retaining this as a feature in order to check qemu's
implementation?
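
To make the case concrete, here is a rough sketch of the two encodings in question. It is a Python illustration only, not risugen code; the helper names rex and movq_mm_from_gpr are made up for the example, and it covers just the register-direct MOVQ mm, r64 form (REX.W 0F 6E /r).

```python
def rex(w=0, r=0, x=0, b=0):
    """Build a REX prefix byte: 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b

def movq_mm_from_gpr(mm, gpr, rex_r=0):
    """Encode MOVQ mm, r64 (REX.W 0F 6E /r) in register-direct form.

    rex_r is the bit under discussion: there are only 8 MMX registers,
    so it cannot select a different one.
    """
    if not (0 <= mm <= 7 and 0 <= gpr <= 15):
        raise ValueError("register number out of range")
    prefix = rex(w=1, r=rex_r, b=gpr >> 3)   # REX.B extends the GPR in rm
    modrm = 0xC0 | (mm << 3) | (gpr & 7)     # mod=11, reg=mm, rm=gpr
    return bytes([prefix, 0x0F, 0x6E, modrm])
```

Calling movq_mm_from_gpr(1, 0) and movq_mm_from_gpr(1, 0, rex_r=1) gives two byte strings that differ only in the REX byte; comparing how hardware and qemu treat the second one would be the check being proposed.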


r~



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-11  9:34       ` Richard Henderson
@ 2019-07-11  9:44         ` Alex Bennée
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Bennée @ 2019-07-11  9:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jan Bobek, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> On 7/10/19 8:32 PM, Jan Bobek wrote:
>> On 7/3/19 5:49 PM, Richard Henderson wrote:
>>> On 7/1/19 6:35 AM, Jan Bobek wrote:
>>>> +MOVQ            MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
>>>> +MOVQ_mem        MMX     00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
>>>
>>> Oh, note that there are only 8 mmx registers, so the respective rex.{r,b} bit
>>> can't be set.
>>
>> Actually, my CPU chewed it without choking even when the bits were
>> set, but it will be taken care of in v3.
>
> That's interesting data.
>
> I wonder if it's worth retaining this as a feature in order to check qemu's
> implementation?

We could be some time, cf. BlackHat 2017

  https://www.youtube.com/watch?v=KrksBdWcZgQ

I suspect if we set https://github.com/xoreaxeaxeax/sandsifter on QEMU
we might find a few breakages.

>
>
> r~


--
Alex Bennée



* Re: [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: add module
  2019-07-11  9:26       ` Richard Henderson
@ 2019-07-11 13:10         ` Jan Bobek
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-11 13:10 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Alex Bennée


On 7/11/19 5:26 AM, Richard Henderson wrote:
> On 7/10/19 8:21 PM, Jan Bobek wrote:
>> Doesn't B8 (without REX.W) work for x86_64, too? It zeroes the upper
>> part of the destination, so it's effectively zero-extending, and it's
>> one byte shorter than C7 (no ModR/M byte needed).
> 
> Sorry, I shouldn't have been quite so terse.  What I meant is
> 
>   if (!$is_x86_64 || (0 <= $imm && $imm <= 0xffffffff))
> 
> so that 32-bit always uses the 5-byte encoding instead of the 6-byte.

Oh, I see. I double-checked my new code and it never uses the C7 move
in 32-bit mode, but thanks for pointing it out.

-Jan



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-11  9:32       ` Richard Henderson
@ 2019-07-11 13:29         ` Jan Bobek
  2019-07-11 13:57           ` Richard Henderson
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-11 13:29 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Alex Bennée


On 7/11/19 5:32 AM, Richard Henderson wrote:
> On 7/10/19 8:29 PM, Jan Bobek wrote:
>>>> +# Arithmetic Instructions
>>>> +PADDB           MMX     00001111 11111100 !emit { modrm(); mem(size => 8); }
>>>> +PADDW           MMX     00001111 11111101 !emit { modrm(); mem(size => 8); }
>>>> +PADDD           MMX     00001111 11111110 !emit { modrm(); mem(size => 8); }
>>>> +PADDQ           MMX     00001111 11010100 !emit { modrm(); mem(size => 8); }
>>
>> Not this one, at least according to the Intel docs:
>>
>> NP 0F D4 /r: PADDQ mm, mm/m64          (MMX)
>> 66 0F D4 /r: PADDQ xmm1, xmm2/m128     (SSE2)
>>
>> The SSE2 version is added in a later patch.
> 
> That's not how I read the Intel docs.
> 
> In the CPUID feature flag column of the MMX PADDQ, I see SSE2.  While the insn
> affects the mmx registers, it was not added with the original MMX instruction set.

I know what you mean; for example, PSUBQ is like that. I know about
these kinds of instructions because "{name}_{enc}" does not form a
unique key, and risugen would complain about that. That's why there is
PSUBQ_mm and PSUBQ in the final x86.risu file.

However, I downloaded a fresh copy of Intel SDM off the Intel website
this morning (just to make sure) and in Volume 2B, Section "4.3
Instructions (M-U)," page 4-208 titled "PADDB/PADDW/PADDD/PADDQ—Add
Packed Integers," there's the NP 0F D4 /r PADDQ mm, mm/m64 instruction
in the 4th row, and the CPUID column says MMX. On the other hand, I
can't find it in the Volume 1, Section 5.4 "MMX(tm) Instructions," or
in Vol. 1, Chapter 9 "Programming with Intel(R) MMX(tm) Technology,"
so it's a bit confusing.

If you know for a fact that it didn't come until SSE2 and the manual
is wrong, I will change it.

-Jan



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-11  6:45       ` Alex Bennée
@ 2019-07-11 13:33         ` Jan Bobek
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-11 13:33 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Peter Maydell, Richard Henderson, QEMU Developers




On 7/11/19 2:45 AM, Alex Bennée wrote:
> 
> Jan Bobek <jan.bobek@gmail.com> writes:
> 
>> On 7/3/19 6:01 PM, Peter Maydell wrote:
>>> On Mon, 1 Jul 2019 at 05:43, Jan Bobek <jan.bobek@gmail.com> wrote:
>>>>
>>>> Add an x86 configuration file with all MMX instructions.
>>>>
>>>> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
>>>
>>>> --- /dev/null
>>>> +++ b/x86.risu
>>>> @@ -0,0 +1,96 @@
>>>> +###############################################################################
>>>> +# Copyright (c) 2019 Linaro Limited
>>>
>>> I'm guessing from your email address that this copyright line probably
>>> isn't right :-)
>>
>> Haha indeed, I just copy-pasted it from the other files; the same goes for
>> the rest of the source files.
>>
>> Any suggestions on what it should be? I'm not currently employed by
>> anyone (as Google keeps reminding us).
> 
> It should be (c) 2019 Jan Bobek as you wrote it. The license text should
> be the same (assuming you are happy to license it, which I assume you
> are given you are contributing to RISU ;-)

Sounds great, thank you!

-Jan



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-11 13:29         ` Jan Bobek
@ 2019-07-11 13:57           ` Richard Henderson
  2019-07-11 21:29             ` Jan Bobek
  0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-11 13:57 UTC (permalink / raw)
  To: Jan Bobek, qemu-devel; +Cc: Alex Bennée

On 7/11/19 3:29 PM, Jan Bobek wrote:
> However, I downloaded a fresh copy of Intel SDM off the Intel website
> this morning (just to make sure) and in Volume 2B, Section "4.3
> Instructions (M-U)," page 4-208 titled "PADDB/PADDW/PADDD/PADDQ—Add
> Packed Integers," there's the NP 0F D4 /r PADDQ mm, mm/m64 instruction
> in the 4th row, and the CPUID column says MMX. On the other hand, I
> can't find it in the Volume 1, Section 5.4 "MMX(tm) Instructions," or
> in Vol. 1, Chapter 9 "Programming with Intel(R) MMX(tm) Technology,"
> so it's a bit confusing.
> 
> If you know for a fact that it didn't come until SSE2 and the manual
> is wrong, I will change it.

Interesting.  I see what you see in

  253665-069US January 2019

but I first looked at

  325462-058US April 2016

which definitely has this marked as SSE2.

In the 2019 version, "5.6.3 SSE2 128-Bit SIMD Integer Instructions" is the
first mention of PADDQ.  Whereas "5.4.3 MMX Packed Arithmetic Instructions"
mentions PADD{B,W,D} but not Q.

I tend to think that this is a bug in the current manual.

Checking in binutils I see

> paddq, 2, 0x660fd4, None, 2, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> paddq, 2, 0xfd4, None, 2, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoAVX, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }

and both contain CpuSSE2. If you like, I could run this by one of the Intel GCC
folk to be sure.
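
As a small illustration of what the CPUID column implies in practice, the sketch below decodes the relevant feature bits of CPUID leaf 1 (EDX bit 23 = MMX, bit 25 = SSE, bit 26 = SSE2) and gates both PADDQ forms on SSE2, following the binutils rows above. It is Python for illustration only; the names EDX_FEATURE_BITS, PADDQ_REQUIRES, features and usable are assumptions for the example, and the EDX values used with it are synthetic, not read from a real CPU.

```python
# CPUID leaf 1 reports these features in EDX.
EDX_FEATURE_BITS = {"MMX": 23, "SSE": 25, "SSE2": 26}

# Feature each PADDQ form is gated on, following the binutils rows above.
PADDQ_REQUIRES = {
    "PADDQ mm, mm/m64": "SSE2",
    "PADDQ xmm1, xmm2/m128": "SSE2",
}

def features(edx):
    """Return the set of feature names whose bit is set in EDX."""
    return {name for name, bit in EDX_FEATURE_BITS.items() if (edx >> bit) & 1}

def usable(insn, edx):
    """True if the feature required by insn is reported in EDX."""
    return PADDQ_REQUIRES[insn] in features(edx)
```

Under this reading, an MMX-only part (just bit 23 set) would not accept PADDQ mm, mm/m64 even though the instruction operates on MMX registers.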


r~



* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
  2019-07-11 13:57           ` Richard Henderson
@ 2019-07-11 21:29             ` Jan Bobek
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-11 21:29 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: Alex Bennée


On 7/11/19 9:57 AM, Richard Henderson wrote:
> On 7/11/19 3:29 PM, Jan Bobek wrote:
>> However, I downloaded a fresh copy of Intel SDM off the Intel website
>> this morning (just to make sure) and in Volume 2B, Section "4.3
>> Instructions (M-U)," page 4-208 titled "PADDB/PADDW/PADDD/PADDQ—Add
>> Packed Integers," there's the NP 0F D4 /r PADDQ mm, mm/m64 instruction
>> in the 4th row, and the CPUID column says MMX. On the other hand, I
>> can't find it in the Volume 1, Section 5.4 "MMX(tm) Instructions," or
>> in Vol. 1, Chapter 9 "Programming with Intel(R) MMX(tm) Technology,"
>> so it's a bit confusing.
>>
>> If you know for a fact that it didn't come until SSE2 and the manual
>> is wrong, I will change it.
> 
> Interesting.  I see what you see in
> 
>   253665-069US January 2019
> 
> but I first looked at
> 
>   325462-058US April 2016
> 
> which definitely has this marked as SSE2.
> 
> In the 2019 version, "5.6.3 SSE2 128-Bit SIMD Integer Instructions" is the
> first mention of PADDQ.  Whereas "5.4.3 MMX Packed Arithmetic Instructions"
> mentions PADD{B,W,D} but not Q.
> 
> I tend to think that this is a bug in the current manual.
> 
> Checking in binutils I see
> 
>> paddq, 2, 0x660fd4, None, 2, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>> paddq, 2, 0xfd4, None, 2, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoAVX, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
> 
> and both contain CpuSSE2. If you like, I could run this by one of the Intel GCC
> folk to be sure.

I think this is convincing enough for me; it was a good idea to check
binutils! I find it interesting that they'd get it wrong in a more
recent version of the manual, though.

-Jan



