* [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images
@ 2019-07-01 4:35 Jan Bobek
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill Jan Bobek
` (12 more replies)
0 siblings, 13 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-01 4:35 UTC
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
This is v2 of the patch series first posted in [1]. This version also
implements the VEX prefix, so all SIMD extensions up to and including
AVX2 are supported. Notable exceptions are LDMXCSR (memory contents
cannot be constrained yet) and all forms of VGATHER (VSIB is not yet
implemented).
Note that this is still not the final version; I am planning to
implement randomization of VSIB to test VGATHER, and improve the way
registers are randomized (as discussed in e.g. [2]).
Changes since v1:
- risugen_common: rewrote insnv to make it clearer, added a comment
to randint_constr;
- risugen_x86_asm: fixed a typo in rex_encode;
- risugen_x86: use more than one opcode in write_mov_reg_imm to
optimize space usage;
- x86.risu: added all SIMD extensions up to AVX2.
References:
1. https://lists.nongnu.org/archive/html/qemu-devel/2019-06/msg04123.html
2. https://lists.nongnu.org/archive/html/qemu-devel/2019-06/msg06489.html
Jan Bobek (14):
risugen_common: add insnv, randint_constr, rand_fill
risugen_x86_asm: add module
risugen_x86_emit: add module
risugen_x86: add module
risugen: allow all byte-aligned instructions
x86.risu: add MMX instructions
x86.risu: add SSE instructions
x86.risu: add SSE2 instructions
x86.risu: add SSE3 instructions
x86.risu: add SSSE3 instructions
x86.risu: add SSE4.1 and SSE4.2 instructions
x86.risu: add AES and PCLMULQDQ instructions
x86.risu: add AVX instructions
x86.risu: add AVX2 instructions
risugen | 15 +-
risugen_common.pm | 107 ++++-
risugen_x86.pm | 498 +++++++++++++++++++++
risugen_x86_asm.pm | 252 +++++++++++
risugen_x86_emit.pm | 91 ++++
x86.risu | 1026 +++++++++++++++++++++++++++++++++++++++++++
6 files changed, 1977 insertions(+), 12 deletions(-)
create mode 100644 risugen_x86.pm
create mode 100644 risugen_x86_asm.pm
create mode 100644 risugen_x86_emit.pm
create mode 100644 x86.risu
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill
2019-07-01 4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
@ 2019-07-01 4:35 ` Jan Bobek
2019-07-03 15:22 ` Richard Henderson
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 02/14] risugen_x86_asm: add module Jan Bobek
` (11 subsequent siblings)
12 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-01 4:35 UTC
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
Add three common utility functions:
- insnv allows emitting variable-length instructions in little-endian
or big-endian byte order; it subsumes the functionality of the
insn16() and insn32() functions, which now call it.
- randint_constr allows generating random integers according to
several constraints passed as arguments.
- rand_fill uses randint_constr to fill a given hash with
(optionally constrained) random values.
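To illustrate the byte-order behaviour described for insnv, here is a
minimal standalone sketch. The helper name insn_bytes is hypothetical
and not part of risugen_common; unlike insnv it works one byte at a
time and returns the bytes instead of writing them to the output file.
It assumes the same default as insnv (big-endian unless told otherwise).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of risugen_common): return the bytes of
# $args{value} as a list, $args{len} bytes long, in the requested order.
sub insn_bytes {
    my (%args) = @_;

    # Mirror insnv's default: big-endian unless told otherwise.
    my $bigendian = defined $args{bigendian} ? $args{bigendian} : 1;

    my @bytes;
    for my $i (0 .. $args{len} - 1) {
        # Byte $i, counted from the least significant end.
        push @bytes, ($args{value} >> (8 * $i)) & 0xFF;
    }

    # @bytes is in little-endian order here; reverse it for big-endian.
    @bytes = reverse @bytes if $bigendian;
    return @bytes;
}

# A two-byte opcode written as 0x0F28 comes out as "0F 28" in big-endian
# order and as "28 0F" in little-endian order.
for my $be (1, 0) {
    my @b = insn_bytes(value => 0x0F28, len => 2, bigendian => $be);
    print join(" ", map { sprintf "%02X", $_ } @b), "\n";
}
```

Big-endian as the default keeps the emitted bytes in the same order as
they are written in the configuration file, while displacements and
immediates are passed with bigendian => 0.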
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
risugen_common.pm | 107 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 101 insertions(+), 6 deletions(-)
diff --git a/risugen_common.pm b/risugen_common.pm
index 71ee996..c5d861e 100644
--- a/risugen_common.pm
+++ b/risugen_common.pm
@@ -23,7 +23,8 @@ BEGIN {
require Exporter;
our @ISA = qw(Exporter);
- our @EXPORT = qw(open_bin close_bin set_endian insn32 insn16 $bytecount
+ our @EXPORT = qw(open_bin close_bin set_endian insn32 insn16
+ $bytecount insnv randint_constr rand_fill
progress_start progress_update progress_end
eval_with_fields is_pow_of_2 sextract ctz
dump_insn_details);
@@ -37,7 +38,7 @@ my $bigendian = 0;
# (default is little endian, 0).
sub set_endian
{
- $bigendian = @_;
+ ($bigendian) = @_;
}
sub open_bin
@@ -52,18 +53,112 @@ sub close_bin
close(BIN) or die "can't close output file: $!";
}
+sub insnv(%)
+{
+ my (%args) = @_;
+
+ # Default to big-endian order, so that the instruction bytes are
+ # emitted in the same order as they are written in the
+ # configuration file.
+ $args{bigendian} = 1 unless defined $args{bigendian};
+
+ my $bitcur = 0;
+ my $bitend = 8 * $args{len};
+ while ($bitcur < $bitend) {
+ my $format;
+ my $bitlen;
+
+ if ($bitcur + 64 <= $bitend) {
+ $format = "Q";
+ $bitlen = 64;
+ } elsif ($bitcur + 32 <= $bitend) {
+ $format = "L";
+ $bitlen = 32;
+ } elsif ($bitcur + 16 <= $bitend) {
+ $format = "S";
+ $bitlen = 16;
+ } else {
+ $format = "C";
+ $bitlen = 8;
+ }
+
+ $format .= ($args{bigendian} ? ">" : "<") if $bitlen > 8;
+
+ my $bitmask = (1 << $bitlen) - 1;
+ my $value = $args{value} >> ($args{bigendian}
+ ? $bitend - $bitcur - $bitlen
+ : $bitcur);
+
+ print BIN pack($format, $value & $bitmask);
+ $bytecount += $bitlen / 8;
+
+ $bitcur += $bitlen;
+ }
+}
+
sub insn32($)
{
my ($insn) = @_;
- print BIN pack($bigendian ? "N" : "V", $insn);
- $bytecount += 4;
+ insnv(value => $insn, len => 4, bigendian => $bigendian);
}
sub insn16($)
{
my ($insn) = @_;
- print BIN pack($bigendian ? "n" : "v", $insn);
- $bytecount += 2;
+ insnv(value => $insn, len => 2, bigendian => $bigendian);
+}
+
+sub randint_constr(%)
+{
+ my (%args) = @_;
+ my $bitlen = $args{bitlen};
+ my $halfrange = 1 << ($bitlen - 1);
+
+ while (1) {
+ my $value = int(rand(2 * $halfrange));
+ $value -= $halfrange if defined $args{signed} && $args{signed};
+ $value &= ~$args{fixedbitmask} if defined $args{fixedbitmask};
+ $value |= $args{fixedbits} if defined $args{fixedbits};
+
+ if (defined $args{constraint}) {
+ # The idea is: if the most significant bit of
+ # $args{constraint} is zero, $args{constraint} is the
+ # value we want to return; if the most significant bit is
+ # one, ~$args{constraint} (its bit inversion) is the value
+ # we want to *avoid*, so we try again.
+
+ if (!($args{constraint} >> 63)) {
+ $value = $args{constraint};
+ } elsif ($value == ~$args{constraint}) {
+ next;
+ }
+ }
+
+ return $value;
+ }
+}
+
+sub rand_fill($$)
+{
+ my ($target, $constraints) = @_;
+
+ for (keys %{$target}) {
+ my %args = (bitlen => $target->{$_}{bitlen});
+
+ $args{fixedbits} = $target->{$_}{fixedbits}
+ if defined $target->{$_}{fixedbits};
+ $args{fixedbitmask} = $target->{$_}{fixedbitmask}
+ if defined $target->{$_}{fixedbitmask};
+ $args{signed} = $target->{$_}{signed}
+ if defined $target->{$_}{signed};
+
+ $args{constraint} = $constraints->{$_}
+ if defined $constraints->{$_};
+
+ $target->{$_} = randint_constr(%args);
+ }
+
+ return $target;
}
# Progress bar implementation
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 02/14] risugen_x86_asm: add module
2019-07-01 4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill Jan Bobek
@ 2019-07-01 4:35 ` Jan Bobek
2019-07-03 15:37 ` Richard Henderson
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 03/14] risugen_x86_emit: " Jan Bobek
` (10 subsequent siblings)
12 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-01 4:35 UTC
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
The module risugen_x86_asm.pm exports several constants and the
function write_insn, which work in tandem to allow emission of x86
instructions in a clearer and more structured manner.
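As a rough illustration of the kind of encoding this module performs,
the sketch below packs a ModR/M byte and a 2-byte VEX prefix from their
fields. The function names modrm_byte and vex2_bytes are hypothetical
and only loosely follow the module's helpers; as in the module, the
VEX.R and VEX.vvvv fields are assumed to be passed in their stored
(inverted) form, so 1 and 0b1111 mean "unused".

```perl
use strict;
use warnings;

# Hypothetical sketch: pack the ModR/M fields into one byte.
# mod occupies bits 7-6, reg bits 5-3, rm bits 2-0.
sub modrm_byte {
    my ($mod, $reg, $rm) = @_;
    die "MOD field out-of-range: $mod" unless 0 <= $mod && $mod <= 3;
    die "REG field out-of-range: $reg" unless 0 <= $reg && $reg <= 7;
    die "RM field out-of-range: $rm"   unless 0 <= $rm  && $rm  <= 7;
    return ($mod << 6) | ($reg << 3) | $rm;
}

# Hypothetical sketch: 2-byte VEX prefix, 0xC5 followed by R vvvv L pp.
sub vex2_bytes {
    my (%f) = @_;
    my $r = defined $f{r} ? $f{r} : 1;        # stored form, 1 = no extension
    my $v = defined $f{v} ? $f{v} : 0b1111;   # stored form, all ones = unused
    my $p = defined $f{p} ? $f{p} : 0b00;     # no implied legacy prefix
    die "v field out-of-range: $v" unless 0 <= $v && $v <= 0b1111;
    die "p field out-of-range: $p" unless 0 <= $p && $p <= 0b11;
    return (0xC5,
            (($r ? 1 : 0) << 7) | ($v << 3) | (($f{l} ? 1 : 0) << 2) | $p);
}

# Register-direct ModR/M with reg = 0 and rm = 1.
printf "%02X\n", modrm_byte(0b11, 0, 1);
# 256-bit vector length with the 0x66-equivalent prefix field.
printf "%02X %02X\n", vex2_bytes(l => 1, p => 0b01);
```

Keeping each prefix or byte in its own small encoder is what lets
write_insn take a structured description of an instruction rather than
a hand-assembled byte string.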
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
risugen_x86_asm.pm | 252 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 252 insertions(+)
create mode 100644 risugen_x86_asm.pm
diff --git a/risugen_x86_asm.pm b/risugen_x86_asm.pm
new file mode 100644
index 0000000..5640531
--- /dev/null
+++ b/risugen_x86_asm.pm
@@ -0,0 +1,252 @@
+#!/usr/bin/perl -w
+###############################################################################
+# Copyright (c) 2019 Linaro Limited
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+# Jan Bobek - initial implementation
+###############################################################################
+
+# risugen_x86_asm -- risugen_x86's helper module for x86 assembly
+package risugen_x86_asm;
+
+use strict;
+use warnings;
+
+use risugen_common;
+
+our @ISA = qw(Exporter);
+our @EXPORT = qw(
+ write_insn
+ VEX_L_128 VEX_L_256
+ VEX_P_NONE VEX_P_DATA16 VEX_P_REP VEX_P_REPNE
+ VEX_M_0F VEX_M_0F38 VEX_M_0F3A
+ VEX_V_UNUSED
+ REG_EAX REG_ECX REG_EDX REG_EBX REG_ESP REG_EBP REG_ESI REG_EDI
+ MOD_INDIRECT MOD_INDIRECT_DISP8 MOD_INDIRECT_DISP32 MOD_DIRECT
+ X86PFX_DATA16 X86PFX_REPNE X86PFX_REP
+ X86OP_LEA X86OP_XOR X86OP_ALU_imm8 X86OP_MOV X86OP_SAHF X86OP_CALL
+ X86OP_JMP X86OP_UD1 X86OP_VMOVAPS X86OP_MOVAPS
+ );
+
+use constant {
+ VEX_L_128 => 0,
+ VEX_L_256 => 1,
+
+ VEX_P_NONE => 0b00,
+ VEX_P_DATA16 => 0b01,
+ VEX_P_REP => 0b10,
+ VEX_P_REPNE => 0b11,
+
+ VEX_M_0F => 0b00001,
+ VEX_M_0F38 => 0b00010,
+ VEX_M_0F3A => 0b00011,
+
+ VEX_V_UNUSED => 0b1111,
+
+ REG_EAX => 0,
+ REG_ECX => 1,
+ REG_EDX => 2,
+ REG_EBX => 3,
+ REG_ESP => 4,
+ REG_EBP => 5,
+ REG_ESI => 6,
+ REG_EDI => 7,
+
+ MOD_INDIRECT => 0b00,
+ MOD_INDIRECT_DISP8 => 0b01,
+ MOD_INDIRECT_DISP32 => 0b10,
+ MOD_DIRECT => 0b11,
+
+ X86PFX_DATA16 => {value => 0x66, len => 1},
+ X86PFX_REPNE => {value => 0xF2, len => 1},
+ X86PFX_REP => {value => 0xF3, len => 1},
+
+ X86OP_LEA => {value => 0x8D, len => 1},
+ X86OP_XOR => {value => 0x33, len => 1},
+ X86OP_ALU_imm8 => {value => 0x83, len => 1},
+ X86OP_MOV => {value => 0x8B, len => 1},
+ X86OP_SAHF => {value => 0x9E, len => 1},
+ X86OP_CALL => {value => 0xE8, len => 1},
+ X86OP_JMP => {value => 0xE9, len => 1},
+
+ X86OP_UD1 => {value => 0x0FB9, len => 2},
+ X86OP_VMOVAPS => {value => 0x28, len => 1},
+ X86OP_MOVAPS => {value => 0x0F28, len => 2},
+};
+
+sub rex_encode(%)
+{
+ my (%args) = @_;
+
+ $args{w} = 0 unless defined $args{w};
+ $args{r} = 0 unless defined $args{r};
+ $args{x} = 0 unless defined $args{x};
+ $args{b} = 0 unless defined $args{b};
+
+ return (value => 0x40
+ | (($args{w} ? 1 : 0) << 3)
+ | (($args{r} ? 1 : 0) << 2)
+ | (($args{x} ? 1 : 0) << 1)
+ | ($args{b} ? 1 : 0),
+ len => 1);
+}
+
+sub vex_encode(%)
+{
+ my (%args) = @_;
+
+ $args{r} = 1 unless defined $args{r};
+ $args{x} = 1 unless defined $args{x};
+ $args{b} = 1 unless defined $args{b};
+ $args{v} = VEX_V_UNUSED unless defined $args{v};
+ $args{p} = VEX_P_NONE unless defined $args{p};
+
+ die "l field undefined"
+ unless defined $args{l};
+ die "v field out-of-range: $args{v}"
+ unless 0b0000 <= $args{v} && $args{v} <= 0b1111;
+ die "p field out-of-range: $args{p}"
+ unless 0b00 <= $args{p} && $args{p} <= 0b11;
+
+ if ($args{x} && $args{b} && !defined $args{m} && !defined $args{w}) {
+ # We can use the 2-byte VEX prefix
+ return (value => (0xC5 << 8)
+ | (($args{r} ? 1 : 0) << 7)
+ | ($args{v} << 3)
+ | (($args{l} ? 1 : 0) << 2)
+ | $args{p},
+ len => 2);
+ } else {
+ # We have to use the 3-byte VEX prefix
+ die "m field undefined"
+ unless defined $args{m};
+ die "m field out-of-range: $args{m}"
+ unless 0b00000 <= $args{m} && $args{m} <= 0b11111;
+ die "w field undefined"
+ unless defined $args{w};
+
+ return (value => (0xC4 << 16)
+ | (($args{r} ? 1 : 0) << 15)
+ | (($args{x} ? 1 : 0) << 14)
+ | (($args{b} ? 1 : 0) << 13)
+ | ($args{m} << 8)
+ | (($args{w} ? 1 : 0) << 7)
+ | ($args{v} << 3)
+ | (($args{l} ? 1 : 0) << 2)
+ | $args{p},
+ len => 3);
+ }
+}
+
+sub modrm_encode(%)
+{
+ my (%args) = @_;
+
+ die "MOD field out-of-range: $args{mod}"
+ unless 0 <= $args{mod} && $args{mod} <= 3;
+ die "REG field out-of-range: $args{reg}"
+ unless 0 <= $args{reg} && $args{reg} <= 7;
+ die "RM field out-of-range: $args{rm}"
+ unless 0 <= $args{rm} && $args{rm} <= 7;
+
+ return (value =>
+ ($args{mod} << 6)
+ | ($args{reg} << 3)
+ | $args{rm},
+ len => 1);
+}
+
+sub sib_encode(%)
+{
+ my (%args) = @_;
+
+ die "SS field out-of-range: $args{ss}"
+ unless 0 <= $args{ss} && $args{ss} <= 3;
+ die "INDEX field out-of-range: $args{index}"
+ unless 0 <= $args{index} && $args{index} <= 7;
+ die "BASE field out-of-range: $args{base}"
+ unless 0 <= $args{base} && $args{base} <= 7;
+
+ return (value =>
+ ($args{ss} << 6)
+ | ($args{index} << 3)
+ | $args{base},
+ len => 1);
+}
+
+sub write_insn(%)
+{
+ my (%insn) = @_;
+
+ my @tokens;
+ push @tokens, "EVEX" if defined $insn{evex};
+ push @tokens, "VEX" if defined $insn{vex};
+ push @tokens, "REP" if defined $insn{rep};
+ push @tokens, "REPNE" if defined $insn{repne};
+ push @tokens, "DATA16" if defined $insn{data16};
+ push @tokens, "REX" if defined $insn{rex};
+ push @tokens, "OP" if defined $insn{opcode};
+ push @tokens, "MODRM" if defined $insn{modrm};
+ push @tokens, "SIB" if defined $insn{sib};
+ push @tokens, "DISP" if defined $insn{disp};
+ push @tokens, "IMM" if defined $insn{imm};
+ push @tokens, "END";
+
+ # (EVEX | VEX | ((REP | REPNE)? DATA16? REX?)) OP (MODRM SIB? DISP?)? IMM? END
+
+ my $token = shift @tokens;
+ if ($token eq "EVEX") {
+ insnv(evex_encode(%{$insn{evex}}));
+ $token = shift @tokens;
+ } elsif ($token eq "VEX") {
+ insnv(vex_encode(%{$insn{vex}}));
+ $token = shift @tokens;
+ } else {
+ if ($token eq "REP") {
+ insnv(%{&X86PFX_REP});
+ $token = shift @tokens;
+ } elsif ($token eq "REPNE") {
+ insnv(%{&X86PFX_REPNE});
+ $token = shift @tokens;
+ }
+ if ($token eq "DATA16") {
+ insnv(%{&X86PFX_DATA16});
+ $token = shift @tokens;
+ }
+ if ($token eq "REX") {
+ insnv(rex_encode(%{$insn{rex}}));
+ $token = shift @tokens;
+ }
+ }
+
+ die "Unexpected instruction tokens where OP expected: $token @tokens\n"
+ unless $token eq "OP";
+
+ insnv(%{$insn{opcode}});
+ $token = shift @tokens;
+
+ if ($token eq "MODRM") {
+ insnv(modrm_encode(%{$insn{modrm}}));
+ $token = shift @tokens;
+
+ if ($token eq "SIB") {
+ insnv(sib_encode(%{$insn{sib}}));
+ $token = shift @tokens;
+ }
+ if ($token eq "DISP") {
+ insnv(%{$insn{disp}}, bigendian => 0);
+ $token = shift @tokens;
+ }
+ }
+ if ($token eq "IMM") {
+ insnv(%{$insn{imm}}, bigendian => 0);
+ $token = shift @tokens;
+ }
+
+ die "Unexpected junk tokens at the end of instruction: $token @tokens\n"
+ unless $token eq "END";
+}
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 03/14] risugen_x86_emit: add module
2019-07-01 4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill Jan Bobek
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 02/14] risugen_x86_asm: add module Jan Bobek
@ 2019-07-01 4:35 ` Jan Bobek
2019-07-03 15:47 ` Richard Henderson
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: " Jan Bobek
` (9 subsequent siblings)
12 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-01 4:35 UTC
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
The helper module risugen_x86_emit.pm exports a single function
"parse_emitblock", which serves to capture and return instruction
constraints described by "emit" blocks in an x86 configuration file.
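The general idea of capturing constraints from an evaluated block can
be sketched as follows. This is a simplified standalone illustration:
the function parse_block_text and the example block text are
hypothetical, and a plain string eval stands in for eval_with_fields.
Each keyword in the block is an ordinary Perl function that records its
arguments in a shared hash.

```perl
use strict;
use warnings;

# Options collected while a block is being evaluated.
my $emit_opts;

# Each keyword simply stores its argument list under its own name.
sub vex   { my (%opts) = @_; $emit_opts->{vex}   = \%opts; }
sub modrm { my (%opts) = @_; $emit_opts->{modrm} = \%opts; }

# Hypothetical stand-in for parse_emitblock: evaluate the block text as
# Perl code and return whatever the keyword functions recorded.
sub parse_block_text {
    my ($text) = @_;
    $emit_opts = {};
    eval $text;
    die "emit block failed: $@" if $@;
    return $emit_opts;
}

# Illustrative block text only; not taken from x86.risu.
my $opts = parse_block_text('vex(l => 0, p => 1); modrm();');
print join(",", sort keys %{$opts}), "\n";
```

A keyword called with no arguments still leaves an (empty) entry in the
hash, so the caller can tell "present but unconstrained" apart from
"absent".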
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
risugen | 2 +-
risugen_x86_emit.pm | 91 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 92 insertions(+), 1 deletion(-)
create mode 100644 risugen_x86_emit.pm
diff --git a/risugen b/risugen
index e690b18..fe3d00e 100755
--- a/risugen
+++ b/risugen
@@ -43,7 +43,7 @@ my @pattern_re = (); # include pattern
my @not_pattern_re = (); # exclude pattern
# Valid block names (keys in blocks hash)
-my %valid_blockname = ( constraints => 1, memory => 1 );
+my %valid_blockname = ( constraints => 1, memory => 1, emit => 1 );
sub parse_risu_directive($$@)
{
diff --git a/risugen_x86_emit.pm b/risugen_x86_emit.pm
new file mode 100644
index 0000000..127a524
--- /dev/null
+++ b/risugen_x86_emit.pm
@@ -0,0 +1,91 @@
+#!/usr/bin/perl -w
+###############################################################################
+# Copyright (c) 2019 Linaro Limited
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+# Jan Bobek - initial implementation
+###############################################################################
+
+# risugen_x86_emit -- risugen_x86's helper module for emit blocks
+package risugen_x86_emit;
+
+use strict;
+use warnings;
+
+use risugen_common;
+use risugen_x86_asm;
+
+our @ISA = qw(Exporter);
+our @EXPORT = qw(parse_emitblock);
+
+my $emit_opts;
+
+sub rep(%)
+{
+ my (%opts) = @_;
+ $emit_opts->{rep} = \%opts;
+}
+
+sub repne(%)
+{
+ my (%opts) = @_;
+ $emit_opts->{repne} = \%opts;
+}
+
+sub data16(%)
+{
+ my (%opts) = @_;
+ $emit_opts->{data16} = \%opts;
+}
+
+sub rex(%)
+{
+ my (%opts) = @_;
+ $emit_opts->{rex} = \%opts;
+}
+
+sub vex(%)
+{
+ my (%opts) = @_;
+ $emit_opts->{vex} = \%opts;
+}
+
+sub modrm(%)
+{
+ my (%opts) = @_;
+ $emit_opts->{modrm} = \%opts;
+}
+
+sub mem(%)
+{
+ my (%opts) = @_;
+ $emit_opts->{mem} = \%opts;
+}
+
+sub imm(%)
+{
+ my (%opts) = @_;
+ $emit_opts->{imm} = \%opts;
+}
+
+sub parse_emitblock($$)
+{
+ my ($rec, $insn) = @_;
+ my $insnname = $rec->{name};
+ my $opcode = $insn->{opcode}{value};
+
+ $emit_opts = {};
+
+ my $emitblock = $rec->{blocks}{"emit"};
+ if (defined $emitblock) {
+ eval_with_fields($insnname, $opcode, $rec, "emit", $emitblock);
+ }
+
+ return $emit_opts;
+}
+
+1;
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: add module
2019-07-01 4:35 [Qemu-devel] [RISU RFC PATCH v2 00/14] Support for generating x86 MMX/SSE/AVX test images Jan Bobek
` (2 preceding siblings ...)
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 03/14] risugen_x86_emit: " Jan Bobek
@ 2019-07-01 4:35 ` Jan Bobek
2019-07-03 16:11 ` Richard Henderson
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 05/14] risugen: allow all byte-aligned instructions Jan Bobek
` (8 subsequent siblings)
12 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-01 4:35 UTC
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
The risugen_x86.pm module contains most of the code specific to the
Intel i386 and x86_64 architectures. This commit also adds the --x86_64
option, which enables emission of 64-bit (rather than 32-bit) assembly.
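One place where 64-bit mode changes the emitted bytes is the REX
prefix. The sketch below is a hypothetical standalone helper (not the
module's write_mov_reg_imm) showing the 64-bit immediate form of MOV:
REX.W selects the 64-bit operand size and REX.B extends the register
number for r8-r15. It assumes a Perl built with 64-bit integers, which
the "Q<" pack format requires.

```perl
use strict;
use warnings;

# Hypothetical sketch: bytes of "mov reg, imm64" as a lowercase hex string.
sub mov_reg_imm64_hex {
    my ($reg, $imm) = @_;

    # REX is 0100WRXB; set W for 64-bit operand size, B for regs 8-15.
    my $rex = 0x40 | (1 << 3) | ($reg >= 8 ? 1 : 0);

    # The low three bits of the register number live in the opcode byte.
    my $opcode = 0xB8 | ($reg & 0x7);

    # The immediate follows in little-endian order.
    my $bytes = pack("CC", $rex, $opcode) . pack("Q<", $imm);
    return unpack("H*", $bytes);
}

print mov_reg_imm64_hex(0, 0x11223344), "\n";   # register 0 (rax)
print mov_reg_imm64_hex(9, 0x11223344), "\n";   # register 9 (r9)
```

In 32-bit mode no REX prefix can be emitted at all, which is why the
module drops any REX options unless --x86_64 is given.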
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
risugen | 6 +-
risugen_x86.pm | 498 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 503 insertions(+), 1 deletion(-)
create mode 100644 risugen_x86.pm
diff --git a/risugen b/risugen
index fe3d00e..09a702a 100755
--- a/risugen
+++ b/risugen
@@ -310,6 +310,7 @@ Valid options:
Useful to test before support for FP is available.
--sve : enable sve floating point
--be : generate instructions in Big-Endian byte order (ppc64 only).
+ --x86_64 : generate 64-bit (rather than 32-bit) x86 code.
--help : print this message
EOT
}
@@ -322,6 +323,7 @@ sub main()
my $fp_enabled = 1;
my $sve_enabled = 0;
my $big_endian = 0;
+ my $is_x86_64 = 0;
my ($infile, $outfile);
GetOptions( "help" => sub { usage(); exit(0); },
@@ -338,6 +340,7 @@ sub main()
},
"be" => sub { $big_endian = 1; },
"no-fp" => sub { $fp_enabled = 0; },
+ "x86_64" => sub { $is_x86_64 = 1; },
"sve" => sub { $sve_enabled = 1; },
) or return 1;
# allow "--pattern re,re" and "--pattern re --pattern re"
@@ -372,7 +375,8 @@ sub main()
'keys' => \@insn_keys,
'arch' => $full_arch[0],
'subarch' => $full_arch[1] || '',
- 'bigendian' => $big_endian
+ 'bigendian' => $big_endian,
+ 'x86_64' => $is_x86_64
);
write_test_code(\%params);
diff --git a/risugen_x86.pm b/risugen_x86.pm
new file mode 100644
index 0000000..fd16c45
--- /dev/null
+++ b/risugen_x86.pm
@@ -0,0 +1,498 @@
+#!/usr/bin/perl -w
+###############################################################################
+# Copyright (c) 2019 Linaro Limited
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+# Jan Bobek - initial implementation
+###############################################################################
+
+# risugen_x86 -- risugen module for Intel i386/x86_64 architectures
+package risugen_x86;
+
+use strict;
+use warnings;
+
+use risugen_common;
+use risugen_x86_asm;
+use risugen_x86_emit;
+
+require Exporter;
+
+our @ISA = qw(Exporter);
+our @EXPORT = qw(write_test_code);
+
+use constant {
+ RISUOP_COMPARE => 0, # compare registers
+ RISUOP_TESTEND => 1, # end of test, stop
+ RISUOP_SETMEMBLOCK => 2, # eax is address of memory block (8192 bytes)
+ RISUOP_GETMEMBLOCK => 3, # add the address of memory block to eax
+ RISUOP_COMPAREMEM => 4, # compare memory block
+
+ # Maximum alignment restriction permitted for a memory op.
+ MAXALIGN => 64,
+ MEMBLOCK_LEN => 8192,
+};
+
+my $periodic_reg_random = 1;
+my $is_x86_64 = 0;
+
+sub write_risuop($)
+{
+ my ($op) = @_;
+
+ write_insn(opcode => X86OP_UD1,
+ modrm => {mod => MOD_DIRECT,
+ reg => REG_EAX,
+ rm => $op});
+}
+
+sub write_mov_rr($$)
+{
+ my ($r1, $r2) = @_;
+
+ my %insn = (opcode => X86OP_MOV,
+ modrm => {mod => MOD_DIRECT,
+ reg => ($r1 & 0x7),
+ rm => ($r2 & 0x7)});
+
+ $insn{rex}{w} = 1 if $is_x86_64;
+ $insn{rex}{r} = 1 if $r1 >= 8;
+ $insn{rex}{b} = 1 if $r2 >= 8;
+
+ write_insn(%insn);
+}
+
+sub write_mov_reg_imm($$)
+{
+ my ($reg, $imm) = @_;
+ my %insn;
+
+ if (0 <= $imm && $imm <= 0xffffffff) {
+ %insn = (opcode => {value => 0xB8 | ($reg & 0x7), len => 1},
+ imm => {value => $imm, len => 4});
+ } elsif (-0x80000000 <= $imm && $imm <= 0x7fffffff) {
+ %insn = (opcode => {value => 0xC7, len => 1},
+ modrm => {mod => MOD_DIRECT,
+ reg => 0, rm => ($reg & 0x7)},
+ imm => {value => $imm, len => 4});
+
+ $insn{rex}{w} = 1 if $is_x86_64;
+ } else {
+ %insn = (rex => {w => 1},
+ opcode => {value => 0xB8 | ($reg & 0x7), len => 1},
+ imm => {value => $imm, len => 8});
+ }
+
+ $insn{rex}{b} = 1 if $reg >= 8;
+ write_insn(%insn);
+}
+
+sub write_random_regdata()
+{
+ my $reg_cnt = $is_x86_64 ? 16 : 8;
+ my $bitlen = $is_x86_64 ? 64 : 32;
+
+ # initialize flags register
+ write_insn(opcode => X86OP_XOR,
+ modrm => {mod => MOD_DIRECT,
+ reg => REG_EAX,
+ rm => REG_EAX});
+ write_insn(opcode => X86OP_SAHF);
+
+ # general purpose registers
+ for (my $reg = 0; $reg < $reg_cnt; $reg++) {
+ if ($reg != REG_ESP) {
+ my $imm = randint_constr(bitlen => $bitlen, signed => 1);
+ write_mov_reg_imm($reg, $imm);
+ }
+ }
+}
+
+sub write_random_datablock($)
+{
+ my ($datalen) = @_;
+
+ # Write a block of random data, $datalen bytes long, aligned
+ # according to MAXALIGN, and load its address into EAX/RAX.
+
+ $datalen += MAXALIGN - 1;
+
+ # First, load current EIP/RIP into EAX/RAX. Easy to do on x86_64
+ # thanks to RIP-relative addressing, but on i386 we need to play
+ # some well-known tricks with CALL instruction.
+ if ($is_x86_64) {
+ # 4-byte AND + 5-byte JMP
+ my $disp32 = 4 + 5 + (MAXALIGN - 1);
+ my $reg = REG_EAX;
+
+ write_insn(rex => {w => 1},
+ opcode => X86OP_LEA,
+ modrm => {mod => MOD_INDIRECT,
+ reg => $reg, rm => REG_EBP},
+ disp => {value => $disp32, len => 4});
+
+ write_insn(rex => {w => 1},
+ opcode => X86OP_ALU_imm8,
+ modrm => {mod => MOD_DIRECT,
+ reg => 4, rm => $reg},
+ imm => {value => ~(MAXALIGN - 1),
+ len => 1});
+
+ } else {
+ # 1-byte POP + 3-byte ADD + 3-byte AND + 5-byte JMP
+ my $imm8 = 1 + 3 + 3 + 5 + (MAXALIGN - 1);
+ my $reg = REG_EAX;
+
+ # displacement = next instruction
+ write_insn(opcode => X86OP_CALL,
+ imm => {value => 0x00000000, len => 4});
+
+ write_insn(opcode => {value => 0x58 | ($reg & 0x7),
+ len => 1});
+
+ write_insn(opcode => X86OP_ALU_imm8,
+ modrm => {mod => MOD_DIRECT,
+ reg => 0, rm => $reg},
+ imm => {value => $imm8, len => 1});
+
+ write_insn(opcode => X86OP_ALU_imm8,
+ modrm => {mod => MOD_DIRECT,
+ reg => 4, rm => $reg},
+ imm => {value => ~(MAXALIGN - 1),
+ len => 1});
+ }
+
+ # JMP over the data blob.
+ write_insn(opcode => X86OP_JMP,
+ imm => {value => $datalen, len => 4});
+
+ # Generate the random data
+ for (my $w = 8; 0 < $w; $w /= 2) {
+ for (; $w <= $datalen; $datalen -= $w) {
+ insnv(%{rand_insn_imm(size => $w)});
+ }
+ }
+}
+
+sub write_random_ymmdata()
+{
+ my $ymm_cnt = $is_x86_64 ? 16 : 8;
+ my $ymm_len = 32;
+ my $datalen = $ymm_cnt * $ymm_len;
+
+ # Generate random data blob
+ write_random_datablock($datalen);
+
+ # Load the random data into YMM regs.
+ for (my $ymm_reg = 0; $ymm_reg < $ymm_cnt; $ymm_reg++) {
+ write_insn(vex => {l => VEX_L_256, p => VEX_P_DATA16,
+ r => !($ymm_reg >= 8)},
+ opcode => X86OP_VMOVAPS,
+ modrm => {mod => MOD_INDIRECT_DISP32,
+ reg => ($ymm_reg & 0x7),
+ rm => REG_EAX},
+ disp => {value => $ymm_reg * $ymm_len,
+ len => 4});
+ }
+}
+
+sub write_memblock_setup()
+{
+ # Generate random data blob
+ write_random_datablock(MEMBLOCK_LEN);
+ # Pointer is in EAX/RAX; set the memblock
+ write_risuop(RISUOP_SETMEMBLOCK);
+}
+
+sub write_random_register_data()
+{
+ write_random_ymmdata();
+ write_random_regdata();
+ write_risuop(RISUOP_COMPARE);
+}
+
+sub rand_insn_imm(%)
+{
+ my (%args) = @_;
+
+ return {
+ value => randint_constr(bitlen => ($args{size} * 8), signed => 1),
+ len => $args{size}
+ };
+}
+
+sub rand_insn_opcode($)
+{
+ # Given an instruction-details array, generate an instruction
+ my ($rec) = @_;
+ my $insnname = $rec->{name};
+ my $insnwidth = $rec->{width};
+
+ my $constraintfailures = 0;
+
+ INSN: while(1) {
+ my $opcode = randint_constr(bitlen => 32,
+ fixedbits => $rec->{fixedbits},
+ fixedbitmask => $rec->{fixedbitmask});
+
+ my $constraint = $rec->{blocks}{"constraints"};
+ if (defined $constraint) {
+ # user-specified constraint: evaluate in an environment
+ # with variables set corresponding to the variable fields.
+ my $v = eval_with_fields($insnname, $opcode, $rec, "constraints", $constraint);
+ if (!$v) {
+ $constraintfailures++;
+ if ($constraintfailures > 10000) {
+ print "10000 consecutive constraint failures for $insnname constraints string:\n$constraint\n";
+ exit (1);
+ }
+ next INSN;
+ }
+ }
+
+ # OK, we got a good one
+ $constraintfailures = 0;
+
+ return {
+ value => $opcode >> (32 - $insnwidth),
+ len => $insnwidth / 8
+ };
+ }
+}
+
+sub rand_insn_modrm($$)
+{
+ my ($opts, $insn) = @_;
+ my $modrm;
+
+ while (1) {
+ $modrm = rand_fill({mod => {bitlen => 2},
+ reg => {bitlen => 3},
+ rm => {bitlen => 3}},
+ $opts);
+
+ if ($modrm->{mod} != MOD_DIRECT) {
+ # Displacement only; we cannot use this since we
+ # don't know absolute address of the memblock.
+ next if $modrm->{mod} == MOD_INDIRECT && $modrm->{rm} == REG_EBP;
+
+ if ($modrm->{rm} == REG_ESP) {
+ # SIB byte present
+ my $sib = rand_fill({ss => {bitlen => 2},
+ index => {bitlen => 3},
+ base => {bitlen => 3}}, {});
+
+ # We cannot modify ESP/RSP during the tests
+ next if $sib->{base} == REG_ESP;
+
+ # When base and index register are the same,
+ # computing the correct memblock addresses and
+ # offsets gets way too complicated...
+ next if $sib->{base} == $sib->{index};
+
+ # No base register
+ next if $modrm->{mod} == MOD_INDIRECT && $sib->{base} == REG_EBP;
+
+ $insn->{sib} = $sib;
+ }
+
+ $insn->{disp} = rand_insn_imm(size => 1)
+ if $modrm->{mod} == MOD_INDIRECT_DISP8;
+
+ $insn->{disp} = rand_insn_imm(size => 4)
+ if $modrm->{mod} == MOD_INDIRECT_DISP32;
+ }
+
+ $insn->{modrm} = $modrm;
+ last;
+ }
+}
+
+sub rand_insn_rex($$)
+{
+ my ($opts, $insn) = @_;
+
+ $opts->{w} = 0 unless defined $opts->{w};
+ $opts->{x} = 0 unless defined $opts->{x} || defined $insn->{sib};
+
+ my $rex = rand_fill({w => {bitlen => 1},
+ r => {bitlen => 1},
+ b => {bitlen => 1},
+ x => {bitlen => 1}},
+ $opts);
+
+ $insn->{rex} = $rex
+ if $rex->{w} || $rex->{r} || $rex->{b} || $rex->{x};
+}
+
+sub rand_insn_vex($$)
+{
+ my ($opts, $insn) = @_;
+ my $vex;
+
+ $opts->{r} = 1 unless $is_x86_64;
+ $opts->{x} = 1 unless $is_x86_64 && (defined $opts->{x} || defined $insn->{sib});
+ $opts->{b} = 1 unless $is_x86_64;
+ $opts->{p} = 0 unless defined $opts->{p};
+
+ $vex->{r} = {bitlen => 1};
+ $vex->{v} = {bitlen => 4};
+ $vex->{l} = {bitlen => 1};
+ $vex->{p} = {bitlen => 2};
+
+ # Note that VEX.X, VEX.B, VEX.M and VEX.W are only present in the
+ # 3-byte VEX prefix. Since VEX.M is an extension of opcode, it
+ # makes no sense to randomize it; therefore, we can only include
+ # VEX.X, VEX.B and VEX.W if we are given a meaningful value for
+ # VEX.M.
+ if (defined $opts->{m}) {
+ $vex->{x} = {bitlen => 1};
+ $vex->{b} = {bitlen => 1};
+ $vex->{m} = {bitlen => 5};
+ $vex->{w} = {bitlen => 1};
+ }
+
+ $insn->{vex} = rand_fill($vex, $opts);
+}
+
+sub write_mem_getoffset($$)
+{
+ my ($opts, $insn) = @_;
+ my $offset, my $index;
+
+ $opts->{size} = 0 unless defined $opts->{size};
+ $opts->{align} = 1 unless defined $opts->{align};
+
+ if (!defined $opts->{base}
+ && defined $insn->{modrm}
+ && $insn->{modrm}{mod} != MOD_DIRECT) {
+
+ $opts->{base} = (defined $insn->{sib}
+ ? $insn->{sib}{base}
+ : $insn->{modrm}{rm});
+
+ if ($insn->{modrm}{mod} == MOD_INDIRECT && $opts->{base} == REG_EBP) {
+ delete $opts->{base}; # No base register
+ } else {
+ $opts->{base} |= $insn->{rex}{b} << 3 if defined $insn->{rex};
+ $opts->{base} |= (!$insn->{vex}{b}) << 3 if defined $insn->{vex};
+ }
+ }
+
+ if (!defined $opts->{index} && defined $insn->{sib}) {
+ $opts->{index} = $insn->{sib}{index};
+ $opts->{index} |= $insn->{rex}{x} << 3 if defined $insn->{rex};
+ $opts->{index} |= (!$insn->{vex}{x}) << 3 if defined $insn->{vex};
+ delete $opts->{index} if $opts->{index} == REG_ESP; # ESP means "none"
+ }
+
+ $opts->{ss} = $insn->{sib}{ss} if !defined $opts->{ss} && defined $insn->{sib};
+ $opts->{disp} = $insn->{disp} if !defined $opts->{disp} && defined $insn->{disp};
+
+ $offset = int(rand(MEMBLOCK_LEN - $opts->{size}));
+ $offset &= ~($opts->{align} - 1);
+
+ $offset -= $opts->{disp}{value} if defined $opts->{disp};
+
+ if (defined $opts->{index}) {
+ $index = randint_constr(bitlen => 32, signed => 1);
+ $offset -= $index * (1 << $opts->{ss});
+ }
+
+ if (defined $opts->{base} && defined $offset) {
+ write_mov_reg_imm(REG_EAX, $offset);
+ write_risuop(RISUOP_GETMEMBLOCK);
+ write_mov_rr($opts->{base}, REG_EAX);
+ }
+ if (defined $opts->{index} && defined $index) {
+ write_mov_reg_imm($opts->{index}, $index);
+ }
+}
+
+sub gen_one_insn($)
+{
+ my ($rec) = @_;
+ my $insn;
+
+ $insn->{opcode} = rand_insn_opcode($rec);
+ my $opts = parse_emitblock($rec, $insn);
+
+ # Operation with a ModR/M byte can potentially use a memory
+ # operand
+ $opts->{mem} = {}
+ unless (defined $opts->{mem}
+ || !defined $opts->{modrm});
+
+ # If none of REX/VEX/EVEX are specified, default to REX
+ $opts->{rex} = {}
+ unless (defined $opts->{rex}
+ || defined $opts->{vex}
+ || defined $opts->{evex}
+ || !defined $opts->{modrm});
+
+ # REX requires x86_64
+ delete $opts->{rex}
+ unless $is_x86_64;
+
+ $insn->{rep} = $opts->{rep} if defined $opts->{rep};
+ $insn->{repne} = $opts->{repne} if defined $opts->{repne};
+ $insn->{data16} = $opts->{data16} if defined $opts->{data16};
+
+ rand_insn_modrm($opts->{modrm}, $insn) if defined $opts->{modrm};
+
+ rand_insn_vex($opts->{vex}, $insn) if defined $opts->{vex};
+ # TODO rand_insn_evex($opts->{evex}, $insn) if defined $opts->{evex};
+ rand_insn_rex($opts->{rex}, $insn) if defined $opts->{rex};
+
+ $insn->{imm} = rand_insn_imm(%{$opts->{imm}}) if defined $opts->{imm};
+
+ write_mem_getoffset($opts->{mem}, $insn);
+ write_insn(%{$insn});
+}
+
+sub write_test_code($)
+{
+ my ($params) = @_;
+
+ my $numinsns = $params->{ 'numinsns' };
+ my $outfile = $params->{ 'outfile' };
+
+ my %insn_details = %{ $params->{ 'details' } };
+ my @keys = @{ $params->{ 'keys' } };
+
+ $is_x86_64 = $params->{ 'x86_64' };
+
+ open_bin($outfile);
+
+ # TODO better random number generator?
+ srand(0);
+
+ print "Generating code using patterns: @keys...\n";
+ progress_start(78, $numinsns);
+
+ write_memblock_setup();
+
+ # memblock setup doesn't clean its registers, so this must come afterwards.
+ write_random_register_data();
+
+ for my $i (1..$numinsns) {
+ my $insn_enc = $keys[int rand (@keys)];
+ gen_one_insn($insn_details{$insn_enc});
+ write_risuop(RISUOP_COMPARE);
+ # Rewrite the registers periodically. This avoids the tendency
+ # for the floating-point registers to decay to NaNs and zeroes.
+ if ($periodic_reg_random && ($i % 100) == 0) {
+ write_random_register_data();
+ }
+ progress_update($i);
+ }
+ write_risuop(RISUOP_TESTEND);
+ progress_end();
+ close_bin();
+}
+
+1;
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 05/14] risugen: allow all byte-aligned instructions
From: Jan Bobek @ 2019-07-01 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
Accept all instructions whose bit length is divisible by 8. Since the
maximum instruction length (as specified in the config file) is 32
bits, this change additionally permits instructions that are 8 or 24
bits long; 16-bit instructions were already considered valid.
While valid x86 instructions may be up to 15 bytes long, the length
constraint described above applies only to the main opcode field,
which is usually just 1 or 2 bytes long. The primary purpose of this
change is therefore to allow 1-byte x86 opcodes.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
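Illustrative note, not part of the patch: the width rule described in the
commit message can be sketched in isolation as below. The helper name
insn_width_from_remaining_bits and the default maximum of 32 bits are
assumptions made for this example; in risugen itself the check lives inline
in parse_config_file and operates on $bitpos and $insnwidth.

```perl
use strict;
use warnings;

# Hypothetical helper (not in risugen): given the number of bits that an
# encoding line left unspecified, return the resulting instruction width,
# or nothing if the encoding is not byte-aligned.
sub insn_width_from_remaining_bits {
    my ($remaining, $max_width) = @_;
    $max_width //= 32;                 # assumed maximum, as in the commit message
    return if $remaining % 8 != 0;     # corresponds to "not enough bits specified"
    return $max_width - $remaining;    # mirrors "$insnwidth -= $bitpos"
}

for my $remaining (0, 8, 16, 24, 12) {
    my $width = insn_width_from_remaining_bits($remaining);
    printf "%2d bits left -> %s\n", $remaining,
        defined $width ? "$width-bit encoding" : "rejected (not byte-aligned)";
}
```

Under these assumptions, 24 unspecified bits yield an 8-bit encoding, which
is the 1-byte opcode case the change is aimed at.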
risugen | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/risugen b/risugen
index 09a702a..17bf98f 100755
--- a/risugen
+++ b/risugen
@@ -229,12 +229,11 @@ sub parse_config_file($)
push @fields, [ $var, $bitpos, $bitmask ];
}
}
- if ($bitpos == 16) {
- # assume this is a half-width thumb instruction
+ if ($bitpos % 8 == 0) {
# Note that we don't fiddle with the bitmasks or positions,
# which means the generated insn will be in the high halfword!
- $insnwidth = 16;
- } elsif ($bitpos != 0) {
+ $insnwidth -= $bitpos;
+ } else {
print STDERR "$file:$.: ($insn $enc) not enough bits specified\n";
exit(1);
}
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
From: Jan Bobek @ 2019-07-01 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
Add an x86 configuration file with all MMX instructions.
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
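Illustrative note, not part of the patch: the encoding column in x86.risu
mixes literal bits with named fields such as d. The sketch below expands one
such pattern into the opcode bytes it can produce. The helper expand_pattern
is hypothetical, and it assumes every character other than 0 or 1 is a
one-bit field, which holds for the patterns in this file but is not how
risugen parses fields in general.

```perl
use strict;
use warnings;

# Hypothetical helper (not in risugen): expand an encoding pattern into all
# opcode byte sequences it matches, formatted as hex strings.
sub expand_pattern {
    my ($pattern) = @_;
    (my $bits = $pattern) =~ s/\s+//g;     # drop the spacing between groups
    die "pattern is not byte-aligned\n" if length($bits) % 8;

    my @expanded = ('');
    for my $ch (split //, $bits) {
        # Literal bits stay fixed; anything else is treated as a free bit.
        my @choices = $ch =~ /^[01]$/ ? ($ch) : ('0', '1');
        my @next;
        for my $prefix (@expanded) {
            push @next, $prefix . $_ for @choices;
        }
        @expanded = @next;
    }

    my @result;
    for my $bitstring (@expanded) {
        my @bytes = map { sprintf '%02X', oct("0b$_") } unpack '(A8)*', $bitstring;
        push @result, join ' ', @bytes;
    }
    return @result;
}

# MOVD: the d bit selects the direction of the move.
print "$_\n" for expand_pattern('00001111 011 d 1110');
```

For the MOVD pattern this prints 0F 6E and 0F 7E, the two directions of the
MMX MOVD opcode.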
x86.risu | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 96 insertions(+)
create mode 100644 x86.risu
diff --git a/x86.risu b/x86.risu
new file mode 100644
index 0000000..f2dd9b0
--- /dev/null
+++ b/x86.risu
@@ -0,0 +1,96 @@
+###############################################################################
+# Copyright (c) 2019 Linaro Limited
+# All rights reserved. This program and the accompanying materials
+# are made available under the terms of the Eclipse Public License v1.0
+# which accompanies this distribution, and is available at
+# http://www.eclipse.org/legal/epl-v10.html
+#
+# Contributors:
+# Jan Bobek - initial implementation
+###############################################################################
+
+# Input file for risugen defining x86 instructions
+.mode x86
+
+# Data Transfer Instructions
+MOVD MMX 00001111 011 d 1110 !emit { modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
+MOVD_mem MMX 00001111 011 d 1110 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 4); }
+MOVQ MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
+MOVQ_mem MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVQ_mm MMX 00001111 011 d 1111 !emit { modrm(); mem(size => 8); }
+
+# Arithmetic Instructions
+PADDB MMX 00001111 11111100 !emit { modrm(); mem(size => 8); }
+PADDW MMX 00001111 11111101 !emit { modrm(); mem(size => 8); }
+PADDD MMX 00001111 11111110 !emit { modrm(); mem(size => 8); }
+PADDQ MMX 00001111 11010100 !emit { modrm(); mem(size => 8); }
+PADDSB MMX 00001111 11101100 !emit { modrm(); mem(size => 8); }
+PADDSW MMX 00001111 11101101 !emit { modrm(); mem(size => 8); }
+PADDUSB MMX 00001111 11011100 !emit { modrm(); mem(size => 8); }
+PADDUSW MMX 00001111 11011101 !emit { modrm(); mem(size => 8); }
+
+PSUBB MMX 00001111 11111000 !emit { modrm(); mem(size => 8); }
+PSUBW MMX 00001111 11111001 !emit { modrm(); mem(size => 8); }
+PSUBD MMX 00001111 11111010 !emit { modrm(); mem(size => 8); }
+PSUBSB MMX 00001111 11101000 !emit { modrm(); mem(size => 8); }
+PSUBSW MMX 00001111 11101001 !emit { modrm(); mem(size => 8); }
+PSUBUSB MMX 00001111 11011000 !emit { modrm(); mem(size => 8); }
+PSUBUSW MMX 00001111 11011001 !emit { modrm(); mem(size => 8); }
+
+PMULLW MMX 00001111 11010101 !emit { modrm(); mem(size => 8); }
+PMULHW MMX 00001111 11100101 !emit { modrm(); mem(size => 8); }
+
+PMADDWD MMX 00001111 11110101 !emit { modrm(); mem(size => 8); }
+
+# Comparison Instructions
+PCMPEQB MMX 00001111 01110100 !emit { modrm(); mem(size => 8); }
+PCMPEQW MMX 00001111 01110101 !emit { modrm(); mem(size => 8); }
+PCMPEQD MMX 00001111 01110110 !emit { modrm(); mem(size => 8); }
+PCMPGTB MMX 00001111 01100100 !emit { modrm(); mem(size => 8); }
+PCMPGTW MMX 00001111 01100101 !emit { modrm(); mem(size => 8); }
+PCMPGTD MMX 00001111 01100110 !emit { modrm(); mem(size => 8); }
+
+# Logical Instructions
+PAND MMX 00001111 11011011 !emit { modrm(); mem(size => 8); }
+PANDN MMX 00001111 11011111 !emit { modrm(); mem(size => 8); }
+POR MMX 00001111 11101011 !emit { modrm(); mem(size => 8); }
+PXOR MMX 00001111 11101111 !emit { modrm(); mem(size => 8); }
+
+# Shift and Rotate Instructions
+PSLLW MMX 00001111 11110001 !emit { modrm(); mem(size => 8); }
+PSLLD MMX 00001111 11110010 !emit { modrm(); mem(size => 8); }
+PSLLQ MMX 00001111 11110011 !emit { modrm(); mem(size => 8); }
+
+PSLLW_imm MMX 00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+PSLLD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+PSLLQ_imm MMX 00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+
+PSRLW MMX 00001111 11010001 !emit { modrm(); mem(size => 8); }
+PSRLD MMX 00001111 11010010 !emit { modrm(); mem(size => 8); }
+PSRLQ MMX 00001111 11010011 !emit { modrm(); mem(size => 8); }
+
+PSRLW_imm MMX 00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+PSRLD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+PSRLQ_imm MMX 00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+
+PSRAW MMX 00001111 11100001 !emit { modrm(); mem(size => 8); }
+PSRAD MMX 00001111 11100010 !emit { modrm(); mem(size => 8); }
+
+PSRAW_imm MMX 00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+PSRAD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+
+# Shuffle, Unpack, Blend, Insert, Extract, Broadcast, Permute, Scatter Instructions
+PACKSSWB MMX 00001111 01100011 !emit { modrm(); mem(size => 8); }
+PACKSSDW MMX 00001111 01101011 !emit { modrm(); mem(size => 8); }
+PACKUSWB MMX 00001111 01100111 !emit { modrm(); mem(size => 8); }
+
+PUNPCKHBW MMX 00001111 01101000 !emit { modrm(); mem(size => 8); }
+PUNPCKHWD MMX 00001111 01101001 !emit { modrm(); mem(size => 8); }
+PUNPCKHDQ MMX 00001111 01101010 !emit { modrm(); mem(size => 8); }
+
+PUNPCKLBW MMX 00001111 01100000 !emit { modrm(); mem(size => 4); }
+PUNPCKLWD MMX 00001111 01100001 !emit { modrm(); mem(size => 4); }
+PUNPCKLDQ MMX 00001111 01100010 !emit { modrm(); mem(size => 4); }
+
+# State Management Instructions
+EMMS MMX 00001111 01110111 !emit { }
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 07/14] x86.risu: add SSE instructions
From: Jan Bobek @ 2019-07-01 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
Add SSE instructions to the x86 configuration file.
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
x86.risu | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 100 insertions(+)
diff --git a/x86.risu b/x86.risu
index f2dd9b0..c29b210 100644
--- a/x86.risu
+++ b/x86.risu
@@ -19,6 +19,18 @@ MOVQ MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MO
MOVQ_mem MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVQ_mm MMX 00001111 011 d 1111 !emit { modrm(); mem(size => 8); }
+MOVAPS SSE 00001111 0010100 d !emit { modrm(); mem(size => 16, align => 16); }
+MOVUPS SSE 00001111 0001000 d !emit { modrm(); mem(size => 16); }
+MOVSS SSE 00001111 0001000 d !emit { rep(); modrm(); mem(size => 4); }
+
+MOVLPS SSE 00001111 0001001 d !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVHPS SSE 00001111 0001011 d !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVLHPS SSE 00001111 00010110 !emit { modrm(mod => MOD_DIRECT); }
+MOVHLPS SSE 00001111 00010010 !emit { modrm(mod => MOD_DIRECT); }
+
+PMOVMSKB SSE 00001111 11010111 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+MOVMSKPS SSE 00001111 01010000 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+
# Arithmetic Instructions
PADDB MMX 00001111 11111100 !emit { modrm(); mem(size => 8); }
PADDW MMX 00001111 11111101 !emit { modrm(); mem(size => 8); }
@@ -29,6 +41,9 @@ PADDSW MMX 00001111 11101101 !emit { modrm(); mem(size => 8); }
PADDUSB MMX 00001111 11011100 !emit { modrm(); mem(size => 8); }
PADDUSW MMX 00001111 11011101 !emit { modrm(); mem(size => 8); }
+ADDPS SSE 00001111 01011000 !emit { modrm(); mem(size => 16, align => 16); }
+ADDSS SSE 00001111 01011000 !emit { rep(); modrm(); mem(size => 4); }
+
PSUBB MMX 00001111 11111000 !emit { modrm(); mem(size => 8); }
PSUBW MMX 00001111 11111001 !emit { modrm(); mem(size => 8); }
PSUBD MMX 00001111 11111010 !emit { modrm(); mem(size => 8); }
@@ -37,11 +52,47 @@ PSUBSW MMX 00001111 11101001 !emit { modrm(); mem(size => 8); }
PSUBUSB MMX 00001111 11011000 !emit { modrm(); mem(size => 8); }
PSUBUSW MMX 00001111 11011001 !emit { modrm(); mem(size => 8); }
+SUBPS SSE 00001111 01011100 !emit { modrm(); mem(size => 16, align => 16); }
+SUBSS SSE 00001111 01011100 !emit { rep(); modrm(); mem(size => 4); }
+
PMULLW MMX 00001111 11010101 !emit { modrm(); mem(size => 8); }
PMULHW MMX 00001111 11100101 !emit { modrm(); mem(size => 8); }
+PMULHUW SSE 00001111 11100100 !emit { modrm(); mem(size => 8); }
+
+MULPS SSE 00001111 01011001 !emit { modrm(); mem(size => 16, align => 16); }
+MULSS SSE 00001111 01011001 !emit { rep(); modrm(); mem(size => 4); }
PMADDWD MMX 00001111 11110101 !emit { modrm(); mem(size => 8); }
+DIVPS SSE 00001111 01011110 !emit { modrm(); mem(size => 16, align => 16); }
+DIVSS SSE 00001111 01011110 !emit { rep(); modrm(); mem(size => 4); }
+
+RCPPS SSE 00001111 01010011 !emit { modrm(); mem(size => 16, align => 16); }
+RCPSS SSE 00001111 01010011 !emit { rep(); modrm(); mem(size => 4); }
+
+SQRTPS SSE 00001111 01010001 !emit { modrm(); mem(size => 16, align => 16); }
+SQRTSS SSE 00001111 01010001 !emit { rep(); modrm(); mem(size => 4); }
+
+RSQRTPS SSE 00001111 01010010 !emit { modrm(); mem(size => 16, align => 16); }
+RSQRTSS SSE 00001111 01010010 !emit { rep(); modrm(); mem(size => 4); }
+
+PMINUB SSE 00001111 11011010 !emit { modrm(); mem(size => 8); }
+PMINSW SSE 00001111 11101010 !emit { modrm(); mem(size => 8); }
+
+MINPS SSE 00001111 01011101 !emit { modrm(); mem(size => 16, align => 16); }
+MINSS SSE 00001111 01011101 !emit { rep(); modrm(); mem(size => 4); }
+
+PMAXUB SSE 00001111 11011110 !emit { modrm(); mem(size => 8); }
+PMAXSW SSE 00001111 11101110 !emit { modrm(); mem(size => 8); }
+
+MAXPS SSE 00001111 01011111 !emit { modrm(); mem(size => 16, align => 16); }
+MAXSS SSE 00001111 01011111 !emit { rep(); modrm(); mem(size => 4); }
+
+PAVGB SSE 00001111 11100000 !emit { modrm(); mem(size => 8); }
+PAVGW SSE 00001111 11100011 !emit { modrm(); mem(size => 8); }
+
+PSADBW SSE 00001111 11110110 !emit { modrm(); mem(size => 8); }
+
# Comparison Instructions
PCMPEQB MMX 00001111 01110100 !emit { modrm(); mem(size => 8); }
PCMPEQW MMX 00001111 01110101 !emit { modrm(); mem(size => 8); }
@@ -50,11 +101,24 @@ PCMPGTB MMX 00001111 01100100 !emit { modrm(); mem(size => 8); }
PCMPGTW MMX 00001111 01100101 !emit { modrm(); mem(size => 8); }
PCMPGTD MMX 00001111 01100110 !emit { modrm(); mem(size => 8); }
+CMPPS SSE 00001111 11000010 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
+CMPSS SSE 00001111 11000010 !emit { rep(); modrm(); mem(size => 4); imm(size => 1); }
+
+UCOMISS SSE 00001111 00101110 !emit { modrm(); mem(size => 4); }
+COMISS SSE 00001111 00101111 !emit { modrm(); mem(size => 4); }
+
# Logical Instructions
PAND MMX 00001111 11011011 !emit { modrm(); mem(size => 8); }
+ANDPS SSE 00001111 01010100 !emit { modrm(); mem(size => 16, align => 16); }
+
PANDN MMX 00001111 11011111 !emit { modrm(); mem(size => 8); }
+ANDNPS SSE 00001111 01010101 !emit { modrm(); mem(size => 16, align => 16); }
+
POR MMX 00001111 11101011 !emit { modrm(); mem(size => 8); }
+ORPS SSE 00001111 01010110 !emit { modrm(); mem(size => 16, align => 16); }
+
PXOR MMX 00001111 11101111 !emit { modrm(); mem(size => 8); }
+XORPS SSE 00001111 01010111 !emit { modrm(); mem(size => 16, align => 16); }
# Shift and Rotate Instructions
PSLLW MMX 00001111 11110001 !emit { modrm(); mem(size => 8); }
@@ -92,5 +156,41 @@ PUNPCKLBW MMX 00001111 01100000 !emit { modrm(); mem(size => 4); }
PUNPCKLWD MMX 00001111 01100001 !emit { modrm(); mem(size => 4); }
PUNPCKLDQ MMX 00001111 01100010 !emit { modrm(); mem(size => 4); }
+UNPCKLPS SSE 00001111 00010100 !emit { modrm(); mem(size => 16, align => 16); }
+UNPCKHPS SSE 00001111 00010101 !emit { modrm(); mem(size => 16, align => 16); }
+
+PSHUFW SSE 00001111 01110000 !emit { modrm(); mem(size => 8); imm(size => 1); }
+SHUFPS SSE 00001111 11000110 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
+
+PINSRW SSE 00001111 11000100 !emit { modrm(); mem(size => 2); imm(size => 1); }
+PEXTRW_reg SSE 00001111 11000101 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
+
+# Conversion Instructions
+CVTPI2PS SSE 00001111 00101010 !emit { modrm(); mem(size => 8); }
+CVTSI2SS SSE 00001111 00101010 !emit { rep(); modrm(); mem(size => 4); }
+CVTSI2SS_64 SSE 00001111 00101010 !emit { rep(); rex(w => 1); modrm(); mem(size => 8); }
+
+CVTPS2PI SSE 00001111 00101101 !emit { modrm(); mem(size => 8); }
+CVTSS2SI SSE 00001111 00101101 !emit { rep(); modrm(reg => ~REG_ESP); mem(size => 4); }
+CVTSS2SI_64 SSE 00001111 00101101 !emit { rep(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 4); }
+
+CVTTPS2PI SSE 00001111 00101100 !emit { modrm(); mem(size => 8); }
+CVTTSS2SI SSE 00001111 00101100 !emit { rep(); modrm(reg => ~REG_ESP); mem(size => 4); }
+CVTTSS2SI_64 SSE 00001111 00101100 !emit { rep(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 4); }
+
+# Cacheability Control, Prefetch, and Instruction Ordering Instructions
+MASKMOVQ SSE 00001111 11110111 !emit { modrm(mod => MOD_DIRECT); mem(size => 8, base => REG_EDI); }
+MOVNTPS SSE 00001111 00101011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+MOVNTQ SSE 00001111 11100111 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+
+PREFETCHT0 SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 1); mem(size => 1); }
+PREFETCHT1 SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 1); }
+PREFETCHT2 SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 3); mem(size => 1); }
+PREFETCHNTA SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 0); mem(size => 1); }
+SFENCE SSE 00001111 10101110 !emit { modrm(mod => MOD_DIRECT, reg => 7); }
+
# State Management Instructions
EMMS MMX 00001111 01110111 !emit { }
+
+# LDMXCSR SSE 00001111 10101110 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 4); }
+STMXCSR SSE 00001111 10101110 !emit { modrm(mod => ~MOD_DIRECT, reg => 3); mem(size => 4); }
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 08/14] x86.risu: add SSE2 instructions
From: Jan Bobek @ 2019-07-01 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
Add SSE2 instructions to the x86 configuration file.
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
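Illustrative note, not part of the patch: many SSE2 entries reuse an SSE
opcode and differ only in the mandatory prefix requested in the emit block.
The sketch below shows this for opcode 0F 58. The prefix byte values are the
standard x86 ones (66 operand-size override, F3 REP, F2 REPNE); the mapping
from the data16/rep/repne helper names to those bytes and the helper
encode_opcode are assumptions made for this example.

```perl
use strict;
use warnings;

# Assumed mapping from emit-block prefix helpers to x86 prefix bytes.
my %PREFIX_BYTE = (data16 => 0x66, rep => 0xF3, repne => 0xF2);

# Hypothetical helper (not in risugen): format [prefix] 0F <opcode> as hex.
sub encode_opcode {
    my ($opcode, $prefix) = @_;
    my @bytes = (0x0F, $opcode);
    unshift @bytes, $PREFIX_BYTE{$prefix} if defined $prefix;
    return join ' ', map { sprintf '%02X', $_ } @bytes;
}

# The same two-byte opcode selects four instructions depending on the prefix.
my @variants = (
    [ADDPS => undef],
    [ADDPD => 'data16'],
    [ADDSS => 'rep'],
    [ADDSD => 'repne'],
);
printf "%-5s %s\n", $_->[0], encode_opcode(0x58, $_->[1]) for @variants;
```

This matches the ADDPS/ADDPD/ADDSS/ADDSD entries in the hunk below, which
share the pattern 00001111 01011000.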
x86.risu | 153 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 153 insertions(+)
diff --git a/x86.risu b/x86.risu
index c29b210..9b63d6b 100644
--- a/x86.risu
+++ b/x86.risu
@@ -15,179 +15,332 @@
# Data Transfer Instructions
MOVD MMX 00001111 011 d 1110 !emit { modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
MOVD_mem MMX 00001111 011 d 1110 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 4); }
+MOVD SSE2 00001111 011 d 1110 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
+MOVD_mem SSE2 00001111 011 d 1110 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); }
MOVQ MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
MOVQ_mem MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVQ SSE2 00001111 011 d 1110 !emit { data16(); rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
+MOVQ_mem SSE2 00001111 011 d 1110 !emit { data16(); rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVQ_mm MMX 00001111 011 d 1111 !emit { modrm(); mem(size => 8); }
+MOVQ_xmm1 SSE2 00001111 01111110 !emit { rep(); modrm(); mem(size => 8); }
+MOVQ_xmm2 SSE2 00001111 11010110 !emit { data16(); modrm(); mem(size => 8); }
MOVAPS SSE 00001111 0010100 d !emit { modrm(); mem(size => 16, align => 16); }
+MOVAPD SSE2 00001111 0010100 d !emit { data16(); modrm(); mem(size => 16, align => 16); }
+MOVDQA SSE2 00001111 011 d 1111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
MOVUPS SSE 00001111 0001000 d !emit { modrm(); mem(size => 16); }
+MOVUPD SSE2 00001111 0001000 d !emit { data16(); modrm(); mem(size => 16); }
+MOVDQU SSE2 00001111 011 d 1111 !emit { rep(); modrm(); mem(size => 16); }
MOVSS SSE 00001111 0001000 d !emit { rep(); modrm(); mem(size => 4); }
+MOVSD SSE2 00001111 0001000 d !emit { repne(); modrm(); mem(size => 8); }
+
+MOVQ2DQ SSE2 00001111 11010110 !emit { rep(); modrm(mod => MOD_DIRECT); }
+MOVDQ2Q SSE2 00001111 11010110 !emit { repne(); modrm(mod => MOD_DIRECT); }
MOVLPS SSE 00001111 0001001 d !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVLPD SSE2 00001111 0001001 d !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVHPS SSE 00001111 0001011 d !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVHPD SSE2 00001111 0001011 d !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVLHPS SSE 00001111 00010110 !emit { modrm(mod => MOD_DIRECT); }
MOVHLPS SSE 00001111 00010010 !emit { modrm(mod => MOD_DIRECT); }
PMOVMSKB SSE 00001111 11010111 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+PMOVMSKB SSE2 00001111 11010111 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
MOVMSKPS SSE 00001111 01010000 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+MOVMSKPD SSE2 00001111 01010000 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
# Arithmetic Instructions
PADDB MMX 00001111 11111100 !emit { modrm(); mem(size => 8); }
+PADDB SSE2 00001111 11111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PADDW MMX 00001111 11111101 !emit { modrm(); mem(size => 8); }
+PADDW SSE2 00001111 11111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PADDD MMX 00001111 11111110 !emit { modrm(); mem(size => 8); }
+PADDD SSE2 00001111 11111110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PADDQ MMX 00001111 11010100 !emit { modrm(); mem(size => 8); }
+PADDQ SSE2 00001111 11010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PADDSB MMX 00001111 11101100 !emit { modrm(); mem(size => 8); }
+PADDSB SSE2 00001111 11101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PADDSW MMX 00001111 11101101 !emit { modrm(); mem(size => 8); }
+PADDSW SSE2 00001111 11101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PADDUSB MMX 00001111 11011100 !emit { modrm(); mem(size => 8); }
+PADDUSB SSE2 00001111 11011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PADDUSW MMX 00001111 11011101 !emit { modrm(); mem(size => 8); }
+PADDUSW SSE2 00001111 11011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
ADDPS SSE 00001111 01011000 !emit { modrm(); mem(size => 16, align => 16); }
+ADDPD SSE2 00001111 01011000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
ADDSS SSE 00001111 01011000 !emit { rep(); modrm(); mem(size => 4); }
+ADDSD SSE2 00001111 01011000 !emit { repne(); modrm(); mem(size => 8); }
PSUBB MMX 00001111 11111000 !emit { modrm(); mem(size => 8); }
+PSUBB SSE2 00001111 11111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSUBW MMX 00001111 11111001 !emit { modrm(); mem(size => 8); }
+PSUBW SSE2 00001111 11111001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSUBD MMX 00001111 11111010 !emit { modrm(); mem(size => 8); }
+PSUBD SSE2 00001111 11111010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PSUBQ_64 SSE2 00001111 11111011 !emit { modrm(); mem(size => 8); }
+PSUBQ SSE2 00001111 11111011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSUBSB MMX 00001111 11101000 !emit { modrm(); mem(size => 8); }
+PSUBSB SSE2 00001111 11101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSUBSW MMX 00001111 11101001 !emit { modrm(); mem(size => 8); }
+PSUBSW SSE2 00001111 11101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSUBUSB MMX 00001111 11011000 !emit { modrm(); mem(size => 8); }
+PSUBUSB SSE2 00001111 11011000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSUBUSW MMX 00001111 11011001 !emit { modrm(); mem(size => 8); }
+PSUBUSW SSE2 00001111 11011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
SUBPS SSE 00001111 01011100 !emit { modrm(); mem(size => 16, align => 16); }
+SUBPD SSE2 00001111 01011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
SUBSS SSE 00001111 01011100 !emit { rep(); modrm(); mem(size => 4); }
+SUBSD SSE2 00001111 01011100 !emit { repne(); modrm(); mem(size => 8); }
PMULLW MMX 00001111 11010101 !emit { modrm(); mem(size => 8); }
+PMULLW SSE2 00001111 11010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PMULHW MMX 00001111 11100101 !emit { modrm(); mem(size => 8); }
+PMULHW SSE2 00001111 11100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PMULHUW SSE 00001111 11100100 !emit { modrm(); mem(size => 8); }
+PMULHUW SSE2 00001111 11100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMULUDQ_64 SSE2 00001111 11110100 !emit { modrm(); mem(size => 8); }
+PMULUDQ SSE2 00001111 11110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
MULPS SSE 00001111 01011001 !emit { modrm(); mem(size => 16, align => 16); }
+MULPD SSE2 00001111 01011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
MULSS SSE 00001111 01011001 !emit { rep(); modrm(); mem(size => 4); }
+MULSD SSE2 00001111 01011001 !emit { repne(); modrm(); mem(size => 8); }
PMADDWD MMX 00001111 11110101 !emit { modrm(); mem(size => 8); }
+PMADDWD SSE2 00001111 11110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
DIVPS SSE 00001111 01011110 !emit { modrm(); mem(size => 16, align => 16); }
+DIVPD SSE2 00001111 01011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
DIVSS SSE 00001111 01011110 !emit { rep(); modrm(); mem(size => 4); }
+DIVSD SSE2 00001111 01011110 !emit { repne(); modrm(); mem(size => 8); }
RCPPS SSE 00001111 01010011 !emit { modrm(); mem(size => 16, align => 16); }
RCPSS SSE 00001111 01010011 !emit { rep(); modrm(); mem(size => 4); }
SQRTPS SSE 00001111 01010001 !emit { modrm(); mem(size => 16, align => 16); }
+SQRTPD SSE2 00001111 01010001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
SQRTSS SSE 00001111 01010001 !emit { rep(); modrm(); mem(size => 4); }
+SQRTSD SSE2 00001111 01010001 !emit { repne(); modrm(); mem(size => 8); }
RSQRTPS SSE 00001111 01010010 !emit { modrm(); mem(size => 16, align => 16); }
RSQRTSS SSE 00001111 01010010 !emit { rep(); modrm(); mem(size => 4); }
PMINUB SSE 00001111 11011010 !emit { modrm(); mem(size => 8); }
+PMINUB SSE2 00001111 11011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PMINSW SSE 00001111 11101010 !emit { modrm(); mem(size => 8); }
+PMINSW SSE2 00001111 11101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
MINPS SSE 00001111 01011101 !emit { modrm(); mem(size => 16, align => 16); }
+MINPD SSE2 00001111 01011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
MINSS SSE 00001111 01011101 !emit { rep(); modrm(); mem(size => 4); }
+MINSD SSE2 00001111 01011101 !emit { repne(); modrm(); mem(size => 8); }
PMAXUB SSE 00001111 11011110 !emit { modrm(); mem(size => 8); }
+PMAXUB SSE2 00001111 11011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PMAXSW SSE 00001111 11101110 !emit { modrm(); mem(size => 8); }
+PMAXSW SSE2 00001111 11101110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
MAXPS SSE 00001111 01011111 !emit { modrm(); mem(size => 16, align => 16); }
+MAXPD SSE2 00001111 01011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
MAXSS SSE 00001111 01011111 !emit { rep(); modrm(); mem(size => 4); }
+MAXSD SSE2 00001111 01011111 !emit { repne(); modrm(); mem(size => 8); }
PAVGB SSE 00001111 11100000 !emit { modrm(); mem(size => 8); }
+PAVGB SSE2 00001111 11100000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PAVGW SSE 00001111 11100011 !emit { modrm(); mem(size => 8); }
+PAVGW SSE2 00001111 11100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSADBW SSE 00001111 11110110 !emit { modrm(); mem(size => 8); }
+PSADBW SSE2 00001111 11110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
# Comparison Instructions
PCMPEQB MMX 00001111 01110100 !emit { modrm(); mem(size => 8); }
+PCMPEQB SSE2 00001111 01110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PCMPEQW MMX 00001111 01110101 !emit { modrm(); mem(size => 8); }
+PCMPEQW SSE2 00001111 01110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PCMPEQD MMX 00001111 01110110 !emit { modrm(); mem(size => 8); }
+PCMPEQD SSE2 00001111 01110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PCMPGTB MMX 00001111 01100100 !emit { modrm(); mem(size => 8); }
+PCMPGTB SSE2 00001111 01100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PCMPGTW MMX 00001111 01100101 !emit { modrm(); mem(size => 8); }
+PCMPGTW SSE2 00001111 01100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PCMPGTD MMX 00001111 01100110 !emit { modrm(); mem(size => 8); }
+PCMPGTD SSE2 00001111 01100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
CMPPS SSE 00001111 11000010 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
+CMPPD SSE2 00001111 11000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
CMPSS SSE 00001111 11000010 !emit { rep(); modrm(); mem(size => 4); imm(size => 1); }
+CMPSD SSE2 00001111 11000010 !emit { repne(); modrm(); mem(size => 8); imm(size => 1); }
UCOMISS SSE 00001111 00101110 !emit { modrm(); mem(size => 4); }
+UCOMISD SSE2 00001111 00101110 !emit { data16(); modrm(); mem(size => 8); }
+
COMISS SSE 00001111 00101111 !emit { modrm(); mem(size => 4); }
+COMISD SSE2 00001111 00101111 !emit { data16(); modrm(); mem(size => 8); }
# Logical Instructions
PAND MMX 00001111 11011011 !emit { modrm(); mem(size => 8); }
+PAND SSE2 00001111 11011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
ANDPS SSE 00001111 01010100 !emit { modrm(); mem(size => 16, align => 16); }
+ANDPD SSE2 00001111 01010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PANDN MMX 00001111 11011111 !emit { modrm(); mem(size => 8); }
+PANDN SSE2 00001111 11011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
ANDNPS SSE 00001111 01010101 !emit { modrm(); mem(size => 16, align => 16); }
+ANDNPD SSE2 00001111 01010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
POR MMX 00001111 11101011 !emit { modrm(); mem(size => 8); }
+POR SSE2 00001111 11101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
ORPS SSE 00001111 01010110 !emit { modrm(); mem(size => 16, align => 16); }
+ORPD SSE2 00001111 01010110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PXOR MMX 00001111 11101111 !emit { modrm(); mem(size => 8); }
+PXOR SSE2 00001111 11101111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
XORPS SSE 00001111 01010111 !emit { modrm(); mem(size => 16, align => 16); }
+XORPD SSE2 00001111 01010111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
# Shift and Rotate Instructions
PSLLW MMX 00001111 11110001 !emit { modrm(); mem(size => 8); }
+PSLLW SSE2 00001111 11110001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSLLD MMX 00001111 11110010 !emit { modrm(); mem(size => 8); }
+PSLLD SSE2 00001111 11110010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSLLQ MMX 00001111 11110011 !emit { modrm(); mem(size => 8); }
+PSLLQ SSE2 00001111 11110011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PSLLDQ SSE2 00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 7); imm(size => 1); }
PSLLW_imm MMX 00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+PSLLW_imm SSE2 00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSLLD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+PSLLD_imm SSE2 00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSLLQ_imm MMX 00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+PSLLQ_imm SSE2 00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSRLW MMX 00001111 11010001 !emit { modrm(); mem(size => 8); }
+PSRLW SSE2 00001111 11010001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSRLD MMX 00001111 11010010 !emit { modrm(); mem(size => 8); }
+PSRLD SSE2 00001111 11010010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSRLQ MMX 00001111 11010011 !emit { modrm(); mem(size => 8); }
+PSRLQ SSE2 00001111 11010011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PSRLDQ SSE2 00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 3); imm(size => 1); }
PSRLW_imm MMX 00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+PSRLW_imm SSE2 00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRLD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+PSRLD_imm SSE2 00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRLQ_imm MMX 00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+PSRLQ_imm SSE2 00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRAW MMX 00001111 11100001 !emit { modrm(); mem(size => 8); }
+PSRAW SSE2 00001111 11100001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSRAD MMX 00001111 11100010 !emit { modrm(); mem(size => 8); }
+PSRAD SSE2 00001111 11100010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSRAW_imm MMX 00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+PSRAW_imm SSE2 00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
PSRAD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+PSRAD_imm SSE2 00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
# Shuffle, Unpack, Blend, Insert, Extract, Broadcast, Permute, Scatter Instructions
PACKSSWB MMX 00001111 01100011 !emit { modrm(); mem(size => 8); }
+PACKSSWB SSE2 00001111 01100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PACKSSDW MMX 00001111 01101011 !emit { modrm(); mem(size => 8); }
+PACKSSDW SSE2 00001111 01101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PACKUSWB MMX 00001111 01100111 !emit { modrm(); mem(size => 8); }
+PACKUSWB SSE2 00001111 01100111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PUNPCKHBW MMX 00001111 01101000 !emit { modrm(); mem(size => 8); }
+PUNPCKHBW SSE2 00001111 01101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PUNPCKHWD MMX 00001111 01101001 !emit { modrm(); mem(size => 8); }
+PUNPCKHWD SSE2 00001111 01101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PUNPCKHDQ MMX 00001111 01101010 !emit { modrm(); mem(size => 8); }
+PUNPCKHDQ SSE2 00001111 01101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PUNPCKHQDQ SSE2 00001111 01101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PUNPCKLBW MMX 00001111 01100000 !emit { modrm(); mem(size => 4); }
+PUNPCKLBW SSE2 00001111 01100000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PUNPCKLWD MMX 00001111 01100001 !emit { modrm(); mem(size => 4); }
+PUNPCKLWD SSE2 00001111 01100001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PUNPCKLDQ MMX 00001111 01100010 !emit { modrm(); mem(size => 4); }
+PUNPCKLDQ SSE2 00001111 01100010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PUNPCKLQDQ SSE2 00001111 01101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
UNPCKLPS SSE 00001111 00010100 !emit { modrm(); mem(size => 16, align => 16); }
+UNPCKLPD SSE2 00001111 00010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
UNPCKHPS SSE 00001111 00010101 !emit { modrm(); mem(size => 16, align => 16); }
+UNPCKHPD SSE2 00001111 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSHUFW SSE 00001111 01110000 !emit { modrm(); mem(size => 8); imm(size => 1); }
+PSHUFLW SSE2 00001111 01110000 !emit { repne(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+PSHUFHW SSE2 00001111 01110000 !emit { rep(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+PSHUFD SSE2 00001111 01110000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+
SHUFPS SSE 00001111 11000110 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
+SHUFPD SSE2 00001111 11000110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
PINSRW SSE 00001111 11000100 !emit { modrm(); mem(size => 2); imm(size => 1); }
+PINSRW SSE2 00001111 11000100 !emit { data16(); modrm(); mem(size => 2); imm(size => 1); }
+
PEXTRW_reg SSE 00001111 11000101 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
+PEXTRW_reg SSE2 00001111 11000101 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
# Conversion Instructions
CVTPI2PS SSE 00001111 00101010 !emit { modrm(); mem(size => 8); }
CVTSI2SS SSE 00001111 00101010 !emit { rep(); modrm(); mem(size => 4); }
CVTSI2SS_64 SSE 00001111 00101010 !emit { rep(); rex(w => 1); modrm(); mem(size => 8); }
+CVTPI2PD SSE2 00001111 00101010 !emit { data16(); modrm(); mem(size => 8); }
+CVTSI2SD SSE2 00001111 00101010 !emit { repne(); modrm(); mem(size => 4); }
+CVTSI2SD_64 SSE2 00001111 00101010 !emit { repne(); rex(w => 1); modrm(); mem(size => 8); }
CVTPS2PI SSE 00001111 00101101 !emit { modrm(); mem(size => 8); }
CVTSS2SI SSE 00001111 00101101 !emit { rep(); modrm(reg => ~REG_ESP); mem(size => 4); }
CVTSS2SI_64 SSE 00001111 00101101 !emit { rep(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 4); }
+CVTPD2PI SSE2 00001111 00101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+CVTSD2SI SSE2 00001111 00101101 !emit { repne(); modrm(reg => ~REG_ESP); mem(size => 8); }
+CVTSD2SI_64 SSE2 00001111 00101101 !emit { repne(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 8); }
CVTTPS2PI SSE 00001111 00101100 !emit { modrm(); mem(size => 8); }
CVTTSS2SI SSE 00001111 00101100 !emit { rep(); modrm(reg => ~REG_ESP); mem(size => 4); }
CVTTSS2SI_64 SSE 00001111 00101100 !emit { rep(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 4); }
+CVTTPD2PI SSE2 00001111 00101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+CVTTSD2SI SSE2 00001111 00101100 !emit { repne(); modrm(reg => ~REG_ESP); mem(size => 8); }
+CVTTSD2SI_64 SSE2 00001111 00101100 !emit { repne(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 8); }
+
+CVTPD2DQ SSE2 00001111 11100110 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+CVTTPD2DQ SSE2 00001111 11100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+CVTDQ2PD SSE2 00001111 11100110 !emit { rep(); modrm(); mem(size => 8); }
+
+CVTPS2PD SSE2 00001111 01011010 !emit { modrm(); mem(size => 8); }
+CVTPD2PS SSE2 00001111 01011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+CVTSS2SD SSE2 00001111 01011010 !emit { rep(); modrm(); mem(size => 4); }
+CVTSD2SS SSE2 00001111 01011010 !emit { repne(); modrm(); mem(size => 8); }
+
+CVTDQ2PS SSE2 00001111 01011011 !emit { modrm(); mem(size => 16, align => 16); }
+CVTPS2DQ SSE2 00001111 01011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+CVTTPS2DQ SSE2 00001111 01011011 !emit { rep(); modrm(); mem(size => 16, align => 16); }
# Cacheability Control, Prefetch, and Instruction Ordering Instructions
MASKMOVQ SSE 00001111 11110111 !emit { modrm(mod => MOD_DIRECT); mem(size => 8, base => REG_EDI); }
+MASKMOVDQU SSE2 00001111 11110111 !emit { data16(); modrm(mod => MOD_DIRECT); mem(size => 16, base => REG_EDI); }
+
MOVNTPS SSE 00001111 00101011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+MOVNTPD SSE2 00001111 00101011 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+
+MOVNTI SSE2 00001111 11000011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 4); }
+MOVNTI_64 SSE2 00001111 11000011 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVNTQ SSE 00001111 11100111 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+MOVNTDQ SSE2 00001111 11100111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
PREFETCHT0 SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 1); mem(size => 1); }
PREFETCHT1 SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 1); }
PREFETCHT2 SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 3); mem(size => 1); }
PREFETCHNTA SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 0); mem(size => 1); }
+CLFLUSH SSE2 00001111 10101110 !emit { modrm(mod => ~MOD_DIRECT, reg => 7); mem(size => 1); }
SFENCE SSE 00001111 10101110 !emit { modrm(mod => MOD_DIRECT, reg => 7); }
+LFENCE SSE2 00001111 10101110 !emit { modrm(mod => 0b11, reg => 0b101); }
+MFENCE SSE2 00001111 10101110 !emit { modrm(mod => 0b11, reg => 0b110); }
+PAUSE SSE2 10010000 !emit { rep(); }
# State Management Instructions
EMMS MMX 00001111 01110111 !emit { }
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 09/14] x86.risu: add SSE3 instructions
From: Jan Bobek @ 2019-07-01 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
Add SSE3 instructions to the x86 configuration file.
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
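Reviewer note, not part of the commit: the SSE3 entries below reuse the
two-byte 0F xx opcode space and are told apart only by the mandatory
prefix that rep() (F3), repne() (F2) or data16() (66) emits. The sketch
below illustrates that lookup in Python. The SSE3 table and the decode()
helper are hypothetical names made up for this note, not risugen code,
and the table covers only the lines added by this patch.

```python
# Hypothetical lookup: (mandatory prefix, second opcode byte) -> mnemonic.
# The first opcode byte is always the 0F escape; entries mirror this patch.
SSE3 = {
    (0xF2, 0xF0): "LDDQU",
    (0xF3, 0x16): "MOVSHDUP",
    (0xF3, 0x12): "MOVSLDUP",
    (0xF2, 0x12): "MOVDDUP",
    (0xF2, 0x7C): "HADDPS",
    (0x66, 0x7C): "HADDPD",
    (0xF2, 0x7D): "HSUBPS",
    (0x66, 0x7D): "HSUBPD",
    (0xF2, 0xD0): "ADDSUBPS",
    (0x66, 0xD0): "ADDSUBPD",
}


def decode(insn: bytes) -> str:
    """Return the SSE3 mnemonic for a 'prefix 0F opcode ...' byte sequence."""
    prefix, escape, opcode = insn[0], insn[1], insn[2]
    if escape != 0x0F:
        raise ValueError("expected the 0F escape byte after the prefix")
    return SSE3[(prefix, opcode)]


# Same opcode byte 0x12, different prefix, different instruction.
print(decode(bytes.fromhex("f30f12c1")))
print(decode(bytes.fromhex("f20f12c1")))
```

This is why MOVSLDUP and MOVDDUP can share the bit pattern
00001111 00010010 in the config file without clashing.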
x86.risu | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/x86.risu b/x86.risu
index 9b63d6b..01181dd 100644
--- a/x86.risu
+++ b/x86.risu
@@ -49,6 +49,11 @@ PMOVMSKB SSE2 00001111 11010111 !emit { data16(); modrm(mod => MOD_DIR
MOVMSKPS SSE 00001111 01010000 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
MOVMKSPD SSE2 00001111 01010000 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+LDDQU SSE3 00001111 11110000 !emit { repne(); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+MOVSHDUP SSE3 00001111 00010110 !emit { rep(); modrm(); mem(size => 16, align => 16); }
+MOVSLDUP SSE3 00001111 00010010 !emit { rep(); modrm(); mem(size => 16, align => 16); }
+MOVDDUP SSE3 00001111 00010010 !emit { repne(); modrm(); mem(size => 8); }
+
# Arithmetic Instructions
PADDB MMX 00001111 11111100 !emit { modrm(); mem(size => 8); }
PADDB SSE2 00001111 11111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -72,6 +77,9 @@ ADDPD SSE2 00001111 01011000 !emit { data16(); modrm(); mem(size =>
ADDSS SSE 00001111 01011000 !emit { rep(); modrm(); mem(size => 4); }
ADDSD SSE2 00001111 01011000 !emit { repne(); modrm(); mem(size => 8); }
+HADDPS SSE3 00001111 01111100 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+HADDPD SSE3 00001111 01111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
PSUBB MMX 00001111 11111000 !emit { modrm(); mem(size => 8); }
PSUBB SSE2 00001111 11111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSUBW MMX 00001111 11111001 !emit { modrm(); mem(size => 8); }
@@ -94,6 +102,12 @@ SUBPD SSE2 00001111 01011100 !emit { data16(); modrm(); mem(size =>
SUBSS SSE 00001111 01011100 !emit { rep(); modrm(); mem(size => 4); }
SUBSD SSE2 00001111 01011100 !emit { repne(); modrm(); mem(size => 8); }
+HSUBPS SSE3 00001111 01111101 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+HSUBPD SSE3 00001111 01111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
+ADDSUBPS SSE3 00001111 11010000 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+ADDSUBPD SSE3 00001111 11010000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
PMULLW MMX 00001111 11010101 !emit { modrm(); mem(size => 8); }
PMULLW SSE2 00001111 11010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PMULHW MMX 00001111 11100101 !emit { modrm(); mem(size => 8); }
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 10/14] x86.risu: add SSSE3 instructions
From: Jan Bobek @ 2019-07-01 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
Add SSSE3 instructions to the x86 configuration file.
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
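Reviewer note, not part of the commit: every SSSE3 instruction comes in
two flavours, a 64-bit MMX form (the *_64 entries, no prefix) and a
128-bit XMM form (data16(), i.e. a 66 prefix), both on the three-byte
0F 38 / 0F 3A opcode maps. A minimal Python sketch for the
register-direct case of PSHUFB follows. The helper names modrm_direct()
and encode_pshufb() are assumptions made up for this illustration and do
not exist in risugen.

```python
def modrm_direct(reg: int, rm: int) -> int:
    """Build a ModRM byte with mod = 0b11 (both operands are registers)."""
    assert 0 <= reg < 8 and 0 <= rm < 8
    return (0b11 << 6) | (reg << 3) | rm


def encode_pshufb(dst: int, src: int, xmm: bool) -> bytes:
    """Encode PSHUFB dst, src for register operands 0-7.

    xmm=True emits the 66 prefix (128-bit form); xmm=False omits it
    (64-bit MMX form, the PSHUFB_64 entry in the patch).
    """
    prefix = b"\x66" if xmm else b""
    return prefix + bytes([0x0F, 0x38, 0x00, modrm_direct(dst, src)])


# pshufb xmm1, xmm2 and pshufb mm1, mm2 differ only in the leading 66.
print(encode_pshufb(1, 2, xmm=True).hex())
print(encode_pshufb(1, 2, xmm=False).hex())
```

Memory operands, SIB bytes and REX are left out on purpose; in the
config file they are handled by modrm() and mem().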
x86.risu | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/x86.risu b/x86.risu
index 01181dd..35992d6 100644
--- a/x86.risu
+++ b/x86.risu
@@ -77,6 +77,13 @@ ADDPD SSE2 00001111 01011000 !emit { data16(); modrm(); mem(size =>
ADDSS SSE 00001111 01011000 !emit { rep(); modrm(); mem(size => 4); }
ADDSD SSE2 00001111 01011000 !emit { repne(); modrm(); mem(size => 8); }
+PHADDW_64 SSSE3 00001111 00111000 00000001 !emit { modrm(); mem(size => 8); }
+PHADDW SSSE3 00001111 00111000 00000001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PHADDD_64 SSSE3 00001111 00111000 00000010 !emit { modrm(); mem(size => 8); }
+PHADDD SSSE3 00001111 00111000 00000010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PHADDSW_64 SSSE3 00001111 00111000 00000011 !emit { modrm(); mem(size => 8); }
+PHADDSW SSSE3 00001111 00111000 00000011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
HADDPS SSE3 00001111 01111100 !emit { repne(); modrm(); mem(size => 16, align => 16); }
HADDPD SSE3 00001111 01111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -102,6 +109,13 @@ SUBPD SSE2 00001111 01011100 !emit { data16(); modrm(); mem(size =>
SUBSS SSE 00001111 01011100 !emit { rep(); modrm(); mem(size => 4); }
SUBSD SSE2 00001111 01011100 !emit { repne(); modrm(); mem(size => 8); }
+PHSUBW_64 SSSE3 00001111 00111000 00000101 !emit { modrm(); mem(size => 8); }
+PHSUBW SSSE3 00001111 00111000 00000101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PHSUBD_64 SSSE3 00001111 00111000 00000110 !emit { modrm(); mem(size => 8); }
+PHSUBD SSSE3 00001111 00111000 00000110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PHSUBSW_64 SSSE3 00001111 00111000 00000111 !emit { modrm(); mem(size => 8); }
+PHSUBSW SSSE3 00001111 00111000 00000111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
HSUBPS SSE3 00001111 01111101 !emit { repne(); modrm(); mem(size => 16, align => 16); }
HSUBPD SSE3 00001111 01111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -117,6 +131,9 @@ PMULHUW SSE2 00001111 11100100 !emit { data16(); modrm(); mem(size =>
PMULUDQ_64 SSE2 00001111 11110100 !emit { modrm(); mem(size => 8); }
PMULUDQ SSE2 00001111 11110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMULHRSW_64 SSSE3 00001111 00111000 00001011 !emit { modrm(); mem(size => 8); }
+PMULHRSW SSSE3 00001111 00111000 00001011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
MULPS SSE 00001111 01011001 !emit { modrm(); mem(size => 16, align => 16); }
MULPD SSE2 00001111 01011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
MULSS SSE 00001111 01011001 !emit { rep(); modrm(); mem(size => 4); }
@@ -124,6 +141,8 @@ MULSD SSE2 00001111 01011001 !emit { repne(); modrm(); mem(size =>
PMADDWD MMX 00001111 11110101 !emit { modrm(); mem(size => 8); }
PMADDWD SSE2 00001111 11110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMADDUBSW_64 SSSE3 00001111 00111000 00000100 !emit { modrm(); mem(size => 8); }
+PMADDUBSW SSSE3 00001111 00111000 00000100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
DIVPS SSE 00001111 01011110 !emit { modrm(); mem(size => 16, align => 16); }
DIVPD SSE2 00001111 01011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -169,6 +188,20 @@ PAVGW SSE2 00001111 11100011 !emit { data16(); modrm(); mem(size =>
PSADBW SSE 00001111 11110110 !emit { modrm(); mem(size => 8); }
PSADBW SSE2 00001111 11110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PABSB_64 SSSE3 00001111 00111000 00011100 !emit { modrm(); mem(size => 8); }
+PABSB SSSE3 00001111 00111000 00011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PABSW_64 SSSE3 00001111 00111000 00011101 !emit { modrm(); mem(size => 8); }
+PABSW SSSE3 00001111 00111000 00011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PABSD_64 SSSE3 00001111 00111000 00011110 !emit { modrm(); mem(size => 8); }
+PABSD SSSE3 00001111 00111000 00011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
+PSIGNB_64 SSSE3 00001111 00111000 00001000 !emit { modrm(); mem(size => 8); }
+PSIGNB SSSE3 00001111 00111000 00001000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PSIGNW_64 SSSE3 00001111 00111000 00001001 !emit { modrm(); mem(size => 8); }
+PSIGNW SSSE3 00001111 00111000 00001001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PSIGND_64 SSSE3 00001111 00111000 00001010 !emit { modrm(); mem(size => 8); }
+PSIGND SSSE3 00001111 00111000 00001010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
# Comparison Instructions
PCMPEQB MMX 00001111 01110100 !emit { modrm(); mem(size => 8); }
PCMPEQB SSE2 00001111 01110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -256,6 +289,9 @@ PSRAW_imm SSE2 00001111 01110001 !emit { data16(); modrm(mod => MOD_DIR
PSRAD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
PSRAD_imm SSE2 00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+PALIGNR_64 SSSE3 00001111 00111010 00001111 !emit { modrm(); mem(size => 8); imm(size => 1); }
+PALIGNR SSSE3 00001111 00111010 00001111 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+
# Shuffle, Unpack, Blend, Insert, Extract, Broadcast, Permute, Scatter Instructions
PACKSSWB MMX 00001111 01100011 !emit { modrm(); mem(size => 8); }
PACKSSWB SSE2 00001111 01100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -285,6 +321,8 @@ UNPCKLPD SSE2 00001111 00010100 !emit { data16(); modrm(); mem(size =>
UNPCKHPS SSE 00001111 00010101 !emit { modrm(); mem(size => 16, align => 16); }
UNPCKHPD SSE2 00001111 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PSHUFB_64 SSSE3 00001111 00111000 00000000 !emit { modrm(); mem(size => 8); }
+PSHUFB SSSE3 00001111 00111000 00000000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PSHUFW SSE 00001111 01110000 !emit { modrm(); mem(size => 8); imm(size => 1); }
PSHUFLW SSE2 00001111 01110000 !emit { repne(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
PSHUFHW SSE2 00001111 01110000 !emit { rep(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 11/14] x86.risu: add SSE4.1 and SSE4.2 instructions
From: Jan Bobek @ 2019-07-01 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
Add SSE4.1 and SSE4.2 instructions to the x86 configuration file.
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
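Reviewer note, not part of the commit: PEXTRD/PEXTRQ (and PINSRD/PINSRQ)
share one opcode and differ only in REX.W, which is why the 64-bit
entries call rex(w => 1) after data16(). The Python sketch below shows
the resulting byte order for the register form. encode_pextr() is a
hypothetical helper written for this note, and it assumes register
numbers 0-7 so that REX never needs the R or B bits.

```python
def encode_pextr(gpr: int, xmm: int, imm: int, wide: bool) -> bytes:
    """Encode PEXTRD (wide=False) or PEXTRQ (wide=True), register form.

    Layout: 66 [REX.W] 0F 3A 16 ModRM imm8. The XMM source sits in the
    ModRM reg field and the general-purpose destination in the rm field.
    """
    assert 0 <= gpr < 8 and 0 <= xmm < 8 and 0 <= imm < 256
    rex = bytes([0x48]) if wide else b""      # 0100 1000: REX with only W set
    modrm = (0b11 << 6) | (xmm << 3) | gpr    # mod = 11, register-direct
    return b"\x66" + rex + bytes([0x0F, 0x3A, 0x16, modrm, imm])


# pextrd eax, xmm1, 2  versus  pextrq rax, xmm1, 1
print(encode_pextr(0, 1, 2, wide=False).hex())
print(encode_pextr(0, 1, 1, wide=True).hex())
```

Because the GPR destination lives in the rm field, the register forms
in the patch constrain rm (rm => ~REG_ESP) rather than reg.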
x86.risu | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/x86.risu b/x86.risu
index 35992d6..a73e209 100644
--- a/x86.risu
+++ b/x86.risu
@@ -124,10 +124,12 @@ ADDSUBPD SSE3 00001111 11010000 !emit { data16(); modrm(); mem(size =>
PMULLW MMX 00001111 11010101 !emit { modrm(); mem(size => 8); }
PMULLW SSE2 00001111 11010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMULLD SSE4_1 00001111 00111000 01000000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PMULHW MMX 00001111 11100101 !emit { modrm(); mem(size => 8); }
PMULHW SSE2 00001111 11100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PMULHUW SSE 00001111 11100100 !emit { modrm(); mem(size => 8); }
PMULHUW SSE2 00001111 11100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMULDQ SSE4_1 00001111 00111000 00101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PMULUDQ_64 SSE2 00001111 11110100 !emit { modrm(); mem(size => 8); }
PMULUDQ SSE2 00001111 11110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -162,18 +164,28 @@ RSQRTSS SSE 00001111 01010010 !emit { rep(); modrm(); mem(size => 4)
PMINUB SSE 00001111 11011010 !emit { modrm(); mem(size => 8); }
PMINUB SSE2 00001111 11011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMINUW SSE4_1 00001111 00111000 00111010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMINUD SSE4_1 00001111 00111000 00111011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMINSB SSE4_1 00001111 00111000 00111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PMINSW SSE 00001111 11101010 !emit { modrm(); mem(size => 8); }
PMINSW SSE2 00001111 11101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMINSD SSE4_1 00001111 00111000 00111001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
MINPS SSE 00001111 01011101 !emit { modrm(); mem(size => 16, align => 16); }
MINPD SSE2 00001111 01011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
MINSS SSE 00001111 01011101 !emit { rep(); modrm(); mem(size => 4); }
MINSD SSE2 00001111 01011101 !emit { repne(); modrm(); mem(size => 8); }
+PHMINPOSUW SSE4_1 00001111 00111000 01000001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
PMAXUB SSE 00001111 11011110 !emit { modrm(); mem(size => 8); }
PMAXUB SSE2 00001111 11011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMAXUW SSE4_1 00001111 00111000 00111110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMAXUD SSE4_1 00001111 00111000 00111111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMAXSB SSE4_1 00001111 00111000 00111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PMAXSW SSE 00001111 11101110 !emit { modrm(); mem(size => 8); }
PMAXSW SSE2 00001111 11101110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PMAXSD SSE4_1 00001111 00111000 00111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
MAXPS SSE 00001111 01011111 !emit { modrm(); mem(size => 16, align => 16); }
MAXPD SSE2 00001111 01011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -187,6 +199,7 @@ PAVGW SSE2 00001111 11100011 !emit { data16(); modrm(); mem(size =>
PSADBW SSE 00001111 11110110 !emit { modrm(); mem(size => 8); }
PSADBW SSE2 00001111 11110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+MPSADBW SSE4_1 00001111 00111010 01000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
PABSB_64 SSSE3 00001111 00111000 00011100 !emit { modrm(); mem(size => 8); }
PABSB SSSE3 00001111 00111000 00011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -202,6 +215,14 @@ PSIGNW SSSE3 00001111 00111000 00001001 !emit { data16(); modrm(); me
PSIGND_64 SSSE3 00001111 00111000 00001010 !emit { modrm(); mem(size => 8); }
PSIGND SSSE3 00001111 00111000 00001010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+DPPS SSE4_1 00001111 00111010 01000000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+DPPD SSE4_1 00001111 00111010 01000001 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+
+ROUNDPS SSE4_1 00001111 00111010 00001000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+ROUNDPD SSE4_1 00001111 00111010 00001001 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+ROUNDSS SSE4_1 00001111 00111010 00001010 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
+ROUNDSD SSE4_1 00001111 00111010 00001011 !emit { data16(); modrm(); mem(size => 8); imm(size => 1); }
+
# Comparison Instructions
PCMPEQB MMX 00001111 01110100 !emit { modrm(); mem(size => 8); }
PCMPEQB SSE2 00001111 01110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -209,12 +230,21 @@ PCMPEQW MMX 00001111 01110101 !emit { modrm(); mem(size => 8); }
PCMPEQW SSE2 00001111 01110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PCMPEQD MMX 00001111 01110110 !emit { modrm(); mem(size => 8); }
PCMPEQD SSE2 00001111 01110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PCMPEQQ SSE4_1 00001111 00111000 00101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PCMPGTB MMX 00001111 01100100 !emit { modrm(); mem(size => 8); }
PCMPGTB SSE2 00001111 01100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PCMPGTW MMX 00001111 01100101 !emit { modrm(); mem(size => 8); }
PCMPGTW SSE2 00001111 01100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PCMPGTD MMX 00001111 01100110 !emit { modrm(); mem(size => 8); }
PCMPGTD SSE2 00001111 01100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PCMPGTQ SSE4_2 00001111 00111000 00110111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+
+PCMPESTRM SSE4_2 00001111 00111010 01100000 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+PCMPESTRI SSE4_2 00001111 00111010 01100001 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+PCMPISTRM SSE4_2 00001111 00111010 01100010 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+PCMPISTRI SSE4_2 00001111 00111010 01100011 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+
+PTEST SSE4_1 00001111 00111000 00010111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
CMPPS SSE 00001111 11000010 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
CMPPD SSE2 00001111 11000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
@@ -299,6 +329,7 @@ PACKSSDW MMX 00001111 01101011 !emit { modrm(); mem(size => 8); }
PACKSSDW SSE2 00001111 01101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PACKUSWB MMX 00001111 01100111 !emit { modrm(); mem(size => 8); }
PACKUSWB SSE2 00001111 01100111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PACKUSDW SSE4_1 00001111 00111000 00101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
PUNPCKHBW MMX 00001111 01101000 !emit { modrm(); mem(size => 8); }
PUNPCKHBW SSE2 00001111 01101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
@@ -331,13 +362,50 @@ PSHUFD SSE2 00001111 01110000 !emit { data16(); modrm(); mem(size =>
SHUFPS SSE 00001111 11000110 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
SHUFPD SSE2 00001111 11000110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+BLENDPS SSE4_1 00001111 00111010 00001100 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+BLENDPD SSE4_1 00001111 00111010 00001101 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+BLENDVPS SSE4_1 00001111 00111000 00010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+BLENDVPD SSE4_1 00001111 00111000 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PBLENDVB SSE4_1 00001111 00111000 00010000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+PBLENDW SSE4_1 00001111 00111010 00001110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+
+INSERTPS SSE4_1 00001111 00111010 00100001 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
+PINSRB SSE4_1 00001111 00111010 00100000 !emit { data16(); modrm(); mem(size => 1); imm(size => 1); }
PINSRW SSE 00001111 11000100 !emit { modrm(); mem(size => 2); imm(size => 1); }
PINSRW SSE2 00001111 11000100 !emit { data16(); modrm(); mem(size => 2); imm(size => 1); }
+PINSRD SSE4_1 00001111 00111010 00100010 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
+PINSRQ SSE4_1 00001111 00111010 00100010 !emit { data16(); rex(w => 1); modrm(); mem(size => 8); imm(size => 1); }
+
+EXTRACTPS SSE4_1 00001111 00111010 00010111 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+EXTRACTPS_mem SSE4_1 00001111 00111010 00010111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
+
+PEXTRB SSE4_1 00001111 00111010 00010100 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+PEXTRB_mem SSE4_1 00001111 00111010 00010100 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 1); imm(size => 1); }
+PEXTRW SSE4_1 00001111 00111010 00010101 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+PEXTRW_mem SSE4_1 00001111 00111010 00010101 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 2); imm(size => 1); }
+PEXTRD SSE4_1 00001111 00111010 00010110 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+PEXTRD_mem SSE4_1 00001111 00111010 00010110 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
+PEXTRQ SSE4_1 00001111 00111010 00010110 !emit { data16(); rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+PEXTRQ_mem SSE4_1 00001111 00111010 00010110 !emit { data16(); rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); imm(size => 1); }
PEXTRW_reg SSE 00001111 11000101 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
PEXTRW_reg SSE2 00001111 11000101 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
# Conversion Instructions
+PMOVSXBW SSE4_1 00001111 00111000 00100000 !emit { data16(); modrm(); mem(size => 8); }
+PMOVSXBD SSE4_1 00001111 00111000 00100001 !emit { data16(); modrm(); mem(size => 4); }
+PMOVSXBQ SSE4_1 00001111 00111000 00100010 !emit { data16(); modrm(); mem(size => 2); }
+PMOVSXWD SSE4_1 00001111 00111000 00100011 !emit { data16(); modrm(); mem(size => 8); }
+PMOVSXWQ SSE4_1 00001111 00111000 00100100 !emit { data16(); modrm(); mem(size => 4); }
+PMOVSXDQ SSE4_1 00001111 00111000 00100101 !emit { data16(); modrm(); mem(size => 8); }
+
+PMOVZXBW SSE4_1 00001111 00111000 00110000 !emit { data16(); modrm(); mem(size => 8); }
+PMOVZXBD SSE4_1 00001111 00111000 00110001 !emit { data16(); modrm(); mem(size => 4); }
+PMOVZXBQ SSE4_1 00001111 00111000 00110010 !emit { data16(); modrm(); mem(size => 2); }
+PMOVZXWD SSE4_1 00001111 00111000 00110011 !emit { data16(); modrm(); mem(size => 8); }
+PMOVZXWQ SSE4_1 00001111 00111000 00110100 !emit { data16(); modrm(); mem(size => 4); }
+PMOVZXDQ SSE4_1 00001111 00111000 00110101 !emit { data16(); modrm(); mem(size => 8); }
+
CVTPI2PS SSE 00001111 00101010 !emit { modrm(); mem(size => 8); }
CVTSI2SS SSE 00001111 00101010 !emit { rep(); modrm(); mem(size => 4); }
CVTSI2SS_64 SSE 00001111 00101010 !emit { rep(); rex(w => 1); modrm(); mem(size => 8); }
@@ -383,6 +451,7 @@ MOVNTI SSE2 00001111 11000011 !emit { modrm(mod => ~MOD_DIRECT); mem
MOVNTI_64 SSE2 00001111 11000011 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVNTQ SSE 00001111 11100111 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVNTDQ SSE2 00001111 11100111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+MOVNTDQA SSE4_1 00001111 00111000 00101010 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
PREFETCHT0 SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 1); mem(size => 1); }
PREFETCHT1 SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 1); }
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 13/14] x86.risu: add AVX instructions
From: Jan Bobek @ 2019-07-01 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
Add AVX instructions to the x86 configuration file.
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
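Note for reviewers: the vex(...) calls below set the L, pp, mmmmm, W and
vvvv fields of the VEX prefix. The following is only a minimal sketch of
how those fields pack into the three-byte (C4) form as laid out in the
Intel SDM. It is not the risugen_x86_asm.pm code; the constant names
mirror the ones used in x86.risu, but their numeric values and the
helper name vex3 are assumptions made for illustration.

```python
# Illustrative only: numeric values follow the Intel SDM field encodings.
# Whether risugen_x86_asm.pm uses the same values is an assumption.
VEX_L_128, VEX_L_256 = 0, 1
VEX_P_NONE, VEX_P_DATA16, VEX_P_REP, VEX_P_REPNE = 0, 1, 2, 3
VEX_M_0F, VEX_M_0F38, VEX_M_0F3A = 1, 2, 3


def vex3(l, p, m, w=0, vvvv=0, r=0, x=0, b=0):
    """Return the three-byte VEX prefix (C4 form) as bytes.

    r, x, b and vvvv are passed in their natural form; the prefix
    stores them bit-inverted, so an unused vvvv (0) encodes as 1111b.
    """
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (m & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (p & 3)
    return bytes([0xC4, byte1, byte2])


# VPADDB xmm0, xmm1, xmm2 (VEX.128.66.0F FC /r): vvvv names xmm1.
prefix = vex3(VEX_L_128, VEX_P_DATA16, VEX_M_0F, vvvv=1)
print(prefix.hex())
```

With opcode FC and ModRM C2 appended, this gives the long-form encoding
of VPADDB xmm0, xmm1, xmm2. Entries marked v => VEX_V_UNUSED correspond
to leaving vvvv at its default here, since the SDM requires the encoded
field to be 1111b when the instruction has no second source register.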
x86.risu | 288 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 288 insertions(+)
diff --git a/x86.risu b/x86.risu
index 17a5082..d3115ac 100644
--- a/x86.risu
+++ b/x86.risu
@@ -17,452 +17,736 @@ MOVD MMX 00001111 011 d 1110 !emit { modrm(mod => MOD_DIRECT, rm
MOVD_mem MMX 00001111 011 d 1110 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 4); }
MOVD SSE2 00001111 011 d 1110 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
MOVD_mem SSE2 00001111 011 d 1110 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); }
+VMOVD AVX 011 d 1110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, W => 0, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
+VMOVD_mem AVX 011 d 1110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, W => 0, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 4); }
MOVQ MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
MOVQ_mem MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVQ SSE2 00001111 011 d 1110 !emit { data16(); rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
MOVQ_mem SSE2 00001111 011 d 1110 !emit { data16(); rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVQ AVX 011 d 1110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, W => 1, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
+VMOVQ_mem AVX 011 d 1110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, W => 1, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVQ_mm MMX 00001111 011 d 1111 !emit { modrm(); mem(size => 8); }
MOVQ_xmm1 SSE2 00001111 01111110 !emit { rep(); modrm(); mem(size => 8); }
+VMOVQ_xmm1 AVX 01111110 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
MOVQ_xmm2 SSE2 00001111 11010110 !emit { data16(); modrm(); mem(size => 8); }
+VMOVQ_xmm2 AVX 11010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
MOVAPS SSE 00001111 0010100 d !emit { modrm(); mem(size => 16, align => 16); }
+VMOVAPS AVX 0010100 d !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16, align => 16); }
MOVAPD SSE2 00001111 0010100 d !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VMOVAPD AVX 0010100 d !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16, align => 16); }
MOVDQA SSE2 00001111 011 d 1111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VMOVDQA AVX 011 d 1111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16, align => 16); }
MOVUPS SSE 00001111 0001000 d !emit { modrm(); mem(size => 16); }
+VMOVUPS AVX 0001000 d !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
MOVUPD SSE2 00001111 0001000 d !emit { data16(); modrm(); mem(size => 16); }
+VMOVUPD AVX 0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
MOVDQU SSE2 00001111 011 d 1111 !emit { rep(); modrm(); mem(size => 16); }
+VMOVDQU AVX 011 d 1111 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
MOVSS SSE 00001111 0001000 d !emit { rep(); modrm(); mem(size => 4); }
+VMOVSS AVX 0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F); modrm(mod => MOD_DIRECT); }
+VMOVSS_mem AVX 0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 4); }
MOVSD SSE2 00001111 0001000 d !emit { repne(); modrm(); mem(size => 8); }
+VMOVSD AVX 0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(mod => MOD_DIRECT); }
+VMOVSD_mem AVX 0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVQ2DQ SSE2 00001111 11010110 !emit { rep(); modrm(mod => MOD_DIRECT); }
MOVDQ2Q SSE2 00001111 11010110 !emit { repne(); modrm(mod => MOD_DIRECT); }
MOVLPS SSE 00001111 0001001 d !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVLPS_ld AVX 00010010 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVLPS_st AVX 00010011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVLPD SSE2 00001111 0001001 d !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVLPD_ld AVX 00010010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVLPD_st AVX 00010011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVHPS SSE 00001111 0001011 d !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVHPS_ld AVX 00010110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVHPS_st AVX 00010111 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVHPD SSE2 00001111 0001011 d !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVHPD_ld AVX 00010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
+VMOVHPD_st AVX 00010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVLHPS SSE 00001111 00010110 !emit { modrm(mod => MOD_DIRECT); }
+VMOVLHPS AVX 00010110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(mod => MOD_DIRECT); }
MOVHLPS SSE 00001111 00010010 !emit { modrm(mod => MOD_DIRECT); }
+VMOVHLPS AVX 00010010 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(mod => MOD_DIRECT); }
PMOVMSKB SSE 00001111 11010111 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
PMOVMSKB SSE2 00001111 11010111 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+VPMOVMSKB AVX 11010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
MOVMSKPS SSE 00001111 01010000 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+VMOVMSKPS AVX 01010000 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
MOVMKSPD SSE2 00001111 01010000 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+VMOVMSKPD AVX 01010000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
LDDQU SSE3 00001111 11110000 !emit { repne(); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VLDDQU AVX 11110000 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
MOVSHDUP SSE3 00001111 00010110 !emit { rep(); modrm(); mem(size => 16, align => 16); }
+VMOVSHDUP AVX 00010110 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
MOVSLDUP SSE3 00001111 00010010 !emit { rep(); modrm(); mem(size => 16, align => 16); }
+VMOVSLDUP AVX 00010010 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
MOVDDUP SSE3 00001111 00010010 !emit { repne(); modrm(); mem(size => 8); }
+VMOVDDUP AVX 00010010 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
# Arithmetic Instructions
PADDB MMX 00001111 11111100 !emit { modrm(); mem(size => 8); }
PADDB SSE2 00001111 11111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDB AVX 11111100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PADDW MMX 00001111 11111101 !emit { modrm(); mem(size => 8); }
PADDW SSE2 00001111 11111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDW AVX 11111101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PADDD MMX 00001111 11111110 !emit { modrm(); mem(size => 8); }
PADDD SSE2 00001111 11111110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDD AVX 11111110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PADDQ MMX 00001111 11010100 !emit { modrm(); mem(size => 8); }
PADDQ SSE2 00001111 11010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDQ AVX 11010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PADDSB MMX 00001111 11101100 !emit { modrm(); mem(size => 8); }
PADDSB SSE2 00001111 11101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDSB AVX 11101100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PADDSW MMX 00001111 11101101 !emit { modrm(); mem(size => 8); }
PADDSW SSE2 00001111 11101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDSW AVX 11101101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PADDUSB MMX 00001111 11011100 !emit { modrm(); mem(size => 8); }
PADDUSB SSE2 00001111 11011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDUSB AVX 11011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PADDUSW MMX 00001111 11011101 !emit { modrm(); mem(size => 8); }
PADDUSW SSE2 00001111 11011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPADDUSW AVX 11011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
ADDPS SSE 00001111 01011000 !emit { modrm(); mem(size => 16, align => 16); }
+VADDPS AVX 01011000 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
ADDPD SSE2 00001111 01011000 !emit { data16(); modrm(); mem(size => 16, align => 16) }
+VADDPD AVX 01011000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
ADDSS SSE 00001111 01011000 !emit { rep(); modrm(); mem(size => 4); }
+VADDSS AVX 01011000 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
ADDSD SSE2 00001111 01011000 !emit { repne(); modrm(); mem(size => 8); }
+VADDSD AVX 01011000 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
PHADDW_64 SSSE3 00001111 00111000 00000001 !emit { modrm(); mem(size => 8); }
PHADDW SSSE3 00001111 00111000 00000001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHADDW AVX 00000001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PHADDD_64 SSSE3 00001111 00111000 00000010 !emit { modrm(); mem(size => 8); }
PHADDD SSSE3 00001111 00111000 00000010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHADDD AVX 00000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PHADDSW_64 SSSE3 00001111 00111000 00000011 !emit { modrm(); mem(size => 8); }
PHADDSW SSSE3 00001111 00111000 00000011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHADDSW AVX 00000011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
HADDPS SSE3 00001111 01111100 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+VHADDPS AVX 01111100 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 16); }
HADDPD SSE3 00001111 01111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VHADDPD AVX 01111100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSUBB MMX 00001111 11111000 !emit { modrm(); mem(size => 8); }
PSUBB SSE2 00001111 11111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBB AVX 11111000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSUBW MMX 00001111 11111001 !emit { modrm(); mem(size => 8); }
PSUBW SSE2 00001111 11111001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBW AVX 11111001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSUBD MMX 00001111 11111010 !emit { modrm(); mem(size => 8); }
PSUBD SSE2 00001111 11111010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBD AVX 11111010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSUBQ_64 SSE2 00001111 11111011 !emit { modrm(); mem(size => 8); }
PSUBQ SSE2 00001111 11111011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBQ AVX 11111011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSUBSB MMX 00001111 11101000 !emit { modrm(); mem(size => 8); }
PSUBSB SSE2 00001111 11101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBSB AVX 11101000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSUBSW MMX 00001111 11101001 !emit { modrm(); mem(size => 8); }
PSUBSW SSE2 00001111 11101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBSW AVX 11101001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSUBUSB MMX 00001111 11011000 !emit { modrm(); mem(size => 8); }
PSUBUSB SSE2 00001111 11011000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBUSB AVX 11011000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSUBUSW MMX 00001111 11011001 !emit { modrm(); mem(size => 8); }
PSUBUSW SSE2 00001111 11011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSUBUSW AVX 11011001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
SUBPS SSE 00001111 01011100 !emit { modrm(); mem(size => 16, align => 16); }
+VSUBPS AVX 01011100 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
SUBPD SSE2 00001111 01011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VSUBPD AVX 01011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
SUBSS SSE 00001111 01011100 !emit { rep(); modrm(); mem(size => 4); }
+VSUBSS AVX 01011100 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
SUBSD SSE2 00001111 01011100 !emit { repne(); modrm(); mem(size => 8); }
+VSUBSD AVX 01011100 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
PHSUBW_64 SSSE3 00001111 00111000 00000101 !emit { modrm(); mem(size => 8); }
PHSUBW SSSE3 00001111 00111000 00000101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHSUBW AVX 00000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PHSUBD_64 SSSE3 00001111 00111000 00000110 !emit { modrm(); mem(size => 8); }
PHSUBD SSSE3 00001111 00111000 00000110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHSUBD AVX 00000110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PHSUBSW_64 SSSE3 00001111 00111000 00000111 !emit { modrm(); mem(size => 8); }
PHSUBSW SSSE3 00001111 00111000 00000111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHSUBSW AVX 00000111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
HSUBPS SSE3 00001111 01111101 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+VHSUBPS AVX 01111101 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 16); }
HSUBPD SSE3 00001111 01111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VHSUBPD AVX 01111101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
ADDSUBPS SSE3 00001111 11010000 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+VADDSUBPS AVX 11010000 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 16); }
ADDSUBPD SSE3 00001111 11010000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VADDSUBPD AVX 11010000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PMULLW MMX 00001111 11010101 !emit { modrm(); mem(size => 8); }
PMULLW SSE2 00001111 11010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULLW AVX 11010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PMULLD SSE4_1 00001111 00111000 01000000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULLD AVX 01000000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PMULHW MMX 00001111 11100101 !emit { modrm(); mem(size => 8); }
PMULHW SSE2 00001111 11100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULHW AVX 11100101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PMULHUW SSE 00001111 11100100 !emit { modrm(); mem(size => 8); }
PMULHUW SSE2 00001111 11100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULHUW AVX 11100100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PMULDQ SSE4_1 00001111 00111000 00101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULDQ AVX 00101000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PMULUDQ_64 SSE2 00001111 11110100 !emit { modrm(); mem(size => 8); }
PMULUDQ SSE2 00001111 11110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULUDQ AVX 11110100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PMULHRSW_64 SSSE3 00001111 00111000 00001011 !emit { modrm(); mem(size => 8); }
PMULHRSW SSSE3 00001111 00111000 00001011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMULHRSW AVX 00001011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
MULPS SSE 00001111 01011001 !emit { modrm(); mem(size => 16, align => 16); }
+VMULPS AVX 01011001 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
MULPD SSE2 00001111 01011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VMULPD AVX 01011001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
MULSS SSE 00001111 01011001 !emit { rep(); modrm(); mem(size => 4); }
+VMULSS AVX 01011001 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
MULSD SSE2 00001111 01011001 !emit { repne(); modrm(); mem(size => 8); }
+VMULSD AVX 01011001 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
PMADDWD MMX 00001111 11110101 !emit { modrm(); mem(size => 8); }
PMADDWD SSE2 00001111 11110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMADDWD AVX 11110101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PMADDUBSW_64 SSSE3 00001111 00111000 00000100 !emit { modrm(); mem(size => 8); }
PMADDUBSW SSSE3 00001111 00111000 00000100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMADDUBSW AVX 00000100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
DIVPS SSE 00001111 01011110 !emit { modrm(); mem(size => 16, align => 16); }
+VDIVPS AVX 01011110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
DIVPD SSE2 00001111 01011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VDIVPD AVX 01011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
DIVSS SSE 00001111 01011110 !emit { rep(); modrm(); mem(size => 4); }
+VDIVSS AVX 01011110 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
DIVSD SSE2 00001111 01011110 !emit { repne(); modrm(); mem(size => 8); }
+VDIVSD AVX 01011110 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
RCPPS SSE 00001111 01010011 !emit { modrm(); mem(size => 16, align => 16); }
+VRCPPS AVX 01010011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
RCPSS SSE 00001111 01010011 !emit { rep(); modrm(); mem(size => 4); }
+VRCPSS AVX 01010011 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
SQRTPS SSE 00001111 01010001 !emit { modrm(); mem(size => 16, align => 16); }
+VSQRTPS AVX 01010001 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
SQRTPD SSE2 00001111 01010001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VSQRTPD AVX 01010001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
SQRTSS SSE 00001111 01010001 !emit { rep(); modrm(); mem(size => 4); }
+VSQRTSS AVX 01010001 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
SQRTSD SSE2 00001111 01010001 !emit { repne(); modrm(); mem(size => 8); }
+VSQRTSD AVX 01010001 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
RSQRTPS SSE 00001111 01010010 !emit { modrm(); mem(size => 16, align => 16); }
+VRSQRTPS AVX 01010010 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
RSQRTSS SSE 00001111 01010010 !emit { rep(); modrm(); mem(size => 4); }
+VRSQRTSS AVX 01010010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
PMINUB SSE 00001111 11011010 !emit { modrm(); mem(size => 8); }
PMINUB SSE2 00001111 11011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMINUB AVX 11011010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PMINUW SSE4_1 00001111 00111000 00111010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMINUW AVX 00111010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PMINUD SSE4_1 00001111 00111000 00111011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMINUD AVX 00111011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PMINSB SSE4_1 00001111 00111000 00111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMINSB AVX 00111000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PMINSW SSE 00001111 11101010 !emit { modrm(); mem(size => 8); }
PMINSW SSE2 00001111 11101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMINSW AVX 11101010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PMINSD SSE4_1 00001111 00111000 00111001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMINSD AVX 00111001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
MINPS SSE 00001111 01011101 !emit { modrm(); mem(size => 16, align => 16); }
+VMINPS AVX 01011101 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
MINPD SSE2 00001111 01011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VMINPD AVX 01011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
MINSS SSE 00001111 01011101 !emit { rep(); modrm(); mem(size => 4); }
+VMINSS AVX 01011101 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
MINSD SSE2 00001111 01011101 !emit { repne(); modrm(); mem(size => 8); }
+VMINSD AVX 01011101 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
PHMINPOSUW SSE4_1 00001111 00111000 01000001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPHMINPOSUW AVX 01000001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
PMAXUB SSE 00001111 11011110 !emit { modrm(); mem(size => 8); }
PMAXUB SSE2 00001111 11011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMAXUB AVX 11011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PMAXUW SSE4_1 00001111 00111000 00111110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMAXUW AVX 00111110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PMAXUD SSE4_1 00001111 00111000 00111111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMAXUD AVX 00111111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PMAXSB SSE4_1 00001111 00111000 00111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMAXSB AVX 00111100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PMAXSW SSE 00001111 11101110 !emit { modrm(); mem(size => 8); }
PMAXSW SSE2 00001111 11101110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMAXSW AVX 11101110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PMAXSD SSE4_1 00001111 00111000 00111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPMAXSD AVX 00111101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
MAXPS SSE 00001111 01011111 !emit { modrm(); mem(size => 16, align => 16); }
+VMAXPS AVX 01011111 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
MAXPD SSE2 00001111 01011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VMAXPD AVX 01011111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
MAXSS SSE 00001111 01011111 !emit { rep(); modrm(); mem(size => 4); }
+VMAXSS AVX 01011111 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
MAXSD SSE2 00001111 01011111 !emit { repne(); modrm(); mem(size => 8); }
+VMAXSD AVX 01011111 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
PAVGB SSE 00001111 11100000 !emit { modrm(); mem(size => 8); }
PAVGB SSE2 00001111 11100000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPAVGB AVX 11100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PAVGW SSE 00001111 11100011 !emit { modrm(); mem(size => 8); }
PAVGW SSE2 00001111 11100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPAVGW AVX 11100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSADBW SSE 00001111 11110110 !emit { modrm(); mem(size => 8); }
PSADBW SSE2 00001111 11110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSADBW AVX 11110110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
MPSADBW SSE4_1 00001111 00111010 01000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VMPSADBW AVX 01000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
PABSB_64 SSSE3 00001111 00111000 00011100 !emit { modrm(); mem(size => 8); }
PABSB SSSE3 00001111 00111000 00011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPABSB AVX 00011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
PABSW_64 SSSE3 00001111 00111000 00011101 !emit { modrm(); mem(size => 8); }
PABSW SSSE3 00001111 00111000 00011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPABSW AVX 00011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
PABSD_64 SSSE3 00001111 00111000 00011110 !emit { modrm(); mem(size => 8); }
PABSD SSSE3 00001111 00111000 00011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPABSD AVX 00011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
PSIGNB_64 SSSE3 00001111 00111000 00001000 !emit { modrm(); mem(size => 8); }
PSIGNB SSSE3 00001111 00111000 00001000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSIGNB AVX 00001000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PSIGNW_64 SSSE3 00001111 00111000 00001001 !emit { modrm(); mem(size => 8); }
PSIGNW SSSE3 00001111 00111000 00001001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSIGNW AVX 00001001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PSIGND_64 SSSE3 00001111 00111000 00001010 !emit { modrm(); mem(size => 8); }
PSIGND SSSE3 00001111 00111000 00001010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSIGND AVX 00001010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
DPPS SSE4_1 00001111 00111010 01000000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VDPPS AVX 01000000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
DPPD SSE4_1 00001111 00111010 01000001 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VDPPD AVX 01000001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
ROUNDPS SSE4_1 00001111 00111010 00001000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VROUNDPS AVX 00001000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
ROUNDPD SSE4_1 00001111 00111010 00001001 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VROUNDPD AVX 00001001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
ROUNDSS SSE4_1 00001111 00111010 00001010 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
+VROUNDSS AVX 00001010 !emit { vex(l => 0, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 4); imm(size => 1); }
ROUNDSD SSE4_1 00001111 00111010 00001011 !emit { data16(); modrm(); mem(size => 8); imm(size => 1); }
+VROUNDSD AVX 00001011 !emit { vex(l => 0, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 8); imm(size => 1); }
# AES Instructions
AESDEC AES 00001111 00111000 11011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VAESDEC AES_AVX 11011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
AESDECLAST AES 00001111 00111000 11011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VAESDECLAST AES_AVX 11011111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
AESENC AES 00001111 00111000 11011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VAESENC AES_AVX 11011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
AESENCLAST AES 00001111 00111000 11011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VAESENCLAST AES_AVX 11011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
AESIMC AES 00001111 00111000 11011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VAESIMC AES_AVX 11011011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
AESKEYGENASSIST AES 00001111 00111010 11011111 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VAESKEYGENASSIST AES_AVX 11011111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
# PCLMULQDQ Instructions
PCLMULQDQ PCLMULQDQ 00001111 00111010 01000100 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VPCLMULQDQ PCLMULQDQ_AVX 01000100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
# Comparison Instructions
PCMPEQB MMX 00001111 01110100 !emit { modrm(); mem(size => 8); }
PCMPEQB SSE2 00001111 01110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPEQB AVX 01110100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PCMPEQW MMX 00001111 01110101 !emit { modrm(); mem(size => 8); }
PCMPEQW SSE2 00001111 01110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPEQW AVX 01110101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PCMPEQD MMX 00001111 01110110 !emit { modrm(); mem(size => 8); }
PCMPEQD SSE2 00001111 01110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPEQD AVX 01110110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PCMPEQQ SSE4_1 00001111 00111000 00101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPEQQ AVX 00101001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PCMPGTB MMX 00001111 01100100 !emit { modrm(); mem(size => 8); }
PCMPGTB SSE2 00001111 01100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPGTB AVX 01100100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PCMPGTW MMX 00001111 01100101 !emit { modrm(); mem(size => 8); }
PCMPGTW SSE2 00001111 01100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPGTW AVX 01100101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PCMPGTD MMX 00001111 01100110 !emit { modrm(); mem(size => 8); }
PCMPGTD SSE2 00001111 01100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPGTD AVX 01100110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PCMPGTQ SSE4_2 00001111 00111000 00110111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPCMPGTQ AVX 00110111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PCMPESTRM SSE4_2 00001111 00111010 01100000 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+VPCMPESTRM AVX 01100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
PCMPESTRI SSE4_2 00001111 00111010 01100001 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+VPCMPESTRI AVX 01100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
PCMPISTRM SSE4_2 00001111 00111010 01100010 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+VPCMPISTRM AVX 01100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
PCMPISTRI SSE4_2 00001111 00111010 01100011 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
+VPCMPISTRI AVX 01100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
PTEST SSE4_1 00001111 00111000 00010111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPTEST AVX 00010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+
+VTESTPS AVX 00001110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VTESTPD AVX 00001111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
CMPPS SSE 00001111 11000010 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VCMPPS AVX 11000010 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
CMPPD SSE2 00001111 11000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VCMPPD AVX 11000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
CMPSS SSE 00001111 11000010 !emit { rep(); modrm(); mem(size => 4); imm(size => 1); }
+VCMPSS AVX 11000010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); imm(size => 1); }
CMPSD SSE2 00001111 11000010 !emit { repne(); modrm(); mem(size => 8); imm(size => 1); }
+VCMPSD AVX 11000010 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); imm(size => 1); }
UCOMISS SSE 00001111 00101110 !emit { modrm(); mem(size => 4); }
+VUCOMISS AVX 00101110 !emit { vex(l => 0, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
UCOMISD SSE2 00001111 00101110 !emit { data16(); modrm(); mem(size => 8); }
+VUCOMISD AVX 00101110 !emit { vex(l => 0, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
COMISS SSE 00001111 00101111 !emit { modrm(); mem(size => 4); }
+VCOMISS AVX 00101111 !emit { vex(l => 0, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
COMISD SSE2 00001111 00101111 !emit { data16(); modrm(); mem(size => 8); }
+VCOMISD AVX 00101111 !emit { vex(l => 0, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
# Logical Instructions
PAND MMX 00001111 11011011 !emit { modrm(); mem(size => 8); }
PAND SSE2 00001111 11011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPAND AVX 11011011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
ANDPS SSE 00001111 01010100 !emit { modrm(); mem(size => 16, align => 16); }
+VANDPS AVX 01010100 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
ANDPD SSE2 00001111 01010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VANDPD AVX 01010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PANDN MMX 00001111 11011111 !emit { modrm(); mem(size => 8); }
PANDN SSE2 00001111 11011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPANDN AVX 11011111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
ANDNPS SSE 00001111 01010101 !emit { modrm(); mem(size => 16, align => 16); }
+VANDNPS AVX 01010101 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
ANDNPD SSE2 00001111 01010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VANDNPD AVX 01010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
POR MMX 00001111 11101011 !emit { modrm(); mem(size => 8); }
POR SSE2 00001111 11101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPOR AVX 11101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
ORPS SSE 00001111 01010110 !emit { modrm(); mem(size => 16, align => 16); }
+VORPS AVX 01010110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
ORPD SSE2 00001111 01010110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VORPD AVX 01010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PXOR MMX 00001111 11101111 !emit { modrm(); mem(size => 8); }
PXOR SSE2 00001111 11101111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPXOR AVX 11101111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
XORPS SSE 00001111 01010111 !emit { modrm(); mem(size => 16, align => 16); }
+VXORPS AVX 01010111 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
XORPD SSE2 00001111 01010111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VXORPD AVX 01010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
# Shift and Rotate Instructions
PSLLW MMX 00001111 11110001 !emit { modrm(); mem(size => 8); }
PSLLW SSE2 00001111 11110001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSLLW AVX 11110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSLLD MMX 00001111 11110010 !emit { modrm(); mem(size => 8); }
PSLLD SSE2 00001111 11110010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSLLD AVX 11110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSLLQ MMX 00001111 11110011 !emit { modrm(); mem(size => 8); }
PSLLQ SSE2 00001111 11110011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSLLQ AVX 11110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSLLDQ SSE2 00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 7); imm(size => 1); }
+VPSLLDQ AVX 01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 7); imm(size => 1); }
PSLLW_imm MMX 00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSLLW_imm SSE2 00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+VPSLLW_imm AVX 01110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSLLD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSLLD_imm SSE2 00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+VPSLLD_imm AVX 01110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSLLQ_imm MMX 00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSLLQ_imm SSE2 00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+VPSLLQ_imm AVX 01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSRLW MMX 00001111 11010001 !emit { modrm(); mem(size => 8); }
PSRLW SSE2 00001111 11010001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSRLW AVX 11010001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSRLD MMX 00001111 11010010 !emit { modrm(); mem(size => 8); }
PSRLD SSE2 00001111 11010010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSRLD AVX 11010010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSRLQ MMX 00001111 11010011 !emit { modrm(); mem(size => 8); }
PSRLQ SSE2 00001111 11010011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSRLQ AVX 11010011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSRLDQ SSE2 00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 3); imm(size => 1); }
+VPSRLDQ AVX 01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 3); imm(size => 1); }
PSRLW_imm MMX 00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRLW_imm SSE2 00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+VPSRLW_imm AVX 01110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRLD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRLD_imm SSE2 00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+VPSRLD_imm AVX 01110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRLQ_imm MMX 00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRLQ_imm SSE2 00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+VPSRLQ_imm AVX 01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRAW MMX 00001111 11100001 !emit { modrm(); mem(size => 8); }
PSRAW SSE2 00001111 11100001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSRAW AVX 11100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSRAD MMX 00001111 11100010 !emit { modrm(); mem(size => 8); }
PSRAD SSE2 00001111 11100010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSRAD AVX 11100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSRAW_imm MMX 00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
PSRAW_imm SSE2 00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+VPSRAW_imm AVX 01110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
PSRAD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
PSRAD_imm SSE2 00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+VPSRAD_imm AVX 01110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
PALIGNR_64 SSSE3 00001111 00111010 00001111 !emit { modrm(); mem(size => 8); imm(size => 1); }
PALIGNR SSSE3 00001111 00111010 00001111 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VPALIGNR AVX 00001111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
# Shuffle, Unpack, Blend, Insert, Extract, Broadcast, Permute, Scatter Instructions
PACKSSWB MMX 00001111 01100011 !emit { modrm(); mem(size => 8); }
PACKSSWB SSE2 00001111 01100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPACKSSWB AVX 01100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PACKSSDW MMX 00001111 01101011 !emit { modrm(); mem(size => 8); }
PACKSSDW SSE2 00001111 01101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPACKSSDW AVX 01101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PACKUSWB MMX 00001111 01100111 !emit { modrm(); mem(size => 8); }
PACKUSWB SSE2 00001111 01100111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPACKUSWB AVX 01100111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PACKUSDW SSE4_1 00001111 00111000 00101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPACKUSDW AVX 00101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PUNPCKHBW MMX 00001111 01101000 !emit { modrm(); mem(size => 8); }
PUNPCKHBW SSE2 00001111 01101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKHBW AVX 01101000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PUNPCKHWD MMX 00001111 01101001 !emit { modrm(); mem(size => 8); }
PUNPCKHWD SSE2 00001111 01101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKHWD AVX 01101001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PUNPCKHDQ MMX 00001111 01101010 !emit { modrm(); mem(size => 8); }
PUNPCKHDQ SSE2 00001111 01101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKHDQ AVX 01101010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PUNPCKHQDQ SSE2 00001111 01101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKHQDQ AVX 01101101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PUNPCKLBW MMX 00001111 01100000 !emit { modrm(); mem(size => 4); }
PUNPCKLBW SSE2 00001111 01100000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKLBW AVX 01100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PUNPCKLWD MMX 00001111 01100001 !emit { modrm(); mem(size => 4); }
PUNPCKLWD SSE2 00001111 01100001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKLWD AVX 01100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PUNPCKLDQ MMX 00001111 01100010 !emit { modrm(); mem(size => 4); }
PUNPCKLDQ SSE2 00001111 01100010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKLDQ AVX 01100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PUNPCKLQDQ SSE2 00001111 01101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPUNPCKLQDQ AVX 01101100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
UNPCKLPS SSE 00001111 00010100 !emit { modrm(); mem(size => 16, align => 16); }
+VUNPCKLPS AVX 00010100 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
UNPCKLPD SSE2 00001111 00010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VUNPCKLPD AVX 00010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
UNPCKHPS SSE 00001111 00010101 !emit { modrm(); mem(size => 16, align => 16); }
+VUNPCKHPS AVX 00010101 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
UNPCKHPD SSE2 00001111 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VUNPCKHPD AVX 00010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSHUFB_64 SSSE3 00001111 00111000 00000000 !emit { modrm(); mem(size => 8); }
PSHUFB SSSE3 00001111 00111000 00000000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPSHUFB AVX 00000000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
PSHUFW SSE 00001111 01110000 !emit { modrm(); mem(size => 8); imm(size => 1); }
PSHUFLW SSE2 00001111 01110000 !emit { repne(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VPSHUFLW AVX 01110000 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
PSHUFHW SSE2 00001111 01110000 !emit { rep(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VPSHUFHW AVX 01110000 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
PSHUFD SSE2 00001111 01110000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VPSHUFD AVX 01110000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
SHUFPS SSE 00001111 11000110 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VSHUFPS AVX 11000110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
SHUFPD SSE2 00001111 11000110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VSHUFPD AVX 11000110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
BLENDPS SSE4_1 00001111 00111010 00001100 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VBLENDPS AVX 00001100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
BLENDPD SSE4_1 00001111 00111010 00001101 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VBLENDPD AVX 00001101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
BLENDVPS SSE4_1 00001111 00111000 00010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VBLENDVPS AVX 01001010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
BLENDVPD SSE4_1 00001111 00111000 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VBLENDVPD AVX 01001011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
PBLENDVB SSE4_1 00001111 00111000 00010000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VPBLENDVB AVX 01001100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
PBLENDW SSE4_1 00001111 00111010 00001110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
+VPBLENDW AVX 00001110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
INSERTPS SSE4_1 00001111 00111010 00100001 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
+VINSERTPS AVX 00100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 4); imm(size => 1); }
PINSRB SSE4_1 00001111 00111010 00100000 !emit { data16(); modrm(); mem(size => 1); imm(size => 1); }
+VPINSRB AVX 00100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 1); imm(size => 1); }
PINSRW SSE 00001111 11000100 !emit { modrm(); mem(size => 2); imm(size => 1); }
PINSRW SSE2 00001111 11000100 !emit { data16(); modrm(); mem(size => 2); imm(size => 1); }
+VPINSRW AVX 11000100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, w => 0); modrm(); mem(size => 2); imm(size => 1); }
PINSRD SSE4_1 00001111 00111010 00100010 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
+VPINSRD AVX 00100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 4); imm(size => 1); }
PINSRQ SSE4_1 00001111 00111010 00100010 !emit { data16(); rex(w => 1); modrm(); mem(size => 8); imm(size => 1); }
+VPINSRQ AVX 00100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 1); modrm(); mem(size => 8); imm(size => 1); }
EXTRACTPS SSE4_1 00001111 00111010 00010111 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
EXTRACTPS_mem SSE4_1 00001111 00111010 00010111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
+VEXTRACTPS AVX 00010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+VEXTRACTPS_mem AVX 00010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
PEXTRB SSE4_1 00001111 00111010 00010100 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
PEXTRB_mem SSE4_1 00001111 00111010 00010100 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 1); imm(size => 1); }
+VPEXTRB AVX 00010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+VPEXTRB_mem AVX 00010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 1); imm(size => 1); }
PEXTRW SSE4_1 00001111 00111010 00010101 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
PEXTRW_mem SSE4_1 00001111 00111010 00010101 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 2); imm(size => 1); }
+VPEXTRW AVX 00010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+VPEXTRW_mem AVX 00010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 2); imm(size => 1); }
PEXTRD SSE4_1 00001111 00111010 00010110 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
PEXTRD_mem SSE4_1 00001111 00111010 00010110 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
+VPEXTRD AVX 00010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+VPEXTRD_mem AVX 00010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
PEXTRQ SSE4_1 00001111 00111010 00010110 !emit { data16(); rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
PEXTRQ_mem SSE4_1 00001111 00111010 00010110 !emit { data16(); rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); imm(size => 1); }
+VPEXTRQ AVX 00010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 1, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
+VPEXTRQ_mem AVX 00010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 1, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 8); imm(size => 1); }
PEXTRW_reg SSE 00001111 11000101 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
PEXTRW_reg SSE2 00001111 11000101 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
+VPEXTRW_reg AVX 11000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, w => 0, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
+
+VPERMILPS AVX 00001100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPERMILPS_imm AVX 00000100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VPERMILPD AVX 00001101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPERMILPD_imm AVX 00000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
# Conversion Instructions
PMOVSXBW SSE4_1 00001111 00111000 00100000 !emit { data16(); modrm(); mem(size => 8); }
+VPMOVSXBW AVX 00100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
PMOVSXBD SSE4_1 00001111 00111000 00100001 !emit { data16(); modrm(); mem(size => 4); }
+VPMOVSXBD AVX 00100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
PMOVSXBQ SSE4_1 00001111 00111000 00100010 !emit { data16(); modrm(); mem(size => 2); }
+VPMOVSXBQ AVX 00100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 2); }
PMOVSXWD SSE4_1 00001111 00111000 00100011 !emit { data16(); modrm(); mem(size => 8); }
+VPMOVSXWD AVX 00100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
PMOVSXWQ SSE4_1 00001111 00111000 00100100 !emit { data16(); modrm(); mem(size => 4); }
+VPMOVSXWQ AVX 00100100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
PMOVSXDQ SSE4_1 00001111 00111000 00100101 !emit { data16(); modrm(); mem(size => 8); }
+VPMOVSXDQ AVX 00100101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
PMOVZXBW SSE4_1 00001111 00111000 00110000 !emit { data16(); modrm(); mem(size => 8); }
+VPMOVZXBW AVX 00110000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
PMOVZXBD SSE4_1 00001111 00111000 00110001 !emit { data16(); modrm(); mem(size => 4); }
+VPMOVZXBD AVX 00110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
PMOVZXBQ SSE4_1 00001111 00111000 00110010 !emit { data16(); modrm(); mem(size => 2); }
+VPMOVZXBQ AVX 00110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 2); }
PMOVZXWD SSE4_1 00001111 00111000 00110011 !emit { data16(); modrm(); mem(size => 8); }
+VPMOVZXWD AVX 00110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
PMOVZXWQ SSE4_1 00001111 00111000 00110100 !emit { data16(); modrm(); mem(size => 4); }
+VPMOVZXWQ AVX 00110100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
PMOVZXDQ SSE4_1 00001111 00111000 00110101 !emit { data16(); modrm(); mem(size => 8); }
+VPMOVZXDQ AVX 00110101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
CVTPI2PS SSE 00001111 00101010 !emit { modrm(); mem(size => 8); }
CVTSI2SS SSE 00001111 00101010 !emit { rep(); modrm(); mem(size => 4); }
CVTSI2SS_64 SSE 00001111 00101010 !emit { rep(); rex(w => 1); modrm(); mem(size => 8); }
+VCVTSI2SS AVX 00101010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F, w => 0); modrm(); mem(size => 4); }
+VCVTSI2SS_64 AVX 00101010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F, w => 1); modrm(); mem(size => 8); }
CVTPI2PD SSE2 00001111 00101010 !emit { data16(); modrm(); mem(size => 8); }
CVTSI2SD SSE2 00001111 00101010 !emit { repne(); modrm(); mem(size => 4); }
CVTSI2SD_64 SSE2 00001111 00101010 !emit { repne(); rex(w => 1); modrm(); mem(size => 8); }
+VCVTSI2SD AVX 00101010 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F, w => 0); modrm(); mem(size => 4); }
+VCVTSI2SD_64 AVX 00101010 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F, w => 1); modrm(); mem(size => 8); }
CVTPS2PI SSE 00001111 00101101 !emit { modrm(); mem(size => 8); }
CVTSS2SI SSE 00001111 00101101 !emit { rep(); modrm(reg => ~REG_ESP); mem(size => 4); }
CVTSS2SI_64 SSE 00001111 00101101 !emit { rep(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 4); }
+VCVTSS2SI AVX 00101101 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F, w => 0, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 4); }
+VCVTSS2SI_64 AVX 00101101 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F, w => 1, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 4); }
CVTPD2PI SSE2 00001111 00101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
CVTSD2SI SSE2 00001111 00101101 !emit { repne(); modrm(reg => ~REG_ESP); mem(size => 8); }
CVTSD2SI_64 SSE2 00001111 00101101 !emit { repne(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 8); }
+VCVTSD2SI AVX 00101101 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F, w => 0, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 8); }
+VCVTSD2SI_64 AVX 00101101 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F, w => 1, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 8); }
CVTTPS2PI SSE 00001111 00101100 !emit { modrm(); mem(size => 8); }
CVTTSS2SI SSE 00001111 00101100 !emit { rep(); modrm(reg => ~REG_ESP); mem(size => 4); }
CVTTSS2SI_64 SSE 00001111 00101100 !emit { rep(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 4); }
+VCVTTSS2SI AVX 00101100 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F, w => 0, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 4); }
+VCVTTSS2SI_64 AVX 00101100 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F, w => 1, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 4); }
CVTTPD2PI SSE2 00001111 00101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
CVTTSD2SI SSE2 00001111 00101100 !emit { repne(); modrm(reg => ~REG_ESP); mem(size => 8); }
CVTTSD2SI_64 SSE2 00001111 00101100 !emit { repne(); rex(w => 1); modrm(reg => ~REG_ESP); mem(size => 8); }
+VCVTTSD2SI AVX 00101100 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F, w => 0, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 8); }
+VCVTTSD2SI_64 AVX 00101100 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F, w => 1, v => VEX_V_UNUSED); modrm(reg => ~REG_ESP); mem(size => 8); }
CVTPD2DQ SSE2 00001111 11100110 !emit { repne(); modrm(); mem(size => 16, align => 16); }
+VCVTPD2DQ AVX 11100110 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
CVTTPD2DQ SSE2 00001111 11100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VCVTTPD2DQ AVX 11100110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
CVTDQ2PD SSE2 00001111 11100110 !emit { rep(); modrm(); mem(size => 8); }
+VCVTDQ2PD AVX 11100110 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
CVTPS2PD SSE2 00001111 01011010 !emit { modrm(); mem(size => 8); }
+VCVTPS2PD AVX 01011010 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
CVTPD2PS SSE2 00001111 01011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VCVTPD2PS AVX 01011010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
CVTSS2SD SSE2 00001111 01011010 !emit { rep(); modrm(); mem(size => 4); }
+VCVTSS2SD AVX 01011010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
CVTSD2SS SSE2 00001111 01011010 !emit { repne(); modrm(); mem(size => 8); }
+VCVTSD2SS AVX 01011010 !emit { vex(l => 0, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 8); }
CVTDQ2PS SSE2 00001111 01011011 !emit { modrm(); mem(size => 16, align => 16); }
+VCVTDQ2PS AVX 01011011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
CVTPS2DQ SSE2 00001111 01011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
+VCVTPS2DQ AVX 01011011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
CVTTPS2DQ SSE2 00001111 01011011 !emit { rep(); modrm(); mem(size => 16, align => 16); }
+VCVTTPS2DQ AVX 01011011 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
# Cacheability Control, Prefetch, and Instruction Ordering Instructions
MASKMOVQ SSE 00001111 11110111 !emit { modrm(mod => MOD_DIRECT); mem(size => 8, base => REG_EDI); }
MASKMOVDQU SSE2 00001111 11110111 !emit { data16(); modrm(mod => MOD_DIRECT); mem(size => 16, base => REG_EDI); }
+VMASKMOVDQU AVX 11110111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT); mem(size => 16, base => REG_EDI); }
+
+VMASKMOVPS AVX 001011 d 0 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VMASKMOVPD AVX 001011 d 1 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
MOVNTPS SSE 00001111 00101011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTPS AVX 00101011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
MOVNTPD SSE2 00001111 00101011 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTPD AVX 00101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
MOVNTI SSE2 00001111 11000011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 4); }
MOVNTI_64 SSE2 00001111 11000011 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVNTQ SSE 00001111 11100111 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVNTDQ SSE2 00001111 11100111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTDQ AVX 11100111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
MOVNTDQA SSE4_1 00001111 00111000 00101010 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTDQA AVX 00101010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
PREFETCHT0 SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 1); mem(size => 1); }
PREFETCHT1 SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 1); }
@@ -476,6 +760,10 @@ PAUSE SSE2 10010000 !emit { rep(); }
# State Management Instructions
EMMS MMX 00001111 01110111 !emit { }
+VZEROUPPER AVX 01110111 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); }
+VZEROALL AVX 01110111 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); }
# LDMXCSR SSE 00001111 10101110 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 4); }
+# VLDMXCSR AVX 10101110 !emit { vex(l => 0, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 4); }
STMXCSR SSE 00001111 10101110 !emit { modrm(mod => ~MOD_DIRECT, reg => 3); mem(size => 4); }
+VSTMXCSR AVX 10101110 !emit { vex(l => 0, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT, reg => 3); mem(size => 4); }
--
2.20.1
* [Qemu-devel] [RISU RFC PATCH v2 14/14] x86.risu: add AVX2 instructions
From: Jan Bobek @ 2019-07-01 4:35 UTC (permalink / raw)
To: qemu-devel; +Cc: Jan Bobek, Alex Bennée, Richard Henderson
Add AVX2 instructions to the configuration file.
Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
---
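As a reading aid for the vex(...) arguments used throughout x86.risu, the following is a minimal sketch of how the l, p, m, v and w fields pack into a VEX prefix, following the encoding described in the Intel SDM. It is written in Python rather than Perl, and both the constant values and the vex_prefix helper are assumptions made for illustration; the actual definitions in risugen_x86_asm.pm may differ.

```python
# Hypothetical constants mirroring the names used in x86.risu.
# The real values are defined in risugen_x86_asm.pm and may differ.
VEX_L_128, VEX_L_256 = 0, 1
VEX_P_NONE, VEX_P_DATA16, VEX_P_REP, VEX_P_REPNE = 0, 1, 2, 3
VEX_M_0F, VEX_M_0F38, VEX_M_0F3A = 1, 2, 3
VEX_V_UNUSED = None


def vex_prefix(l, m, p=VEX_P_NONE, v=VEX_V_UNUSED, w=0, r=0, x=0, b=0):
    """Return the VEX prefix bytes for the given fields.

    r, x, b and v are passed in natural form; the encoding stores
    them inverted. An unused v is encoded as vvvv = 1111b.
    """
    vvvv = 0b1111 if v is None else (~v & 0b1111)
    # Low seven bits shared by both forms: vvvv, L, pp.
    low = (vvvv << 3) | ((l & 1) << 2) | (p & 0b11)
    # The 2-byte form (0xC5) only covers the 0F map with W = X = B = 0.
    if m == VEX_M_0F and w == 0 and x == 0 and b == 0:
        return bytes([0xC5, ((~r & 1) << 7) | low])
    # Otherwise fall back to the 3-byte form (0xC4).
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (m & 0b11111)
    return bytes([0xC4, byte1, ((w & 1) << 7) | low])


# VPADDB ymm0, ymm1, ymm2: prefix, opcode 0xFC, ModRM 0xC2.
vpaddb = vex_prefix(VEX_L_256, VEX_M_0F, p=VEX_P_DATA16, v=1) + bytes([0xFC, 0xC2])
```

Under these assumptions, the 256-bit VPADDB entry with first source register ymm1 yields the byte sequence c5 f5 fc c2, while an entry in the 0F38 map such as VPMULLD always needs the 3-byte form. An encoder is free to emit the 3-byte form everywhere; choosing the 2-byte form when possible only saves space.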
x86.risu | 257 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 257 insertions(+)
diff --git a/x86.risu b/x86.risu
index d3115ac..74c4ce8 100644
--- a/x86.risu
+++ b/x86.risu
@@ -33,16 +33,22 @@ VMOVQ_xmm2 AVX 11010110 !emit { vex(l => VEX_L_128, p => VEX_P
MOVAPS SSE 00001111 0010100 d !emit { modrm(); mem(size => 16, align => 16); }
VMOVAPS AVX 0010100 d !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16, align => 16); }
+VMOVAPS AVX2 0010100 d !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32, align => 32); }
MOVAPD SSE2 00001111 0010100 d !emit { data16(); modrm(); mem(size => 16, align => 16); }
VMOVAPD AVX 0010100 d !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16, align => 16); }
+VMOVAPD AVX2 0010100 d !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32, align => 32); }
MOVDQA SSE2 00001111 011 d 1111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VMOVDQA AVX 011 d 1111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16, align => 16); }
+VMOVDQA AVX2 011 d 1111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32, align => 32); }
MOVUPS SSE 00001111 0001000 d !emit { modrm(); mem(size => 16); }
VMOVUPS AVX 0001000 d !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VMOVUPS AVX2 0001000 d !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
MOVUPD SSE2 00001111 0001000 d !emit { data16(); modrm(); mem(size => 16); }
VMOVUPD AVX 0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VMOVUPD AVX2 0001000 d !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
MOVDQU SSE2 00001111 011 d 1111 !emit { rep(); modrm(); mem(size => 16); }
VMOVDQU AVX 011 d 1111 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VMOVDQU AVX2 011 d 1111 !emit { vex(l => VEX_L_256, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
MOVSS SSE 00001111 0001000 d !emit { rep(); modrm(); mem(size => 4); }
VMOVSS AVX 0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F); modrm(mod => MOD_DIRECT); }
VMOVSS_mem AVX 0001000 d !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 4); }
@@ -73,50 +79,67 @@ VMOVHLPS AVX 00010010 !emit { vex(l => VEX_L_128, m => VEX_
PMOVMSKB SSE 00001111 11010111 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
PMOVMSKB SSE2 00001111 11010111 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
VPMOVMSKB AVX 11010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+VPMOVMSKB AVX2 11010111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
MOVMSKPS SSE 00001111 01010000 !emit { modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
VMOVMSKPS AVX 01010000 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+VMOVMSKPS AVX2 01010000 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
MOVMSKPD SSE2 00001111 01010000 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
VMOVMSKPD AVX 01010000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
+VMOVMSKPD AVX2 01010000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); }
LDDQU SSE3 00001111 11110000 !emit { repne(); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
VLDDQU AVX 11110000 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VLDDQU AVX2 11110000 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 32); }
MOVSHDUP SSE3 00001111 00010110 !emit { rep(); modrm(); mem(size => 16, align => 16); }
VMOVSHDUP AVX 00010110 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VMOVSHDUP AVX2 00010110 !emit { vex(l => VEX_L_256, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
MOVSLDUP SSE3 00001111 00010010 !emit { rep(); modrm(); mem(size => 16, align => 16); }
VMOVSLDUP AVX 00010010 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VMOVSLDUP AVX2 00010010 !emit { vex(l => VEX_L_256, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
MOVDDUP SSE3 00001111 00010010 !emit { repne(); modrm(); mem(size => 8); }
VMOVDDUP AVX 00010010 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VMOVDDUP AVX2 00010010 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
# Arithmetic Instructions
PADDB MMX 00001111 11111100 !emit { modrm(); mem(size => 8); }
PADDB SSE2 00001111 11111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPADDB AVX 11111100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDB AVX2 11111100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PADDW MMX 00001111 11111101 !emit { modrm(); mem(size => 8); }
PADDW SSE2 00001111 11111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPADDW AVX 11111101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDW AVX2 11111101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PADDD MMX 00001111 11111110 !emit { modrm(); mem(size => 8); }
PADDD SSE2 00001111 11111110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPADDD AVX 11111110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDD AVX2 11111110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PADDQ MMX 00001111 11010100 !emit { modrm(); mem(size => 8); }
PADDQ SSE2 00001111 11010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPADDQ AVX 11010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDQ AVX2 11010100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PADDSB MMX 00001111 11101100 !emit { modrm(); mem(size => 8); }
PADDSB SSE2 00001111 11101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPADDSB AVX 11101100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDSB AVX2 11101100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PADDSW MMX 00001111 11101101 !emit { modrm(); mem(size => 8); }
PADDSW SSE2 00001111 11101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPADDSW AVX 11101101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDSW AVX2 11101101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PADDUSB MMX 00001111 11011100 !emit { modrm(); mem(size => 8); }
PADDUSB SSE2 00001111 11011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPADDUSB AVX 11011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDUSB AVX2 11011100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PADDUSW MMX 00001111 11011101 !emit { modrm(); mem(size => 8); }
PADDUSW SSE2 00001111 11011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPADDUSW AVX 11011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPADDUSW AVX2 11011101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
ADDPS SSE 00001111 01011000 !emit { modrm(); mem(size => 16, align => 16); }
VADDPS AVX 01011000 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VADDPS AVX2 01011000 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
ADDPD SSE2 00001111 01011000 !emit { data16(); modrm(); mem(size => 16, align => 16) }
VADDPD AVX 01011000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VADDPD AVX2 01011000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
ADDSS SSE 00001111 01011000 !emit { rep(); modrm(); mem(size => 4); }
VADDSS AVX 01011000 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
ADDSD SSE2 00001111 01011000 !emit { repne(); modrm(); mem(size => 8); }
@@ -125,47 +148,62 @@ VADDSD AVX 01011000 !emit { vex(l => 0, p => VEX_P_REPNE,
PHADDW_64 SSSE3 00001111 00111000 00000001 !emit { modrm(); mem(size => 8); }
PHADDW SSSE3 00001111 00111000 00000001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPHADDW AVX 00000001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPHADDW AVX2 00000001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PHADDD_64 SSSE3 00001111 00111000 00000010 !emit { modrm(); mem(size => 8); }
PHADDD SSSE3 00001111 00111000 00000010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPHADDD AVX 00000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPHADDD AVX2 00000010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PHADDSW_64 SSSE3 00001111 00111000 00000011 !emit { modrm(); mem(size => 8); }
PHADDSW SSSE3 00001111 00111000 00000011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPHADDSW AVX 00000011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPHADDSW AVX2 00000011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
HADDPS SSE3 00001111 01111100 !emit { repne(); modrm(); mem(size => 16, align => 16); }
VHADDPS AVX 01111100 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 16); }
+VHADDPS AVX2 01111100 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 32); }
HADDPD SSE3 00001111 01111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VHADDPD AVX 01111100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VHADDPD AVX2 01111100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PSUBB MMX 00001111 11111000 !emit { modrm(); mem(size => 8); }
PSUBB SSE2 00001111 11111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSUBB AVX 11111000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBB AVX2 11111000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PSUBW MMX 00001111 11111001 !emit { modrm(); mem(size => 8); }
PSUBW SSE2 00001111 11111001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSUBW AVX 11111001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBW AVX2 11111001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PSUBD MMX 00001111 11111010 !emit { modrm(); mem(size => 8); }
PSUBD SSE2 00001111 11111010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSUBD AVX 11111010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBD AVX2 11111010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PSUBQ_64 SSE2 00001111 11111011 !emit { modrm(); mem(size => 8); }
PSUBQ SSE2 00001111 11111011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSUBQ AVX 11111011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBQ AVX2 11111011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PSUBSB MMX 00001111 11101000 !emit { modrm(); mem(size => 8); }
PSUBSB SSE2 00001111 11101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSUBSB AVX 11101000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBSB AVX2 11101000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PSUBSW MMX 00001111 11101001 !emit { modrm(); mem(size => 8); }
PSUBSW SSE2 00001111 11101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSUBSW AVX 11101001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBSW AVX2 11101001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PSUBUSB MMX 00001111 11011000 !emit { modrm(); mem(size => 8); }
PSUBUSB SSE2 00001111 11011000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSUBUSB AVX 11011000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBUSB AVX2 11011000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PSUBUSW MMX 00001111 11011001 !emit { modrm(); mem(size => 8); }
PSUBUSW SSE2 00001111 11011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSUBUSW AVX 11011001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSUBUSW AVX2 11011001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
SUBPS SSE 00001111 01011100 !emit { modrm(); mem(size => 16, align => 16); }
VSUBPS AVX 01011100 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VSUBPS AVX2 01011100 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
SUBPD SSE2 00001111 01011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VSUBPD AVX 01011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VSUBPD AVX2 01011100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
SUBSS SSE 00001111 01011100 !emit { rep(); modrm(); mem(size => 4); }
VSUBSS AVX 01011100 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
SUBSD SSE2 00001111 01011100 !emit { repne(); modrm(); mem(size => 8); }
@@ -174,48 +212,64 @@ VSUBSD AVX 01011100 !emit { vex(l => 0, p => VEX_P_REPNE,
PHSUBW_64 SSSE3 00001111 00111000 00000101 !emit { modrm(); mem(size => 8); }
PHSUBW SSSE3 00001111 00111000 00000101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPHSUBW AVX 00000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPHSUBW AVX2 00000101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PHSUBD_64 SSSE3 00001111 00111000 00000110 !emit { modrm(); mem(size => 8); }
PHSUBD SSSE3 00001111 00111000 00000110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPHSUBD AVX 00000110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPHSUBD AVX2 00000110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PHSUBSW_64 SSSE3 00001111 00111000 00000111 !emit { modrm(); mem(size => 8); }
PHSUBSW SSSE3 00001111 00111000 00000111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPHSUBSW AVX 00000111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPHSUBSW AVX2 00000111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
HSUBPS SSE3 00001111 01111101 !emit { repne(); modrm(); mem(size => 16, align => 16); }
VHSUBPS AVX 01111101 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 16); }
+VHSUBPS AVX2 01111101 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 32); }
HSUBPD SSE3 00001111 01111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VHSUBPD AVX 01111101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VHSUBPD AVX2 01111101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
ADDSUBPS SSE3 00001111 11010000 !emit { repne(); modrm(); mem(size => 16, align => 16); }
VADDSUBPS AVX 11010000 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 16); }
+VADDSUBPS AVX2 11010000 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F); modrm(); mem(size => 32); }
ADDSUBPD SSE3 00001111 11010000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VADDSUBPD AVX 11010000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VADDSUBPD AVX2 11010000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PMULLW MMX 00001111 11010101 !emit { modrm(); mem(size => 8); }
PMULLW SSE2 00001111 11010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMULLW AVX 11010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMULLW AVX2 11010101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PMULLD SSE4_1 00001111 00111000 01000000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMULLD AVX 01000000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMULLD AVX2 01000000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PMULHW MMX 00001111 11100101 !emit { modrm(); mem(size => 8); }
PMULHW SSE2 00001111 11100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMULHW AVX 11100101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMULHW AVX2 11100101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PMULHUW SSE 00001111 11100100 !emit { modrm(); mem(size => 8); }
PMULHUW SSE2 00001111 11100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMULHUW AVX 11100100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMULHUW AVX2 11100100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PMULDQ SSE4_1 00001111 00111000 00101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMULDQ AVX 00101000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMULDQ AVX2 00101000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PMULUDQ_64 SSE2 00001111 11110100 !emit { modrm(); mem(size => 8); }
PMULUDQ SSE2 00001111 11110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMULUDQ AVX 11110100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMULUDQ AVX2 11110100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PMULHRSW_64 SSSE3 00001111 00111000 00001011 !emit { modrm(); mem(size => 8); }
PMULHRSW SSSE3 00001111 00111000 00001011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMULHRSW AVX 00001011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMULHRSW AVX2 00001011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
MULPS SSE 00001111 01011001 !emit { modrm(); mem(size => 16, align => 16); }
VMULPS AVX 01011001 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VMULPS AVX2 01011001 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
MULPD SSE2 00001111 01011001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VMULPD AVX 01011001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VMULPD AVX2 01011001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
MULSS SSE 00001111 01011001 !emit { rep(); modrm(); mem(size => 4); }
VMULSS AVX 01011001 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
MULSD SSE2 00001111 01011001 !emit { repne(); modrm(); mem(size => 8); }
@@ -224,14 +278,18 @@ VMULSD AVX 01011001 !emit { vex(l => VEX_L_128, p => VEX_P
PMADDWD MMX 00001111 11110101 !emit { modrm(); mem(size => 8); }
PMADDWD SSE2 00001111 11110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMADDWD AVX 11110101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMADDWD AVX2 11110101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PMADDUBSW_64 SSSE3 00001111 00111000 00000100 !emit { modrm(); mem(size => 8); }
PMADDUBSW SSSE3 00001111 00111000 00000100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMADDUBSW AVX 00000100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMADDUBSW AVX2 00000100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
DIVPS SSE 00001111 01011110 !emit { modrm(); mem(size => 16, align => 16); }
VDIVPS AVX 01011110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VDIVPS AVX2 01011110 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
DIVPD SSE2 00001111 01011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VDIVPD AVX 01011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VDIVPD AVX2 01011110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
DIVSS SSE 00001111 01011110 !emit { rep(); modrm(); mem(size => 4); }
VDIVSS AVX 01011110 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
DIVSD SSE2 00001111 01011110 !emit { repne(); modrm(); mem(size => 8); }
@@ -239,13 +297,16 @@ VDIVSD AVX 01011110 !emit { vex(l => 0, p => VEX_P_REPNE,
RCPPS SSE 00001111 01010011 !emit { modrm(); mem(size => 16, align => 16); }
VRCPPS AVX 01010011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VRCPPS AVX2 01010011 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
RCPSS SSE 00001111 01010011 !emit { rep(); modrm(); mem(size => 4); }
VRCPSS AVX 01010011 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
SQRTPS SSE 00001111 01010001 !emit { modrm(); mem(size => 16, align => 16); }
VSQRTPS AVX 01010001 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VSQRTPS AVX2 01010001 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
SQRTPD SSE2 00001111 01010001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VSQRTPD AVX 01010001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VSQRTPD AVX2 01010001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
SQRTSS SSE 00001111 01010001 !emit { rep(); modrm(); mem(size => 4); }
VSQRTSS AVX 01010001 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
SQRTSD SSE2 00001111 01010001 !emit { repne(); modrm(); mem(size => 8); }
@@ -253,28 +314,37 @@ VSQRTSD AVX 01010001 !emit { vex(l => 0, p => VEX_P_REPNE,
RSQRTPS SSE 00001111 01010010 !emit { modrm(); mem(size => 16, align => 16); }
VRSQRTPS AVX 01010010 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VRSQRTPS AVX2 01010010 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
RSQRTSS SSE 00001111 01010010 !emit { rep(); modrm(); mem(size => 4); }
VRSQRTSS AVX 01010010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
PMINUB SSE 00001111 11011010 !emit { modrm(); mem(size => 8); }
PMINUB SSE2 00001111 11011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMINUB AVX 11011010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMINUB AVX2 11011010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PMINUW SSE4_1 00001111 00111000 00111010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMINUW AVX 00111010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMINUW AVX2 00111010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PMINUD SSE4_1 00001111 00111000 00111011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMINUD AVX 00111011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMINUD AVX2 00111011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PMINSB SSE4_1 00001111 00111000 00111000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMINSB AVX 00111000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMINSB AVX2 00111000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PMINSW SSE 00001111 11101010 !emit { modrm(); mem(size => 8); }
PMINSW SSE2 00001111 11101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMINSW AVX 11101010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMINSW AVX2 11101010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PMINSD SSE4_1 00001111 00111000 00111001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMINSD AVX 00111001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMINSD AVX2 00111001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
MINPS SSE 00001111 01011101 !emit { modrm(); mem(size => 16, align => 16); }
VMINPS AVX 01011101 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VMINPS AVX2 01011101 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
MINPD SSE2 00001111 01011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VMINPD AVX 01011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VMINPD AVX2 01011101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
MINSS SSE 00001111 01011101 !emit { rep(); modrm(); mem(size => 4); }
VMINSS AVX 01011101 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
MINSD SSE2 00001111 01011101 !emit { repne(); modrm(); mem(size => 8); }
@@ -286,22 +356,30 @@ VPHMINPOSUW AVX 01000001 !emit { vex(l => VEX_L_128, p
PMAXUB SSE 00001111 11011110 !emit { modrm(); mem(size => 8); }
PMAXUB SSE2 00001111 11011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMAXUB AVX 11011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMAXUB AVX2 11011110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PMAXUW SSE4_1 00001111 00111000 00111110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMAXUW AVX 00111110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMAXUW AVX2 00111110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PMAXUD SSE4_1 00001111 00111000 00111111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMAXUD AVX 00111111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMAXUD AVX2 00111111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PMAXSB SSE4_1 00001111 00111000 00111100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMAXSB AVX 00111100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMAXSB AVX2 00111100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PMAXSW SSE 00001111 11101110 !emit { modrm(); mem(size => 8); }
PMAXSW SSE2 00001111 11101110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMAXSW AVX 11101110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPMAXSW AVX2 11101110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PMAXSD SSE4_1 00001111 00111000 00111101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPMAXSD AVX 00111101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPMAXSD AVX2 00111101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
MAXPS SSE 00001111 01011111 !emit { modrm(); mem(size => 16, align => 16); }
VMAXPS AVX 01011111 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VMAXPS AVX2 01011111 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
MAXPD SSE2 00001111 01011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VMAXPD AVX 01011111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VMAXPD AVX2 01011111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
MAXSS SSE 00001111 01011111 !emit { rep(); modrm(); mem(size => 4); }
VMAXSS AVX 01011111 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
MAXSD SSE2 00001111 01011111 !emit { repne(); modrm(); mem(size => 8); }
@@ -310,45 +388,58 @@ VMAXSD AVX 01011111 !emit { vex(l => 0, p => VEX_P_REPNE,
PAVGB SSE 00001111 11100000 !emit { modrm(); mem(size => 8); }
PAVGB SSE2 00001111 11100000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPAVGB AVX 11100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPAVGB AVX2 11100000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PAVGW SSE 00001111 11100011 !emit { modrm(); mem(size => 8); }
PAVGW SSE2 00001111 11100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPAVGW AVX 11100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPAVGW AVX2 11100011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PSADBW SSE 00001111 11110110 !emit { modrm(); mem(size => 8); }
PSADBW SSE2 00001111 11110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSADBW AVX 11110110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSADBW AVX2 11110110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
MPSADBW SSE4_1 00001111 00111010 01000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VMPSADBW AVX 01000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
+VMPSADBW AVX2 01000010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 32); imm(size => 1); }
PABSB_64 SSSE3 00001111 00111000 00011100 !emit { modrm(); mem(size => 8); }
PABSB SSSE3 00001111 00111000 00011100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPABSB AVX 00011100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VPABSB AVX2 00011100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
PABSW_64 SSSE3 00001111 00111000 00011101 !emit { modrm(); mem(size => 8); }
PABSW SSSE3 00001111 00111000 00011101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPABSW AVX 00011101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VPABSW AVX2 00011101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
PABSD_64 SSSE3 00001111 00111000 00011110 !emit { modrm(); mem(size => 8); }
PABSD SSSE3 00001111 00111000 00011110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPABSD AVX 00011110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VPABSD AVX2 00011110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
PSIGNB_64 SSSE3 00001111 00111000 00001000 !emit { modrm(); mem(size => 8); }
PSIGNB SSSE3 00001111 00111000 00001000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSIGNB AVX 00001000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPSIGNB AVX2 00001000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PSIGNW_64 SSSE3 00001111 00111000 00001001 !emit { modrm(); mem(size => 8); }
PSIGNW SSSE3 00001111 00111000 00001001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSIGNW AVX 00001001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPSIGNW AVX2 00001001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PSIGND_64 SSSE3 00001111 00111000 00001010 !emit { modrm(); mem(size => 8); }
PSIGND SSSE3 00001111 00111000 00001010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSIGND AVX 00001010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPSIGND AVX2 00001010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
DPPS SSE4_1 00001111 00111010 01000000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VDPPS AVX 01000000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
+VDPPS AVX2 01000000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 32); imm(size => 1); }
DPPD SSE4_1 00001111 00111010 01000001 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VDPPD AVX 01000001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
ROUNDPS SSE4_1 00001111 00111010 00001000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VROUNDPS AVX 00001000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VROUNDPS AVX2 00001000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
ROUNDPD SSE4_1 00001111 00111010 00001001 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VROUNDPD AVX 00001001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VROUNDPD AVX2 00001001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
ROUNDSS SSE4_1 00001111 00111010 00001010 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
VROUNDSS AVX 00001010 !emit { vex(l => 0, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 4); imm(size => 1); }
ROUNDSD SSE4_1 00001111 00111010 00001011 !emit { data16(); modrm(); mem(size => 8); imm(size => 1); }
@@ -376,25 +467,33 @@ VPCLMULQDQ PCLMULQDQ_AVX 01000100 !emit { vex(l => VEX_L
PCMPEQB MMX 00001111 01110100 !emit { modrm(); mem(size => 8); }
PCMPEQB SSE2 00001111 01110100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPCMPEQB AVX 01110100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPCMPEQB AVX2 01110100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PCMPEQW MMX 00001111 01110101 !emit { modrm(); mem(size => 8); }
PCMPEQW SSE2 00001111 01110101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPCMPEQW AVX 01110101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPCMPEQW AVX2 01110101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PCMPEQD MMX 00001111 01110110 !emit { modrm(); mem(size => 8); }
PCMPEQD SSE2 00001111 01110110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPCMPEQD AVX 01110110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPCMPEQD AVX2 01110110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PCMPEQQ SSE4_1 00001111 00111000 00101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPCMPEQQ AVX 00101001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPCMPEQQ AVX2 00101001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PCMPGTB MMX 00001111 01100100 !emit { modrm(); mem(size => 8); }
PCMPGTB SSE2 00001111 01100100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPCMPGTB AVX 01100100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPCMPGTB AVX2 01100100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PCMPGTW MMX 00001111 01100101 !emit { modrm(); mem(size => 8); }
PCMPGTW SSE2 00001111 01100101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPCMPGTW AVX 01100101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPCMPGTW AVX2 01100101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PCMPGTD MMX 00001111 01100110 !emit { modrm(); mem(size => 8); }
PCMPGTD SSE2 00001111 01100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPCMPGTD AVX 01100110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPCMPGTD AVX2 01100110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PCMPGTQ SSE4_2 00001111 00111000 00110111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPCMPGTQ AVX 00110111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPCMPGTQ AVX2 00110111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PCMPESTRM SSE4_2 00001111 00111010 01100000 !emit { data16(); modrm(); mem(size => 16); imm(size => 1); }
VPCMPESTRM AVX 01100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
@@ -407,14 +506,19 @@ VPCMPISTRI AVX 01100011 !emit { vex(l => VEX_L_128, p
PTEST SSE4_1 00001111 00111000 00010111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPTEST AVX 00010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VPTEST AVX2 00010111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
VTESTPS AVX 00001110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VTESTPS AVX2 00001110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
VTESTPD AVX 00001111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VTESTPD AVX2 00001111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
CMPPS SSE 00001111 11000010 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
VCMPPS AVX 11000010 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
+VCMPPS AVX2 11000010 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); imm(size => 1); }
CMPPD SSE2 00001111 11000010 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VCMPPD AVX 11000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
+VCMPPD AVX2 11000010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); imm(size => 1); }
CMPSS SSE 00001111 11000010 !emit { rep(); modrm(); mem(size => 4); imm(size => 1); }
VCMPSS AVX 11000010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); imm(size => 1); }
CMPSD SSE2 00001111 11000010 !emit { repne(); modrm(); mem(size => 8); imm(size => 1); }
@@ -434,172 +538,246 @@ VCOMISD AVX 00101111 !emit { vex(l => 0, p => VEX_P_DATA16,
PAND MMX 00001111 11011011 !emit { modrm(); mem(size => 8); }
PAND SSE2 00001111 11011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPAND AVX 11011011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPAND AVX2 11011011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
ANDPS SSE 00001111 01010100 !emit { modrm(); mem(size => 16, align => 16); }
VANDPS AVX 01010100 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VANDPS AVX2 01010100 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
ANDPD SSE2 00001111 01010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VANDPD AVX 01010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VANDPD AVX2 01010100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PANDN MMX 00001111 11011111 !emit { modrm(); mem(size => 8); }
PANDN SSE2 00001111 11011111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPANDN AVX 11011111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPANDN AVX2 11011111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
ANDNPS SSE 00001111 01010101 !emit { modrm(); mem(size => 16, align => 16); }
VANDNPS AVX 01010101 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VANDNPS AVX2 01010101 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
ANDNPD SSE2 00001111 01010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VANDNPD AVX 01010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VANDNPD AVX2 01010101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
POR MMX 00001111 11101011 !emit { modrm(); mem(size => 8); }
POR SSE2 00001111 11101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPOR AVX 11101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPOR AVX2 11101011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
ORPS SSE 00001111 01010110 !emit { modrm(); mem(size => 16, align => 16); }
VORPS AVX 01010110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VORPS AVX2 01010110 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
ORPD SSE2 00001111 01010110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VORPD AVX 01010110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VORPD AVX2 01010110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PXOR MMX 00001111 11101111 !emit { modrm(); mem(size => 8); }
PXOR SSE2 00001111 11101111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPXOR AVX 11101111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPXOR AVX2 11101111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
XORPS SSE 00001111 01010111 !emit { modrm(); mem(size => 16, align => 16); }
VXORPS AVX 01010111 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VXORPS AVX2 01010111 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
XORPD SSE2 00001111 01010111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VXORPD AVX 01010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VXORPD AVX2 01010111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
# Shift and Rotate Instructions
PSLLW MMX 00001111 11110001 !emit { modrm(); mem(size => 8); }
PSLLW SSE2 00001111 11110001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSLLW AVX 11110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSLLW AVX2 11110001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSLLD MMX 00001111 11110010 !emit { modrm(); mem(size => 8); }
PSLLD SSE2 00001111 11110010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSLLD AVX 11110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSLLD AVX2 11110010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSLLQ MMX 00001111 11110011 !emit { modrm(); mem(size => 8); }
PSLLQ SSE2 00001111 11110011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSLLQ AVX 11110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSLLQ AVX2 11110011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSLLDQ SSE2 00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 7); imm(size => 1); }
VPSLLDQ AVX 01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 7); imm(size => 1); }
+VPSLLDQ AVX2 01110011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 7); imm(size => 1); }
PSLLW_imm MMX 00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSLLW_imm SSE2 00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
VPSLLW_imm AVX 01110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+VPSLLW_imm AVX2 01110001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSLLD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSLLD_imm SSE2 00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
VPSLLD_imm AVX 01110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+VPSLLD_imm AVX2 01110010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSLLQ_imm MMX 00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
PSLLQ_imm SSE2 00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
VPSLLQ_imm AVX 01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+VPSLLQ_imm AVX2 01110011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 6); imm(size => 1); }
+
+VPSLLVD_xmm AVX2 01000111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPSLLVD AVX2 01000111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
+VPSLLVQ_xmm AVX2 01000111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(); mem(size => 16); }
+VPSLLVQ AVX2 01000111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(); mem(size => 32); }
PSRLW MMX 00001111 11010001 !emit { modrm(); mem(size => 8); }
PSRLW SSE2 00001111 11010001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSRLW AVX 11010001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSRLW AVX2 11010001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSRLD MMX 00001111 11010010 !emit { modrm(); mem(size => 8); }
PSRLD SSE2 00001111 11010010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSRLD AVX 11010010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSRLD AVX2 11010010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSRLQ MMX 00001111 11010011 !emit { modrm(); mem(size => 8); }
PSRLQ SSE2 00001111 11010011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSRLQ AVX 11010011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSRLQ AVX2 11010011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSRLDQ SSE2 00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 3); imm(size => 1); }
VPSRLDQ AVX 01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 3); imm(size => 1); }
+VPSRLDQ AVX2 01110011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 3); imm(size => 1); }
PSRLW_imm MMX 00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRLW_imm SSE2 00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
VPSRLW_imm AVX 01110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+VPSRLW_imm AVX2 01110001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRLD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRLD_imm SSE2 00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
VPSRLD_imm AVX 01110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+VPSRLD_imm AVX2 01110010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRLQ_imm MMX 00001111 01110011 !emit { modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
PSRLQ_imm SSE2 00001111 01110011 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
VPSRLQ_imm AVX 01110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+VPSRLQ_imm AVX2 01110011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 2); imm(size => 1); }
+
+VPSRLVD_xmm AVX2 01000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPSRLVD AVX2 01000101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
+VPSRLVQ_xmm AVX2 01000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(); mem(size => 16); }
+VPSRLVQ AVX2 01000101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(); mem(size => 32); }
PSRAW MMX 00001111 11100001 !emit { modrm(); mem(size => 8); }
PSRAW SSE2 00001111 11100001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSRAW AVX 11100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSRAW AVX2 11100001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSRAD MMX 00001111 11100010 !emit { modrm(); mem(size => 8); }
PSRAD SSE2 00001111 11100010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSRAD AVX 11100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPSRAD AVX2 11100010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
PSRAW_imm MMX 00001111 01110001 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
PSRAW_imm SSE2 00001111 01110001 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
VPSRAW_imm AVX 01110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+VPSRAW_imm AVX2 01110001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
PSRAD_imm MMX 00001111 01110010 !emit { modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
PSRAD_imm SSE2 00001111 01110010 !emit { data16(); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
VPSRAD_imm AVX 01110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+VPSRAD_imm AVX2 01110010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(mod => MOD_DIRECT, reg => 4); imm(size => 1); }
+
+VPSRAVD_xmm AVX2 01000110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPSRAVD AVX2 01000110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
PALIGNR_64 SSSE3 00001111 00111010 00001111 !emit { modrm(); mem(size => 8); imm(size => 1); }
PALIGNR SSSE3 00001111 00111010 00001111 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VPALIGNR AVX 00001111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
+VPALIGNR AVX2 00001111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 32); imm(size => 1); }
# Shuffle, Unpack, Blend, Insert, Extract, Broadcast, Permute, Scatter Instructions
PACKSSWB MMX 00001111 01100011 !emit { modrm(); mem(size => 8); }
PACKSSWB SSE2 00001111 01100011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPACKSSWB AVX 01100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPACKSSWB AVX2 01100011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PACKSSDW MMX 00001111 01101011 !emit { modrm(); mem(size => 8); }
PACKSSDW SSE2 00001111 01101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPACKSSDW AVX 01101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPACKSSDW AVX2 01101011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PACKUSWB MMX 00001111 01100111 !emit { modrm(); mem(size => 8); }
PACKUSWB SSE2 00001111 01100111 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPACKUSWB AVX 01100111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPACKUSWB AVX2 01100111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PACKUSDW SSE4_1 00001111 00111000 00101011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPACKUSDW AVX 00101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPACKUSDW AVX2 00101011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PUNPCKHBW MMX 00001111 01101000 !emit { modrm(); mem(size => 8); }
PUNPCKHBW SSE2 00001111 01101000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPUNPCKHBW AVX 01101000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKHBW AVX2 01101000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PUNPCKHWD MMX 00001111 01101001 !emit { modrm(); mem(size => 8); }
PUNPCKHWD SSE2 00001111 01101001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPUNPCKHWD AVX 01101001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKHWD AVX2 01101001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PUNPCKHDQ MMX 00001111 01101010 !emit { modrm(); mem(size => 8); }
PUNPCKHDQ SSE2 00001111 01101010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPUNPCKHDQ AVX 01101010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKHDQ AVX2 01101010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PUNPCKHQDQ SSE2 00001111 01101101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPUNPCKHQDQ AVX 01101101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKHQDQ AVX2 01101101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PUNPCKLBW MMX 00001111 01100000 !emit { modrm(); mem(size => 4); }
PUNPCKLBW SSE2 00001111 01100000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPUNPCKLBW AVX 01100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKLBW AVX2 01100000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PUNPCKLWD MMX 00001111 01100001 !emit { modrm(); mem(size => 4); }
PUNPCKLWD SSE2 00001111 01100001 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPUNPCKLWD AVX 01100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKLWD AVX2 01100001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PUNPCKLDQ MMX 00001111 01100010 !emit { modrm(); mem(size => 4); }
PUNPCKLDQ SSE2 00001111 01100010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPUNPCKLDQ AVX 01100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKLDQ AVX2 01100010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PUNPCKLQDQ SSE2 00001111 01101100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPUNPCKLQDQ AVX 01101100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VPUNPCKLQDQ AVX2 01101100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
UNPCKLPS SSE 00001111 00010100 !emit { modrm(); mem(size => 16, align => 16); }
VUNPCKLPS AVX 00010100 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VUNPCKLPS AVX2 00010100 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
UNPCKLPD SSE2 00001111 00010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VUNPCKLPD AVX 00010100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VUNPCKLPD AVX2 00010100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
UNPCKHPS SSE 00001111 00010101 !emit { modrm(); mem(size => 16, align => 16); }
VUNPCKHPS AVX 00010101 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); }
+VUNPCKHPS AVX2 00010101 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); }
UNPCKHPD SSE2 00001111 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VUNPCKHPD AVX 00010101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); }
+VUNPCKHPD AVX2 00010101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); }
PSHUFB_64 SSSE3 00001111 00111000 00000000 !emit { modrm(); mem(size => 8); }
PSHUFB SSSE3 00001111 00111000 00000000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPSHUFB AVX 00000000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 16); }
+VPSHUFB AVX2 00000000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38); modrm(); mem(size => 32); }
PSHUFW SSE 00001111 01110000 !emit { modrm(); mem(size => 8); imm(size => 1); }
PSHUFLW SSE2 00001111 01110000 !emit { repne(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VPSHUFLW AVX 01110000 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VPSHUFLW AVX2 01110000 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
PSHUFHW SSE2 00001111 01110000 !emit { rep(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VPSHUFHW AVX 01110000 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VPSHUFHW AVX2 01110000 !emit { vex(l => VEX_L_256, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
PSHUFD SSE2 00001111 01110000 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VPSHUFD AVX 01110000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VPSHUFD AVX2 01110000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
SHUFPS SSE 00001111 11000110 !emit { modrm(); mem(size => 16, align => 16); imm(size => 1); }
VSHUFPS AVX 11000110 !emit { vex(l => VEX_L_128, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
+VSHUFPS AVX2 11000110 !emit { vex(l => VEX_L_256, m => VEX_M_0F); modrm(); mem(size => 32); imm(size => 1); }
SHUFPD SSE2 00001111 11000110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VSHUFPD AVX 11000110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 16); imm(size => 1); }
+VSHUFPD AVX2 11000110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F); modrm(); mem(size => 32); imm(size => 1); }
BLENDPS SSE4_1 00001111 00111010 00001100 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VBLENDPS AVX 00001100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
+VBLENDPS AVX2 00001100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 32); imm(size => 1); }
BLENDPD SSE4_1 00001111 00111010 00001101 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VBLENDPD AVX 00001101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
+VBLENDPD AVX2 00001101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 32); imm(size => 1); }
BLENDVPS SSE4_1 00001111 00111000 00010100 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VBLENDVPS AVX 01001010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
+VBLENDVPS AVX2 01001010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 32); imm(size => 1); }
BLENDVPD SSE4_1 00001111 00111000 00010101 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VBLENDVPD AVX 01001011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
+VBLENDVPD AVX2 01001011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 32); imm(size => 1); }
PBLENDVB SSE4_1 00001111 00111000 00010000 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VPBLENDVB AVX 01001100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
+VPBLENDVB AVX2 01001100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 32); imm(size => 1); }
PBLENDW SSE4_1 00001111 00111010 00001110 !emit { data16(); modrm(); mem(size => 16, align => 16); imm(size => 1); }
VPBLENDW AVX 00001110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 16); imm(size => 1); }
+VPBLENDW AVX2 00001110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 32); imm(size => 1); }
+VPBLENDD_xmm AVX2 00000010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
+VPBLENDD AVX2 00000010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 32); imm(size => 1); }
INSERTPS SSE4_1 00001111 00111010 00100001 !emit { data16(); modrm(); mem(size => 4); imm(size => 1); }
VINSERTPS AVX 00100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A); modrm(); mem(size => 4); imm(size => 1); }
@@ -613,6 +791,9 @@ VPINSRD AVX 00100010 !emit { vex(l => VEX_L_128, p
PINSRQ SSE4_1 00001111 00111010 00100010 !emit { data16(); rex(w => 1); modrm(); mem(size => 8); imm(size => 1); }
VPINSRQ AVX 00100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 1); modrm(); mem(size => 8); imm(size => 1); }
+VINSERTF128 AVX2 00011000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
+VINSERTI128 AVX2 00111000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 16); imm(size => 1); }
+
EXTRACTPS SSE4_1 00001111 00111010 00010111 !emit { data16(); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
EXTRACTPS_mem SSE4_1 00001111 00111010 00010111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 4); imm(size => 1); }
VEXTRACTPS AVX 00010111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); imm(size => 1); }
@@ -639,37 +820,94 @@ PEXTRW_reg SSE 00001111 11000101 !emit { modrm(mod => MOD_DIRECT, reg =
PEXTRW_reg SSE2 00001111 11000101 !emit { data16(); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
VPEXTRW_reg AVX 11000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, w => 0, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT, reg => ~REG_ESP); imm(size => 1); }
+VEXTRACTF128 AVX2 00011001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VEXTRACTI128 AVX2 00111001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+
+VPBROADCASTB_xmm AVX2 01111000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 1); }
+VPBROADCASTB AVX2 01111000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 1); }
+VPBROADCASTW_xmm AVX2 01111001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 2); }
+VPBROADCASTW AVX2 01111001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 2); }
+VPBROADCASTD_xmm AVX2 01011000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VPBROADCASTD AVX2 01011000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VPBROADCASTQ_xmm AVX2 01011001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPBROADCASTQ AVX2 01011001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VBROADCASTSS_xmm AVX2 00011000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VBROADCASTSS AVX2 00011000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VBROADCASTSD AVX2 00011001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VBROADCASTF128 AVX2 00011010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VBROADCASTI128 AVX2 01011010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+
+VPERM2F128 AVX2 00000110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 32); imm(size => 1); }
+VPERM2I128 AVX2 01000110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0); modrm(); mem(size => 32); imm(size => 1); }
+VPERMD AVX2 00110110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
+VPERMPS AVX2 00010110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
VPERMILPS AVX 00001100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPERMILPS AVX2 00001100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
VPERMILPS_imm AVX 00000100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VPERMILPS_imm AVX2 00000100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
VPERMILPD AVX 00001101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 16); }
+VPERMILPD AVX2 00001101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(); mem(size => 32); }
VPERMILPD_imm AVX 00000101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 16); imm(size => 1); }
+VPERMILPD_imm AVX2 00000101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 0, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
+VPERMQ AVX2 00000000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 1, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
+VPERMPD AVX2 00000001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F3A, w => 1, v => VEX_V_UNUSED); modrm(); mem(size => 32); imm(size => 1); }
+
+# TODO These instructions use VSIB byte, which is not implemented yet
+# VGATHERDPS AVX2 10010010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VGATHERDPS AVX2 10010010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VGATHERDPD AVX2 10010010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VGATHERDPD AVX2 10010010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VGATHERQPS AVX2 10010011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VGATHERQPS AVX2 10010011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VGATHERQPD AVX2 10010011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VGATHERQPD AVX2 10010011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERDD AVX2 10010000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERDD AVX2 10010000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERDQ AVX2 10010000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERDQ AVX2 10010000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERQD AVX2 10010001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERQD AVX2 10010001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERQQ AVX2 10010001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
+# VPGATHERQQ AVX2 10010001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); }
# Conversion Instructions
PMOVSXBW SSE4_1 00001111 00111000 00100000 !emit { data16(); modrm(); mem(size => 8); }
VPMOVSXBW AVX 00100000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPMOVSXBW AVX2 00100000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
PMOVSXBD SSE4_1 00001111 00111000 00100001 !emit { data16(); modrm(); mem(size => 4); }
VPMOVSXBD AVX 00100001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VPMOVSXBD AVX2 00100001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
PMOVSXBQ SSE4_1 00001111 00111000 00100010 !emit { data16(); modrm(); mem(size => 2); }
VPMOVSXBQ AVX 00100010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 2); }
+VPMOVSXBQ AVX2 00100010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
PMOVSXWD SSE4_1 00001111 00111000 00100011 !emit { data16(); modrm(); mem(size => 8); }
VPMOVSXWD AVX 00100011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPMOVSXWD AVX2 00100011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
PMOVSXWQ SSE4_1 00001111 00111000 00100100 !emit { data16(); modrm(); mem(size => 4); }
VPMOVSXWQ AVX 00100100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VPMOVSXWQ AVX2 00100100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
PMOVSXDQ SSE4_1 00001111 00111000 00100101 !emit { data16(); modrm(); mem(size => 8); }
VPMOVSXDQ AVX 00100101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPMOVSXDQ AVX2 00100101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
PMOVZXBW SSE4_1 00001111 00111000 00110000 !emit { data16(); modrm(); mem(size => 8); }
VPMOVZXBW AVX 00110000 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPMOVZXBW AVX2 00110000 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
PMOVZXBD SSE4_1 00001111 00111000 00110001 !emit { data16(); modrm(); mem(size => 4); }
VPMOVZXBD AVX 00110001 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VPMOVZXBD AVX2 00110001 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
PMOVZXBQ SSE4_1 00001111 00111000 00110010 !emit { data16(); modrm(); mem(size => 2); }
VPMOVZXBQ AVX 00110010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 2); }
+VPMOVZXBQ AVX2 00110010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
PMOVZXWD SSE4_1 00001111 00111000 00110011 !emit { data16(); modrm(); mem(size => 8); }
VPMOVZXWD AVX 00110011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPMOVZXWD AVX2 00110011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
PMOVZXWQ SSE4_1 00001111 00111000 00110100 !emit { data16(); modrm(); mem(size => 4); }
VPMOVZXWQ AVX 00110100 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 4); }
+VPMOVZXWQ AVX2 00110100 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
PMOVZXDQ SSE4_1 00001111 00111000 00110101 !emit { data16(); modrm(); mem(size => 8); }
VPMOVZXDQ AVX 00110101 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VPMOVZXDQ AVX2 00110101 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
CVTPI2PS SSE 00001111 00101010 !emit { modrm(); mem(size => 8); }
CVTSI2SS SSE 00001111 00101010 !emit { rep(); modrm(); mem(size => 4); }
@@ -706,15 +944,20 @@ VCVTTSD2SI_64 AVX 00101100 !emit { vex(l => 0, p => VEX_P_REPNE,
CVTPD2DQ SSE2 00001111 11100110 !emit { repne(); modrm(); mem(size => 16, align => 16); }
VCVTPD2DQ AVX 11100110 !emit { vex(l => VEX_L_128, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VCVTPD2DQ AVX2 11100110 !emit { vex(l => VEX_L_256, p => VEX_P_REPNE, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
CVTTPD2DQ SSE2 00001111 11100110 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VCVTTPD2DQ AVX 11100110 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VCVTTPD2DQ AVX2 11100110 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
CVTDQ2PD SSE2 00001111 11100110 !emit { rep(); modrm(); mem(size => 8); }
VCVTDQ2PD AVX 11100110 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VCVTDQ2PD AVX2 11100110 !emit { vex(l => VEX_L_256, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
CVTPS2PD SSE2 00001111 01011010 !emit { modrm(); mem(size => 8); }
VCVTPS2PD AVX 01011010 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 8); }
+VCVTPS2PD AVX2 01011010 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
CVTPD2PS SSE2 00001111 01011010 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VCVTPD2PS AVX 01011010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VCVTPD2PS AVX2 01011010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
CVTSS2SD SSE2 00001111 01011010 !emit { rep(); modrm(); mem(size => 4); }
VCVTSS2SD AVX 01011010 !emit { vex(l => 0, p => VEX_P_REP, m => VEX_M_0F); modrm(); mem(size => 4); }
CVTSD2SS SSE2 00001111 01011010 !emit { repne(); modrm(); mem(size => 8); }
@@ -722,10 +965,13 @@ VCVTSD2SS AVX 01011010 !emit { vex(l => 0, p => VEX_P_REPNE,
CVTDQ2PS SSE2 00001111 01011011 !emit { modrm(); mem(size => 16, align => 16); }
VCVTDQ2PS AVX 01011011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VCVTDQ2PS AVX2 01011011 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
CVTPS2DQ SSE2 00001111 01011011 !emit { data16(); modrm(); mem(size => 16, align => 16); }
VCVTPS2DQ AVX 01011011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VCVTPS2DQ AVX2 01011011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
CVTTPS2DQ SSE2 00001111 01011011 !emit { rep(); modrm(); mem(size => 16, align => 16); }
VCVTTPS2DQ AVX 01011011 !emit { vex(l => VEX_L_128, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 16); }
+VCVTTPS2DQ AVX2 01011011 !emit { vex(l => VEX_L_256, p => VEX_P_REP, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(); mem(size => 32); }
# Cacheability Control, Prefetch, and Instruction Ordering Instructions
MASKMOVQ SSE 00001111 11110111 !emit { modrm(mod => MOD_DIRECT); mem(size => 8, base => REG_EDI); }
@@ -733,20 +979,31 @@ MASKMOVDQU SSE2 00001111 11110111 !emit { data16(); modrm(mod => MOD_DIR
VMASKMOVDQU AVX 11110111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => MOD_DIRECT); mem(size => 16, base => REG_EDI); }
VMASKMOVPS AVX 001011 d 0 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VMASKMOVPS AVX2 001011 d 0 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 32); }
VMASKMOVPD AVX 001011 d 1 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VMASKMOVPD AVX2 001011 d 1 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 32); }
+
+VPMASKMOVD_xmm AVX2 100011 d 0 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VPMASKMOVD AVX2 100011 d 0 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 0); modrm(mod => ~MOD_DIRECT); mem(size => 32); }
+VPMASKMOVQ_xmm AVX2 100011 d 0 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 16); }
+VPMASKMOVQ AVX2 100011 d 0 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 32); }
MOVNTPS SSE 00001111 00101011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
VMOVNTPS AVX 00101011 !emit { vex(l => VEX_L_128, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTPS AVX2 00101011 !emit { vex(l => VEX_L_256, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 32, align => 32); }
MOVNTPD SSE2 00001111 00101011 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
VMOVNTPD AVX 00101011 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTPD AVX2 00101011 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 32, align => 32); }
MOVNTI SSE2 00001111 11000011 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 4); }
MOVNTI_64 SSE2 00001111 11000011 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVNTQ SSE 00001111 11100111 !emit { modrm(mod => ~MOD_DIRECT); mem(size => 8); }
MOVNTDQ SSE2 00001111 11100111 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
VMOVNTDQ AVX 11100111 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTDQ AVX2 11100111 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 32, align => 32); }
MOVNTDQA SSE4_1 00001111 00111000 00101010 !emit { data16(); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
VMOVNTDQA AVX 00101010 !emit { vex(l => VEX_L_128, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 16, align => 16); }
+VMOVNTDQA AVX2 00101010 !emit { vex(l => VEX_L_256, p => VEX_P_DATA16, m => VEX_M_0F38, v => VEX_V_UNUSED); modrm(mod => ~MOD_DIRECT); mem(size => 32, align => 32); }
PREFETCHT0 SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 1); mem(size => 1); }
PREFETCHT1 SSE 00001111 00011000 !emit { modrm(mod => ~MOD_DIRECT, reg => 2); mem(size => 1); }
--
2.20.1
* Re: [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill Jan Bobek
@ 2019-07-03 15:22 ` Richard Henderson
2019-07-10 17:48 ` Jan Bobek
0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-03 15:22 UTC (permalink / raw)
To: Jan Bobek, qemu-devel; +Cc: Alex Bennée
On 7/1/19 6:35 AM, Jan Bobek wrote:
> + while ($bitcur < $bitend) {
> + my $format;
> + my $bitlen;
> +
> + if ($bitcur + 64 <= $bitend) {
> + $format = "Q";
> + $bitlen = 64;
> + } elsif ($bitcur + 32 <= $bitend) {
> + $format = "L";
> + $bitlen = 32;
> + } elsif ($bitcur + 16 <= $bitend) {
> + $format = "S";
> + $bitlen = 16;
> + } else {
> + $format = "C";
> + $bitlen = 8;
> + }
> +
> + $format .= ($args{bigendian} ? ">" : "<") if $bitlen > 8;
It now occurs to me to wonder if it's worth simplifying this function to always
emit bytes, and thus take care of all of the endianness ourselves, since we're
doing it anyway for larger/odd-sized hunks.
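As a rough illustration of the "always emit bytes" idea, here is a minimal sketch. The helper names value_to_bytes and pack_bytes are hypothetical and not part of the patch; the sketch assumes the value fits in a native Perl integer and that the bit length is a multiple of 8.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): split the low $bitlen bits of
# $value into a list of bytes in the requested byte order.
sub value_to_bytes {
    my ($value, $bitlen, $bigendian) = @_;
    die "bitlen must be a multiple of 8\n" if $bitlen % 8;

    my @bytes;
    for (my $shift = 0; $shift < $bitlen; $shift += 8) {
        # Collect the least significant byte first (little-endian order).
        push @bytes, ($value >> $shift) & 0xff;
    }
    @bytes = reverse @bytes if $bigendian;
    return @bytes;
}

# Hypothetical helper: pack every byte with the "C" template, so no
# "<" or ">" endianness modifier is needed in the format string.
sub pack_bytes {
    my ($value, $bitlen, $bigendian) = @_;
    my $packed = pack("C*", value_to_bytes($value, $bitlen, $bigendian));
    return $packed;
}
```

With this shape, odd sizes such as 24 bits take the same path as 16, 32 or 64 bits, at the cost of one pack element per byte.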
Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [Qemu-devel] [RISU RFC PATCH v2 02/14] risugen_x86_asm: add module
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 02/14] risugen_x86_asm: add module Jan Bobek
@ 2019-07-03 15:37 ` Richard Henderson
2019-07-10 18:02 ` Jan Bobek
0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-03 15:37 UTC (permalink / raw)
To: Jan Bobek, qemu-devel; +Cc: Alex Bennée
On 7/1/19 6:35 AM, Jan Bobek wrote:
> + VEX_V_UNUSED => 0b1111,
I think perhaps this is a mistake. Yes, that's what goes in the field, but
what goes in the field is ~(logical_value).
While for general RISU-ish operation ~(random_number) is just as random as
+(random_number), the difference will matter if we ever want to use this
interface to explicitly emit a specific VEX instruction that also requires the
v-register.
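To make the logical-versus-encoded distinction concrete, a small sketch follows. The helper vex_v_field is hypothetical, not something the patch defines; it assumes the caller passes a logical register number in 0..15, or undef when the operand is unused.

```perl
use strict;
use warnings;

# Hypothetical helper: map a logical register number (0..15) to the value
# stored in VEX.vvvv, which holds the one's complement of that number.
sub vex_v_field {
    my ($logical_reg) = @_;
    # Assumption: an unused operand is treated as logical register 0,
    # which encodes to the 0b1111 pattern used for VEX_V_UNUSED.
    $logical_reg = 0 unless defined $logical_reg;
    my $field = ~$logical_reg & 0xf;
    return $field;
}
```

Under this convention an "unused" constant would be a logical 0 rather than 0b1111, and the inversion would happen in exactly one place.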
> +sub rex_encode(%)
> +{
> + my (%args) = @_;
> +
> + $args{w} = 0 unless defined $args{w};
> + $args{r} = 0 unless defined $args{r};
> + $args{x} = 0 unless defined $args{x};
> + $args{b} = 0 unless defined $args{b};
> +
> + return (value => 0x40
> + | (($args{w} ? 1 : 0) << 3)
> + | (($args{r} ? 1 : 0) << 2)
> + | (($args{x} ? 1 : 0) << 1)
> + | ($args{b} ? 1 : 0),
> + len => 1);
> +}
Does
(defined $args{w} && $args{w}) << 3
work? That seems tidier to me than splitting these conditions.
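A quick way to compare the two spellings is sketched below. Both subs are hypothetical test scaffolding rather than code from the patch. As far as I can tell they agree for undef, 0 and 1, and differ only when the flag holds some other true value such as 2.

```perl
use strict;
use warnings;

# The form suggested above: shifts the flag's own value when it is defined.
sub shifted_suggested {
    my ($w) = @_;
    my $bit = (defined $w && $w) << 3;
    return $bit;
}

# The form used in the patch: collapses any true value to 1 before shifting.
sub shifted_original {
    my ($w) = @_;
    my $bit = ($w ? 1 : 0) << 3;
    return $bit;
}
```

So the shorter form is fine as long as callers only ever pass 0 or 1; the ternary is the safer choice if arbitrary true values may appear.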
> + return (value => (0xC4 << 16)
> + | (($args{r} ? 1 : 0) << 15)
> + | (($args{x} ? 1 : 0) << 14)
> + | (($args{b} ? 1 : 0) << 13)
Further down in vex_encode, and along the lines of VEX_V_UNUSED, this appears
to be actively wrong, since these bits are encoded as inverses. What this
*really* means is that because of that, rex_encode and vex_encode will not
encode the same registers for a given instruction. Which really does feel
bug-like, random inputs or no.
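One way to keep the two encoders consistent is to have both take logical flags and invert only inside the VEX path. The sketch below is an illustration under that assumption; rex_rxb_bits and vex_byte1 are hypothetical names, and only the second byte of a three-byte VEX prefix is modelled.

```perl
use strict;
use warnings;

# Hypothetical helper: low three bits of a REX prefix from logical flags
# (1 means the corresponding register-extension bit is set).
sub rex_rxb_bits {
    my (%a) = @_;
    my $bits = (($a{r} ? 1 : 0) << 2)
             | (($a{x} ? 1 : 0) << 1)
             |  ($a{b} ? 1 : 0);
    return $bits;
}

# Hypothetical helper: byte 1 of a three-byte VEX prefix. R, X and B are
# stored inverted in bits 7..5; the map select field m occupies bits 4..0.
sub vex_byte1 {
    my (%a) = @_;
    my $inverted = ~rex_rxb_bits(%a) & 0x7;
    my $byte = ($inverted << 5) | ($a{m} & 0x1f);
    return $byte;
}
```

Given the same logical r/x/b inputs, both prefixes then select the same registers.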
r~
* Re: [Qemu-devel] [RISU RFC PATCH v2 03/14] risugen_x86_emit: add module
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 03/14] risugen_x86_emit: " Jan Bobek
@ 2019-07-03 15:47 ` Richard Henderson
2019-07-10 18:08 ` Jan Bobek
0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-03 15:47 UTC (permalink / raw)
To: Jan Bobek, qemu-devel; +Cc: Alex Bennée
On 7/1/19 6:35 AM, Jan Bobek wrote:
> +sub parse_emitblock($$)
> +{
> + my ($rec, $insn) = @_;
> + my $insnname = $rec->{name};
> + my $opcode = $insn->{opcode}{value};
> +
> + $emit_opts = {};
> +
> + my $emitblock = $rec->{blocks}{"emit"};
> + if (defined $emitblock) {
> + eval_with_fields($insnname, $opcode, $rec, "emit", $emitblock);
> + }
And if !defined? Silently discard?
Is this just weirdness higher in the risugen stack,
such that this might be called maybe_parse_emitblock?
r~
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: add module
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: " Jan Bobek
@ 2019-07-03 16:11 ` Richard Henderson
2019-07-10 18:21 ` Jan Bobek
0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-03 16:11 UTC (permalink / raw)
To: Jan Bobek, qemu-devel; +Cc: Alex Bennée
On 7/1/19 6:35 AM, Jan Bobek wrote:
> +sub write_mov_rr($$)
> +{
> + my ($r1, $r2) = @_;
> +
> + my %insn = (opcode => X86OP_MOV,
> + modrm => {mod => MOD_DIRECT,
> + reg => ($r1 & 0x7),
> + rm => ($r2 & 0x7)});
> +
> + $insn{rex}{w} = 1 if $is_x86_64;
> + $insn{rex}{r} = 1 if $r1 >= 8;
> + $insn{rex}{b} = 1 if $r2 >= 8;
This is where maybe it's better to leave rex.[rb] to risugen_x86_asm, and just
leave $modrm{reg} and $modrm{rm} as 4-bit quantities.
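A small sketch of what that split could look like inside the assembler layer. The helper names split_reg and mov_rr_fields are hypothetical, and 0b11 stands in for MOD_DIRECT; the caller only ever deals with register numbers 0..15.

```perl
use strict;
use warnings;

# Hypothetical helper: split a 4-bit register number into the REX
# extension bit and the 3-bit ModR/M field.
sub split_reg {
    my ($reg) = @_;
    die "register out of range: $reg\n" unless 0 <= $reg && $reg <= 15;
    return (($reg >> 3) & 1, $reg & 0x7);
}

# Hypothetical helper: build the prefix and ModR/M fields for a
# register-to-register move from two plain register numbers.
sub mov_rr_fields {
    my ($is_x86_64, $r1, $r2) = @_;
    my ($rex_r, $reg) = split_reg($r1);
    my ($rex_b, $rm)  = split_reg($r2);

    return (rex   => {w => ($is_x86_64 ? 1 : 0), r => $rex_r, b => $rex_b},
            modrm => {mod => 0b11, reg => $reg, rm => $rm});
}

my %f = mov_rr_fields(1, 9, 2);
print "rex.r=$f{rex}{r} reg=$f{modrm}{reg} rex.b=$f{rex}{b} rm=$f{modrm}{rm}\n";
```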
> +sub write_mov_reg_imm($$)
> +{
> + my ($reg, $imm) = @_;
> + my %insn;
> +
> + if (0 <= $imm && $imm <= 0xffffffff) {
Should include !$is_x86_64 here,
> + %insn = (opcode => {value => 0xB8 | ($reg & 0x7), len => 1},
> + imm => {value => $imm, len => 4});
> + } elsif (-0x80000000 <= $imm && $imm <= 0x7fffffff) {
> + %insn = (opcode => {value => 0xC7, len => 1},
> + modrm => {mod => MOD_DIRECT,
> + reg => 0, rm => ($reg & 0x7)},
> + imm => {value => $imm, len => 4});
> +
> + $insn{rex}{w} = 1 if $is_x86_64;
making this unconditional.
> +sub write_random_ymmdata()
> +{
> + my $ymm_cnt = $is_x86_64 ? 16 : 8;
> + my $ymm_len = 32;
> + my $datalen = $ymm_cnt * $ymm_len;
> +
> + # Generate random data blob
> + write_random_datablock($datalen);
> +
> + # Load the random data into YMM regs.
> + for (my $ymm_reg = 0; $ymm_reg < $ymm_cnt; $ymm_reg++) {
> + write_insn(vex => {l => VEX_L_256, p => VEX_P_DATA16,
> + r => !($ymm_reg >= 8)},
Again, vex.r should be handled in vex_encode.
> + opcode => X86OP_VMOVAPS,
> + modrm => {mod => MOD_INDIRECT_DISP32,
> + reg => ($ymm_reg & 0x7),
> + rm => REG_EAX},
> + disp => {value => $ymm_reg * $ymm_len,
> + len => 4});
> + }
So... this now generates code that cannot run without AVX2.
Which is probably fine for testing right now, since we do
want to be able to notice effects of SSE/AVX insns on the
high bits of the registers.
But we'll probably need to have the same --xsave=foo
command-line option that we have for risu itself.
That would let you initialize only 16-bytes here, or
for avx512 initialize 64-bytes, plus the k-registers.
r~
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions Jan Bobek
@ 2019-07-03 21:35 ` Richard Henderson
2019-07-10 18:29 ` Jan Bobek
2019-07-03 21:49 ` Richard Henderson
2019-07-03 22:01 ` Peter Maydell
2 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-03 21:35 UTC (permalink / raw)
To: Jan Bobek, qemu-devel; +Cc: Alex Bennée
On 7/1/19 6:35 AM, Jan Bobek wrote:
> Add an x86 configuration file with all MMX instructions.
>
> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
> ---
> x86.risu | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 96 insertions(+)
> create mode 100644 x86.risu
Note that most of these MMX instructions affect the FPU, not the vector unit.
We would want to extend risu again to handle this. You'd also need to seed the
FPU with random data.
I was thinking for a moment that this is really beyond what you've signed up
for, but on second thoughts it's not. Decoding SSE is really tangled with
decoding MMX, via the 0x66 prefix, and you'll want to be able to verify that
you don't regress.
> +# State Management Instructions
> +EMMS MMX 00001111 01110111 !emit { }
I'm not sure this is really testable, because of the state change. But we'll
see what happens with the aforementioned dumping.
> +# Arithmetic Instructions
> +PADDB MMX 00001111 11111100 !emit { modrm(); mem(size => 8); }
> +PADDW MMX 00001111 11111101 !emit { modrm(); mem(size => 8); }
> +PADDD MMX 00001111 11111110 !emit { modrm(); mem(size => 8); }
> +PADDQ MMX 00001111 11010100 !emit { modrm(); mem(size => 8); }
PADDQ is sse2.
r~
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions Jan Bobek
2019-07-03 21:35 ` Richard Henderson
@ 2019-07-03 21:49 ` Richard Henderson
2019-07-10 18:32 ` Jan Bobek
2019-07-03 22:01 ` Peter Maydell
2 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-03 21:49 UTC (permalink / raw)
To: Jan Bobek, qemu-devel; +Cc: Alex Bennée
On 7/1/19 6:35 AM, Jan Bobek wrote:
> +MOVQ MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
> +MOVQ_mem MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
Oh, note that there are only 8 mmx registers, so the respective rex.{r,b} bit
can't be set.
r~
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-01 4:35 ` [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions Jan Bobek
2019-07-03 21:35 ` Richard Henderson
2019-07-03 21:49 ` Richard Henderson
@ 2019-07-03 22:01 ` Peter Maydell
2019-07-10 18:35 ` Jan Bobek
2 siblings, 1 reply; 38+ messages in thread
From: Peter Maydell @ 2019-07-03 22:01 UTC (permalink / raw)
To: Jan Bobek; +Cc: Richard Henderson, Alex Bennée, QEMU Developers
On Mon, 1 Jul 2019 at 05:43, Jan Bobek <jan.bobek@gmail.com> wrote:
>
> Add an x86 configuration file with all MMX instructions.
>
> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
> --- /dev/null
> +++ b/x86.risu
> @@ -0,0 +1,96 @@
> +###############################################################################
> +# Copyright (c) 2019 Linaro Limited
I'm guessing from your email address that this copyright line probably
isn't right :-)
thanks
-- PMM
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 01/14] risugen_common: add insnv, randint_constr, rand_fill
2019-07-03 15:22 ` Richard Henderson
@ 2019-07-10 17:48 ` Jan Bobek
0 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 17:48 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: Alex Bennée
Hi Richard,
sorry for replying so late. I read your comments last week; as I
mentioned in our weekly update email, I ended up adding/removing quite
a lot since v2, so I wasn't 100% sure how much of it will remain
relevant.
Anyways,
On 7/3/19 11:22 AM, Richard Henderson wrote:
> On 7/1/19 6:35 AM, Jan Bobek wrote:
>> + while ($bitcur < $bitend) {
>> + my $format;
>> + my $bitlen;
>> +
>> + if ($bitcur + 64 <= $bitend) {
>> + $format = "Q";
>> + $bitlen = 64;
>> + } elsif ($bitcur + 32 <= $bitend) {
>> + $format = "L";
>> + $bitlen = 32;
>> + } elsif ($bitcur + 16 <= $bitend) {
>> + $format = "S";
>> + $bitlen = 16;
>> + } else {
>> + $format = "C";
>> + $bitlen = 8;
>> + }
>> +
>> + $format .= ($args{bigendian} ? ">" : "<") if $bitlen > 8;
>
> It now occurs to me to wonder if it's worth simplifying this function to always
> emit bytes, and thus take care of all of the endianness ourselves, since we're
> doing it anyway for larger/odd-sized hunks.
Good point. *facepalm*
I will include this change in v3.
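For reference, a minimal sketch of the byte-at-a-time approach, assuming values of at most 8 bytes and a perl with 64-bit integers. The name value_to_bytes is made up for this example and is not the v3 code.

```perl
use strict;
use warnings;

# Hypothetical helper: emit $len bytes of $value, handling the byte
# order ourselves instead of picking a pack format per width.
sub value_to_bytes {
    my ($value, $len, $bigendian) = @_;

    # Least significant byte first (little-endian order).
    my @bytes = map { ($value >> (8 * $_)) & 0xff } 0 .. $len - 1;
    @bytes = reverse @bytes if $bigendian;

    return pack("C*", @bytes);
}

print unpack("H*", value_to_bytes(0xABCDEF, 3, 0)), "\n";    # prints efcdab
print unpack("H*", value_to_bytes(0xABCDEF, 3, 1)), "\n";    # prints abcdef
```

Odd sizes such as 3 bytes need no special case here, which is the simplification being suggested.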
-Jan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 02/14] risugen_x86_asm: add module
2019-07-03 15:37 ` Richard Henderson
@ 2019-07-10 18:02 ` Jan Bobek
0 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 18:02 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: Alex Bennée
On 7/3/19 11:37 AM, Richard Henderson wrote:
> On 7/1/19 6:35 AM, Jan Bobek wrote:
>> + VEX_V_UNUSED => 0b1111,
>
> I think perhaps this is a mistake. Yes, that's what goes in the field, but
> what goes in the field is ~(logical_value).
>
> While for general RISU-ish operation, ~(random_number) is just as random as
> +(random_number), the difference will be if we ever want to explicitly emit
> with this interface a specific vex instruction which also requires the v-register.
See below.
>> +sub rex_encode(%)
>> +{
>> + my (%args) = @_;
>> +
>> + $args{w} = 0 unless defined $args{w};
>> + $args{r} = 0 unless defined $args{r};
>> + $args{x} = 0 unless defined $args{x};
>> + $args{b} = 0 unless defined $args{b};
>> +
>> + return (value => 0x40
>> + | (($args{w} ? 1 : 0) << 3)
>> + | (($args{r} ? 1 : 0) << 2)
>> + | (($args{x} ? 1 : 0) << 1)
>> + | ($args{b} ? 1 : 0),
>> + len => 1);
>> +}
>
> Does
>
> (defined $args{w} && $args{w}) << 3
>
> work? That seems tidier to me than splitting these conditions.
It does, I will change it. Thanks!
>> + return (value => (0xC4 << 16)
>> + | (($args{r} ? 1 : 0) << 15)
>> + | (($args{x} ? 1 : 0) << 14)
>> + | (($args{b} ? 1 : 0) << 13)
>
> Further down in vex_encode, and along the lines of VEX_V_UNUSED, this appears
> to be actively wrong, since these bits are encoded as inverses. What this
> *really* means is that because of that, rex_encode and vex_encode will not
> encode the same registers for a given instruction. Which really does feel
> bug-like, random inputs or no.
So, vex_encode, rex_encode and friends were meant to be really
low-level functions; they literally just encode the bits from what you
pass in, without any concern for what the fields even mean. In that
spirit, write_insn itself never did much error-checking.
I have added quite a lot of code to risugen_x86_asm in v3; most
importantly, there are now asm_insn_* functions which are more
high-level, in that you pass in the logical values and they take care of
error checks and encoding. I also removed write_insn and all the
encoding-related symbolic constants from the public interface of the
module.
-Jan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 03/14] risugen_x86_emit: add module
2019-07-03 15:47 ` Richard Henderson
@ 2019-07-10 18:08 ` Jan Bobek
0 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 18:08 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: Alex Bennée
On 7/3/19 11:47 AM, Richard Henderson wrote:
> On 7/1/19 6:35 AM, Jan Bobek wrote:
>> +sub parse_emitblock($$)
>> +{
>> + my ($rec, $insn) = @_;
>> + my $insnname = $rec->{name};
>> + my $opcode = $insn->{opcode}{value};
>> +
>> + $emit_opts = {};
>> +
>> + my $emitblock = $rec->{blocks}{"emit"};
>> + if (defined $emitblock) {
>> + eval_with_fields($insnname, $opcode, $rec, "emit", $emitblock);
>> + }
>
> And if !defined? Silently discard?
>
> Is this just weirdness higher in the risugen stack,
> such that this might be called maybe_parse_emitblock?
If !defined, there _is_ no emit block, and we treat that as an empty
block. The caller gets an empty hash, and it's up to them to decide
what that means. I could rename it, but the difference doesn't seem
that important to me...?
-Jan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: add module
2019-07-03 16:11 ` Richard Henderson
@ 2019-07-10 18:21 ` Jan Bobek
2019-07-11 9:26 ` Richard Henderson
0 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 18:21 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: Alex Bennée
On 7/3/19 12:11 PM, Richard Henderson wrote:
> On 7/1/19 6:35 AM, Jan Bobek wrote:
>> +sub write_mov_rr($$)
>> +{
>> + my ($r1, $r2) = @_;
>> +
>> + my %insn = (opcode => X86OP_MOV,
>> + modrm => {mod => MOD_DIRECT,
>> + reg => ($r1 & 0x7),
>> + rm => ($r2 & 0x7)});
>> +
>> + $insn{rex}{w} = 1 if $is_x86_64;
>> + $insn{rex}{r} = 1 if $r1 >= 8;
>> + $insn{rex}{b} = 1 if $r2 >= 8;
>
> This is where maybe it's better to leave rex.[rb] to risugen_x86_asm, and just
> leave $modrm{reg} and $modrm{rm} as 4-bit quantities.
That's what I have in v3, stay tuned!
>> +sub write_mov_reg_imm($$)
>> +{
>> + my ($reg, $imm) = @_;
>> + my %insn;
>> +
>> + if (0 <= $imm && $imm <= 0xffffffff) {
>
> Should include !$is_x86_64 here,
>
>> + %insn = (opcode => {value => 0xB8 | ($reg & 0x7), len => 1},
>> + imm => {value => $imm, len => 4});
>> + } elsif (-0x80000000 <= $imm && $imm <= 0x7fffffff) {
>> + %insn = (opcode => {value => 0xC7, len => 1},
>> + modrm => {mod => MOD_DIRECT,
>> + reg => 0, rm => ($reg & 0x7)},
>> + imm => {value => $imm, len => 4});
>> +
>> + $insn{rex}{w} = 1 if $is_x86_64;
>
> making this unconditional.
Doesn't B8 (without REX.W) work for x86_64, too? It zeroes the upper
part of the destination, so it's effectively zero-extending, and it's
one byte shorter than C7 (no ModR/M byte needed).
That being said, I moved most of this function to risugen_x86_asm and
included a bunch of comments regarding different cases, so it should
be easier to understand.
>> +sub write_random_ymmdata()
>> +{
>> + my $ymm_cnt = $is_x86_64 ? 16 : 8;
>> + my $ymm_len = 32;
>> + my $datalen = $ymm_cnt * $ymm_len;
>> +
>> + # Generate random data blob
>> + write_random_datablock($datalen);
>> +
>> + # Load the random data into YMM regs.
>> + for (my $ymm_reg = 0; $ymm_reg < $ymm_cnt; $ymm_reg++) {
>> + write_insn(vex => {l => VEX_L_256, p => VEX_P_DATA16,
>> + r => !($ymm_reg >= 8)},
>
> Again, vex.r should be handled in vex_encode.
As I said, there will be more high-level instruction-assembling
functions exported by risugen_x86_asm in v3, which take care of this.
>> + opcode => X86OP_VMOVAPS,
>> + modrm => {mod => MOD_INDIRECT_DISP32,
>> + reg => ($ymm_reg & 0x7),
>> + rm => REG_EAX},
>> + disp => {value => $ymm_reg * $ymm_len,
>> + len => 4});
>> + }
>
> So... this now generates code that cannot run without AVX2.
>
> Which is probably fine for testing right now, since we do
> want to be able to notice effects of SSE/AVX insns on the
> high bits of the registers.
>
> But we'll probably need to have the same --xsave=foo
> command-line option that we have for risu itself.
>
> That would let you initialize only 16-bytes here, or
> for avx512 initialize 64-bytes, plus the k-registers.
Ah yes, indeed.
-Jan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-03 21:35 ` Richard Henderson
@ 2019-07-10 18:29 ` Jan Bobek
2019-07-11 9:32 ` Richard Henderson
0 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 18:29 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: Alex Bennée
On 7/3/19 5:35 PM, Richard Henderson wrote:
> On 7/1/19 6:35 AM, Jan Bobek wrote:
>> Add an x86 configuration file with all MMX instructions.
>>
>> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
>> ---
>> x86.risu | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 96 insertions(+)
>> create mode 100644 x86.risu
>
> Note that most of these MMX instructions affect the FPU, not the vector unit.
> We would want to extend risu again to handle this. You'd also need to seed the
> FPU with random data.
>
> I was thinking for a moment that this is really beyond what you've signed up
> for, but on second thoughts it's not. Decoding SSE is really tangled with
> decoding MMX, via the 0x66 prefix, and you'll want to be able to verify that
> you don't regress.
Honestly, I added MMX instructions just for completeness; I figured it can't
hurt, and you can always filter them out via command-line switches. You have
a point with the regression testing, though...
>> +# State Management Instructions
>> +EMMS MMX 00001111 01110111 !emit { }
>
> I'm not sure this is really testable, because of the state change. But we'll
> see what happens with the aforementioned dumping.
>
>> +# Arithmetic Instructions
>> +PADDB MMX 00001111 11111100 !emit { modrm(); mem(size => 8); }
>> +PADDW MMX 00001111 11111101 !emit { modrm(); mem(size => 8); }
>> +PADDD MMX 00001111 11111110 !emit { modrm(); mem(size => 8); }
>> +PADDQ MMX 00001111 11010100 !emit { modrm(); mem(size => 8); }
Not this one, at least according to the Intel docs:
NP 0F D4 /r: PADDQ mm, mm/m64 (MMX)
66 0F D4 /r: PADDQ xmm1, xmm2/m128 (SSE2)
The SSE2 version is added in a later patch.
-Jan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-03 21:49 ` Richard Henderson
@ 2019-07-10 18:32 ` Jan Bobek
2019-07-11 9:34 ` Richard Henderson
0 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 18:32 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: Alex Bennée
On 7/3/19 5:49 PM, Richard Henderson wrote:
> On 7/1/19 6:35 AM, Jan Bobek wrote:
>> +MOVQ MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
>> +MOVQ_mem MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
>
> Oh, note that there are only 8 mmx registers, so the respective rex.{r,b} bit
> can't be set.
Actually, my CPU chewed it without choking even when the bits were
set, but it will be taken care of in v3.
-Jan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-03 22:01 ` Peter Maydell
@ 2019-07-10 18:35 ` Jan Bobek
2019-07-11 6:45 ` Alex Bennée
0 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-10 18:35 UTC (permalink / raw)
To: Peter Maydell; +Cc: Richard Henderson, Alex Bennée, QEMU Developers
On 7/3/19 6:01 PM, Peter Maydell wrote:
> On Mon, 1 Jul 2019 at 05:43, Jan Bobek <jan.bobek@gmail.com> wrote:
>>
>> Add an x86 configuration file with all MMX instructions.
>>
>> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
>
>> --- /dev/null
>> +++ b/x86.risu
>> @@ -0,0 +1,96 @@
>> +###############################################################################
>> +# Copyright (c) 2019 Linaro Limited
>
> I'm guessing from your email address that this copyright line probably
> isn't right :-)
Haha indeed, I just copy-pasted it from the other files; the same goes for
the rest of the source files.
Any suggestions on what it should be? I'm not currently employed by
anyone (as Google keeps reminding us).
-Jan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-10 18:35 ` Jan Bobek
@ 2019-07-11 6:45 ` Alex Bennée
2019-07-11 13:33 ` Jan Bobek
0 siblings, 1 reply; 38+ messages in thread
From: Alex Bennée @ 2019-07-11 6:45 UTC (permalink / raw)
To: Jan Bobek; +Cc: Peter Maydell, Richard Henderson, QEMU Developers
Jan Bobek <jan.bobek@gmail.com> writes:
> On 7/3/19 6:01 PM, Peter Maydell wrote:
>> On Mon, 1 Jul 2019 at 05:43, Jan Bobek <jan.bobek@gmail.com> wrote:
>>>
>>> Add an x86 configuration file with all MMX instructions.
>>>
>>> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
>>
>>> --- /dev/null
>>> +++ b/x86.risu
>>> @@ -0,0 +1,96 @@
>>> +###############################################################################
>>> +# Copyright (c) 2019 Linaro Limited
>>
>> I'm guessing from your email address that this copyright line probably
>> isn't right :-)
>
> Haha indeed, I just copy-pasted it from the other files; the same goes for
> the rest of the source files.
>
> Any suggestions on what it should be? I'm not currently employed by
> anyone (as Google keeps reminding us).
It should be (c) 2019 Jan Bobek as you wrote it. The license text should
be the same (assuming you are happy to license it, which I assume you
are given you are contributing to RISU ;-)
>
> -Jan
--
Alex Bennée
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: add module
2019-07-10 18:21 ` Jan Bobek
@ 2019-07-11 9:26 ` Richard Henderson
2019-07-11 13:10 ` Jan Bobek
0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-11 9:26 UTC (permalink / raw)
To: Jan Bobek, qemu-devel; +Cc: Alex Bennée
On 7/10/19 8:21 PM, Jan Bobek wrote:
> Doesn't B8 (without REX.W) work for x86_64, too? It zeroes the upper
> part of the destination, so it's effectively zero-extending, and it's
> one byte shorter than C7 (no ModR/M byte needed).
Sorry, I shouldn't have been quite so terse. What I meant is
if (!$is_x86_64 || (0 <= $imm && $imm <= 0xffffffff))
so that 32-bit always uses the 5-byte encoding instead of the 6-byte.
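Spelled out as a sketch, with the case selection following the condition above. The helper name mov_reg_imm_bytes is hypothetical, this is not the v3 code, and it assumes a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Hypothetical helper: encode "mov reg, imm" as raw bytes, picking the
# shortest form that preserves the value.
sub mov_reg_imm_bytes {
    my ($is_x86_64, $reg, $imm) = @_;
    die "register $reg needs x86_64\n" if $reg >= 8 && !$is_x86_64;

    my $lo    = $reg & 0x7;
    my $rex_b = ($reg >> 3) & 1;
    my $imm32 = pack("V", $imm & 0xffffffff);

    if (!$is_x86_64 || (0 <= $imm && $imm <= 0xffffffff)) {
        # B8+r imm32: 5 bytes (plus REX.B for r8..r15). In 64-bit mode
        # the upper half of the destination is zeroed.
        my $rex = $rex_b ? chr(0x41) : "";
        return $rex . chr(0xB8 | $lo) . $imm32;
    }
    if (-0x80000000 <= $imm && $imm <= 0x7fffffff) {
        # REX.W C7 /0 imm32: the immediate is sign-extended to 64 bits.
        return chr(0x48 | $rex_b) . chr(0xC7) . chr(0xC0 | $lo) . $imm32;
    }
    # REX.W B8+r imm64: full 64-bit immediate, low dword first.
    return chr(0x48 | $rex_b) . chr(0xB8 | $lo)
         . $imm32 . pack("V", ($imm >> 32) & 0xffffffff);
}

print unpack("H*", mov_reg_imm_bytes(0, 0, -1)), "\n";    # prints b8ffffffff
print unpack("H*", mov_reg_imm_bytes(1, 0, -1)), "\n";    # prints 48c7c0ffffffff
```

In 32-bit mode every immediate goes through the first branch, so the C7 form is never chosen there.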
> That being said, I moved most of this function to risugen_x86_asm and
> included a bunch of comments regarding different cases, so it should
> be easier to understand.
Great.
r~
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-10 18:29 ` Jan Bobek
@ 2019-07-11 9:32 ` Richard Henderson
2019-07-11 13:29 ` Jan Bobek
0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-11 9:32 UTC (permalink / raw)
To: Jan Bobek, qemu-devel; +Cc: Alex Bennée
On 7/10/19 8:29 PM, Jan Bobek wrote:
>>> +# Arithmetic Instructions
>>> +PADDB MMX 00001111 11111100 !emit { modrm(); mem(size => 8); }
>>> +PADDW MMX 00001111 11111101 !emit { modrm(); mem(size => 8); }
>>> +PADDD MMX 00001111 11111110 !emit { modrm(); mem(size => 8); }
>>> +PADDQ MMX 00001111 11010100 !emit { modrm(); mem(size => 8); }
>
> Not this one, at least according to the Intel docs:
>
> NP 0F D4 /r: PADDQ mm, mm/m64 (MMX)
> 66 0F D4 /r: PADDQ xmm1, xmm2/m128 (SSE2)
>
> The SSE2 version is added in a later patch.
That's not how I read the Intel docs.
In the CPUID feature flag column of the MMX PADDQ, I see SSE2. While the insn
affects the mmx registers, it was not added with the original MMX instruction set.
r~
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-10 18:32 ` Jan Bobek
@ 2019-07-11 9:34 ` Richard Henderson
2019-07-11 9:44 ` Alex Bennée
0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-11 9:34 UTC (permalink / raw)
To: Jan Bobek, qemu-devel; +Cc: Alex Bennée
On 7/10/19 8:32 PM, Jan Bobek wrote:
> On 7/3/19 5:49 PM, Richard Henderson wrote:
>> On 7/1/19 6:35 AM, Jan Bobek wrote:
>>> +MOVQ MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
>>> +MOVQ_mem MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
>>
>> Oh, note that there are only 8 mmx registers, so the respective rex.{r,b} bit
>> can't be set.
>
> Actually, my CPU chewed it without choking even when the bits were
> set, but it will be taken care of in v3.
That's interesting data.
I wonder if it's worth retaining this as a feature in order to check qemu's
implementation?
r~
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-11 9:34 ` Richard Henderson
@ 2019-07-11 9:44 ` Alex Bennée
0 siblings, 0 replies; 38+ messages in thread
From: Alex Bennée @ 2019-07-11 9:44 UTC (permalink / raw)
To: Richard Henderson; +Cc: Jan Bobek, qemu-devel
Richard Henderson <richard.henderson@linaro.org> writes:
> On 7/10/19 8:32 PM, Jan Bobek wrote:
>> On 7/3/19 5:49 PM, Richard Henderson wrote:
>>> On 7/1/19 6:35 AM, Jan Bobek wrote:
>>>> +MOVQ MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => MOD_DIRECT, rm => ~REG_ESP); }
>>>> +MOVQ_mem MMX 00001111 011 d 1110 !emit { rex(w => 1); modrm(mod => ~MOD_DIRECT); mem(size => 8); }
>>>
>>> Oh, note that there are only 8 mmx registers, so the respective rex.{r,b} bit
>>> can't be set.
>>
>> Actually, my CPU chewed it without choking even when the bits were
>> set, but it will be taken care of in v3.
>
> That's interesting data.
>
> I wonder if it's worth retaining this as a feature in order to check qemu's
> implementation?
We could be some time, c.f. BlackHat 2017
https://www.youtube.com/watch?v=KrksBdWcZgQ
I suspect if we set https://github.com/xoreaxeaxeax/sandsifter on QEMU
we might find a few breakages.
>
>
> r~
--
Alex Bennée
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 04/14] risugen_x86: add module
2019-07-11 9:26 ` Richard Henderson
@ 2019-07-11 13:10 ` Jan Bobek
0 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-11 13:10 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: Alex Bennée
On 7/11/19 5:26 AM, Richard Henderson wrote:
> On 7/10/19 8:21 PM, Jan Bobek wrote:
>> Doesn't B8 (without REX.W) work for x86_64, too? It zeroes the upper
>> part of the destination, so it's effectively zero-extending, and it's
>> one byte shorter than C7 (no ModR/M byte needed).
>
> Sorry, I shouldn't have been quite so terse. What I meant is
>
> if (!$is_x86_64 || (0 <= $imm && $imm <= 0xffffffff))
>
> so that 32-bit always uses the 5-byte encoding instead of the 6-byte.
Oh, I see. I double-checked my new code and it never uses the C7 move
in 32-bit mode, but thanks for pointing it out.
-Jan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-11 9:32 ` Richard Henderson
@ 2019-07-11 13:29 ` Jan Bobek
2019-07-11 13:57 ` Richard Henderson
0 siblings, 1 reply; 38+ messages in thread
From: Jan Bobek @ 2019-07-11 13:29 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: Alex Bennée
On 7/11/19 5:32 AM, Richard Henderson wrote:
> On 7/10/19 8:29 PM, Jan Bobek wrote:
>>>> +# Arithmetic Instructions
>>>> +PADDB MMX 00001111 11111100 !emit { modrm(); mem(size => 8); }
>>>> +PADDW MMX 00001111 11111101 !emit { modrm(); mem(size => 8); }
>>>> +PADDD MMX 00001111 11111110 !emit { modrm(); mem(size => 8); }
>>>> +PADDQ MMX 00001111 11010100 !emit { modrm(); mem(size => 8); }
>>
>> Not this one, at least according to the Intel docs:
>>
>> NP 0F D4 /r: PADDQ mm, mm/m64 (MMX)
>> 66 0F D4 /r: PADDQ xmm1, xmm2/m128 (SSE2)
>>
>> The SSE2 version is added in a later patch.
>
> That's not how I read the Intel docs.
>
> In the CPUID feature flag column of the MMX PADDQ, I see SSE2. While the insn
> affects the mmx registers, it was not added with the original MMX instruction set.
I know what you mean; for example, PSUBQ is like that. I know about
these kinds of instructions because "{name}_{enc}" does not form a
unique key, and risugen would complain about that. That's why there is
PSUBQ_mm and PSUBQ in the final x86.risu file.
However, I downloaded a fresh copy of Intel SDM off the Intel website
this morning (just to make sure) and in Volume 2B, Section "4.3
Instructions (M-U)," page 4-208 titled "PADDB/PADDW/PADDD/PADDQ—Add
Packed Integers," there's the NP 0F D4 /r PADDQ mm, mm/m64 instruction
in the 4th row, and the CPUID column says MMX. On the other hand, I
can't find it in the Volume 1, Section 5.4 "MMX(tm) Instructions," or
in Vol. 1, Chapter 9 "Programming with Intel(R) MMX(tm) Technology,"
so it's a bit confusing.
If you know for a fact that it didn't come until SSE2 and the manual
is wrong, I will change it.
-Jan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-11 6:45 ` Alex Bennée
@ 2019-07-11 13:33 ` Jan Bobek
0 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-11 13:33 UTC (permalink / raw)
To: Alex Bennée; +Cc: Peter Maydell, Richard Henderson, QEMU Developers
On 7/11/19 2:45 AM, Alex Bennée wrote:
>
> Jan Bobek <jan.bobek@gmail.com> writes:
>
>> On 7/3/19 6:01 PM, Peter Maydell wrote:
>>> On Mon, 1 Jul 2019 at 05:43, Jan Bobek <jan.bobek@gmail.com> wrote:
>>>>
>>>> Add an x86 configuration file with all MMX instructions.
>>>>
>>>> Signed-off-by: Jan Bobek <jan.bobek@gmail.com>
>>>
>>>> --- /dev/null
>>>> +++ b/x86.risu
>>>> @@ -0,0 +1,96 @@
>>>> +###############################################################################
>>>> +# Copyright (c) 2019 Linaro Limited
>>>
>>> I'm guessing from your email address that this copyright line probably
>>> isn't right :-)
>>
>> Haha indeed, I just copy-pasted it from the other files; the same goes for
>> the rest of the source files.
>>
>> Any suggestions on what it should be? I'm not currently employed by
>> anyone (as Google keeps reminding us).
>
> It should be (c) 2019 Jan Bobek as you wrote it. The license text should
> be the same (assuming you are happy to license it, which I assume you
> are given you are contributing to RISU ;-)
Sounds great, thank you!
-Jan
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-11 13:29 ` Jan Bobek
@ 2019-07-11 13:57 ` Richard Henderson
2019-07-11 21:29 ` Jan Bobek
0 siblings, 1 reply; 38+ messages in thread
From: Richard Henderson @ 2019-07-11 13:57 UTC (permalink / raw)
To: Jan Bobek, qemu-devel; +Cc: Alex Bennée
On 7/11/19 3:29 PM, Jan Bobek wrote:
> However, I downloaded a fresh copy of Intel SDM off the Intel website
> this morning (just to make sure) and in Volume 2B, Section "4.3
> Instructions (M-U)," page 4-208 titled "PADDB/PADDW/PADDD/PADDQ—Add
> Packed Integers," there's the NP 0F D4 /r PADDQ mm, mm/m64 instruction
> in the 4th row, and the CPUID column says MMX. On the other hand, I
> can't find it in the Volume 1, Section 5.4 "MMX(tm) Instructions," or
> in Vol. 1, Chapter 9 "Programming with Intel(R) MMX(tm) Technology,"
> so it's a bit confusing.
>
> If you know for a fact that it didn't come until SSE2 and the manual
> is wrong, I will change it.
Interesting. I see what you see in
253665-069US January 2019
but I first looked at
325462-058US April 2016
which definitely has this marked as SSE2.
In the 2019 version, "5.6.3 SSE2 128-Bit SIMD Integer Instructions" is the
first mention of PADDQ. Whereas "5.4.3 MMX Packed Arithmetic Instructions"
mentions PADD{B,W,D} but not Q.
I tend to think that this is a bug in the current manual.
Checking in binutils I see
> paddq, 2, 0x660fd4, None, 2, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
> paddq, 2, 0xfd4, None, 2, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoAVX, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
and both contain CpuSSE2. If you like, I could run this by one of the Intel GCC
folk to be sure.
r~
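
The binutils check above can be repeated mechanically. The following is a minimal sketch, not part of binutils or RISU: it splits the two quoted table lines into fields and reads the CPU-feature column. The field layout (mnemonic, operand count, base opcode, extension opcode, opcode length, CPU flags, modifiers, operand types in braces) is an assumption inferred from the quoted lines, and the names `parse_opc_line` and `OPC_LINES` are hypothetical.

```python
# The two table lines quoted in the message, copied verbatim.
OPC_LINES = [
    "paddq, 2, 0x660fd4, None, 2, CpuSSE2, "
    "Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, "
    "{ RegXMM|Unspecified|BaseIndex, RegXMM }",
    "paddq, 2, 0xfd4, None, 2, CpuSSE2, "
    "Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoAVX, "
    "{ Qword|Unspecified|BaseIndex|RegMMX, RegMMX }",
]


def parse_opc_line(line):
    """Split one table line into named fields (layout is an assumption)."""
    # Everything before '{' is a comma-separated list of scalar fields;
    # everything inside the braces is the list of operand types.
    head, _, tail = line.partition("{")
    fields = [f.strip() for f in head.split(",") if f.strip()]
    operands = [
        op.strip().split("|")
        for op in tail.strip().rstrip("}").split(",")
    ]
    return {
        "mnemonic": fields[0],
        "opcode": int(fields[2], 16),
        "cpu": fields[5].split("|"),
        "modifiers": fields[6].split("|"),
        "operands": operands,
    }


entries = [parse_opc_line(line) for line in OPC_LINES]
for entry in entries:
    print(entry["mnemonic"], hex(entry["opcode"]), entry["cpu"], entry["operands"][-1])
```

Under that assumed layout, both the XMM form (0x660fd4) and the MMX-register form (0xfd4) report CpuSSE2 in the CPU column, which is the observation made in the message.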
* Re: [Qemu-devel] [RISU RFC PATCH v2 06/14] x86.risu: add MMX instructions
2019-07-11 13:57 ` Richard Henderson
@ 2019-07-11 21:29 ` Jan Bobek
0 siblings, 0 replies; 38+ messages in thread
From: Jan Bobek @ 2019-07-11 21:29 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: Alex Bennée
On 7/11/19 9:57 AM, Richard Henderson wrote:
> On 7/11/19 3:29 PM, Jan Bobek wrote:
>> However, I downloaded a fresh copy of Intel SDM off the Intel website
>> this morning (just to make sure) and in Volume 2B, Section "4.3
>> Instructions (M-U)," page 4-208 titled "PADDB/PADDW/PADDD/PADDQ—Add
>> Packed Integers," there's the NP 0F D4 /r PADDQ mm, mm/m64 instruction
>> in the 4th row, and the CPUID column says MMX. On the other hand, I
>> can't find it in the Volume 1, Section 5.4 "MMX(tm) Instructions," or
>> in Vol. 1, Chapter 9 "Programming with Intel(R) MMX(tm) Technology,"
>> so it's a bit confusing.
>>
>> If you know for a fact that it didn't come until SSE2 and the manual
>> is wrong, I will change it.
>
> Interesting. I see what you see in
>
> 253665-069US January 2019
>
> but I first looked at
>
> 325462-058US April 2016
>
> which definitely has this marked as SSE2.
>
> In the 2019 version, "5.6.3 SSE2 128-Bit SIMD Integer Instructions" is the
> first mention of PADDQ. Whereas "5.4.3 MMX Packed Arithmetic Instructions"
> mentions PADD{B,W,D} but not Q.
>
> I tend to think that this is a bug in the current manual.
>
> Checking in binutils I see
>
>> paddq, 2, 0x660fd4, None, 2, CpuSSE2, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
>> paddq, 2, 0xfd4, None, 2, CpuSSE2, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoAVX, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
>
> and both contain CpuSSE2. If you like, I could run this by one of the Intel GCC
> folk to be sure.
I think this is convincing enough for me; it was a good idea to check
binutils! I find it interesting that they'd get it wrong in a more
recent version of the manual, though.
-Jan