All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/37] target/i386: new decoder + AVX implementation
@ 2022-09-11 23:03 Paolo Bonzini
  2022-09-11 23:03 ` [PATCH 01/37] target/i386: Define XMMReg and access macros, align ZMM registers Paolo Bonzini
                   ` (36 more replies)
  0 siblings, 37 replies; 86+ messages in thread
From: Paolo Bonzini @ 2022-09-11 23:03 UTC (permalink / raw)
  To: qemu-devel

This series fleshes out more of the prototype table-driven decoder
and uses it to implement SSE and AVX.

As expected, there's a lot more lines here than in Paul's version
(roughly twice as much), but in my opinion it is a price worth paying
for future maintainability and for easier code review.  Nevertheless,
I cannot even explain how much his work helped; the authorship in
the log below doesn't tell the whole story because a bunch of patches
were extracted from his post and somewhat redone, not to mention the
wonderful tests.  In other words, he did all the hard and important work
that allowed me to proceed piecewise (and fast).

Interestingly, this also found a bunch of errors in the Intel manual,
which I have noted in the source and have forwarded to the contacts I
have there.  But overall the manual proved to be very easy to convert
to source code.  About 10% of the instructions needed some kind of
special casing, most of them due to differences between register and
memory operands.

Inspired by Richard's series at
https://lore.kernel.org/qemu-devel/20220822223722.1697758-1-richard.henderson@linaro.org,
I also took the opportunity to use gvec here and there, especially for
moves.  However, helpers are not converted to gvec style.  Another
possible cleanup to be done on top entails checking which helpers really
need the cpu_env argument.  Most of the integer ones probably don't,
for example, because the separate generator functions of the new decoder
provide a little more flexibility there.

This is not quite ready for commit, mostly because I haven't yet tested
it on big-endian architectures and on system emulation, but I thought I'd
throw it out early.  SSE4a translation is not tested yet either, and I will
probably convert 3DNow to the new decoder too, because it seems trivial.

Compared to the very first post, of course the decoder core is more
fleshed out.  New features include: supporting MMX instructions without
having to write them down separately; 4-operand instructions; CPUID
feature testing; simplified handling of opcodes with mandatory 66/F3/F2
prefixes.  On the other hand I left out the implementation of the one
byte opcodes, focusing on MMX and SSE instead.  Some "specials" are
not needed (e.g. NoSeg which is used for LEA) so it's not included.

Patches 1-4 are cleanups to translate.c that come in handy with the new
decoder.

Patches 5-11 add the generic framework for x86 decoding.

Patches 12-13 move all existing VEX instructions to the new decoder.
While at it, it also add the other integer instructions in the 0F38 and
0F3A opcode range, because the corresponding patches for SSE and AVX
instructions are big enough.  As of patch 13, however, these non-VEX
instructions will still be translated by the old decoder.

Patches 14-19 extend the helpers to support AVX 3- and 4-operand
instructions.  Patch 15 is by far the nastiest but it's not really
possible to split it further; a consolation is that the translate.c
part of it goes away with the SSE reimplementation.

Unlike in Paul's AVX series, I chose to implement the "merging" behavior
of AVX 3-operand scalar operations (VADDSx, VSQRTSx, etc.) in the
helpers rather than in TCG ops, so that change is also introduced at
this step; for more information see patches 16 and 17.

Patches 20-31 implement SSE and AVX instruction translation in the new
decoder.  The patches generally operate on groups of 8 to 24 opcodes,
though there are a couple bigger ones for the 0F38 and 0F3A opcodes.

Patches 32-35 finally enable AVX, and patches 35-36 removes now-unused
translator and helper code.

Paolo

Paolo Bonzini (32):
  target/i386: make ldo/sto operations consistent with ldq
  target/i386: REPZ and REPNZ are mutually exclusive
  target/i386: introduce insn_get_addr
  target/i386: add core of new i386 decoder
  target/i386: add ALU load/writeback core
  target/i386: add CPUID[EAX=7,ECX=0].ECX to DisasContext
  target/i386: add CPUID feature checks to new decoder
  target/i386: validate VEX prefixes via the instructions' exception classes
  target/i386: validate SSE prefixes directly in the decoding table
  target/i386: add scalar 0F 38 and 0F 3A instruction to new decoder
  target/i386: remove scalar VEX instructions from old decoder
  target/i386: extend helpers to support VEX.V 3- and 4- operand encodings
  target/i386: support operand merging in binary scalar helpers
  target/i386: provide 3-operand versions of unary scalar helpers
  target/i386: implement additional AVX comparison operators
  target/i386: Introduce 256-bit vector helpers
  target/i386: reimplement 0x0f 0x60-0x6f, add AVX
  target/i386: reimplement 0x0f 0xd8-0xdf, 0xe8-0xef, 0xf8-0xff, add AVX
  target/i386: reimplement 0x0f 0x50-0x5f, add AVX
  target/i386: reimplement 0x0f 0x78-0x7f, add AVX
  target/i386: reimplement 0x0f 0x70-0x77, add AVX
  target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, add AVX
  target/i386: reimplement 0x0f 0x3a, add AVX
  target/i386: reimplement 0x0f 0x38, add AVX
  target/i386: reimplement 0x0f 0xc2, 0xc4-0xc6, add AVX
  target/i386: reimplement 0x0f 0x10-0x17, add AVX
  target/i386: reimplement 0x0f 0x28-0x2f, add AVX
  target/i386: implement XSAVE and XRSTOR of AVX registers
  target/i386: implement VLDMXCSR/VSTMXCSR
  tests/tcg: extend SSE tests to AVX
  target/i386: move 3DNow completely out of gen_sse
  target/i386: remove old SSE decoder

Paul Brook (3):
  target/i386: add AVX_EN hflag
  target/i386: Prepare ops_sse_header.h for 256 bit AVX
  target/i386: Enable AVX cpuid bits when using TCG

Richard Henderson (2):
  target/i386: Define XMMReg and access macros, align ZMM registers
  target/i386: Use tcg gvec ops for pmovmskb

 target/i386/cpu.c                |   10 +-
 target/i386/cpu.h                |   59 +-
 target/i386/helper.c             |   12 +
 target/i386/helper.h             |    9 +
 target/i386/ops_sse.h            |  655 ++++++---
 target/i386/ops_sse_header.h     |  339 +++--
 target/i386/tcg/decode-new.c.inc | 1694 ++++++++++++++++++++++
 target/i386/tcg/decode-new.h     |  242 ++++
 target/i386/tcg/emit.c.inc       | 2323 ++++++++++++++++++++++++++++++
 target/i386/tcg/fpu_helper.c     |  134 +-
 target/i386/tcg/translate.c      | 2071 ++------------------------
 tests/tcg/i386/Makefile.target   |    2 +-
 tests/tcg/i386/test-avx.c        |  201 +--
 tests/tcg/i386/test-avx.py       |    3 +-
 14 files changed, 5374 insertions(+), 2380 deletions(-)
 create mode 100644 target/i386/tcg/decode-new.c.inc
 create mode 100644 target/i386/tcg/decode-new.h
 create mode 100644 target/i386/tcg/emit.c.inc

-- 
2.37.2



^ permalink raw reply	[flat|nested] 86+ messages in thread

end of thread, other threads:[~2022-09-15  7:17 UTC | newest]

Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-11 23:03 [RFC PATCH 00/37] target/i386: new decoder + AVX implementation Paolo Bonzini
2022-09-11 23:03 ` [PATCH 01/37] target/i386: Define XMMReg and access macros, align ZMM registers Paolo Bonzini
2022-09-11 23:03 ` [PATCH 02/37] target/i386: make ldo/sto operations consistent with ldq Paolo Bonzini
2022-09-12  8:33   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 03/37] target/i386: REPZ and REPNZ are mutually exclusive Paolo Bonzini
2022-09-12  8:37   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 04/37] target/i386: introduce insn_get_addr Paolo Bonzini
2022-09-12  8:39   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 05/37] target/i386: add core of new i386 decoder Paolo Bonzini
2022-09-12  9:27   ` Richard Henderson
2022-09-12 10:54   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 06/37] target/i386: add ALU load/writeback core Paolo Bonzini
2022-09-12 10:02   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 07/37] target/i386: add CPUID[EAX=7, ECX=0].ECX to DisasContext Paolo Bonzini
2022-09-12 10:02   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 08/37] target/i386: add CPUID feature checks to new decoder Paolo Bonzini
2022-09-12 10:05   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 09/37] target/i386: add AVX_EN hflag Paolo Bonzini
2022-09-12 10:06   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 10/37] target/i386: validate VEX prefixes via the instructions' exception classes Paolo Bonzini
2022-09-12 10:39   ` Richard Henderson
2022-09-12 10:42   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 11/37] target/i386: validate SSE prefixes directly in the decoding table Paolo Bonzini
2022-09-12 10:51   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 12/37] target/i386: add scalar 0F 38 and 0F 3A instruction to new decoder Paolo Bonzini
2022-09-12 11:04   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 13/37] target/i386: remove scalar VEX instructions from old decoder Paolo Bonzini
2022-09-12 11:06   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 14/37] target/i386: Prepare ops_sse_header.h for 256 bit AVX Paolo Bonzini
2022-09-12 11:09   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 15/37] target/i386: extend helpers to support VEX.V 3- and 4- operand encodings Paolo Bonzini
2022-09-12 11:11   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 16/37] target/i386: support operand merging in binary scalar helpers Paolo Bonzini
2022-09-12 11:11   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 17/37] target/i386: provide 3-operand versions of unary " Paolo Bonzini
2022-09-12 11:14   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 18/37] target/i386: implement additional AVX comparison operators Paolo Bonzini
2022-09-12 11:19   ` Richard Henderson
2022-09-11 23:03 ` [PATCH 19/37] target/i386: Introduce 256-bit vector helpers Paolo Bonzini
2022-09-12 11:19   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 20/37] target/i386: reimplement 0x0f 0x60-0x6f, add AVX Paolo Bonzini
2022-09-12 11:41   ` Richard Henderson
2022-09-13 10:56     ` Paolo Bonzini
2022-09-13 11:35       ` Richard Henderson
2022-09-12 13:01   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 21/37] target/i386: reimplement 0x0f 0xd8-0xdf, 0xe8-0xef, 0xf8-0xff, " Paolo Bonzini
2022-09-12 13:19   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 22/37] target/i386: reimplement 0x0f 0x50-0x5f, " Paolo Bonzini
2022-09-12 13:46   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 23/37] target/i386: reimplement 0x0f 0x78-0x7f, " Paolo Bonzini
2022-09-12 13:56   ` Richard Henderson
2022-09-14 16:17     ` Paolo Bonzini
2022-09-11 23:04 ` [PATCH 24/37] target/i386: reimplement 0x0f 0x70-0x77, " Paolo Bonzini
2022-09-12 14:29   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 25/37] target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, " Paolo Bonzini
2022-09-12 15:06   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 26/37] target/i386: reimplement 0x0f 0x3a, " Paolo Bonzini
2022-09-12 15:33   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 27/37] target/i386: Use tcg gvec ops for pmovmskb Paolo Bonzini
2022-09-13  8:16   ` Richard Henderson
2022-09-14 22:59     ` Paolo Bonzini
2022-09-15  6:48       ` Richard Henderson
2022-09-11 23:04 ` [PATCH 28/37] target/i386: reimplement 0x0f 0x38, add AVX Paolo Bonzini
2022-09-13  9:31   ` Richard Henderson
2022-09-14 17:04     ` Paolo Bonzini
2022-09-15  6:50       ` Richard Henderson
2022-09-11 23:04 ` [PATCH 29/37] target/i386: reimplement 0x0f 0xc2, 0xc4-0xc6, " Paolo Bonzini
2022-09-13  9:44   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 30/37] target/i386: reimplement 0x0f 0x10-0x17, " Paolo Bonzini
2022-09-13 10:14   ` Richard Henderson
2022-09-14 22:45     ` Paolo Bonzini
2022-09-15  6:51       ` Richard Henderson
2022-09-13 10:38   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 31/37] target/i386: reimplement 0x0f 0x28-0x2f, " Paolo Bonzini
2022-09-13 10:24   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 32/37] target/i386: implement XSAVE and XRSTOR of AVX registers Paolo Bonzini
2022-09-13 10:27   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 33/37] target/i386: Enable AVX cpuid bits when using TCG Paolo Bonzini
2022-09-13 10:28   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 34/37] target/i386: implement VLDMXCSR/VSTMXCSR Paolo Bonzini
2022-09-13 10:32   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 35/37] tests/tcg: extend SSE tests to AVX Paolo Bonzini
2022-09-13 10:33   ` Richard Henderson
2022-09-11 23:04 ` [PATCH 36/37] target/i386: move 3DNow completely out of gen_sse Paolo Bonzini
2022-09-13 10:34   ` Richard Henderson
2022-09-13 10:39 ` [RFC PATCH 00/37] target/i386: new decoder + AVX implementation Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.