[Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements

* [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements
@ 2018-12-18  6:38 Richard Henderson
  2018-12-18  6:38 ` [Qemu-devel] [PATCH 01/34] tcg: Add logical simplifications during gvec expand Richard Henderson
                   ` (35 more replies)
  0 siblings, 36 replies; 75+ messages in thread
From: Richard Henderson @ 2018-12-18  6:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: mark.cave-ayland, qemu-ppc, david

This implements some of the things that I talked about with Mark
this morning / yesterday.  In particular:

(0) Implement expanders for nand, nor, eqv logical operations.

(1) Implement saturating arithmetic for the tcg backend.

    While I had expanders for these, they always went to helpers.
    It's easy enough to expand byte and half-word operations for x86.
    Beyond that, 32 and 64-bit operations can be expanded with integers.

(2) Implement minmax arithmetic for the tcg backend.

    While I had integral minmax operations, I had not yet added
    any vector expanders for this.  (The integral stuff came in
    for atomic minmax.)

(3) Trivial conversions to minmax for target/arm.

(4) Patches 11-18 are identical to Mark's.

(5) Patches 19-25 implement splat and logicals for VMX and VSX.

    VSX is no more difficult than VMX for these.  It does seem to be
    just about everything that we can do for VSX at the momement.

(6) Patches 26-33 implement saturating arithmetic for VMX.

(7) Patch 34 implements minmax arithmetic for VMX.

I've tested the new operations via aarch64 guest, as that's the set
of risu test cases I've got handy.  The rest is untested so far.

r~

Mark Cave-Ayland (8):
  target/ppc: introduce get_fpr() and set_fpr() helpers for FP register
    access
  target/ppc: introduce get_avr64() and set_avr64() helpers for VMX
    register access
  target/ppc: introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}()
    helpers for VSR register access
  target/ppc: switch FPR, VMX and VSX helpers to access data directly
    from cpu_env
  target/ppc: merge ppc_vsr_t and ppc_avr_t union types
  target/ppc: move FP and VMX registers into aligned vsr register array
  target/ppc: convert VMX logical instructions to use vector operations
  target/ppc: convert vaddu[b,h,w,d] and vsubu[b,h,w,d] over to use
    vector operations

Richard Henderson (26):
  tcg: Add logical simplifications during gvec expand
  target/arm: Rely on optimization within tcg_gen_gvec_or
  tcg: Add gvec expanders for nand, nor, eqv
  tcg: Add write_aofs to GVecGen4
  tcg: Add opcodes for vector saturated arithmetic
  tcg/i386: Implement vector saturating arithmetic
  tcg: Add opcodes for vector minmax arithmetic
  tcg/i386: Implement vector minmax arithmetic
  target/arm: Use vector minmax expanders for aarch64
  target/arm: Use vector minmax expanders for aarch32
  target/ppc: convert vspltis[bhw] to use vector operations
  target/ppc: convert vsplt[bhw] to use vector operations
  target/ppc: nand, nor, eqv are now generic vector operations
  target/ppc: convert VSX logical operations to vector operations
  target/ppc: convert xxspltib to vector operations
  target/ppc: convert xxspltw to vector operations
  target/ppc: convert xxsel to vector operations
  target/ppc: Pass integer to helper_mtvscr
  target/ppc: Use helper_mtvscr for reset and gdb
  target/ppc: Remove vscr_nj and vscr_sat
  target/ppc: Add helper_mfvscr
  target/ppc: Use mtvscr/mfvscr for vmstate
  target/ppc: Add set_vscr_sat
  target/ppc: Split out VSCR_SAT to a vector field
  target/ppc: convert vadd*s and vsub*s to vector operations
  target/ppc: convert vmin* and vmax* to vector operations

 accel/tcg/tcg-runtime.h             |  23 +
 target/ppc/cpu.h                    |  30 +-
 target/ppc/helper.h                 |  57 +-
 target/ppc/internal.h               |  29 +-
 tcg/aarch64/tcg-target.h            |   2 +
 tcg/i386/tcg-target.h               |   2 +
 tcg/tcg-op-gvec.h                   |  18 +
 tcg/tcg-op.h                        |  11 +
 tcg/tcg-opc.h                       |   8 +
 tcg/tcg.h                           |   2 +
 accel/tcg/tcg-runtime-gvec.c        | 257 +++++++++
 linux-user/ppc/signal.c             |  24 +-
 target/arm/translate-a64.c          |  41 +-
 target/arm/translate-sve.c          |   6 +-
 target/arm/translate.c              |  37 +-
 target/ppc/arch_dump.c              |  15 +-
 target/ppc/gdbstub.c                |   8 +-
 target/ppc/int_helper.c             | 194 +++----
 target/ppc/machine.c                | 116 +++-
 target/ppc/monitor.c                |   4 +-
 target/ppc/translate.c              |  74 ++-
 target/ppc/translate/dfp-impl.inc.c |   2 +-
 target/ppc/translate/fp-impl.inc.c  | 490 ++++++++++++----
 target/ppc/translate/vmx-impl.inc.c | 349 +++++++-----
 target/ppc/translate/vsx-impl.inc.c | 834 +++++++++++++++++++---------
 target/ppc/translate_init.inc.c     |  31 +-
 tcg/i386/tcg-target.inc.c           | 106 ++++
 tcg/tcg-op-gvec.c                   | 305 ++++++++--
 tcg/tcg-op-vec.c                    |  75 ++-
 tcg/tcg.c                           |  10 +
 30 files changed, 2275 insertions(+), 885 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 75+ messages in thread