[PATCH v2 00/45] target/arm: Implement fp16 for AArch32 VFP and Neon

* [PATCH v2 00/45] target/arm: Implement fp16 for AArch32 VFP and Neon
@ 2020-08-28 18:33 Peter Maydell
  2020-08-28 18:33 ` [PATCH v2 01/45] target/arm: Remove local definitions of float constants Peter Maydell
                   ` (44 more replies)
  0 siblings, 45 replies; 75+ messages in thread
From: Peter Maydell @ 2020-08-28 18:33 UTC (permalink / raw)
  To: qemu-arm, qemu-devel

This patchset implements fp16 support for AArch32, both VFP and Neon.

Patches 1-21 and 45 are the same as from the v1 vfp-only series,
and have all been reviewed. (I've included the minor fixups to
use 'f16' and the 'dh_ctype_f16' type.) Patches 22-44 are new and
cover Neon.

thanks
-- PMM

Peter Maydell (45):
  target/arm: Remove local definitions of float constants
  target/arm: Use correct ID register check for aa32_fp16_arith
  target/arm: Implement VFP fp16 for VFP_BINOP operations
  target/arm: Implement VFP fp16 VMLA, VMLS, VNMLS, VNMLA, VNMUL
  target/arm: Macroify trans functions for VFMA, VFMS, VFNMA, VFNMS
  target/arm: Implement VFP fp16 for fused-multiply-add
  target/arm: Macroify uses of do_vfp_2op_sp() and do_vfp_2op_dp()
  target/arm: Implement VFP fp16 for VABS, VNEG, VSQRT
  target/arm: Implement VFP fp16 for VMOV immediate
  target/arm: Implement VFP fp16 VCMP
  target/arm: Implement VFP fp16 VLDR and VSTR
  target/arm: Implement VFP fp16 VCVT between float and integer
  target/arm: Make VFP_CONV_FIX macros take separate float type and
    float size
  target/arm: Use macros instead of open-coding fp16 conversion helpers
  target/arm: Implement VFP fp16 VCVT between float and fixed-point
  target/arm: Implement VFP vp16 VCVT-with-specified-rounding-mode
  target/arm: Implement VFP fp16 VSEL
  target/arm: Implement VFP fp16 VRINT*
  target/arm: Implement new VFP fp16 insn VINS
  target/arm: Implement new VFP fp16 insn VMOVX
  target/arm: Implement VFP fp16 VMOV between gp and halfprec registers
  fpu: Add float16 comparison functions
  target/arm: Implement FP16 for Neon VADD, VSUB, VABD, VMUL
  target/arm: Implement fp16 for Neon VRECPE, VRSQRTE using gvec
  target/arm: Implement fp16 for Neon VABS, VNEG of floats
  target/arm: Implement fp16 for VCEQ, VCGE, VCGT comparisons
  target/arm: Implement fp16 for VACGE, VACGT
  target/arm: Implement fp16 for Neon VMAX, VMIN
  target/arm: Implement fp16 for Neon VMAXNM, VMINNM
  target/arm: Implement fp16 for Neon VMLA, VMLS operations
  target/arm: Implement fp16 for Neon VFMA, VMFS
  target/arm: Implement fp16 for Neon fp compare-vs-0
  target/arm: Implement fp16 for Neon VRECPS
  target/arm: Implement fp16 for Neon VRSQRTS
  target/arm: Implement fp16 for Neon pairwise fp ops
  target/arm: Implement fp16 for Neon float-integer VCVT
  target/arm: Convert Neon VCVT fixed-point to gvec
  target/arm: Implement fp16 for Neon VCVT fixed-point
  target/arm: Implement fp16 for Neon VCVT with rounding modes
  target/arm: Implement fp16 for Neon VRINT-with-specified-rounding-mode
  target/arm: Implement fp16 for Neon VRINTX
  target/arm/vec_helper: Handle oprsz less than 16 bytes in indexed
    operations
  target/arm/vec_helper: Add gvec fp indexed multiply-and-add operations
  target/arm: Implement fp16 for Neon VMUL, VMLA, VMLS
  target/arm: Enable FP16 in '-cpu max'

 include/fpu/softfloat.h         |  41 ++
 target/arm/cpu.h                |   7 +-
 target/arm/helper.h             | 133 +++++-
 target/arm/neon-dp.decode       |   8 +-
 target/arm/vfp-uncond.decode    |  27 +-
 target/arm/vfp.decode           |  34 +-
 target/arm/cpu.c                |   3 +-
 target/arm/cpu64.c              |  10 +-
 target/arm/helper-a64.c         |  11 -
 target/arm/translate-sve.c      |   4 -
 target/arm/vec_helper.c         | 431 ++++++++++++++++-
 target/arm/vfp_helper.c         | 244 ++++------
 target/arm/translate-neon.c.inc | 751 ++++++++++-------------------
 target/arm/translate-vfp.c.inc  | 810 ++++++++++++++++++++++++++++----
 14 files changed, 1719 insertions(+), 795 deletions(-)

-- 
2.20.1

^ permalink raw reply	[flat|nested] 75+ messages in thread